
modlwip.c does not properly return POLL_HUP and POLL_ERR socket errors #5172

Closed
t35tB0t opened this issue Oct 2, 2019 · 31 comments

@t35tB0t

t35tB0t commented Oct 2, 2019

extmod/modlwip.c only returns POLL_HUP and POLL_ERR if the events flags explicitly request them. AFAIK this is not POSIX compliant: POSIX specifies that these return events shall be unsolicited, see e.g. http://man7.org/linux/man-pages/man2/poll.2.html
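
To illustrate the POSIX behavior in question, here is a minimal CPython/unix sketch (not the lwIP port): only POLLIN is registered, yet a connection that gets reset still reports POLLERR/POLLHUP in revents, because those events are unsolicited.

    import select
    import socket
    import struct

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    cli = socket.create_connection(srv.getsockname())
    peer, _ = srv.accept()

    poller = select.poll()
    poller.register(peer, select.POLLIN)  # POLLHUP/POLLERR deliberately NOT requested

    # Close with SO_LINGER (on, timeout 0) so the kernel sends a RST, simulating a dropped peer.
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    cli.close()

    for fd, revents in poller.poll(1000):
        # On Linux this typically reports POLLERR and POLLHUP (plus POLLIN) even
        # though only POLLIN was asked for - the behavior modlwip.c currently lacks.
        print(hex(revents), bool(revents & select.POLLERR), bool(revents & select.POLLHUP))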

The impact is that uasyncio (which does not set the POLL_HUP and POLL_ERR flags) never sees these failed socket connections, so coroutines hang and consume memory; eventually MicroPython runs out of memory. Adding these flags in uasyncio is a workaround that gets the socket errors returned, but it is not the expected behavior of these ioctl polls. The modlwip.c changes below were tested and work when appropriate exception handling is added to uasyncio (specifically in the start_server() loop). Socket errors are now also returned to the yielding coroutine, where appropriate exception handling, including reader.aclose() and writer.aclose(), must be used to clean up after a premature connection termination.

extmod_modlwip_c.diff.txt

Note: in the diff, (flags, ret) are renamed to (events, revents) to align with common parlance for ioctl polling.

@peterhinch
Contributor

Compare #4290

@t35tB0t
Author

t35tB0t commented Oct 4, 2019

@peterhinch - Kudos to you for finding this issue way back in #4290 and documenting it better than I did here. I'm very certain it's the very same issue, and it is fixed for STM32 with the mods made to the related modules; I have not dug into the ESP code base to understand the related behaviors. However, I'm not sure I agree with the expected error returns. Reading the related Linux man page, it is pretty clear that revents shall NOT simply echo the caller's event flags, so we shouldn't be getting POLL_IN + POLL_HUP (17) returned. If we are, then the ESP stacks are also broken in a non-compliant way. Of course we can deal with that in the upper layers, but it isn't a good direction to be moving in. One thing to try with the ESPs is simply adding the POLL_HUP | POLL_ERR flags to the uasyncio poll read and write calls and seeing how that changes your results. This is a socket ioctl API issue that may be impacting multiple targets. However, the lack of error handling in uasyncio's start_server loop seems to be a global issue. Yes, one can trap that error higher up, but crashing and restarting a server on every connection dropped before accept is not OK.

As soon as I have a break here, I will clean up all the excess debug notes in the related files and post a complete working example which is, AFAIK, tested and ready for merge consideration.

@peterhinch
Contributor

As a very general comment, the testing @kevinkk525 and I performed was to determine the response to dropped connections and, critically, to WiFi dropouts; the latter are commonplace. Any updates should be tested against these conditions in addition to testing against malicious peers.

It's possible that uasyncio.start_server is only used by Picoweb. We avoid it because of the fragility mentioned above.

@t35tB0t
Author

t35tB0t commented Oct 6, 2019

After a few tweaks there is nothing wrong with uasyncio.start_server; it is no longer fragile.

Here are some simple testing tools crafted to generate abandoned connections, reset connections, etc. The sock_test.py script should be launched as multiple subprocesses; this generates a flood of overlapping, reset HTTP connections. With randomized delays and multiple generators, all of the code race conditions are eventually hit. To simulate dropped WiFi connections, the scapy library is used for its lower-level packet manipulation. The command-line launch info is commented in the scapy-cAbandon.py script. One of the options provides a simple SYN flood attack, but I suggest using the -v True mode for our purposes here.
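
For readers without the attachments, here is a minimal, hypothetical stand-in for the reset-connection generator (not the attached sock_test.py itself; reset_get and the target address are illustrative assumptions): it sends a partial HTTP GET and then closes with SO_LINGER set to zero so the kernel emits a RST instead of a FIN.

    import random
    import socket
    import struct
    import time

    TARGET = ("192.168.1.50", 80)  # hypothetical device address - adjust to your board

    def reset_get(host_port):
        s = socket.create_connection(host_port, timeout=2)
        # Closing with SO_LINGER (on, timeout 0) makes the kernel send RST, not FIN.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        s.sendall(b"GET / HTTP/1.1\r\nHost: test\r\n\r\n")
        time.sleep(random.uniform(0.0, 0.2))  # randomized delay to hit race windows
        s.close()                             # connection is reset mid-exchange

    while True:
        try:
            reset_get(TARGET)
        except OSError:
            pass  # the server may refuse or limit connections under load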

scapy-cAbandon.zip
socket_test.zip

@t35tB0t
Author

t35tB0t commented Oct 6, 2019

Here's a complete working test case with enhanced exception handling. Note that extra poll flags are included to force the modlwip.c ioctl poll to return the POLL_HUP and POLL_ERR revents, in case you haven't recompiled a fixed modlwip.c yet. This makes it "just work" in either case, but it should be considered debug code and removed at some point. The test case below was run on an STM32F767; prior testing was performed on an STM32F429. Scripting and object coding errors were correctly trapped and reported from all layers. Socket state errors were correctly handled (AFAICT) for floods of connection abandons, resets, and simple SYN attacks. In all cases the browser was able to access the demo webapp GET / info page unless the stack was "jammed", in which case it would wait until the flood of nasty connections subsided and then resume. The memory stats reporting isn't high-fidelity, but it does at least show that there isn't any leakage after a lengthy barrage of bad connection behaviors. Compared to where we started, this works for me, and IMHO it also addresses the popular WiFi dropout issue in addition to connection resets (at least from the aspect of not chewing up memory and crashing). If there are any other connection states we need to consider, let me know.

As a side note - I noticed Wireshark showing a browser (client) sending TCP keep-alive packets, and lwIP (server) appeared to be responding correctly. I know there is interest in using keep-alives to sustain connections, so this is at least a positive indication that some of the mechanisms are in place and functioning. It may just be a matter of the lwIP API exposing more of this within the MicroPython environment?
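
For reference, this is how keep-alive is enabled on a standard POSIX socket (CPython or the unix port); whether modlwip exposes SO_KEEPALIVE and the related lwIP tuning knobs is exactly the open question raised above, so treat this as a sketch of the desired API rather than something known to work on an lwIP port today.

    import socket

    s = socket.socket()
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The probe tuning below is Linux-specific and optional.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # idle seconds before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before drop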

PYBFLASH_20191006.zip

P.S. I apologise if there are any issues or offense with the copyright clauses or other header fluff - this is just text/example code.

@peterhinch
Contributor

I'm looking at your uasyncio changes with a view to implementing them in my fast_io fork. The only non-debug change seems to be in __init__.py in the start_server() function.

I appreciate it isn't final code but the following two lines in __init__.py seem to be redundant:

import sys

and the final pass statement in start_server.

The change seems benign: I can't see it breaking any existing code, although I appreciate it needs firmware changes to achieve its aim.

Please let me know if I'm missing anything.

I've also taken a quick look at PicoWeb. I notice that you issue

reader.aclose()

As I understand it StreamReader.aclose() is asynchronous so I'd expect to see

yield from reader.aclose()

@t35tB0t
Author

t35tB0t commented Oct 6, 2019

@peterhinch - correct, the sys import related to exception stack trace dump debug code that I have since deleted. And, as I've been saying, the start_server() function has a race condition between its IORead(s) and s.accept() that must be trapped at this layer, or we end up crashing the entire server loop over spurious network behaviors. Sure, it's a rare event, but it would certainly be a head-scratcher for many users. Finally, the changes in PicoWeb appear simple but are a bit trickier.

And, AFAICT, you don't need to yield on closing reader polling objects. Actually, doing so in an exception handler will result in a dictionary key error because of how the yielding works. I wasn't that smart - I added all the yields and watched it fail. So, the constructs that are in place properly handle the exceptions we were able to trigger, without internal errors and without hanging the client due to incomplete HTTP exchanges.

And it's not perfect yet. After running all night, the enhanced exception trapping code tripped over an unusual exception type and halted execution. My original approach was to wrap all the errors in this middleware and blame the network, but @pfalcon has raised the bar and we're trying to provide that solution.

Traceback (most recent call last):
File "main.py", line 18, in <module>
File "web.py", line 28, in run
File "picoweb/__init__.py", line 309, in run
File "uasyncio/core.py", line 109, in run_forever
File "picoweb/__init__.py", line 210, in _handle
IndexError: tuple index out of range
MicroPython v1.11-292-g5641aa55d-dirty on 2019-09-25; NUCLEO-F767ZI with STM32F767
Type "help()" for more information.

The offending PicoWeb Line 210 in the demo code is:
if not uerrno.errorcode.get(e.args[0],False):

I was just being terse when I wrote that line last night, so it's vulnerable to e.args being an empty tuple. I suppose we need to beef this up a bit. Keep in mind that when we simply pass on all errors here, we never crash, so this can't be a major resource issue; it may very well just be another socket-triggered error type that needs to be added to the exception handler here. I have an idea of how to beef this up and am running it now...

@t35tB0t
Author

t35tB0t commented Oct 6, 2019

UPDATE: I made this modification in PicoWeb (and a similar e.args length trap in uasyncio's start_server). This should show us what rare event is throwing the error with a zero-length e.args tuple. It may take several hours or all day to trip this error; I'll post the results when that happens...

    except Exception as e:
        if len(e.args)==0:
            # Handle exception as an unexpected unknown error: collect details here, then close and try to continue running
            sys.print_exception(e)
            reader.aclose()
            yield from writer.aclose()
            raise
        elif not uerrno.errorcode.get(e.args[0],False):
            # Handle exception as internal error: close and terminate handler (user must trap or crash)
            reader.aclose()
            yield from writer.aclose()
            raise
        else:
            # Handle exception as bad socket state: close and continue
            close=True
            if self.debug:
                self.log.exc(e, "%.3f %s %s %r" % (utime.time(), req, writer, e))

@t35tB0t
Author

t35tB0t commented Oct 6, 2019 via email

@peterhinch
Contributor

Apologies if I'm missing something but I still think issuing

reader.aclose()

will never close the socket: it will simply instantiate a generator object. The code is:

    def aclose(self):
        yield IOReadDone(self.polls)
        self.ios.close()

Consider the following:

def foo():
    yield 1
    print('foo')

If I paste this at the REPL and issue

>>> foo()
<generator object 'foo' at 7f9b346d69a0>
>>> 

it does not print anything, but merely returns a generator instance.

@t35tB0t
Author

t35tB0t commented Oct 7, 2019 via email

@peterhinch
Contributor

I can't see that the line

reader.aclose()

is doing anything useful. It's instantiating a generator and discarding it. Commenting the line out would prove the point one way or the other.

@t35tB0t
Author

t35tB0t commented Oct 7, 2019

@peterhinch ...because you're correct. I had a prior test case where the web browser was hanging, waiting for the server to finish. Adding these calls appeared to change the behavior, and the remote client was able to happily complete its page rendering. It's likely that they just slowed MicroPython down with enough of a delay for more bytes to get out on the wire before writer.aclose() killed the outbound packet in lwIP. I'll have to build back up to that more complex case and see if we can re-create it. Meanwhile, I've commented out the useless reader.aclose() calls, but will keep in mind that we may want to add a brief delay in the exception handler, or more directly check status, to let data written with awrite() get out onto the wire before the socket evaporates. That's the best I've got on that at this time.
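
One hypothetical way to give awrite() data a chance to drain before teardown, assuming a short cooperative sleep is acceptable (the finish() helper name is illustrative, and whether this is the right fix depends on how lwIP flushes on close):

    import uasyncio as asyncio

    def finish(writer):
        # Generator-style coroutine for uasyncio v2: brief drain delay, then close.
        yield from asyncio.sleep(0.05)   # tunable; gives queued bytes time to leave
        yield from writer.aclose()       # aclose() is a coroutine, so it must be yielded from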

The zipped project below is where I've ended up after a more careful trace of the exceptions thrown at the various levels and locations. The elusive empty e.args tuple error was only seen twice and I couldn't determine where it came from, so I've added some traps (and made it non-fatal) so we get more details whenever it happens again. I doubt we're 100% done with this yet, but it's running annoyingly crash-free at the moment - which may sound odd but is not.

Note: My primary long-run test at the moment is the socket reset, because it's pretty straightforward. With the default STM32 socket limit of 5, I'm running three subprocesses of the socket_test randomly-reset HTTP GETs. I tried adding more data after the HTTP header in the request, but that didn't do anything interesting (i.e. no errors thrown; it only slowed the connection rate down).

PYBFLASH_20191007.zip

(this now has more substantial changes plus some added cosmetic alterations for readability)

@peterhinch
Contributor

peterhinch commented Oct 7, 2019

I've looked at the uasyncio changes.
Re core.py: I'd be surprised if there's a significant bug there - it has been extensively tested. The added int() statements seem to do nothing, as the quantities to which they apply are already integers:

                delay = int(time.ticks_diff(t, tnow))

and

l = int(len(self.runq))

Does the added

except Exception as e:

ever get called? I appreciate this may be debug code but if it gets called I would suspect that the underlying cause is an error in __init__.py. If this code is to be permanent, we need to be sure it won't affect normal exception handling in user coroutines.

Re __init__.py You're clearly making good progress in addressing the socket handling.

In start_server you have:

        except Exception as e:
            if len(e.args)==0:

Are you getting exceptions without args? If so I think we need to find out where these are coming from as they may be indicative of a MicroPython bug. My understanding is that Python exceptions always have an args tuple with at least one element. Unless there is a legitimate case of which I'm unaware?
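
For what it's worth, a bare exception raised without arguments does have an empty args tuple in both CPython and MicroPython, so there is at least one legitimate way e.args[0] can fail:

    try:
        raise OSError()          # no errno supplied
    except Exception as e:
        print(e.args)            # prints: ()
        print(len(e.args) == 0)  # prints: True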

Your zipfiles don't include your changes to extmod/modlwip.c. I'm not the best person to review this, but maybe someone with appropriate experience would take a look. When you have a version which you regard as final I could test for breakages to my existing libraries.

@t35tB0t
Author

t35tB0t commented Oct 7, 2019 via email

@peterhinch
Contributor

Thank you for that excellent summary. It would be good to know where in the firmware that strange exception originates, but I appreciate you may not have time to follow this up. Here is my take on what should happen next.

modlwip.c

Regarding the lack of official response to #5172 I know that @dpgeorge and @jimmo are concerned about this and I'm sure they would welcome a PR.

Submitting a PR is the way to get your code reviewed and implemented. I suggest reading the code conventions guide as Damien is keen to maintain consistency. If your changes are minor compliance will be easy.

uasyncio

I will implement the error trapping in my fast_io uasyncio fork.

I'm not sure what to suggest for official micropython-lib. We could submit a PR, but in the past these tended to be ignored. I suggest we await the response to a modlwip PR: on implementation we can see if a uasyncio PR would be welcome and we can raise it. I'd be happy to do this if you'd rather - but only if it seemed likely that it would get attention from the maintainers.

picoweb and pycopy

Paul is the owner/maintainer. Hopefully he will respond to PRs for picoweb, modlwip and uasyncio. It's entirely up to you how or if you pursue this. My interest is solely with official MicroPython.

@t35tB0t
Author

t35tB0t commented Oct 8, 2019 via email

@jimmo
Member

jimmo commented Oct 8, 2019

@t35tB0t As I mentioned on micropython/micropython-lib#353 (comment) I'm happy to do the work here to make this a PR and get it submitted. Thanks for posting the diff, I think I can take it from here if you want?

@t35tB0t
Author

t35tB0t commented Oct 8, 2019 via email

@peterhinch
Contributor

peterhinch commented Oct 8, 2019

I've started work on fast_io and have some questions of detail re __init__.py:

  1. In StreamReader.readexactly you replaced self.ios.read(n) with self.ios.readline(n). In general data read by readexactly may not be line-oriented, especially as stream devices may not be sockets. What is the reason for this change?
  2. You have calls to log.DEBUG. On a Pyboard a logger instance has a debug method but no DEBUG. I have used the lowercase method.
  3. In the event of an exception your start_server method doesn't close the sockets. Paul's code does. I currently have the following, adapted from Paul's code and yours. Please let me know your thoughts.
    # Code omitted
    s.listen(backlog)
    try:
        while True:
            try:
                if DEBUG and __debug__:
                    log.debug("start_server: Before accept")
                yield IORead(s)
                if DEBUG and __debug__:
                    log.debug("start_server: After iowait")
                s2, client_addr = s.accept()
                s2.setblocking(False)
                if DEBUG and __debug__:
                    log.debug("start_server: After accept: %s", s2)
                extra = {"peername": client_addr}
                # Detach the client_coro: put it on runq
                yield client_coro(StreamReader(s2), StreamWriter(s2, extra))
                s2 = None  # From Paul's code.

            except Exception as e:
                if len(e.args)==0:
                    # This happens but shouldn't. Firmware bug?
                    # Handle exception as an unexpected unknown error:
                    # collect details here then close try to continue running
                    print('start_server:Unknown error: continuing')
                    sys.print_exception(e)
                elif not uerrno.errorcode.get(e.args[0], False):
                    # Handle exception as internal error: close and terminate
                    # handler (user must trap or crash)
                    print('start_server:Unexpected error: terminating')
                    raise
    finally:  # From Paul's code
        if s2:
            s2.close()
        s.close()

As discussed your start_server exception handler implies you're getting odd errors which may indicate a firmware issue. Do you think it would help the developers to post a log of what you're getting?

[EDIT]
I have posted code here.
I appreciate you may not have time to look at this but any comments would be welcome. core.py is unchanged (from my fast_io version) apart from the version number. Changes to __init__.py are as discussed. I have done basic testing with a UART and with a version of PicoWeb.

@t35tB0t
Author

t35tB0t commented Oct 9, 2019 via email

@peterhinch
Contributor

Looking at the docs for poll.register I see

Note that flags like uselect.POLLHUP and uselect.POLLERR are not valid as input eventmask (these are unsolicited events which will be returned from poll() regardless of whether they are asked for). This semantics is per POSIX.

It looks like we shouldn't be explicitly registering this eventmask.

Can I suggest that you just register select.POLLIN and select.POLLOUT as per the original code and verify that POLLHUP and POLLERR still work?

@t35tB0t
Author

t35tB0t commented Oct 9, 2019 via email

@t35tB0t
Author

t35tB0t commented Oct 9, 2019 via email

@peterhinch
Contributor

OK, thanks for the clarification. I think you are absolutely right about modlwip.

@t35tB0t
Author

t35tB0t commented Oct 13, 2019 via email

@peterhinch
Contributor

Thank you for the vote of confidence :) I agree with the need for action on official MicroPython.

You may misunderstand the situation with pycopy, which is an unofficial fork of MicroPython. Paul Sokolovsky (@pfalcon) is the sole maintainer of pycopy and its associated library. The best way to get that fixed is to submit PRs/issues yourself.

dpgeorge added a commit to dpgeorge/micropython that referenced this issue Oct 16, 2019
POSIX poll should always return POLLERR and POLLHUP in revents, regardless
of whether they were requested in the input events flags.

See issues micropython#4290 and micropython#5172.
@dpgeorge
Member

Thanks @t35tB0t and @peterhinch for the detailed report and discussion.

extmod/modlwip.c only returns POLL_HUP and POLL_ERR if the events flags explicitly request them. AFAIK this is not POSIX compliant: POSIX specifies that these return events shall be unsolicited.

I agree, it should be POSIX compliant. See #5222 for an attempted fix which is slightly different to the one submitted by @t35tB0t .

dpgeorge added a commit that referenced this issue Oct 31, 2019
POSIX poll should always return POLLERR and POLLHUP in revents, regardless
of whether they were requested in the input events flags.

See issues #4290 and #5172.
dpgeorge added a commit that referenced this issue Oct 31, 2019
POSIX poll should always return POLLERR and POLLHUP in revents, regardless
of whether they were requested in the input events flags.

See issues #4290 and #5172.
@dpgeorge
Member

I'll close this issue because the original bug is fixed: POLLHUP and POLLERR (and now POLLNVAL) are unconditionally returned in socket polling. See 71401d5, feaa251 and 26d8fd2

@dpgeorge
Member

Note that current uasyncio may actually handle the case of unsolicited POLLERR/POLLHUP being returned from poll because the unix port already has this behaviour.
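
For context, a simplified sketch (not the actual uasyncio source; the names wait_once, objmap and runq are illustrative) of what that handling amounts to on the scheduler side: when poll reports unsolicited POLLERR/POLLHUP, the waiting task is woken anyway, and its next read()/accept() then raises an OSError it can handle instead of blocking forever on a dead connection.

    import uselect

    def wait_once(poller, objmap, runq, timeout_ms):
        # objmap maps a socket's id to the task waiting on it; runq is the run queue.
        for sock, revents in poller.ipoll(timeout_ms):
            if revents & (uselect.POLLERR | uselect.POLLHUP):
                runq.append(objmap[id(sock)])   # wake the task even though it only
                continue                        # registered POLLIN/POLLOUT
            if revents & (uselect.POLLIN | uselect.POLLOUT):
                runq.append(objmap[id(sock)])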

@t35tB0t
Author

t35tB0t commented Nov 1, 2019 via email
