WIP: fix lwip polling to unconditionally return POLLHUP/ERR/NVAL #5222

dpgeorge · 2019-10-16T06:36:46Z

POSIX poll should always return POLLERR and POLLHUP in revents, regardless of whether they were requested in the input events flags. See issues micropython#4290 and micropython#5172.
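The behaviour described above can be observed on a host system. The following is a minimal sketch, assuming CPython on Linux; it uses a pipe rather than an lwIP socket, and the helper name `revents_after_writer_closed` is made up for illustration. It registers the read end with an empty event mask and still gets `POLLHUP` back once the writer is gone.

```python
import os
import select

def revents_after_writer_closed():
    """Poll the read end of a pipe, requesting no events, after the writer closed."""
    r, w = os.pipe()
    try:
        os.close(w)                 # the only writer goes away: a hang-up condition
        poller = select.poll()
        poller.register(r, 0)       # eventmask 0: nothing is requested
        events = poller.poll(0)     # non-blocking poll
        return events[0][1] if events else 0
    finally:
        os.close(r)

if __name__ == "__main__":
    revents = revents_after_writer_closed()
    print("POLLHUP reported:", bool(revents & select.POLLHUP))
```

The point of the sketch is only that the hang-up is reported even though it was never asked for, which is the behaviour this PR aims to match in modlwip.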

dpgeorge · 2019-10-16T07:43:25Z

The hard part with this is to find tests that test the error paths. I've found 2 such tests (submitted here) for 2 of the cases returning HUP/ERR/NVAL (poll on a new socket, and poll after socket closure). A third WIP test is added for another case (peer did a RST). And there's still a 4th case which I did not fix yet because I couldn't find a test for it (other error not within one of the previous 3 classes).

Note that this PR also adds POLLNVAL internally to get compliance with POSIX behaviour.

dpgeorge · 2019-10-16T07:44:14Z

@t35tB0t it would be great if you could test this.

t35tB0t · 2019-10-17T08:12:41Z

@dpgeorge - thanks for opening the PR. I'll test this shortly. What are the test cases/error paths we're looking to exercise here?

1) STATE_NEW: ret |= MP_STREAM_POLL_HUP
2) STATE_PEER_CLOSED: ret |= flags & (MP_STREAM_POLL_RD | MP_STREAM_POLL_WR)
3) ERR_RST: ret |= (flags & (MP_STREAM_POLL_RD | MP_STREAM_POLL_WR)) | MP_STREAM_POLL_HUP
4) _ERR_BADF: ret |= MP_STREAM_POLL_NVAL
5) socket->state < 0: ret |= flags & MP_STREAM_POLL_ERR

Notes: I'm not so sure that condition #2 or #5 will return an error!?! I should easily be able to test for conditions #1, #2, #3 and #4. What type of socket states or conditions would result in condition #5?

dpgeorge · 2019-10-17T12:12:30Z

@t35tB0t yes there are 5 paths to test. I tried to write minimal tests for these but it's not easy. It'd be great if you could try to test them in whatever way you have available.

1) This is easy, the modified tests/net_hosted/connect_poll.py in this PR tests this path.
2) As you say, this currently will not return HUP/ERR/NVAL, and probably it shouldn't, although if you can find a test that should (eg by using CPython, or unix MicroPython) then that would be valuable (but I wouldn't waste time on it).
3) I'm confident this part is now correct but I couldn't find a simple and reliable test for it, to show it's correct; tests/net_hosted/poll_errors.py in this PR is my attempt at a test.
4) This is a new else-if path that I added specifically for the case of EBADF, and the test is easy, see modification to tests/extmod/uselect_poll_basic.py in this PR. Probably this case doesn't affect the issues you were seeing.
5) I didn't fix this path (make it return POLLERR unconditionally) because I couldn't find a way to trigger this path with a test. If you have any ideas, please let us know!
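For comparison with the EBADF path, here is a small host-side sketch of how `POLLNVAL` shows up in CPython. It assumes a POSIX system where `select.poll` is available, and it is not the test added in this PR; the function name `poll_closed_fd` is hypothetical.

```python
import os
import select

def poll_closed_fd():
    """Register a file descriptor number that has already been closed, then poll it."""
    r, w = os.pipe()
    os.close(w)
    os.close(r)                          # r is now an invalid descriptor number
    poller = select.poll()
    poller.register(r, select.POLLIN)    # registering by number does not validate it
    return poller.poll(0)                # list of (fd, revents) pairs

if __name__ == "__main__":
    for fd, revents in poll_closed_fd():
        print(fd, "POLLNVAL" if revents & select.POLLNVAL else hex(revents))
```

Only `POLLIN` was requested, yet the invalid descriptor is flagged, which is the same "always reported" rule that applies to `POLLERR` and `POLLHUP`.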

t35tB0t · 2019-10-18T06:25:15Z

@dpgeorge - Agreed that condition #2 shouldn't return an error, because a subsequent read or write will throw an appropriate exception. However, not returning an err on condition #5 is concerning. The clause as stated in line 1485 will mask out the MP_STREAM_POLL_RD or WR which was set up in line #1448. I am concerned that this could result in a hung condition where user code is waiting for a socket to be readable or writable and yet the socket.state<0 error condition keeps clearing out the RD or WR flags. IMHO, not returning these errors unsolicited has been resulting in user code hangs. As-is, condition #5 is the last remaining condition that can hang user code. It might be safer to either return the error condition and let user exception handlers trap it (which they must now anyhow due to other unsolicited error returns) or mask out this error state entirely. In order to set up a bench condition which triggers a socket.state<0 condition, we'll have to better understand what code paths can set a socket to such a state. If it isn't ever supposed to happen, then it would be a very bad thing - IMHO we still should trap it here and report it up the call stack.
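The hang concern can be illustrated with a deliberately simplified model. This is a sketch only: the flag values and both function names are assumptions for illustration and are not copied from modlwip.c, which contains more state handling than shown here.

```python
# Illustrative flag bits (assumed values, not taken from the MicroPython source).
POLL_RD = 0x01
POLL_WR = 0x04
POLL_ERR = 0x08

def poll_masked(requested, sock_state):
    """Model of the concerning clause: an error is reported only if requested."""
    ret = 0
    if sock_state < 0:
        ret |= requested & POLL_ERR
    return ret

def poll_unconditional(requested, sock_state):
    """Model of the proposed fix: an error is reported regardless of the request."""
    ret = 0
    if sock_state < 0:
        ret |= POLL_ERR
    return ret

if __name__ == "__main__":
    # A caller waiting only for readability on a socket that is in an error state.
    print(poll_masked(POLL_RD, -14))         # 0: nothing ever becomes ready
    print(poll_unconditional(POLL_RD, -14))  # 8: the error wakes the caller
```

In the masked model a caller that asked only for `POLL_RD` keeps getting an empty result, so a wait loop never terminates; the unconditional model gives it something to react to.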

t35tB0t · 2019-10-18T06:46:47Z

@dpgeorge - re: socket reset to trigger POLLHUP error in modlwip. Testing from a remote host with the SO_LINGER socket option made it easy to generate socket reset errors in LWIP. The key is to get the test device to wait between its yielded async.IOread() and a socket.accept() or socket.read(). Then the connecting client will have sufficient time from SYN to generate its RST.

If we want a more specific and stripped down regression test here, then we'll want a pair of client and server scripts with simple loops. The test device can use a fat fixed delay between a yielded IOread() and socket.accept() or socket.read(). And the packet generator host can have a short delay between its socket open (SYN) and socket reset. What will need to be validated is the actual behavior of the test host on the wire. Getting the socket to reset vs perform a friendly close when the user code drops the socket object may be dependent on the system used. Note that this uses the SO_LINGER socket option such that the socket.close() generates a RST packet on the wire. An alternative approach is SCAPY with a simple SYN/SYN-ACK/ACK/{delay}/RST sequence.

Doing this kind of attack from a micropython device requires a method of avoiding socket close. We've discussed this previously regarding socket.abort(). I did a test patch to mPy which exposed a socket.abort() LWIP method. That worked handily. Generating a RST may be similarly easy. With both abort and reset exposed, this kind of in-kind testing would be much easier. However, you may not want to add features that are primarily useful for testing. Enabling and exposing the SO_LINGER socket option would be another option.

    #>> import subprocess
    #>> proc = subprocess.Popen("python socket_test.py")
    #>> proc = subprocess.Popen("python socket_test.py")
    #>> proc = subprocess.Popen("python socket_test.py")
    #
    # Nasty HTTP Client which simply RESETs the connection some random time after sending GET request
    #
    # Running several concurrently increases the stress on LWIP
    # Running many concurrently tests backlog overload and socket timeout/recovery
    #
    import socket
    import struct
    import time
    from random import randint

    def cRandomReset(host, port, min, max, iterations):
        for i in range(iterations):
            s = None  # so the finally clause is safe if socket creation fails
            try:
                s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
                s.connect((host, port))
                lgr_onoff = 1
                lgr_linger = 0
                s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', lgr_onoff, lgr_linger))
                s.send(b'GET / HTTP/1.0\r\n\r\n')
            except Exception as e:
                print('Pass: ', i, e)
            finally:
                time.sleep(randint(min, max) / 1000)
                if s is not None:
                    s.close()  # the close will actually issue a socket RESET since we've set the linger option

    cRandomReset('10.0.0.5', 80, 0, 1000, 1000000)

dpgeorge · 2019-10-18T08:37:56Z

> However, not returning an err on condition #5 is concerning.
> ...
> It might be safer to either return the error condition
> ...
> If it isn't ever supposed to happen, then it would be a very bad thing - IMHO we still should trap it here and report it up the call stack.

Yes I agree that (5) should be fixed as well (to unconditionally return an error), and I'm happy to fix it even without having a test for it. It would just be nice to find such a test if possible (although not critical to move forward with this fix).

t35tB0t · 2019-10-19T08:30:15Z

@dpgeorge - Testing confirms unsolicited errors are returned when remote connections are abandoned or reset. It is important to note that unsolicited errors must now be handled in user code (this includes uasyncio and other modules). Specifically, existing modules have race conditions and missing exception handling which will now require fixing. Previously, these modules were prone to hanging when the socket errors were not returned by modlwip. This PR fixes the hanging and (with proper error handling), user apps can now tolerate fairly poor networking connections.
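What handling unsolicited flags might look like in user code can be sketched as follows. This is an illustrative CPython example on a pipe, assuming a POSIX system; the function name `drain` is hypothetical and the loop is not taken from uasyncio or any other module mentioned here.

```python
import errno
import os
import select

def drain(fd, timeout_ms=1000):
    """Read from fd until EOF or hang-up, registering for POLLIN only."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    chunks = []
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            raise TimeoutError("no event within timeout")
        _, revents = events[0]
        if revents & select.POLLNVAL:
            raise OSError(errno.EBADF, "invalid file descriptor")
        if revents & select.POLLIN:
            data = os.read(fd, 4096)
            if data:
                chunks.append(data)
                continue                  # poll again, more data may follow
            return b"".join(chunks)       # empty read: EOF
        if revents & (select.POLLHUP | select.POLLERR):
            # Not requested, but it must still end the wait.
            return b"".join(chunks)

if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"hello")
    os.close(w)
    print(drain(r))
    os.close(r)
```

The design choice worth noting is that readable data is consumed before the hang-up is acted on, so nothing already queued is lost when the peer goes away.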

The POLLHUP response when socket is reset was tested with the server (mPy PR5222 on STM32) running the server script below. The remote client was running a variable delay connection open/close using the SO_LINGER socket option (see pollhup_client.py). We are looking for a simple way to stimulate the fifth error condition (negative state) in modlwip. In the meantime, it seems prudent to modify the fifth clause in modlwip. Providing complete condition coverage will prevent applications from hanging under any socket error state.

pollhup_server.zip
pollhup_client.zip

t35tB0t · 2019-10-20T09:33:16Z

For the modlwip.c testing, abandoned connections are simulated by stalling the client script with long sleeps. The following scripts exercise modlwip.c unsolicited exception handling with present coverage including RESET, None read (ABANDON), and empty read (EOF). These demonstrate a lot of what's going on with the modlwip.c poller calls and related exceptions returned. It would be nice to have the lwip socket.abandon() method exposed to mPy somehow. It would also be nice to have access to socket.state() (what I did in the scripts to get at the state was too awkward).

Run clientConn.py on a unix mPy port. Run serverConn.py on a suitable device...

[sandbox.zip](https://github.com/micropython/micropython/files/3747674/sandbox.zip)

EXAMPLE OUTPUT ON SERVER SIDE:

    Wait for connection...socket state=1
    Accepting connection..socket state=1
    Waiting for data......socket state=3
    Data received.........socket state=3
    Reading data line.....socket state=3, data: b'TERMINATE BY CLOSE\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
    Reading data line.....socket state=3, data: b'\r\n'
    Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
    Reading data line.....socket state=3, data: b'This is not complete until...'
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: b'(...continued with CRLF here->\r\n'
    Reading data line.....socket state=4, data: b'THIS IS LAST LINE SENT BY CLIENT\r\n'
    Reading data line.....socket state=4, data: b'' EOF (CLIENT CLOSED CONNECTION)
    Connection closed......socket state=-17
    <socket state=1 timeout=0 incoming=20208 off=0>
    Wait for connection...socket state=1
    Accepting connection..socket state=1
    Waiting for data......socket state=3
    Data received.........socket state=3
    Reading data line.....socket state=3, data: b'TERMINATE BY RESET\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
    Reading data line.....socket state=3, data: b'\r\n'
    Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
    Reading data line.....socket state=3, data: b'This is not complete until...'
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: b'(...continued with CRLF here->\r\n'
    Reading data line.....socket state=3, data: b'THIS IS LAST LINE SENT BY CLIENT\r\n'
    Reading data line.....socket state=-14, data: ECONNRESET (CLIENT RESET CONNECTION)
    Connection RESET......socket state=-14
    <socket state=1 timeout=0 incoming=30308 off=0>
    Wait for connection...socket state=1
    Accepting connection..socket state=1
    Waiting for data......socket state=3
    Data received.........socket state=3
    Reading data line.....socket state=3, data: b'TERMINATE BY ABNDN\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
    Reading data line.....socket state=3, data: b'\r\n'
    Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
    Reading data line.....socket state=3, data: b'This is not complete until...'
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Connection ABANDONED..socket state=-17
    <socket state=1 timeout=0 incoming=40408 off=0>
    Wait for connection...socket state=1
    Accepting connection..socket state=1
    Waiting for data......socket state=3
    Data received.........socket state=3
    Reading data line.....socket state=3, data: b'TERMINATE BY CLOSE\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
    Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
    Reading data line.....socket state=3, data: b'\r\n'
    Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
    Reading data line.....socket state=3, data: b'This is not complete until...'
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
    Reading data line.....socket state=3, data: b'(...continued with CRLF here->\r\n'
    Reading data line.....socket state=4, data: b'THIS IS LAST LINE SENT BY CLIENT\r\n'
    Reading data line.....socket state=4, data: b'' EOF (CLIENT CLOSED CONNECTION)
    Connection closed......socket state=-17
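The three read outcomes visible in the output above (None while data is pending, an empty bytes object at EOF, and ECONNRESET on reset) can be told apart with a small helper. This is a hypothetical sketch, not code from sandbox.zip; `classify_read` and its labels are made up for illustration, and `read` stands for any zero-argument callable such as a bound non-blocking `readline`.

```python
import errno

def classify_read(read):
    """Call read() once and label the outcome."""
    try:
        data = read()
    except OSError as exc:
        if exc.errno == errno.ECONNRESET:
            return "RESET"      # peer reset the connection
        raise                   # anything else is left to the caller
    if data is None:
        return "PENDING"        # non-blocking read with nothing available yet
    if data == b"":
        return "EOF"            # peer closed the connection cleanly
    return "DATA"

if __name__ == "__main__":
    print(classify_read(lambda: b"DATA LINE 2\r\n"))
    print(classify_read(lambda: None))
    print(classify_read(lambda: b""))
```

Keeping the three cases separate matters because only "PENDING" should lead back to waiting on the poller; the other non-data outcomes mean the connection is finished.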

dpgeorge · 2019-10-26T04:57:53Z

@t35tB0t thanks for the testing. I managed to use your SO_LINGER test to reliably trigger a TCP RST after connection, thus providing a test for case #3 above. In my tests, comparing to CPython behaviour, I found that poll should return (unsolicited) both of POLLHUP and POLLERR.
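A loopback version of the SO_LINGER trick can be used to look at what a host system reports after a RST. The sketch below assumes CPython on Linux with a working loopback interface; the function name `poll_after_peer_reset` is hypothetical, and the exact flags returned may differ on other operating systems.

```python
import select
import socket
import struct

def poll_after_peer_reset(timeout_ms=2000):
    """Return the revents seen on an accepted connection after the peer sends a RST."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn = None
    try:
        server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        server.listen(1)
        client.connect(server.getsockname())
        conn, _ = server.accept()
        # l_onoff=1, l_linger=0: close() aborts the connection instead of a normal close.
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        client.close()
        poller = select.poll()
        poller.register(conn, 0)         # request no events at all
        events = poller.poll(timeout_ms)
        return events[0][1] if events else 0
    finally:
        client.close()                   # harmless if already closed
        if conn is not None:
            conn.close()
        server.close()

if __name__ == "__main__":
    revents = poll_after_peer_reset()
    print("POLLHUP:", bool(revents & select.POLLHUP))
    print("POLLERR:", bool(revents & select.POLLERR))
```

The connection is accepted before the client aborts it, so the reset arrives on an established socket, which corresponds to case #3 in this thread.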

> The POLLHUP response when socket is reset was tested with the server (mPy PR5222 on STM32) running the server script below. The remote client was running a variable delay connection open/close using the SO_LINGER socket option (see pollhup_client.py).

Ok, so we agree on this case #3 then. I've pushed a commit to add POLLERR to the return of case #3 and tested that it works with these test scripts of yours.

(I also found some other minor bugs with lwip, like abandoning queued incoming data when it gets a RST, but that's independent of the poll issues here and can be looked at later.)

dpgeorge · 2019-10-26T05:22:15Z

Using the sandbox.zip tests above I can trigger cases #2 and #3, and according to this test these cases are now handled correctly by the commits in this PR.

dpgeorge · 2019-10-26T05:25:16Z

And I've also pushed a final commit to this PR to unconditionally return POLLERR for case #5 (general socket error). Even though we don't have a test for this I'm confident with this change based on how poll should behave.

dpgeorge · 2019-10-31T02:47:01Z

I slightly modified this PR when handling case #3, to not add a return of POLLERR, because in some cases it should not be returned in this path.

Merged in 71401d5 through 26d8fd2


dpgeorge added 4 commits October 16, 2019 16:26

extmod/modlwip: Unconditionally return POLLHUP when polling new socket.

c464a52

POSIX poll should always return POLLERR and POLLHUP in revents, regardless of whether they were requested in the input events flags. See issues micropython#4290 and micropython#5172.

py/stream.h: Add MP_STREAM_POLL_NVAL constant.

64d9785

extmod/modlwip: Make socket poll return POLLNVAL in case of bad file.

45249ec

extmod/modlwip: Unconditionally return POLLHUP if ERR_RST.

b873e14

dpgeorge mentioned this pull request Oct 16, 2019

modlwip.c does not properly return POLL_HUP and POLL_ERR socket errors #5172

Closed

dpgeorge added the extmod label Oct 16, 2019

extmod/modlwip: Fix previous commit to also return POLL_ERR.

54596b1

extmod/modlwip: Unconditionally return POLLERR for general socket error.

891bdde

jimmo mentioned this pull request Oct 29, 2019

Cancelling coroutines: can't pend throw to just-started generator. #5242

Closed

dpgeorge closed this Oct 31, 2019

dpgeorge deleted the extmod-lwip-poll-unconditional-error branch October 31, 2019 02:47

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Aug 27, 2021

Merge pull request micropython#5222 from Neradoc/nera-board-id-in-boo…

41168c8

…tout Make the board ID available in board and boot_out
