WIP: fix lwip polling to unconditionally return POLLHUP/ERR/NVAL #5222
POSIX poll should always return POLLERR and POLLHUP in revents, regardless of whether they were requested in the input events flags. See issues micropython#4290 and micropython#5172.
The hard part with this is to find tests that test the error paths. I've found 2 such tests (submitted here) for 2 of the cases returning HUP/ERR/NVAL (poll on a new socket, and poll after socket closure). A third WIP test is added for another case (peer did a RST). And there's still a 4th case which I did not fix yet because I couldn't find a test for it (other error not within one of the previous 3 classes). Note that this PR also adds POLLNVAL internally to get compliance with POSIX behaviour. |
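As a rough illustration of the POSIX behaviour described above, the following host-side CPython sketch (not MicroPython, and assuming a Unix-like system where `select.poll` is available) registers a pipe for POLLIN only. It still gets POLLHUP back once the write end is closed, and POLLNVAL once the polled descriptor itself is closed. The helper name `unsolicited_events` is hypothetical.

```python
import os
import select


def unsolicited_events():
    """Return (revents after hang-up, revents after the fd is closed)."""
    r, w = os.pipe()
    poller = select.poll()
    poller.register(r, select.POLLIN)  # only POLLIN is requested

    os.close(w)  # hang up the write end; the pipe holds no data
    hup = dict(poller.poll(1000)).get(r, 0)

    os.close(r)  # the fd is now invalid but still registered
    nval = dict(poller.poll(1000)).get(r, 0)
    return hup, nval


hup, nval = unsolicited_events()
print(bool(hup & select.POLLHUP), bool(nval & select.POLLNVAL))
```

Neither POLLHUP nor POLLNVAL appears in the registered event mask, which is the point of the example.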
@t35tB0t it would be great if you could test this. |
@dpgeorge - thanks for opening the PR. I'll test this shortly. What are
the test cases/error paths we're looking to exercise here?
1) STATE_NEW: ret |= MP_STREAM_POLL_HUP
2) STATE_PEER_CLOSED: ret |= flags & (MP_STREAM_POLL_RD | MP_STREAM_POLL_WR)
3) ERR_RST: ret |= (flags & (MP_STREAM_POLL_RD | MP_STREAM_POLL_WR)) | MP_STREAM_POLL_HUP
4) _ERR_BADF: ret |= MP_STREAM_POLL_NVAL
5) socket->state < 0: ret |= flags & MP_STREAM_POLL_ERR
Notes:
I'm not so sure that condition #2 or #5 will return an error!?!
I should easily be able to test for condition #'s: 1,2,3,4.
What type of socket states or conditions would result in condition #5?
|
@t35tB0t yes there are 5 paths to test. I tried to write minimal tests for these but it's not easy. It'd be great if you could try to test them in whatever way you have available.
|
@dpgeorge - Agreed that condition #2 shouldn't return as an error
because subsequent read or write will throw an appropriate exception.
However, not returning an err on condition #5 is concerning. The clause
as stated in line 1485 will mask out the MP_STREAM_POLL_RD or WR which
was set up in line #1448. I am concerned that this could result in a
hung condition where user code is waiting for a socket to be readable or
writable and yet the socket.state<0 error condition keeps clearing out
the RD or WR flags.
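To make the concern concrete, here is a minimal sketch of the masking arithmetic. The flag values and function names are illustrative assumptions, not taken from the MicroPython headers or from modlwip.c. It only shows why an error that is ANDed with the requested flags never reaches a caller that asked for readability alone.

```python
# Illustrative flag values (assumed for this sketch).
POLL_RD, POLL_WR, POLL_ERR = 0x01, 0x04, 0x08


def conditional_err(requested):
    """Report the error only if the caller asked for it (clause 5 as listed)."""
    return requested & POLL_ERR


def unconditional_err(requested):
    """Report the error regardless of what the caller asked for."""
    return POLL_ERR


waiting_for_read = POLL_RD
print(conditional_err(waiting_for_read))    # 0: the poller has nothing to report
print(unconditional_err(waiting_for_read))  # 8: the poller wakes with an error
```

With the conditional form, a caller that registered only for read events sees no events at all on an errored socket, so it keeps waiting.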
IMHO, not returning these errors unsolicited has been resulting in user
code hangs. As-is, condition #5 is the last remaining condition
that can hang user code. It might be safer to either return the error
condition and let user exception handlers trap it (which they must now
anyhow due to other unsolicited error returns) or mask out this error
state entirely.
In order to set up a bench condition which triggers a socket.state<0
condition, we'll have to better understand what code paths can set a
socket to such a state. If it isn't ever supposed to happen, then it
would be a very bad thing - IMHO we still should trap it here and report
it up the call stack.
|
@dpgeorge - re: socket reset to trigger POLLHUP error in modlwip.
Testing from a remote host with the SO_LINGER socket option made it easy
to generate socket reset errors in LWIP.
The key is to get the test device to wait between its yielded
async.IOread() and a socket.accept() or socket.read(). Then the
connecting client will have sufficient time from SYN to generate its RST.
If we want a more specific and stripped down regression test here, then
we'll want a pair of client and server scripts with simple loops. The
test device can use a fat fixed delay between a yielded IOread() and
socket.accept() or socket.read(). And the packet generator host can
have a short delay between its socket open (SYN) and socket reset. What
will need to be validated is the actual behavior of the test host on the
wire. Getting the socket to reset vs perform a friendly close when the
user code drops the socket object may be dependent on the system used.
Note that this uses the SO_LINGER socket option such that the
socket.close() generates a RST packet on the wire. An alternative
approach is SCAPY with a simple SYN/SYN-ACK/ACK/{delay}/RST sequence.
Doing this kind of attack from a micropython device requires a method of
avoiding socket close. We've discussed this previously regarding
socket.abort(). I did a test patch to mPy which exposed a
socket.abort() LWIP method. That worked handily. Generating a RST may
be similarly easy. With both abort and reset exposed, this kind of
in-kind testing would be much easier. However, you may not want to add
features that are primarily useful for testing. Enabling and exposing the
SO_LINGER socket option would be another option.
#>> import subprocess
#>> proc = subprocess.Popen("python socket_test.py")
#>> proc = subprocess.Popen("python socket_test.py")
#>> proc = subprocess.Popen("python socket_test.py")
#
# Nasty HTTP client which simply RESETs the connection some random time
# after sending a GET request.
#
# Running several concurrently increases the stress on LWIP.
# Running many concurrently tests backlog overload and socket
# timeout/recovery.
#
import socket
import struct
import time
from random import randint

def cRandomReset(host, port, min, max, iterations):
    # min and max bound the random delay before the reset, in milliseconds
    for i in range(iterations):
        s = None
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
            s.connect((host, port))
            lgr_onoff = 1   # enable SO_LINGER
            lgr_linger = 0  # zero linger time
            s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                         struct.pack('ii', lgr_onoff, lgr_linger))
            s.send(b'GET / HTTP/1.0\r\n\r\n')
        except Exception as e:
            print('Pass: ', i, e)
        finally:
            time.sleep(randint(min, max) / 1000)
            if s is not None:
                # the close will actually issue a socket RESET
                # since we've set the linger option
                s.close()

cRandomReset('10.0.0.5', 80, 0, 1000, 1000000)
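For a quick host-side check of what a poller sees after such a reset, the following sketch runs both ends over loopback in one CPython process. It is an assumption-laden illustration, not the regression test from this PR: it assumes a Unix-like host with `select.poll`, and the exact flags returned can differ between operating systems. The function name `poll_after_reset` is hypothetical.

```python
import select
import socket
import struct


def poll_after_reset():
    """Return the revents seen on an accepted socket after the peer sends RST."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    conn, _ = server.accept()

    # l_onoff=1, l_linger=0: close() aborts the connection with a RST.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    poller = select.poll()
    poller.register(conn, select.POLLIN)  # POLLERR/POLLHUP are not requested
    revents = dict(poller.poll(2000)).get(conn.fileno(), 0)

    conn.close()
    server.close()
    return revents


revents = poll_after_reset()
print("POLLERR" if revents & select.POLLERR else "-",
      "POLLHUP" if revents & select.POLLHUP else "-")
```

Only POLLIN is registered, so any POLLERR or POLLHUP in the result is unsolicited.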
|
Yes I agree that (5) should be fixed as well (to unconditionally return an error), and I'm happy to fix it even without having a test for it. It would just be nice to find such a test if possible (although not critical to move forward with this fix). |
@dpgeorge - Testing confirms unsolicited errors are returned when remote connections are abandoned or reset. It is important to note that unsolicited errors must now be handled in user code (this includes uasyncio and other modules). Specifically, existing modules have race conditions and missing exception handling which will now require fixing. Previously, these modules were prone to hanging when the socket errors were not returned by modlwip. This PR fixes the hanging and, with proper error handling, user apps can now tolerate fairly poor networking connections. The POLLHUP response when a socket is reset was tested with the server (mPy PR5222 on STM32) running the server script below. The remote client was running a variable-delay connection open/close using the SO_LINGER socket option (see pollhup_client.py). We are looking for a simple way to stimulate the fifth error condition (negative state) in modlwip. In the meantime, it seems prudent to modify the fifth clause in modlwip. Providing complete condition coverage will prevent applications from hanging under any socket error state. |
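One way user code might account for unsolicited error flags is to check them before acting on readability or writability. The sketch below is a hypothetical helper written against CPython's `select` constants; it is not code from uasyncio or from this PR.

```python
import select


def next_action(revents):
    """Decide what a poll loop should do with one socket (hypothetical helper)."""
    if revents & select.POLLNVAL:
        return "unregister"  # the descriptor is no longer valid
    if revents & (select.POLLERR | select.POLLHUP):
        return "close"       # error or hang-up, even if it was never requested
    if revents & select.POLLIN:
        return "read"
    if revents & select.POLLOUT:
        return "write"
    return "wait"


print(next_action(select.POLLIN))                   # read
print(next_action(select.POLLIN | select.POLLHUP))  # close
```

Checking the error flags first is a deliberate simplification. A stricter loop could drain any readable data before closing when POLLIN and POLLHUP arrive together.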
For the modlwip.c testing, abandoned connections are simulated by stalling the client script with long sleeps. The following scripts exercise modlwip.c unsolicited exception handling, with present coverage including RESET, None read (ABANDON), and empty read (EOF). These demonstrate a lot of what's going on with the modlwip.c poller calls and related exceptions returned. It would be nice to have the lwip socket.abandon() method exposed to mPy somehow. It would also be nice to have access to socket.state() (what I did in the scripts to get at the state was too awkward).
Run clientConn.py on a unix mPy port. Run serverConn.py on a suitable device...
[sandbox.zip](https://github.com/micropython/micropython/files/3747674/sandbox.zip)
EXAMPLE OUTPUT ON SERVER SIDE:
Wait for connection...socket state=1
Accepting connection..socket state=1
Waiting for data......socket state=3
Data received.........socket state=3
Reading data line.....socket state=3, data: b'TERMINATE BY CLOSE\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
Reading data line.....socket state=3, data: b'\r\n'
Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
Reading data line.....socket state=3, data: b'This is not complete until...'
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: b'(...continued with CRLF here->\r\n'
Reading data line.....socket state=4, data: b'THIS IS LAST LINE SENT BY CLIENT\r\n'
Reading data line.....socket state=4, data: b'' EOF (CLIENT CLOSED CONNECTION)
Connection closed......socket state=-17
<socket state=1 timeout=0 incoming=20208 off=0>
Wait for connection...socket state=1
Accepting connection..socket state=1
Waiting for data......socket state=3
Data received.........socket state=3
Reading data line.....socket state=3, data: b'TERMINATE BY RESET\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
Reading data line.....socket state=3, data: b'\r\n'
Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
Reading data line.....socket state=3, data: b'This is not complete until...'
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: b'(...continued with CRLF here->\r\n'
Reading data line.....socket state=3, data: b'THIS IS LAST LINE SENT BY CLIENT\r\n'
Reading data line.....socket state=-14, data: ECONNRESET (CLIENT RESET CONNECTION)
Connection RESET......socket state=-14
<socket state=1 timeout=0 incoming=30308 off=0>
Wait for connection...socket state=1
Accepting connection..socket state=1
Waiting for data......socket state=3
Data received.........socket state=3
Reading data line.....socket state=3, data: b'TERMINATE BY ABNDN\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
Reading data line.....socket state=3, data: b'\r\n'
Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
Reading data line.....socket state=3, data: b'This is not complete until...'
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Connection ABANDONED..socket state=-17
<socket state=1 timeout=0 incoming=40408 off=0>
Wait for connection...socket state=1
Accepting connection..socket state=1
Waiting for data......socket state=3
Data received.........socket state=3
Reading data line.....socket state=3, data: b'TERMINATE BY CLOSE\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 2\r\n'
Reading data line.....socket state=3, data: b'DATA LINE 3\r\n'
Reading data line.....socket state=3, data: b'\r\n'
Reading data line.....socket state=3, data: b'fill-1 fill-2 fill-3 fill-4(\r\n'
Reading data line.....socket state=3, data: b'This is not complete until...'
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: None (PENDING CONNECTION ABANDON)
Reading data line.....socket state=3, data: b'(...continued with CRLF here->\r\n'
Reading data line.....socket state=4, data: b'THIS IS LAST LINE SENT BY CLIENT\r\n'
Reading data line.....socket state=4, data: b'' EOF (CLIENT CLOSED CONNECTION)
Connection closed......socket state=-17
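The three read outcomes visible in the output above (None while data is pending, an empty bytes object at EOF, and ECONNRESET on reset) can be told apart with a small wrapper. This is a hypothetical sketch, not part of the attached scripts; `classify_read` and its `read` argument (any zero-argument callable, such as a bound `readline`) are names invented for illustration.

```python
import errno


def classify_read(read):
    """Call read() once and label the outcome (hypothetical helper)."""
    try:
        data = read()
    except OSError as exc:
        if exc.errno == errno.ECONNRESET:
            return "reset"   # peer reset the connection
        raise
    if data is None:
        return "pending"     # non-blocking read with nothing available yet
    if data == b"":
        return "eof"         # peer closed the connection
    return "data"


print(classify_read(lambda: b"DATA LINE 2\r\n"))  # data
print(classify_read(lambda: None))                # pending
print(classify_read(lambda: b""))                 # eof
```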
|
@t35tB0t thanks for the testing. I managed to use your SO_LINGER test to reliably trigger a TCP RST after connection, thus providing a test for case #3 above. In my tests, comparing to CPython behaviour, I found that poll should return (unsolicited) both of POLLHUP and POLLERR.
Ok, so we agree on this case #3 then. I've pushed a commit to add POLLERR to the return of case #3 and tested that it works with these test scripts of yours. (I also found some other minor bugs with lwip, like abandoning queued incoming data when it gets a RST, but that's independent of the poll issues here and can be looked at later.) |
And I've also pushed a final commit to this PR to unconditionally return POLLERR for case #5 (general socket error). Even though we don't have a test for this, I'm confident in this change based on how poll should behave. |
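The outcome of the discussion for the five cases can be summarised as a small Python model. This is only a sketch of the behaviour described in this thread: the state names and flag values are illustrative assumptions and are not copied from modlwip.c.

```python
# Illustrative flag values and state names (assumptions for this sketch).
POLL_RD, POLL_WR, POLL_ERR, POLL_HUP, POLL_NVAL = 0x01, 0x04, 0x08, 0x10, 0x20


def error_revents(state, requested):
    """Model the flags added for each of the five cases discussed above."""
    rw = requested & (POLL_RD | POLL_WR)
    if state == "new":           # case 1: socket not yet connected
        return POLL_HUP
    if state == "peer_closed":   # case 2: not an error, read/write will report it
        return rw
    if state == "reset":         # case 3: peer sent RST
        return rw | POLL_HUP | POLL_ERR
    if state == "bad_fd":        # case 4: socket already closed
        return POLL_NVAL
    if state == "other_error":   # case 5: any other negative state
        return POLL_ERR
    return 0


print(error_revents("reset", POLL_RD))        # 25 (RD | ERR | HUP)
print(error_revents("other_error", POLL_RD))  # 8 (ERR, although only RD was requested)
```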
See #4290 and #5172