Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scripts/libraries: Fix RSTP library for the winc. #1786

Merged
merged 1 commit into from Feb 22, 2023

Conversation

kwagyeman
Copy link
Member

@kwagyeman kwagyeman commented Feb 20, 2023

Fixes the RSTP driver for all OpenMV Cam boards using the winc (WiFi Shield). You can connect to the server multiple times and stream images without the camera crashing. Everything appears to work fine with the default UDP transport. With a TCP data transport it appears some issue happens in the shield firmware that will eventually cause the TCP socket to stop working (no errors are seen in wireshark though... the socket just stops). Anyway, on the socket being closed by the remote partner the camera will reset.

Luckily, the RTP protocol is almost always in UDP mode so the TCP support isn't important. However, It does show a bug in the firmware.

/ffplay.exe rtsp://x.x.x.x -fflags nobuffer - UDP transport (the default)
/ffplay.exe -rtsp_transport tcp rtsp://x.x.x.x -fflags nobuffer - TCP transport

Boards using the 1DX are still broken. It appears though that accept() does not work at all.

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(self.__myaddr)
s.listen(0)
s.settimeout(5)
self.__tcp__socket, self.__client_addr = s.accept()
s.close()

The above works fine on the winc and doesn't work on the 1dx.

Was not able to test ethernet on the Portenta yet.

@iabdalkader
Copy link
Member

You still missed closing server sockets, if accept fails on timeout (which happens every 5 seconds), the server socket is not closed, same for UDP

 self.__tcp__socket, self.__client_addr = s.accept()
s.close() <- never executed

The socket.socket call rarely fails (out of memory maybe or bad args), if you take it out of the try except you can close the socket in finally.

@iabdalkader
Copy link
Member

Boards using the 1DX are still broken. It appears though that accept() does not work at all.

Every time you create a socket with WINC it uses a new FD, other drivers reuse the same FD so you before bind() you need s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) you can check if the option exists with something like if hasattr(socket, "SO_REUSEADDR")

@kwagyeman
Copy link
Member Author

kwagyeman commented Feb 20, 2023

Okay, got it working on the Nicla. Dealing with a few bugs though that are in the WiFi Driver:

  1. The WiFi driver will return data back to the user application from recv() but won't ACK the TCP packet. Seems to happen randomly. This results in TCP re-transmissions being seen in wireshark. However, you can see the RTSP statemachine move forwards as if the application received the packet. Not sure what's causing this. It could be due to the very low timeout I've sent on receive, but, that shouldn't create a bug with acking TCP packets.

  2. The TCP connect is closed by the remote device this generates a RST packet which remotes in the socket returning the connection reset error code. [Errno 104] ECONNRESET. However, if the remote partner sends FIN instead the camera will ACK the packet but not close the socket and it will instead recv() will just turn a zero length byte array on each call.

I worked around this issue by just erroring out if recv() returns zero data instead of aborting. Not sure if that's what we should do though.

...

UDP mode for RSTP works perfectly and doesn't drop any data. It's solid.

As for TCP mode, from wireshark it doesn't appear that we are dropping any data being sent to the PC. It's getting all packets but there's a lot of packets that are being joined and etc. in wireshark. It appears the remote RTSP client on the PC gets confused by this and then starts sending the camera not implemented messages and then the camera responds with bad request messages in between sending data to the PC. Anyway, not sure what can be done here. It doesn't appear to be an issue without our code - FFMPEG just can't handle a stream of raw bytes filled with JPEG RTP packets unless they are broken up with a packet per header+data. When headers+data get merged into a single packet returned by the application layer in the PC FFPMEG get confused.

Maybe look into why this is happening? Can send ensure a packet is generated on it being called versus all the data being fifoed into a stream (is that useful though)? UDP seems to send a packet on send.

...

TCP mode on the WINC appears to work fine without FFPMEG getting confused. Sometimes there are multiple frames in a single packet to the application level and things are okay. However, the WINC eventually breaks and stops send data after a while like before.

This points to there being sometype of bug with the TCP mode in both drivers... different bugs though.

@kwagyeman
Copy link
Member Author

On the WINC shield the camera dies like this:

image

@kwagyeman kwagyeman force-pushed the kwabena/fix_rstp_on_winc1500 branch 2 times, most recently from 1753031 to 93b0f3d Compare February 20, 2023 22:41
@kwagyeman
Copy link
Member Author

Changes have been pushed. UDP mode is working and at high speeds for the WINC and 1DX.

TCP mode appears to work at high speeds on the WINC at first and then it dies after a minute or so.
TCP mode on the 1DX drops all kinds of data and confuses the PC.

Given the WINC TCP mode works fine until the crash the issue is the drivers for both.

@iabdalkader
Copy link
Member

This pull request has been mentioned on OpenMV Forums. There might be relevant details there:

https://forums.openmv.io/t/rtsp-in-nicla-vision/7884/16

@iabdalkader
Copy link
Member

TCP mode appears to work at high speeds on the WINC at first and then it dies after a minute or so.

For me it works fine, can run for an hour, but crashes as soon as the client (ffplay) is closed. It is a driver issue but the library is triggering it, seems you're calling recv() on a closed socket:

(gdb) backtrace 
#0  py_winc_socket_recv (socket=0x30005e60, buf=0x30007060 "", len=1380, _errno=0xfa8c)
    at ../ports/stm32/modules/py_winc.c:511
#1  0x0810c31a in socket_recv (self_in=0x30005e60, len_in=<optimized out>) at ../../extmod/modusocket.c:291
#2  0x080e0356 in mp_execute_bytecode (code_state=code_state@entry=0x30005df0, inject_exc=inject_exc@entry=0x0)
    at ../../py/vm.c:1026
(gdb) print socket
$1 = (mod_network_socket_obj_t *) 0x30005e60
(gdb) print *socket
$2 = {base = {type = 0x81c33f8 <socket_type>}, nic = 0x30000388 <winc_obj>, nic_type = 0x819a7e8 <mod_network_nic_type_winc>, 
  domain = 2, type = 1, proto = 0, bound = 0, fileno = -1, timeout = 9, callback = 0x0, state = 0, _private = 0x0}

Note fileno=-1 and _private=0x0 that's a closed socket. Anyway, I fixed it in both the driver and library, it still shouldn't crash the camera, so please apply this patch:

diff --git a/scripts/libraries/rtsp.py b/scripts/libraries/rtsp.py
index 27cb822e..9fa27480 100644
--- a/scripts/libraries/rtsp.py
+++ b/scripts/libraries/rtsp.py
@@ -284,7 +284,7 @@ class rtsp_server:
 
     def __close_socket(self):  # private
         if self.__transport_is_tcp:
-            pass
+            self.__close_tcp_socket()
         else:
             self.__close_udp_socket()
 
@@ -305,6 +305,7 @@ class rtsp_server:
             try:
                 self.__settimeout(0.1)
                 timestamp = (pyb.millis() * 90) & 0xFFFFFFFF
+                mv = memoryview(img)
                 while l:
                     rtp_header = struct.pack(
                         ">BBHII",
@@ -318,7 +319,7 @@ class rtsp_server:
                     jpeg_header = struct.pack(
                         ">IBBBB", i, 0, quality, int(img.width() // 8), int(img.height() // 8)
                     )
-                    img_data = memoryview(img.bytearray())[i : i + min(l, max_packet_size)]
+                    img_data = mv[i : i + min(l, max_packet_size)]
                     img_data_len = len(img_data)
                     data_len = self.__send(rtp_header + jpeg_header + img_data)
                     if not data_len:

It works fine now but the stream is heavily buffered, looks like a 2-3 seconds lag, the following flags help but don't completely remove the lag:

ffplay  -fflags nobuffer -framedrop -flags low_delay -rtsp_transport tcp rtsp://xxx.xxx.xxx.xxx

TCP mode on the 1DX drops all kinds of data and confuses the PC.

Did you attach the antenna ? It doesn't work very well without it.

@kwagyeman
Copy link
Member Author

Yeah, the antenna was attached for the Nicla. Let me retest with these changes. I think the nicla has different issues.

If you look at its packet stream versus the winc the data looks very different. Packets look correct with the winc and missing bytes in the nicla.

@iabdalkader
Copy link
Member

Yeah, the antenna was attached for the Nicla. Let me retest with these changes. I think the nicla has different issues.

Tested it, I can see the issue with TCP transport, have no clue what it is, but this PR + the patch above should be good to merge.

@kwagyeman
Copy link
Member Author

kwagyeman commented Feb 21, 2023

Okay, pushed the latest.

Also, seeing a new bug on the WINC1500. About 50% of the time the first packet from the TCP socket is never returned by recv(). E.g. I can see the connection handshake in wireshark. The server sends a packet. The WINC acks it. And then recv() returns nothing. The timeout being small seems to influence it. Also, my WINC1500 still crashes in TCP mode after about 1 minute of running.

I've got version:

Firmware version: (19, 6, 1, 19, 6, 1, 1377184)

Maybe I need to update it?

@kwagyeman
Copy link
Member Author

kwagyeman commented Feb 21, 2023

Just updated my WINC1500 and now connect doesn't work.

Mmm, seems like network_if.active(True) breaks the thing...

That's another bug with the winc.

@kwagyeman
Copy link
Member Author

Okay, after updating my firmware the first packet missing issue is resolved.

UDP mode seems to be solid still. In TCP mode though everything breaks once this starts happening:

image

@iabdalkader
Copy link
Member

iabdalkader commented Feb 21, 2023

Actually I found a serious issue in the library, you assume send() sends out all data before returning, but it doesn't because the sockets are non-blocking, it will just try to send out as much as possible and might not always succeed. Please switch to sendall(), also note the timeout is very short, it should be at least 1 second for send. Here's another patch, this is tested on WINC/CYW43 TCP/UDP works fine:

diff --git a/scripts/libraries/rtsp.py b/scripts/libraries/rtsp.py
index 7ee459d5..ecb689bb 100644
--- a/scripts/libraries/rtsp.py
+++ b/scripts/libraries/rtsp.py
@@ -273,9 +273,10 @@ class rtsp_server:
 
     def __send(self, data):  # private
         if self.__transport_is_tcp:
-            return self.__tcp__socket.send(
+            self.__tcp__socket.sendall(
                 struct.pack(">BBH", 0x24, self.__client_rtp_channel, len(data)) + data
             )
+            return True
         else:
             return self.__udp_rtp__socket.sendto(data, self.__client_rtp_addr)
 
@@ -306,7 +307,7 @@ class rtsp_server:
             max_packet_size -= 4
         if self.__valid_socket():
             try:
-                self.__settimeout(1)
+                self.__settimeout(5.0)
                 timestamp = (pyb.millis() * 90) & 0xFFFFFFFF
                 mv = memoryview(img)
                 while l:
@@ -324,8 +325,7 @@ class rtsp_server:
                     )
                     img_data = mv[i : i + min(l, max_packet_size)]
                     img_data_len = len(img_data)
-                    data_len = self.__send(rtp_header + jpeg_header + img_data)
-                    if not data_len:
+                    if not self.__send(rtp_header + jpeg_header + img_data):
                         break
                     i += img_data_len
                     l -= img_data_len

Note the command I use is:

ffplay  -fflags nobuffer -framedrop -flags low_delay -rtsp_transport tcp rtsp://xxx.xxx.xxx.xxx

@kwagyeman
Copy link
Member Author

Updated the code. Everything works. Ready to merge.

Will update other web servers with what we learned here. This code puts the driver through the ringer.

Observation:

  • TCP/UDP mode on the WINC have the same performance.
  • UDP mode is much faster than TCP mode on the 1DX. Probably should look into why the 1DX TCP performance is terrible.

img_data_len = len(img_data)
data_len = self.__send(rtp_header + jpeg_header + img_data)
if not data_len:
if not self.__send(rtp_header + jpeg_header + img_data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a chance here to optimize GC/RAM usage, if you split this into 3 send calls no new memory will be allocated, currently I think it makes a new object with headers + image data. Also for the above struct, if you use a bytearray instead and just change the i bytes, no new memory will be created every iteration. It's fine for now but might want to fix it in the future.

@iabdalkader iabdalkader merged commit 434e10c into openmv:master Feb 22, 2023
@Victor-Martinez-Pozos
Copy link

Victor-Martinez-Pozos commented Feb 23, 2023

Hi I'm trying to run the rtsp server example on a nicla vision, I was working on an old firmware and even though I couldn't obtain properly the frames from the device(it breaks on the opening part) the server appears to be up and running, now I'm running on the latest firmware, I guess, it's (OpenMV 4.4.1 and Micropython d8e38d0) and neither vlc or opencv can establish any connection, so should I clone and compile the repo to use your last charges? Or what should I do to fix the problem, I haven't change any major things in the example apart of waiting for the net_if connected.

Edit: Btw running ffplay with the configs above commented I get an error like this "rtsp://...:554: Invalid data found when ..."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants