Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeout error running SoapySDRUtil --probe on LimeSDR #29

Closed
rkb68 opened this issue Jun 29, 2017 · 15 comments
Closed

timeout error running SoapySDRUtil --probe on LimeSDR #29

rkb68 opened this issue Jun 29, 2017 · 15 comments
Labels

Comments

@rkb68
Copy link

rkb68 commented Jun 29, 2017

With current release, a timeout error appears that did not appear in previous release; behavior is different depending on whether the host, a Raspberry Pi 3 (yeah, I know), is running Raspian (in which case at least the connection succeeds) or openSUSE (in which case it does not).

..................................................................................................................................

Raspian:

[server]
$ SoapySDRServer --bind
7701212d-a2ea-14cc-8567-0325007f0101
Launching the server... tcp://[::]:55132
Server bound to [::]:55132
Launching discovery server...
Press Ctrl+C to stop the server
SoapyServerListener::accept([::ffff:192.168.1.9]:50506)
SoapyServerListener::accept([::ffff:192.168.1.9]:50507)
SoapyServerListener::accept([::ffff:192.168.1.9]:50508)
SoapyServerListener::close()
SoapyServerListener::close()
SoapyServerListener::accept([::ffff:192.168.1.9]:50509)
SoapyServerListener::close()
SoapyServerListener::handlerLoop() FAIL: SoapyRPCUnpacker::recv(header) FAIL:
SoapyServerListener::close()

[client]
$ SoapySDRUtil --probe="remote=192.168.1.10"
Probe device remote=192.168.1.10
[INFO] Make connection: 'LimeSDR-USB [USB 2.0] 9060B00472212'
[INFO] Estimated reference clock 30.7198 MHz
[INFO] Selected reference clock 30.720 MHz
[INFO] Device name: LimeSDR-USB
[INFO] Reference: 30.72 MHz
[INFO] Init LMS7002M(0)
[INFO] LMS7002M cache /home/pi/.limesuite/LMS7002M_cache_values.db
[INFO] Ver=7, Rev=1, Mask=1
[INFO] LMS7002M calibration values caching Enable
Error probing device: SoapyRPCUnpacker::recv() TIMEOUT:

..................................................................................................................................

openSUSE:

[server]
$ SoapySDRServer --bind
79ac27fb-a35e-14cc-8567-04ada8c00a01
Launching the server... tcp://[::]:55132
Server bound to [::]:55132
Launching discovery server...
Press Ctrl+C to stop the server
SoapyServerListener::accept([::ffff:192.168.1.9]:50542)
[ERROR] ~SoapyRPCUnpacker: Unconsumed payload bytes -4
SoapyServerListener::handlerLoop() FAIL: SoapyRPCUnpacker::recv(header) FAIL:
SoapyServerListener::close()

[client]
$ SoapySDRUtil --probe="remote=192.168.1.10"
Probe device remote=192.168.1.10
[ERROR] SoapyRemote::find() -- connect(tcp://192.168.1.10:55132) FAIL: connect(tcp://192.168.1.10:55132) [110: Connection timed out]
[INFO] Make connection: ''
Error probing device: Failed to make connection with ''

@guruofquality
Copy link
Contributor

There were cases where we got locked up forever waiting for system tcp timeouts when hosts had been removed from the network suddenly. Its possible that these timeouts are too strict: 0ede4c4

Its probably the case that some calls can take a very long time. Can you try increasing this value above 3 seconds? https://github.com/pothosware/SoapyRemote/blob/master/common/SoapyRPCUnpacker.cpp#L23

The default timeout for TCP seems to be 1 or 2 minutes in which case the OS closes the connection. Obviously thats too high, we need to find something in the middle :-)

@guruofquality
Copy link
Contributor

I tried to improve the timeout check so we actually check if the server is down to allow for longer blocking calls: https://github.com/pothosware/SoapyRemote/tree/timeout_improved Give it try, thanks!

@rkb68
Copy link
Author

rkb68 commented Jun 29, 2017

Thanks, Josh. I tried the new tree, and initially the behavior against the two servers (Raspian and openSUSE) was the same as before, with the same error messages.

I then upped the big time limit from 30000000 to 35000000, at which point the command
SoapySDRUtil --probe="remote=192.168.1.10"
completed successfully against the Raspian server, though a bunch of these messages popped up on the client:
[ERROR] ~SoapyRPCUnpacker: Unconsumed payload bytes -4
with these corresponding complaints on the server:
SoapyServerListener::handlerLoop() FAIL: SoapyRPCUnpacker::recv(header) FAIL:

With the new tree and modified time limit, the behavior against the openSUSE server did not change; same errors that I had listed originally.

@guruofquality
Copy link
Contributor

  • I can confirm that the timeout on the master branch was a problem for the LimeSDR when it was performing a calibration which took a long time.
  • Also the first common on the timeout_improved branch causes an error to print when the remote test connection is made. I dont think it is harmful though.
  • I just pushed another commit to the timeout_improved branch, it seems that test connection needs to be properly "hung up" on. This prevents the error message.

@rkb68
Copy link
Author

rkb68 commented Jun 30, 2017

Thanks again. The "hang up" code does the trick- no error messages running "--probe" against the Raspian server. The error messages I got running against openSUSE seem to have a completely different origin.

@guruofquality
Copy link
Contributor

I merged that change btw. I'm not sure about the second set of errors on opensuse. I'm curious if you see the issue on the previous release 0.4.0 (https://github.com/pothosware/SoapyRemote/tree/soapy-remote-0.4.0). That would help pinpoint if there an additional issue with the timeout additions in 0.4.1.

@rkb68
Copy link
Author

rkb68 commented Jul 3, 2017

Just tested 0.4.0- it works like a charm with the Raspian server, and gives same old same old with the openSUSE server (no surprises there).

cheers,
robert

@coburnw
Copy link

coburnw commented Jul 5, 2017

I seem to be getting a related problem with SoapyLogAcceptor. When running a rate test with SoapySDRUtil, a single timeout occurs about 30 seconds into a run, once only, with the rate test continuing to run after the error with no apparent errors.

But when SoapySDRUtil is stopped with Ctrl-C, the remote server also closes and needs to be restarted.

If i put a sleep(40) immediately after make device in SoapySDRRateTest, the error occurs roughly 30seconds into the sleep, then continues clean until stopped with Ctrl-C resulting in the remote exiting also.

SoapySDRUtil --rate=39062.5 --channels=0 --direction=RX
'######################################################
'## Soapy SDR -- the SDR abstraction library
'######################################################

[INFO] SoapyRemote::setupRxStream(remoteFormat=CS32, localFormat=CS32, scaleFactor=5.53648e+07, mtu=1500, window=44040192)
[INFO] Client side stream bound to 192.168.2.131:43174
[INFO] Client side status bound to 192.168.2.131:50313
[INFO] Server side stream bound to [::ffff:192.168.2.101]:48581
[INFO] Server side stream connected to [::ffff:192.168.2.131]:43174
[INFO] Server side status connected to [::ffff:192.168.2.131]:50313
[INFO] Configured sender endpoint: dgram=1452 bytes, 178 elements @ 8 bytes, window=43008 KiB
[INFO] Client side stream connected to 192.168.2.101:48581
[INFO] Configured receiver endpoint: dgram=1452 bytes, 178 elements @ 8 bytes, window=43008 KiB
Stream format: CS32
Num channels: 1
Element size: 8 bytes
Begin RX rate test at 0.0390625 Msps
Starting stream loop, press Ctrl+C to exit...
[WARNING] Set thread priority 0.5 failed: Operation not permitted
0.0390631 Msps 0.312505 Bps
0.0390621 Msps 0.312497 Bps
0.039062 Msps 0.312496 Bps
0.0390618 Msps 0.312495 Bps
0.0390617 Msps 0.312494 Bps
0.0390615 Msps 0.312492 Bps
[ERROR] SoapyLogAcceptor::handlerLoop() FAIL: SoapyRPCUnpacker::recv() TIMEOUT:
0.0390615 Msps 0.312492 Bps
0.0390614 Msps 0.312491 Bps
0.0390626 Msps 0.312501 Bps
0.0390624 Msps 0.3125 Bps
0.0390623 Msps 0.312499 Bps
0.0390622 Msps 0.312498 Bps
0.0390622 Msps 0.312497 Bps
0.0390621 Msps 0.312497 Bps
Ctrl+C pressed here.
the remote 'SoapySDRServer --bind' dies.

@guruofquality
Copy link
Contributor

Thanks, that pointed me in the right direction. the log acceptor also needed to recv() forever without a timeout. The timeout was only meant for the control packets and not the listener loops which can be hung up on. I made a change on the timeout_improved branch. Didnt test it yet, but its straightforward.

@coburnw
Copy link

coburnw commented Jul 6, 2017

No errors so far. Very Nice.
Thank you.

@guruofquality
Copy link
Contributor

Anyways sorry for the bugs, thanks for testing. I will try to get this tagged as well since its a pretty annoying bug to have in a release. @rkb68 let me know if the current master branch resolves your opensuse issues. Perhaps it was also the timeout bug in the logger.

If not I can make the suggestion that the connection timeout of 50 ms could be too strict in some cases. So I offer the following patch to simply increase it and see what happens:

diff --git a/common/SoapyRemoteDefs.hpp b/common/SoapyRemoteDefs.hpp
index d422b5b..368ae71 100644
--- a/common/SoapyRemoteDefs.hpp
+++ b/common/SoapyRemoteDefs.hpp
@@ -67,7 +67,7 @@
 #define SOAPY_REMOTE_DEFAULT_SERVICE "55132"
 
 //! Use this timeout for every socket poll loop
-#define SOAPY_REMOTE_SOCKET_TIMEOUT_US (50*1000) //50 ms
+#define SOAPY_REMOTE_SOCKET_TIMEOUT_US (500*1000) //500 ms
 
 //! Backlog count for the server socket listen
 #define SOAPY_REMOTE_LISTEN_BACKLOG 100

Thanks!

@rkb68
Copy link
Author

rkb68 commented Jul 10, 2017

Thanks again; when I increased SOAPY_REMOTE_SOCKET_TIMEOUT_US to 500 ms, --probe still did not work, but --find did, though it would fail after the first run. However, after increasing the timeout to a full second, I can now run both --find and --probe with no errors. Perhaps my openSUSE installation has greater latency somewhere? Anyway, thanks for looking into this.

@guruofquality
Copy link
Contributor

@rkb68 1 second is an incredible amount of time for a socket to connect on another host on a local network. Thats a bit worry some all by itself.

Is the openSUSE in a remote location or can you think of some oddity with the network configuration otherwise? Howabout sshing into the openSUSE box, is there a noticeable lag? Sometimes there can be issues with the DNS that cause the client to reverse lookup an IP address and timeout on every lookup -- something else to double check anyway.

I'm thinking of making the timeout configurable with an input argument or environment variable just to be safe and future proof things. But there is probably something else going on here.

@rkb68
Copy link
Author

rkb68 commented Jul 13, 2017

I was able to back off the timeout to 700 ms, but yes, that is still a very long time. I installed openSUSE on the RPi3 using their card image, did not tweak any settings, merely enabled wifi with DHCP.

And yes, there is often (not always) a lag when ssh'ing in from elsewhere on the network (which does not happen with the Raspian image).

@guruofquality
Copy link
Contributor

I added a remote timeout option but I forgot to mention it. Remote timeout it should hit all of the timeout values on the client that were previously using the default. I think this should not be needed usually, but in case its useful: https://github.com/pothosware/SoapyRemote/wiki#the-remote-timeout-key

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants