Background: I'm trying to learn frc-characterization using a Romi robot. The robot code runs in desktop debugging mode, so its NetworkTables server also runs on the desktop. The frc-characterization logging component is Python code that uses pynetworktables. When the team number in the logging tool is set to 0, the tool calls NetworkTables.initialize(server="localhost") to connect to the server. On my Windows 10 machines this never succeeds.
Here's what's happening. NetworkTables.initialize(server="localhost") eventually turns into a call to Python's socket.create_connection(("localhost", 1735), timeout=self.timeout) in tcp_connector.py. On Windows 10 with IPv6 enabled, create_connection tries the IPv6 and then the IPv4 resolution of "localhost", in that order. No server is listening on IPv6, so that connection is refused. For reasons I don't understand, instead of immediately reporting "connection refused" to the application, Windows waits for the timeout period to expire and then reports a timeout. create_connection then tries the IPv4 address, and that succeeds. However, the pynetworktables code never sees the resulting connection. Why?
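socket.create_connection tries addresses in the order socket.getaddrinfo returns them. A quick way to see that order on a given machine is a sketch like the following; resolution_order is a hypothetical helper, not part of pynetworktables. Going by the behavior described above, on the affected Windows 10 machines ('::1', 1735) would be listed before ('127.0.0.1', 1735).

```python
import socket


def resolution_order(host, port):
    """Hypothetical helper: list (family, (address, port)) pairs in the order
    socket.getaddrinfo returns them for a TCP connection. socket.create_connection
    tries the addresses in this same order and returns on the first that connects."""
    infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    return [(socket.AddressFamily(family).name, sockaddr[:2])
            for family, _type, _proto, _canonname, sockaddr in infos]


if __name__ == "__main__":
    for family, address in resolution_order("localhost", 1735):
        print(family, address)
```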
Well, two things. First, TcpConnector.connect is called not with ("localhost", 1735), which would use the straightforward single-server code that runs in the calling thread, but with [("localhost", 1735)], which uses the more complex multi-threaded code that tries to connect to multiple servers in parallel and accepts the result of whichever attempt finishes first. Second, if none of the threads succeeds within the timeout period, as monitored by a call to self.cond.wait(self.timeout) in the parent thread, this code returns None because no child thread has stored a result.
The problem, thus, is that the child thread that calls create_connection succeeds (when it tries to connect using IPv4 after the IPv6 connection times out), but only after the parent thread has timed out and moved on.
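The race can be reproduced without any networking. The sketch below is a simplified model under stated assumptions, not the actual tcp_connector.py code: connect_first_result and simulated_localhost_connect are hypothetical names, and the sleeps stand in for the IPv6 attempt that only fails when the timeout expires and the quick IPv4 connect that follows.

```python
import threading
import time


def connect_first_result(attempts, wait_timeout):
    """Simplified model (NOT the real tcp_connector.py): one worker thread per
    connection attempt; the parent waits on a Condition for at most
    `wait_timeout` seconds and returns whatever result has been stored."""
    cond = threading.Condition()
    result = [None]

    def worker(attempt):
        value = attempt()  # may finish after the parent has already given up
        with cond:
            if result[0] is None:
                result[0] = value
            cond.notify()

    with cond:
        # Start the workers while holding the lock so no notify can be missed.
        for attempt in attempts:
            threading.Thread(target=worker, args=(attempt,), daemon=True).start()
        cond.wait(wait_timeout)  # wakes on notify, or when the timeout expires
        return result[0]         # still None if no worker finished in time


def simulated_localhost_connect(connect_timeout, ipv4_delay=0.2):
    """Hypothetical stand-in for create_connection(("localhost", 1735), timeout)
    on the affected Windows machines."""
    time.sleep(connect_timeout)  # the ::1 attempt only fails when the timeout expires
    time.sleep(ipv4_delay)       # then the 127.0.0.1 attempt succeeds
    return "ipv4-socket"


if __name__ == "__main__":
    timeout = 0.5
    attempt = lambda: simulated_localhost_connect(timeout)
    print(connect_first_result([attempt], timeout))      # parent gives up first
    print(connect_first_result([attempt], 2 * timeout))  # parent sees the IPv4 socket
```

With the parent's wait equal to the connect timeout, the worker's success arrives about 0.2 s too late and is discarded; doubling the parent's wait is enough for it to see the result.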
I can see several possible fixes. Simply setting the cond.wait timeout in the parent thread to 2*self.timeout will probably prevent the problem. Calling connect with ("localhost", 1735) rather than [("localhost", 1735)] would work. Calling it with "127.0.0.1" rather than "localhost" would also work.
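The "127.0.0.1" workaround can be generalized by resolving the host name to an IPv4 address before handing it to pynetworktables. This is a sketch assuming the server listens only on IPv4; ipv4_address_for is a hypothetical helper, not part of pynetworktables or frc-characterization.

```python
import socket


def ipv4_address_for(host, port):
    """Hypothetical helper: return the first IPv4 address for `host`, so the
    client never starts with an IPv6 attempt that nothing is listening on.
    Assumes the server listens on IPv4; raises socket.gaierror if the host
    has no IPv4 address."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return infos[0][4][0]  # for AF_INET, sockaddr is (address, port)
```

For example, NetworkTables.initialize(server=ipv4_address_for("localhost", 1735)) would pass "127.0.0.1" on a typical machine, which sidesteps the IPv6 attempt entirely.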
I haven't tried this on Linux or macOS, but I suspect it would not be a problem there, even if they try IPv6 first, because they immediately report ECONNREFUSED rather than waiting for a timeout.
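To check this expectation on a given OS (or to reproduce the Windows behavior while Wireshark is running), each resolved address can be tried separately and timed, rather than stopping at the first success as socket.create_connection does. A sketch; time_connect is a hypothetical helper, and exception names depend on the OS and Python version (socket.timeout is reported as TimeoutError on Python 3.10 and later).

```python
import socket
import time


def time_connect(host, port, timeout):
    """Hypothetical helper: try every address for (host, port) and report
    (family, outcome, seconds) for each attempt."""
    report = []
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        start = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)  # per-attempt timeout, like create_connection
                sock.connect(sockaddr)
                outcome = "connected"
        except OSError as exc:  # covers ConnectionRefusedError and timeouts
            outcome = type(exc).__name__
        report.append((socket.AddressFamily(family).name, outcome,
                       round(time.monotonic() - start, 3)))
    return report
```

On Linux, connecting to a closed port on 127.0.0.1 should end with ConnectionRefusedError almost immediately; per the report above, on the affected Windows machines the ::1 attempt from time_connect("localhost", 1735, 2.0) would instead run until the timeout.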
Finally, three more points:
1. I believe this is also the root cause of issue #84 ("issue running as server 'read error in handshake: end of file'"). I tried the scenario described there, with a pynetworktables server and a pynetworktables client on Windows 10, and saw the described behavior and Wireshark traces similar to the ones that prompted this analysis.
2. I am concerned that the code in tcp_connector.py is also failure-prone when passed a multi-item list as server_or_servers. If I understand the code correctly, the first child thread to complete notifies self.cond. If that thread failed to create a connection, for example because it received ECONNREFUSED, then the result from any later-completing thread that succeeds will never be used.
3. The behavior of socket.create_connection described above can be verified in an interactive Python session while running Wireshark at the same time.
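The multi-server concern can also be checked with a simplified threading model. Again, this is not the real tcp_connector.py, and all names here are hypothetical: first_finisher returns whatever the first worker reports, so an immediate refusal hides a slower success, while first_success is one possible alternative that waits until a worker succeeds, every worker has failed, or the deadline passes.

```python
import threading
import time


def first_finisher(attempts, timeout):
    """Model of the questioned behavior: the first worker to finish decides
    the result, even if it failed (None)."""
    cond = threading.Condition()
    state = {"done": False, "value": None}

    def worker(attempt):
        value = attempt()
        with cond:
            if not state["done"]:
                state["done"] = True
                state["value"] = value
            cond.notify()

    with cond:
        for attempt in attempts:
            threading.Thread(target=worker, args=(attempt,), daemon=True).start()
        cond.wait(timeout)  # woken by the first finisher, success or not
        return state["value"]


def first_success(attempts, timeout):
    """Alternative sketch: keep waiting until some worker succeeds, all
    workers have failed, or the overall deadline passes."""
    cond = threading.Condition()
    state = {"finished": 0, "value": None}

    def worker(attempt):
        value = attempt()
        with cond:
            state["finished"] += 1
            if value is not None and state["value"] is None:
                state["value"] = value
            cond.notify()

    with cond:
        for attempt in attempts:
            threading.Thread(target=worker, args=(attempt,), daemon=True).start()
        cond.wait_for(lambda: state["value"] is not None
                      or state["finished"] == len(attempts), timeout)
        return state["value"]


def refused():
    """Hypothetical server that rejects the connection immediately."""
    return None


def slow_ok():
    """Hypothetical server that accepts the connection after 0.2 s."""
    time.sleep(0.2)
    return "socket-from-slow-server"
```

With [refused, slow_ok] and a 1-second timeout, first_finisher gives up as soon as the refusal arrives, whereas first_success returns the slower server's connection.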
Yes, I have tried the doubled timeout and it works. I have also called NetworkTables.initialize(server="127.0.0.1") from an interactive Python session, and that works too. But the doubled timeout is an easy change and will help Windows users enormously.