pynetworktables client can't connect to networktables server on Windows 10 "localhost" #116

Closed
chauser opened this issue Jan 18, 2021 · 4 comments

Comments

@chauser
Contributor

chauser commented Jan 18, 2021

Background: I'm attempting to learn about frc-characterization using a Romi robot; the robot code runs in desktop debugging mode so the related networktables server is also running on the desktop. The frc-characterization logging component is python code that uses pynetworktables. When the team number in the logging tool is set to 0, the tool does NetworkTables.initialize(server="localhost") to connect to the server. When running on my Windows 10 machines this never succeeds.

Here's what's happening. NetworkTables.initialize(server="localhost") eventually turns into a call to python's socket.create_connection(("localhost", 1735), timeout=self.timeout) in tcp_connector.py. create_connection tries both the IPv6 and IPv4 resolutions of "localhost" in that order on Windows 10 if IPv6 is enabled. Of course there is no server running on IPv6 so that connection is refused. For reasons I don't understand, instead of immediately reporting a "connection refused" failure to the application, Windows chooses to wait for the timeout period to expire and then reports a "timeout" failure. It then proceeds to try the IPv4 address and that succeeds. However, the created connection is never seen by the pynetworktables code. Why?
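To see which addresses a host name resolves to, and in what order, something like the following can be used. It is a minimal sketch that relies only on the standard library's `socket.getaddrinfo`; the helper name `resolution_order` is made up for illustration and is not part of pynetworktables.

```python
import socket


def resolution_order(host, port):
    """Return (family_name, address) pairs in the order getaddrinfo reports them.

    Hypothetical helper for illustration; not part of pynetworktables.
    """
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [
        (names.get(family, str(family)), sockaddr[0])
        for family, _socktype, _proto, _canonname, sockaddr in infos
    ]
```

On a machine showing the behavior described above, `resolution_order("localhost", 1735)` would be expected to list an IPv6 entry ahead of the IPv4 one, though the exact output depends on the local resolver configuration.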

Well, two things: first, TcpConnector.connect is called not with ("localhost", 1735) which would use the straightforward, in-thread, code for a single server, but rather with [("localhost",1735)] which uses the more complex multi-threaded code that tries to connect in parallel to multiple servers, accepting the result of whichever one finishes first. Second, if none of the threads succeed in the timeout period as monitored by a call to self.cond.wait(self.timeout) in the parent thread, this code returns None because none of the child threads has stored anything different.

The problem, thus, is that the child thread that calls create_connection succeeds (when it tries to connect using IPv4 after the IPv6 connection times out), but only after the parent thread has timed out and moved on.
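The race can be reproduced without any networking. The sketch below is a simplified simulation, not the actual tcp_connector.py code: a `time.sleep` stands in for the slow `create_connection` call, and the function and variable names are assumptions chosen for illustration.

```python
import threading
import time


def wait_for_connection(connect_delay, wait_timeout):
    """Simulate a parent thread waiting on a Condition for a child's result.

    connect_delay: seconds the simulated connect takes in the child thread.
    wait_timeout:  seconds the parent is willing to wait on the condition.
    """
    cond = threading.Condition()
    state = {"result": None}

    def child():
        time.sleep(connect_delay)  # stands in for a slow create_connection call
        with cond:
            state["result"] = "connected"
            cond.notify()

    worker = threading.Thread(target=child, daemon=True)
    with cond:
        worker.start()
        cond.wait(wait_timeout)   # parent gives up after wait_timeout seconds
        result = state["result"]  # still None if the child has not finished yet
    worker.join()                 # the child still completes, but too late
    return result
```

When `connect_delay` exceeds `wait_timeout`, the function returns `None` even though the child eventually stores a result, which mirrors the lost connection described here. Making `wait_timeout` comfortably larger than `connect_delay` lets the parent see the result.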

I can see a number of possible fixes. Simply setting the cond.wait timeout in the parent thread to 2*self.timeout will probably prevent it from happening. Calling connect with ("localhost", 1735) rather than [("localhost",1735)] would work. Calling with "127.0.0.1" rather than "localhost" would also work.
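For the last option, a caller that only has a host name could resolve it to an IPv4 address first. This is a hedged sketch using only the standard library; `ipv4_address` is a hypothetical helper, not something pynetworktables provides.

```python
import socket


def ipv4_address(host, port):
    """Return the first IPv4 address that getaddrinfo reports for host.

    Hypothetical helper for illustration. socket.gaierror is raised if the
    host has no IPv4 resolution.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) pair.
    return infos[0][4][0]
```

The returned string could then be passed as the `server` argument, in the same way as the literal "127.0.0.1".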

I haven't tried it on Linux or macOS, but I suspect that it would not be a problem there, even if they try IPv6 first, given that they immediately report ECONNREFUSED rather than waiting for a timeout.

Finally, three more points:

  • I believe this is also at the root of issue #84, "issue running as server 'read error in handshake: end of file'". I tried the scenario described there, using a pynetworktables server and a pynetworktables client on Windows 10, and saw the described behavior, along with Wireshark traces similar to those that prompted the analysis here.
  • I am concerned that the code in tcp_connector.py is also failure-prone when passed a multi-item list as server_or_servers. If I understand the code correctly, the first child thread to complete notifies self.cond. If that thread failed to create a connection (for example, by receiving ECONNREFUSED), then the result from any later-completing thread that succeeds will never be used.
  • The behavior of socket.create_connection that I've described can be verified in an interactive python session while simultaneously running Wireshark.
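One way to avoid the multi-item concern in the second point would be for the parent to keep waiting until some thread has stored a success or every thread has finished. The sketch below illustrates that idea with `threading.Condition.wait_for`; it is an assumption about a possible design, not the code in tcp_connector.py, and `first_successful` and its arguments are hypothetical names.

```python
import threading


def first_successful(attempts, timeout):
    """Run each attempt in its own thread; return the first non-None result.

    attempts: zero-argument callables that return a connection or raise OSError.
    Returns None if every attempt fails or the timeout expires first.
    """
    cond = threading.Condition()
    state = {"result": None, "pending": len(attempts)}

    def run(attempt):
        try:
            outcome = attempt()
        except OSError:  # e.g. ConnectionRefusedError or a socket timeout
            outcome = None
        with cond:
            state["pending"] -= 1
            if outcome is not None and state["result"] is None:
                state["result"] = outcome
            cond.notify_all()

    for attempt in attempts:
        threading.Thread(target=run, args=(attempt,), daemon=True).start()

    with cond:
        # A failed attempt wakes the parent, but the predicate keeps it
        # waiting until a success is stored or nothing is left pending.
        cond.wait_for(
            lambda: state["result"] is not None or state["pending"] == 0,
            timeout,
        )
        return state["result"]
```

With this shape, an early ECONNREFUSED from one server no longer hides a later success from another. A real implementation would also need to close any extra connections that succeed after the first.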
@virtuald
Member

Have you tried one of the simple examples to see if they have the same behavior? For example, see https://github.com/robotpy/pynetworktables/blob/main/samples/nt_driverstation.py

Definitely would be happy to accept a PR with a fix. Unfortunately, I still don't actively use windows so this is pretty difficult to diagnose for me.

@virtuald
Member

Setting the timeout to 2x would probably be the least risky change. Try it and let us know if that solves it?

@chauser
Contributor Author

chauser commented Jan 18, 2021

Yes, I have run the doubled-timeout test and it works. I have also called NetworkTables.initialize(server="127.0.0.1") from interactive Python, and that works too. But the doubled timeout is easy and will help enormously for Windows users.

@TheTripleV
Member

closed by #117

3 participants