Hostname resolution fails using zeroconf #99

Closed
stertingen opened this issue Apr 10, 2019 · 7 comments

@stertingen stertingen commented Apr 10, 2019

Problem

Using the zeroconf discovery, SyncThread on local host hostfoo throws errors about being unable to resolve the hostname of a remote host hostbar:

[INFO][rosout]: hostbar is now online
[INFO][rosout]: SyncThread[hostfoo] Requesting remote state from 'http://hostbar:11911/'
[ERROR][rosout]: SyncThread[hostfoo] ERROR: Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/master_sync_fkie/sync_thread.py", line 264, in _request_remote_state
    remote_state = remote_monitor.masterInfo()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1243, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1602, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1283, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1311, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1459, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 859, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 836, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 557, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known

Notes:

  • There is no network-wide name resolution; I want to rely on mDNS only.
  • It's an IPv6-only network.

Debugging

My /etc/nsswitch.conf (relevant line):

hosts:          files mdns [NOTFOUND=return] dns

The parameters for the zeroconf node are default; the zeroconf nodes discover each other perfectly.

Manual hostname resolution fails with:

host hostbar
host hostbar.local
avahi-resolve -n hostbar
getent hosts hostbar

Manual hostname resolution works with:

avahi-resolve -n hostbar.local
getent hosts hostbar.local

I'm not sure, but it might be a better idea to use hostbar.local instead.
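
The manual checks above can also be reproduced from Python, using the same `getaddrinfo` call that appears at the bottom of the traceback. This is only a minimal sketch: the helper name `can_resolve` is hypothetical, and the host names and port are the ones quoted in this report.

```python
import socket


def can_resolve(name, port=11911):
    """Return True if the system resolver (NSS, including mdns) resolves `name`."""
    try:
        # Same call shape as socket.create_connection uses in the traceback.
        socket.getaddrinfo(name, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        # "[Errno -2] Name or service not known" ends up here.
        return False


if __name__ == "__main__":
    for candidate in ("hostbar", "hostbar.local"):
        print(candidate, can_resolve(candidate))
```

On the setup described here, one would expect the bare name to fail and the `.local` name to succeed, matching the `getent hosts` results.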

@atiderko atiderko commented Apr 10, 2019

Until now I used socket.gethostname() for hostname detection.

I have now changed it to use the hostname from the ROS master URI.
Please try again!
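
For readers following along, taking the host name from the master URI instead of `socket.gethostname()` could look roughly like the sketch below. It is not the code from the actual commit; the function name is hypothetical, and the fallback to `http://localhost:11311` is an assumption based on the usual ROS default.

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used in the traceback


def hostname_from_master_uri(environ=None):
    """Extract the host part of ROS_MASTER_URI (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumed fallback; adjust if your setup uses a different default.
    uri = environ.get("ROS_MASTER_URI", "http://localhost:11311")
    return urlparse(uri).hostname


if __name__ == "__main__":
    print(hostname_from_master_uri({"ROS_MASTER_URI": "http://hostbar.local:11311"}))
```

Note that this only yields a resolvable name if the URI itself already contains one.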

@stertingen stertingen commented Apr 10, 2019

Nope, does not help; still the same error.

@stertingen stertingen commented Apr 10, 2019

The output from avahi-browse -a -t -r is interesting:

+ enp0s3 IPv6 hostbar                            _ros-master._tcp     local
= enp0s3 IPv6 hostbar                            _ros-master._tcp     local
   hostname = [hostbar.local]
   address = [fd78:9d42:7594::56f]
   port = [11311]
   txt = ["network_id=0" "rpcuri=http://hostbar:11911" "zname=/master_discovery" "master_uri=http://hostbar:11311/" "timestamp_local=1554903303.730091095" "timestamp=1554903303.730091095"]

While the hostname is correctly set to hostbar.local, the values of master_uri and rpcuri in the txt array are probably not.

Interestingly, setting ROS_MASTER_URI=http://hostbar.local:11311 explicitly on hostbar (and analogously on hostfoo) does not help either.
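
The mismatch in the avahi-browse output can be made explicit with a small sketch that parses the txt entries and compares the host in each URI against the advertised hostname. The entries are copied from the output above; the helper names `parse_txt` and `mismatched_hosts` are hypothetical.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

# TXT entries as shown by avahi-browse above (timestamps omitted).
TXT = [
    "network_id=0",
    "rpcuri=http://hostbar:11911",
    "zname=/master_discovery",
    "master_uri=http://hostbar:11311/",
]


def parse_txt(entries):
    """Turn 'key=value' TXT entries into a dict."""
    return dict(entry.split("=", 1) for entry in entries)


def mismatched_hosts(record, advertised_host, keys=("rpcuri", "master_uri")):
    """Return the URI keys whose host differs from the advertised mDNS hostname."""
    result = {}
    for key in keys:
        host = urlparse(record[key]).hostname
        if host != advertised_host:
            result[key] = host
    return result


if __name__ == "__main__":
    print(mismatched_hosts(parse_txt(TXT), "hostbar.local"))
```

Both URIs carry the bare name `hostbar`, which is exactly the name that fails to resolve on hostfoo.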

@stertingen stertingen commented Apr 10, 2019

This might be a problem with the ROS Master API (http://wiki.ros.org/ROS/Master_API).

The discovery node calls getUri(), which returns the local hostname without the .local suffix.

I'm not sure whether this is intended behavior; if it is, the discovery node might have to do a workaround for this.
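
To illustrate the point about `getUri()`: per the ROS Master API page linked above, the call takes a caller ID and returns a `(code, statusMessage, masterURI)` triple. The sketch below extracts the host from that reply. The function name, the caller ID `/example`, and the stand-in `FakeMaster` class (used so the example runs without a ROS master) are all assumptions for illustration.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def host_reported_by_master(proxy, caller_id="/example"):
    """Ask a master proxy for its URI and return the host part of it."""
    code, status, uri = proxy.getUri(caller_id)
    if code != 1:  # 1 signals success in the Master API
        raise RuntimeError("getUri failed: %s" % status)
    return urlparse(uri).hostname


class FakeMaster(object):
    """Stand-in for an XML-RPC ServerProxy; mimics the reply seen on hostbar."""

    def getUri(self, caller_id):
        return 1, "", "http://hostbar:11311/"


if __name__ == "__main__":
    # Against a real master one would pass xmlrpc.client.ServerProxy(master_uri)
    # (xmlrpclib.ServerProxy on Python 2.7) instead of FakeMaster().
    print(host_reported_by_master(FakeMaster()))
```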

@atiderko atiderko commented Apr 11, 2019

I added a fqdn parameter. You have to set this parameter to true.

rosrun master_discovery_fkie zeroconf _fqdn:=true
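
Conceptually, such an option has to replace the bare host in the advertised URIs with a fully qualified name. The sketch below shows one way this could be done; it is not taken from the actual implementation of the fqdn parameter, the function name `qualify_uri` is hypothetical, and IPv6 literal hosts (which would need brackets) are not handled.

```python
try:
    from urllib.parse import urlparse, urlunparse  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse  # Python 2.7


def qualify_uri(uri, fqdn):
    """Return `uri` with its host replaced by `fqdn`, keeping scheme, port and path."""
    parts = urlparse(uri)
    netloc = fqdn if parts.port is None else "%s:%d" % (fqdn, parts.port)
    return urlunparse(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(qualify_uri("http://hostbar:11911", "hostbar.local"))
    print(qualify_uri("http://hostbar:11311/", "hostbar.local"))
```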

Regarding your post, I'm afraid that the ROS-topic communication will not work anyway. But you can try it out!

@stertingen stertingen commented Apr 11, 2019

Thank you very much!

Well, the discovery works fine now, but since the nodes themselves do not use the FQDN, I have to set ROS_HOSTNAME=$(hostname -f) explicitly (see ros/ros_comm#138).
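
When nodes are started from a Python launcher script rather than a shell, the same setting can be applied to the child environment. This is a rough sketch: the helper name is hypothetical, and `socket.getfqdn()` is only an approximation of `hostname -f` that may return a different name depending on the resolver configuration, so passing the name explicitly is the safer choice.

```python
import os
import socket


def env_with_ros_hostname(fqdn=None, base=None):
    """Return a copy of the environment with ROS_HOSTNAME set to an FQDN."""
    env = dict(os.environ if base is None else base)
    # Assumption: fall back to socket.getfqdn() when no name is given.
    env["ROS_HOSTNAME"] = fqdn if fqdn is not None else socket.getfqdn()
    return env


if __name__ == "__main__":
    env = env_with_ros_hostname("hostfoo.local")
    # The dict can then be passed as env=... to subprocess.Popen when starting a node.
    print(env["ROS_HOSTNAME"])
```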

Since I'm bound to an IPv6-only network, there were still a few things missing:

For better IPv6 compatibility, it would be nice if the MasterMonitor were initialized in IPv6 mode:

  • either depending on the address returned by hostname resolution of the RPC server to be created
  • or by reading the ROS_IPV6 environment variable (which is undocumented and might be removed or changed in the future)
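
The two options could be combined roughly as in the sketch below, which picks an address family for the RPC server. It is not MasterMonitor code: the function name is hypothetical, and the assumption that the value `on` enables ROS_IPV6 should be checked against the ROS version in use, since the variable is undocumented.

```python
import socket


def choose_address_family(host, port, environ):
    """Pick AF_INET6 or AF_INET for an RPC server bound to host:port."""
    # Option 2: honor the environment switch (assumed to be enabled by "on").
    if environ.get("ROS_IPV6") == "on":
        return socket.AF_INET6
    # Option 1: use the family of the first address the resolver returns.
    infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    return infos[0][0]


if __name__ == "__main__":
    family = choose_address_family("127.0.0.1", 11911, {})
    print(family == socket.AF_INET)
```

On an IPv6-only network, resolving the local FQDN would be expected to return only AF_INET6 entries, so option 1 alone would already select IPv6 there.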

I've got a setup running now with:

  • two hosts in an IPv6-only network
  • Zeroconf discovery
  • Service calls and Topics working
  • Adjustments to /etc/hosts (localhost name resolution) and environment

@stertingen stertingen commented Apr 11, 2019

atiderko added a commit that referenced this issue Apr 11, 2019
@stertingen stertingen closed this Apr 11, 2019