Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rospy publisher hanging when subscriber drops connection (e.g. because of wifi) #169

Closed
advaitjain opened this issue Feb 7, 2013 · 13 comments
Assignees

Comments

@advaitjain
Copy link

I'm submitting an issue for this question on ROS answers:
http://answers.ros.org/question/29387/rospy-publisher-hanging-when-subscriber-drops-connection-because-of-wifi-is-this-normal/

Summary:
If a rospy publisher is connected to a subscriber on a different computer and you disconnect the ethernet cable, the publisher hangs after about a minute.

Issue can be replicated with the rospy talker node running on a different computer than the listener node.

The cause might be here:
https://github.com/ros/ros_comm/blob/groovy-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L621

@forrestv
Copy link

I ran into the same problem. It seems that rospy's use of blocking writes to sockets is indeed the cause.

@mikeferguson
Copy link
Contributor

@tfoote I'm going to bring this back up -- as hydro tf/tf2 will use pure python publishers and this is going to cause a major change in how tf publishers operate over wifi/unreliable links compared to Groovy and earlier.

As can be seen in https://github.com/ros/ros_comm/blob/groovy-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L620, Ken never finished implementing async or retry logic, which means that if you have say, a remote RVIZ connected to a python node publishing tf, you will now have the python node hang if the wifi link drops out. You might recall the problems we had with a certain robot when python nodes started randomly hanging (in that case, it was an /odom topic).

@flixr
Copy link
Contributor

flixr commented Oct 15, 2013

This is indeed a big problem... is anyone working on this?
I will probably need to wait until this is fixed before I can switch to hydro.

@dirk-thomas
Copy link
Member

The actually issue has been in ROS since forever. rospy exposes a synchronous blocking publisher API. The problem is that addressing this issue requires asynchronous IO operations. In order to not block the publisher when one subscriber is not able to receive the information (drop-out of wireless, reading the data too slow) each message must be queued per connections and being delivered to the socket from a separate thread (as it is done in roscpp)

This can not be done easily since there is no good way to keep the current API for publishing. E.g. signaling error conditions can not be done via exceptions since the error will happen asynchronously.

May be the change @mikeferguson mentioned which he states to be new in "hydro tf/tf2" can be addressed within tf itself. @tfoote any ideas on that?

@tfoote
Copy link
Member

tfoote commented Oct 29, 2013

@mikeferguson All python processes in tf were already using python based publishers. tf has always used native client communications. tf vs tf2 Was there more that you thought would exacerbate this issue?

@mikeferguson
Copy link
Contributor

From the migration guide (http://wiki.ros.org/hydro/Migration):

tf2 continues to use the C++ core library, but with the isolation of the BufferCore class only the very core algorithm is the wrapped C++ class. All communications for the python bindings are done in pure python avoiding many of the corner cases where pytf would have issues.

I thought that meant tf had originally used C++ publishers in the wrapper. If not, then disregard the tf/tf2 aspects.

Regardless though, this bug pretty much makes rospy useless over a wifi link, are we really saying it just can't be fixed? I guess I haven't looked at the upper level APIs, but I've looked at the roscpp queues, and thought we could implement something similar in here.

@forrestv
Copy link

This can not be done easily since there is no good way to keep the current API for publishing. E.g. signaling error conditions can not be done via exceptions since the error will happen asynchronously.

I don't think the API creates any constraints - all the documented exceptions could happen immediately when publish() is called, not later when a write fails or something (since a write failing doesn't cause publish() to raise an exception in the caller).

@tfoote
Copy link
Member

tfoote commented Oct 29, 2013

@mikeferguson The critical difference between tf2 and tf is that tf2's core library no longer uses time and all the timeout are implemented in python. That was where pytf had issues when there were timeouts insdie of the wrapped c++ but roscpp was not updating time because it was not initialized.

@dirk-thomas and I have spent quite a while on the whiteboard making sure that we can do this without significantly changing existing behavior. We're closing in on a solution that I think will work.

@mikeferguson
Copy link
Contributor

@tfoote thanks for the explanation, and thanks for taking a look at this!

@dirk-thomas
Copy link
Member

Please take a look at #308 and provide feedback. Would someone try to update tf2 to use this new option and verify that it fixes the describe problem adequately?

@tfoote
Copy link
Member

tfoote commented Oct 31, 2013

Before closing this after releasing #308 we should collect a list of major publishers to be patched. So far I have

  • tf
  • tf2
  • rospy tutorials (wiki)
  • rospy tutorials (repo)

dirk-thomas added a commit that referenced this issue Nov 18, 2013
implement optional queueing for rospy publications (#169)
dirk-thomas added a commit to ros/ros_tutorials that referenced this issue May 2, 2014
tfoote added a commit to ros/geometry_tutorials that referenced this issue May 2, 2014
tfoote added a commit to ros/geometry_tutorials that referenced this issue May 2, 2014
tfoote added a commit to ros/geometry_tutorials that referenced this issue May 2, 2014
tfoote added a commit to ros/geometry_tutorials that referenced this issue May 2, 2014
tfoote added a commit to ros/geometry that referenced this issue May 2, 2014
tfoote added a commit to ros/geometry that referenced this issue May 2, 2014
@tfoote
Copy link
Member

tfoote commented May 2, 2014

all four have been checked.

@tfoote tfoote closed this as completed May 2, 2014
severin-lemaignan pushed a commit to severin-lemaignan/robotpkg that referenced this issue Aug 18, 2014
Changes since 1.9.30:

^^^^^^^^^^^^^^^^^^^^^^^^
Changelog for package tf
^^^^^^^^^^^^^^^^^^^^^^^^

1.11.3 (2014-05-07)
-------------------
* convert to boost signals2 following `ros/ros_comm#267
<https://github.com/ros/ros_comm/issues/267>`_ Fixes `#23
<https://github.com/ros/geometry/issues/23>`_. Requires
`ros/geometry2#61
<https://github.com/ros/geometry_experimental/issues/61>`_ as well.
* add rospy publisher queue_size argument
  `ros/ros_comm#169 <https://github.com/ros/ros_comm/issues/169>`_
* add queue_size to tf publisher
  `ros/ros_comm#169 <https://github.com/ros/ros_comm/issues/169>`_
* make rostest in CMakeLists optional (`ros/rosdistro#3010
<https://github.com/ros/rosdistro/issues/3010>`_)
* Contributors: Lukas Bulwahn, Tully Foote

1.11.2 (2014-02-25)
-------------------
* fixing test linking
* Contributors: Tully Foote

1.11.1 (2014-02-23)
-------------------

1.11.0 (2014-02-14)
-------------------
* TF uses ros::MessageEvent to get connection information
* Contributors: Kevin Watts, Tully Foote

1.10.8 (2014-02-01)
-------------------
* Port groovy-devel patch to hydro-devel
* Added rosconsole as catkin dependency for catkin_package
* Add rosconsole as runtime dependency
* Contributors: Michael Ferguson, Mirza Shah

1.10.7 (2013-12-27)
-------------------
* fix bug in tf::Matrix3x3::getEulerYPR()
  Fixes a bug in tf::Matrix3x3::getEulerYPR() implementation's handling
  of gimbal lock cases (when the new x axis aligns with the old +/-z
  axis).
* add test that demonstrated bug in tf::Matrix3x3
  tf::Matrix3x3::getEulerYPR() has a bug which returns an incorrect rpy
  for certain corner case inputs.  This test demonstrates that bug.
* Fix const correctness of tf::Vector3 rotate() method
  The method does not modify the class thus should be const.
  This has already been fixed in Bullet itself.
* add automatic tf buffer cleaning on bag loop for python
  This logic was already implemented for c++
  but not for the python module.
* Contributors: Acorn Pooley, Timo Rohling, Tully Foote, v4hn

1.10.6 (2013-08-28)
-------------------
* switching to wrapper scripts which will provide a deprecation warning for `#3
<https://github.com/ros/geometry/issues/3>`_
* add missing roswtf dependency to really export the plugin (fix `#27
<https://github.com/ros/geometry/issues/27>`_)
* Update listener.py
  Fix the tf listener service exception in rospy. See:
  http://answers.ros.org/question/10777/service-exception-using-tf-listener-in-rospy/
* Fix MessageFilter race condition
  If MessageFilter does not explicitly stop its callback timer when it's
  being destroyed, there is a race condition when that timer is processed in
  a callback queue run by a different thread.  Specifically,
  maxRateTimerCallback() may be called after messages_mutex_ has been
  destroyed, causing a unrecoverable error.

1.10.5 (2013-07-19)
-------------------
* tf: export dependency on tf2_ros
  Fixes `#21 <https://github.com/ros/geometry/issues/21>`_
* added run dependency on graphviz
  closes `#19 <https://github.com/ros/geometry/issues/19>`_

1.10.4 (2013-07-11)
-------------------
* fixing erase syntax
* resolving ros/geometry#18 using implementation
added in tf2::BufferCore, adding dependency on next version of tf2 for this

1.10.3 (2013-07-09)
-------------------
* fixing unittest for new resolve syntax

1.10.2 (2013-07-09)
-------------------
* strip leading slashes in resolve, and also any time a method is passed from
tf to tf2 assert the leading slash is stripped as well.  tf::resolve with two
arguments will end up with foo/bar instead of /foo/bar.  Fixes
ros/geometry2#12
* added two whitespaces to make message_filter compile with c++11
  more on this here:
  http://stackoverflow.com/questions/10329942/error-unable-to-find-string-literal-operator-slashes
* using CATKIN_ENABLE_TESTING to optionally configure tests in tf

1.10.1 (2013-07-05)
-------------------
* updating dependency requirement to tf2_ros 0.4.3
* removing unused functions
  removing unused private methods
  removing ``max_extrapolation_distance_``
  removing unused data storage _frameIDs frameIDS_reverse ``frame_authority_``
  removing cache_time from tf, passing through method to tf2 buffer_core
  removing unused variables ``frames_`` and ``frame_mutex_`` and
  ``interpolating_``
  removing unused mutex and transformchanged signaling
  commenting on deprecation of MAX_EXTRAPOLATION_DISTANCE

1.10.0 (2013-07-05)
-------------------
* adding versioned dependency on recent geometry_experimental changes
* fixing test dependencies
* fixing callbacks for message filters
* remove extra invalid comment
* dedicated thread logic all implemented
* removing commented out code
* mostly completed conversion of tf::TransformListener to use tf2 under the
hood
* lookuptwist working
* tf::Transformer converted to use tf2::Buffer under the hood.  passing
tf_unittest.cpp
* making tf exceptions typedefs of tf2 exceptions for compatability
* first stage of converting Transformer to Buffer
* switching to use tf2's TransformBroadcaster
* adding dependency on tf2_ros to start moving over contents
* fixing unit tests
kshehata pushed a commit to kshehata/geometry that referenced this issue May 11, 2015
kshehata pushed a commit to kshehata/geometry that referenced this issue May 11, 2015
rsinnet pushed a commit to MisoRobotics/ros_comm that referenced this issue Jun 19, 2017
implement optional queueing for rospy publications (ros#169)
nuclearsandwich pushed a commit to nuclearsandwich/ros_canary that referenced this issue Apr 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants