Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unavailable nodes from local node cache _caller_apis #2010

Merged
merged 2 commits into from
Aug 3, 2020
Merged

Remove unavailable nodes from local node cache _caller_apis #2010

merged 2 commits into from
Aug 3, 2020

Conversation

bofetadas
Copy link
Contributor

@bofetadas bofetadas commented Jul 26, 2020

Fixes #1967
Unavailable nodes that are still registered at master but are actually offline due to battery or network connection issues have to be removed from local cache _caller_apis. This avoids buggy behaviour explained in #1967.

How to reproduce
Two nodes will be needed.

  • The first node continuously pings all registered nodes to see if any have died. If a node has died but is still registered, this node unregisters them from master.
    This is necessary because nodes running on a battery powered or mobile device do not have the chance to unregister themselves when their device runs out of battery or exits the wifi network due to too great distance to the router.
  • The second one can be any kind of node (service server/client or publisher/subscriber).

The bug that occurs is explained detailedly in the above mentioned bug.
After the second node has exit and rejoined the ros network, the pinging node will notice that the second node has come online again. But, it will look into its cache and try to ping the old URI of the second node which will yield a node is offline result. In order to avoid this, the cache also needs to be updated once a node is unregistered from master.
Since unregistering nodes eventually results in a call of cleanup_master_blacklist, this behaviour was implemented there.

…ode cache _caller_apis. This prevents buggy behaviour when nodes become unvailable because they exit the ros network due to wifi issues but then reconnect to it later on.
@bofetadas bofetadas changed the title rosnode cleanup removes unavailable nodes from _caller_apis Remove unavailable nodes from local node cache _caller_apis Jul 26, 2020
@dirk-thomas
Copy link
Member

Thanks for the patch.

@dirk-thomas dirk-thomas merged commit c6ad71e into ros:noetic-devel Aug 3, 2020
jacobperron pushed a commit that referenced this pull request Oct 16, 2020
* rosnode_cleanup() now also removes unavailable nodes from the local node cache _caller_apis. This prevents buggy behaviour when nodes become unvailable because they exit the ros network due to wifi issues but then reconnect to it later on.

* Fixed indentation
jacobperron pushed a commit that referenced this pull request Oct 22, 2020
* rosnode_cleanup() now also removes unavailable nodes from the local node cache _caller_apis. This prevents buggy behaviour when nodes become unvailable because they exit the ros network due to wifi issues but then reconnect to it later on.

* Fixed indentation
nim65s added a commit to nim65s/robotpkg that referenced this pull request Mar 12, 2021
Because DEPEND_ABI.ros-comm.noetic?= ros-comm>=1.15

1.15.9 (2020-10-16)
-------------------
* Fix deadlock when service connection is dropped (ros/ros_comm#2074)
* Update maintainers (ros/ros_comm#2075)
* Fix case where accessing cached parameters shuts down another node (ros/ros_comm#2068)
* Fix spelling (ros/ros_comm#2066)
* Fix Lost Wake Bug in ROSOutAppender (ros/ros_comm#2033)
* Fix compatibility issue with boost 1.73 and above (ros/ros_comm#2023)
* Gracefully stop recording upon SIGTERM and SIGINT (ros/ros_comm#2038)
* Use heapq.merge instead of custom merge sort code (ros/ros_comm#2017)
* Fix handling of single quotes in command arguments on Windows (ros/ros_comm#2051)
* Clearer error message (ros/ros_comm#2035)
* Ignore underscores when parsing literal numeric values for Python 3 compatibility (ros/ros_comm#2022)
* Clear cached URI for a node that has gone offline (ros/ros_comm#2010)
* Add skip_cache parameter to rosnode_ping() (ros/ros_comm#2009)
* Install advertisetest (ros/ros_comm#2046)
* Use range instead of xrange for Python 3 compatibility (ros/ros_comm#2013)
* Fix to address CVE-2020-16124 (ros/ros_comm#2065)
* Fix XmlRpcValue::_doubleFormat being unused (ros/ros_comm#2003)

1.15.8 (2020-07-23)
-------------------
* change is_async_connected to use epoll when available (ros/ros_comm#1983)
* allow mixing latched and unlatched publishers (ros/ros_comm#1991)
* remove not existing NodeProxy from rospy __all_\_ (ros/ros_comm#2007)
* fix typo in topics.py (ros/ros_comm#1977)
* fix bad relative import (still Python 2 style) (ros/ros_comm#1973)
* improve shutdown message with duplicate node name (ros/ros_comm#1992)
* remove dependency on rostopic from rostest package (ros/ros_comm#2002)
* fix missing reload() function in Python 3 (ros/ros_comm#1968)
* add latch param to throttle (ros/ros_comm#1944)
* add const versions of XmlRpcValue converting operators (ros/ros_comm#1978)

1.15.7 (2020-05-28)
-------------------
* fix Windows build break (ros/ros_comm#1961)
* fix NameError in launch error handling (ros/ros_comm#1965)

1.15.6 (2020-05-21)
-------------------
* fix a bug that using a destroyed connection object (ros/ros_comm#1950)

1.15.5 (2020-05-15)
-------------------
* check if async socket connect is success or failure before TransportTCP::read() and TransportTCP::write() (ros/ros_comm#1954)
* fix bug that connection drop signal related funtion throw a bad_weak exception (ros/ros_comm#1940)
* multiple latched publishers per process on the same topic (ros/ros_comm#1544)
* fix negative numbers in ros statistics (ros/ros_comm#1531)
* remove extra \n in ROS_DEBUG (ros/ros_comm#1925)
* add option to repeat latched messages at the start of bag splits (ros/ros_comm#1850)
* fix bag migration failures caused by typo in connection_header assignment (ros/ros_comm#1952)
* fix brief description comments after members (ros/ros_comm#1920)
* add --sigint-timeout and --sigterm-timeout parameters (ros/ros_comm#1937)
* roslaunch-check: search dir recursively (ros/ros_comm#1914)
* sort printed nodes by namespace alphabetically (ros/ros_comm#1934)
* remove pycrypto import (not used) (ros/ros_comm#1922)
* avoid infinite recursion in rosrun tab completion when rosbash is not installed (ros/ros_comm#1948)
* fix bare pointer in topic_tools::ShapeShifter (ros/ros_comm#1722)
* clear message queue on simtime jumping back (ros/ros_comm#1518)
* use undefined dynamic_lookup on macOS (ros/ros_comm#1923)
* check if enough FDs are free, instead counting the total free FDs (ros/ros_comm#1929)

1.15.4 (2020-03-19)
-------------------
* restrict boost dependencies to components used (ros/ros_comm#1871)
* add exception for ConnectionAbortedError (ros/ros_comm#1908)
* fix mac trying to use epoll instead of kqueue (ros/ros_comm#1907)
* fix AttributeError: __exit__ (ros/ros_comm#1915, regression from 1.14.4)

1.15.3 (2020-02-28)
-------------------
* remove Boost version check since Noetic only targets platforms with 1.67+ (ros/ros_comm#1903)

1.15.2 (2020-02-25)
-------------------
* export missing Boost dependency (ros/ros_comm#1898)
* add timestamp formatting for rosconsole (ros/ros_comm#1892)

1.15.1 (2020-02-24)
-------------------
* fix missing boost dependencies (ros/ros_comm#1895)
* use setuptools instead of distutils (ros/ros_comm#1870)
* increase time limit of advertisetest/publishtest.test to reduce flakyness (ros/ros_comm#1897)

1.15.0 (2020-02-21)
-------------------
* fix dictionary changed size during iteration (ros/ros_comm#1894)
* update test to pass with old and new yaml (ros/ros_comm#1893)

Packaging changes:
- removed patch-an, as there are no more boost version checks
- updated patch-ao
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] python rosnode_ping failure after node restart
2 participants