Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid deadlock in ConnectionManager::Stop #2950

Conversation

ivanpauno
Copy link
Contributor

@ivanpauno ivanpauno commented Mar 18, 2021

Similarly to #2949, it fixes another condition where gzserver hangs when shutting down.

The main thread was interrepted by sigint and is waiting for ConnectionManager::Stop() to finish:

Thread 1 (Thread 0x7f8c8c45a3c0 (LWP 12947)):
#0  0x00007f8c9edc43bf in __GI___clock_nanosleep (clock_id=clock_id@entry=1, flags=flags@entry=0, req=req@entry=0x7ffe7834a060, rem=rem@entry=0x7ffe7834a070) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1  0x00007f8c9f3dfadc in gazebo::common::Time::Sleep(gazebo::common::Time const&) (_time=...) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/Time.cc:441
#2  0x00007f8c9f3dfc5c in gazebo::common::Time::MSleep(unsigned int) (_ms=_ms@entry=100) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/Time.cc:459
#3  0x00007f8c9e456a2d in gazebo::transport::ConnectionManager::Stop() (this=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/transport/ConnectionManager.cc:248
#4  gazebo::transport::ConnectionManager::Stop() (this=0x7f8c9fb44e60 <SingletonT<gazebo::transport::ConnectionManager>::GetInstance()::t>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/transport/ConnectionManager.cc:242
#5  0x00007f8c9faeb2dd in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/9/bits/std_function.h:683
#6  gazebo::event::EventT<void ()>::Signal() (this=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/Event.hh:283
#7  gazebo::event::EventT<void ()>::operator()() (this=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/Event.hh:123
#8  gazebo::Server::SigInt(int) () at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/Server.cc:571
#9  0x00007f8c9ed2a210 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007f8c9f024860 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () at /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f8c9f3c2084 in std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (__s=0x7f8c9f3f598a " ", __out=...) at /usr/include/c++/9/bits/char_traits.h:335
#12 ignition::math::v6::operator<<(std::ostream&, ignition::math::v6::Vector3<double> const&) (_pt=..., _out=...) at /usr/include/ignition/math6/ignition/math/Vector3.hh:726
#13 ignition::math::v6::operator<<(std::ostream&, ignition::math::v6::Pose3<double> const&) (_pose=..., _out=...) at /usr/include/ignition/math6/ignition/math/Pose3.hh:466
#14 sdf::v9::Param::Set<ignition::math::v6::Pose3<double> >(ignition::math::v6::Pose3<double> const&) (this=0x560b70e76000, _value=...) at /usr/include/sdformat-9.5/sdf/Param.hh:277
--Type <RET> for more, q to quit, c to continue without paging--
#15 0x00007f8c9eac6068 in sdf::v9::Element::Set<ignition::math::v6::Pose3<double> >(ignition::math::v6::Pose3<double> const&) (_value=..., this=0x560b70e80230) at /usr/include/c++/9/ext/atomicity.h:82
#16 gazebo::rendering::Visual::SetPosition(ignition::math::v6::Vector3<double> const&) (this=0x560b70e716d0, _pos=...) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/rendering/Visual.cc:1903
#17 0x00007f8c9eac62b9 in gazebo::rendering::Visual::SetPose(ignition::math::v6::Pose3<double> const&) (this=0x560b70e716d0, _pose=...) at /usr/include/ignition/math6/ignition/math/Pose3.hh:361
#18 0x00007f8c9ea94fc8 in gazebo::rendering::Scene::PreRender() (this=0x560b6e3fef40) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/rendering/Scene.cc:2077
#19 0x00007f8c9ec7a3f9 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/9/bits/std_function.h:683
#20 gazebo::event::EventT<void ()>::Signal() (this=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/Event.hh:283
#21 gazebo::event::EventT<void ()>::operator()() (this=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/Event.hh:123
#22 gazebo::sensors::SensorManager::ImageSensorContainer::Update(bool) (this=0x560b6ec96070, _force=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/sensors/SensorManager.cc:980
#23 0x00007f8c9ec7fe5b in gazebo::sensors::SensorManager::Update(bool) (this=0x7f8c9fb45c80 <SingletonT<gazebo::sensors::SensorManager>::GetInstance()::t>, this@entry=0x7f8c9ece0e00 <SingletonT<gazebo::sensors::SensorManager>::GetInstance()::t>, _force=<optimized out>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/sensors/SensorManager.cc:370
#24 0x00007f8c9ec717ab in gazebo::sensors::run_once(bool) (_force=_force@entry=false) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/common/SingletonT.hh:48
#25 0x00007f8c9faecd0f in gazebo::Server::Run() (this=0x560b6e3d8a90) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/Server.cc:666

That will never happen, because gazebo::rendering::Scene::PreRender() was interrupted while taking this mutex and the connection manager thread got stuck in the same mutex:

Thread 22 (Thread 0x7f8c37fff700 (LWP 13042)):                                                                                                                                                              
#0  __lll_lock_wait (futex=futex@entry=0x560b6ed0c298, private=0) at lowlevellock.c:52                                                                                                                      
#1  0x00007f8c9e8c6131 in __GI___pthread_mutex_lock (mutex=0x560b6ed0c298) at ../nptl/pthread_mutex_lock.c:115                                                                                              
#2  0x00007f8c9ea8ea6f in __gthread_mutex_lock (__mutex=0x560b6ed0c298) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749                                                                      
#3  __gthread_recursive_mutex_lock (__mutex=0x560b6ed0c298) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:811                                                                                  
#4  std::recursive_mutex::lock() (this=0x560b6ed0c298) at /usr/include/c++/9/mutex:106                                                                                                                      
#5  std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (__m=..., this=<synthetic pointer>) at /usr/include/c++/9/bits/std_mutex.h:159                                                 
#6  gazebo::rendering::Scene::OnPoseMsg(boost::shared_ptr<gazebo::msgs::PosesStamped const> const&) (this=0x560b6e3fef40, _msg=...) at /home/ivanpauno/ros2_ws/other_ws/gazebo/gazebo/rendering/Scene.cc:2870
#7  0x00007f8c9ea9bff2 in boost::function1<void, boost::shared_ptr<gazebo::msgs::PosesStamped const> const&>::operator()(boost::shared_ptr<gazebo::msgs::PosesStamped const> const&) const (a0=..., this=0x5
60b6f124ba0) at /usr/include/boost/function/function_template.hpp:677                                                                                                                                       
#8  gazebo::transport::CallbackHelperT<gazebo::msgs::PosesStamped>::HandleMessage(boost::shared_ptr<google::protobuf::Message>) (this=0x560b6f124b60, _newMsg=...) at /home/ivanpauno/ros2_ws/other_ws/gazeb
o/gazebo/transport/CallbackHelper.hh:159                                                                                                                                                                    
#9  0x00007f8c9e464408 in gazebo::transport::Node::ProcessIncoming() (this=0x560b6efbefa0) at /usr/include/c++/9/bits/atomic_base.h:539                                                                     
#10 0x00007f8c9e47079f in gazebo::transport::TopicManager::ProcessNodes(bool) (this=0x7f8c9fb44d20 <SingletonT<gazebo::transport::TopicManager>::GetInstance()::t>, _onlyOut=_onlyOut@entry=false) at /usr/i
nclude/boost/smart_ptr/shared_ptr.hpp:732                                                                                                                                                                   #11 0x00007f8c9e4588d2 in gazebo::transport::ConnectionManager::RunUpdate() (this=0x7f8c9fb44e60 <SingletonT<gazebo::transport::ConnectionManager>::GetInstance()::t>) at /home/ivanpauno/ros2_ws/other_ws/g
azebo/gazebo/common/SingletonT.hh:36                                                                                                                                                                        #12 0x00007f8c9e458b88 in gazebo::transport::ConnectionManager::Run() (this=0x7f8c9fb44e60 <SingletonT<gazebo::transport::ConnectionManager>::GetInstance()::t>) at /home/ivanpauno/ros2_ws/other_ws/gazebo/
gazebo/transport/ConnectionManager.cc:314                                                                                                                                                                   
#13 0x00007f8c9d60443b in  () at /lib/x86_64-linux-gnu/libboost_thread.so.1.71.0                                                                                                                            
#14 0x00007f8c9e8c3609 in start_thread (arg=<optimized out>) at pthread_create.c:477                                                                                                                        
#15 0x00007f8c9ee06293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

i.e. this while loop:

https://github.com/osrf/gazebo/blob/2df3bfb039c1f5616122a0452ece99a8f3395e17/gazebo/transport/ConnectionManager.cc#L308-L316

will get stuck in L310 and never reach L316.


The solution is that Stop() will not wait until the background thread exits the loops, but just signal its ending.
I also deleted the condition variable notify_all() call, as that's not signal safe.
There's only a timed wait, so notifying or not shouldn't matter.

Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Copy link

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this together with #2949 and it haven't seen any lingering gzserver process after shutdown.

@ivanpauno
Copy link
Contributor Author

@jacobperron if you have write permissions, please go ahead and merge.

(the PR checkers failures doesn't seem related to me, but IDK)

@scpeters
Copy link
Member

scpeters commented Apr 7, 2021

I was having trouble reproducing this reliably, but with the following patch it happen every time (lol)

diff --git a/gazebo/rendering/Scene.cc b/gazebo/rendering/Scene.cc
index f6900d694f..3e3e15f472 100644
--- a/gazebo/rendering/Scene.cc
+++ b/gazebo/rendering/Scene.cc
@@ -2056,6 +2056,7 @@ void Scene::PreRender()
 
   {
     std::lock_guard<std::recursive_mutex> lock(this->dataPtr->poseMsgMutex);
+    common::Time::MSleep(100);
 
     // Process all the model messages last. Remove pose message from the list
     // only when a corresponding visual exits. We may receive pose updates

@scpeters
Copy link
Member

scpeters commented Apr 7, 2021

I merged this branch with gazebo11 so we can see one more round of CI results

@scpeters scpeters merged commit 6595e68 into gazebosim:gazebo11 Apr 7, 2021
scpeters added a commit to scpeters/gazebo that referenced this pull request Apr 14, 2021
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: Steve Peters <scpeters@openrobotics.org>
scpeters added a commit to scpeters/gazebo that referenced this pull request Apr 16, 2021
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: Steve Peters <scpeters@openrobotics.org>
scpeters added a commit that referenced this pull request Apr 21, 2021
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>

Co-authored-by: Steve Peters <scpeters@openrobotics.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants