Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gazebo9 - Race condition between EventT::Signal and EventT::Connect leading to segfault during world update #2874

Open
ojasjoshi opened this issue Nov 6, 2020 · 1 comment
Labels
bug Something isn't working common

Comments

@ojasjoshi
Copy link

ojasjoshi commented Nov 6, 2020

Stack trace:

#0  std::__atomic_base<bool>::load (__m=std::memory_order_seq_cst, this=0x0) at /usr/include/c++/5/bits/atomic_base.h:396
#1  std::atomic<bool>::operator bool (this=0x0) at /usr/include/c++/5/atomic:81
#2  gazebo::event::EventT<void (gazebo::common::UpdateInfo const&)>::Signal<gazebo::common::UpdateInfo>(gazebo::common::UpdateInfo const&) (
    this=0x7f60bc054940 <gazebo::event::Events::worldUpdateBegin>, _p=...) at /gazebo/gazebo/common/Event.hh:384
#3  0x00007f60ba9aee59 in gazebo::event::EventT<void (gazebo::common::UpdateInfo const&)>::operator()<gazebo::common::UpdateInfo>(gazebo::common::UpdateInfo const&) (_p=..., this=<optimized out>) at /gazebo/gazebo/common/Event.hh:221
#4  gazebo::physics::World::Update (this=this@entry=0x274a180) at /gazebo/gazebo/physics/World.cc:690
#5  0x00007f60ba9be4df in gazebo::physics::World::Step (this=this@entry=0x274a180) at /gazebo/gazebo/physics/World.cc:621
#6  0x00007f60ba9be955 in gazebo::physics::World::RunLoop (this=0x274a180) at /gazebo/gazebo/physics/World.cc:469
#7  0x00007f60b844c5d5 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#8  0x00007f60bace36ba in start_thread (arg=0x7f5fb4bf4700) at pthread_create.c:333
#9  0x00007f60bb2ed41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

A call to EventT::Connect from any subscriber creates a new EventConnection(true, _subscriber) between the event and the subscriber. A call to Event::Signal during worldUpdateBeginLoop then executes callbacks to all the EventConnection's added to this event.

A race condition occurs between EventT::Signal and EventT::Connect when the new EventConnection is not initialized within EventT::Connect but can be referenced by EventT::Signal via this->connections map, leading to a segfault.

Reference from EventT::Connect

this->connections[index].reset(new EventConnection(true, _subscriber));

A potential fix here would be to add a mutex lock guard within EventT::Connect to make it thread-safe. Such a mutex (std::lock_guard<std::mutex> lock(this->mutex)) already exists for EventT::CleanUp but is missing in all other EventT methods. The performance lag occuring from mutex locking EventT::Connect should not be significant considering that a call to Connect occurs at-most once for every subscriber and also considering the design thought behind adding a mutex lock to CleanUp (a call to CleanUp is made during every Signal loop, which should be significantly more than calls to Connect).

@ojasjoshi ojasjoshi changed the title Gazebo9 - Race condition between EventT::Signal and EventT::Connect leading to segfault Gazebo9 - Race condition between EventT::Signal and EventT::Connect leading to segfault during world update Nov 6, 2020
@chapulina
Copy link
Contributor

A potential fix here would be to add a mutex lock guard within EventT::Connect to make it thread-safe.

Thank you for the ticket. I think the suggestion of adding a mutex to protect the connections map sounds reasonable.

The performance lag occuring from mutex locking EventT::Connect should not be significant

If we're very concerned about this, we could use the profiler recently added to compare some common scenario before and after adding the mutex.

@chapulina chapulina added bug Something isn't working common labels Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working common
Projects
None yet
Development

No branches or pull requests

2 participants