Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add execution thread mutex. #1709

Merged
merged 4 commits into from
Nov 8, 2019
Merged

Conversation

livanov93
Copy link
Contributor

@livanov93 livanov93 commented Nov 4, 2019

Description

#1481
My project was in huge need of canceling trajectory execution in quite often manner, over direct message via "stop" event on "/trajectory_execution_event" topic, or via move group interface stop() method. I added one more mutex for protecting execution_thread_ variable and it worked. Definitely it is not the optimal and the most elegant solution, but is quite effective. Long simulation period confirmed it has no memory leaks.

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • Extend the tutorials / documentation reference
  • Document API changes relevant to the user in the MIGRATION.md notes
  • Create tests, which fail without this PR reference
  • Include a screenshot if changing a GUI
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers

@welcome
Copy link

welcome bot commented Nov 4, 2019

Thanks for helping in improving MoveIt

@mamoll mamoll self-requested a review November 4, 2019 15:26
Copy link
Contributor

@mamoll mamoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. 👍

@v4hn
Copy link
Contributor

v4hn commented Nov 4, 2019

Seems like a workaround to me. If we merge it, it should still be cleaned up soon.
The whole locking in this class is not well-structured...

Looking into it right now.

Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this contribution - I didn't find time to fix #1481 yet.
There are some minor points to improve from my point of view.
@v4hn: Why do you consider this solution only a workaround?

@@ -1182,8 +1187,13 @@ void TrajectoryExecutionManager::stopExecution(bool auto_clear)
else if (execution_thread_) // just in case we have some thread waiting to be joined from some point in the past, we
// join it now
{
execution_thread_->join();
execution_thread_.reset();
execution_thread_mutex_.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock comes too late here: a few lines before, execution_thread_ is already accessed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove this else if with just else statement. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an instance of double-checked locking.
It might not be necessary here, but just removing it because the read is outside the mutex is no good argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could not make any harm, with else statement and additional check, the same thing is achieved. Isn't it? Fix me if I am wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look up the concept.
Double-checked locking is used when the time used to lock the mutex is relevant and the mutex is usually not needed (the latter clearly being the case here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to keep the double-check locking here, one would need to add another if (execution_thread_) to verify that the pointer is still valid after acquiring the execution_thread_mutex_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhaschke Check if this commit is ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@livanov93: I pushed a cleanup commit, to restore the double-checked locking.

@@ -1170,8 +1170,13 @@ void TrajectoryExecutionManager::stopExecution(bool auto_clear)
ROS_INFO_NAMED(name_, "Stopped trajectory execution.");

// wait for the execution thread to finish
execution_thread_->join();
execution_thread_.reset();
execution_thread_mutex_.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a scoped_lock to ensure automatic unlocking of the mutex when leaving the context instead of manual locking and unlocking.

@v4hn
Copy link
Contributor

v4hn commented Nov 5, 2019

Why do you consider this solution only a workaround?

Because it's not actually well defined what the existing mutex protects and adding a new one without making sure it's not a bug in the existing logging does not make things any more consistent.
I'm working my way through the class right now, also improving some documentation and updating to C++11.

@livanov93
Copy link
Contributor Author

livanov93 commented Nov 5, 2019

Why do you consider this solution only a workaround?

Because it's not actually well defined what the existing mutex protects and adding a new one without making sure it's not a bug in the existing logging does not make things any more consistent.
I'm working my way through the class right now, also improving some documentation and updating to C++11.

Basically, this mutex was implemented to avoid calling execution_thread_->join() after the other thread calls execution_thread_.reset(). That was the problem at the beginning. Whole class refactoring was not goal for this PR.

@v4hn
Copy link
Contributor

v4hn commented Nov 5, 2019

That was the problem at the beginning. Whole class refactoring was not goal for this PR.

I know and thank you for providing a fix!
If two other maintainers agree, this can be merged and I might change it again in a bit, if needed.

The problem is that MoveIt (including the TrajectoryExecutionManager) is full of such local fixes and this makes maintainance much harder over time.

@livanov93
Copy link
Contributor Author

That was the problem at the beginning. Whole class refactoring was not goal for this PR.

I know and thank you for providing a fix!
If two other maintainers agree, this can be merged and I might change it again in a bit, if needed.

The problem is that MoveIt (including the TrajectoryExecutionManager) is full of such local fixes and this makes maintainance much harder over time.

I totally understand you and appreciate all the contributions you have done. People usually fix what they need and they move forward. This, I imagine, is a huge problem for official maintainers.

@rhaschke
Copy link
Contributor

rhaschke commented Nov 5, 2019

The problem is that MoveIt (including the TrajectoryExecutionManager) is full of such local fixes and this makes maintainance much harder over time.

@v4hn, I fully understand your reasoning. If you have found a better solution, please file a PR.

@rhaschke
Copy link
Contributor

rhaschke commented Nov 6, 2019

@livanov93, could you please have a look (and validate) the alternative fix filed in #1712?
This alternative solution just reuses the existing execution_state_mutex_.
Also, it would be great if you could provide a unit test to ensure that the correct behavior is retained also in the future.

@livanov93
Copy link
Contributor Author

@livanov93, could you please have a look (and validate) the alternative fix filed in #1712?
This alternative solution just reuses the existing execution_state_mutex_.
Also, it would be great if you could provide a unit test to ensure that the correct behavior is retained also in the future.

I will try to make the test soon for fix comparation, generally #1712 does not do the job for me, it stops after first preemption.

@v4hn
Copy link
Contributor

v4hn commented Nov 7, 2019

Right now I would propose to merge this fix into kinetic-devel, melodic-devel and master to resolve the issue, if @rhaschke and @livanov93 agree.
We probably want it for kinetic and melodic anyway as a minimal fix. I will file my own changes (which are not at all related to this problem until now) later on for master only.

Tests for such issues are still very welcome, of course!

@livanov93
Copy link
Contributor Author

@rhaschke I agree with merging proposal by @v4hn so at least crashing is resolved for now. Tests will be provided in additional PR.

@rhaschke rhaschke merged commit d3b2db3 into moveit:master Nov 8, 2019
@welcome
Copy link

welcome bot commented Nov 8, 2019

Congrats on getting your first MoveIt pull request merged and improving open source robotics!

rhaschke pushed a commit to ubi-agni/moveit that referenced this pull request Nov 8, 2019
Fix race condition accessing execution_thread_ by adding a new mutex.
@livanov93 livanov93 deleted the fix-preempt-crash branch November 9, 2020 19:02
sjahr pushed a commit to sjahr/moveit that referenced this pull request Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants