[nav2_costmap_2d] add the std::unique_lock
before layered_costmap->isCurrent()
#3958
Makes sense! |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #3958 +/- ##
==========================================
- Coverage 90.35% 90.34% -0.01%
==========================================
Files 417 415 -2
Lines 18516 18469 -47
==========================================
- Hits 16730 16686 -44
+ Misses 1786 1783 -3
@GoesM, your PR has failed to build. Please check CI outputs and resolve issues.
Goes m patch 1
Looking at the code - there is a lock for |
Looking at the code of planner/controller server, the state of
So they belong to two different threads. However, both threads have the right to access, change, and free the same pointer. Thus, one thread can keep accessing the pointer while the other resets it at the same time, which causes the bug. For more details, we describe the conflict through both logical analysis and experimental results in later comments.
Code-logic analysis
Multiple threads run in parallel, and they all have the right to free and access the same pointer.
Why it isn't enough to avoid this fault by adding a check (
Experimental results
Core Experimental Info
From the code coverage results, our code line for the NullPtr check has not been covered. This is probably due to the limitations of the existing test set, which make it difficult to simulate scenarios where external obstacles move randomly and frequently. In our simulation, we attempted to reproduce scenarios where external obstacles change frequently and move randomly. Additionally, we inserted a log message in our 'NullPtr check' to prove that it has a practical effect.
//we try it by
bool isCurrent()
{
  RCLCPP_INFO(get_logger(), "--------------------in isCurrent()-------------------");
  //lock because no ptr-access is allowed until other ptr-free finished
  std::unique_lock<Costmap2D::mutex_t> lock(*access_);
  if (layered_costmap_ == nullptr) {
    RCLCPP_INFO(get_logger(), "[!]------nullptr caught after lock()-----------[!]");
    return false;  // to avoid nullptr accessed
  }
  return layered_costmap_->isCurrent();
}
in
[!][!][!] this log strongly shows that: global_costmap could do
Other Experimental Info
<1> To check if the problem is caused by "layered_costmap_ is a NullPtr"
Firstly, we only add the lock between
However, this was tried without a nullptr check, so we frequently get an asan_report:
!!! focus on address 0x000000000048, which is similar with a
Therefore, we executed
//we try it by:
bool isCurrent()
{
  //insert .reset() to check
  layered_costmap_.reset();
  //deliberately dereferences the now-null pointer to reproduce the crash
  return layered_costmap_->isCurrent();
}
//any other code wasn't changed
[!][!][!] this asan_report strongly shows that:
<2> To check if the problem is caused by "layered_costmap_ is a NullPtr"
At the very beginning of our ISSUE #3940, we provided an asan_report:
also focus on address 0x0000000000a1, which is similar with a
similarly, we tried add
[!][!][!] it additionally strongly shows that:
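One way to read the small faulting addresses quoted in the asan reports: when a member is read through a null object pointer, the address touched is simply the member's offset inside the object. The sketch below only illustrates that arithmetic. `FakeLayered` and its layout are assumptions made up for this example; the real layout of the nav2 class is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout, chosen only so that the flag sits at offset 0x48.
// It does not describe the real nav2 LayeredCostmap.
struct FakeLayered
{
  unsigned char other_members[0x48];  // stand-in for whatever precedes the flag
  bool current;                       // the member that isCurrent() would read
};

// Address that a load of `current` would touch if the object started at `base`.
std::uintptr_t faulting_address(std::uintptr_t base)
{
  return base + offsetof(FakeLayered, current);
}

// With a null base pointer, the faulting address is the member offset itself.
void demo()
{
  assert(faulting_address(0) == 0x48);
}
```

A fault at an address this close to zero is therefore a hint that some object pointer was null at the time of the access, which matches the nullptr hypothesis tested above.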
A SUMMARY
It should be one of the concurrency vulnerability types, named atomicity violation.
Simply put, the cause of the entire bug is:
Although
During the execution of
Thus, the release of the relevant pointer was performed, resulting in a segmentation fault (SEGV).
The following two conditions create the prerequisites for such bugs
Our suggestion for this type of bug
To fix this type of bug, our suggested method is to use
Of course, you can also adjust the permissions of each thread to achieve the repair, but we worry that this may affect the structure of the entire nav2 code framework.
By the way, it seems that this bug occurs randomly in crowded and fast-riding scenarios, so fixing it may be meaningful for nav2 ^_^
Hoping that our description is clear enough to be understood ~ : )
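The atomicity violation described in this comment can be sketched in isolation. The example below is a minimal, self-contained illustration and not the nav2 code: `FakeLayeredCostmap`, `GuardedCostmap`, and `run_race_demo` are hypothetical names, and `std::recursive_mutex` is assumed as the mutex type. The point is that the null check and the dereference sit under the same lock that the freeing thread takes, so the pointer cannot be reset between the check and the use.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the object that gets freed; not the real nav2 class.
struct FakeLayeredCostmap
{
  bool isCurrent() const { return true; }
};

// Hypothetical owner; `access_` plays the role of the mutex discussed in this PR.
class GuardedCostmap
{
public:
  GuardedCostmap() : layered_(std::make_unique<FakeLayeredCostmap>()) {}

  // Reader: lock, null check, and dereference form one indivisible step.
  bool isCurrent()
  {
    std::lock_guard<std::recursive_mutex> lock(access_);
    if (!layered_) {
      return false;  // pointer already freed by another thread
    }
    return layered_->isCurrent();
  }

  // Writer: frees the pointer under the same lock.
  void cleanup()
  {
    std::lock_guard<std::recursive_mutex> lock(access_);
    layered_.reset();
  }

private:
  std::recursive_mutex access_;
  std::unique_ptr<FakeLayeredCostmap> layered_;
};

// Runs one reader and one writer concurrently and returns how many polls
// still saw a live costmap. The exact count depends on thread scheduling.
int run_race_demo(int polls)
{
  GuardedCostmap costmap;
  std::atomic<int> seen_current{0};
  std::thread reader([&] {
    for (int i = 0; i < polls; ++i) {
      if (costmap.isCurrent()) {
        ++seen_current;
      }
    }
  });
  std::thread writer([&] { costmap.cleanup(); });
  reader.join();
  writer.join();
  assert(!costmap.isCurrent());  // after cleanup, the reader path returns false
  return seen_current.load();
}
```

Without the lock, the check `if (!layered_)` could pass and the pointer could be reset before `layered_->isCurrent()` runs, which is the interleaving this comment calls an atomicity violation.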
@@ -360,6 +363,8 @@ Costmap2DROS::on_cleanup(const rclcpp_lifecycle::State & /*state*/)

layer_publishers_.clear();

//lock because no ptr-access is allowed until this ptr-free finished
std::unique_lock<Costmap2D::mutex_t> lock(*access_);
Is there anywhere else we should be locking the layered_costmap_ as well?
Incorrect, the
I don't see how this is possible. I'm not saying that you're wrong, but that analysis is incorrect and I want to understand exactly what is happening before taking action, since this could potentially not be the right action or not fully resolve the issue.
If the server was attempted to do when either action server was deactivated and the goal was still processing: It is technically possible for
https://github.com/ros-planning/navigation2/blob/main/nav2_planner/src/planner_server.cpp#L243-L258
In which case, this would only happen on the system stopping due to a control + C or other user-requested event to stop the program, not just randomly during execution. Is this only a shutdown issue or something mid-execution?
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
Uh, so that's what you mean! Thanks a lot for the reminder. I confused two different error situations in my previous description and understanding.
Situation 1:
it caused that
Situation 2: Do
it caused that
Situation 1 occurs during the normal execution of the program.
Situation 2 occurs after the node's pre-shutdown.
We mixed up the roles of 'clear_costmap_service' and 'on_cleanup', so we mistakenly thought that our solution could solve both situations at the same time. Now it seems that our current method only solves situation 2. Thus, the error in situation 1 should still be triggered in our subsequent testing. If we have any new results, we will proactively share them with you.
This PR/ISSUE can be considered a solution for the user's pre-shutdown. If something new happens with situation 1 again, we'll create a new ISSUE or PR to share it.
If we have a situation that is mostly due to shutdown mechanics, I'd rather modify the shutdown mechanics than introduce a new mutex. For instance, reorder the shutdown mechanics so that the situation that is happening now is impossible by destroying the objects, resetting them, or adding another |
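The idea of reordering teardown so that the race cannot occur, rather than adding a mutex, can be sketched as follows. This is a minimal illustration under assumptions, not nav2 code: `FakeCostmap`, `FakeServer`, and the polling worker are hypothetical. The only point is the order inside `cleanup()`: stop and join every thread that uses the resource first, and free the resource afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical resource used by a background worker.
struct FakeCostmap
{
  bool isCurrent() const { return true; }
};

// Hypothetical server that owns the resource and one worker thread.
class FakeServer
{
public:
  ~FakeServer() { cleanup(); }

  void activate()
  {
    costmap_ = std::make_unique<FakeCostmap>();
    running_ = true;
    worker_ = std::thread([this] {
      while (running_.load()) {
        // No mutex needed here: cleanup() joins this thread before reset().
        if (costmap_->isCurrent()) {
          ++polls_;
        }
        std::this_thread::yield();
      }
    });
  }

  void cleanup()
  {
    running_ = false;          // 1. ask the user of the resource to stop
    if (worker_.joinable()) {
      worker_.join();          // 2. wait until it has really stopped
    }
    assert(!worker_.joinable());
    costmap_.reset();          // 3. only now free the resource
  }

  bool has_costmap() const { return costmap_ != nullptr; }
  long polls() const { return polls_.load(); }

private:
  std::unique_ptr<FakeCostmap> costmap_;
  std::atomic<bool> running_{false};
  std::atomic<long> polls_{0};
  std::thread worker_;
};
```

If steps 1 and 2 were moved after step 3, the worker could dereference `costmap_` after it was reset, which is the kind of crash discussed in this thread.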
We would also prefer not to add it only to aid the shutdown mechanics. However, that has been the only successful method in our local tests...
So we came up with a completely new way to fix the bug: discard the new mutex and only add a double check, as follows:
//in file `planner_server.cpp`
nav2_util::CallbackReturn
PlannerServer::on_cleanup(const rclcpp_lifecycle::State & /*state*/)
{
  RCLCPP_INFO(get_logger(), "Cleaning up");
  //double check whether action_server is running
  if (action_server_pose_->is_running()) {
    action_server_pose_->deactivate();
  }
  action_server_pose_.reset();
  //double check as well
  if (action_server_poses_->is_running()) {
    action_server_poses_->deactivate();
  }
  action_server_poses_.reset();
  plan_publisher_.reset();
  tf_.reset();
  ...
  ...
  ...
Furthermore, we obtained some new experimental info, which may help you understand why we do so
it seems that because
by the code, BUT, following its real behavior, I guess there's some other mechanics that would make
As a result, we try to solve it by adding a double check in
thus,
Hoping our experimental info is helpful for you.
As we changed it too many times, if you like the new solution, we'd like to submit it in a new PR and close this one. : )
or if you need the method to
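The "double check" pattern proposed in this comment can be isolated into a small helper. The sketch below is an illustration under assumptions, not the PlannerServer code: `FakeActionServer` and `release_server` are hypothetical names, and the extra null check on the pointer itself is an addition that the snippet in this comment does not contain.

```cpp
#include <cassert>
#include <memory>

// Hypothetical action server; counts deactivations so the effect is observable.
struct FakeActionServer
{
  explicit FakeActionServer(int & deactivations) : deactivations_(deactivations) {}
  void activate() { running_ = true; }
  void deactivate()
  {
    running_ = false;
    ++deactivations_;
  }
  bool is_running() const { return running_; }

private:
  bool running_ = false;
  int & deactivations_;
};

// Deactivate only if the server exists and is still running, then free it.
// Calling it again on an already released pointer is a no-op.
void release_server(std::unique_ptr<FakeActionServer> & server)
{
  if (server && server->is_running()) {
    server->deactivate();
  }
  server.reset();
  assert(server == nullptr);
}
```

With a helper like this, cleanup behaves the same whether or not deactivation already happened, so it does not rely on the lifecycle transitions arriving in the expected order.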
I've changed my code following your "changes requested", but it seems the CI failed in places we didn't change.
Yeah, not your fault, that's mine #3969. The linter profiles changed recently and that PR was still using the old linting profile in CI so it didn't catch before.
That does not seem possible to me. Specifically, how are you bringing down the system? Are you properly calling deactivate -> cleanup -> shutdown or just going directly there? Are you using our lifecycle manager tool?
This is incorrect. Yes, that would make sense to close this PR and open this new one -- or just make the changes here, that's up to you. We can delete the current mutex work though. |
Actually, we launch nav2 quite normally by
and we also shut it down normally by sending a
Finally, we got log files going directly here. As you can see,
we're also confused about why the costmap deactivates and cleans up first...
It strongly shows that there are some earlier shutdown mechanics that make
Thus, that's why I said:
I'd like to try it in our future work. <(^-^)>
* bug fixed * add space * Update planner_server.cpp * add space for code style * add childLifecycleNode mode to costmap_2d_ros * add childLifecycleNode mode to costmap_2d_ros * add childLifecycleNode mode to costmap_2d_ros * add childLifecycleNode mode in costmap_2d_ros * add childLifecycleNode mode in costmap_2d_ros * add childLifecycleNode mode in costmap_2d_ros * add ChildLifecycleNode mode in costmap_2d_ros * NodeOption: is_lifecycle_follower_ * NodeOption: is_lifecycle_follower_ * fit to NodeOption: is_lifecycle_follower_ * NodeOption: is_lifecycle_follower_ * fit to NodeOption: is_lifecycle_follower * fit to NodeOption: is_lifecycle_follower * fit reorder Werror * fix wrong use of is_lifecycle_follower * remove blank line * NodeOption: is_lifecycle_follower_ * NodeOption: is_lifecycle_follower_ * Add files via upload * NodeOption: is_lifecycle_follower_ * NodeOption:is_lifecycle_follower_ * NodeOption:is_lifecycle_follower * NodeOption:is_lifecycle_follower * NodeOption:is_lifecycle_follower * change default * add NodeOption for costmap_2d_ros * add node options for costmap2dros as an independent node * code style reformat * fit to NodeOption of Costmap2DROS * fit to NodeOption of Costmap2DROS * fit to NodeOption of Costmap2DROS * Update nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp Co-authored-by: Steve Macenski <stevenmacenski@gmail.com> * Update nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp Co-authored-by: Steve Macenski <stevenmacenski@gmail.com> * Update nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp Co-authored-by: Steve Macenski <stevenmacenski@gmail.com> * changes * comment changes * change get_parameter into =false * comment modification * missing line * Update nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp * Update nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp * delete last line * change lifecycle_test fit to NodeOption --------- Co-authored-by: GoesM <GoesM@buaa.edu.cn> Co-authored-by: 
Steve Macenski <stevenmacenski@gmail.com>
Basic Info
Description of contribution in a few bullet points
Following ISSUE #3940:
BUG Description
In complex scenarios (for example, with frequently moving obstacles), significant changes in the external environment reported by sensor messages lead to a reset of the costmap, which frees the plugin and filter pointers.
At the same time, the goal action server may access the plugin and filter pointers in isCurrent(), resulting in a crash of the program.
Solution
The only place where the plugin and filter pointers can be freed is in file nav2_costmap_2d/src/clear_costmap_service.cpp, in the function as following:
costmap_.resetLayers() can free the plugin and filter pointers.
There is already a std::unique_lock<> there, which prevents some other functions from running while resetLayers() is running, but it does not prevent layered_costmap_->isCurrent() from running during resetLayers().
So, in file nav2_costmap_2d/include/costmap_2d_ros.hpp, we insert the same lock in the function as following:
Description of documentation updates required from your changes
Future work that may be required in bullet points
For Maintainers: