Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common_objects: getSharedRobotModelLoader fix deadlock #734

Merged
merged 3 commits into from
Oct 13, 2021

Conversation

JafarAbdi
Copy link
Contributor

Description

Fix a deadlock introduced by my previous PR by locking the mutex inside getSharedRobotModelLoader and calling it from getSharedRobotModel that locks the same mutex, other solutions could be by providing a different mutex for robot_model_loaders_ or adding a getSharedRobotModelLoaderLockless function that will be called inside getSharedRobotModel

I did also replace the boost::* namespace with std::*

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • Extend the tutorials / documentation reference
  • Document API changes relevant to the user in the MIGRATION.md notes
  • Create tests, which fail without this PR reference
  • Include a screenshot if changing a GUI
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers

@JafarAbdi
Copy link
Contributor Author

I'll do a new patch release for both galactic and rolling after this get merged

@codecov
Copy link

codecov bot commented Oct 11, 2021

Codecov Report

Merging #734 (a37132f) into main (2c60aa0) will not change coverage.
The diff coverage is n/a.

❗ Current head a37132f differs from pull request most recent head 0882558. Consider uploading reports for the commit 0882558 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##             main     #734   +/-   ##
=======================================
  Coverage   54.22%   54.22%           
=======================================
  Files         192      192           
  Lines       20230    20230           
=======================================
  Hits        10968    10968           
  Misses       9262     9262           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c60aa0...0882558. Read the comment docs.

@vatanaksoytezer vatanaksoytezer added blocks release This PR should be merged before the next release bug Something isn't working labels Oct 12, 2021
Copy link
Contributor

@vatanaksoytezer vatanaksoytezer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR MGI demos were broken, this fixes them nicely and gets them working again. I think we should merge this as soon as possible and make a release.

@JafarAbdi JafarAbdi merged commit bbcf9a0 into moveit:main Oct 13, 2021
@JafarAbdi JafarAbdi deleted the pr-fix_deadlock branch October 13, 2021 18:02
JafarAbdi added a commit to JafarAbdi/moveit2 that referenced this pull request Nov 4, 2021
JafarAbdi added a commit to JafarAbdi/moveit2 that referenced this pull request Nov 4, 2021
JafarAbdi added a commit that referenced this pull request Nov 8, 2021
* Add getSharedRobotModelLoader to fix race condition when having multiple displays for the same node (#525)

* common_objects: getSharedRobotModelLoader fix deadlock (#734)
@shuhaowu
Copy link

This wasn't backported to Galactic, right? I'll either have to build MoveIt manually or upgrade to rolling?

@vatanaksoytezer
Copy link
Contributor

Hi @shuhaowu,

Galactic has been branched of fairly recently, thus this exists in Galactic. Let us know if you have any troubles.

@shuhaowu
Copy link

Thanks for the quick response.

I created my own move group interface demo code based on the MoveGroupInterface C++ tutorial on the moveit2_tutorial and the code hangs on constructor for MoveGroupInterface. I used gdb to figure out that it's stuck at getSharedRobotModelLoader trying to acquire a lock. I'm using the ros-galactic-moveit packages as opposed to a source build.

It is possible that I'm doing something wrong with my modifications to that demo code, although I only really changed the group name and loaded my own robot descriptions (i.e. I'm using my own robot, instead of panda). If that is the case, I haven't figured out why. I'm going to try moveit from a source build to see if that makes a difference. If that doesn't help, maybe I can run the moveit2_tutorial code directly to see why my setup fails.

I'll try to report back at some point.

@shuhaowu
Copy link

Actually MoveIt2 built faster than I thought it would. I just tried the MoveGroupInterface code with the latest main commit for MoveIt2 and I see messages like [moveit_rdf_loader.rdf_loader]: Loaded robot model in 0.00444073 seconds. I suspect it is working now, but will have to confirm maybe tomorrow. Thanks again.

@tylerjw
Copy link
Member

tylerjw commented Jan 19, 2022

Nice. I plan on blooming moveit this week for galactic and rolling which means that at the next sync you should be able to use it without building from source.

@vatanaksoytezer
Copy link
Contributor

@shuhaowu glad your problem has been resolved and build was fast! We also publish source based docker images that is pre-built for you [here](https://hub.docker.com/r/moveit/moveit2/tags.

@shuhaowu
Copy link

shuhaowu commented Jan 19, 2022

Thanks for the response guys. My problem is not entirely resolve, as another issue has popped up. The latest build of MoveIt2 appears to segfault in the move_group for me:

[move_group-1] [FATAL] [1642629122.702513168] [moveit_ros.trajectory_execution_manager]: Parameter '~moveit_controller_manager' not specified. This is needed to identify the plugin to use for interacting with controllers. No paths can be executed.
[move_group-1] [ERROR] [1642629122.702525161] [moveit_ros.trajectory_execution_manager]: Failed to reload controllers: `controller_manager_` does not exist.
[move_group-1] Stack trace (most recent call last):
[move_group-1] #9    Object "", at 0xffffffffffffffff, in 
[move_group-1] #8    Object "/workspaces/ros2-ws/build/moveit_ros_move_group/move_group", at 0x55bf90a2df3d, in _start
[move_group-1] #7    Object "/usr/lib/x86_64-linux-gnu/libc-2.31.so", at 0x7f0be9f6a0b2, in __libc_start_main
[move_group-1] #6    Object "/workspaces/ros2-ws/build/moveit_ros_move_group/move_group", at 0x55bf90a2d4eb, in main
[move_group-1] #5    Object "/workspaces/ros2-ws/install/moveit_ros_planning/lib/libmoveit_cpp.so.2.3.2", at 0x7f0bea89c1f9, in moveit_cpp::MoveItCpp::MoveItCpp(std::shared_ptr<rclcpp::Node> const&, moveit_cpp::MoveItCpp::Options const&)
[move_group-1] #4    Object "/workspaces/ros2-ws/install/moveit_ros_planning/lib/libmoveit_trajectory_execution_manager.so.2.3.2", at 0x7f0be9bcddc9, in trajectory_execution_manager::TrajectoryExecutionManager::TrajectoryExecutionManager(std::shared_ptr<rclcpp::Node> const&, std::shared_ptr<moveit::core::RobotModel const> const&, std::shared_ptr<planning_scene_monitor::CurrentStateMonitor> const&)
[move_group-1] #3    Object "/workspaces/ros2-ws/install/moveit_ros_planning/lib/libmoveit_trajectory_execution_manager.so.2.3.2", at 0x7f0be9bccad2, in trajectory_execution_manager::TrajectoryExecutionManager::initialize()
[move_group-1] #2    Object "/workspaces/ros2-ws/install/moveit_ros_planning/lib/libmoveit_planning_scene_monitor.so.2.3.2", at 0x7f0bea7aba39, in bool rclcpp::Node::get_parameter<double>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double&) const
[move_group-1] #1    Object "/workspaces/ros2-ws/install/moveit_ros_planning/lib/libmoveit_planning_scene_monitor.so.2.3.2", at 0x7f0bea7a339f, in rclcpp::extend_name_with_sub_namespace(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
[move_group-1] #0    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28", at 0x7f0bea295694, in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const
[move_group-1] Segmentation fault (Address not mapped to object [0x348])

Not sure if this is a known issue. Perhaps I'm building it wrong somehow. Will have to report back. If it is indeed a real issue, I'll see if I can get a coredump and create a separate issue.

@JafarAbdi
Copy link
Contributor Author

Do you have moveit installed from debians ? can you try removing it sudo apt remove --purge ros-galactic-moveit* and do a clean build ?

@shuhaowu
Copy link

shuhaowu commented Jan 19, 2022

Nope. I reset my container without moveit installed to build. The only moveit related packages are the description packages installed via rosdep:

[/workspaces/ros2-ws]$ apt list --installed | grep moveit

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

ros-galactic-moveit-resources-fanuc-description/focal,now 2.0.3-1focal.20210920.233233 amd64 [installed]
ros-galactic-moveit-resources-fanuc-moveit-config/focal,now 2.0.3-1focal.20211223.010712 amd64 [installed]
ros-galactic-moveit-resources-panda-description/focal,now 2.0.3-1focal.20210920.233254 amd64 [installed]
ros-galactic-moveit-resources-panda-moveit-config/focal,now 2.0.3-1focal.20211223.010747 amd64 [installed]
ros-galactic-moveit-resources-pr2-description/focal,now 2.0.3-1focal.20210920.233310 amd64 [installed]

Don't think they cause a problem.

That said, I've added -g -O0 to the compilation option and got some more detailed information. It seems like the controller_mgr_node_ member variable is a nullptr on this line. I suspect it is because I don't have a controller setup in my move_group.launch?

image

There's the following in the logs:

[move_group-1] [FATAL] [1642629122.702513168] [moveit_ros.trajectory_execution_manager]: Parameter '~moveit_controller_manager' not specified. This is needed to identify the plugin to use for interacting with controllers. No paths can be executed.
[move_group-1] [ERROR] [1642629122.702525161] [moveit_ros.trajectory_execution_manager]: Failed to reload controllers: `controller_manager_` does not exist.

I thought controller manager is not strictly necessary if you want to use it to only plan? it does say FATAL above so maybe not? That said, if it is FATAL, presumably it should crash when that log is emitted, as opposed to continue and eventually reach a segfault?

@shuhaowu
Copy link

shuhaowu commented Jan 19, 2022

Seems like an upstream problem with rclcpp. See this example program and this comment upstream: See this comment upstream: ros2/rclcpp#450 (comment). I'll file a bug there I guess...

Meanwhile, I guess I'll have to configure moveit_controller_manager? I'll take a look.

#include "rclcpp/rclcpp.hpp"

class TestNode : public rclcpp::Node {
 public:
  TestNode() : Node("test_node") {
    RCLCPP_FATAL(get_logger(), "fatal error");
    RCLCPP_INFO(get_logger(), "shouldn't see this line");
  }
};

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<TestNode>());
  rclcpp::shutdown();
  return 0;
}

Running this:

[FATAL] [1642633730.613062155] [test_node]: fatal error
[INFO] [1642633730.613141427] [test_node]: shouldn't see this line
^C[INFO] [1642633740.323608940] [rclcpp]: signal_handler(signal_value=2)

I filed an issue upstream: ros2/rclcpp#1869

@shuhaowu
Copy link

To loop around back to this, it looks like FATAL should not crash the program. This means there's a minor moveit bug here. I can create an issue tracking this in a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocks release This PR should be merged before the next release bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants