terminate called after throwing an instance of 'std::system_error' what(): Resource deadlock avoided #1519
Comments
Are you really using Eloquent on Ubuntu 20.04? That isn't a supported combination (and in point of fact, Eloquent is end-of-life now anyway). Can you try this on Foxy on Ubuntu 20.04 and see if you are still running into problems? |
I am interested in this issue. BTW:
Did you change the About |
I’ll write a sample program and upload soon. |
I didn't change the source code. Finally, I found the way to reproduce the error. If I implement the callback of subscription in the class TestClass
{
public:
static std::shared_ptr<TestClass> getInstance(){
static std::shared_ptr<TestClass> instance;
if(!instance)
instance.reset(new TestClass());
return instance;
}
bool init(rclcpp::Node::SharedPtr node){
node_ = node;
rclcpp::Parameter param;
node_->get_parameter("topic1", param);
std::string topic = param.get_value<std::string>();
RCLCPP_INFO_STREAM(node_->get_logger(), "read topic1: " <<topic);
module1 = std::make_shared<Module1>(node_);
//NOTE: uncomment this line to reproduce the bug
// auto fcn = std::bind(&Module1::stringCallback, module1, std::placeholders::_1);
// run ok
auto fcn = std::bind(&TestClass::stringCallback, this, std::placeholders::_1);
stringSub_ = node_->create_subscription<std_msgs::msg::String>(
"/string",
1,
fcn);
int16Pub_ = node_->create_publisher<std_msgs::msg::Int16>(
"/int",1);
auto pubCB = [this](){
std_msgs::msg::Int16 msg;
msg.set__data(1);
int16Pub_->publish(msg);
};
pubTimer_ = node_->create_wall_timer(10ms, pubCB);
return true;
}
void stringCallback(std_msgs::msg::String::SharedPtr msg){
// RCLCPP_INFO(node_->get_logger(), "received message");
}
protected:
TestClass();
private:
rclcpp::Node::SharedPtr node_;
rclcpp::Subscription<std_msgs::msg::String>::SharedPtr stringSub_;
rclcpp::Publisher<std_msgs::msg::Int16>::SharedPtr int16Pub_;
rclcpp::TimerBase::SharedPtr pubTimer_;
std::shared_ptr<Module1> module1;
}; The project is uploaded in the attached zip file. |
The problem also has something to do with the singleton implementation of class TestClass
{
public:
/* static std::shared_ptr<TestClass> getInstance(){
static std::shared_ptr<TestClass> instance;
if(!instance)
instance.reset(new TestClass());
return instance;
}*/
bool init(rclcpp::Node::SharedPtr node){
node_ = node;
rclcpp::Parameter param;
node_->get_parameter("topic1", param);
std::string topic = param.get_value<std::string>();
RCLCPP_INFO_STREAM(node_->get_logger(), "read topic1: " <<topic);
module1 = std::make_shared<Module1>(node_);
//NOTE: uncomment this line to reproduce the bug
auto fcn = std::bind(&Module1::stringCallback, module1, std::placeholders::_1);
// run ok
// auto fcn = std::bind(&TestClass::stringCallback, this, std::placeholders::_1);
stringSub_ = node_->create_subscription<std_msgs::msg::String>(
"/string",
1,
fcn);
int16Pub_ = node_->create_publisher<std_msgs::msg::Int16>(
"/int",1);
auto pubCB = [this](){
std_msgs::msg::Int16 msg;
msg.set__data(1);
int16Pub_->publish(msg);
};
pubTimer_ = node_->create_wall_timer(10ms, pubCB);
return true;
}
void stringCallback(std_msgs::msg::String::SharedPtr msg){
// RCLCPP_INFO(node_->get_logger(), "received message");
}
public:
TestClass();
private:
rclcpp::Node::SharedPtr node_;
rclcpp::Subscription<std_msgs::msg::String>::SharedPtr stringSub_;
rclcpp::Publisher<std_msgs::msg::Int16>::SharedPtr int16Pub_;
rclcpp::TimerBase::SharedPtr pubTimer_;
std::shared_ptr<Module1> module1;
}; And SetUp() is: void SetUp(){
auto context = contexts::default_context::get_global_default_context();
auto options = NodeOptions()
.context(context)
// BUG: if set true, test would crash: std::system_error Resource deadlock avoided!
.use_intra_process_comms(true)
.automatically_declare_parameters_from_overrides(true);
node_ = std::make_shared<Node>("test_bug", options);
// NOTE:uncomment this line to reproduce bug
// TestClass::getInstance()->init(node_);
testClass_ = std::make_shared<TestClass>();
testClass_->init(node_);
pubNode_ = std::make_shared<Node>("pub_node", options);
stringPub_ = pubNode_->create_publisher<std_msgs::msg::String>("/string", 1);
auto pubTimerCB = [this](){
std_msgs::msg::String msg;
msg.set__data("test");
stringPub_->publish(std::move(msg));
};
pubTimer_ = pubNode_->create_wall_timer(100ms, pubTimerCB);
// auto intCallback = [this](std_msgs::msg::Int16::SharedPtr msg){
// RCLCPP_INFO(pubNode_->get_logger(), "Receive int message");
// };
// intSub_ = pubNode_->create_subscription<Int16>("/int", 1, intCallback);
executor_ = std::make_shared<executors::MultiThreadedExecutor>();
executor_->add_node(node_);
executor_->add_node(pubNode_);
} the test program runs OK. |
I don't want to argue about whether the design of the sample is reasonable or not. We can find that rclcpp/rclcpp/src/rclcpp/intra_process_manager.cpp Lines 83 to 85 in b1ff2d5
subscriptions_.erase(intra_process_subscription_id);
^^^^^^^^^^^^^^^^^^^^ // There is a case where this destroys the
// `AnySubscriptionCallback any_callback_` of SubscriptionIntraProcess,
// which might be keeping alive a node that, on destruction, will call `IntraProcessManager::remove_subscription`. How about adding a new |
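The ownership point above can be illustrated without rclcpp. In the sample, `std::bind(&Module1::stringCallback, module1, ...)` stores a copy of the `module1` shared pointer inside the callback, and `Module1` is constructed from `node_`, so it presumably keeps a reference to the node. The following is a minimal sketch in plain C++; `Module`, `owners_with_shared_bind` and `owners_with_weak_capture` are hypothetical names used only for illustration, and nothing here is taken from the rclcpp sources.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for the Module1 class from the sample.
struct Module {
  void on_message(const std::string & text) { last = text; }
  std::string last;
};

// Binding a shared_ptr stores a copy of it inside the callback, so the
// callback co-owns the object for as long as the callback itself exists.
long owners_with_shared_bind()
{
  auto module = std::make_shared<Module>();
  std::function<void(const std::string &)> callback =
    std::bind(&Module::on_message, module, std::placeholders::_1);
  callback("hello");
  assert(module->last == "hello");
  return module.use_count();  // the local variable plus the copy in the callback
}

// Capturing a weak_ptr lets the callback reach the object without owning it.
long owners_with_weak_capture()
{
  auto module = std::make_shared<Module>();
  std::weak_ptr<Module> weak = module;
  std::function<void(const std::string &)> callback =
    [weak](const std::string & text) {
      if (auto locked = weak.lock()) {  // temporary ownership during the call only
        locked->on_message(text);
      }
    };
  callback("hello");
  assert(module->last == "hello");
  return module.use_count();  // only the local variable owns the object
}
```

Under these assumptions, a callback bound to a shared pointer extends the lifetime of everything that object owns, so destroying the callback may be the moment the last owner goes away. Binding to `this` or capturing a weak pointer avoids that extra ownership, which would be consistent with the observation that the `TestClass::stringCallback` variant runs fine.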
Could you elaborate on why adding recursive_mutex would address this issue? The root cause of this is a deadlock, as follows:
while trying to take the rwlock with writer access, glibc detects EDEADLK and an exception is then raised. |
About std::system_error Resource deadlock avoided!, it seems there are two places that call std::unique_lock<std::shared_timed_mutex> lock(mutex_); in one thread (as I mentioned at #1519 (comment)). A thread runs:
{
std::unique_lock<std::shared_timed_mutex> lock(mutex_);
subscriptions_.erase(intra_process_subscription_id); // ==> if `erase` leads to running `std::unique_lock<std::shared_timed_mutex> lock(mutex_);` again, it throws an exception (`EDEADLK`).
} which kind of mutex do you think can fix this error? From what I know, |
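One way to sidestep the re-entrant lock described above, independent of which mutex type is used, is to move the erased element out of the container while holding the lock and let it be destroyed only after the lock is released. The sketch below is plain C++ under stated assumptions: `Entry`, `Registry` and `remove_with_reentrant_destructor` are hypothetical names, the `Entry` destructor hook stands in for a subscription whose destruction calls back into the registry, and this is not presented as the change made in rclcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical entry whose destructor runs a user-supplied hook.
struct Entry {
  explicit Entry(std::function<void()> on_destroy)
  : on_destroy_(std::move(on_destroy)) {}
  ~Entry() { if (on_destroy_) { on_destroy_(); } }
  std::function<void()> on_destroy_;
};

class Registry {
public:
  void add(std::uint64_t id, std::shared_ptr<Entry> entry)
  {
    std::unique_lock<std::shared_timed_mutex> lock(mutex_);
    entries_[id] = std::move(entry);
  }

  void remove(std::uint64_t id)
  {
    std::shared_ptr<Entry> doomed;  // declared before the lock scope on purpose
    {
      std::unique_lock<std::shared_timed_mutex> lock(mutex_);
      auto it = entries_.find(id);
      if (it == entries_.end()) {
        return;
      }
      doomed = std::move(it->second);  // take ownership while locked
      entries_.erase(it);              // erases an empty slot, no destructor hook runs here
    }
    // The lock is released here. `doomed` is destroyed when this function
    // returns, so a destructor that calls remove() again locks a free mutex.
  }

  std::size_t size() const
  {
    std::shared_lock<std::shared_timed_mutex> lock(mutex_);
    return entries_.size();
  }

private:
  mutable std::shared_timed_mutex mutex_;
  std::map<std::uint64_t, std::shared_ptr<Entry>> entries_;
};

// Entry 1 removes entry 2 from its destructor, mimicking the re-entrant call.
std::size_t remove_with_reentrant_destructor()
{
  Registry registry;
  registry.add(2, std::make_shared<Entry>(std::function<void()>{}));
  registry.add(1, std::make_shared<Entry>([&registry]() { registry.remove(2); }));
  registry.remove(1);
  return registry.size();
}
```

With the element destroyed outside the critical section, the nested `remove` call never tries to acquire a lock that the same thread already holds. If the destruction happened inside the locked scope instead, the second `lock()` on the same thread would be the situation the thread reports as `EDEADLK`.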
i think in that case, |
Thank you for your suggestion. I think Based on your suggestion about throwing an exception in this case, so it's the users' responsibility to catch the exception. |
@iuhilnehc-ynos |
@LouisChen1905 I think you tagged me by mistake, right? I haven't been involved in this ticket before. |
Could you run your sample with If you have any questions, please reopen this. Thanks. |
Bug report
Required Info:
Steps to reproduce issue
I'm writing gtest scripts. My SetUp():
In TEST_F(), if I used executor.spin_until_future_complete() to wait for some messages, the test program would crash when entering the second TEST_F(). If I use spin_some() instead, the program runs without crashing.
You can find the line use_intra_process_comms(true) above. If it's set to true, 'Resource deadlock avoided' would occur in the second test case.
Expected behavior
Run through each test case.
Actual behavior
The first test case passed OK. But test program crashed when entering the second test case.