Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception being thrown: failed to initialize rcl node: Logger already initialized for node #375

Closed
mkhansenbot opened this issue Jan 23, 2019 · 20 comments
Assignees

Comments

@mkhansenbot
Copy link

Bug report

Required Info:

  • Operating System:

    • Ubuntu 18.04
  • Installation type:

    • source
  • Version or commit hash:

  • DDS implementation:

    • Default - FastRTPS
  • Client library (if applicable):

    • rclcpp

Steps to reproduce issue

Install navigation2

ros2 launch nav2_bringup nav2_bringup_1st_launch.py map:=<path/to/map>

Expected behavior

Should launch fine without issue.

Actual behavior

Exception occurs, and crashes amcl node:

ros2 launch nav2_bringup nav2_bringup_1st_launch.py map:=/home/mkhansen/ros2_dev/navigation2_ws/src/navigation2/nav2_system_tests/maps/map_circular.yaml
[INFO] [launch]: process[amcl-1]: started with pid [14806]
[INFO] [amcl]: Initializing AMCL
[INFO] [amcl]: Sensor model type is: "likelihood_field"
[INFO] [amcl]: Robot motion model is differential
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  failed to initialize rcl node: Logger already initialized for node., at /home/mkhansen/ros2_dev/ros2_ws/src/ros2/rcl/rcl/src/rcl/logging_rosout.c:156
[ERROR] [launch]: process[amcl-1] process has died [pid 14806, exit code -6, cmd '/home/mkhansen/ros2_dev/navigation2_ws/install/nav2_amcl/lib/nav2_amcl/amcl __node:=amcl'].

Additional information

SO, the launch file by default has this for the amcl portion:

        launch_ros.actions.Node(
            package='nav2_amcl',
            node_executable='amcl',
            node_name='amcl',
            output='screen'),
            parameters=[{ 'use_sim_time': use_sim_time}]),

When I comment out these 2 lines, setting the node_name and parameters, it works fine, but use_sim_time isn't set (which is needed).

        launch_ros.actions.Node(
            package='nav2_amcl',
            node_executable='amcl',
#            node_name='amcl',
            output='screen'),
#            parameters=[{ 'use_sim_time': use_sim_time}]),

I suspect the issue has something to do with the node_name field, but I don't know exactly what the issue is because I use nearly the same syntax for map_server in the same launch file and it works fine:

        launch_ros.actions.Node(
            package='nav2_map_server',
            node_executable='map_server',
            node_name='map_server',
            output='screen',
            parameters=[{ 'use_sim_time': use_sim_time}, { 'yaml_filename':  map_yaml_file }]),

The only difference there being the use of a yaml file for a second parameter in the list.

This only recently stopped working (last week), and it coincided with #350, and the message was added in that PR, so it has to be related. I'm just not sure what the PR added and why having the node name and parameters makes it fail. I see the failure occurs here:
logging_rosout.c:156

Please help to understand why I'm hitting this error.

@mkhansenbot
Copy link
Author

Adding @crdelsey

@sloretz
Copy link
Contributor

sloretz commented Jan 23, 2019

@mkhansen-intel does the error happen with a different node name like amclfoo or fooamcl?

@mkhansenbot
Copy link
Author

Good idea, but I just tried it and it didn't make a difference except in the logger output on the screen:

[INFO] [amclfoo]: Robot motion model is differential
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  failed to initialize rcl node: Logger already initialized for node., at /home/mkhansen/ros2_dev/ros2_ws/src/ros2/rcl/rcl/src/rcl/logging_rosout.c:156
[ERROR] [launch]: process[amcl-1] process has died [pid 15271, exit code -6, cmd '/home/mkhansen/ros2_dev/navigation2_ws/install/nav2_amcl/lib/nav2_amcl/amcl __node:=amclfoo __params:=/tmp/launch_params_7s1jh4rr'].

@mkhansenbot
Copy link
Author

OK here's more info.
I isolated the failure down to this line in amcl:
tfl_.reset(new tf2_ros::TransformListener(*tf_));

The gdb backtrace of the error is this:

(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff662d801 in __GI_abort () at abort.c:79
#2  0x00007ffff6c828b7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff6c88a06 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff6c88a41 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6c88c74 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7a1812c in rclcpp::exceptions::throw_from_rcl_error(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rcutils_error_state_t const*, void (*)()) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/rclcpp/lib/librclcpp.so
#7  0x00007ffff7a4e9e8 in rclcpp::node_interfaces::NodeBase::NodeBase(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<rclcpp::Context>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) () from /home/mkhansen/ros2_dev/ros2_ws/install/rclcpp/lib/librclcpp.so
#8  0x00007ffff7a49830 in rclcpp::Node::Node(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<rclcpp::Context>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&, bool, bool, bool) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/rclcpp/lib/librclcpp.so
#9  0x00007ffff7a496aa in rclcpp::Node::Node(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/rclcpp/lib/librclcpp.so
#10 0x00007ffff427cd8a in void __gnu_cxx::new_allocator<rclcpp::Node>::construct<rclcpp::Node, char const (&) [24]>(rclcpp::Node*, char const (&) [24]) () from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#11 0x00007ffff427c0cc in void std::allocator_traits<std::allocator<rclcpp::Node> >::construct<rclcpp::Node, char const (&) [24]>(std::allocator<rclcpp::Node>&, rclcpp::Node*, char const (&) [24]) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#12 0x00007ffff427aefe in std::_Sp_counted_ptr_inplace<rclcpp::Node, std::allocator<rclcpp::Node>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [24]>(std::allocator<rclcpp::Node>, char const (&) [24]) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#13 0x00007ffff42797b3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<rclcpp::Node, std::allocator<rclcpp::Node>, char const (&) [24]>(std::_Sp_make_shared_tag, rclcpp::Node*, std::allocator<rclcpp::Node> const&, char const (&) [24]) () from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#14 0x00007ffff4277f8a in std::__shared_ptr<rclcpp::Node, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<rclcpp::Node>, char const (&) [24]>(std::_Sp_make_shared_tag, std::allocator<rclcpp::Node> const&, char const (&) [24]) () from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#15 0x00007ffff42765c1 in std::shared_ptr<rclcpp::Node>::shared_ptr<std::allocator<rclcpp::Node>, char const (&) [24]>(std::_Sp_make_shared_tag, std::allocator<rclcpp::Node> const&, char const (&) [24]) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#16 0x00007ffff4274ce4 in std::shared_ptr<rclcpp::Node> std::allocate_shared<rclcpp::Node, std::allocator<rclcpp::Node>, char const (&) [24]>(std::allocator<rclcpp::Node> const&, char const (&) [24]) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#17 0x00007ffff4273df1 in std::shared_ptr<rclcpp::Node> std::make_shared<rclcpp::Node, char const (&) [24]>(char const (&) [24]) () from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#18 0x00007ffff42734a8 in std::shared_ptr<rclcpp::Node> rclcpp::Node::make_shared<char const (&) [24]>(char const (&) [24]) () from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#19 0x00007ffff4271b25 in tf2_ros::TransformListener::TransformListener(tf2::BufferCore&, bool) ()
   from /home/mkhansen/ros2_dev/ros2_ws/install/tf2_ros/lib/libtf2_ros.so
#20 0x00007ffff718fa77 in AmclNode::AmclNode() ()
   from /home/mkhansen/ros2_dev/navigation2_ws/install/nav2_amcl/lib/libamcl_core.so
#21 0x0000555555557d30 in void __gnu_cxx::new_allocator<AmclNode>::construct<AmclNode>(AmclNode*) ()
#22 0x0000555555557bef in void std::allocator_traits<std::allocator<AmclNode> >::construct<AmclNode>(std::allocator<AmclNode>&, AmclNode*) ()
#23 0x0000555555557a1a in std::_Sp_counted_ptr_inplace<AmclNode, std::allocator<AmclNode>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<>(std::allocator<AmclNode>) ()
#24 0x0000555555557715 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<AmclNode, std::allocator<AmclNode>>(std::_Sp_make_shared_tag, AmclNode*, std::allocator<AmclNode> const&) ()
#25 0x00005555555575c8 in std::__shared_ptr<AmclNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<AmclNode>>(std::_Sp_make_shared_tag, std::allocator<AmclNode> const&) ()
#26 0x0000555555557516 in std::shared_ptr<AmclNode>::shared_ptr<std::allocator<AmclNode>>(std::_Sp_make_shared_tag, std::allocator<AmclNode> const&) ()
#27 0x0000555555557408 in std::shared_ptr<AmclNode> std::allocate_shared<AmclNode, std::allocator<AmclNode>>(std::allocator<AmclNode> const&) ()
#28 0x00005555555571df in std::shared_ptr<AmclNode> std::make_shared<AmclNode>() ()
#29 0x0000555555556d51 in main ()

Line 19 is where it calls into the TransformListener constructor. I still don't understand why that is causing an exception to be thrown. Is the bug in TransformListener? Maybe I should have filed the bug in Geometry2 instead?

Let me know how I should proceed.

@crdelsey
Copy link

Interesting. When you specify the node_name option, all the nodes get given that same name. Seems the new logger code doesn't like it when there are two nodes with the same name.

I created a minimal example to reproduce the problem.

#include "rclcpp/rclcpp.hpp"

using namespace std;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node1 = rclcpp::Node::make_shared("foo");
  auto node2 = rclcpp::Node::make_shared("bar");
  // rclcpp::spin(node);
  rclcpp::shutdown();

  return 0;
}

If I build the code above into a node called loggerfailure, then if I run without options
(./loggerfailure) everything is fine. If I add the node name option (./loggerfailure __node:=abc) then I get the reported exception.

@sloretz
Copy link
Contributor

sloretz commented Jan 24, 2019

@crdelsey In that example the nodes can be given different names by prefixing the rule with original name ./loggerfailure foo:__node:=abc bar:__node:=cba

@crdelsey
Copy link

@sloretz That does fix the problem in my example. However, I'm not aware of a way to specify those prefixes when using a launch file like in the original problem description.

@mkhansenbot
Copy link
Author

Why does the logger now need to throw that exception? What harm is it causing to have two nodes with the same logger? If the only downside is that they both will log using the same name and therefore you won't know what output was from which node, then it seems like a warning "Duplicate logger for node abc" is all that is needed. Am I missing something?

@sloretz
Copy link
Contributor

sloretz commented Jan 24, 2019

What harm is it causing to have two nodes with the same logger?

I agree two nodes sharing a logger would be fine as a warning. Elsewhere, nodes with the same name create parameter services with the same name. This makes setting different parameters on them a problem.

@mkhansenbot
Copy link
Author

So @crdelsey has a work-around for this which we can live with. I still think this exception should be a warning instead. Please let me know whether that change will be made so we can revert the work-around at a future date.

@sloretz
Copy link
Contributor

sloretz commented Jan 28, 2019

I can reproduce this exception with

ros2 run nav2_amcl amcl __node:=amcl

but not with

ros2 run nav2_amcl amcl

I don't yet know why this is the case.

@sloretz sloretz self-assigned this Jan 28, 2019
@sloretz
Copy link
Contributor

sloretz commented Jan 28, 2019

tfl_.reset(new tf2_ros::TransformListener(*tf_)); throws because this constructor creates a hidden node. Both change their names according to the __node.

Another way to work around the issue is to use another __node rule to tell the transform listener to keep it's name.

ros2 run nav2_amcl amcl transform_listener_impl:__node:=transform_listener_impl __node:=amcl

But I assume in this case amcl would rather reuse the same node with the transform listener rather than create a new one.

@sloretz
Copy link
Contributor

sloretz commented Jan 28, 2019

__node:= is causing trouble when code creates other nodes in the background. For example, TransformListener creates a node, and this code in navigation2 creates a node. Launch doesn't know there are multiple nodes in the process so it has no way of knowing it should use a __node name rule with a node name prefix (amcl:__node:=amcl).

Here are a few options. @ros2/team mind looking at the options and giving feedback?

  1. Make the logging code warn instead of error
    • This gets rid of the crash, but other systems assume nodes in a process have different names (parameter services)
    • log messages for nodes with the same name will stop being published to rosout when the node which created the publisher is shut down
    • Could be done by
      • rcl_logging_rosout_init_publisher_for_node Warns that a node with the same name exists
      • rcl_node_init warns and then continues creating a node if the logger returns RCL_RET_ALREADY_INIT
      • rcl_logging_rosout_fini_publisher_for_node only fini's publisher if the passed in node matches entry.node
  2. Allow multiple rosout publishers per logger
    • The hash map used to store loggers would need to store a list of node/publisher pairs
    • If there are N nodes with the same name then each message is published N times because there's no way to know which node published the message
  3. Nothing
    • Downstream users are likely to encounter __node renaming multiple nodes again, resulting in this crash
  4. Recommend temporary nodes use use_global_arguments = false
    • Nodes will all have different names, but without global arguments they won't obey remap rules
    • Users will still encounter this issue in code that doesn't set use_global_arguments
  5. Add an anonymous flag to nodes that causes it to create a unique name and ignore __node rules.
    • Nodes will all have different names and obey remap rules
    • Users will still encounter this issue if code doesn't set the flag
  6. Make __node consumable, first node to launch takes the rule for itself
    • Only one node will change its name, so there won't be duplicates
    • Not consistent with __ns, __log_level __parameters, etc
    • Depending on how nodes are created, it could be a race condition to see which node gets which name
  7. Only allow __node with a node name prefix old_name:__node:=new_name
    • Launch file author must know the old name in order to give a node a new one
  8. If another node with the same name exists in the same process, the new node's name is prefixed/suffixed/renamed
    • Avoids this crash and other systems that expect node names to be unique per process
    • If a user specified other remap rules with a node name prefix then they won't have an effect

@hidmic
Copy link
Contributor

hidmic commented Jan 29, 2019

@sloretz so, given that those hidden nodes seem to work only as proxies/wrappers to ROS interfaces, I'd say (5) is a reasonable path forward, though a less obscure exception would be nice i.e. letting you know there're other non-anonymous nodes in the process and thus you can't just use __node:=... w/o a prefix. Just my 2 cents.

BTW what would happen if e.g. two TransformListener are instantiated? Would that result in name clashing as well, even if no __node remapping is being used?

@sloretz
Copy link
Contributor

sloretz commented Jan 29, 2019

BTW what would happen if e.g. two TransformListener are instantiated? Would that result in name clashing as well, even if no __node remapping is being used?

Yup, two TransformListener alone crashes since they create nodes with the same name
#include "tf2_ros/transform_listener.h"
#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto nh = rclcpp::Node::make_shared("dual_listener");
  auto clock = nh->get_clock();

  tf2_ros::Buffer buffer1(clock);
  auto tfl1 = tf2_ros::TransformListener(buffer1);
  tf2_ros::Buffer buffer2(clock);
  auto tfl2 = tf2_ros::TransformListener(buffer2);

  return 0;
}
$ ./build/tf2_ros/dual_listener
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  failed to initialize rcl node: Logger already initialized for node., at /home/sloretz/ws/ros2/src/ros2/rcl/rcl/src/rcl/logging_rosout.c:156
Aborted (core dumped)

@jacobperron
Copy link
Member

I agree that (5) seems like the right thing to do (and then do the TODO to resolve this particular ticket).

I may have been under the wrong assumption, but I thought fully qualified node names should be unique. If so, then I believe an error (rather than a warning) is correct in the reported case.

@cottsay cottsay added the ready Work is about to start (Kanban column) label Jan 31, 2019
@Karsten1987
Copy link
Contributor

Karsten1987 commented Feb 6, 2019

So we currently run into this issue as well on rosbag2.
https://ci.ros2.org/job/ci_linux/6127/testReport/junit/(root)/projectroot/test_record/

in contrast to this issue here, i believe we don't spawn multiple nodes with the same time simultaneously but sequentially in the same process.
I actually found that we do have nodes with the same node name. So I can second that this is the cause for this problem.

@nburek
Copy link
Contributor

nburek commented Feb 6, 2019

Why does the logger now need to throw that exception? What harm is it causing to have two nodes with the same logger? If the only downside is that they both will log using the same name and therefore you won't know what output was from which node, then it seems like a warning "Duplicate logger for node abc" is all that is needed. Am I missing something?

A little background, for the rosout logging functionality in ROS2 we create a publisher instance on every node and the logger that is associated with that node sends logs out on the rosout topic publisher specifically for that node. We did it this way to avoid randomly picking a single node in the process and sending all logs out via a publisher on that node. Additionally, the only thing linking a node logger in rclcpp/rclpy to the logger in the rcl layer is a string name that is set to the node's name. This is why we maintain the hashmap so that we can map log messages on a node logger to the publisher for that node.

It was also my understanding at the time that while nothing explicitly prevented two nodes with the same name it wasn't recommended.

In light of all that, I added the exception that explicitly prevents a second node with the same name registering a logger because it then becomes unclear what the behavior of the system becomes. There isn't a way to map log messages coming from the OOP node loggers to the correct node publisher instance given only the node name. Additionally, as @sloretz points out, if we just warn and reuse the same logger for both node instances then when the node with the publisher gets destroyed it would stop the other instance from publishing to rosout. Additionally, I didn't feel it was the right choice that any node could secretly impact traffic going over another node instance's publishers under the hood.

All these things could be fixed by adding slightly more complexity in the logger or accepting some of the pitfalls of the workarounds, but I'd like to know what the long term story is for supporting multiple nodes with the same name inside the same process before committing to a solution.

@nburek
Copy link
Contributor

nburek commented Feb 7, 2019

Fix to make this a warning message for now #384, as discussed here: #350

@jacobperron
Copy link
Member

@mkhansen-intel I believe this issue has been resolved by #384. If not, you can comment here and I can reopen the issue,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants