ROS2 Launch fails to return GDB session for debugging #165

Open
SteveMacenski opened this issue Aug 3, 2020 · 45 comments

@SteveMacenski

SteveMacenski commented Aug 3, 2020

Bug report

Required Info:

  • Operating System: Ubuntu 20.04
  • Installation type: Foxy binaries
  • Version or commit hash: Most recently released binaries
  • DDS implementation: Fast-RTPS
  • Client library (if applicable): rclcpp

Steps to reproduce issue

Clone https://github.com/samsung-ros/gdb_test_pkg, build it, and run the launch file, e.g.:

git clone https://github.com/samsung-ros/gdb_test_pkg.git
colcon build --packages-select gdb_test_pkg
ros2 launch gdb_test_pkg launch.py

This package is a simple ROS2 package template with one trivial node that contains only functions that cause crashes, to demonstrate this issue.

Expected behavior

Crashes occur, then the gdb prompt is returned. From that prompt I should be able to get a traceback / see info.

Actual behavior

Crashes occur as an inferior thread and gdb prompt is never returned. For packages that normally exit cleanly, using the GDB prefix prevents a clean exit, resulting in a SIGTERM and some errors. The process ultimately exits but doesn't provide a GDB session to get a backtrace.

The only way I'm able to get a backtrace or gdb prompt is to bypass prefix altogether and call the executables directly from their install path, e.g.

gdb -ex r --args /home/steve/Documents/nav2_ws/install/gdb_test_pkg/lib/gdb_test_pkg/gdb_test_node --ros-args -r __node:=gdb_test_node

which makes prefix not very useful and means all the instructions for ROS2 debugging are incorrect (https://answers.ros.org/question/267261/how-can-i-run-ros2-nodes-in-a-debugger-eg-gdb/ and https://answers.ros.org/question/343326/ros2-prefix-in-launch-file/).

steve@Eve:~/Documents/nav2_ws$ ros2 launch gdb_test_pkg launch.py
[INFO] [launch]: All log files can be found below /home/steve/.ros/log/2020-08-03-14-09-04-639193-Eve-47662
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [gdb_test_node-1]: process started with pid [47675]
[gdb_test_node-1] GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
[gdb_test_node-1] Copyright (C) 2020 Free Software Foundation, Inc.
[gdb_test_node-1] License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
[gdb_test_node-1] This is free software: you are free to change and redistribute it.
[gdb_test_node-1] There is NO WARRANTY, to the extent permitted by law.
[gdb_test_node-1] Type "show copying" and "show warranty" for details.
[gdb_test_node-1] This GDB was configured as "x86_64-linux-gnu".
[gdb_test_node-1] Type "show configuration" for configuration details.
[gdb_test_node-1] For bug reporting instructions, please see:
[gdb_test_node-1] <http://www.gnu.org/software/gdb/bugs/>.
[gdb_test_node-1] Find the GDB manual and other documentation resources online at:
[gdb_test_node-1]     <http://www.gnu.org/software/gdb/documentation/>.
[gdb_test_node-1] 
[gdb_test_node-1] For help, type "help".
[gdb_test_node-1] Type "apropos word" to search for commands related to "word"...
[gdb_test_node-1] Reading symbols from /home/steve/Documents/nav2_ws/install/gdb_test_pkg/lib/gdb_test_pkg/gdb_test_node...
[gdb_test_node-1] Starting program: /home/steve/Documents/nav2_ws/install/gdb_test_pkg/lib/gdb_test_pkg/gdb_test_node --ros-args -r __node:=gdb_test_node
[gdb_test_node-1] [Thread debugging using libthread_db enabled]
[gdb_test_node-1] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[gdb_test_node-1] [New Thread 0x7ffff6a9b700 (LWP 47689)]
[gdb_test_node-1] [New Thread 0x7ffff629a700 (LWP 47690)]
[gdb_test_node-1] [New Thread 0x7ffff5a99700 (LWP 47691)]
[gdb_test_node-1] [New Thread 0x7ffff5298700 (LWP 47692)]
[gdb_test_node-1] [New Thread 0x7ffff4a97700 (LWP 47693)]
[gdb_test_node-1] [New Thread 0x7fffeffff700 (LWP 47694)]
[gdb_test_node-1] [New Thread 0x7fffef7fe700 (LWP 47695)]
[gdb_test_node-1] [INFO] [1596488945.551147551] [gdb_test_node]: Starting up
[gdb_test_node-1] [INFO] [1596488945.551304968] [gdb_test_node]: Vector Crashing...
[gdb_test_node-1] terminate called after throwing an instance of 'std::out_of_range'
[gdb_test_node-1]   what():  vector::_M_range_check: __n (which is 100) >= this->size() (which is 0)
[gdb_test_node-1] 
[gdb_test_node-1] Thread 1 "gdb_test_node" received signal SIGABRT, Aborted.
[gdb_test_node-1] __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[gdb_test_node-1] 50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
^C[WARNING] [launch]: user interrupted with ctrl-c (SIGINT)
[gdb_test_node-1] Quit
[ERROR] [gdb_test_node-1]: process[gdb_test_node-1] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
[INFO] [gdb_test_node-1]: sending signal 'SIGTERM' to process[gdb_test_node-1]
[gdb_test_node-1] Exception ignored in: <gdb._GdbOutputFile object at 0x7f18dc4518b0>
[gdb_test_node-1] Traceback (most recent call last):
[gdb_test_node-1]   File "/usr/share/gdb/python/gdb/__init__.py", line 43, in flush
[gdb_test_node-1]     def flush(self):
[gdb_test_node-1] KeyboardInterrupt: 
[INFO] [gdb_test_node-1]: process has finished cleanly [pid 47675]
[gdb_test_node-1] (gdb) (gdb) 
steve@Eve:~/Documents/nav2_ws$

Additional information

I provide 3 different crash methods to show that in all cases I don't get a clean exit or a GDB prompt, even in the simplest of nodes.

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  auto node = std::make_shared<GDBTester>();
  //node->ExitCrash();  // calls exit(-1)
  //node->NullptrCrash();  // tries to access elements of a nullptr
  node->VectorCrash();  // tries to access non-existent vector elements

  rclcpp::spin(node->get_node_base_interface());
  rclcpp::shutdown();
  return 0;
}

Select which you'd like to trigger by uncommenting it.

@SteveMacenski
Author

SteveMacenski commented Aug 3, 2020

Maybe the right place is ros2cli, not sure. Some basic tests I did with ros2 run on the nodes appear to be working correctly.

Also note a new ROS2 tutorial on GDB: ros-navigation/docs.nav2.org#58 to help users without debugging experience get backtraces. This is a common issue in Nav2 where users haven't used it before and this is aimed at a single end-to-end resource to help people through it to help in getting more useful debug information in tickets from junior developers when having issues.

@allenh1

allenh1 commented Aug 3, 2020

Same behavior for me on Ubuntu 20.04 (distro built from source up-to-date as of yesterday).

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-backtrace-tutorial-navigation2/15781/3

@siddhya

siddhya commented Aug 7, 2020

The -prefix would be very useful. I can see two processes in the foreground. See Sl+ and S+ below. What does that mean? Could that be the reason we do not see the gdb prompt?

 1463 pts/2    Sl+    0:00 /usr/bin/python3 /home/siddhya/git/ros2_ws/install/ros2cli/bin/ros2 launch gdb_test_pkg launch.py
 1465 pts/2    S+     0:00 gdb -ex r --args /home/siddhya/git/nav2_ws/install/gdb_test_pkg/lib/gdb_test_pkg/gdb_test_node --ros-args -r __node:=gdb_test_node
 1468 pts/2    tl     0:00 /home/siddhya/git/nav2_ws/install/gdb_test_pkg/lib/gdb_test_pkg/gdb_test_node --ros-args -r __node:=gdb_test_node
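
For reference, the STAT column above can be read with a small lookup. This is only a sketch: the meanings are the ones listed under "PROCESS STATE CODES" in the Linux ps(1) man page, and the helper name `decode_stat` is made up for illustration.

```python
# Meanings as listed under "PROCESS STATE CODES" in the Linux ps(1) man page.
PS_STATE_CODES = {
    "R": "running or runnable",
    "S": "interruptible sleep (waiting for an event to complete)",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during the tracing",
    "l": "multi-threaded",
    "s": "session leader",
    "+": "in the foreground process group",
}


def decode_stat(stat):
    """Translate each character of a ps STAT field (hypothetical helper)."""
    return [PS_STATE_CODES.get(ch, "unknown flag") for ch in stat]


# The three STAT values from the listing above.
for stat in ("Sl+", "S+", "tl"):
    print(stat, "->", "; ".join(decode_stat(stat)))
```

So `+` only says that both ros2 launch and gdb are in the terminal's foreground process group, and `t` says the node itself is stopped under the debugger. It does not say which process actually reads the keyboard.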

@hidmic
Contributor

hidmic commented Sep 3, 2020

Apologies for the delay!

Crashes occur as an inferior thread and gdb prompt is never returned.

Yes, that's expected. ros2 launch cannot return the gdb prompt, as launch itself doesn't even connect to launched processes' stdin. Still, you can debug these processes if you bring up a separate terminal along with them e.g. using an xterm -e gdb -ex run --args prefix.
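
A rough sketch of what that could look like on the launch-file side, under some assumptions: the package and executable names are taken from the repro above, the `executable` keyword is assumed to be the Foxy-era argument name of `launch_ros.actions.Node`, and `/path/to/gdb_test_node` is a placeholder rather than a real install path. `effective_command` is a made-up helper that only illustrates how the prefix ends up in front of the node's command line.

```python
import shlex

# Prefix suggested above: run gdb inside its own xterm window.
XTERM_GDB_PREFIX = "xterm -e gdb -ex run --args"

# Keyword arguments one would pass as launch_ros.actions.Node(**node_kwargs)
# inside generate_launch_description(). Names come from the repro package;
# the 'executable' keyword is an assumption about the Foxy API.
node_kwargs = {
    "package": "gdb_test_pkg",
    "executable": "gdb_test_node",
    "output": "screen",
    "prefix": [XTERM_GDB_PREFIX],
}


def effective_command(prefix, executable_path, args):
    """Illustrative only: the argv that results from prepending the prefix."""
    return shlex.split(prefix) + [executable_path] + list(args)


cmd = effective_command(
    XTERM_GDB_PREFIX,
    "/path/to/gdb_test_node",  # placeholder path, not a real install location
    ["--ros-args", "-r", "__node:=gdb_test_node"],
)
print(" ".join(cmd))
```

The point is that gdb then gets the xterm window's keyboard as its stdin, so it no longer matters that launch does not connect its own stdin to the processes it starts.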

Perhaps this is an unfortunate omission in launch, but on the other hand, I don't see how it could. What if there's more than one GDB session in the same launch? Which prompt do you return? Sure, it could detect the gdb prefix and silently execute the process in a separate shell (and window), but that's a bit brittle and potentially surprising. Which makes me wonder how roslaunch used to handle this situation 🤔

I think it's better if the user explicitly does this. Those answers.ros.org entries need an update though.

Maybe the right place is ros2cli, not sure. Some basic tests I did with ros2 run on the nodes appears to be working correctly.

I'd say https://github.com/ros2/launch is the right place. ros2run is a Popen call with ROS aware executable lookup logic. launch is a different beast.

@SteveMacenski
Author

SteveMacenski commented Sep 3, 2020

@hidmic then that should be outright removed from the design documentation https://design.ros2.org/articles/roslaunch.html

launch prefix (used to inject things like gdb, valgrind, etc…)

and @sloretz also assumes this should work https://answers.ros.org/question/343326/ros2-prefix-in-launch-file/

prefix=['gdb -ex=r --args'],

also ROS1 http://wiki.ros.org/roslaunch/Tutorials/Roslaunch%20Nodes%20in%20Valgrind%20or%20GDB

launch-prefix="xterm -e gdb --args"

I don't think "that's expected" is a legitimate answer. This worked in ROS1 (which was also a python API), this should work in ROS2, and even the design docs and key engineers agree that this is one of the key uses for prefix. I understand there's engineering challenges involved, but this is a critical feature everyone expects to work.

Feel free to transfer the issue over there if that's the most appropriate place.

@hidmic
Contributor

hidmic commented Sep 3, 2020

then that should be outright removed from the design documentation

I don't think it should be removed. Launch prefix does work, and can be used to execute a process in a gdb session, but it does require delegation to a separate terminal to regain access to the prompt.

@sloretz also assumes this should work

I did see that, and that's why I said it may need an update (to reflect the current state of things, that much I didn't say).

This worked in ROS1

It does bring about the question of how this worked in ROS 1. Looking at roslaunch sources, I can see all launched processes inherit the CLI stdin file descriptor (by omitting stdin here). So if one had more than one process that's reading its stdin, user input would go to all of them. It makes controlling two or more independent GDB sessions in the same launch file impossible without also delegating execution to separate terminals.

I understand people are used to gdb -ex run --args, but I also think we shouldn't do what roslaunch is doing in launch just for the sake of not having to prepend xterm -e (which you have to anyways in ROS 1 for the most general case). But that's just me, perhaps the @ros2/team thinks otherwise.

@SteveMacenski
Author

SteveMacenski commented Sep 3, 2020

it does require delegation to a separate terminal to regain access to the prompt.

Can you show that? I'm 95% sure I tried with xterm and also failed to produce a gdb session. I spent a half a day trying a bunch of different combinations between ros2 run, ros2 launch, in the launch file, calling the executable manually, etc and none of my notes indicate I was able to get anything launch file related working at any point.

It makes controlling two or more independent GDB sessions in the same launch file impossible without also delegating execution to separate terminals.

Lets worry about 2+ sessions after we just get 1 session working properly. I think that's the most common application anyhow: the 80/20 rule applies here. Most uses in my experience are because a specific server is crashing I'm trying to debug it in a launch file of N servers (or a launch file with just a single server, but involving a number of remaps, parameter file loading, and other application specific logic). Pulling out that one server, adding all the CLI remaps, param file, node names, etc is time consuming and error prone - and in some cases impossible to exactly replicate.

@clalancette
Contributor

Lets worry about 2+ sessions after we just get 1 session working properly.

The problem is that this is a specific instance of the more general case of multiple processes trying to use the stdin handle. For instance, what happens if you try to debug a process via gdb and have teleop_twist_keyboard in the same launch file?

I guess we could go ahead and have launch throw an exception at the beginning if there were 2 or more processes that were trying to get input from stdin.

@SteveMacenski
Author

SteveMacenski commented Sep 3, 2020

The problem is that this is a specific instance of the more general case of multiple processes trying to use the stdin handle

I don't disagree, but lets just get to the point where we're even at parity with the feature status of ROS1 and meeting the (currently aspirational) documentation that this works for any case.

Great is the enemy of good 😉 If getting 1 session working, given we have a reference design in ROS1 for it, takes 20% of the time, it'll fill 80% of the need. I think that's a good compromise to get started.

Throwing something would work. Optionally just returning the session from the first that fails. I don't think anyone would run gdb with multiple nodes at the same time unless they were trying to debug a single problem affecting multiple nodes. Otherwise, you're trying to debug N problems at once, hoping you get a particular one, which isn't really an effective process.

@hidmic
Contributor

hidmic commented Sep 3, 2020

Can you show that? I'm 95% sure I tried with xterm and also failed to produce a gdb session. I spent a half a day trying a bunch of different combinations between ros2 run, ros2 launch, in the launch file, calling the executable manually, etc and none of my notes indicate I was able to get anything launch file related working at any point.

Yeah, more and better documentation would be good. That's on us. I just opened samsung-ros/gdb_test_pkg#1 with the change I had to make to your repro to have access to the gdb prompt. This requires an X server laying around. I think you can start the process in a tmux session if you don't have one.

Lets worry about 2+ sessions after we just get 1 session working properly.

lets just get to the point where we're even at parity with the feature status of ROS1

Great is the enemy of good. If getting 1 session working, given we have a reference design in ROS1 for it, takes 20% of the time, it'll fill 80% of the need. I think that's a good compromise to get started.

The thing is that the whole idea of connecting ros2 launch standard input with that of the processes it launched is problematic. And, IMHO, introducing a feature knowing from the get go that it is broken for all but one use case (or hacking our way through it, like throwing if we see the gdb substring in more than one launch prefix, as AFAIK there's no way to detect if a child process is reading from an inherited stdin) doesn't seem right.

I'd be onboard if the feature was entirely missing and there were no resources to solve it the right way (whatever that is), but there is a way (or so it seems).

@hidmic
Contributor

hidmic commented Sep 3, 2020

@SteveMacenski let me know if using xterm does the trick :)

@SteveMacenski
Author

SteveMacenski commented Sep 3, 2020

introducing a feature knowing from the get go that it is broken for all but one use case ... doesn't seem right.

It seems more right than doing nothing and leaving it in its currently bad user state. This is also the exact behavior as in ROS1, so it's not as if we're breaking something that was previously different. I think you should be viewing this through the lens of fixing something broken, not breaking something. Better is good. I think the absolute worst outcome is that nothing changes and we just accept this being broken in perpetuity.

To your own point, I don't think there's a clean way of dealing with multiple stdin if there are multiple GDB nodes in a launch file, but this is no different than how it's been for the last 10 years, to my knowledge. The best option I can think of offhand is @clalancette's exception idea. I think that would be a fine enough solution, though perhaps overly limiting. I could live with it though.

@clalancette
Contributor

It seems more right than doing nothing and leaving it in its currently bad user state. This is also the exact behavior as in ROS1 so its not as if we're breaking something that was previously different.

I will say that the whole point of ROS 2 was to rethink these things. So just saying that it worked in ROS 1 is not sufficient, in my view. The fact that it meets a user need is a good reason, but we should think about the consequences of doing so.

I still have the problem that it is easy to come up with a non-contrived use-case that fails pretty badly. We either need to come up with a way to have multiple processes share the handle (which may be very difficult/impossible as you've pointed out), or we need a way to ensure that the user can't easily shoot themselves in the foot with the functionality.

In the meantime, I do believe we have ways to do it, as @hidmic has pointed out. So it's not that it is not possible to do, just that it is clunky at the moment.

@SteveMacenski
Author

I just want some action on this; it's extremely disruptive to my workflow to have to constantly rip out nodes from a launch file and manually set remaps, node names, paths to config files, set node options on the command line, and then transition up my lifecycle nodes (if necessary). This is a major quality-of-life detriment compared to ROS1, and it's clearly intended to be supported according to the design documents, answers, etc.

So just saying that it worked in ROS 1 is not sufficient

You're cherry picking one of many points I made why this should be done. You're right by itself that would be a weak argument, but that's not the major point I was making in this discussion.

I do believe we have ways to do it, as @hidmic has pointed out.

I've re-read the thread, and I don't see any suggestions from @hidmic to resolve this. Can you point that out? The only solution I've seen presented was yours (below):

I guess we could go ahead and have launch throw an exception at the beginning if there were 2 or more processes that were trying to get input from stdin.

Which, to me, seems like a reasonable starting point to get the largest issue resolved quickly and prevent bad behavior. It also fixes the general issue of having multiple stdin readers (like multiple teleop nodes), which is a problem in and of itself, disregarding our gdb discussion.

@clalancette
Contributor

I've re-read the thread, and I don't see any suggestions from @hidmic to resolve this. Can you point that out? The only solution I've seen presented was yours (below):

Mich opened a PR: samsung-ros/gdb_test_pkg#1

@wjwwood
Member

wjwwood commented Sep 3, 2020

There seems to be an assumption that we could easily just "connect stdin to everything" as ROS 1 does it and it would be fine, but that we don't want to do that for pedantic reasons. I don't think this is the case at all.

I did look into this when implementing it and it is not trivial. The assumption above is flawed because roslaunch in ROS 1 is not implemented the same way as launch, and in ROS 2's launch we used asyncio.subprocess (which is vastly superior in many ways) but it is unable to handle this case easily.

I actually spent the better part of a week trying to make this use case better out of the box and was unable to make it so. If you, or anyone else, thinks it is possible, please, I'd love to see that. But keep in mind it needs to not break other use cases, it needs to be reliable, and it needs to consider things like "sending stdin to all processes could cause very confusing behavior for users". It turns out doing it right is very tricky. I'm fairly well convinced that connecting all processes to stdin all the time is a bad idea.

A few more quick replies:

  • it isn't possible to know which processes are trying to access stdin, though we could have a "give stdin" option on our ExecuteProcess actions and raise an error if more than one tries to use that option
  • even if stdin is connected, things like ctrl-C and signals would not be caught by gdb, but instead by launch, and there is no simple way around this
  • the design documentation is still accurate, as xterm+gdb works (and is suggested in ROS 1's wiki too as an option), and for other things you'd use prefix for that are not interactive it is totally fine, e.g. time or valgrind, so suggesting we remove it is not right in my opinion
  • only the second answers.ros.org entry is wrong, the first one is about ros2 run --prefix and that works fine because there is only one process and stdin is connected automatically
    • the second isn't exactly wrong as much as it is incomplete, because it will run with gdb but it just doesn't let you interact with it, which is not useful, but if the user wanted valgrind, for instance, that would be totally fine

Some ideas:

  • we could have a new option alongside prefix like prefix_in_new_terminal which could automatically (and hopefully portably) open it in a new terminal as well as add the prefix, but this is just syntactic sugar
  • the aforementioned give_process_stdin which could only be used by one process in a launch file without producing an error, but this also doesn't address the issue with ctrl-c and signals, and it prevents launch from using stdin
  • note this difference in the ROS 1 to ROS 2 launch migration tutorial, it is mentioned briefly here in passing, but doesn't note the difference in how stdin is used: https://index.ros.org/doc/ros2/Tutorials/Launch-files-migration-guide/#id6
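
To make the give_process_stdin idea concrete, here is a minimal sketch of just the "only one process may ask for stdin" check. Nothing here is launch API: `ProcessSpec`, `give_process_stdin` as a field, and `pick_stdin_owner` are all hypothetical names, and the sketch does nothing about the ctrl-c/signal problem mentioned above.

```python
from dataclasses import dataclass


@dataclass
class ProcessSpec:
    """Hypothetical stand-in for a process described in a launch file."""
    name: str
    give_process_stdin: bool = False


def pick_stdin_owner(processes):
    """Return the single process that asked for stdin, or None if none did.

    Raises ValueError when more than one process asks, since the terminal's
    stdin cannot be handed to several interactive processes in a sane way.
    """
    owners = [p.name for p in processes if p.give_process_stdin]
    if len(owners) > 1:
        raise ValueError(
            "only one process may be given stdin, but these asked for it: "
            + ", ".join(owners)
        )
    return owners[0] if owners else None


specs = [
    ProcessSpec("gdb_test_node", give_process_stdin=True),
    ProcessSpec("some_other_node"),
]
print(pick_stdin_owner(specs))
```

Adding a second process with `give_process_stdin=True` (say a keyboard teleop node) would make the check raise before anything is started, which is the early error discussed earlier in the thread.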

@SteveMacenski
Author

SteveMacenski commented Sep 3, 2020

I view xterm as a work around rather than a solution. A solution would be getting a gdb session in the same terminal as users would (apparently naively) expect. To @wjwwood's point, if there are other considerations that make even just that a very non-trivial task, then I understand the difficulty vs marginal benefit argument.

So then moving on from the "lets fix it!" camp to the "lets document it!" camp:

This should just be really explicit then that ROS2 launch does not, and will not, support GDB in terminal session, regardless of the number of nodes and distribution of tasks. Maybe even a [thing] in launch then if you tried to do so with gdb it throws an exception to prevent users from thinking they're going crazy that they're not getting a session back (like me a couple weeks ago). I'm not the only one I know to have spent a bunch of hours wondering where I was going wrong because I wasn't getting a session back.

prefix_in_new_terminal but this is just syntactic sugar

I think I have the same feeling about that as you, this is probably unnecessary. I think if the above documentation is created, just point out xterm as the tool of choice and that largely resolves the issue of unintentionally running into the dark.

give_process_stdin

I kind of like this option for users that want to handle their own signals, but I agree that it's likely going to be misused and largely create more issues than it solves.

@hidmic
Contributor

hidmic commented Sep 3, 2020

I agree we need a very big sign saying that launch does not yield access to processes' standard input. But I fail to see why using xterm (or similar) is inadequate. Even more so considering it'll just work for all cases, without pitfalls to watch out for.

@SteveMacenski
Author

SteveMacenski commented Sep 3, 2020

But I fail to see why using xterm (or similar) is inadequate.

It's inadequate unless you stop/correct people from doing impossible things and heavily document that this issue invalidates some previously common workflows. William's comment makes it clear that it doesn't work, nor will it probably ever work. Changing workflows is fine, when there's a large technical benefit from doing so (like, requiring 500 hours of engineering time for a small feature), and the old workflow is clearly denoted as now invalid (documentation and exceptions saying that you can't do that, preferably with a URL to the docs about what to do instead).

Else, like what I ran into, folks end up spinning their wheels not knowing what we're trying to do is impossible but previously common (and documentation implies that it should work).

@wjwwood
Member

wjwwood commented Sep 3, 2020

I don't think xterm is a workaround, in my opinion it is the solution, but more helpful messages is always good.

We can make note for gdb, but other interactive prefixes (like lldb) as well as interactive processes (like keyboard teleop), will also not work and we cannot automatically detect them all.

@hidmic
Contributor

hidmic commented Sep 3, 2020

Else, like what I ran into, folks end up spinning their wheels not knowing what we're trying to do is impossible but previously common (and documentation implies that it should work).

Ok, I see. I just wanted to make sure there wasn't any other technical issue that'd prevent you from using it.

I fully agree documentation is lacking. Contributions are always welcomed :)

@SteveMacenski
Author

SteveMacenski commented Sep 4, 2020

Sure thing 😉

I’d be happy to write up some documentation for ROS Index on top of my GDB tutorial on navigation.ros.org. Maybe add a note to the design doc? The launch side of things, throwing an exception when running gdb in the same session, I have markedly less experience with. I’d be happy to give it a stab if you can point me to the right direction, else I’d also be fine if someone knowing this codebase better did it as well.

@adamsj-ros

I think xterm levies an additional dependency on having X installed and configured. What about systems where it's not default to have an X service running like Mac, Windows, containers, etc?

@wjwwood
Member

wjwwood commented Nov 3, 2020

Using gdb also places a dependency on it. macOS uses lldb and most windows users would use something else. I don’t see using xterm as any different. It is definitely a non portable solution.

As for containers and such you could use another solution to control focus, like screen or tmux maybe. But I don’t see it as a launch specific problem to solve. We can add helpers if good solutions come up from users.

@adamsj-ros

I understand what you're saying about gdb and lldb adding dependencies but I think it's for a different reason than xterm. I think the use case we're attempting to handle is to get a back trace or some other debug function. The assumption is that a command line debugger (from any platform) could be used from the same terminal we use to launch. We would insert the appropriate command line debugger into the launch prefix. This could even be a runtime switch to add the correct prefix based on the toolchain or an environment variable.

The X service sometimes requires a lot more setup and configuration than a command line debugger that comes with the toolchain the developer is already using. At least lldb and gdb would work well with this setup. I don't know much about command line debuggers for Windows, but it looks like Visual Studio has an option.

Some systems including embedded and realtime often don't even have the option to run xterm but they usually do have command line debuggers or related tools.

@wjwwood
Member

wjwwood commented Nov 3, 2020

I think you're missing the point, my point is that xterm is linux specific just like the debugging tools. It's different than the fact that xterm needs X.org setup, sure, but it's a similar kind of non-portable things you might put in your launch file. It might work on your machine, but not others.

For the fact that xterm needs X.org, you could replace xterm with any number of other tools that allow you to control the focus of your input, like other graphical solutions like gnome-terminal which still need X.org or text based ones like screen, byobu, or tmux which may not. They're still helping you control the focus of input, and they have different trade-offs and dependencies. But all of them solve the problem so launch doesn't have to do so.

In the end, launch is not depending on xterm, the user may choose to use xterm with launch to solve this problem. There are other tools and solutions with different trade-offs. The user may choose to use them too.

@hidmic
Contributor

hidmic commented Nov 3, 2020

@wjwwood you beat me to it 😂

As William mentioned, @adamsj-oci for your use case i.e. a single (remote?) terminal in a machine that has no X server running, you could resort to something like tmux. From within a tmux session, use tmux split-window gdb -ex run --args as your launch prefix. You'll get one pane for each process that has said prefix. You can even pack everything up and do tmux new-session ros2 launch <package-name> <launch-file> -- it'll open one pane for launch alongside those for the processes you intend to debug.
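
As a sketch of how a launch file could pick between the two prefixes discussed in this thread, assuming that tmux sets the `TMUX` environment variable inside a session and that a usable X server shows up as `DISPLAY`. The helper name `debugger_prefix` is made up; the returned string is what one would put in the node's prefix.

```python
import os

GDB = "gdb -ex run --args"


def debugger_prefix(env=None):
    """Pick a launch prefix that gives gdb its own pane or window.

    Hypothetical helper: prefers a tmux pane when already inside a tmux
    session, falls back to xterm when an X display is available.
    """
    env = os.environ if env is None else env
    if env.get("TMUX"):
        # Inside tmux: each prefixed process gets its own pane.
        return "tmux split-window " + GDB
    if env.get("DISPLAY"):
        # X server available: each prefixed process gets its own xterm.
        return "xterm -e " + GDB
    raise RuntimeError(
        "no tmux session or X display found; start tmux first, "
        "e.g. with 'tmux new-session ros2 launch <package-name> <launch-file>'"
    )


# Example with an explicit environment instead of the real one.
print(debugger_prefix({"TMUX": "/tmp/tmux-0/default,1,0"}))
```

Failing early when neither is available seems friendlier than letting tmux or xterm fail once the node is started.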

@wjwwood
Member

wjwwood commented Nov 3, 2020

From within a tmux session, use tmux split-window gdb -ex run --args as your launch prefix. You'll get one pane for each process that has said prefix.

Definitely something worth adding to the examples I think, thanks @hidmic. :)

@adamsj-ros

Whatever moves us closer to a general solution for breakpoint debugging from processes spawned from launch. I'll try tmux. I definitely think some examples on index.ros.org would be helpful.

@SteveMacenski
Author

I definitely think some examples on index.ros.org would be helpful.

Would you like to write one? It's pretty simple markdown and the repo for a PR is here. Wouldn't take you more than an hour end-to-end!

@adamsj-ros

Good idea. If I figure it out, I'll definitely make an update.

@SteveMacenski
Author

Here's an example of my GDB tutorial on the Nav2 documentation. As you can see, I also need to update from our xterm conversation as this was written while I was under the impression this was fixable.

@adamsj-ros

could resort to something like tmux. From within a tmux session, use tmux split-window gdb -ex run --args as your launch prefix

Combining this suggestion with the example @SteveMacenski provided, it looks like I have to start tmux first to make this work otherwise I get:

no server running on /tmp/tmux-0/default

@hidmic
Contributor

hidmic commented Nov 4, 2020

Correct. You can follow the second half of the suggestion:

You can even pack everything up and do tmux new-session ros2 launch <package-name> <launch-file> -- it'll open one pane for launch alongside those for the processes you intend to debug.

@adamsj-ros

I'm still working on an elegant way to tmux the two nodes/processes in my launch file. While I was working on that, it occurred to me that the easier way to get a backtrace is to get a core file on crash. If someone wanted to step through the code, the core file obviously wouldn't work.

What @hidmic described, worked in a way for me but I really need to get the right gdb options to make the process run from the start. I thought the run option did that but it wasn't working in my case. Maybe it was something in the setup of tmux. For the purposes of this issue, it's a viable option but didn't end up being the best option for my need.
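
For the core-file route mentioned above, a minimal sketch using Python's standard `resource` module (Unix only). It assumes the limit is raised in the process that starts the nodes, for example at the top of a launch file, since resource limits are inherited by child processes. `enable_core_dumps` is a made-up name.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit (hypothetical helper).

    Child processes started afterwards inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit needs no extra privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = enable_core_dumps()
print("core size limit (soft, hard):", soft, hard)
```

Whether and where a core file is actually written still depends on the system, e.g. the kernel's `/proc/sys/kernel/core_pattern` setting on Linux. Once there is one, `gdb <executable> <core-file>` followed by `bt` gives the backtrace without needing an interactive session under launch.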

@SteveMacenski
Author

FYI, the ROS2 backtrace tutorial now reflects the xterm recommendations: https://navigation.ros.org/tutorials/docs/get_backtrace.html

I'm not sure of the best way to handle this ticket from here. I think we have a sufficient workaround to say that things work, but it's not really what I'd consider to be a "complete" solution.

I'd be OK with closing out this ticket if we think having GDB inline of launch system just is never going to be something possible. Some highly-visible documentation to that effect would be valuable context.

@clalancette
Contributor

I'd be OK with closing out this ticket if we think having GDB inline of launch system just is never going to be something possible.

"Never" is a long time, but at present I don't see a way towards that goal. So definitely "for now".

Some highly-visible documentation to that effect would be valuable context.

I'm willing to put a note somewhere, but I'm not sure where. We have https://index.ros.org/doc/ros2/Tutorials/Launch-system/ , but I'm not sure if that is visible enough. There is also ros2/ros2_documentation#876 , but we haven't merged it yet. @SteveMacenski any opinions?

@SteveMacenski
Author

@adamsj-oci are you open to writing a quick document about this for ROS index? That tmux stuff would be good too!

@clalancette I agree neither of those is really a great place, but the first seems like the better of the two. I think a separate doc on this topic would be helpful. It doesn't have to be very long; it just needs to cover the basics of how to do it with xterm / tmux and include a note about why it doesn't work the way you might expect without them. If Adam isn't able to do it, I can add it to my backlog.

@rcywongaa

rcywongaa commented Feb 13, 2021

From within a tmux session, use tmux split-window gdb -ex run --args as your launch prefix. You'll get one pane for each process that has said prefix. You can even pack everything up and do tmux new-session ros2 launch <package-name> <launch-file> -- it'll open one pane for launch alongside those for the processes you intend to debug.

This doesn't seem to work as tmux split-window runs the command in a bare sh shell (or bypasses the shell completely), thus you lose all the ROS related environment variables which leads to errors like

error while loading shared libraries: librclcpp.so: cannot open shared object file: No such file or directory

I have yet to find a way to override this behavior. In order to have split-window run multiple commands, quotes are required, but there's no way to also include the actual executable in the quote.

On a separate note, I think the biggest hurdle to getting gdb / tmux / gdbgui working with prefix is the fact that the actual executable (.../node --ros-args -r __node:=Node --params-file /tmp/launch_params_cd1tg5ie --params-file /tmp/launch_params_665qxiyz) is blindly concatenated to the prefix, which messes up tools that don't handle multiple unquoted arguments well. I'm wondering, since we are already using a general-purpose language (Python), can't we do some more sophisticated string substitutions to make it easier to inject commands via prefix? At the very least, being able to add quotes around the executable string (.../node --ros-args -r __node:=Node --params-file /tmp/launch_params_cd1tg5ie --params-file /tmp/launch_params_665qxiyz) should make many tools play nicely with this.

I imagine something along the lines of

Node(
    package="...",
    executable="...",
    preprocess_cmd="tmux new-window '/opt/ros/install/env.sh $@'"
)

where $@ will be substituted with .../node --ros-args -r __node:=Node --params-file /tmp/launch_params_cd1tg5ie --params-file /tmp/launch_params_665qxiyz

In fact, it might even be cleaner / easier / make more sense if this preprocessing was done outside of the Node declaration. Something like

node = Node(...)
preprocessed_node = preprocess("some fancy debug command $@", node)
return LaunchDescription([preprocessed_node])

Another potentially simpler approach is to add the suffix so we can do something like

prefix=["tmux new-window '/opt/ros/foxy/env.sh gdb -ex run --args"],
suffix=["'"]
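
To illustrate the concatenation problem, here is a small stand-alone sketch. It assumes that launch splits the prefix the way a shell would and prepends the pieces to the node command; that is an assumption about launch internals, not something confirmed in this thread. `apply_prefix` is a made-up helper and `/path/to/node` is a placeholder.

```python
import shlex


def apply_prefix(prefix, node_cmd):
    """Split the prefix shell-style and prepend it to the node command (assumed behaviour)."""
    return shlex.split(prefix) + list(node_cmd)


# Placeholder node command; a real one would also carry the --params-file arguments.
node_cmd = ["/path/to/node", "--ros-args", "-r", "__node:=Node"]

# With balanced quotes, the quoted part of the prefix stays one argument,
# but the node command still lands outside of it as separate arguments.
argv = apply_prefix("tmux new-window '/opt/ros/foxy/env.sh gdb -ex run --args'", node_cmd)
print(argv)

# A prefix that leaves the quote open, waiting for a suffix to close it,
# cannot be split at all.
try:
    apply_prefix("tmux new-window '/opt/ros/foxy/env.sh gdb -ex run --args", node_cmd)
    unbalanced_ok = True
except ValueError as err:
    unbalanced_ok = False
    print("rejected:", err)
```

Under that assumption, a suffix alone would not be enough: the prefix would have to be kept as a raw string until the whole command line is assembled.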

@hidmic
Contributor

hidmic commented Feb 15, 2021

This doesn't seem to work as tmux split-window runs the command in a bare sh shell (or bypasses the shell completely), thus you lose all the ROS related environment variables

Ahh, yeah, there's a subtle gotcha in there. You must source your workspace in the same environment in which the tmux server is started (see here). Unless you're running many sessions simultaneously, sourcing your workspace before running tmux new-session will do.


re: prefix improvements. Contributions are most welcome, though I will say that I fail to see why the additional complexity (when needed) cannot be handled by a custom script (i.e. prefix="path/to/some/script.sh").
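
As a rough idea of what such a custom script could look like, here is a sketch written in Python rather than shell. It quotes the whole node command into one string and hands it to `tmux split-window`, optionally sourcing a setup file inside the new pane. The `ROS_SETUP_FILE` environment variable and the function name are made up for this example, and it still has to be run from inside a tmux session.

```python
#!/usr/bin/env python3
import os
import shlex
import sys


def build_tmux_command(node_cmd, setup_file=None):
    """Return the tmux argv that runs node_cmd under gdb in a new pane."""
    gdb_cmd = ["gdb", "-ex", "run", "--args", *node_cmd]
    # Quote every argument so the whole command survives as a single string.
    shell_line = shlex.join(gdb_cmd)
    if setup_file:
        # Source the workspace inside the pane (setup_file is a hypothetical path).
        shell_line = f". {shlex.quote(setup_file)} && {shell_line}"
    return ["tmux", "split-window", shell_line]


# When used as a launch prefix, launch appends the node command as arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = build_tmux_command(sys.argv[1:], os.environ.get("ROS_SETUP_FILE"))
    os.execvp(cmd[0], cmd)
```

One caveat with this design: `tmux split-window` returns right away, so launch would see the prefixed process exit immediately even though the node keeps running in its pane.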

@adamsj-ros

@adamsj-oci are you open to writing a quick document about this for ROS index? That tmux stuff would be good too!

I didn't get it working. I decided to use core files instead which solved my immediate problem. I do think there should be an example on index that equals what we had for functionality in ROS1.

@ZhenshengLee

Another potentially simpler approach is to add the suffix so we can do something like

prefix=["tmux new-window '/opt/ros/foxy/env.sh gdb -ex run --args"],
suffix=["'"]

I think the suffix can help with debugging from a coredump, but currently there is no suffix in the launch system, am I right?

@Ryanf55

Ryanf55 commented Jan 13, 2024

I contributed changes to the tutorials to explain how to run tests under gdb:
https://docs.ros.org/en/humble/Tutorials/Intermediate/Testing/CLI.html#debugging-tests-with-gdb

The NAV2 documentation explains how to add a prefix to a Python launch file for gdb, but doesn't include the XML syntax. It's not clear to me why this information lives in NAV2 if it's a generic limitation of launch_ros.

ros2 run works fine, but I can't get XML launch files working with gdb using the same prefix.

<node pkg="grid_map_geo" exec="map_publisher" name="map_publisher" output="screen" launch-prefix="xterm -e gdb -ex run --args" />

Could we document the recommended process for debugging ROS 2 with GDB? Most packages use ros2 launch, and the application complexity makes it hard to figure out how to launch. Many people also use VSCode, which supposedly supports setting breakpoints in nodes run through launch. However, there's nothing in the ROS documentation about the VSCode ROS extension, and I haven't been able to get it working.

Another idea:
Since it's hard to figure out all the arguments passed to a node, why can't Launch offer an option to print out the command that you can copy and paste in your terminal?

For example, what if a launch file that launches two nodes could print out the commands with something like show-cli-gdb-cmd

ros2 launch foo_pkg bar_baz.launch.xml --show-cli-gdb-cmd
>>> ros2 run --prefix 'gdb -ex run --args' foo_pkg bar_node --ros-args -p paramA:=4 -p paramB:=5
>>> ros2 run --prefix 'gdb -ex run --args' foo_pkg baz_node --ros-args -p paramC:=8 -p paramD:=19

Then, a user could just copy-paste that into their terminal and start the node.
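
To make the idea concrete, here is a rough sketch of the formatting step only. It is not an existing launch feature, and `gdb_run_command` is a made-up name. It assembles a copy-pasteable `ros2 run` line from a package, an executable, and a parameter dictionary, with one `-p` per parameter.

```python
import shlex


def gdb_run_command(package, executable, params=None, prefix="gdb -ex run --args"):
    """Format a `ros2 run` command line that starts one node under gdb."""
    parts = ["ros2", "run", "--prefix", prefix, package, executable]
    if params:
        parts.append("--ros-args")
        for name, value in params.items():
            parts += ["-p", f"{name}:={value}"]
    # shlex.join quotes the prefix so it stays a single argument when pasted.
    return shlex.join(parts)


print(gdb_run_command("foo_pkg", "bar_node", {"paramA": 4, "paramB": 5}))
# ros2 run --prefix 'gdb -ex run --args' foo_pkg bar_node --ros-args -p paramA:=4 -p paramB:=5
```

The hard part, collecting the resolved arguments for every node in a launch description, is not covered here.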

@ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/growing-issue-with-ros-documentation/36075/64
