-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROS2 Launch fails to return GDB session for debugging #165
Comments
Maybe the right place is ros2cli, not sure. Some basic tests I did with Also note a new ROS2 tutorial on GDB: ros-navigation/docs.nav2.org#58 to help users without debugging experience get backtraces. This is a common issue in Nav2 where users haven't used it before and this is aimed at a single end-to-end resource to help people through it to help in getting more useful debug information in tickets from junior developers when having issues. |
Same behavior for me on Ubuntu 20.04 (distro built from source up-to-date as of yesterday). |
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-backtrace-tutorial-navigation2/15781/3 |
The -prefix would be very useful. I can see two processes in the foreground. See Sl+ and S+ below. What does that mean? Could that be the reason we do not see the gdb prompt?
|
Apologies for the delay!
Yes, that's expected. Perhaps this is an unfortunate omission in I think it's better if the user explicitly does this. Those
I'd say https://github.com/ros2/launch is the right place. |
@hidmic then that should be outright removed from the design documentation https://design.ros2.org/articles/roslaunch.html
and @sloretz also assumes this should work https://answers.ros.org/question/343326/ros2-prefix-in-launch-file/
also ROS1 http://wiki.ros.org/roslaunch/Tutorials/Roslaunch%20Nodes%20in%20Valgrind%20or%20GDB
I don't think "that's expected" is a legitimate answer. This worked in ROS1 (which was also a python API), this should work in ROS2, and even the design docs and key engineers agree that this is one of the key uses for prefix. I understand there's engineering challenges involved, but this is a critical feature everyone expects to work. Feel free to transfer the issue over there if that's the most appropriate place. |
I don't think it should be removed. Launch prefix does work, and can be used to execute a process in a
I did see that, and that's why I said it may need an update (to reflect the current state of things, that much I didn't say).
It does bring about the question of how this worked in ROS 1. Looking at I understand people are used to |
Can you show that? I'm 95% sure I tried with xterm and also failed to produce a gdb session. I spent a half a day trying a bunch of different combinations between
Lets worry about 2+ sessions after we just get 1 session working properly. I think that's the most common application anyhow: the 80/20 rule applies here. Most uses in my experience are because a specific server is crashing I'm trying to debug it in a launch file of N servers (or a launch file with just a single server, but involving a number of remaps, parameter file loading, and other application specific logic). Pulling out that one server, adding all the CLI remaps, param file, node names, etc is time consuming and error prone - and in some cases impossible to exactly replicate. |
The problem is that this is a specific instance of the more general case of multiple processes trying to use the stdin handle. For instance, what happens if you try to debug a process via I guess we could go ahead and have launch throw an exception at the beginning if there were 2 or more processes that were trying to get input from stdin. |
I don't disagree, but lets just get to the point where we're even at parity with the feature status of ROS1 and meeting the (currently aspirational) documentation that this works for any case. Great is the enemy of good 😉 If getting 1 session working, given we have a reference design in ROS1 for it, takes 20% of the time, it'll fill 80% of the need. I think that's a good compromise to get started. Throwing something would work. Optionally just returning the session from the first that fails. I don't think anyone would run gdb with multiple nodes at the same time unless they were trying to debug a single problem affecting multiple nodes. Otherwise, you're trying to debug |
Yeah, more and better documentation would be good. That's on us. I just opened samsung-ros/gdb_test_pkg#1 with the change I had to make to your repro to have access to the
The thing is that the whole idea of connecting I'd be onboard if the feature was entirely missing and there were no resources to solve it the right way (whatever that is), but there is a way (or so it seems). |
@SteveMacenski let me know if using |
It seems more right than doing nothing and leaving it in its currently bad user state. This is also the exact behavior as in ROS1 so its not as if we're breaking something that was previously different. I think you should be viewing this in a lens of fixing something broken; not breaking something. To your own point, I don't think there's a clean way of dealing with multiple |
I will say that the whole point of ROS 2 was to rethink these things. So just saying that it worked in ROS 1 is not sufficient, in my view. The fact that it meets a user need is a good reason, but we should think about the consequences of doing so. I still have the problem that it is easy to come up with a non-contrived use-case that fails pretty badly. We either need to come up with a way to have multiple processes share the handle (which may be very difficult/impossible as you've pointed out), or we need a way to ensure that the user can't easily shoot themselves in the foot with the functionality. In the meantime, I do believe we have ways to do it, as @hidmic has pointed out. So its not that it is not possible to do, just that it is clunky at the moment. |
I just want some action on this, its extremely disruptive to my workflow to have to constantly rip out nodes from a launch file and manually set remaps, node names, paths to config files, set node options on the commandline, and then transition up my lifecycle nodes (if necessary). This is major quality of life determinant from ROS1 and its clearly intended to be supported from design documents, answers, etc.
You're cherry picking one of many points I made why this should be done. You're right by itself that would be a weak argument, but that's not the major point I was making in this discussion.
I'm re-read the thread, I don't see any suggestions from @hidmic to resolve, can you point that out? The only solution I've seen presented was yours (below):
Which to me, seems like a reasonable starting point to get the largest issue resolved quickly and prevent bad behavior. It also fixes the general issue of having multiple |
Mich opened a PR: samsung-ros/gdb_test_pkg#1 |
There seems to be an assumption that we could easily just "connect stdin to everything" as ROS 1 does it and it would be fine, but that we don't want to do that for pedantic reasons. I don't think this is the case at all. I did look into this when implementing it and it is not trivial. The assumption above is flawed because roslaunch in ROS 1 is not implemented the same way as launch, and in ROS 2's launch we used I actually spent the better part of a week trying to make this use case better out of the box and was unable to make it so. If you, or anyone else, thinks it is possible, please I'd love to see that. But keep in mind it needs to not break other use cases and needs to be reliable and it needs to consider things like "sending stdin to all process could cause very confusing behavior for users". It turns out doing it right is very tricky. I'm fairly well convinced that connecting all processes to stdin all the time is a bad idea. A few more quick replies:
Some ideas:
|
I view So then moving on from the "lets fix it!" camp to the "lets document it!" camp: This should just be really explicit then that ROS2 launch does not, and will not, support GDB in terminal session, regardless of the number of nodes and distribution of tasks. Maybe even a [thing] in
I think I have the same feeling about that as you, this is probably unnecessary. I think if the above documentation is created, just point out xterm as the tool of choice and that largely resolves the issue of unintentionally running into the dark.
I kind of like this option as for users that want to handle their own signals, but I agree that its likely going to be misused and largely create more issues than it solves. |
I agree we need a very big sign saying that |
Its inadequate unless you stop/correct people from doing impossible things and heavily documenting the presence of this issue invalidating some previously common workflows. William's comment makes it clear that it doesn't work, nor will it probably ever work. Changing workflows is fine, when there's a large technical benefit from doing so (like, requiring 500 hours of engineering time for a small feature), and the old workflow is clearly denoted as now invalid (documentation and exceptions saying that you can't do that, preferably with a URL to the docs about what to do instead). Else, like what I ran into, folks end up spinning their wheels not knowing what we're trying to do is impossible but previously common (and documentation implies that it should work). |
I don't think We can make note for |
Ok, I see. I just wanted to make sure there wasn't any other technical issue that'd prevent you from using it. I fully agree documentation is lacking. Contributions are always welcomed :) |
Sure thing 😉 I’d be happy to write up some documentation for ROS Index on top of my GDB tutorial on navigation.ros.org. Maybe add a note to the design doc? The launch side of things, throwing an exception when running gdb in the same session, I have markedly less experience with. I’d be happy to give it a stab if you can point me to the right direction, else I’d also be fine if someone knowing this codebase better did it as well. |
I think |
Using As for containers and such you could use another solution to control focus, like screen or remix maybe. But I don’t see it as a launch specific problem to solve. We can add helpers if good solutions come up from users. |
I understand what you're saying about The X service sometimes requires a lot more setup and configuration than a command line debugger that comes with the toolchain the developer is already using. At least Some systems including embedded and realtime often don't even have the option to run |
I think you're missing the point, my point is that xterm is linux specific just like the debugging tools. It's different than the fact that xterm needs X.org setup, sure, but it's a similar kind of non-portable things you might put in your launch file. It might work on your machine, but not others. For the fact that xterm needs X.org, you could replace xterm with any number of other tools that allow you to control the focus of your input, like other graphical solutions like In the end, |
@wjwwood you beat me to it 😂 As William mentioned, @adamsj-oci for your use case i.e. a single (remote?) terminal in a machine that has no X server running, you could resort to something like |
Definitely something worth adding to the examples I think, thanks @hidmic. :) |
Whatever moves us closer to a general solution for breakpoint debugging from processes spawned from launch. I'll try |
Would you like to write one? It's pretty simple markdown and the repo for a PR is here. Wouldn't take you more than an hour end-to-end! |
Good idea. If I figure it out, I'll definitely make an update. |
Here's an example of my GDB tutorial on the Nav2 documentation. As you can see, I also need to update from our xterm conversation as this was written while I was under the impression this was fixable. |
Combining this suggestion with the example @SteveMacenski provided, it looks like I have to start tmux first to make this work otherwise I get:
|
Correct. You can follow the second half of the suggestion:
|
I'm still working on an elegant way to tmux the two nodes/processes in my launch file. While I was working that, it occured to me that the easier way to get a back trace is to get a core file on crash. If someone would want to step through, the core file obviously wouldn't work. What @hidmic described, worked in a way for me but I really need to get the right |
FYI, the ROS2 backtrace tutorial now reflects the I'm not sure the best way to handle this ticket from here. I think we have a sufficient work around to say that things work, however not really what I'd consider to be a "complete" solution. I'd be OK with closing out this ticket if we think having GDB inline of launch system just is never going to be something possible. Some highly-visible documentation to that effect would be valuable context. |
"Never" is a long time, but at present I don't see a way towards that goal. So definitely "for now".
I'm willing to put a note somewhere, but I'm not sure where. We have https://index.ros.org/doc/ros2/Tutorials/Launch-system/ , but I'm not sure if that is visible enough. There is also ros2/ros2_documentation#876 , but we haven't merged it yet. @SteveMacenski any opinions? |
@adamsj-oci are you open to writing a quick document about this for ROS index? That tmux stuff would good too! @clalancette I agree neither of those are really great places, but the first look seems like the best of the two. I think a separate doc on this topic would be helpful, it doesn't have to be very long, just cover the basics of how to do it with xterm / tmux and a note about why it doesn't work the way you might expect without them. If Adam isn't able to do it, I can add it to my backlog. |
This doesn't seem to work as
I have yet to find a way to override this behavior. In order to have On a separate note, I think the biggest hurdle to getting I imagine something along the lines of
where In fact, it might even be cleaner / easier / make more sense if this preprocessing was done outside of the Node declaration. Something like
Another potentially simpler approach is to add the
|
Ahh, yeah, there's a subtle gotcha in there. You must source your workspace in the same environment in which the re: prefix improvements. Contributions are most welcomed, though I will say that I fail to see why the additional complexity (when needed) cannot be handled by a custom script (i.e. |
I didn't get it working. I decided to use core files instead which solved my immediate problem. I do think there should be an example on index that equals what we had for functionality in ROS1. |
I think the |
I contributed changes to the tutorials to explain how to run tests under gdb: The NAV2 documentation explains how to add a prefix to a python launch file for gdb, but doesn't include the XML syntax. It's not clear to me why this information lives in NAV2, if it's a generic limitation to
<node pkg="grid_map_geo" exec="map_publisher" name="map_publisher" output="screen" launch-prefix="xterm -e gdb -ex run --args" > Could we document the recommended process on how to debug ROS 2 with GDB? Most packages are using Another idea: For example, what if a lanch file that launches two nodes could print out the commands wih something like
Then, a user could just copy-paste that into their terminal and start the node. |
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/growing-issue-with-ros-documentation/36075/64 |
Bug report
Required Info:
Steps to reproduce issue
clone https://github.com/samsung-ros/gdb_test_pkg, build, and run launch file, e.g.
This package is a simple ROS2 package template with 1 trivial node only containing functions to cause crashes to display this issue
Expected behavior
Crashes to occur then gdb prompt to be returned. From that prompt able to get a traceback / see info.
Actual behavior
Crashes occur as an inferior thread and gdb prompt is never returned. For packages with clean exits normally, using the GDB prefix fails to cleanly exit resulting in a
SIGTERM
and some errors. Ultimately exits but doesn't provide with a GDB session to get a backtrace.the only way I'm able to get a backtrace or gdb prompt is to bypass
prefix
altogether and call direct install path executables. E.g.which makes
prefix
not very useful and all the instructions for ROS2 debugging are incorrect. (https://answers.ros.org/question/267261/how-can-i-run-ros2-nodes-in-a-debugger-eg-gdb/ and https://answers.ros.org/question/343326/ros2-prefix-in-launch-file/).
Additional information
I provide 3 different crash methods to show that in all cases I don't get a clean exist or able to get a GDB prompt in the most simple of nodes.
Select which you'd like to trigger by uncommenting it.
The text was updated successfully, but these errors were encountered: