ROS 2 Launch System #163

Merged
merged 18 commits into from
Sep 18, 2019

wjwwood
Member

@wjwwood wjwwood commented Feb 13, 2018

This is a WIP. It's not ready for review or comment right now (not even in a readable state I would say).

When it's ready for an early review I'll post a comment. When it's ready for wider review I'll post an RFC on discourse.ros.org.

@wjwwood wjwwood added the in progress Actively being worked on (Kanban column) label Feb 13, 2018
@wjwwood wjwwood self-assigned this Feb 13, 2018
@jack-oquin

The current description abstracts the actual syntax of the "system configuration". That's nice at this design stage.

However, I find myself intensely curious whether XML, Python or some other syntax is under consideration.

@gbiggs
Member

gbiggs commented Mar 1, 2018

Are we reviewing this now? I was holding off until @wjwwood said go for it, but would be happy to start commenting. 😄

@jack-oquin

I am commenting because I find the document and "RFC" sections worth discussing, and probably because I don't know any better. 😄

@wjwwood
Member Author

wjwwood commented Mar 1, 2018

> Are we reviewing this now? I was holding off until @wjwwood said go for it, but would be happy to start commenting. 😄

I don't mind comments now, but it's not "ready" for review yet. I'm trying to push out a completed document as soon as possible, at which point I will solicit feedback actively, first on the pr and then on discourse. But until then, feel free to discuss, though I might not get back to you right away.

@wjwwood
Member Author

wjwwood commented Mar 2, 2018

Good morning :), I pushed a set of changes for the "context" section, where it compares the launch system to ROS 1 and considers what might be different. I'm close to but not quite finished with the system description and event sections, at which point I think it will be in a relatively good place to start discussion on the PR (still not quite ready for discourse though).

@jack-oquin I'll try to respond to your comments asap, but I'll have to switch gears a bit before I have time to do that.

@jack-oquin

No hurry on my account.

@stonier

stonier commented Mar 14, 2018

Configuring the Quality of Service for Connections (reliable, unreliable, ...)

In ROS1 you could only do this in the code itself. It would have been awesome if you could configure this at the launch level. You could rewire connections, but could not reconfigure the kind of connection. That always struck me as odd.

We did, however, manage to get by on workarounds. If it was our code, we parameterised the node to provide QoS configuration. If it wasn't, we got by with relays. For the product, though, we wrote specialised nodes to handle a conglomeration of relays with various capabilities (subsampling, QoS, ...) to minimise the impact of so many extra processes. We even had ghosts on the server to manage the unreliable connections to the robot and present an API to the rest of the server that was always 'up'.

How hard would this be to do for ROS2's launch system? In scope?

@wjwwood
Member Author

wjwwood commented Mar 20, 2018

> In ROS1 you could only do this in the code itself. It would have been awesome if you could configure this at the launch level. You could rewire connections, but could not reconfigure the kind of connection. That always struck me as odd.

So, DDS does have a way to do this with XML files which can affect the QoS of entities in a program or globally. And we could do something similar.

However, I personally dislike this kind of runtime configuration, because when the person writing the code originally is not the person doing the system integration, changing QoS settings might break expectations.

As an example, a developer might create a node with a publisher which expects certain blocking behavior from the publish call (e.g. non-blocking). If you change the QoS settings, the publish call's blocking behavior might change, which breaks the expectation of the original programmer and thus subtly breaks the program itself.

Another example of this might be that a developer creates a program which pre-allocates 10 messages and sets the publish history to keep last and depth 10. If you externally change that to say 100, then maybe that breaks the program because now the history is larger than the resources pre-allocated by the developer.

Remapping of topic names is similar, and so the previous paragraphs might sound hypocritical, but I would argue that the important difference is that changing a topic name doesn't change the behavior of any of the APIs in ways that might cause the program to break (as far as I can imagine).

> We did however, manage to get by on workarounds. If it was our code, we parameterised the node to provide QoS configuration. If it wasn't, we got by with relays. Though for the product, we wrote specialised nodes to handle a conglomeration of relays with various capabilities (subsampling, QoS, ...) to minimise the impact of so many extra processes.

I actually very much prefer this approach of developer-defined configurations which system integrators can tweak. By only allowing changes to settings the developer exposes, the developer can assume the others are not changing under their feet (which avoids the subtle changes in behavior when QoS changes). For the settings they do expose, they can also add constraints: e.g. if the developer wanted to make the depth configurable, they could expose a ROS parameter for it and constrain it to the range 10-100.
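
A minimal sketch of the developer-exposed, constrained setting described above, written in plain Python so it does not depend on any client library. The name `resolve_history_depth`, the module-level bounds, and the choice to reject out-of-range values rather than clamp them are assumptions for illustration; none of this is an rclcpp or rclpy API.

```python
"""Sketch: a developer-exposed, range-constrained history depth setting."""

# Hypothetical bounds chosen by the node's developer (10-100, as in the
# example above). They are illustrative, not values from any ROS 2 API.
MIN_DEPTH = 10
MAX_DEPTH = 100


def resolve_history_depth(requested=None, default=MIN_DEPTH):
    """Return the history depth to use, validating an integrator's override.

    `requested` is the value a system integrator supplied (for example via a
    parameter); None means "use the developer's default".
    """
    if requested is None:
        return default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError(
            f"history depth must be an int, got {type(requested).__name__}"
        )
    if not MIN_DEPTH <= requested <= MAX_DEPTH:
        raise ValueError(
            f"history depth {requested} is outside the supported range "
            f"[{MIN_DEPTH}, {MAX_DEPTH}]"
        )
    return requested


def preallocate_messages(depth):
    """Pre-allocate one buffer per history slot (placeholder payloads)."""
    return [bytearray(64) for _ in range(depth)]


# The pool is sized from the validated depth, so the two cannot drift apart.
depth = resolve_history_depth(50)
pool = preallocate_messages(depth)
```

Because the pool size is derived from the validated depth, an integrator's override either stays within what the developer planned for or fails loudly at startup.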

As for relays which rename, throttle, or downsample, I think having these as reusable nodes which can be added to the process of other user-defined nodes is a better approach. In ROS 1 you had to have a separate process for each, and that caused a lot of overhead.

So that's a bit opinionated from me about the way systems should be constructed, but I'm open to being convinced that runtime configuration of QoS is a good idea.

> How hard would this be to do for ROS2's launch system? In scope?

It would be hard as described because we have no way to control this right now in the C++/Python API. Like if a developer hard codes the history depth to 10 with keep last, then we have no way to intercept that and change it as it is being created. We could add that, it just isn't planned as far as I know.

I do think this is out of scope for the launch system, because the only impact it would have on the launch system is whether or not it's represented in the launch description and, if so, how it is communicated to the process containing the node. So I'd mention it here after we decide to support that use case, but I'd put it in the same bucket as static remapping and parameters, which are mentioned here but are not part of the actual launch system itself.

@stonier

stonier commented Mar 20, 2018

Going to take a step back and say I completely agree with you. That feature thought was from a few years ago, yet when working on the product, we aggressively found ways to very carefully manage robot-server connections using a variety of methods, only some of which I have mentioned above. If I think to where we might apply roslaunch QoS configuration, it would be for almost trivial situations and not for where the critical problems lie.

These machines all have SSH, which is the mechanism which is specifically called out to be used when launching processes on remote machines.
It also played a role in defining what you specified and how when configuring `roslaunch` from ROS 1 to be able to launch processes on remote machines.

In ROS 2, Windows has been added to the list of targeted platforms, and as of the writing of this document it does not support SSH natively.

Would requiring the execution of a launch agent process on each additional available machine be a good solution for this? The agent could register its hostname, ip etc. (via a normal ros topic maybe?), which could then be used by the primary launch processes to delegate launches on other machines.

Would people be OK with each additional computer requiring some sort of ros2 launch client running? There could be some system that allows waiting for worker clients, handling events on an expected 2nd(+) computer failing to be present (e.g. timeout, connection loss).

Another option would be to allow launching the remote launch agent remotely (like Jenkins does with its agents). That way we could have a ssh launched worker on linux/mac, psexec launched processes on Windows. This does raise further questions about cross platform support.
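
As a rough illustration of the OS-specific handler idea, here is a small Python sketch that only builds the command line for starting an agent remotely. The executable name `ros2_launch_agent` is hypothetical (no such program is defined in this discussion), and the platform labels are assumptions for the example.

```python
"""Sketch: choose an OS-specific command to start a remote launch agent."""

# Hypothetical name of the agent executable; it is a placeholder only.
AGENT_COMMAND = "ros2_launch_agent"


def remote_agent_command(host, platform_name):
    """Return the argv list that would start the agent on `host`.

    `platform_name` describes the remote machine: "linux", "macos" or
    "windows" (labels assumed for this sketch).
    """
    if platform_name in ("linux", "macos"):
        # ssh <host> <command> runs the command on the remote machine.
        return ["ssh", host, AGENT_COMMAND]
    if platform_name == "windows":
        # PsExec addresses the remote computer as \\<host>.
        return ["psexec", "\\\\" + host, AGENT_COMMAND]
    raise ValueError(
        f"no remote launcher known for platform {platform_name!r}"
    )
```

The sketch deliberately stops at building the argument list; credentials, environment setup, and error handling are where the cross-platform questions would actually show up.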

If the default behavior is that the agent is launched somehow (daemon, cron, service, manually, etc.) on each secondary machine, cross-platform behavior would be identical. This would be portable, but a departure from the convenience of ros1's ability to launch everything at once (despite the shortcomings of that specific implementation in error handling and recovery).

If there is a system to automatically launch such agents from the primary machine via an OS specific handler, things might get very complicated to implement and test (windows->linux launches for example).
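
To make the pre-started agent idea more concrete, here is a minimal Python sketch of the bookkeeping the primary launch process might do: record agent announcements, wait for expected machines with a timeout, and detect agents that have gone quiet. The class `AgentRegistry` and all of its methods are hypothetical; how announcements actually arrive (a ROS topic or otherwise) is left out, and agents are assumed to re-announce periodically.

```python
"""Sketch: a primary launch process tracking pre-started launch agents."""

import time


class AgentRegistry:
    """Tracks which launch agents have announced themselves (hypothetical)."""

    def __init__(self, expected_hosts, clock=time.monotonic):
        self._expected = set(expected_hosts)
        self._agents = {}  # hostname -> {"ip": ..., "last_seen": ...}
        self._clock = clock

    def register(self, hostname, ip):
        """Record an announcement and refresh the host's last-seen time."""
        self._agents[hostname] = {"ip": ip, "last_seen": self._clock()}

    def missing(self):
        """Expected hosts that have not announced themselves yet."""
        return sorted(self._expected - set(self._agents))

    def lost(self, timeout):
        """Registered hosts whose last announcement is older than `timeout`."""
        now = self._clock()
        return sorted(
            host
            for host, info in self._agents.items()
            if now - info["last_seen"] > timeout
        )

    def wait_for_agents(self, timeout, poll_interval=0.1, sleep=time.sleep):
        """Wait until every expected host registers or `timeout` elapses.

        Returns the hosts still missing; an empty list means all are present.
        """
        deadline = self._clock() + timeout
        while self.missing() and self._clock() < deadline:
            sleep(poll_interval)
        return self.missing()
```

In this sketch `register` would be called from whatever receives the announcements (for example a subscription callback on another thread), while the launch process uses `wait_for_agents` before delegating and `lost` to raise a connection-loss event.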

sloretz and others added 3 commits February 27, 2019 14:24
* Proposal for dynamically composed nodes

* allow multiple extra_arguments

* Allow node_name and namespace to be empty

* Human readable error message

* Update articles/150_roslaunch.md

Co-Authored-By: sloretz <shane.loretz@gmail.com>

* Assign nodes unique ids, but still forbid duplicates

* Update articles/150_roslaunch.md

Co-Authored-By: sloretz <shane.loretz@gmail.com>

* Update articles/150_roslaunch.md

Co-Authored-By: sloretz <shane.loretz@gmail.com>

* Section to list

* More generic wording about container processes

* namespace -> node_namespace

* _launch/ -> ~/_container/

Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
ivanpauno and others added 5 commits July 29, 2019 14:36
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
Signed-off-by: William Woodall <william@osrfoundation.org>
Signed-off-by: William Woodall <william@osrfoundation.org>
Signed-off-by: William Woodall <william@osrfoundation.org>
* Add launch XML substitution for a packages share directory

Rename find-pkg to find-pkg-prefix.
Add find-pkg-share substitution for the share directory.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@ivanpauno
Member

I don't know if this is ready to be merged or not, but it would be good if we merge ros2 launch related design documents.
Ongoing discussions could continue here after being merged, or in follow-up issues.

@wjwwood if you think a review is needed before merging, let me know and I will do it.

@wjwwood
Member Author

wjwwood commented Sep 17, 2019

Let me give it one more once-over. I was planning on merging it a few weeks ago and it got lost under the avalanche.

Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood
Member Author

wjwwood commented Sep 18, 2019

Thanks for the feedback everyone. I'm going to merge this so we can continue to build on it with more pull requests. Feel free to continue discussion here or open new pull requests as needed to extend this document.

@wjwwood wjwwood merged commit 478b1b1 into gh-pages Sep 18, 2019
@delete-merged-branch delete-merged-branch bot deleted the roslaunch branch September 18, 2019 18:32
@KenwoodFox

Did ros2/launch#31 ever get looked at here? I noticed it had a few comments linking back. robot_upstart was for ROS 1, iirc; is it still the recommended course of action to leverage things like initd or systemd for running launch files at boot?
