Various edits (#95)
* Add ROSCon 2015 slides

* Update instructions for running markdown linters

* Edits

* Edits

* ROS2 -> ROS 2

* Fixup

* Add link to present-day implementation details

* Copyedit

* Add categories to index (only one-worded titles atm)

* Add link to ros2 page in case someone stumbles upon this site
dhood authored and wjwwood committed Oct 7, 2016
1 parent 7d27e4e commit 580664f
Showing 14 changed files with 171 additions and 118 deletions.
2 changes: 1 addition & 1 deletion _config.yml
@@ -1,4 +1,4 @@
-name: ROS2 Design
+name: ROS 2 Design
description: "Distilled design documents related to the ROS 2.0 effort"

url: http://design.ros2.org
5 changes: 3 additions & 2 deletions articles/010_why_ros2.md
@@ -6,6 +6,7 @@ abstract:
This article captures the reasons for making breaking changes to the ROS API, hence the 2.0.
published: true
author: Brian Gerkey
+categories: Overview
---

{:toc}
@@ -26,7 +27,7 @@ In this article we will explain why.

ROS began life as the development environment for the Willow Garage PR2 robot.
Our primary goal was to provide the software tools that users would need to undertake novel research and development projects with the PR2.
-At the same time, we knew that the PR2 would not be the only, or even the most important robot in the world, and we wanted ROS to be useful on other robots.
+At the same time, we knew that the PR2 would not be the only, or even the most important, robot in the world, and we wanted ROS to be useful on other robots.
So we put a lot of effort into defining levels of abstraction (usually through message interfaces) that would allow much of the software to be reused elsewhere.

Still, we were guided by the PR2 use case, the salient characteristics of which included:
@@ -39,7 +40,7 @@ Still, we were guided by the PR2 use case, the salient characteristics of which
- maximum flexibility, with nothing prescribed or proscribed (e.g., "we don't wrap your main()").

It is fair to say that ROS satisfied the PR2 use case, but also overshot by becoming useful on a surprisingly wide [variety of robots](http://wiki.ros.org/Robots).
-Today we see ROS used not only on the PR2 and robots that are similar the PR2, but also on wheeled robots of all sizes, legged humanoids, industrial arms, outdoor ground vehicles (including self-driving cars), aerial vehicles, surface vehicles, and more.
+Today we see ROS used not only on the PR2 and robots that are similar to the PR2, but also on wheeled robots of all sizes, legged humanoids, industrial arms, outdoor ground vehicles (including self-driving cars), aerial vehicles, surface vehicles, and more.

In addition, we are seeing ROS adoption in domains beyond the mostly academic research community that was our initial focus.
ROS-based products are coming to market, including manufacturing robots, agricultural robots, commercial cleaning robots, and others.
91 changes: 49 additions & 42 deletions articles/020_ros_with_dds.md

Large diffs are not rendered by default.

30 changes: 14 additions & 16 deletions articles/030_ros_with_zeromq.md
@@ -19,8 +19,6 @@ author: '[William Woodall](https://github.com/wjwwood)'
</div>

> This document pre-dates the decision to build ROS 2 on top of DDS.
->
-> This article could use additional details, feel free to propose changes.
Original Author: {{ page.author }}

@@ -40,27 +38,27 @@ Since ROS 1.x was designed, there have been several new libraries in these compo
### Discovery

For discovery the first solution that was investigated was [Zeroconf](http://en.wikipedia.org/wiki/Zero_configuration_networking) with Avahi/Bonjour.
-Some simple experiments were conducted which used [pybonjour](https://code.google.com/p/pybonjour/) to try out using the zeroconf system for discovery.
+Some simple experiments were conducted which used [pybonjour](https://code.google.com/p/pybonjour/) to try out using the Zeroconf system for discovery.
The core technology here is `mDNSresponder`, which is provided by Apple as free software, and is used by both Bonjour (OS X and Windows) and Avahi (Linux, specifically avahi-compat).

These Zeroconf implementations, however, proved to not be so reliable with respect to keeping a consistent graph between machines.
Adding and removing more than about twenty items at a time from subprocesses typically resulted in inconsistent state on at least one of the computers on the network.
One particularly bad case was the experiment of removing items from Zeroconf, where in several "nodes" were registered on machine A and then after a few seconds shutdown cleanly.
-The observed behavior on remote machines B and C was that the zeroconf browser would show all "nodes" as registered, but then after being shutdown only some would be removed from the list, resulting in "zombie nodes".
+The observed behavior on remote machines B and C was that the Zeroconf browser would show all "nodes" as registered, but then after being shutdown only some would be removed from the list, resulting in "zombie nodes".
Worse still is that the list of "zombie nodes" were different on B and C.
This problem was only observed between machines using avahi as a compatibility layer, which lead into a closer look into avahi and its viability as a core dependency.
-This closer look at avahi raised some concerns about the quality of the implementation with respect to the [Multicast DNS](http://en.wikipedia.org/wiki/Multicast_DNS) and [DNS Service Discovery](http://en.wikipedia.org/wiki/Zero_configuration_networking#Service_discovery) technology.
+This closer look at avahi raised some concerns about the quality of the implementation with respect to the [Multicast DNS](http://en.wikipedia.org/wiki/Multicast_DNS) and [DNS Service Discovery (DNS-SD)](http://en.wikipedia.org/wiki/Zero_configuration_networking#Service_discovery) technology.

-Further more DNS-SD seems to prefer the trade-off of light networking load for eventual consistency.
+Furthermore, DNS-SD seems to prefer the trade-off of light networking load for eventual consistency.
This works reasonably well for something like service name look up, but it did not work well for quickly and reliably discovering the proto-ROS graph in the experiments.
-This lead to the development of a custom discovery system which is implemented in a few languages as part of the prototype here:
+This led to the development of a custom discovery system which is implemented in a few languages as part of the prototype here:

[https://bitbucket.org/osrf/disc_zmq/src](https://bitbucket.org/osrf/disc_zmq/src)

The custom discovery system used multicast UDP packets to post notifications like "Node started", "Advertise a publisher", and "Advertise a subscription", along with any meta data required to act, like for publishers, an address to connect to using ZeroMQ.
The details of this simple discovery system can be found at the above URL.

-This system, though simple, was quite effective and was sufficient to prove that implementing such a custom discovery system, even in multiple languages is a tractable problem.
+This system, though simple, was quite effective and was sufficient to prove that implementing such a custom discovery system, even in multiple languages, is a tractable problem.

### Data Transport

@@ -79,8 +77,8 @@ In this prototype:
[https://bitbucket.org/osrf/disc_zmq/src](https://bitbucket.org/osrf/disc_zmq/src)

ZeroMQ was used as the transport, which conveniently has bindings in C, C++, and Python.
-After making discoveries using the above described simple discovery system, connections were made using ZeroMQ's `ZMQ_PUB` and `ZMQ_SUB` socket's.
-This worked quite well, allowing for communication between process in an efficient and simple way.
+After making discoveries using the above described simple discovery system, connections were made using ZeroMQ's `ZMQ_PUB` and `ZMQ_SUB` sockets.
+This worked quite well, allowing for communication between processes in an efficient and simple way.
However, in order to get more advanced features, like for instance latching, ZeroMQ takes the convention approach, meaning that it must be implemented by users with a well known pattern.
This is a good approach which keeps ZeroMQ lean and simple, but does mean more code which must be implemented and maintained for the prototype.

@@ -101,16 +99,16 @@ First, there isn't any existing discovery systems which address the needs of the
Implementing a custom discovery system is a possible but time consuming.

Second, there is a good deal of software that needs to exist in order to integrate discovery with transport and serialization.
-For example, the way in which connections are established, whether using point to point or multicast is a piece of code which lives between the transport and discovery systems.
-Another example is the efficient intra-process communications, ZeroMQ provides an INPROC socket, but the interface to that socket is bytes, so you cannot use that without constructing a system where you pass around pointers through INPROC rather than serialized data.
+For example, the way in which connections are established, whether using point to point or multicast, is a piece of code which lives between the transport and discovery systems.
+Another example is the efficient intra-process communications: ZeroMQ provides an INPROC socket, but the interface to that socket is bytes, so you cannot use that without constructing a system where you pass around pointers through INPROC rather than serialized data.
At the point where you are passing around pointers rather than serialized data you have to start to duplicate behavior between the intraprocess and interprocess communications which are abstracted at the ROS API level.
-One more piece of software which is needed is the type-safety system which works between the transport and the messages serialization system.
-Needless to say, even with these component libraries solving a lot of the problems with implementing a middleware like ROS's, there still exists quite a few glue pieces which are need to finish the implementation.
+One more piece of software which is needed is the type-safety system which works between the transport and the message serialization system.
+Needless to say, even with these component libraries solving a lot of the problems with implementing a middleware like ROS's, there still exists quite a few glue pieces which are needed to finish the implementation.

Even though it would be a lot of work to implement a middleware using component libraries like ZeroMQ and Protobuf, the result would likely be a finely tuned and well understood piece of software.
This path would most likely give the most control over the middleware to the ROS community.

-In exchange for the creative control over the middleware, comes the responsibility to document its behavior and design to the point that it can be verified and reproduced.
+In exchange for the creative control over the middleware comes the responsibility to document its behavior and design to the point that it can be verified and reproduced.
This is a non-trivial task which ROS 1.x did not do very well because it had a relatively good pair of reference implementations.
-Many users which wish to put ROS into mission critical situations and into commercial products have lamented that ROS lacks this sort of governing design document which allows them to certify and audit the system.
+Many users that wish to put ROS into mission critical situations and into commercial products have lamented that ROS lacks this sort of governing design document which allows them to certify and audit the system.
It would be of paramount importance that this new middleware be well defined, which is not a trivial task and almost certainly rivals the engineering cost of the initial implementation.
30 changes: 15 additions & 15 deletions articles/050_ros_rpc_design.md
@@ -16,9 +16,9 @@ published: true

<div class="alert alert-warning" markdown="1">
This article is out-of-date.
-It was written at a time before decisions were made to use DDS and RTPS as the underlying communication standards.
+It was written at a time before decisions were made to use DDS and RTPS as the underlying communication standards for ROS 2.
It represents an idealistic understanding of what RPC and "actions" should be like in ROS.
-It can be considered memoranda and not necessarily the intention on how to develop the system.
+It can be considered memoranda and not necessarily the intention of how to develop the system.
</div>

<div class="abstract" markdown="1">
@@ -28,11 +28,11 @@ It can be considered memoranda and not necessarily the intention on how to devel
Original Author: {{ page.author }}

In ROS there are two types of Remote Procedure Call (RPC) primitives.
-ROS Services are basic request-response style RPC's, while ROS Actions additionally are preemptible and offer feedback while requests are being processed.
+ROS Services are basic request-response style RPCs, while ROS Actions additionally are preemptible and offer feedback while requests are being processed.

## Ideal System

-It is useful to consider the ideal system to understand how it relates to the current system and how a new system could work.
+It is useful to consider the ideal system to understand how it relates to the current ROS 1.x system and how a new system could work.
An ideal RPC system would have the qualities laid out in the following paragraphs.

### Asynchronous API
@@ -47,7 +47,7 @@ Having a timeout allows for recovery behavior in the case of failure conditions

### Preemptibility

-Preemption is a desirable feature whenver there may be long-running or non-deterministically running remote procedures.
+Preemption is a desirable feature whenever there may be long-running or non-deterministically running remote procedures.
Specifically, we want the ability to preempt a long-running procedure with either a timeout on synchronous requests or an explicit call to cancel on asynchronous requests.
Preemptibility is a required feature for the concept of Actions to be implemented (which is one reason that Actions are built on asynchronous ROS Messages instead of synchronous ROS Services).

@@ -67,13 +67,13 @@ In ROS 1.x, this lack of reliability has been a problem for ROS Actions, e.g., w

When logging a ROS 1.x system (e.g., using `rosbag`), recording data transmitted on topics is insufficient to capture any information about service calls.
Because service calls are conceptually point to point, rather than broadcast, logging them is difficult.
-Still, it should be possible to efficiently record some level of detail regarding RPC interactions, such that they could be later played back in some manner (though it it not clear exactly how playback would work).
+Still, it should be possible to efficiently record some level of detail regarding RPC interactions, such that they could be later played back in some manner (though it is not clear exactly how playback would work).

## Proposed Approach

The features outlined above are desirable but if provided as a monolithic implementation will be much more complicated than necessary for most use cases.
E.g., feedback is not always required, but in a monolithic system it would always be an exposed part of the API.
-We propose four levels of abstraction into which the above features can be sorted, wich each higher level providing more functionality to the user.
+We propose four levels of abstraction into which the above features can be sorted, with each higher level providing more functionality to the user.

![ROS RPC Higherarchy](/img/ros_rpc_design/rpc_diagram.png)

@@ -92,16 +92,16 @@ For logging/introspection purposes the RPC Server instance might publish all inc

### ROS Preemptible RPC API

-The ROS preemptible RPC API will extend the Asynchronous API to enable preemption of an RPC in progress using a UID.
+The ROS preemptible RPC API will extend the Asynchronous API to enable preemption of an RPC in progress using a unique identifier (UID).
This UID will be provided by the initial request method.

-### ROS Action RPC API (Not effecting RPC Protocol)
+### ROS Action RPC API (Not Affecting RPC Protocol)

The feedback topic can be isolated to a separate topic, which avoids integrating the feedback into the core RPC implementation.
The ROS Action RPC API will extend the preemptible RPC API to provide a feedback channel via published ROS topic.
This can be built on top of the preemptible RPC API with the PubSub API thus isolating it from the RPC design.

-### ROS Synchronous RPC API (Not Effecting RPC Protocol)
+### ROS Synchronous RPC API (Not Affecting RPC Protocol)

For each of the above Asynchronous APIs a thin wrapper can be built on top to provide a single function-based interface for ease of use.
It will block until a response is returned or the timeout is reached.
@@ -112,9 +112,9 @@ This will just be a thin layer on top of the Asynchronous API requiring no addit

There are some issues with the above proposed approach, which are outlined below.

-### Visibility of UID's
+### Visibility of Unique Identifiers

-UID's are generally necessary for asynchronous communications to pair the requests and the responses.
+UIDs are generally necessary for asynchronous communications to pair the requests and the responses.
There are possible ways to build this without a UID embedded in the data type, however it will require some level of heuristics to do data association.

There are two options: (i) require the user to embed the UID into the message, or (ii) add those fields automatically at message-generation time.
@@ -125,7 +125,7 @@ This also introduces issues when trying to record and potentially play back Serv

Should there be a separate `.action` file type?
Or should it be more like a `.srv` + `.msg` pair?
-This is highly influenced by the way UID's are handled.
+This is highly influenced by the way UIDs are handled.

### Logging

@@ -140,8 +140,8 @@ which raises the question of how to embed this association without significantly
This is a generic issue with logging and affects potentially all logging and should be captured in a separate article.
It might be possible to pad communications with debugging data.

-The above UID's may be only locally unique (client-specific for instance).
-For logging, UID's need to be unique within the entire ROS instance to support debugging.
+The above UIDs may be only locally unique (client-specific for instance).
+For logging, UIDs need to be unique within the entire ROS instance to support debugging.

### Collapse Preemptible and Asynchronous
