Design the World Model #565
Comments
At a high level I don't see anything here that I'd call a show stopper, but it also seems like this design is trying to bite off a lot of things in order to be general and extendable for everybody. I might think about reducing scope to get something out there that works and is relatable to existing technologies to build off of. The only bit of input I'd add is that in the "core representation" I believe there is no single correct way of representing the space (costmap, grid map, traversability map, etc.); it should instead be a combination of all of them, such that the costmap is really a vector of these items stacked on top of each other. Different applications can utilize different representations as they like, but the data filled in populates the costmap, and the traversability map, and so on, based on whatever the application warrants.
From talking about this again this week I had another thought which I'm sure others have had but maybe not put into words: we would like to not have to redo planning and control algorithms for different representations of costmaps. Rather than templating everywhere and trying to make everyone happy while staying readable, I think a good approach would be adaptors. For example, right now we go costmap -> planner. Now we might go costmap -> adaptor -> planner, and similarly traversability map -> adaptor -> planner. This lets us generalize all the new planning and control algorithms independent of the implementation of the world model (making those by themselves is hard enough without having to deal with lots of templates and thinking about ramifications across multiple representations). Then we have a finite set of adaptors to build for the different models we'd like (costmaps, traversability, elevation, etc.), whose job is to take the values of the neighboring cells and apply the vehicle kinematics and dynamics to give back a response: "ok", "not ok", "unknown", or other options. The example adaptor for a costmap would change the 0-255 values into those return types. An elevation map adaptor would apply the vehicle dynamics or maximum gradients to return the same type of information. Now all planners or controllers that work for one will work for all, as long as they can run with "ok", "not ok", "unknown", or other options, and this is easily extendable for sampling-based planners by creating methods in the adaptor to get the value at a certain location in global-frame coordinates, which the adaptor could project into a costmap cell, voxel grid pose, or elevation gradient.
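To make the adaptor idea above concrete, here is a minimal C++ sketch of the costmap case. Everything here is hypothetical (the enum and function name are illustrative, not part of any existing Nav2 API); the thresholds follow the common costmap_2d convention of 255 = no information and 253-254 = inscribed/lethal obstacle:

```cpp
#include <cstdint>

// Hypothetical finite set of return states shared by all adaptors.
enum class CellState { OK, NOT_OK, UNKNOWN };

// Illustrative costmap adaptor: collapses the 0-255 cost values into
// the shared CellState enum. The cutoffs follow the usual costmap_2d
// convention (255 = no information, 253-254 = inscribed or lethal
// obstacle); a real adaptor would pick them based on the robot
// footprint and inflation settings.
CellState adaptCostmapValue(std::uint8_t cost) {
  if (cost == 255) return CellState::UNKNOWN;
  if (cost >= 253) return CellState::NOT_OK;
  return CellState::OK;
}
```

An elevation-map adaptor would expose the same enum but derive it from slopes rather than cost values, which is what would let one planner work against both.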
Are you envisioning the output of the adaptor in this example as a sort of simplified costmap where each cell is filled with an ok, not ok, or unknown value? Or are you thinking more that the client feeds a trajectory to the adaptor and the adaptor returns the evaluation of the trajectory as an ok, not ok, or unknown value? This pushes the knowledge of the vehicle dynamics to the world model (adaptors) instead of the algorithms. How confident are you that this is the right place for that knowledge? I had thought of having input and output adaptors as well, but I was thinking it could result in N^2 adaptors, since we theoretically could want to convert from any representation to any other. I expect it wouldn't be so bad in practice, since many conversions are probably not useful. Based on your feedback, I was imagining something like the diagram below: we have a collection of representations; adaptors convert incoming data to those representations as needed; and clients either get data directly from the representation they need, or from an adaptor if there is a mismatch between the representation provided and the type they need.
Thanks for making that diagram, I'm certainly a pretty lazy guy when it comes to visualizations :) First off, I'm not sure why you have a separate pipeline for laser scans; that defeats the purpose of having a general-purpose world model that takes in arbitrary sensors to generate a view of the world. I'd recommend completely scrapping that. I don't see anything special about a laser scanner requiring that type of pipeline within the world model. If you'd like to use it as a safety sensor with zones, that should be upstream of this, since that's not generalized and many robots today don't use them anymore or have a different sensor suite. What I think should happen is similar to how it's done today: there's a set of optional plugins that buffer arbitrary sensor information (scans, images, depth maps, radar, sonar, etc.), do whatever is of interest for that sensor, and insert it into the map. I'm not thinking too hard about how to generalize those plugins across representations; in practice, I don't think you'll find a way to generalize that. A costmap binning a depth map for collision avoidance and an elevation map using depth maps over time to generate an ellipsoid curve (or something similar) are very different operations on the same sensor data. Those plugins for buffering and inserting data into their representation are probably representation-specific. Looking at the other side, however, we have our [representation A] which needs to be utilized by the local/global planners to navigate. In the case of the elevation map, the Z coordinate becomes meaningful, so we can't just talk about 2D X-Y coordinates anymore. The planner or controller will say "give me the neighbors", and it's then up to the adaptor to say "I'm an elevation map, therefore my neighbors lie along 3D 8-directional curves" or "I'm a costmap, I just need to give the cells on either side of me", and return that information to the planner or controller to do with as they will, asking for more things as needed.
This lets us do 2D or 3D representations while allowing the same algorithms for local and global planning to operate (and moreover, this would work for drones as well). When the adaptors return their neighbors, it's up to the adaptor's knowledge of the robot dynamics and mechanics to assign them some finite set of states that can be generalized across all representations. That might include OK, not OK, unknown, not recommended, etc., but as long as all of the algorithms are built to work with those same finite states, then whatever the representation is, it'll work. And the adaptors can use the Robot class to get the relevant dynamics. For a diff-drive robot, it's just all OK like in navigation1, but for an ackermann car or an elevation map on a legged robot, that may not be valid.
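As a sketch of the neighbor query described above, an elevation-map adaptor might classify each 8-connected neighbor against the robot's maximum traversable slope. This is purely illustrative; the class, grid layout, and parameter names are assumptions, not an existing API:

```cpp
#include <cmath>
#include <utility>
#include <vector>

enum class CellState { OK, NOT_OK, UNKNOWN };

struct Neighbor { int x; int y; CellState state; };

// Hypothetical elevation-map adaptor. Given a cell, it returns the
// 8-connected neighbors classified against the robot's maximum
// traversable slope, so planners never see raw heights.
class ElevationAdaptor {
public:
  ElevationAdaptor(std::vector<std::vector<double>> heights,
                   double cell_size, double max_slope)
    : heights_(std::move(heights)), cell_size_(cell_size), max_slope_(max_slope) {}

  std::vector<Neighbor> neighbors(int x, int y) const {
    std::vector<Neighbor> out;
    for (int dx = -1; dx <= 1; ++dx) {
      for (int dy = -1; dy <= 1; ++dy) {
        if (dx == 0 && dy == 0) continue;
        int nx = x + dx, ny = y + dy;
        if (!inBounds(nx, ny)) continue;
        // Slope = vertical rise over horizontal run to the neighbor.
        double rise = std::abs(heights_[ny][nx] - heights_[y][x]);
        double run = std::hypot(dx * cell_size_, dy * cell_size_);
        out.push_back({nx, ny,
                       rise / run <= max_slope_ ? CellState::OK
                                                : CellState::NOT_OK});
      }
    }
    return out;
  }

private:
  bool inBounds(int x, int y) const {
    return y >= 0 && y < static_cast<int>(heights_.size()) &&
           x >= 0 && x < static_cast<int>(heights_[0].size());
  }
  std::vector<std::vector<double>> heights_;
  double cell_size_, max_slope_;
};
```

A costmap adaptor would implement the same `neighbors` signature but derive the state from cost values, so the planner code is identical in both cases.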
Yes, that's what I'm thinking, with the only exception that "Sensing and Perception" isn't one plugin but a series of plugins; I think that may have just been represented that way for brevity. Each representation will have an associated plugin that converts its values into the limited-enum types and is responsible for answering the queries from the controller/planner: "give me the neighbors" and "give me the value at (x, y, z)".
It was meant to represent a possibility. If a sensor could output data directly in representation format, it could talk directly to the model. But that's a stupid idea in context. I was playing with the idea that everything could be a ROS node; each representation could be a node and each adaptor as well.
Do we want to be able to chain adaptors/plugins? As in, there is a plugin that provides data from the representation, but there is a second plugin that grabs the output of the first plugin and provides it in a different way.
So we'd need to figure out the queries that can be used by many algorithms. We'd then end up with an output plugin per class of algorithm, where a class could be graph-search algorithms like A*, Dijkstra, D*, etc.
Well, at the end of the day all the graph-search algorithms are going to ask for neighbors, and the sampling-based planners will ask for the result at a certain position. I'm not totally certain what optimization-based planners will ask for, but that's overkill for 2D navigation as far as I know; if not, we can find out what they thematically ask for. I think a plugin API implementing neighbors, position in global frame (where the plugin internally finds the value in its representation, i.e. a costmap would look up the X-Y cell index and then the location in the array), and random sample is a good place to start. We can always extend it if this is the way you also think makes sense. It was just a suggestion.
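That suggested plugin API could be phrased as an abstract interface, roughly as sketched below (all names are hypothetical; nothing like this exists in nav2). A trivial concrete adaptor is included only to show the shape of an implementation:

```cpp
#include <vector>

enum class CellState { OK, NOT_OK, UNKNOWN };

struct Sample { double x, y, z; CellState state; };

// Sketch of the adaptor interface suggested above: graph-search
// planners call neighbors(), sampling-based planners call valueAt()
// or randomSample(). Each representation (costmap, elevation map,
// voxel grid) would implement this once, and every planner built
// against it would work with all of them.
class WorldModelAdaptor {
public:
  virtual ~WorldModelAdaptor() = default;

  // Neighbors of a pose, already classified by vehicle dynamics.
  virtual std::vector<Sample> neighbors(double x, double y, double z) = 0;

  // Value at a position in global-frame coordinates; the adaptor
  // internally projects this into a cell index, voxel, or gradient.
  virtual CellState valueAt(double x, double y, double z) = 0;

  // Random sample from known space, for sampling-based planners.
  virtual Sample randomSample() = 0;
};

// Trivial concrete adaptor for illustration: an empty flat world
// where everything is traversable.
class FlatWorldAdaptor : public WorldModelAdaptor {
public:
  std::vector<Sample> neighbors(double x, double y, double) override {
    std::vector<Sample> out;
    for (int dx = -1; dx <= 1; ++dx)
      for (int dy = -1; dy <= 1; ++dy)
        if (dx != 0 || dy != 0)
          out.push_back({x + dx, y + dy, 0.0, CellState::OK});
    return out;
  }
  CellState valueAt(double, double, double) override { return CellState::OK; }
  Sample randomSample() override { return {0.0, 0.0, 0.0, CellState::OK}; }
};
```

Extending the API later (e.g. for optimization-based planners) would mean adding a method here and implementing it in each adaptor, which is the finite amount of work the adaptor approach is meant to bound.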
(More thoughts)
Closing the issue, it seems we're sticking with the current costmap design for now. |
Background
What is the world model?
The robot's image or mental model of the world.
Jay Wright Forrester defined a mental model as:
Navigation decisions and actions are made based on this model.
As shown below, the world model is populated with information coming from sensing, perception, and mapping; and supplies information to the navigation sub-modules.
In order to guide our design decisions on the World Model, let's take a closer look at the various kinds of inputs and outputs.
Inputs to World Model
Let's consider the modules that provide information to the world model.
Perception
The perception module provides input to the world model mainly to account for changes in the environment, from both moving objects and stationary objects with dynamic attributes, e.g. a traffic light.
Currently, moving objects are mostly accounted for by the obstacle layer of costmap_2d, which processes the raw output of a laser scanner.
Design Improvements
Maps & Map Server
The map server provides a priori information about the environment, mostly of stationary objects, in the form of a map. Maps can also contain dynamic information about some of these objects, e.g. traffic, road closures, etc.
Currently, the map server is only capable of processing and providing grid/cell-based (metric) types of map representations.
Design Improvements
Related issues: #18
Open Questions
Outputs from World Model
Let's consider the consumers of the information contained in the world model, aka the clients.
Clients operate on different length-scales and use different layers or aspects of the world model.
(Global) Path Planning
Can operate on a road network, topology map, or global map (sub-sampled occupancy grid or k-d tree).
These are coupled with the map representation being used.
Currently, only planners that operate on a costmap are supported.
Design Improvements
Open Questions
(Local Path Planning) Obstacle Avoidance and Control
Operates on a higher resolution local map representation, for example, an occupancy grid.
Attempts to follow the global path while correcting for obstacles in a dynamic environment. Provides the control inputs to the robot.
These are planner-dependent.
Currently, nav2 provides a DWA-based controller, nav2_dwb_controller. This has its own internal representation of the world (nav2_costmap_2d) with direct access to raw sensor data.
Design Improvements
Open Questions
Motion Primitives & Recovery
Currently, motion primitives do not interact with the world model. A pull-request (#516) is open that would add collision checking.
In ROS 1 recovery, both the global and local costmap-based representations were passed to the world model.
Design Improvements
Design
Goal
Design a world / environmental model for 2D navigation.
Objectives:
Summarizing the design improvements discussed above:
Proposal
Given the extent of the change, we'll have to implement the design in multiple phases.
In the first phase, we can separate the world model from the clients and make them separate nodes.
In the second phase, we can define the new modules and port the current costmap-based world model. Below is a high-level diagram; the components are explained below. The main point of this phase is to remove the dependency between the core representation and the type of client. We do this by defining plugins that translate the information of the Core into something useful to the client. Similarly, we also define plugins for the inputs.
In the following phases, we can extend this by introducing other map formats (beyond grid-based maps) and perception pipelines. We can also support multiple internal representations. Eventually, we might have something like this:
Core Representation
The core representation is a module rich enough to represent the world with enough expressiveness for at least doing navigation. In the ideal case, this could be an internal simulator that we can ask anything about the world. By querying this internal simulator, we can build whatever structure a navigation sub-module needs.
We might want to experiment with different types of core representations with different levels of expressiveness. We can initially use costmaps but eventually move to scene-graphs that support a semantic interface.
Additionally, multiple representations might be appropriate, e.g. robot-centric and world-centric.
Open Questions
Planner Plugin
The planner plugin extracts information from the core representation to create the structure needed by the planner.
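As an illustration (the class and names here are invented for this sketch, not an existing nav2 interface), a grid-based planner plugin might expose only a successor query to a graph-search planner, hiding the core's 0-255 cost encoding:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical planner plugin: wraps a grid-based core and exposes
// only what a graph-search planner needs - the traversable
// 4-connected successors of a cell - so the planner never touches
// the underlying cost values directly.
class GridPlannerPlugin {
public:
  GridPlannerPlugin(std::vector<std::uint8_t> grid, int width,
                    std::uint8_t lethal = 253)
    : grid_(std::move(grid)), width_(width), lethal_(lethal) {}

  std::vector<std::pair<int, int>> successors(int x, int y) const {
    static const int d[4][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    int height = static_cast<int>(grid_.size()) / width_;
    std::vector<std::pair<int, int>> out;
    for (const auto& step : d) {
      int nx = x + step[0], ny = y + step[1];
      if (nx < 0 || ny < 0 || nx >= width_ || ny >= height) continue;
      // Only cells below the lethal threshold are traversable.
      if (grid_[ny * width_ + nx] < lethal_) out.push_back({nx, ny});
    }
    return out;
  }

private:
  std::vector<std::uint8_t> grid_;
  int width_;
  std::uint8_t lethal_;
};
```

An A* or Dijkstra implementation written against `successors` would be unaware of whether the core is a costmap, an elevation map, or something else, which is the decoupling this plugin layer is meant to provide.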
Open Questions
Control / Collision Avoidance Plugin
The control plugin extracts information from the core to create a useful structure for a controller/local planner.
Open Questions
Map to Core Plugin
Gets the map from the server and populates the core.
Open Questions
Sensing to Core Plugin
Gets low-level data (sensor data streams) or high-level data (objects with metadata) and populates/updates the core.
Open Questions
Performance
Concerns
Next Steps
Phase 0:
Phase 1: Grid-based core using costmap_2d.
… costmap_2d.
… costmap_2d.
… costmap_2d.
… costmap_2d used by navfn_planner.
… navfn_planner to use new interface.
… dwb to use new interface.