Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store message definition in bag #782

Closed
jhurliman opened this issue Jun 15, 2021 · 32 comments · Fixed by #1293
Closed

Store message definition in bag #782

jhurliman opened this issue Jun 15, 2021 · 32 comments · Fixed by #1293
Labels
enhancement New feature or request

Comments

@jhurliman
Copy link

Description

Neither the metadata.yaml file or .db3 file(s) contain the ROS2 message definitions used to generate the bag. This makes it impossible to deserialize rosbag2 files without having the correct version of all the original .msg files on the machine that is decoding the bag. Tools like Foxglove Studio are not able to decode rosbag2 files, and if message definitions are ever modified you have to store a full snapshot of message definitions alongside each bag file. If the wrong message definition is used during deserialization, there doesn't appear to be any mechanism to catch this leading to either decoding failures or potentially invalid decoding.

This is a regression from the rosbag1 format, where a .bag file is self describing and can be decoded even if the collection of .msg files used to generate it have been lost, or if a .bag file is shared with someone else.

@jhurliman jhurliman added the enhancement New feature or request label Jun 15, 2021
@emersonknapp emersonknapp changed the title rosbag2 files do not contain message definitions Store message definition in bag Jun 15, 2021
@emersonknapp
Copy link
Collaborator

Thanks for filing this ticket - it's something that has come up as part of other discussions but was not explicitly tracked as its own feature request. I've re-worded the title to name the positive feature to build (rather than the negative lack of feature)

@danthony06
Copy link

@jhurliman I have not personally used this package, and there's definitely some major assumptions made in the codebase, but I think this is doing a lot of what you're envisioning:
https://gitlab.com/ternaris/rosbags

@danthony06
Copy link

Ah, looking at how they do that, it looks like they're manually embedding a bunch of information about the message format, so it can't handle arbitrary messages. I do like the idea of having all of the information to fully extra the messages from a bag embedded in the actual bag file.

@chaoflow
Copy link

@danthony06 Writing as one of the authors: rosbags so far does not store message definitions. It supports reading rosbag2 with custom message types by registering the idl/msg files, see https://ternaris.gitlab.io/rosbags/topics/typesys.html

There are two cases:

  1. conversion from rosbag1 to rosbag2; message definitions from rosbag1 are used to convert and we want to store these along with topic latching info for later deserialization and to convert back to rosbag1.
  2. recorded rosbag2; either the recorder or a post-processing tool needs to add the message definitions, depending on when these are available.

@emersonknapp et al: What are your thoughts on where and how to store message definitions and topic latching info?

@pkess
Copy link

pkess commented Jun 17, 2021

Hi all,
i would like to get this feature as well. But as a first step it might be very usefull if the rosbag would store at least some kind of a hash of the message definition for each topic. With this you could check if you have the correct message definition available. Ich think rosbag1 stored both the full message definiton and some kind of hash for the message type.

@danthony06
Copy link

@chaoflow thank you for the correction, sorry I missed the details on how you're getting the format.

@pkess You're right that rosbag1 is storing the MD5 sum, but it's done on a connection basis, not message type basis, according to the documentation: http://wiki.ros.org/Bags/Format/2.0. I'm guessing this is to catch a corner case in ROS 1 where it might be possible to have nodes using the same message type, but are different versions with different internal fields. For example, Node A and B are compiled with one message definition and use that to communicate, while Nodes C and D have a message with the same name, but has been changed and built at a later time, and they can use that to communicate. So the bag file would have multiple entries for the same message name, but the implementation is different, so it has to track the MD5 of the connection.

@emersonknapp
Copy link
Collaborator

What are your thoughts on where and how to store message definitions and topic latching info?

I see the obvious place as being in the TopicMetadata - this thing already exists and we should be able to safely expand it with some serialized representation of the message definition.

For latching info - this is handled using the Durability QoS using TRANSIENT_LOCAL durability with a history. This behavior provides a superset of the ROS 1 "latching" functionality. http://docs.ros.org/en/galactic/Concepts/About-Quality-of-Service-Settings.html#qos-policies for some info

@durko
Copy link

durko commented Jul 5, 2021

@emersonknapp Saving message definitions per topic could lead to some content duplication for topics sharing message types. Nested types (e.g. std_msgs/msg/Header) are often not part of the definition itself and need to be also stored somewhere. While yaml supports anchors and references, it would be easier to work with a flat storage for message definition metadata, e.g. with a type_definitions array as a sibling to topics_with_message_count.

Suggested content:

rosbag2_bagfile_information:
  type_definitions:
  - format: 'idl'
    definition: '{content of Bool.idl file}'
  - format: 'msg'
    definition: 'MSG: std_msgs/Int32\n{content of Int32.msg file or message_definition from ROS1 bag}'

The idl case is straightforward, as only the message definition content is necessary. The msg case requires one additional piece of information, namely the message type name. In the example above the message type name is prepended to the message definition, similar to concatenated definitions in message_definition fields of ROS1 bag connection headers. Alternatively, message type names could also be saved as a separate key. As both formats can define multiple types in a single message definition, a type name key would probably have to be a list of strings.

@jhurliman
Copy link
Author

My understanding is that DDS (and .idl files as an extension of that) is one possible transport system in ROS2 which has a pluggable transport layer. If that’s correct, making idl storage a first class concept in rosbag2 would be a leaky abstraction.

ROS1 bags handle this issue gracefully. The one thing they add beyond the raw msgdef is a checksum, so you don’t have to do a byte for byte long string comparison to see if a message definition has changed between two bags. ROS1 bags also store msgdefs per connection instead of per topic, which would fix the replay bug when two publishers publish to the same topic with different QoS profiles.

@chaoflow chaoflow mentioned this issue Jul 16, 2021
@chaoflow
Copy link

My understanding is that DDS (and .idl files as an extension of that) is one possible transport system in ROS2 which has a pluggable transport layer. If that’s correct, making idl storage a first class concept in rosbag2 would be a leaky abstraction.

@jhurliman The proposal above is not bound to idl but already contains msg as other format.

ROS1 bags handle this issue gracefully. The one thing they add beyond the raw msgdef is a checksum, so you don’t have to do a byte for byte long string comparison to see if a message definition has changed between two bags.

When considering a checksum for message definitions for rosbag2 we should take normalization into account instead of simply hashing the string definition.

ROS1 bags also store msgdefs per connection instead of per topic, which would fix the replay bug when two publishers publish to the same topic with different QoS profiles.

QoS profile is a concept orthogonal to messages definitions, same as latching in rosbag1, i.e. a topic has a message type with a specific message definition and a topic can have a QoS profile.

ROS1 stores message definitions per connection and there can be multiple connections per topic with theoretically, regarding the storage format, different message types and definitions. I have not seen mixed-type-topics in practice, am not sure whether it is possible to generate them through recording, in contrast to manually writing bags, and don't see a use case for them so far.

@danthony06
Copy link

I agree that the mixed-type-topics is an unusual case, and unlikely to happen in practice. About the only time I can think of when I have encountered this is when there are multiple robots that are running different versions of software. It's generally not a great idea and usually indicates something wrong with the larger system, but it's certainly possible.

@jtbandes
Copy link
Member

Lack of message definitions also affects the JS/WebSocket bridges rosbridge_suite and ros2-web-bridge. I just made a PR for the former that re-constitutes a plausible .msg definition from the generated Python classes (although the definition it generates is missing constants and default values): RobotWebTools/rosbridge_suite#574

I hope this can be a temporary workaround, since it would be nicer to address this by including raw datatypes at the rosidl code generation level, so they can be used by both rosbag2 and dynamic tools like the rosbridge_server.

@chaoflow
Copy link

In #819 we're discussing for db3 files to be self-contained, i.e. the message definitions need to be stored in each db3 file (in case of splitting) and are duplicated into metadata.yaml the same as happens already for TopicMetadata.

@emersonknapp you suggested to add message definitions to TopicMetadata, which would lead to duplication but we have the same pattern already for QoS profiles. Is it ok to duplicate message definitions as well? Do you see a way of storing nested types there as well?

Apart from where to store, for me it is not clear yet from where to get:

  • Does a recorder always have message definitions available?
  • Are there scenarios where these are not installed but only compiled into binaries?
  • Where does a recorder fetch the message definitions from?

@emersonknapp
Copy link
Collaborator

Most of these questions don't have answers yet, because there isn't a core mechanism for retrieving message definitions.

I've created ros2/ros2#1159 to contain the discussion for this core feature. Hopefully, once we have that feature available, the rosbag2-side implementation will be relatively straightforward.

@Karsten1987
Copy link
Collaborator

Is it ok to duplicate message definitions as well?

Maybe I am thinking too naively here, but I can't imagine that the storage size of the message definition is in any case comparable to the amount of data (the actual incoming messages) we store in the actual rosbags. I don't really see a way of not duplicating the message definitions across the bag files, because as said, the bag files should presumably be self contained and independent. That way we can leverage tools like merging various bag files etc.

@emersonknapp
Copy link
Collaborator

Note: discussion ros2 core functionality in ros2/ros2#1159

jtbandes added a commit to RobotWebTools/rosbridge_suite that referenced this issue Aug 3, 2021
This is an adaptation of #452 for ROS 2.

After the initial cherry-pick, a new implementation of the `/rosapi/get_topics_and_raw_types` service was required for ROS 2, because the generated Python msg classes don't contain the original/combined message definition (see related rosbag2 discussion at ros2/rosbag2#782). To accomplish this, I added a `stringify_field_types()` which reconstitutes a plausible .msg definition (although without constants or default values) based on the `_fields_and_field_types`, in the `====`-separated format seen in ROS 1 bags, which can be parsed by ROS 2 .msg parsers such as [@foxglove/rosmsg](https://github.com/foxglove/rosmsg).

I hope that `stringify_field_types` could be removed in the future if message generation is modified to output the original message definition text (ros2/ros2#1159) — I see it as a stopgap solution to get basic compatibility working.

I also modified `get_publications_and_types` to remove the extra `/msg/` from the ROS 2 IDL names, so `std_msgs/msg/String` becomes `std_msgs/String` when the list of topics and datatypes are retrieved. This is more in line with what clients migrating from ROS 1 expect.

I had to make a few launch file / parameter changes to get the `rosbridge_websocket_launch.xml` to work. The test from #452 is not included, because it needs significant work to port to ROS 2 and I noticed that some other tests are also disabled in this branch.
@ros-discourse
Copy link

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/announcing-live-ros-2-connections-in-foxglove-studio/22278/1

@wjwwood
Copy link
Member

wjwwood commented Feb 25, 2022

Just reading this now, as part of some other work I'm doing, and I just wanted to point out that currently rosbag (AFAIK) requires the typesupport be installed locally for a message when recording or playing back, so the message definitions could be stored in those packages and accessed by rosbag locally, therefore not strictly requiring ros2/ros2#1159 in order to address this feature. Obviously we'd like to have ros2/ros2#1159 so that we can do away with this restriction that message packages need to be available locally for rosbag to record, but that's technically a separate issue.

Sorry if someone already pointed this out, because I was only able to skim the previous comments.

@emersonknapp
Copy link
Collaborator

so the message definitions could be stored in those packages and accessed by rosbag locally

This is the approach that @jhurliman is trying to take now, but isn't sure how to implement - see ros2/ros2#1159 (comment)

@jhurliman
Copy link
Author

@emersonknapp
Copy link
Collaborator

Oh I see - nice! I might think best possible scenario is that this functionality is implemented in the ros2 core, and the contents passed to the storage implementation to store. Ideally the storage implementation doesn't have any dependency on the ROS2 system, and also the behavior could be shared between implementations. I do get that this was the fastest way to get it working in the mcap storage, though :).

@ros-discourse
Copy link

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/make-your-ros-2-files-self-contained-smaller-and-more-performant/25590/1

@ros-discourse
Copy link

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-bag-data-in-plotjuggler/27072/1

@amacneil
Copy link
Contributor

We took the ideas from this thread, #819, #855, and several others and incorporated them into the design of MCAP, a highly efficient, self-contained recording format for ROS 2 and more.

A ROS 2 MCAP storage plugin is available for Foxy, Galactic, Humble and Rolling, and using it is as simple as ros2 bag record -s mcap.

Since this feature is now already supported by the MCAP storage plugin, I think we should either close it, or change the title to specifically request adding message definitions to files created by the sqlite plugin.

@jhurliman
Copy link
Author

I’ll close the issue since MCAP is stable and a preferred alternative now.

@chaoflow
Copy link

@jhurliman I see how mcap solves #819, however I agree with @emersonknapp #782 (comment) that storage of message definitions should not be specific to one format and especially should be realized for the default format. To that end I suggest to reopen this issue.

@MichaelOrlov
Copy link
Contributor

@chaoflow FYI. Now we store message definitions in SQLite3 db3 files as well.
I added new table message_definitions and incremented schema_version to 4 in schema table. Please refer to the #1293 for details.
Looking forward for relevant updates in rosbags.

@chaoflow
Copy link

@chaoflow FYI. Now we store message definitions in SQLite3 db3 files as well. I added new table message_definitions and incremented schema_version to 4 in schema table. Please refer to the #1293 for details. Looking forward for relevant updates in rosbags.

@MichaelOrlov Thank you very much! Is there a suggested migration path for previously recorded rosbag2?

@MichaelOrlov
Copy link
Contributor

@chaoflow The migration path is simple. If db schema version < 4 there are no message_definition table.
Please see how we are determining db schema version here

int SqliteStorage::read_db_schema_version()
{
int schema_version = -1;
if (database_->table_exists("schema")) {
// Read schema version
auto statement = database_->prepare_statement("SELECT schema_version from schema;");
auto query_results = statement->execute_query<int>();
schema_version = std::get<0>(*query_results.begin());
} else {
if (database_->field_exists("topics", "offered_qos_profiles")) {
schema_version = 2;
} else {
schema_version = 1;
}
}
return schema_version;
}

Or you can directly check

if (database_->table_exists("message_definition")) {
  // TBD
}

@chaoflow
Copy link

@MichaelOrlov Thanks and sorry for being unclear. I meant whether there is a migration path to get message definitions into bags that were previously recorded ? Something like: ros2 bag migrate oldbag newbag assuming the correct message definitions for the previously recorded bag are available.

@MichaelOrlov
Copy link
Contributor

@chaoflow Oh, in t his case ros2 bag convert could be used https://github.com/ros2/rosbag2#converting-bags
This example should work like "ros2 bag migrate oldbag newbag"

$ ros2 bag convert -i bag1 -o out.yaml

# out.yaml
output_bags:
- uri: /output/bag1  # required
  storage_id: ""  # will use the default storage plugin, if unspecified
  all: true

@chaoflow
Copy link

@MichaelOrlov Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.