Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add topic_id returned by storage to the TopicMetadata #1538

Merged
merged 4 commits into from
Feb 6, 2024

Conversation

MichaelOrlov
Copy link
Contributor

@MichaelOrlov MichaelOrlov commented Jan 14, 2024

  1. Users complain multiple times for instance here Use middleware send and receive timestamps from message_info during recording #1531 (comment) that rosbag2 stores messages in internal buffers in a very inefficient way and that memory footprint for the same data payload could be reduced down to the factor of the 2 or 3 by using topic_id represented as integer type instead of full topic_name represented as a string for each message.
    This is very meaningful when using the snapshot feature and recording on the host HW with limited resources.
  2. Another motivating factor for using topic_id instead of the topic_name in internal data structures and buffers is the ability to differentiate different publishers with the same topic name. It might be a situation for instance when multiple publishers exist on the same topic but have different QoS settings. Another use case could be when multiple publishers exist on the same topic but with different versions of the type definitions.
    Currently, we don't support these use cases in the rosbag2. However, if we will differentiate topics by topic id in inner rosbag2 representation on all layers we would be able to add support for the aforementioned use cases in the future.

About the size of the type for the newly added topic_id field.
In MCAP specification we reserved 16 bit for the topic_id. This is the ChannelId, the MCAP inner type that corresponds to the uint16_t. When we were creating the MCAP format we decided that the ability to differentiate a maximum 65535 topics would be enough in the vast majority of cases. However, in the SQLite3 storage plugin, we are using topic_id as an index to map records from the messages DB table to the corresponding topics table. According to the SQLite3 specification, the index is the int64_t data type and can't be changed.
Therefore the type of the newly added topic_id filed shall be no less than int64_t and shall be with the sign.
Update:
Decided to keep uint16_t for topic_id in the TopicMetada and have a hash map to int64_t indexes inside the sql storage plugin. The performance will not be affected during recording because currently we already have an internal hash map lookup to find match for topic_name to the topic_id index. It will be similar hash map lookup, however we will be using external topic_id instead of topic_name.

@MichaelOrlov MichaelOrlov force-pushed the morlov/add-topic_id-to-topic_metadata branch from 6b28ca2 to d515a62 Compare January 14, 2024 02:34
@MichaelOrlov MichaelOrlov changed the title [WIP] Add topic_id returned by storage to the TopicMetadata Add topic_id returned by storage to the TopicMetadata Jan 29, 2024
@MichaelOrlov MichaelOrlov force-pushed the morlov/add-topic_id-to-topic_metadata branch from d515a62 to 9cde976 Compare January 29, 2024 08:53
@MichaelOrlov MichaelOrlov marked this pull request as ready for review January 29, 2024 09:11
@MichaelOrlov MichaelOrlov requested a review from a team as a code owner January 29, 2024 09:11
@MichaelOrlov MichaelOrlov requested review from emersonknapp, hidmic and fujitatomoya and removed request for a team January 29, 2024 09:11
@MichaelOrlov MichaelOrlov changed the title Add topic_id returned by storage to the TopicMetadata [WIP] Add topic_id returned by storage to the TopicMetadata Jan 29, 2024
@MichaelOrlov
Copy link
Contributor Author

Moving it to the draft again.
Going to implement proposal form the #1553 (comment)
Will keep uint16_t for topic_id in the TopicMetada and have a hash map to int64_t indexes inside the sql storage plugin.

Motivation:
Well, on one hand, it will require keeping a hash map with two types of IDs in the SQLite3 storage plugin side and having a hash lookup operation for each message read/write. The hash map is not very memory efficient but still could take less memory footprint rather than keeping an extra 6 bytes per message. Especially when we are having a lot of high-frequency messages like IMU. Another drawback of this is a little bit of performance hit for lookup for inner ID in the hash map for each message write.
However, on the other hand, we probably can sacrifice a bit of performance penalty on the SQLite3 plugin side for the sake of a better memory footprint.
Please note that MCAP storage plugin will not be affected and users will have benefits using rosbag2 with it.

@MichaelOrlov MichaelOrlov marked this pull request as draft January 29, 2024 17:26
@MichaelOrlov MichaelOrlov marked this pull request as ready for review January 30, 2024 05:07
@MichaelOrlov
Copy link
Contributor Author

Ready for review

@MichaelOrlov MichaelOrlov changed the title [WIP] Add topic_id returned by storage to the TopicMetadata Add topic_id returned by storage to the TopicMetadata Jan 30, 2024
@fujitatomoya
Copy link
Contributor

@MichaelOrlov thanks for the PR. this one looks big change, i will try review in this week. you can go ahead to assign this to me.

@MichaelOrlov
Copy link
Contributor Author

@fujitatomoya Thanks for helping with review.
While it looks like a big PR with a lot of changes although, it is because it affects a lot of tests where we need to add 0 or 1 for initializing the newly added field to the TopicMetadata. The most of the changes where need to look at and review are in the sqlite3_storage package (there a few files)

Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, a few comments.

- Rationale.
 To be able to distinguish topics by unique topic ID rather than by
 topic name in the future.

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
@MichaelOrlov MichaelOrlov force-pushed the morlov/add-topic_id-to-topic_metadata branch from 3d65cee to e5d8c3a Compare February 4, 2024 20:48
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
@MichaelOrlov MichaelOrlov force-pushed the morlov/add-topic_id-to-topic_metadata branch from e5d8c3a to 814b513 Compare February 4, 2024 20:53
@MichaelOrlov
Copy link
Contributor Author

@emersonknapp @clalancette Could some of you please give formal approval for this PR to be able to merge it?
@fujitatomoya already reviewed it.

@MichaelOrlov
Copy link
Contributor Author

Gist: https://gist.githubusercontent.com/MichaelOrlov/9d2827dcef7e219879574e49386f1596/raw/16f993bb8bcfd59af7ac24dbeaa4bec51ead6950/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_compression rosbag2_cpp rosbag2_examples rosbag2_py rosbag2_storage rosbag2_storage_mcap rosbag2_storage_sqlite3 rosbag2_transport rosbag2_tests
TEST args: --packages-above rosbag2_compression rosbag2_cpp rosbag2_examples rosbag2_py rosbag2_storage rosbag2_storage_mcap rosbag2_storage_sqlite3 rosbag2_transport rosbag2_tests
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/13237

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Copy link
Collaborator

@emersonknapp emersonknapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@MichaelOrlov
Copy link
Contributor Author

MichaelOrlov commented Feb 5, 2024

Re-run CI with the correctly listed rosbag2_examples_cpp and, rosbag2_examples_py instead rosbag2_examples package.
Gist: https://gist.githubusercontent.com/MichaelOrlov/7b5b5d317f83b855c612ecd3a833a50e/raw/16f993bb8bcfd59af7ac24dbeaa4bec51ead6950/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_compression rosbag2_cpp rosbag2_examples_cpp rosbag2_examples_py rosbag2_py rosbag2_storage rosbag2_storage_mcap rosbag2_storage_sqlite3 rosbag2_transport rosbag2_tests
TEST args: --packages-above rosbag2_compression rosbag2_cpp rosbag2_examples_cpp rosbag2_examples_py rosbag2_py rosbag2_storage rosbag2_storage_mcap rosbag2_storage_sqlite3 rosbag2_transport rosbag2_tests
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/13240

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov MichaelOrlov merged commit 1564687 into rolling Feb 6, 2024
14 checks passed
@delete-merged-branch delete-merged-branch bot deleted the morlov/add-topic_id-to-topic_metadata branch February 6, 2024 02:06
@DLu
Copy link
Contributor

DLu commented Jul 12, 2024

There's not much information in the migration notes, and all the examples seem to just use the value 0. So if I'm writing a bag file from scratch (using Python say), should each TopicMetaData i construct use id=0?

MetroRobots/classic_bags#16

@MichaelOrlov
Copy link
Contributor Author

MichaelOrlov commented Jul 12, 2024

The topic_id shall be unique for each topic. The numbering is implementation details for storage plugins.
In the following example

        topic_info = rosbag2_py._storage.TopicMetadata(
            id=0,
            name='synthetic',
            type='example_interfaces/msg/Int32',
            serialization_format='cdr')
        self.writer.create_topic(topic_info)

it is assigned to 0 because it will be overwritten by the storage plugin after a call to the writer.create_topic(topic_info)
You can get assigned value by checking topic_info.id after writer.create_topic(topic_info).

Therefore - no special steps for migration are required.

@DLu
Copy link
Contributor

DLu commented Jul 14, 2024

I guess what I'm trying to figure out is why zero is not the default value, if it is just going to be overwritten. Then the TopicMetadata constructor could have stayed the same.

@MichaelOrlov
Copy link
Contributor Author

Well, it is zero-initialized by default. The TopicMetadata is a struct and doesn't have any explicit constructors.
However, the constructor could be added.
As always, PRs with improvements are welcome!

@DLu
Copy link
Contributor

DLu commented Jul 15, 2024

In the Python bindings introduced by @r7vme in #1569, there is an explicit constructor, and id is first argument. However, I will admit I'm not fully familiar with "stubs" or the ... notation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants