Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use middleware send and receive timestamps from message_info during recording #1531

Merged
merged 9 commits into from Apr 11, 2024

Conversation

jmachowinski
Copy link
Contributor

@jmachowinski jmachowinski commented Jan 8, 2024

This should create a better timestamp in cases of high load.

Note, I am unsure, if we should use the received_timestamp or if the source_timestamp would be the better option.

Disclaimer: The send_timestamp will be available only for the MCAP storage format for a while, since we already have a special field for it in MCAP format specification.

@jmachowinski jmachowinski requested a review from a team as a code owner January 8, 2024 17:11
@jmachowinski jmachowinski requested review from MichaelOrlov and hidmic and removed request for a team January 8, 2024 17:11
@jmachowinski
Copy link
Contributor Author

Note, this was made possible by the merge of ros2/rclcpp#1928

@fujitatomoya
Copy link
Contributor

CC: @MichaelOrlov

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski @fujitatomoya Not this way, and unfortunately not so easy.

We have reserved special fields publishTime and sequence for mcap storage format specifically for this use case. However, currently they are populated with dummy data.

mcap_msg.sequence = 0;
if (msg->time_stamp < 0) {
RCUTILS_LOG_WARN_NAMED(LOG_NAME, "Invalid message timestamp %ld", msg->time_stamp);
}
mcap_msg.logTime = mcap::Timestamp(msg->time_stamp);
mcap_msg.publishTime = mcap_msg.logTime;

From mcap vendor Message struct

  /**
   * @brief An optional sequence number. If non-zero, sequence numbers should be
   * unique per channel and increasing over time.
   */
  uint32_t sequence;
  /**
   * @brief Nanosecond timestamp when this message was recorded or received for
   * recording.
   */
  Timestamp logTime;
  /**
   * @brief Nanosecond timestamp when this message was initially published. If
   * not available, this should be set to `logTime`.
   */
  Timestamp publishTime;

The proper implementation will be to add those two fields publish_time and sequence_number to the rosbag2_storage::SerializedBagMessage struct

struct SerializedBagMessage
{
std::shared_ptr<rcutils_uint8_array_t> serialized_data;
rcutils_time_point_value_t time_stamp;
std::string topic_name;
};

and change or add a new rosbag2_cpp::Writer::write(..) API
* Write a serialized message to a bagfile.
* The topic will be created if it has not been created already.
*
* \warning after calling this function, the serialized data will no longer be managed by message.
*
* \param message rclcpp::SerializedMessage The serialized message to be written to the bagfile
* \param topic_name the string of the topic this messages belongs to
* \param type_name the string of the type associated with this message
* \param time The time stamp of the message
* \throws runtime_error if the Writer is not open.
*/
void write(
std::shared_ptr<const rclcpp::SerializedMessage> message,
const std::string & topic_name,
const std::string & type_name,
const rclcpp::Time & time);

to be able to pass those two new fields from the rosbag2 transport layer. i.g. recorder.

@jmachowinski
Copy link
Contributor Author

The proper implementation will be to add those two fields publish_time and sequence_number to the rosbag2_storage::SerializedBagMessage struct

Do we really need the sequence_number ? I don't see any practical value in it, though I might be missing something...

@jmachowinski
Copy link
Contributor Author

Updated the patch.

@MichaelOrlov I also started to implement your suggestion, of passing the send timestamp along.
While doing this, I was wondering, if we should add a recoder option for using the send timestamp as recv timestamp.

The reason for this suggestion would be, that none of the tooling and the API seems to be set up to use the send timestamp, but by switching the timestamps, we could 'hack' our way around this shortcoming.

@MichaelOrlov
Copy link
Contributor

@jmachowinski As reagards:

Do we really need the sequence_number ? I don't see any practical value in it, though I might be missing something...

The sequence number needed to be able to detect on what step messages were lost and how many of them were lost. Or the other way around to make sure that it wasn't data losses during recording. i.e. check for data consistency.

As regards:

While doing this, I was wondering, if we should add a recoder option for using the send timestamp as recv timestamp.

The reason for this suggestion would be, that none of the tooling and the API seems to be set up to use the send timestamp, but by switching the timestamps, we could 'hack' our way around this shortcoming.

No hacking solutions will be accepted for the rosbag2. Sorry, I have a strong opinion on that.
Instead, we would add an extra parameter for the rosbag2 player to be able to playback based on the published timestamp.

@fujitatomoya
Copy link
Contributor

@MichaelOrlov

The sequence number needed to be able to detect on what step messages were lost and how many of them were lost. Or the other way around to make sure that it wasn't data losses during recording. i.e. check for data consistency.

i think use case of sequence is clear but, i guess @jmachowinski was asking what can be filled in it from rmw_message_info_t? if i am not mistaken... @jmachowinski correct me if i got wrong.

i guess, we can generate incremental number for sequence during recording, but what's in it for us?
or are you asking the we need to formulate the sequence number based on publisher_gid and publication_sequence_number to detect the message gap?

@jmachowinski
Copy link
Contributor Author

No hacking solutions will be accepted for the rosbag2. Sorry, I have a strong opinion on that.
Instead, we would add an extra parameter for the rosbag2 player to be able to playback based on the published timestamp.

I also got a strong opinion against workarounds. But in this case I don't see a good way around it. The problem is, that its not only this library. We got rqt_bag and foxglove studio down the line, that would need patches as well.

For our use case the receive timestamp is mostly useless and we are way more interested in analyzing the data in send order. And I guess we are not the only one...

This leads to a more general question, is the receive timestamp actually relevant to anyone ?

The sequence number needed to be able to detect on what step messages were lost and how many of them were lost. Or the other way around to make sure that it wasn't data losses during recording. i.e. check for data consistency.

Okay, that is a valid application.
Which leads to followup questions:

  • Is there anyone really evaluating this field ?
  • Is it worth the memory overhead per message, or would print a warning on recording also be sufficient ?

@fujitatomoya The technical part on how to fill in the sequence number is clear to me, its more about is it worth the memory overhead. Adding a uint32_t might not sound like much, but for comparison, we have a own implementation, of the snapshot feature, which has a memory food print that is about ~1/3 smaller compared to rosbag2, as we don't save the topic name but a 16 bit topic id in every message.

@MichaelOrlov
Copy link
Contributor

@jmachowinski As regards to the:

I also got a strong opinion against workarounds. But in this case I don't see a good way around it. The problem is, that its not only this library. We got rqt_bag and foxglove studio down the line, that would need patches as well.

It only means that if other tools like Foxglove studio and rqt_bag need to be extended for the new functionality they will get it eventually. AFAIK Foxglove studio doesn't have this feature only because we are not yet writing valid data to the relevant fields from the rosbag2.

As regards

This leads to a more general question, is the receive timestamp actually relevant to anyone ?

Well, at least currently everybody else using the receive timestamp and changing it by default to the published timestamp may cause breakage in workflow and other algorithms. The safest way will be to add an option with the new fields and new functionality.

As regards to the:

Adding a uint32_t might not sound like much, but for comparison, we have a own implementation, of the snapshot feature, which has a memory food print that is about ~1/3 smaller compared to rosbag2, as we don't save the topic name but a 16 bit topic id in every message.

This is not true with MCAP file format anymore because:

  1. We don't save the topic name as a string per each message in MCAP format. See
    const auto topic_it = topics_.find(msg->topic_name);
    if (topic_it == topics_.end()) {
    throw std::runtime_error{"Unknown message topic \"" + msg->topic_name + "\""};
    }
    // Get Channel reference
    const auto channel_it = channel_ids_.find(msg->topic_name);
    if (channel_it == channel_ids_.end()) {
    // This should never happen since we adding channel on topic creation
    throw std::runtime_error{"Channel reference not found for topic: \"" + msg->topic_name + "\""};
    }
    mcap::Message mcap_msg;
    mcap_msg.channelId = channel_it->second;
  2. We are not adding new fields to the storage. Those fields are already present in MCAP file format and currently written with dummy data.

@jmachowinski
Copy link
Contributor Author

jmachowinski commented Jan 10, 2024

This is not true with MCAP file format anymore because

I am talking about RAM usage, not SSD/HDD usage.

The problem is within SerializedBagMessage that is stored in CircularMessageCache while using the snapshot feature.
If we add additional members to this structure it will have an impact on the RAM usage.

This leads to a more general question, is the receive timestamp actually relevant to anyone ?

Well, at least currently everybody else using the receive timestamp and changing it by default to the published timestamp may cause breakage in workflow and other algorithms. The safest way will be to add an option with the new fields and new functionality.

I agree that it is the safest route, that is out of the question.

I am more wondering, if it was a deliberate decision to use the receive timestamp in the first place, or was it just used because it was available.
And as a followup question, would it make sense to do a transition away from the receive timestamp in the long run ?

@fujitatomoya
Copy link
Contributor

if it was a deliberate decision to use the receive timestamp in the first place, or was it just used because it was available.
And as a followup question, would it make sense to do a transition away from the receive timestamp in the long run ?

good question.
in the application perspective especially dealing with sensing data, i do need published time(close to data generation time) more than received time to see if we can use this data. but that is just my opinion, maybe there could be more history for this decision.

@MichaelOrlov
Copy link
Contributor

@jmachowinski @fujitatomoya
As regards to the

if it was a deliberate decision to use the receive timestamp in the first place, or was it just used because it was available.
And as a followup question, would it make sense to do a transition away from the receive timestamp in the long run ?

I also don't know the answer to this question. It was before me.
I don't think that it make sense to do a complete transition away from the receive timestamp in the long run.
I envision it to be useful when will need to determine message delay on the transport layer at least.

Another valuable point that was likely driving the decision is clock synchronization on different ECUs and nodes.
On a heterogenous system with multiple HW units, clock synchronization is not an easy task, especially during the R&D phase or quick prototyping.
Without precise clock synchronization on such a configuration, it will be impossible to playback messages using the sent timestamp since they likely will be out of order in the time domain after recording.

@jmachowinski As regards:

am talking about RAM usage, not SSD/HDD usage.

The problem is within SerializedBagMessage that is stored in CircularMessageCache while using the snapshot feature.
If we add additional members to this structure it will have an impact on the RAM usage.

Noted. As mitigation for such a problem, I only can suggest to start using topic_id instead of the topic_name in the SerializedBagMessage as you mentioned above. Perhaps topic_id should be returned by the underlying storage plugin e.g MCAP or SQLIte3 when adding a new topic to the storage. And will need yet to figure out how to match topic_id to the topic_name during the playback.
However, anyway, we need two new additional fields for the published_timestamp and sequence_number in the rosbag2_storage::SerializedBagMessage struct.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-for-2024-01-18/35779/1

@jmachowinski
Copy link
Contributor Author

jmachowinski commented Feb 6, 2024

I added the publisher timestamp. And renamed the existing one to receive timestamp.

The sequence number is a different beast, and I would put it into a separate pull request.
The problem with the sequence number is, you have either the receive sequence number, which is continuous, but does not tell you, if you lost a msg in transmission, as it is incremented every time a message is received. So there is no real point in recording it.

The second available sequence number is the one of the sender, which would tell you that you lost messages in transmission, but it will jump all over the place if you got multiple senders. Don't have an idea how to solve this...

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski Thanks for your contribution.

I think it does make sense to use rcutils_timestamp_value_t instead of the rclcpp::Time in the newly added write(..) API for efficiency. See explanations and details in the dedicated comment.
Also, I would like to ask a renames for the time stamp variables. See motivation and explanation in the dedicated comment.

rosbag2_cpp/include/rosbag2_cpp/writer.hpp Outdated Show resolved Hide resolved
rosbag2_cpp/src/rosbag2_cpp/converter.cpp Outdated Show resolved Hide resolved
rosbag2_cpp/src/rosbag2_cpp/writer.cpp Outdated Show resolved Hide resolved
rosbag2_cpp/src/rosbag2_cpp/writer.cpp Outdated Show resolved Hide resolved
rosbag2_py/src/rosbag2_py/_writer.cpp Outdated Show resolved Hide resolved
rosbag2_transport/src/rosbag2_transport/recorder.cpp Outdated Show resolved Hide resolved
@MichaelOrlov
Copy link
Contributor

MichaelOrlov commented Feb 14, 2024

@jmachowinski As regards

I added the publisher timestamp. And renamed the existing one to receive timestamp.

The sequence number is a different beast, and I would put it into a separate pull request. The problem with the sequence number is, you have either the receive sequence number, which is continuous, but does not tell you, if you lost a msg in transmission, as it is incremented every time a message is received. So there is no real point in recording it.

The second available sequence number is the one of the sender, which would tell you that you lost messages in transmission, but it will jump all over the place if you got multiple senders. Don't have an idea how to solve this...

Yeah, that is not easy. We tried to brainstorm a situation around the sequence number during the last Rosbag2 and Tooling Working Group meeting last Friday.
The straightforward solution would be to record or encode in the sequence number the publisher id along with each message to distinguish and rule out the situation when multiple publishers exist on the same topic.
However, it turns out that publisher_gid type defined as

typedef struct RMW_PUBLIC_TYPE rmw_gid_s
{
  /// Name of the rmw implementation
  const char * implementation_identifier;

  /// Byte data GID value
  uint8_t data[RMW_GID_STORAGE_SIZE];
} rmw_gid_t;

where #define RMW_GID_STORAGE_SIZE 16u
i.e. We would need an extra 16 bytes for each message to store the publisher id along with it.
This sounds like a prohibitive overhead which we can't afford.

After some more consideration, it seems I found an elegant and I think proper solution.
If we start differentiating topics by topic_id assigned by the storage plugin instead of the topic_name i.e. finish with #1553. We can add publisher_id to the TopicMetadata and create unique topic_id aka channel_id for each publisher. Then we will split the recording per each publisher as a separate topic. And the problem will go away since the sequence number will be unique per channel. We can keep unorderd_map on the rosbag2_transport layer to find a match between publisher_id and generated topic_id and will not need to save publisher id with each message.
However will need to figure out yet how to fuse those topic channels during the playback. Likely there will not be extra effort and they will be organically aligned with the current design since the messages will be saved and playback in the same order since they are arriving in the same order and timestamp. We usually just read the next message from storage. If we will need some sorting before playback then we will need to account that topic could be split on multiple channels. But this is another story...

As regards the current implementation.
I am ok if we will add a sequence number field in the separate follow-up PR.
However, it would be better preferable to add it to the scope of this PR at least to the newly added write(..) API with timestamps as a placeholder. It could be assigned to 0 for now. The motivation is that each time when we have to change API we can't really add some parameters to the method or function we have to create a new one and deprecate the old one. To avoid breakage in someone's downstream code and dependent packages.

cc: @fujitatomoya @clalancette

@jmachowinski jmachowinski force-pushed the better_timestamps branch 3 times, most recently from 6fe4592 to f7f7a51 Compare February 14, 2024 16:40
@jmachowinski
Copy link
Contributor Author

@MichaelOrlov adopted code to the request changes

@MichaelOrlov
Copy link
Contributor

@jmachowinski The TEST_F(RecordIntegrationTestFixture, record_all_with_sim_time)
https://github.com/cellumation/rosbag2/blob/aa9fed89abe82fe25d45518a2315ee9d65c8e5f6/rosbag2_transport/test/rosbag2_transport/test_record_all_use_sim_time.cpp#L140-L142
failed on Windows CI job
https://ci.ros2.org/job/ci_windows/21087/testReport/junit/rosbag2_transport/RecordIntegrationTestFixture/record_all_with_sim_time/

It turns out that send_timestamp is equal to 0 on Windows.

cc: @clalancette Do you know if we have a different implementation for message_info::source_timestamp on Windows?

@clalancette
Copy link
Contributor

cc: @clalancette Do you know if we have a different implementation for message_info::source_timestamp on Windows?

I don't really know. It looks like it only happens on Connext, so maybe @asorbini knows.

@MichaelOrlov
Copy link
Contributor

@jmachowinski @clalancette I've found the place where it is settled up to zero.
https://github.com/ros2/rmw_connextdds/blob/d2bd90826e19097a429ba150b12e67d3822c1ccc/rmw_connextdds_common/src/common/rmw_impl.cpp#L1970-L1983

rmw_connextdds_message_info_from_dds(
  rmw_message_info_t * const to,
  const DDS_SampleInfo * const from)
{
  rmw_connextdds_ih_to_gid(from->publication_handle, to->publisher_gid);
// Message timestamps are disabled on Windows because RTI Connext DDS
// does not support a high enough clock resolution by default (see: _ftime()).
#if !RTI_WIN32
  to->source_timestamp = dds_time_to_u64(&from->source_timestamp);
  to->received_timestamp = dds_time_to_u64(&from->reception_timestamp);
#else
  to->source_timestamp = 0;
  to->received_timestamp = 0;
#endif /* !RTI_WIN32 */

We need either a workaround for it or a fix in the rmw_connextdds_message_info_from_dds(..).
@asorbini Where is RTI_WIN32 defined? I can't find it. I am curious if we could use it in the rosbag2 or maybe if would need to create the same definition in the rosbag2 to what should we map?

@jmachowinski
Copy link
Contributor Author

@MichaelOrlov Should I add a workaround ? I would go with an ifdef in the recorder.cpp that would disable this feature if RTI_WIN32 is set.

@MichaelOrlov
Copy link
Contributor

@jmachowinski Yes, please. However, I am not sure that RTI_WIN32 will be visble/(ever defined) on rosbag2 level.
Anyway, let's give it a try.

Please note that we need to fallback to the old behavior in recorded if RTI_WIN32 is defined not only in test.

@jmachowinski
Copy link
Contributor Author

@MichaelOrlov added a workaround, that disables this feature on windows.

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski It turns out that in this workaround we are falling back to the old behavior for all RMWs under the Windows.
However, we need to make an exclusion only for the RTI Connext DDS.
Perhaps we can use following if construct to detect it

 if (std::string(rmw_get_implementation_identifier()).find("rmw_connextdds") != 0) {

Also, will need to adjust a newly added test for this workaround as well.

@jmachowinski
Copy link
Contributor Author

I was under the impression, that we don't have any rwm timestamp support on windows, regardless of the DDS implementation. I updated the patch, to only blacklist connextdds. I'll have a look at the test later.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-for-2024-03-21/36814/1

Janosch Machowinski and others added 7 commits April 10, 2024 19:43
… case

This should create better timestamp in cases of high load.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…ime_stamp to receive_time_stamp

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…amp to send_timestamp

Co-authored-by: Michael Orlov <morlovmr@gmail.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…extdds.

The windows rmw connextdds does not provide a value in source timestamp
nor in receive timestamp, therefore we fall back to current node time.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski Thanks! LGTM.

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmachowinski Sorry. Completely forgot.
Can you please regenerate Python stub files (.pyi)?
The instruction is in the https://github.com/ros2/rosbag2/blob/rolling/rosbag2_py/README.md

@MichaelOrlov
Copy link
Contributor

Gist: https://gist.githubusercontent.com/MichaelOrlov/7fafb68b299a4679142305c2a3cf6c99/raw/02674aa34517082806b39b18fdf48b8cbb37791d/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_compression rosbag2_cpp rosbag2_performance_benchmarking rosbag2_py rosbag2_storage rosbag2_storage_mcap rosbag2_storage_sqlite3 rosbag2_transport rosbag2_tests
TEST args: --packages-above rosbag2_compression rosbag2_cpp rosbag2_performance_benchmarking rosbag2_py rosbag2_storage rosbag2_storage_mcap rosbag2_storage_sqlite3 rosbag2_transport rosbag2_tests
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/13673

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with green CI

@MichaelOrlov MichaelOrlov merged commit bee10b4 into ros2:rolling Apr 11, 2024
13 of 14 checks passed
mikaelarguedas added a commit to mikaelarguedas/plotjuggler-ros-plugins that referenced this pull request Apr 27, 2024
Signed-off-by: Mikael Arguedas <mikael.arguedas@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants