
Add ros2_tracing tracepoints #120

Merged
merged 3 commits into from Jan 23, 2024

Conversation

cwecht
Contributor

@cwecht cwecht commented May 15, 2023

Depends on ros2/ros2_tracing#74; implements: ros2/ros2_tracing#73

This PR introduces ros2_tracing tracepoints which are really helpful for examining system performance. The dependency to the ros2_tracing PR stems from the fact, that one of the original tracepoints had to be extended in order to get the timestamp during writing to DDS.

In order to do that, DDS_DataWriter_write_untypedI had to be replaced with DDS_DataWriter_write_w_timestamp_untypedI. At first it seemed reasonable to use` in order to get the respective timestamp.

This seems to work, but DDS_DomainParticipant_get_current_time introduces a massive performance penalty. The flamegraph below shows the result of profiling it using perf.
(flamegraph image)

From https://community.rti.com/kb/how-clocks-are-used-connext-dds I'd conclude that we need the "external clock" used by Connext here. Looking at the perf profile, there seems to be a method/function DDS_DomainParticipant_get_external_clockI which would provide said clock to us, but it seems that this function is not part of Connext's public API, right? Is there any way to get this clock, or the timestamp which Connext would use when DDS_DataWriter_write_untypedI is called, without paying such a massive performance penalty?

Collaborator

@asorbini asorbini left a comment

Hi @cwecht, thank you for your contribution! The changes look ok, I only had a few minor comments.

One thing to consider is that adding a single new argument to the write() functions is fine, but we might be venturing onto a "slippery slope", and it might be better to refactor the functions to accept a single struct argument that contains all extra arguments. E.g.:

// New structure for extra arguments. We could include the `serialized` flag too,
// but that's already part of RMW_Connext_Message
struct RMW_Connext_WriteParams {
  DDS_Time_t timestamp{DDS_TIME_INVALID};
  int64_t * sn_out{nullptr};
};

// Updated write() functions
rmw_ret_t
rmw_connextdds_write_message(
  RMW_Connext_Publisher * const pub,
  RMW_Connext_Message * const message,
  RMW_Connext_WriteParams * const params);

class RMW_Connext_Publisher {
  rmw_ret_t
  write(
    const void * const ros_message,
    const bool serialized,
    RMW_Connext_WriteParams * const params = nullptr);
};

@@ -2973,6 +2994,14 @@ RMW_Connext_Service::send_response(
rr_msg.gid.implementation_identifier = RMW_CONNEXTDDS_ID;
rr_msg.payload = const_cast<void *>(ros_response);

DDS_Time_t timestamp;
Collaborator

Since we are using Connext's C API we should initialize the variable to a known state, e.g.

Suggested change
DDS_Time_t timestamp;
DDS_Time_t timestamp = DDS_TIME_INVALID;

if (DDS_RETCODE_OK !=
DDS_DomainParticipant_get_current_time(this->reply_pub->dds_participant(), &timestamp))
{
RMW_CONNEXT_LOG_ERROR_SET("failed to write message to DDS")
Collaborator

I would change the message to be a bit more specific to this failure, e.g.

Suggested change
RMW_CONNEXT_LOG_ERROR_SET("failed to write message to DDS")
RMW_CONNEXT_LOG_ERROR_SET("failed to get current time from DDS DomainParticipant")

@asorbini
Collaborator

From https://community.rti.com/kb/how-clocks-are-used-connext-dds I'd conclude that we need the "external clock" used by Connext here. Looking at the perf profile, there seems to be a method/function DDS_DomainParticipant_get_external_clockI which would provide said clock to us, but it seems that this function is not part of Connext's public API, right? Is there any way to get this clock, or the timestamp which Connext would use when DDS_DataWriter_write_untypedI is called, without paying such a massive performance penalty?

I'm not very familiar with the clock configuration in Connext, and I'm looking into possible alternative solutions. I'll get back after I have a better idea of available options.

@cwecht
Contributor Author

cwecht commented May 23, 2023

@asorbini Thank you for the review! I just applied the changes you requested and (forced) pushed them.

@asorbini
Collaborator

@cwecht are you sure you pushed the change? The code looks unchanged.

In general, I would prefer if you didn't force push but rather pushed an incremental commit so that the additional changes can be reviewed without having to re-scan through all of the code.

One reason to force push in this case would be to fix the commit message, which should include a DCO (e.g. "Signed-off-by: Your Name your@email"). I guess we are not strictly enforcing this rule and we are missing a CONTRIBUTING document in this repository (like this one), but the convention has been followed by all contributions, as far as I'm aware.

@cwecht cwecht force-pushed the introduce_tracepoints branch 2 times, most recently from 024ad33 to b2a6939 Compare May 25, 2023 06:28
@cwecht
Contributor Author

cwecht commented May 25, 2023

@asorbini now it's actually pushed including the signoff.

@cwecht
Contributor Author

cwecht commented Aug 30, 2023

I've got some news regarding DDS_DomainParticipant_get_current_time.

I was able to test this PR with Connext Micro, and using DDS_DomainParticipant_get_current_time in this case is just fine; there is basically no significant performance overhead.

For Connext PRO it is a different story:

  1. The traces seem to indicate that the majority of the time in DDS_DomainParticipant_get_current_time is spent looking up the clock which is configured in the respective QoS settings. The documentation seems to indicate that this has to be done since these QoS settings may change at any time, which makes it reasonable to look up the clock that is supposed to be used on every call.
  2. However, if I understand the current implementation in rmw_connextdds correctly, the clock in use will always be the default clock, as the PROPERTY attribute of DDS_DomainParticipantQos is never changed. From my understanding, this can't even be changed via e.g. XML.
  3. If the above is correct, we should be fine with using the system clock here directly, as this should always be the clock which is used by DDS_DomainParticipant_get_current_time by default.

@asorbini could you double check this conclusion?

@cwecht cwecht force-pushed the introduce_tracepoints branch 2 times, most recently from 94f0287 to 02773b9 Compare September 12, 2023 07:38
@cwecht
Contributor Author

cwecht commented Sep 12, 2023

@asorbini I've now added rmw_connextdds_get_current_time, which encapsulates the distinction between Pro and Micro regarding time retrieval. The performance difference between Pro's DDS_DomainParticipant_get_current_time and this one based on RTIOsapiUtility_getTime is quite impressive: RTIOsapiUtility_getTime usually takes a few hundred nanoseconds, whereas DDS_DomainParticipant_get_current_time usually takes more like 6-20 microseconds. Do you have any objections to this approach?

@cwecht
Contributor Author

cwecht commented Nov 14, 2023

@asorbini May I ask you kindly to have another look at this PR? Thank you very much!

Signed-off-by: Christopher Wecht <cwecht@mailbox.org>
Signed-off-by: Christopher Wecht <cwecht@mailbox.org>
@cwecht cwecht force-pushed the introduce_tracepoints branch 3 times, most recently from 610afee to 1acd284 Compare December 18, 2023 07:43
@asorbini
Collaborator

asorbini commented Dec 18, 2023

Running full CI with repos for this PR, ros2/rmw_cyclonedds#454, ros2/rmw_fastrtps#694, ros2/ros2_tracing#74:

  • Linux: Build Status
  • Linux aarch64: Build Status
  • Windows: Build Status

ETA: The build failed because of a typo in dds_api.hpp.

Signed-off-by: Christopher Wecht <cwecht@mailbox.org>
@asorbini
Collaborator

Running CI again:

  • Linux Build Status
  • Linux aarch64 Build Status
  • Windows Build Status

@cwecht
Contributor Author

cwecht commented Dec 21, 2023

I'm not quite sure why the latest build failed. Looks more like a Jenkins hiccup to me.

@asorbini
Collaborator

@cwecht agreed, seems like an unrelated failure. Running Linux again to check: Build Status.

The other two builds are yellow because of a formatting error unrelated to this PR.

@clalancette
Contributor

The other two builds are yellow because of a formatting error unrelated to this PR.

I will point out that the nightlies have no issues: https://ci.ros2.org/view/nightly/job/nightly_linux-aarch64_release/2582/ . So it is worth investigating those before merging.

@asorbini
Collaborator

@clalancette thank you for the clarification. I totally agree that those issues should be resolved, but they are C++ linter errors and they stem from a different PR (ros2/rmw_cyclonedds#454), hence why I didn't think they'd affect this PR.

Collaborator

@asorbini asorbini left a comment

CI passed (only failure comes from ros2/rmw_cyclonedds#454).

Changes can be merged once ros2/ros2_tracing#74 is in.

@mjcarroll mjcarroll self-assigned this Jan 4, 2024
@mjcarroll mjcarroll merged commit 8d34e7e into ros2:rolling Jan 23, 2024
1 check passed