-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tracing instrumentation for intra-process #2091
Conversation
TracepointsWe propose to add the following tracepoints mainly targeting ring buffers.
|
The relationship between each tracepoint is summarized in the following ER diagram. We are now able to track messages communicated using intra-process as below. The above method of tracking messages using thread id can be used based on the following condition
We are concerned that changes in specifications due to future ROS2 development may make the above methods unavailable. |
NotesSince this PR has multiple repositories in scope, it is necessary to make changes to multiple repositories at the same time. See also: |
Is this intention? i do not really know the history, but adding intra-process communication tracepoint could be useful? @ymski once it is ready to review, we are happy to review. and probably it would be nice to talk about this Client WG as well. |
Happy to see this PR @ymski!
@fujitatomoya we indeed never instrumented intra-process communications in a way that allows tracking messages from publication to callback, because supporting the default use-case (i.e., network) was enough. If you search for "intra" in the |
I understand that this works currently, but yeah it might stop working in the future. Instead of relying on the TID to link an I've taken a quick look at the intra-process code, and this seems a bit hard to do due to all the layers ( This would allow linking publisher handle/message -> buffer address -> index -> callback, which would be more robust and would probably not break in the future. Note that tracepoints are part of the |
Thank you for your reply. The draft PR was used to confirm the error in the GitHub Actions, but it does not seem to be resolved without merging the relevant PR. I have now changed from draft PR to PR. I think we should continue this discussion on the issues mentioned above. |
Thank you for your comment. @fujitatomoya
I am glad you said this. Thank you for your comment. @christophebedard |
@ymski thanks for iterating, i am happy to do review on this 👍 |
rclcpp/include/rclcpp/experimental/buffers/ring_buffer_implementation.hpp
Outdated
Show resolved
Hide resolved
rclcpp/include/rclcpp/experimental/buffers/ring_buffer_implementation.hpp
Outdated
Show resolved
Hide resolved
waiting for another review, and after that i will start CI. |
@fujitatomoya |
@christophebedard By changing the index argument of the trace point of ring_buffer to message address, the following link can be created. I think the link on the enqueue side is now OK, but the link without TID on the dequeue side is still unresolved. I think this is the same problem as the way I think if there is a trace point like |
Hi, @christophebedard ! The following is an extract from the trace data, showing only the trace points of
In this case, the matching between case1
case2
The same problem can occur when copying messages. This is due to the lack of information about where the messages used to execute the callback came from. Therefore, the method described above is proposed to trace the messages. case1 with message track:
(note: It might be more ideal if What do you think of the robustness of this approach? |
@ymski thank you for the example. Just to take a step back: On the publisher side, I think we might be able to assume that On the subscription side, we shouldn't assume that @wjwwood what do you think? |
Then the So the |
@christophebedard thanks for sharing how to link in ros2 tracing. I also understand the method you suggested.
I think the same. Here is an extract of the trace data when the subscription of the cyclic_pipeline demo is simply increased to two. They have the same TID and can be linked.
I understand the linking method of the publisher's side as follows. The linking method by TID can be used to obtain a list of enqueued addresses. On the other hand, the linking method using the Is this understanding correct? |
I'm not sure I understand what the |
I am sorry @christophebedard I thought we wanted to make sure the tid matched for successive enqueues. In that case, the following repository, which contains examples of measured applications and trace data, would be helpful. By the way, I am not going to stick to the tid linking method. My concern was the magnitude of the impact of adding trace points to track copies, and I think a linking method that does not use tid is a healthier and preferable implementation for the ros2 project. |
Thank you, that repository and example traces are helpful. Let me know once you implement the solution to avoid linking using TIDs for both pub and sub. Also, how do you plan on linking the ring buffer to its subscription, i.e.,
|
No. It would not be a good idea to make the above assumption if we do not use TIDs for links. As you know, the tracepoints added in the current PR are not enough to link subscriptions and buffers. Therefore, I am considering adding a tracepoint called
|
I'm guessing your diagram should link Also, what if you have 2 subscription with the same topic name? You wouldn't be able to know which buffer corresponds to which subscription. Or do you assume that a publisher with a given topic name will always put the message in the ring buffers of subscriptions with the same topic name? |
Thanks fou your comment @christophebedard I will rethink it, including your advice. This may take some time. |
@christophebedard
This shows the relationship between trace points, including newly added. |
@ymski looking at it quickly, that looks good. I'll try generating a trace using your intra_process_demo and look at the output to properly validate. One question: do you link the dispatched message to the callback (buffer dequeue -> callback start/end) by simply taking the next callback start event for the subscription corresponding to the buffer? Or do you want to add another tracepoint to link the dispatched message to the callback start event (as you suggested), or possibly add a new |
@christophebedard thank you for your review.
Thank you for your question about dispatched message to the callback. I was just about to discuss that as well. I think from a robustness point of view it would be desirable to add trace points using message addresses. As for how to add information about the message address, I think it should be convenient for future development of ros2_tracing. As a maintainer, what do you think is better? |
In general, I would prefer adding a message address field to the existing However, I think we don't strictly need it for the moment: it currently works fine in ROS 2 (as I explained in #2091 (comment)) and it should work fine for this (as I mentioned above). Since the message address can be re-used anyway (so you could get multiple |
@christophebedard By the way, we use the buffer index to trace messages, but now that I think about it, it seems more appropriate to use addresses. Is it OK to switch to tracepoints using addresses? Also, as I mentioned in the discussion above, it would be useful to be able to see how much data has accumulated in the buffer. If there is demand, the number of data could be included in |
I think addresses make sense for consistency.
In my mind, while the value can be reconstructed, this seems like a relatively cheap thing to trace to avoid having going through that exercise. |
@mjcarroll I am glad to hear that comment. Unfortunately though, when I thought about the implementation, it turned out that replacing the index with a message address might be a bit difficult. This is due to the template implementation of the ring buffer. As the results of the analysis are not affected by either index or message address, I will adopt the index. I have also added If there are no problems with the trace points to be added, we can proceed to code review. What do you think @christophebedard ? |
Sorry for the delay, I've been pretty busy. I'll take a look tomorrow.
Makes sense. Indexes are fine then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these suggestions correspond to my suggestions in the ros2_tracing
PR.
Sorry again for the delay.
rclcpp/include/rclcpp/experimental/buffers/intra_process_buffer.hpp
Outdated
Show resolved
Hide resolved
rclcpp/include/rclcpp/experimental/buffers/ring_buffer_implementation.hpp
Outdated
Show resolved
Hide resolved
rclcpp/include/rclcpp/experimental/buffers/ring_buffer_implementation.hpp
Outdated
Show resolved
Hide resolved
rclcpp/include/rclcpp/experimental/subscription_intra_process_buffer.hpp
Outdated
Show resolved
Hide resolved
rclcpp/include/rclcpp/experimental/buffers/ring_buffer_implementation.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good from my side (see ros2/ros2_tracing#30). I'll let @fujitatomoya rebase the branch on rolling
and review, then I'll run CI. Then we can squash and merge this eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately though, when I thought about the implementation, it turned out that replacing the index with a message address might be a bit difficult. This is due to the template implementation of the ring buffer.
Makes sense. Indexes are fine then.
i think that in design, this information is better to be cached in the data object (in this particular case, ring buffer), not handlers nor users.
and indexes are just fine to use for these cases, i believe.
@ymski can you squash the commits into a single commit and rebase on the latest version of the |
Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> applied tracepoints for intra_publish Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> add tracepoints for linking buffer and subscription Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> rename buf_to_typedIPB Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> added accumulated data Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> commit sugesstion Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> refactor: split long lines Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com> added prefix rclcpp Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com>
c94ba1b
to
4dbeb17
Compare
@fujitatomoya @christophebedard |
CI for this PR and ros2/ros2_tracing#30 is over at ros2/ros2_tracing#30 (comment) |
CI looks good. I merged the |
applied tracepoints for intra_publish add tracepoints for linking buffer and subscription Signed-off-by: Kodai Yamasaki <114902604+ymski@users.noreply.github.com>
Overview
Hi, I'm a member of CARET development team. We would like to officially support some tracepoints added by caret in the rclcpp fork and LD_PRELOAD.
The current tracetools do not support intra-process communication. This Pull Request adds tracing instrumentation to calculate the processing time from
publish
tosubscription callback
. These trace points are important in performance analysis. Processing that may become a bottleneck, such as strict time constraints or handling large data, often uses intra-process communication.