Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make subbuffer size configurable with Trace action #51

Merged

Conversation

cwecht
Copy link
Contributor

@cwecht cwecht commented Mar 27, 2023

Turns out: in a sufficiently large system with sufficiently much going LTTng's buffers may overflow quite quickly - especially during startup. So it is quite convenient to have the ability to configure the buffer size right in the Trace action.

@cwecht cwecht force-pushed the make_subbuffer_size_configurable branch from d477b2a to 915fac9 Compare March 27, 2023 11:48
@christophebedard
Copy link
Member

Thanks for the PR.

We've discussed this before, see #20 and https://gitlab.com/ros-tracing/ros2_tracing/-/issues/129. The conclusion was that there are a lot of parameters, and exposing them all through Python function parameters would be a lot. We were looking for a simple/generic way to let users configure more advanced parameters such as this one, like YAML configuration files, or letting users provide lttng ... commands that we could just run after the main configuration, see https://gitlab.com/ros-tracing/ros2_tracing/-/merge_requests/248.

Of course, in this case, you're only looking to be able to modify the sub-buffer size; you're not looking to support all the parameters I listed in the original GitLab issue.

However, I just wonder if this is really the right fix. What I've done internally at Apex.AI is to just increase default sub-buffer size to 16777216 == 16 MiB, which is 512 times bigger than the default here but still not huge. We've never had a problem with it since then. If the Trace action was only used with the ros2:*/dds:* instrumentation, then I would argue that we should just set a more appropriate/bigger default value (potentially smaller than 16 MiB). However, you could also enable other tracepoints which generate a lot more data (e.g., lttng_ust_libc:*, lttng_ust_cyg_profile*:*, etc.), so that argument doesn't work. Therefore, I think this is fair.

The only thing I would ask is: why not make the kernel sub-buffer size configurable too, potentially as a separate parameter?

@cwecht
Copy link
Contributor Author

cwecht commented Mar 28, 2023

There can be massive amounts of events really quickly if

  1. the system is sufficiently large
  2. the cycle times are sufficiently small
  3. one wants to record things like malloc/realloc/free or posix functions like mutex acquisition and such.

Have the buffer size easily configurabe on a case by case basis made sense to me because it's probably pretty hard to find a sufficienlty large buffer size for every use case.

I didn't consider the kernel buffer size because I didn't run into issues there yet.

However, it's a bit unfortunate, that lttng-load is not exposed via the python API, as it seems to be quite the solution you wanted to have in the first place.

@christophebedard
Copy link
Member

christophebedard commented Mar 29, 2023

There can be massive amounts of events really quickly if

I've done a lot of that, and it's been fine so far with 16 MiB, but anyway I still think this is a good improvement.

What about bumping the default to, say, 16 * 4096 (or more), though? The goal is to make the default value "good enough" for like 95% of "normal" cases.

Also, I think exposing this option to ros2 trace would be good too.

I didn't consider the kernel buffer size because I didn't run into issues there yet.

I've definitely run into issues there, for example when enabling sched_switch. That's why the kernel sub-buffer size is currently way bigger than the userspace one. However, we can just leave it there for now.

However, it's a bit unfortunate, that lttng-load is not exposed via the python API, as it seems to be quite the solution you wanted to have in the first place.

Yeah, it's unfortunate that the Python API is not complete. We could always just call lttng load using subprocess, though. Something to thinking about for the future.

@christophebedard
Copy link
Member

@cwecht FYI the feature freeze for Iron is next week, on April 17th. If you don't have time before then, I can take care of the rest if you want.

A rebase would be good, by the way.

@cwecht cwecht force-pushed the make_subbuffer_size_configurable branch from 915fac9 to 21dc81e Compare April 13, 2023 06:07
@cwecht
Copy link
Contributor Author

cwecht commented Apr 13, 2023

I also added the respective parameter for the kernel events buffer: we ran in the same issue here too.

cwecht and others added 5 commits April 13, 2023 17:22
Signed-off-by: Christopher Wecht <cwecht@mailbox.org>
Signed-off-by: Christopher Wecht <cwecht@mailbox.org>
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
@christophebedard
Copy link
Member

I rebased, fixed some linter-related stuff, fixed the frontend parsing, and updated the test_tracetools_launch tests to cover this. I also renamed {ust,kernel}_subbuffer_size to subbuffer_size_{ust,kernel} since we already have events_{ust,kernel}.

I'll let you have a look and then I'll run CI.

Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
@cwecht
Copy link
Contributor Author

cwecht commented Apr 14, 2023

LGTM!

Copy link
Member

@christophebedard christophebedard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing --packages-above tracetools_trace:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@christophebedard
Copy link
Member

christophebedard commented Apr 14, 2023

Thanks for the improvement! I'll eventually expose this through ros2 trace.

@christophebedard christophebedard changed the title Make subbuffer size configurable Make subbuffer size configurable with Trace action Apr 14, 2023
@christophebedard christophebedard merged commit 67d3579 into ros2:rolling Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants