Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non interleaving multi-event profiling #107

Closed
tomershafir opened this issue Aug 15, 2023 · 6 comments
Closed

Support non interleaving multi-event profiling #107

tomershafir opened this issue Aug 15, 2023 · 6 comments
Assignees

Comments

@tomershafir
Copy link
Contributor

tomershafir commented Aug 15, 2023

Concurrent multi-event collection can interfere each other and skew the results, for example, a bad case, where an app in steady state allocates and computes correspondingly with each other with respect to sampling config, threads performing tlab sampling may be captured by cpu profiler, skewing the flame graph.

A solution may be a fixed wall time window mechanism. Then, you can for example on a cpu-bound app sample cpu 1m then sample tlab 1m then sleep 3m. I may be able to help if you think its useful

@korniltsev
Copy link
Collaborator

I dont necessarily agree on skewing but this may be useful for other reasons.

We have a ProfilingScheduler interface and default ContinuousProfilingScheduler implementation.
If you only want this scheduler only for your services, I highly recommend implement it on your side(not in the library)

If you want this scheduler to be in the library : There's another community submited scheduler SamplingProfilingScheduler which is disabled by default. It should go there, it preserve capabilities it currently has. Bear in mind it is experimental, we do not commit supporting it, it and may be changed or removed or broken in the future.

@tomershafir
Copy link
Contributor Author

indeed, it may also be useful to manage multi-event aspects like overhead.

Btw, I think SamplingProfilingScheduler should definitely be supported, I think it is the most useful in real productions.

@tomershafir
Copy link
Contributor Author

@korniltsev why not merge the 2 schedulers?

@korniltsev
Copy link
Collaborator

why not merge the 2 schedulers?

  1. Sampling will make it not continuous profiling but "almost continuous" profiling and it may miss something important. The overhead is small if configured properly. We also have no sampling in our other sdks, java sdk should be aligned with others, we should discuss introducing sampling not for java but for all sdks.
  2. I dont know how to implement it in a way that will satisfy everyone. Like the previous contributor wanted one thing, and now you want a slightly different thing. And next week someone else will come with some other ideas. Thus as I said earlier it is experimental, we do not commit supporting it, it and may be changed or removed or broken in the future.

@korniltsev
Copy link
Collaborator

сс @Rperry2174

@tomershafir
Copy link
Contributor Author

tomershafir commented Aug 23, 2023

  1. Continuous profiling is inherently sampling based, so even without sleep() you miss. Also, if you try to configure no sleep policy to match sleep based one by distributing the same load over larger time window to manage overhead, the probability to catch steady state paths of first option is lower than the second.

  2. You should decide if a feature is valuable. If so, I think you should do eventual consistency, rather than block a sdk. I also dont see such strict consistency in the oss wild

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants