Support non interleaving multi-event profiling #107

tomershafir · 2023-08-15T16:04:03Z

Concurrent multi-event collection can interfere each other and skew the results, for example, a bad case, where an app in steady state allocates and computes correspondingly with each other with respect to sampling config, threads performing tlab sampling may be captured by cpu profiler, skewing the flame graph.

A solution may be a fixed wall time window mechanism. Then, you can for example on a cpu-bound app sample cpu 1m then sample tlab 1m then sleep 3m. I may be able to help if you think its useful

korniltsev · 2023-08-16T06:43:11Z

I dont necessarily agree on skewing but this may be useful for other reasons.

We have a ProfilingScheduler interface and default ContinuousProfilingScheduler implementation.
If you only want this scheduler only for your services, I highly recommend implement it on your side(not in the library)

If you want this scheduler to be in the library : There's another community submited scheduler SamplingProfilingScheduler which is disabled by default. It should go there, it preserve capabilities it currently has. Bear in mind it is experimental, we do not commit supporting it, it and may be changed or removed or broken in the future.

tomershafir · 2023-08-16T07:00:10Z

indeed, it may also be useful to manage multi-event aspects like overhead.

Btw, I think SamplingProfilingScheduler should definitely be supported, I think it is the most useful in real productions.

tomershafir · 2023-08-17T15:36:21Z

@korniltsev why not merge the 2 schedulers?

korniltsev · 2023-08-23T06:26:22Z

why not merge the 2 schedulers?

Sampling will make it not continuous profiling but "almost continuous" profiling and it may miss something important. The overhead is small if configured properly. We also have no sampling in our other sdks, java sdk should be aligned with others, we should discuss introducing sampling not for java but for all sdks.
I dont know how to implement it in a way that will satisfy everyone. Like the previous contributor wanted one thing, and now you want a slightly different thing. And next week someone else will come with some other ideas. Thus as I said earlier it is experimental, we do not commit supporting it, it and may be changed or removed or broken in the future.

korniltsev · 2023-08-23T06:26:33Z

сс @Rperry2174

tomershafir · 2023-08-23T08:48:57Z

Continuous profiling is inherently sampling based, so even without sleep() you miss. Also, if you try to configure no sleep policy to match sleep based one by distributing the same load over larger time window to manage overhead, the probability to catch steady state paths of first option is lower than the second.
You should decide if a feature is valuable. If so, I think you should do eventual consistency, rather than block a sdk. I also dont see such strict consistency in the oss wild

korniltsev assigned korniltsev and Rperry2174 Aug 23, 2023

tomershafir mentioned this issue Aug 28, 2023

Feat/prio seq multi event profiling #109

Merged

korniltsev closed this as completed Sep 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support non interleaving multi-event profiling #107

Support non interleaving multi-event profiling #107

tomershafir commented Aug 15, 2023 •

edited

korniltsev commented Aug 16, 2023

tomershafir commented Aug 16, 2023

tomershafir commented Aug 17, 2023

korniltsev commented Aug 23, 2023

korniltsev commented Aug 23, 2023

tomershafir commented Aug 23, 2023 •

edited

Support non interleaving multi-event profiling #107

Support non interleaving multi-event profiling #107

Comments

tomershafir commented Aug 15, 2023 • edited

korniltsev commented Aug 16, 2023

tomershafir commented Aug 16, 2023

tomershafir commented Aug 17, 2023

korniltsev commented Aug 23, 2023

korniltsev commented Aug 23, 2023

tomershafir commented Aug 23, 2023 • edited

tomershafir commented Aug 15, 2023 •

edited

tomershafir commented Aug 23, 2023 •

edited