[Core] Enable multiple profiler consumers and add a timeline/tracing profiler #1788
That is very interesting. I want to see this as an opt-in. And given the implementation is not large I think we could merge it into the source code.
public delegate void EventProcessedHandler(ref ProfilingEvent profilingEvent, ProfilingEventType eventType);
public static event EventProcessedHandler EventProcessed;
// at the end of Profiler.ProcessEvent
EventProcessed?.Invoke(ref profilingEvent, eventType);
// In your implementation a new static method
TraceProfiler.StartCollectingEvents() => Profiler.EventProcessed += OnProfilingEventProcessed;
TraceProfiler.StopCollectingEvents() => Profiler.EventProcessed -= OnProfilingEventProcessed;
And then just add more And if there's something in the current Profiler implementation that prevents it from being that simple - maybe we can look at why is that? |
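To make the opt-in idea above concrete, here is a minimal, self-contained sketch of the event-hook pattern. All `Demo*` types are hypothetical stand-ins invented for this example, not the engine's actual `Profiler`, `ProfilingEvent` or `TraceProfiler`; it only illustrates how a static event lets a second consumer attach and detach without the profiler knowing about it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only, not the engine's real types.
public enum DemoEventType { Begin, End, Mark }

public struct DemoEvent
{
    public string Name;
    public long Timestamp;
}

public delegate void DemoEventHandler(ref DemoEvent e, DemoEventType type);

public static class DemoProfiler
{
    // Consumers opt in by subscribing; with no subscriber the invoke is skipped.
    public static event DemoEventHandler EventProcessed;

    public static void ProcessEvent(ref DemoEvent e, DemoEventType type)
    {
        // ... the profiler's own processing would happen here ...
        EventProcessed?.Invoke(ref e, type);
    }
}

public static class DemoTraceCollector
{
    // Not thread-safe; a real collector would need a concurrent structure.
    public static readonly List<string> Collected = new List<string>();

    public static void StartCollectingEvents() => DemoProfiler.EventProcessed += OnEvent;
    public static void StopCollectingEvents() => DemoProfiler.EventProcessed -= OnEvent;

    private static void OnEvent(ref DemoEvent e, DemoEventType type)
        => Collected.Add($"{type}:{e.Name}");
}
```

Events processed before `StartCollectingEvents()` or after `StopCollectingEvents()` never reach the collector, which is what makes the consumer opt-in.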
We can easily add caller info into the Profiler. Other than that I see mainly that you're adding thread info and maintaining a per-thread stack so that the Stopwatch can be stopped for Begin/End pairs (already done by ProfilingState). |
I'm curious how we ensure profiling call sites get stripped for release/shipping builds? |
My first concern here would be performance. How big of a deal that is really depends on what you're profiling: A few hundred scopes spread around should be fine, thousands of scopes in a small function that gets called by worker threads is perf death by synchronization overhead. As a quick best-case vs worst-case example this is just collecting 100k events, without any output, either in a regular
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19045.3324)
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.7.23376.3
[Host] : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
Job-JPLRAA : .NET 6.0.14 (6.0.1423.7309), X64 RyuJIT AVX2
InvocationCount=1 UnrollFactor=1
I also thought that adding lower level profiling keys would just clutter the existing profiling UI. I think my other issues were really about |
They don't need to. When the profiler is off the JIT can inline the check and skip the calls to it quite well. If the Profiler perf is not great it would be good to determine why and see if we can improve it. Maybe it would make sense to gut it even more and use something closer to the runtime? What we need is basically a simple solution to start/stop scopes and gather perf info and output it to some stream once per frame - either consumed by in game UI or by external tools or written to a file, etc. |
In the current implementation of TracingProfiler/Scope I don't see any is-active checks, so how would it avoid that work? Additionally, what about AOT platforms like iOS, etc? |
I think I've finally figured out what I want to do with this. In a first step, replace the Profiler internal buffers with Channels to get rid of the locking and to get support for multiple consumers. Then I can convert the TraceProfiler callsites to the existing functions/datatypes and add the JSON writer as a second consumer. The only functional change I would want to make here is to stop sending Then, as a second PR, we could take another look at the existing API and datatypes:
And then, as a last step, take another look at GPU profiling. Usually I would expect GPU information per queue (on modern APIs), but I don't think that's currently exposed. But I might very well be wrong about that, didn't want to dig too deeply into it for now. As an alternative to rolling our own we could also look at integrating https://github.com/clibequilibrium/Tracy-CSharp, it's definitely well optimized and feature rich, but ofc it's another (non-C#) dependency (+ imgui) and I don't know what limitations the .NET integration might have. |
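A rough sketch of the first step described above (Channels plus multiple consumers), assuming one unbounded channel per subscriber. `DemoEventHub` and `DemoTraceEvent` are hypothetical names made up for this example and do not reflect the actual changes to `Profiler`; only the `System.Threading.Channels` calls are real API.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;

// Hypothetical event payload for illustration.
public readonly record struct DemoTraceEvent(string Name, int ThreadId, long Ticks);

public sealed class DemoEventHub
{
    private readonly object gate = new object();
    private Channel<DemoTraceEvent>[] subscribers = Array.Empty<Channel<DemoTraceEvent>>();

    // Each consumer gets its own channel, so consumers never compete for events.
    public ChannelReader<DemoTraceEvent> Subscribe()
    {
        var channel = Channel.CreateUnbounded<DemoTraceEvent>(
            new UnboundedChannelOptions { SingleReader = true });
        lock (gate) // only taken when subscribing, not when publishing
        {
            var copy = new Channel<DemoTraceEvent>[subscribers.Length + 1];
            subscribers.CopyTo(copy, 0);
            copy[copy.Length - 1] = channel;
            subscribers = copy;
        }
        return channel.Reader;
    }

    // Called from any thread; writes the event to every subscriber's channel.
    public void Publish(in DemoTraceEvent e)
    {
        foreach (var channel in Volatile.Read(ref subscribers))
            channel.Writer.TryWrite(e); // unbounded: succeeds unless completed
    }

    // Signals consumers that no more events will arrive.
    public void Complete()
    {
        foreach (var channel in Volatile.Read(ref subscribers))
            channel.Writer.TryComplete();
    }
}
```

A consumer such as the in-game UI or a JSON writer would then drain its own reader, for example with `await foreach (var e in reader.ReadAllAsync())`. The publish path in this sketch takes no lock of its own; whatever synchronization remains is internal to the channel implementation.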
The stack is only there to support profiling across function boundaries, with ProfilingState you have to pass it along yourself somehow, if begin and end are not in the same scope. (That also means my attempt at limiting event capturing by stack depth was not working, oops. I will remove it, did not seem useful in practice anyway.)
Yeah, no checks in the current version. I will add runtime checks / use the existing ones in Profiler/ProfilingKey. We could add a compile time way of disabling profiling entirely, if it's needed, but I'd prefer having the profiler available in all builds by default. |
I think we might be able to separate the message logging from the profiler. There isn't really a good reason for it to be tied together (except being enabled when profiling is enabled) and it will simplify things a bit. For having variants of the same profiling key attributes should be used. |
- Flip ordering of Profiling results, to sort from longest to shortest. - Only record `End` events in the Histogram, Begin/Mark events don't have a duration.
…ternally. - Channels simplified the addition of multiple subscribers and should be a big perf win, when collecting events from many threads. - Adds thread information to `ProfilingEvent` and `ProfilingState` - Adapted `GameProfilingSystem` to the changes
to output profiling events in 'chrome://tracing' JSON format, which can be analyzed by ui.perfetto.dev and other tools
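For readers unfamiliar with the format: a chrome://tracing file in its simplest form is a JSON array of event objects with fields such as `name`, `ph` (phase), `ts` (timestamp in microseconds), `pid` and `tid`. The sketch below writes complete (`"X"`) events with `System.Text.Json`. `DemoChromeTrace` is a hypothetical helper for illustration; it is an assumption that the writer in this PR emits similar fields, and it may well use Begin/End phases instead.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

public static class DemoChromeTrace
{
    // Serializes events as "X" (complete) entries in the JSON array form
    // of the Trace Event Format. Times are in microseconds.
    public static string Write((string Name, int ThreadId, double StartUs, double DurationUs)[] events)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartArray();
            foreach (var e in events)
            {
                writer.WriteStartObject();
                writer.WriteString("name", e.Name);
                writer.WriteString("ph", "X");
                writer.WriteNumber("ts", e.StartUs);
                writer.WriteNumber("dur", e.DurationUs);
                writer.WriteNumber("pid", 1); // single process assumed
                writer.WriteNumber("tid", e.ThreadId);
                writer.WriteEndObject();
            }
            writer.WriteEndArray();
        } // disposing the writer flushes pending JSON to the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Saving the returned string to a `.json` file should give something that chrome://tracing or ui.perfetto.dev can load, with one track per `tid`.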
Probably could have been more explicit here.
var url = newUrl ?? reference.Url;
using (var profile = Profiler.Begin(ContentProfilingKeys.ContentReload, url))
{
    DeserializeObject(reference.Url, url, obj.GetType(), obj, settings);
}
For many assets with different URLs we run the same profiling key called Whereas in this example
using (processor.UpdateProfilingState = Profiler.Begin(processor.UpdateProfilingKey, "Entities: {0}", entities.Count))
{
    processor.Update(gameTime);
}
There isn't any variation in the parameters.
using (var profile = Profiler.Begin(GameProfilingKeys.GameSystemLoadContent, gameSystemName))
    contentable.LoadContent();
Another example where the parameter passed is making this a variant, because we want to distinctly identify the load time of different game systems. So variants are basically a quick way to (!) dynamically separate a profiling key into a few subkeys. Hope this makes a bit more sense. |
Looks very promising. The perf improvements sound lovely and I see you found that I made a mistake with where I was emitting events to the meter.
Thanks to you I'm now more familiar with built-in channels.
For interested folks, have a look at An Introduction to System.Threading.Channels
This is why there's a concept of a Histogram in meters. It basically looks at the data in terms of duration and allows viewing it by max, min, average and percentiles. I think adding a display mode with average aggregation over a second would be a great option! And for the threadpool specifically wouldn't the issue also be around it running on multiple threads in parallel? Thus squeezing more compute time into the same ms? |
Yes, exactly that.
|
That should be it for now. I've marked it as breaking change, but unless I've missed something, it would only be breaking if someone replaced And I imagine the option to output tracing .json and view it externally should also be mentioned in the profiling section of the docs. |
Looks good. I could nitpick the whitespace here and there, but it's fine.
I ran the Starbreach project to test those changes. I'm happy to merge it. However, you may need to look into performance of |
It seems the Android build is broken. FYI, builds don't auto-run on PRs from people who are not yet marked as collaborators or contributors (for security reasons). |
It would be nice to mention this new ability in the documentation somewhere. |
PR Details
Description
This PR adds a profiler outputting data in chrome://tracing format (for now) and changes Profiler to make that possible.
ChromeTracingProfileWriter consumes events and dumps them to a file which can then be analyzed with tools like chrome://tracing, perfetto or speedscope.
Usage
Related Issue
N/A
Motivation and Context
I had trouble pinpointing performance issues with both the existing Profiler and external tools like dotTrace, so I started experimenting with adding something in the vein of chrome://tracing or tracy. In particular the existing Profiler is quite heavy-weight and doesn't play well with multithreading, while everything happening inside the ThreadPool/Dispatcher is difficult to understand with sampling profilers as a lot of work happens in anonymous functions.
The TraceProfiler lets you see work on all threads and lets you look at individual frames to analyze spikes instead of just averages. For me this was also very helpful in understanding the structure of the engine.
An example trace from a rendering stresstest opened in perfetto:
The example trace:
tracing_example.json.gz
Open Questions / Todo
Profiler seemed quite challenging, but maybe it's a bit easier after the refactoring in Refactor Profiler and submit metrics to .NET Diagnostics API #1774 and I should take another look.
Types of changes
Checklist