Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added memory optimisations for GetLastActiveSpan #2642

Merged
merged 4 commits into from
Sep 24, 2023

Conversation

jamescrosswell
Copy link
Collaborator

@jamescrosswell jamescrosswell commented Sep 20, 2023

Overview

When load testing an ASP.NET Core application that creates a lot of spans, GetLastActiveSpan was responsible for a significant proportion of memory allocation:

                    // The endpoint being load tested
                    endpoints.MapGet("/spans", async context =>
                    {
                        for (int i = 0; i < 1000; i++)
                        {
                            var span = SentrySdk.GetSpan()?.StartChild($"child {i}");
                            span?.Finish();
                        }
                        await context.Response.WriteAsync("Hello Spans!");
                    });

image

This appears to be due to the call to Spans.OrderByDescending:

Spans.OrderByDescending(x => x.StartTimestamp).FirstOrDefault(s => !s.IsFinished);

Iteration 1

By replacing this with a for loop, memory consumption in our benchmark dropped by 65.68% from 2109.85 KB to 724.02 KB

Before

Method SpanCount Mean Error StdDev Gen0 Gen1 Allocated
'Create spans for scope access' 1 45.39 us 13.71 us 0.752 us 5.6152 1.7700 19.77 KB
'Create spans for scope access' 10 205.46 us 276.93 us 15.179 us 23.9258 4.3945 85.51 KB
'Create spans for scope access' 100 4,055.16 us 5,006.05 us 274.399 us 601.5625 85.9375 2109.85 KB

After

Method SpanCount Mean Error StdDev Gen0 Gen1 Allocated
'Create spans for scope access' 1 52.05 us 242.80 us 13.309 us 5.2490 1.6479 18.92 KB
'Create spans for scope access' 10 164.61 us 55.00 us 3.015 us 12.2070 4.1504 63.88 KB
'Create spans for scope access' 100 2,287.86 us 1,098.36 us 60.205 us 183.5938 50.7813 724.02 KB

Iteration 2

By instead keeping all the spans on a Stack and inspecting the most recent one(s) to return the last active span when required we were able to reduce memory usage for the 100 spans scenario drops by a further 63.25% from 724.02 KB to 266.09 KB. So about an ~87% reduction vs the original benchmarks.

After

Method SpanCount Mean Error StdDev Gen0 Gen1 Allocated
'Create spans for scope access' 1 28.45 us 13.270 us 0.727 us 4.6387 1.5869 16.64 KB
'Create spans for scope access' 10 84.98 us 6.361 us 0.349 us 9.3994 2.5635 39.17 KB
'Create spans for scope access' 100 657.54 us 54.581 us 2.992 us 57.6172 16.6016 266.09 KB

And finally, a sanity check on a memory snapshot from the same load tests run on the new code:

image

Copy link
Member

@bruno-garcia bruno-garcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the wrong synchronization mechanism (RWLock seems to be a better fit) but since this isn't mostly just reads once data is written, in which case Immutable type might be better, how about ConcurrentStack instead?

src/Sentry/TransactionTracer.cs Outdated Show resolved Hide resolved
src/Sentry/TransactionTracer.cs Outdated Show resolved Hide resolved
src/Sentry/TransactionTracer.cs Outdated Show resolved Hide resolved
@bruno-garcia
Copy link
Member

bruno-garcia commented Sep 20, 2023

Worth noting; These microbenchmarks works on tightloop on a single thread. This is code that will run concurrently and contention is a big concern, so just as much test on a server scenario under a load test is worth going after.

edit: This will be mainly used by a single thread on a server mode (non hub global mode) so shouldn't be a big concern with contention

@jamescrosswell
Copy link
Collaborator Author

It looks like the wrong synchronization mechanism (RWLock seems to be a better fit) but since this isn't mostly just reads once data is written, in which case Immutable type might be better, how about ConcurrentStack instead?

I thought about ConcurrentStack but I need a lock around both the Peek and the Pop operations in this method (to ensure another thread doesn't do a Push between calling Peek and Pop... which would cause the wrong element to be popped):

public ISpan? PeekActive()
{
lock(_lock)
{
while (TrackedSpans.Count > 0)
{
// Stop tracking inactive spans
var span = TrackedSpans.Peek();
if (!span.IsFinished)
{
return span;
}
TrackedSpans.Pop();
}
return null;
}
}
}

So I think just a vanilla Stack<T> with a lock is the way to go.

@jamescrosswell
Copy link
Collaborator Author

@bruno-garcia off the back of our conversation about upgradable locks, I took a look at ReaderWriterLockSlim.EnterUpgradeableReadLock and I'm still not sure it will work in a multithreaded scenario.

For example:

sequenceDiagram
    Thread1->>LastActiveSpanTracker: EnterUpgradeableReadLock
    Thread2->>LastActiveSpanTracker: EnterUpgradeableReadLock
    Thread1->>LastActiveSpanTracker: EnterWriteLock (Waits for reads to finish)
    Thread2->>LastActiveSpanTracker: EnterWriteLock (Also waits for reads to finish -> Deadlock!)
Loading

And I noticed this in the in the remarks:

Only one thread can enter upgradeable mode at any given time.

I assume that's to prevent the above deadlock scenario from playing out.

If I understand correctly and if we only have blocks of code leveraging either EnterUpgradeableReadLock or EnterWriteLock then I think ReaderWriterLockSlim is basically doing the same thing as lock. Have I understood this correctly or there something I'm still missing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants