-
Notifications
You must be signed in to change notification settings - Fork 732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve GetUri #2947
Improve GetUri #2947
Conversation
} | ||
}); | ||
#else | ||
return new StringBuilder(length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly related to this PR - would a ThreadLocal StringBuilder (or something from a pool) help to reduce allocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ThreadStatic would be one solution, Andrew Lock describes it in his blog: https://andrewlock.net/a-deep-dive-on-stringbuilder-part-5-reducing-allocations-by-caching-stringbuilders-with-stringbuildercache/ and the source for his tests is https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/System/Text/StringBuilderCache.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length | ||
+ path.Length + queryString.Length; | ||
#if NETSTANDARD2_1 | ||
return string.Create(length, (scheme, host, pathBase, path, queryString), (span, parts) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would hoist the delegate to a static field. Eg:
#if NETSTANDARD2_1
private readonly static SpanAction<char, (string scheme, string host, string pathBase, string path, string queryString)> s_CreateStringFunc = (span, state) =>
{
CopyTo(ref span, parts.scheme);
CopyTo(ref span, Uri.SchemeDelimiter);
CopyTo(ref span, parts.host);
CopyTo(ref span, parts.pathBase);
CopyTo(ref span, parts.path);
CopyTo(ref span, parts.queryString);
static void CopyTo(ref Span<char> buffer, ReadOnlySpan<char> text)
{
if (!text.IsEmpty)
{
text.CopyTo(buffer);
buffer = buffer.Slice(text.Length);
}
}
};
#endif
Invoke like this:
return string.Create(length, (scheme, host, pathBase, path, queryString), s_CreateStringFunc);
Why do that?
As it is written we are relying on the compiler to do the right thing and not allocate a delegate for each invocation. Sometimes it does the right thing, sometimes it does not. It is a bit mysterious to me as to the reasons for that. But even if it does decide to do the right thing, the emitted IL will be to check if it has created a static, and if not, create the static, and then do the invocation. So it is a bit better perf to just make the static initially and avoid the check on each call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mysterious?, it depends on capturing variables, that's why you can now have static expressions and static local methods etc. You're correct that the current code will do a null check against the static method and initiate if required.
I still think it's a bad idea, because the static field would be located (if I follow the other source codes with static fields, including this type) up above the constructor, between other fields:
opentelemetry-dotnet/src/OpenTelemetry.Instrumentation.AspNetCore/Implementation/HttpInListener.cs
Lines 44 to 46 in aa1f3f0
private const string UnknownHostName = "UNKNOWN-HOST"; | |
private static readonly Func<HttpRequest, string, IEnumerable<string>> HttpRequestHeaderValuesGetter = (request, name) => request.Headers[name]; | |
private readonly PropertyFetcher<HttpContext> startContextFetcher = new("HttpContext"); |
And that reduces the readability of the class way too much.
As for performance:
The null check could be elided by Tier2 compilation with static/dynamic pgo, I don't know if that happens here though. The performance difference is not high, the noise in my benchmark runs is higher (I've seen everything between 1ns to 3ns difference):
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.200
[Host] : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
Method | Mean | Error | StdDev | Gen 0 | Allocated |
---|---|---|---|---|---|
StaticLocalMethod | 26.71 ns | 0.123 ns | 0.115 ns | 0.0043 | 72 B |
StringBuilder | 53.75 ns | 0.425 ns | 0.355 ns | 0.0167 | 280 B |
StringBuilderFixedLength | 35.88 ns | 0.311 ns | 0.290 ns | 0.0114 | 192 B |
StaticFunctor | 25.97 ns | 0.071 ns | 0.066 ns | 0.0043 | 72 B |
// See https://aka.ms/new-console-template for more information
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkRunner.Run<StringCreate>();
[MemoryDiagnoser()]
public class StringCreate
{
[Benchmark]
public string StaticLocalMethod()
{
var scheme = "http";
var host = "localhost";
var pathBase = string.Empty;
var path = "/api/foo";
var queryString = string.Empty;
var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
+ path.Length + queryString.Length;
return string.Create(length, (scheme, host, pathBase, path, queryString), (span, parts) =>
{
CopyTo(ref span, parts.scheme);
CopyTo(ref span, Uri.SchemeDelimiter);
CopyTo(ref span, parts.host);
CopyTo(ref span, parts.pathBase);
CopyTo(ref span, parts.path);
CopyTo(ref span, parts.queryString);
static void CopyTo(ref Span<char> buffer, ReadOnlySpan<char> text)
{
if (!text.IsEmpty)
{
text.CopyTo(buffer);
buffer = buffer.Slice(text.Length);
}
}
});
}
private static readonly System.Buffers.SpanAction<char, (string scheme, string host, string pathBase, string path, string queryString)> s_CreateStringFunc = (span, parts) =>
{
CopyTo(ref span, parts.scheme);
CopyTo(ref span, Uri.SchemeDelimiter);
CopyTo(ref span, parts.host);
CopyTo(ref span, parts.pathBase);
CopyTo(ref span, parts.path);
CopyTo(ref span, parts.queryString);
static void CopyTo(ref Span<char> buffer, ReadOnlySpan<char> text)
{
if (!text.IsEmpty)
{
text.CopyTo(buffer);
buffer = buffer.Slice(text.Length);
}
}
};
[Benchmark]
public string StringBuilder()
{
var scheme = "http";
var host = "localhost";
var pathBase = string.Empty;
var path = "/api/foo";
var queryString = string.Empty;
return new StringBuilder()
.Append(scheme)
.Append(Uri.SchemeDelimiter)
.Append(host)
.Append(pathBase)
.Append(path)
.Append(queryString)
.ToString();
}
[Benchmark]
public string StringBuilderFixedLength()
{
var scheme = "http";
var host = "localhost";
var pathBase = string.Empty;
var path = "/api/foo";
var queryString = string.Empty;
var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
+ path.Length + queryString.Length;
return new StringBuilder(length)
.Append(scheme)
.Append(Uri.SchemeDelimiter)
.Append(host)
.Append(pathBase)
.Append(path)
.Append(queryString)
.ToString();
}
[Benchmark]
public string StaticFunctor()
{
var scheme = "http";
var host = "localhost";
var pathBase = string.Empty;
var path = "/api/foo";
var queryString = string.Empty;
var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
+ path.Length + queryString.Length;
return string.Create(length, (scheme, host, pathBase, path, queryString), s_CreateStringFunc);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mysterious?, it depends on capturing variables,
A local function with captures is always going to allocate. The mysterious part I was referring to is sometimes it will allocate a delegate even for static local functions.
Don't believe me? Check this out...
private static readonly Action<(string, int)> FieldCallback = ((string Name, int Id) state) => { };
public static void CallDoSomethingNoAllocation()
{
DoSomething(FieldCallback, ("mystate", 0));
}
public static void CallDoSomethingAllocation()
{
DoSomething(LocalCallback, ("mystate", 0));
static void LocalCallback((string Name, int Id) state)
{
}
}
public static void DoSomething<T>(Action<T> callback, T state)
{
callback(state);
}
CallDoSomethingNoAllocation
compiles to...
IL_0000: ldsfld class [System.Runtime]System.Action`1<valuetype [System.Runtime]System.ValueTuple`2<string, int32>> OpenTelemetry.Benchmarks.Program::FieldCallback
IL_0005: ldstr "mystate"
IL_000a: ldc.i4.0
IL_000b: newobj instance void valuetype [System.Runtime]System.ValueTuple`2<string, int32>::.ctor(!0/*string*/, !1/*int32*/)
IL_0010: call void OpenTelemetry.Benchmarks.Program::DoSomething<valuetype [System.Runtime]System.ValueTuple`2<string, int32>>(class [System.Runtime]System.Action`1<!!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/>, !!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/)
CallDoSomethingAllocation
compiles to...
IL_0000: ldnull
IL_0001: ldftn void OpenTelemetry.Benchmarks.Program::'<CallDoSomethingAllocation>g__LocalCallback|2_0'(valuetype [System.Runtime]System.ValueTuple`2<string, int32>)
IL_0007: newobj instance void class [System.Runtime]System.Action`1<valuetype [System.Runtime]System.ValueTuple`2<string, int32>>::.ctor(object, native int)
IL_000c: ldstr "mystate"
IL_0011: ldc.i4.0
IL_0012: newobj instance void valuetype [System.Runtime]System.ValueTuple`2<string, int32>::.ctor(!0/*string*/, !1/*int32*/)
IL_0017: call void OpenTelemetry.Benchmarks.Program::DoSomething<valuetype [System.Runtime]System.ValueTuple`2<string, int32>>(class [System.Runtime]System.Action`1<!!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/>, !!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/)
Why did CallDoSomethingAllocation
decide to allocate a delegate? Who knows! That's the mystery I'm referring to. Something to do with the generic, I think.
Anyway from what I have seen the pattern I'm recommending always works to bypass allocations and has a speed benefit so for me the tradeoff in readability is worth it 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both do ldsfld. So depending on where you've seen that code, it might be some bug or unoptimized code.
Apparently the old netfx does it your way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I sent is compiled/optimized with .NET 6.
Not familiar with sharplab. Looks like your link is using a newer roslyn? Cool to see they are making improvements to this! Here is a link that shows the same:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The newer Version includes dotnet/roslyn#58288 which is what you're missing in the other type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tornhoof Thanks for the link. You are kind of making my point for me 😄 It is a magical thing that requires (currently) experimental compiler features to work sometimes (there are limitations on that PR). I prefer to use the fail-safe pattern (which also has the nice benefit of being faster).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case your problem does not apply, my link just shows that the behavior you're seeing is fixed for C#11 for the work on caching delegates for method group conversions, because that's what your code does.
If you do it without method group call conversion, e.g. DoSomething(q => LocalCallback(q), ("mystate", 0));
it will generate the ldsfld correctly even in old compiler versions (including netfx).
My link was just to tell you that you can skip doing the static functor dance in C#11 and use method groups without them to your heart's content.
In case of the span code, it never did that as it is not a method group conversion call, but a proper functor call with arguments.
As I've shown that the performance difference is negligable, the only remaining point is readability. And I highly value readability of a class over a minor perf enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unblock, it'll propose to merge this now, and create a new issue tracking further improvements/discussions on this.
This is an internal implementation detail, and involves no public API change, so we can change it anytime.
Could anyone with maintainer access trigger the workflows again, to see if the warning with the unused using is gone with my changes? |
Codecov Report
@@ Coverage Diff @@
## main #2947 +/- ##
==========================================
+ Coverage 84.14% 84.16% +0.01%
==========================================
Files 258 258
Lines 9090 9093 +3
==========================================
+ Hits 7649 7653 +4
+ Misses 1441 1440 -1
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
Anyone interested in merging this PR before the stale bot hits again? |
@Tornhoof Could you add a line to changelog.md file, as this is user visible change (in terms of reduced alloc/improved perf.) |
Merging. This is a nice improvement. There is an unresolved comment from @CodeBlanch , but it is not a blocker, and nice to follow up separately. |
The current GetUri method looks very similar to the GetDisplayUrl() extension method in ASP.NET Core. For that method an open issue exists dotnet/aspnetcore#28906 to improve performance and allocations.
In ASP.NET Core the method can be called by the developer to get an url for display purposes, in the Open Telemetry Listener it is called for each and every request with IsAllDataRequested. So the allocation impact is higher here than in ASP.NET Core.
This PR replaces the existing method with two code paths, one for .NET Standard 2.0 and one for 2.1. The 2.0 code path is the old method with added length calculation to the StringBuilder. The new 2.1 code path uses string.Create to directly create the Display Url based on the calculated length and the supplied values. It is slightly different from the suggested method in the issue above as not everything is available in .NET Standard 2.1.
I changed the tests for the listener to include a query string.