Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve GetUri #2947

Merged
merged 5 commits into from
Mar 9, 2022
Merged

Improve GetUri #2947

merged 5 commits into from
Mar 9, 2022

Conversation

Tornhoof
Copy link
Contributor

The current GetUri method looks very similar to the GetDisplayUrl() extension method in ASP.NET Core. For that method an open issue exists dotnet/aspnetcore#28906 to improve performance and allocations.
In ASP.NET Core the method can be called by the developer to get an url for display purposes, in the Open Telemetry Listener it is called for each and every request with IsAllDataRequested. So the allocation impact is higher here than in ASP.NET Core.

This PR replaces the existing method with two code paths, one for .NET Standard 2.0 and one for 2.1. The 2.0 code path is the old method with added length calculation to the StringBuilder. The new 2.1 code path uses string.Create to directly create the Display Url based on the calculated length and the supplied values. It is slightly different from the suggested method in the issue above as not everything is available in .NET Standard 2.1.

I changed the tests for the listener to include a query string.

@Tornhoof Tornhoof requested a review from a team as a code owner February 25, 2022 10:04
}
});
#else
return new StringBuilder(length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related to this PR - would a ThreadLocal StringBuilder (or something from a pool) help to reduce allocations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
+ path.Length + queryString.Length;
#if NETSTANDARD2_1
return string.Create(length, (scheme, host, pathBase, path, queryString), (span, parts) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would hoist the delegate to a static field. Eg:

#if NETSTANDARD2_1
		private readonly static SpanAction<char, (string scheme, string host, string pathBase, string path, string queryString)> s_CreateStringFunc = (span, state) =>
		{
                CopyTo(ref span, parts.scheme);
                CopyTo(ref span, Uri.SchemeDelimiter);
                CopyTo(ref span, parts.host);
                CopyTo(ref span, parts.pathBase);
                CopyTo(ref span, parts.path);
                CopyTo(ref span, parts.queryString);

                static void CopyTo(ref Span<char> buffer, ReadOnlySpan<char> text)
                {
                    if (!text.IsEmpty)
                    {
                        text.CopyTo(buffer);
                        buffer = buffer.Slice(text.Length);
                    }
                }
		};
#endif

Invoke like this:

return string.Create(length, (scheme, host, pathBase, path, queryString), s_CreateStringFunc);

Why do that?

As it is written we are relying on the compiler to do the right thing and not allocate a delegate for each invocation. Sometimes it does the right thing, sometimes it does not. It is a bit mysterious to me as to the reasons for that. But even if it does decide to do the right thing, the emitted IL will be to check if it has created a static, and if not, create the static, and then do the invocation. So it is a bit better perf to just make the static initially and avoid the check on each call.

Copy link
Contributor Author

@Tornhoof Tornhoof Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mysterious?, it depends on capturing variables, that's why you can now have static expressions and static local methods etc. You're correct that the current code will do a null check against the static method and initiate if required.

I still think it's a bad idea, because the static field would be located (if I follow the other source codes with static fields, including this type) up above the constructor, between other fields:

private const string UnknownHostName = "UNKNOWN-HOST";
private static readonly Func<HttpRequest, string, IEnumerable<string>> HttpRequestHeaderValuesGetter = (request, name) => request.Headers[name];
private readonly PropertyFetcher<HttpContext> startContextFetcher = new("HttpContext");

And that reduces the readability of the class way too much.

As for performance:
The null check could be elided by Tier2 compilation with static/dynamic pgo, I don't know if that happens here though. The performance difference is not high, the noise in my benchmark runs is higher (I've seen everything between 1ns to 3ns difference):

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT

Method Mean Error StdDev Gen 0 Allocated
StaticLocalMethod 26.71 ns 0.123 ns 0.115 ns 0.0043 72 B
StringBuilder 53.75 ns 0.425 ns 0.355 ns 0.0167 280 B
StringBuilderFixedLength 35.88 ns 0.311 ns 0.290 ns 0.0114 192 B
StaticFunctor 25.97 ns 0.071 ns 0.066 ns 0.0043 72 B
// See https://aka.ms/new-console-template for more information

using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<StringCreate>();

[MemoryDiagnoser()]
public class StringCreate
{

	[Benchmark]
	public string StaticLocalMethod()
	{
		var scheme = "http";
		var host = "localhost";
		var pathBase = string.Empty;
		var path = "/api/foo";
		var queryString = string.Empty;
		var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
		             + path.Length + queryString.Length;
		return string.Create(length, (scheme, host, pathBase, path, queryString), (span, parts) =>
		{
			CopyTo(ref span, parts.scheme);
			CopyTo(ref span, Uri.SchemeDelimiter);
			CopyTo(ref span, parts.host);
			CopyTo(ref span, parts.pathBase);
			CopyTo(ref span, parts.path);
			CopyTo(ref span, parts.queryString);

			static void CopyTo(ref Span<char> buffer, ReadOnlySpan<char> text)
			{
				if (!text.IsEmpty)
				{
					text.CopyTo(buffer);
					buffer = buffer.Slice(text.Length);
				}
			}
		});
    }

	private static readonly System.Buffers.SpanAction<char, (string scheme, string host, string pathBase, string path, string queryString)> s_CreateStringFunc = (span, parts) =>
	{
		CopyTo(ref span, parts.scheme);
		CopyTo(ref span, Uri.SchemeDelimiter);
		CopyTo(ref span, parts.host);
		CopyTo(ref span, parts.pathBase);
		CopyTo(ref span, parts.path);
		CopyTo(ref span, parts.queryString);

		static void CopyTo(ref Span<char> buffer, ReadOnlySpan<char> text)
		{
			if (!text.IsEmpty)
			{
				text.CopyTo(buffer);
				buffer = buffer.Slice(text.Length);
			}
		}
	};

	[Benchmark]
	public string StringBuilder()
	{
		var scheme = "http";
		var host = "localhost";
		var pathBase = string.Empty;
		var path = "/api/foo";
		var queryString = string.Empty;
		return new StringBuilder()
			.Append(scheme)
			.Append(Uri.SchemeDelimiter)
			.Append(host)
			.Append(pathBase)
			.Append(path)
			.Append(queryString)
			.ToString();
	}

	[Benchmark]
	public string StringBuilderFixedLength()
	{
		var scheme = "http";
		var host = "localhost";
		var pathBase = string.Empty;
		var path = "/api/foo";
		var queryString = string.Empty;
		var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
		             + path.Length + queryString.Length;
		return new StringBuilder(length)
			.Append(scheme)
			.Append(Uri.SchemeDelimiter)
			.Append(host)
			.Append(pathBase)
			.Append(path)
			.Append(queryString)
			.ToString();
	}

	[Benchmark]
	public string StaticFunctor()
	{
		var scheme = "http";
		var host = "localhost";
		var pathBase = string.Empty;
		var path = "/api/foo";
		var queryString = string.Empty;
		var length = scheme.Length + Uri.SchemeDelimiter.Length + host.Length + pathBase.Length
		             + path.Length + queryString.Length;
		return string.Create(length, (scheme, host, pathBase, path, queryString), s_CreateStringFunc);
	}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mysterious?, it depends on capturing variables,

A local function with captures is always going to allocate. The mysterious part I was referring to is sometimes it will allocate a delegate even for static local functions.

Don't believe me? Check this out...

        private static readonly Action<(string, int)> FieldCallback = ((string Name, int Id) state) => { };

        public static void CallDoSomethingNoAllocation()
        {
            DoSomething(FieldCallback, ("mystate", 0));
        }

        public static void CallDoSomethingAllocation()
        {
            DoSomething(LocalCallback, ("mystate", 0));

            static void LocalCallback((string Name, int Id) state)
            {
            }
        }

        public static void DoSomething<T>(Action<T> callback, T state)
        {
            callback(state);
        }

CallDoSomethingNoAllocation compiles to...

    IL_0000: ldsfld       class [System.Runtime]System.Action`1<valuetype [System.Runtime]System.ValueTuple`2<string, int32>> OpenTelemetry.Benchmarks.Program::FieldCallback
    IL_0005: ldstr        "mystate"
    IL_000a: ldc.i4.0
    IL_000b: newobj       instance void valuetype [System.Runtime]System.ValueTuple`2<string, int32>::.ctor(!0/*string*/, !1/*int32*/)
    IL_0010: call         void OpenTelemetry.Benchmarks.Program::DoSomething<valuetype [System.Runtime]System.ValueTuple`2<string, int32>>(class [System.Runtime]System.Action`1<!!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/>, !!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/)

CallDoSomethingAllocation compiles to...

    IL_0000: ldnull
    IL_0001: ldftn        void OpenTelemetry.Benchmarks.Program::'<CallDoSomethingAllocation>g__LocalCallback|2_0'(valuetype [System.Runtime]System.ValueTuple`2<string, int32>)
    IL_0007: newobj       instance void class [System.Runtime]System.Action`1<valuetype [System.Runtime]System.ValueTuple`2<string, int32>>::.ctor(object, native int)
    IL_000c: ldstr        "mystate"
    IL_0011: ldc.i4.0
    IL_0012: newobj       instance void valuetype [System.Runtime]System.ValueTuple`2<string, int32>::.ctor(!0/*string*/, !1/*int32*/)
    IL_0017: call         void OpenTelemetry.Benchmarks.Program::DoSomething<valuetype [System.Runtime]System.ValueTuple`2<string, int32>>(class [System.Runtime]System.Action`1<!!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/>, !!0/*valuetype [System.Runtime]System.ValueTuple`2<string, int32>*/)

Why did CallDoSomethingAllocation decide to allocate a delegate? Who knows! That's the mystery I'm referring to. Something to do with the generic, I think.

Anyway from what I have seen the pattern I'm recommending always works to bypass allocations and has a speed benefit so for me the tradeoff in readability is worth it 😄

Copy link
Contributor Author

@Tornhoof Tornhoof Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newer Version includes dotnet/roslyn#58288 which is what you're missing in the other type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tornhoof Thanks for the link. You are kind of making my point for me 😄 It is a magical thing that requires (currently) experimental compiler features to work sometimes (there are limitations on that PR). I prefer to use the fail-safe pattern (which also has the nice benefit of being faster).

Copy link
Contributor Author

@Tornhoof Tornhoof Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case your problem does not apply, my link just shows that the behavior you're seeing is fixed for C#11 for the work on caching delegates for method group conversions, because that's what your code does.
If you do it without method group call conversion, e.g. DoSomething(q => LocalCallback(q), ("mystate", 0)); it will generate the ldsfld correctly even in old compiler versions (including netfx).
My link was just to tell you that you can skip doing the static functor dance in C#11 and use method groups without them to your heart's content.

In case of the span code, it never did that as it is not a method group conversion call, but a proper functor call with arguments.

As I've shown that the performance difference is negligable, the only remaining point is readability. And I highly value readability of a class over a minor perf enhancement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To unblock, it'll propose to merge this now, and create a new issue tracking further improvements/discussions on this.

This is an internal implementation detail, and involves no public API change, so we can change it anytime.

@Tornhoof
Copy link
Contributor Author

Could anyone with maintainer access trigger the workflows again, to see if the warning with the unused using is gone with my changes?

@codecov
Copy link

codecov bot commented Feb 28, 2022

Codecov Report

Merging #2947 (f0e8c1b) into main (8b8b934) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head f0e8c1b differs from pull request most recent head d54a24b. Consider uploading reports for the commit d54a24b to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2947      +/-   ##
==========================================
+ Coverage   84.14%   84.16%   +0.01%     
==========================================
  Files         258      258              
  Lines        9090     9093       +3     
==========================================
+ Hits         7649     7653       +4     
+ Misses       1441     1440       -1     
Impacted Files Coverage Δ
...tation.AspNetCore/Implementation/HttpInListener.cs 88.81% <100.00%> (+2.38%) ⬆️
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 76.47% <0.00%> (-5.89%) ⬇️

@github-actions
Copy link
Contributor

github-actions bot commented Mar 8, 2022

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale label Mar 8, 2022
@Tornhoof
Copy link
Contributor Author

Tornhoof commented Mar 9, 2022

Anyone interested in merging this PR before the stale bot hits again?

@cijothomas
Copy link
Member

@Tornhoof Could you add a line to changelog.md file, as this is user visible change (in terms of reduced alloc/improved perf.)

@cijothomas
Copy link
Member

Merging. This is a nice improvement. There is an unresolved comment from @CodeBlanch , but it is not a blocker, and nice to follow up separately.

@cijothomas cijothomas merged commit af6a5d2 into open-telemetry:main Mar 9, 2022
@Tornhoof Tornhoof deleted the ImproveGetUri branch March 9, 2022 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants