-
Notifications
You must be signed in to change notification settings - Fork 745
Remove allocations in PartitionFilter + cut time by ~30% #4500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove allocations in PartitionFilter + cut time by ~30% #4500
Conversation
@manfred-brands I ran out of time to benchmark the thread local idea but had intended to push this to a shared please to make collaboration easier. Let me know if you'd like me to still do that |
Just shoutout and a big kudos to @stevenaw , this is some solid engineering work with benchmarks backing the solution and studied alternatives! 🚀 Generally I'd think that it would be enough to just run against NET 6.0 (or 8.0 later on) as it's not significant to compare how MS is doing between releases, just the comparison between full vs modern in this case at least. |
@stevenaw Thanks for your benchmarks. The default stack size is 1MB and the longest name would not be more than 1024 characters or 3kB bytes, so we could drop the whole array pool and only use As you know, I don't like changing two items at once, so I changed the baseline benchmark to also use the static SHA256 Using Benchmarks .NET6.0 different methodsResultsResult for .NET6.0 only as that is our target framework.
CodeCodeusing BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using System;
using System.Buffers;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
namespace HashPerformance
{
[MemoryDiagnoser]
[ShortRunJob(RuntimeMoniker.Net60)]
[HideColumns("Error", "StdDev", "Ratio", "RatioSD", "Job")]
public class PartitionFilterHashingBenchmark
{
private const int MaxStack = 256;
[Params(64, 128, 256)]
public int TestNameLength { get; set; }
#pragma warning disable CS8618 // Non-nullable field must contain a non-null value when exiting constructor. Consider declaring as nullable.
public string TestMethodName { get; set; }
#pragma warning restore CS8618 // Non-nullable field must contain a non-null value when exiting constructor. Consider declaring as nullable.
[GlobalSetup]
public void GlobalSetup()
{
TestMethodName = new string('A', TestNameLength);
}
[Benchmark]
public uint ComputeHashValueNonStatic()
{
return ComputeHashValue(TestMethodName);
static uint ComputeHashValue(string name)
{
using var hashAlgorithm = SHA256.Create();
// SHA256 ComputeHash will return 32 bytes, we will use the first 4 bytes of that to convert to an unsigned integer
var hashValue = hashAlgorithm.ComputeHash(Encoding.UTF8.GetBytes(name));
return BitConverter.ToUInt32(hashValue, 0);
}
}
[Benchmark(Baseline = true)]
public uint ComputeHashValue()
{
return ComputeHashValue(TestMethodName);
static uint ComputeHashValue(string name)
{
// SHA256 ComputeHash will return 32 bytes, we will use the first 4 bytes of that to convert to an unsigned integer
var hashValue = SHA256.HashData(Encoding.UTF8.GetBytes(name));
return BitConverter.ToUInt32(hashValue, 0);
}
}
[Benchmark]
public uint ComputeHashValue_ArrayPool()
{
return ComputeHashValue(TestMethodName);
static uint ComputeHashValue(string name)
{
var encoding = Encoding.UTF8;
var bufferLength = encoding.GetMaxByteCount(name.Length);
var buffer = ArrayPool<byte>.Shared.Rent(bufferLength);
try
{
var bytesWritten = encoding.GetBytes(name, buffer);
Span<byte> hashValue = stackalloc byte[32];
SHA256.HashData(new Span<byte>(buffer, 0, bytesWritten), hashValue);
return BitConverter.ToUInt32(hashValue[..4]);
}
finally
{
ArrayPool<byte>.Shared.Return(buffer);
}
}
}
[Benchmark]
public uint ComputeHashValue_ArrayPoolStackAlloc()
{
return ComputeHashValue(TestMethodName);
static uint ComputeHashValue(string name)
{
var encoding = Encoding.UTF8;
var bufferLength = encoding.GetMaxByteCount(name.Length);
byte[]? pooledBuffer = null;
var buffer = bufferLength <= MaxStack ? stackalloc byte[MaxStack] : (pooledBuffer = ArrayPool<byte>.Shared.Rent(bufferLength));
try
{
var bytesWritten = encoding.GetBytes(name, buffer);
Span<byte> hashValue = stackalloc byte[32];
SHA256.HashData(buffer[..bytesWritten], hashValue);
return BitConverter.ToUInt32(hashValue[..4]);
}
finally
{
if (pooledBuffer is not null)
ArrayPool<byte>.Shared.Return(pooledBuffer);
}
}
}
[Benchmark]
public uint ComputeHashValue_StackAlloc()
{
return ComputeHashValue(TestMethodName);
static uint ComputeHashValue(string name)
{
var encoding = Encoding.UTF8;
var bufferLength = encoding.GetMaxByteCount(name.Length);
Span<byte> buffer = stackalloc byte[bufferLength];
var bytesWritten = encoding.GetBytes(name, buffer);
Span<byte> hashValue = stackalloc byte[32];
SHA256.HashData(buffer[..bytesWritten], hashValue);
return BitConverter.ToUInt32(hashValue[..4]);
}
}
private readonly ThreadLocal<SHA256> Sha256 = new(() => SHA256.Create());
private readonly ThreadLocal<byte[]> Buffer = new(() => new byte[4096]);
[Benchmark]
public uint ComputeHashValue_ThreadLocal_Buffer()
{
return ComputeHashValue(TestMethodName);
uint ComputeHashValue(string name)
{
byte[] buffer = Buffer.Value!;
var bytesWritten = Encoding.UTF8.GetBytes(name, buffer);
Span<byte> hashValue = stackalloc byte[32];
SHA256.HashData(buffer.AsSpan(0, bytesWritten), hashValue);
return BitConverter.ToUInt32(hashValue[..4]);
}
}
[Benchmark]
public uint ComputeHashValue_ThreadLocal_BufferSha256()
{
return ComputeHashValue(TestMethodName);
uint ComputeHashValue(string name)
{
byte[] buffer = Buffer.Value!;
var bytesWritten = Encoding.UTF8.GetBytes(name, buffer);
byte[] hashValue = Sha256.Value!.ComputeHash(buffer, 0, bytesWritten);
return BitConverter.ToUInt32(hashValue, 0);
}
}
}
} .NET Framework vs .NET 6.0Results
CodeCode[Benchmark]
public uint ComputeHashValue_ThreadLocal_BufferSha256()
{
return ComputeHashValue(TestMethodName);
uint ComputeHashValue(string name)
{
byte[] buffer = Buffer.Value!;
var bytesWritten = Encoding.UTF8.GetBytes(name, 0, name.Length, buffer, 0);
byte[] hashValue = Sha256.Value!.ComputeHash(buffer, 0, bytesWritten);
return BitConverter.ToUInt32(hashValue, 0);
}
} |
Thanks @manfred-brands ! I'll try and reproduce that on my side. I'd still like to keep this PR net6-only for simplicity, but good to see the thread local side shows promise there too |
@manfred-brands I'm starting to consider finding myself the time to do the net462 side of this too - especially if the method of buffer pooling is thread local for both. Any concerns about storing a disposable class (SHA256) at class level in a thread local without the containing class implementing IDisposable? Or were you suggesting that PartitionFilter itself also implement the interface on net462? |
I'm not to worried about a few instances of If you want to do it nicely ..., that would allow for future filters that are The following classes need updating:
|
Thanks @manfred-brands . I just wanted to make sure we're on the same page. If you were also thinking the same, I'm inclined to keep this simple and avoid the Dispose() for now. As you say, it uses |
I've just pushed some changes which should closely align to the faster zero-alloc .NET6 benchmark of @manfred-brands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides my one remark, looks good.
You might want to change the title of the PR as this is not limited to .NET6.0+
src/NUnitFramework/tests/Internal/Filters/PartitionFilterTests.cs
Outdated
Show resolved
Hide resolved
Thanks for the prompt feedback and approval @manfred-brands . I'm going to go ahead and merge |
Fixes #4489
This focuses just on .NET 6+ optimizations. I did this to allow different approaches to eventually be taken for net462 + net6 while also deferring any considerations about new dependencies we'd need to take on to support the net462 side.
The benchmarks below all use the one-shot hashing function (except the baseline). I tried benchmarking differences between that and the disposable API and didn't see any noticeable timing difference, only allocation savings (of about 200b) so I didn't include those in the final benchmarks. Similarly,
MemoryMarshal.Read<uint>()
didn't provide a noticeable difference in any way overBitConverter.ToUInt32
- if anything it was 1-2ns slower. I did include a[SkipLocalsInit]
benchmark though to show the 1-2ns savings likely aren't worth switching to unsafe compilation.Simple ArrayPooling seems to give the largest single benefit, though the 64 char length benchmarks below show a further 10-20ns could be saved by stackalloc'ing below a certain threshold. Given that we also need to stackalloc 32 bytes for the hashing buffer made me skeptical that we should target as high as 512 here. My PR targets 256 as an attempt at balancing this. It will still allow us to stackalloc for sizes of 85 characters or less.
Benchmarks
Results
See Code