Use buffer pooling when calculating partition filters #4489

Closed
stevenaw opened this issue Oct 10, 2023 · 12 comments · Fixed by #4500

@stevenaw
Member

stevenaw commented Oct 10, 2023

The partition filter currently generates a new byte array for each method name. Given that partitioning is most likely to be helpful (and thus most likely to be used) when splitting up a large suite, there's the potential for quite a few allocations here.

https://github.com/nunit/nunit/blob/master/src/NUnitFramework/framework/Internal/Filters/PartitionFilter.cs#L110
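
In rough terms, the pattern looks like this (a simplified sketch for illustration, not the exact source behind the link; the class and parameter names are mine):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

internal static class CurrentApproachSketch
{
    // Simplified illustration of the per-name allocations described above.
    public static uint ComputeHashValue(string testMethodName)
    {
        using var sha256 = SHA256.Create();
        byte[] nameBytes = Encoding.UTF8.GetBytes(testMethodName); // a new byte[] for every test name
        byte[] hash = sha256.ComputeHash(nameBytes);               // plus a fresh 32-byte hash array
        return BitConverter.ToUInt32(hash, 0);                     // the first 4 bytes become the partition hash
    }
}
```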

It might be worth looking into using ArrayPool<byte>, or a single large buffer at the instance level, instead of making a new array each time. The one-shot hashing helpers and stackalloc could be helpful here too on the .NET 6 build.

I think this should be possible for all library targets but may require taking a dependency on System.Buffers on the net462 build if ArrayPool is used.

@nunit/framework-team thoughts or objections (particularly about the dependency this would add)?

@stevenaw
Member Author

Perhaps stackalloc is overly ambitious on the net462 side. We could also add the System.Memory package to get Span<> support, but there aren't really any APIs to target with it.

@stevenaw stevenaw changed the title Use buffer pooling or stackalloc when calculating partition filters Use buffer pooling when calculating partition filters Oct 10, 2023
@manfred-brands
Member

manfred-brands commented Oct 11, 2023

@stevenaw I thought the same as you once and wrote a BenchmarkDotNet comparison of ArrayPool vs. new, and was disappointed. I can't find it anymore, nor do I remember the exact use case, but the array was very short-lived and the overhead of the ArrayPool was significant. Before you make any change, write a BenchmarkDotNet test for this situation as well.

I'm not sure, but I think that filters are only called from a single thread as part of building the test suite.
If that is correct, we can allocate a large buffer as well as an instance of the SHA256 class in the constructor.
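
A minimal sketch of that idea, assuming filters really are only used from a single thread (the class and member names are illustrative, not the actual PartitionFilter code; the 4096-byte buffer size is also an assumption):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

internal sealed class PartitionHasher
{
    // Allocated once per filter instance and reused for every test name.
    // 4096 bytes comfortably covers typical method names; see the fallback below.
    private readonly byte[] _buffer = new byte[4096];
    private readonly SHA256 _sha256 = SHA256.Create();

    public uint ComputeHashValue(string name)
    {
        byte[] hash;

        if (Encoding.UTF8.GetMaxByteCount(name.Length) <= _buffer.Length)
        {
            int written = Encoding.UTF8.GetBytes(name, 0, name.Length, _buffer, 0);
            hash = _sha256.ComputeHash(_buffer, 0, written);
        }
        else
        {
            // Unusually long names fall back to the current allocate-per-call path.
            hash = _sha256.ComputeHash(Encoding.UTF8.GetBytes(name));
        }

        return BitConverter.ToUInt32(hash, 0);
    }
}
```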

@stevenaw
Member Author

Oh, that is a good idea about the single buffer, thanks @manfred-brands!

I've done some tinkering to validate the approach as a POC, but nothing substantial enough yet to assign this to myself. Absolutely, any change here should be accompanied by a benchmark.

@stevenaw stevenaw self-assigned this Oct 11, 2023
@stevenaw
Member Author

stevenaw commented Oct 11, 2023

I ran some benchmarks on net48 and net60, and I think the ArrayPool + one-shot hashing approach has promise.

netfx

  • Negligible timing difference
  • Memory-wise it saves about the same number of bytes as the string length. Memory still grows with string size

net60

  • Consistently about 150ns faster on net60
  • Flat memory profile of 56 bytes

There are still more gains to be found here, but I think it's worth pursuing, at least on net6+.

| Method                     | Job                         | Runtime            | TestNameLength | Mean       | RatioSD | Gen0   | Allocated | Alloc Ratio |
|--------------------------- |---------------------------- |------------------- |--------------- |-----------:|--------:|-------:|----------:|------------:|
| ComputeHashValue           | ShortRun-.NET 6.0           | .NET 6.0           | 64             |   335.4 ns |    0.00 | 0.0520 |     328 B |        1.00 |
| ComputeHashValue_ArrayPool | ShortRun-.NET 6.0           | .NET 6.0           | 64             |   197.1 ns |    0.00 | 0.0088 |      56 B |        0.17 |
|                            |                             |                    |                |            |         |        |           |             |
| ComputeHashValue           | ShortRun-.NET Framework 4.8 | .NET Framework 4.8 | 64             | 1,814.3 ns |    0.00 | 0.1774 |    1123 B |        1.00 |
| ComputeHashValue_ArrayPool | ShortRun-.NET Framework 4.8 | .NET Framework 4.8 | 64             | 1,834.7 ns |    0.00 | 0.1640 |    1035 B |        0.92 |
|                            |                             |                    |                |            |         |        |           |             |
| ComputeHashValue           | ShortRun-.NET 6.0           | .NET 6.0           | 128            |   416.6 ns |    0.00 | 0.0625 |     392 B |        1.00 |
| ComputeHashValue_ArrayPool | ShortRun-.NET 6.0           | .NET 6.0           | 128            |   243.1 ns |    0.03 | 0.0086 |      56 B |        0.14 |
|                            |                             |                    |                |            |         |        |           |             |
| ComputeHashValue           | ShortRun-.NET Framework 4.8 | .NET Framework 4.8 | 128            | 2,690.6 ns |    0.00 | 0.1869 |    1188 B |        1.00 |
| ComputeHashValue_ArrayPool | ShortRun-.NET Framework 4.8 | .NET Framework 4.8 | 128            | 2,699.8 ns |    0.01 | 0.1640 |    1035 B |        0.87 |
|                            |                             |                    |                |            |         |        |           |             |
| ComputeHashValue           | ShortRun-.NET 6.0           | .NET 6.0           | 512            |   623.0 ns |    0.00 | 0.1230 |     776 B |        1.00 |
| ComputeHashValue_ArrayPool | ShortRun-.NET 6.0           | .NET 6.0           | 512            |   427.4 ns |    0.01 | 0.0086 |      56 B |        0.07 |
|                            |                             |                    |                |            |         |        |           |             |
| ComputeHashValue           | ShortRun-.NET Framework 4.8 | .NET Framework 4.8 | 512            | 6,590.7 ns |    0.00 | 0.2441 |    1573 B |        1.00 |
| ComputeHashValue_ArrayPool | ShortRun-.NET Framework 4.8 | .NET Framework 4.8 | 512            | 6,569.6 ns |    0.00 | 0.1602 |    1035 B |        0.66 |

public uint ComputeHashValue_ArrayPool()
{
    var name = TestMethodName;
    var encoding = Encoding.UTF8;

    var bufferLength = encoding.GetMaxByteCount(name.Length);
    var buffer = ArrayPool<byte>.Shared.Rent(bufferLength);

    byte[] hashValue;

    try
    {
        var bytesWritten = encoding.GetBytes(name, 0, name.Length, buffer, 0);
#if NETFRAMEWORK
        // SHA256 ComputeHash will return 32 bytes, we will use the first 4 bytes of that to convert to an unsigned integer
        using var hashAlgorithm = SHA256.Create();
        hashValue = hashAlgorithm.ComputeHash(buffer, 0, bytesWritten);
#else
        hashValue = SHA256.HashData(new Span<byte>(buffer, 0, bytesWritten));
#endif
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }

    return BitConverter.ToUInt32(hashValue, 0);
}
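
For context, a BenchmarkDotNet harness shaped roughly like the one below would produce a table like the above; the class name, the synthetic test names, and the exact job configuration are assumptions rather than the actual benchmark code:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[ShortRunJob]
[MemoryDiagnoser]
public class PartitionHashBenchmarks
{
    [Params(64, 128, 512)]
    public int TestNameLength { get; set; }

    public string TestMethodName { get; private set; } = string.Empty;

    [GlobalSetup]
    public void Setup() => TestMethodName = new string('a', TestNameLength);

    [Benchmark(Baseline = true)]
    public uint ComputeHashValue()
    {
        // Stand-in for the current implementation: new arrays on every call.
        using var sha256 = SHA256.Create();
        var hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(TestMethodName));
        return BitConverter.ToUInt32(hash, 0);
    }

    // The pooled ComputeHashValue_ArrayPool method shown above would be the second [Benchmark].

    public static void Main() => BenchmarkRunner.Run<PartitionHashBenchmarks>();
}
```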

@lahma
Contributor

lahma commented Oct 11, 2023

As an exception from hashing should be quite unlikely, you could save some CPU cycles by dropping the try/finally.

@lahma
Contributor

lahma commented Oct 11, 2023

If encoding.GetMaxByteCount(name.Length) returns some reasonable value (< 512?), you could just do a faster stackalloc into a span; it might require more preprocessor directives though.
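
A minimal sketch of that combination for the .NET 6 target; the 512-byte threshold, the NET6_0_OR_GREATER guard, and the method shape are assumptions rather than final code:

```csharp
#if NET6_0_OR_GREATER
using System;
using System.Security.Cryptography;
using System.Text;

internal static class PartitionHash
{
    public static uint Compute(string name)
    {
        const int StackLimit = 512;

        int maxBytes = Encoding.UTF8.GetMaxByteCount(name.Length);
        Span<byte> nameBytes = maxBytes <= StackLimit
            ? stackalloc byte[StackLimit]
            : new byte[maxBytes]; // rare fallback for very long names

        int written = Encoding.UTF8.GetBytes(name.AsSpan(), nameBytes);

        // SHA-256 output is always 32 bytes, so the hash can live on the stack too.
        Span<byte> hash = stackalloc byte[32];
        SHA256.HashData(nameBytes.Slice(0, written), hash);

        return BitConverter.ToUInt32(hash);
    }
}
#endif
```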

@stevenaw
Member Author

Thanks @lahma! Indeed, some of your past contributions, and the size of the suites you've helped optimize NUnit for, are partly what inspired me to look for optimizations here too.

@manfred-brands
Member

@lahma encoding.GetMaxByteCount(name.Length) just returns name.Length * 3, where 3 is MaxUtf8BytesPerChar.
We know there is a practical upper limit on name length, e.g. 1k, so you could use a 4k buffer.

@manfred-brands
Member

@stevenaw Never change two parameters at once.
As there is no difference on netfx, is all of your performance improvement in the net6 run from the static hash method?
If so, it means that the ArrayPool didn't give you any benefit and just made the code more complex.

@stevenaw
Member Author

Very true, @manfred-brands.
Up until now it's been very experimental, to see if this was worth more investigation, so I put a few pieces of low-hanging fruit together to understand what the bottlenecks are and what approach to take for multi-targeting.
The more rigorous, iterative "change something and measure" work is still to come.

@stevenaw
Member Author

As an aside, how important would thread safety be in this and other filters?

While filters are currently applied sequentially, there's always the possibility that a higher-level optimization could change that. For example, within this filter each test's partition is calculated independently, meaning a parallel divide-and-conquer approach could theoretically apply here for very large suites. There would, of course, be ways to work around any lack of thread safety, but it may still be something worth considering.

@manfred-brands
Member

Indeed, a future TestSuiteBuilder could build fixtures/tests in parallel.

For the .NET Core build, I suggest using stackalloc and the static hash method.
For the .NET Framework build, you could try ThreadLocal for both the shared buffer and the hasher. Even in the case of a single thread we would be re-using these, and in the case of multiple threads we would just use a few more.
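
A minimal sketch of that ThreadLocal variant for the .NET Framework build; the field names, the 4k buffer size, and the long-name fallback are assumptions:

```csharp
#if NETFRAMEWORK
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading;

internal static class PartitionHash
{
    // One buffer and one hasher per thread: with a single-threaded TestSuiteBuilder this
    // behaves like a single shared instance, and a parallel builder just creates a few more.
    private static readonly ThreadLocal<byte[]> Buffer = new ThreadLocal<byte[]>(() => new byte[4096]);
    private static readonly ThreadLocal<SHA256> Hasher = new ThreadLocal<SHA256>(() => SHA256.Create());

    public static uint Compute(string name)
    {
        byte[] buffer = Buffer.Value;
        SHA256 sha256 = Hasher.Value;

        byte[] hash;
        if (Encoding.UTF8.GetMaxByteCount(name.Length) <= buffer.Length)
        {
            int written = Encoding.UTF8.GetBytes(name, 0, name.Length, buffer, 0);
            hash = sha256.ComputeHash(buffer, 0, written);
        }
        else
        {
            // Unusually long names fall back to a one-off allocation.
            hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(name));
        }

        return BitConverter.ToUInt32(hash, 0);
    }
}
#endif
```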
