Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add [ThreadStatic] StringBuilder in JsonTextTokenizer #15794

Conversation

TrayanZapryanov
Copy link
Contributor

While profiling allocations in our code, I've found a lot of StringBuilder instances.
image

Changes add similar functionality to the dotnet StringBuilderCache class in JsonTextTokenizer.
They are not modifying existing logic and only will reduce relatively small text/number allocations by using cached instance.

@TrayanZapryanov TrayanZapryanov requested a review from a team as a code owner February 11, 2024 10:08
@TrayanZapryanov TrayanZapryanov requested review from jskeet and removed request for a team February 11, 2024 10:08
Copy link

google-cla bot commented Feb 11, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@TrayanZapryanov
Copy link
Contributor Author

CLA signed

@TrayanZapryanov
Copy link
Contributor Author

Is it okay if I push another PR for changing JsonToken to readonly struct?
It is also quite high in memory allocations and is prepared to be immutable.

@jskeet
Copy link
Contributor

jskeet commented Feb 11, 2024

Is it okay if I push another PR for changing JsonToken to readonly struct?
It is also quite high in memory allocations and is prepared to be immutable.

Certainly happy to see a PR - the fact that it's internal is helpful here.

(I'll try to review this PR some time next week. Looking very briefly though, I'd encourage you to consider ThreadLocal vs ThreadStatic; I was under the impression that these days ThreadLocal was generally preferred, but I could be wrong.)

@TrayanZapryanov
Copy link
Contributor Author

@jskeet I've mimic code from here StringBuilderCache and it is using [ThreadStatic]. If you prefer I can try with ThreadLocal, but this code is "battle tested" in runtime and also in our software :).

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative to consider would be to have a private nested StringBuilderCache class instead of adding these methods directly to the JsonTextTokenizer - that we wouldn't need to add "StringBuilder" into all the names.

csharp/src/Google.Protobuf/JsonTokenizer.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/JsonTokenizer.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/JsonTokenizer.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/JsonTokenizer.cs Outdated Show resolved Hide resolved
csharp/src/Google.Protobuf/JsonTokenizer.cs Outdated Show resolved Hide resolved
@TrayanZapryanov
Copy link
Contributor Author

TrayanZapryanov commented Feb 12, 2024

Another alternative to consider would be to have a private nested StringBuilderCache class instead of adding these methods directly to the JsonTextTokenizer - that we wouldn't need to add "StringBuilder" into all the names.

@jskeet
In general I also prefer this to be extracted as a general class and used in on every place with new StringBuilder().
If you like I can expand changes in this direction and apply all the other comments.

@jskeet
Copy link
Contributor

jskeet commented Feb 12, 2024

In general I also prefer this to be extracted as a general class and used in on every place with new StringBuilder().
If you like I can expand changes in this direction and apply all the other comments.

I think it's worth extracting as a general class, but I'd start with it in JsonTextTokenizer rather than applying it to all uses of StringBuilder in the project. We can easily expand its usage further later.

@jskeet
Copy link
Contributor

jskeet commented Feb 12, 2024

Right, I'll fetch and try to do some benchmarking when I can. It may not be this week though.

@jskeet jskeet self-assigned this Feb 13, 2024
@jskeet jskeet added the c# label Feb 13, 2024
@jskeet
Copy link
Contributor

jskeet commented Feb 13, 2024

Benchmark results:

Before optimizations:

Method Mean Error StdDev Gen0 Gen1 Allocated
TestAllTypesProto3_Medium 84.52 us 1.639 us 2.013 us 8.0566 0.3662 49.39 KB

After stringbuilder:

Method Mean Error StdDev Gen0 Gen1 Allocated
TestAllTypesProto3_Medium 78.62 us 1.509 us 2.066 us 5.3711 0.2441 33.14 KB

So it looks like this is probably worth doing (and the benefit will very much depend on the JSON being parsed).
The current code introduces new warnings though, which should be fixed.

@TrayanZapryanov
Copy link
Contributor Author

@jskeet Where do you see "new warnings"?
As I cannot compile it - it is hard for me to see if compiler warns.
Also tests are somehow paused :
image

Maybe there is somewhere a check that should be set in order to be executed....(Check for Safety)

@jskeet
Copy link
Contributor

jskeet commented Feb 14, 2024

@jskeet Where do you see "new warnings"?

Where you're trying to use Nullable Reference Types when the project doesn't support it.

As I cannot compile it - it is hard for me to see if compiler warns.

It's going to be very hard for you to contribute if you're unable to compile the code. Can you not temporarily change the configuration on your machine to allow the use of nuget.org?

@jskeet
Copy link
Contributor

jskeet commented Feb 14, 2024

I've triggered the tests.

@jskeet jskeet added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Feb 14, 2024
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Feb 14, 2024
@jskeet jskeet added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Feb 15, 2024
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Feb 15, 2024
Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like @JamesNK to have a look at this before merging, in case he has thoughts. But otherwise, this looks fine to me.

@JamesNK
Copy link
Contributor

JamesNK commented Feb 15, 2024

I'm not fimilar with the pros and cons of thread static. I guess overall memory usage will be higher.

The code looks fine, although you should add a note that it is copied from https://github.com/dotnet/runtime/blob/72c488e37818e18557b47292f519489ea60af2f8/src/libraries/Common/src/System/Text/StringBuilderCache.cs#L7

@TrayanZapryanov
Copy link
Contributor Author

@JamesNK I've added note from where I've copied code.

Here it is a result with allocations once I've build this PS + JsonToken structure of the same benchmarked app.
image

In general it is 120MB less

@JamesNK
Copy link
Contributor

JamesNK commented Feb 15, 2024

I mean the steady state memory usage of the app will be higher. Cached StringBuilders stick around.

@TrayanZapryanov
Copy link
Contributor Author

@jskeet Do you need something more from my side? Or this is enough to be merged?

@jskeet
Copy link
Contributor

jskeet commented Mar 1, 2024

@jskeet Do you need something more from my side? Or this is enough to be merged?

Just time to get round to final checks and things. Sorry - I'm really busy and Protobuf is only one small part of my work, so it's hard to get round to things.

{
if (sb.Capacity <= MaxCachedStringBuilderSize)
{
cachedInstance = sb;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the bigger of the two here? There's a chance the cached one has bigger capacity than the one being released and we might want to keep the bigger one to reduce the chances of line 758 evaluating to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to not keep too big StringBuilder instances as they are static per thread and can increase total memory consumption of the app. At least this are considerations that people in .net runtime had. As written above the class - this is a copy of internal class used there.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm not saying we should keep anything that's above MaxCachedStringBuilderSize. I'm saying we should do something like the following code, to make sure that we keep cache the biggest string builder possible of the ones we have already allocated, within the MaxCachedStringBuilderSize boundaries.

cachedInstance = cachedInstance?.Capacity >= sb.Capacity ? cachedInstance : sb;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you check the code in Acquire - it clears cachedInstance and set it to null in happy path.
So when releasing cachedInstance will be null and there is no need to check it's capacity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm refering to this scenario:

  • Acquiere sb of 50 capacity
  • Release sb of 50 capacity => sb of 50 is now cached
  • Acquire sb of 100 capacity => new sb of 100 is created, sb of 50 continues to be cached
  • Release sb of 100 capacity => sb of 100 is discarded, even though it's better to discard the 50 capacity and keep the 100 capacity
  • Acquire sb of 70 capacity => new sb of 70 capacity is created, sb of 50 continues to be cached, we could have reused the sb of 100 capacity

It's probably easy to write a test that demontrates this, unless I'm missunderstanding something. This seems like a simple change that would maximize the benefits of caching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand your concerns and pushed suggested changes. Really nice catch. Maybe you should propose it also in runtime repo.

Copy link

@amanda-tarafa amanda-tarafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

@TrayanZapryanov
Copy link
Contributor Author

TrayanZapryanov commented Mar 6, 2024

Same benchmark as in here

main:
// * Summary *

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3235/23H2/2023Update/SunValley3)
11th Gen Intel Core i9-11900K 3.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
[Host] : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Job=DefaultJob Error=3.84 ns StdDev=3.21 ns

Method Mean Allocated
Small 281.1 ns 1.01 KB
Big 256,221.4 ns 642.7 KB

PR:
Stringbuilder cache

Method Mean Allocated
Small 292.6 ns 512 B
Big 231,857.0 ns 260792 B

@jskeet jskeet added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 6, 2024
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 6, 2024
@jskeet
Copy link
Contributor

jskeet commented Mar 6, 2024

@mkruskal-google: This has been reviewed from a C# standpoint, and I've just marked it safe to run tests again. If you could review it to kick off the merge process, that'd be great.

copybara-service bot pushed a commit that referenced this pull request Mar 6, 2024
COPYBARA_INTEGRATE_REVIEW=#15794 from TrayanZapryanov:cache_stringbuilder 596147e
PiperOrigin-RevId: 613251480
@jskeet
Copy link
Contributor

jskeet commented Mar 18, 2024

Implemented in fac929d

@jskeet jskeet closed this Mar 18, 2024
@TrayanZapryanov TrayanZapryanov deleted the cache_stringbuilder branch March 18, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants