fix: for perf improvement using fast random #49

kotharironak · 2023-08-31T16:26:27Z

Description

Java's UUID.randomUUID uses SecureRandom, which internally relies on the system's /dev/random. This method requires a sufficient amount of entropy to generate random numbers and may lead to blocking.

In the present context, we are focused on generating request IDs. Therefore, a faster UUID generator based on ThreadLocalRandom should be adequate.

For a similar use case, the OpenTelemetry system also utilizes a trace ID based on ThreadRandomLocal, serving as a reference.

As part of this PR, using fastRadom for request-context.

github-actions · 2023-08-31T16:28:47Z

Test Results

70 tests ±0 70 ✔️ ±0 16s ⏱️ -7s
12 suites ±0   0 💤 ±0
12 files ±0   0 ❌ ±0

Results for commit ed3a0c5. ± Comparison against base commit cb3600a.

♻️ This comment has been updated with latest results.

kotharironak · 2023-08-31T16:29:16Z

grpc-context-utils/src/main/java/org/hypertrace/core/grpcutils/context/RequestContext.java

    return new RequestContext()
        .put(RequestContextConstants.TENANT_ID_HEADER_KEY, tenantId)
-        .put(RequestContextConstants.REQUEST_ID_HEADER_KEY, UUID.randomUUID().toString());
+        .put(RequestContextConstants.REQUEST_ID_HEADER_KEY, UuidCreator.getRandomBasedFast().toString());


Ref usage of ThreadLocalRandom:

https://github.com/f4b6a3/uuid-creator/blob/3b6471642e95ca91901ce5efe817537aa0d1068b/src/main/java/com/github/f4b6a3/uuid/UuidCreator.java#L174

https://github.com/f4b6a3/uuid-creator/blob/3b6471642e95ca91901ce5efe817537aa0d1068b/src/main/java/com/github/f4b6a3/uuid/UuidCreator.java#L1265

https://github.com/f4b6a3/uuid-creator/blob/3b6471642e95ca91901ce5efe817537aa0d1068b/src/main/java/com/github/f4b6a3/uuid/factory/AbstRandomBasedFactory.java#L84

My concerns here aren't the new impl but

Adding a new dependency to a base lib like this impacts the dependency tree of every single service

This is just covering up an upstream problem - that we're generating request contexts inappropriately - many times for a single "request".

Adding a new dependency to a base lib like this impacts the dependency tree of every single service

I can extract out the function from this library based on ThreadLocalRandom.

This is just covering up an upstream problem - that we're generating request contexts inappropriately - many times for a single "request".

I didn't fully get this. Can you elaborate?

I didn't fully get this. Can you elaborate?

The reason this came up - at least the reason I'm aware of - as an issue is because ingester is generating many request contexts for a single trace ingestion. Each time it needs a context, it generates a new one. Because this now occurs across many enrichers and many threads in parallel, we're running into these performance problems. But not only is that unnecessary, it's actually wrong - we want the same ID to be shared for ingesting the same trace in the first place.

A couple different approaches there would be to either

update the enricher interface to accept the context, and generate it up front before we delegate the enrichment off to an async thread.

Create a request context from the trace details - use the trace id as the "request id". This would probably be defined in the ingestion pipeline repo.

The request-id attribute is attached to the grpc-context while making a downstream call. This also seems to be getting added only for RXJava calls, if someone is not using RXJava, this request-id is not attached to the call.
How is this request-id used by the downstream services? Given that its attached only for RXJava based calls and not for all GRPC calls, do we even need to attach this request-id?

The reason this came up - at least the reason I'm aware of - as an issue is because ingester is generating many request contexts for a single trace ingestion. Each time it needs a context, it generates a new one. Because this now occurs across many enrichers and many threads in parallel, we're running into these performance problems. But not only is that unnecessary, it's actually wrong - we want the same ID to be shared for ingesting the same trace in the first place.

A couple different approaches there would be to either

update the enricher interface to accept the context, and generate it up front before we delegate the enrichment off to an async thread.
Create a request context from the trace details - use the trace id as the "request id". This would probably be defined in the ingestion pipeline repo.

I agree that for tracing the processing of StructureTrace, it's important to establish a context based on unique keys or attributes that can be used to generate a trace ID. And if request_id is primarily used internally to link outgoing requests to the same trace, we can consider option (2) having new request context based trace.

However, for any other consumers of this interface forTenantId, do we really need a request_id generated from a secure random source? Since this is primarily for tracing purposes and not for resource identification, I think it's more practical to use a fast UUID-based request_id generated with ThreadLocalRandom.

I can extract out the function from this library based on ThreadLocalRandom.

If we don't plan to use the above library here. I am thinking of having our own function like below:

public static UUID generateFastRandomUUID() { long mostSigBits = ThreadLocalRandom.current().nextLong(); long leastSigBits = ThreadLocalRandom.current().nextLong(); // Set the version (4) For random UUID mostSigBits &= 0xFFFFFFFFFFFF0FFFL; mostSigBits |= 0x0000000000004000L; // Set variant to RFC 4122 leastSigBits &= 0x3FFFFFFFFFFFFFFFL; leastSigBits |= 0x8000000000000000L; return new UUID(mostSigBits, leastSigBits); }

Few stacks showing up the contention at jdk level

codecov · 2023-08-31T16:34:59Z

Codecov Report

Merging #49 (ed3a0c5) into main (cb3600a) will increase coverage by 0.30%.
The diff coverage is 90.00%.

@@             Coverage Diff              @@
##               main      #49      +/-   ##
============================================
+ Coverage     74.57%   74.88%   +0.30%     
- Complexity      145      146       +1     
============================================
  Files            20       21       +1     
  Lines           409      418       +9     
  Branches         22       22              
============================================
+ Hits            305      313       +8     
- Misses           85       86       +1     
  Partials         19       19

Flag	Coverage Δ
unit	`74.88% <90.00%> (+0.30%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
...race/core/grpcutils/context/FastUUIDGenerator.java	`87.50% <87.50%> (ø)`
...ertrace/core/grpcutils/context/RequestContext.java	`75.00% <100.00%> (+0.28%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

aaron-steinfeld · 2023-09-01T12:04:41Z

The request-id attribute is attached to the grpc-context while making a downstream call. This also seems to be getting added only for RXJava calls, if someone is not using RXJava, this request-id is not attached to the call.
How is this request-id used by the downstream services? Given that its attached only for RXJava based calls and not for all GRPC calls, do we even need to attach this request-id?

This is unrelated to rxjava. RequestContext and everything it contains (tenant id, request id, auth context where applicable) is used by all of our grpc calls (a handful of which are themselves managed by rxjava). This request ID is used to correlate errors and logs across services.

I agree that for tracing the processing of StructureTrace, it's important to establish a context based on unique keys or attributes that can be used to generate a trace ID. And if request_id is primarily used internally to link outgoing requests to the same trace, we can consider option (2) having new request context based trace.

The unique key is just the trace id itself though, right? Either way I think it would make sense to update the enricher interface so however we generate the request context is immaterial to the enrichers, it's just a question of level of effort. If that's painful, then having a locally consistent version that does something like buildRequestContextForTrace(structuredTrace) seems reasonable, since that's what we should be doing at the beginning of the enricher regardless.

However, for any other consumers of this interface forTenantId, do we really need a request_id generated from a secure random source? Since this is primarily for tracing purposes and not for resource identification, I think it's more practical to use a fast UUID-based request_id generated with ThreadLocalRandom.

So here is where I think we may be getting too far in the weeds. I agree, it doesn't need to be secure and am certainly glad to have a thread local implementation (yours looks fine) - but I think it becomes an unnecessary optimization if we fix the request context generation itself. We're fixing a symptom of an upstream bug rather than the bug itself.

Few stacks showing up the contention at jdk level

Right, this is the one coming from the context generation being inside the async threads rather than up front. Edited to remove internal details.

laxmanchekka · 2023-09-01T12:14:16Z

if we fix the request context generation itself

GRPC request context generation and implementation is the responsibility of the grpc framework itself. Why do we want to push this responsibility down to the grpc client applications and leak/duplicate this code everywhere.

While I agree that trace enricher refactoring is needed for several other good reasons. Definitely not worth for this one.

laxmanchekka · 2023-09-01T12:16:44Z

@kotharironak : I vote for a simple util based implementation without a new third party dependency for this alone.

aaron-steinfeld · 2023-09-01T12:48:17Z

GRPC request context generation and implementation is the responsibility of the grpc framework itself. Why do we want to push this responsibility down to the grpc client applications and leak/duplicate this code everywhere.

I would still leave it there, it's just a new constructor allowing callers to bring their own ID when that makes sense.

  public static RequestContext forTenantId(String tenantId) {
    return RequestContext.forTenantAndRequestIds(tenantId, UUID.randomUUID().toString()); // This would be the added api
  }

There's no real duplicate code, as this only impacts clients in that they're adding the extra arg from data they already have where applicable. And that's unknowable data to the framework.

While I agree that trace enricher refactoring is needed for several other good reasons. Definitely not worth for this one.

We can choose how major of a refactor we want this to be. The interface can be left as is or updated, but we can't fix the actual bug here without changing the duplicated RequestContext.forTenantId calls. We could fix the thread contention with the secure random change, but not the broken correlation. IMO, the minimum/maximum effort here isn't really that significant - on the scale of hours.

mihirgt · 2023-09-01T13:01:23Z

Given that UUID.randomUUID takes a lock and causes thread contention, we should replace it with a a ThreadLocalRandom based implementation. This will help in general.
In follow up PRs we can fix the callers to fix calling context in trace enricher.

aaron-steinfeld · 2023-09-01T13:31:48Z

Given that UUID.randomUUID takes a lock and causes thread contention, we should replace it with a a ThreadLocalRandom based implementation. This will help in general. In follow up PRs we can fix the callers to fix calling context in trace enricher.

I just don't want us to lose sight of the root cause here, but otherwise no issues with that.

kotharironak · 2023-09-04T09:09:46Z

Given that UUID.randomUUID takes a lock and causes thread contention, we should replace it with a a ThreadLocalRandom based implementation. This will help in general. In follow up PRs we can fix the callers to fix calling context in trace enricher.

I just don't want us to lose sight of the root cause here, but otherwise no issues with that.

@laxmanchekka @mihirgt @aaron-steinfeld
I have added the static function, and removes the usage of third-party lib.
I will work on separate PR for context change for enricher.

laxmanchekka · 2023-09-04T09:48:58Z

grpc-context-utils/src/main/java/org/hypertrace/core/grpcutils/context/UuidGenerator.java

+   * the default randomUUID method that relies on /dev/random. It's suitable for most random UUID
+   * needs.
+   */
+  public static UUID generateFastRandomUUID() {


Suggested change

public static UUID generateFastRandomUUID() {

public static UUID randomUUID() {

laxmanchekka · 2023-09-04T09:49:35Z

grpc-context-utils/src/main/java/org/hypertrace/core/grpcutils/context/UuidGenerator.java

+import java.util.UUID;
+import java.util.concurrent.ThreadLocalRandom;
+
+public class UuidGenerator {


Suggested change

public class UuidGenerator {

public class FastUUIDGenerator {

aaron-steinfeld · 2023-09-04T13:33:28Z

grpc-context-utils/src/main/java/org/hypertrace/core/grpcutils/context/FastUUIDGenerator.java

+import java.util.UUID;
+import java.util.concurrent.ThreadLocalRandom;
+
+public class FastUUIDGenerator {


Does this need to be public? Doesn't need a new release, but next time you're in here can you move this to package private since it's an internal impl detail.

Initially thought of it to our own lib so it can be used. But, I think, we can make it private, anywhere else we can use the third-party library that is already there in most of the projects.

@laxmanchekka @aaron-steinfeld https://github.com/hypertrace/java-grpc-utils/pull/50/files

fix: for perf improvemet using fast random

38458b5

kotharironak requested a review from a team as a code owner August 31, 2023 16:26

kotharironak requested review from satish-mittal, laxmanchekka and mihirgt August 31, 2023 16:26

kotharironak commented Aug 31, 2023

View reviewed changes

fix spotless issues

395b637

kotharironak requested a review from aaron-steinfeld September 1, 2023 10:13

addressed the latest comments

1ebb589

need to extend supress warning for jackson lib

0566d8f

laxmanchekka reviewed Sep 4, 2023

View reviewed changes

addressed the comments

803ee86

laxmanchekka previously approved these changes Sep 4, 2023

View reviewed changes

pushing refactored code

ed3a0c5

kotharironak dismissed laxmanchekka’s stale review via ed3a0c5 September 4, 2023 10:21

laxmanchekka approved these changes Sep 4, 2023

View reviewed changes

kotharironak merged commit c303dc0 into main Sep 4, 2023

kotharironak deleted the fast-random-uuid branch September 4, 2023 10:38

aaron-steinfeld reviewed Sep 4, 2023

View reviewed changes

kotharironak mentioned this pull request Sep 4, 2023

chore: made fast UUID to internal #50

Merged

	public static UUID generateFastRandomUUID() {
	public static UUID randomUUID() {

	public class UuidGenerator {
	public class FastUUIDGenerator {

fix: for perf improvement using fast random #49

fix: for perf improvement using fast random #49

Uh oh!

Conversation

kotharironak commented Aug 31, 2023

Description

Uh oh!

github-actions bot commented Aug 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

kotharironak Aug 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laxmanchekka Sep 1, 2023 • edited by aaron-steinfeld Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Aug 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

aaron-steinfeld commented Sep 1, 2023

Uh oh!

laxmanchekka commented Sep 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

laxmanchekka commented Sep 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aaron-steinfeld commented Sep 1, 2023

Uh oh!

mihirgt commented Sep 1, 2023

Uh oh!

aaron-steinfeld commented Sep 1, 2023

Uh oh!

kotharironak commented Sep 4, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Aug 31, 2023 •

edited

Loading

kotharironak Aug 31, 2023 •

edited

Loading

laxmanchekka Sep 1, 2023 •

edited by aaron-steinfeld

Loading

codecov bot commented Aug 31, 2023 •

edited

Loading

laxmanchekka commented Sep 1, 2023 •

edited

Loading

laxmanchekka commented Sep 1, 2023 •

edited

Loading