add logger to OutlierDetectionLoadBalancer #9880

Merged

s-matyukevich
Contributor

Right now there is no visibility at all into what the OutlierDetection LB is doing.

This PR fixes that by adding a ChannelLogger to the load balancer. Here are the different types of messages that will be logged (taken from my end-to-end test):

[2023-02-07 05:37:12 525] [FINEST ] [Channel<1>: (logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog:50051)] OutlierDetection lb created.
[2023-02-07 05:37:12 532] [FINEST ] [Channel<1>: (logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog:50051)] Received resolution result: ResolvedAddresses{addresses=[[[logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog/10.131.230.154:50051]/{}], [[logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog/10.135.101.119:50051]/{}], [[logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog/10.131.193.5:50051]/{}], [[logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog/10.135.43.140:50051]/{}]], attributes={io.grpc.internal.RetryingNameResolver.RESOLUTION_RESULT_LISTENER_KEY=io.grpc.internal.RetryingNameResolver$ResolutionResultListener@4612c26d}, loadBalancingPolicyConfig=io.grpc.util.OutlierDetectionLoadBalancer$OutlierDetectionLoadBalancerConfig@5c9a2c13} 
[2023-02-07 05:37:22 546] [FINEST ] [Channel<1>: (logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog:50051)] FailurePercentage algorithm detected outlier: AddressTracker{subchannels=[OutlierDetectionSubchannel{addresses=[[[logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog/10.135.101.119:50051]/{}]]}]}

Also, a similar message will be added for the SuccessRate algorithm.

[2023-02-07 05:37:22 546] [FINER  ] [Channel<1>: (logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog:50051)] Subchannel ejected: OutlierDetectionSubchannel{addresses=[[[logs-backend-rpc-test-server.service-discovery.all-clusters.local-dc.fabric.dog/10.135.101.119:50051]/{}]]} 

A similar message will be added for the uneject event.

I used FINEST level everywhere except for actual ejection/unejection events, which are logged with FINER.
I also added the list of affected addresses to every log message, because this makes it easy to find the misbehaving upstream server.
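
For readers who want to see these messages locally, here is a minimal sketch of raising the verbosity with java.util.logging. It assumes that grpc-java forwards ChannelLogger output to a JUL logger named io.grpc.ChannelLogger; that name, and the class and method names below, are assumptions for illustration and are not part of this PR. The two records in main are stand-ins, not real load balancer output.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ChannelLogSetup {
  // Assumption: channel log records are published under this JUL logger name.
  static final String CHANNEL_LOGGER_NAME = "io.grpc.ChannelLogger";

  /** Lets records at the given level (and above) through to the console. */
  static Logger enableChannelLogs(Level level) {
    Logger logger = Logger.getLogger(CHANNEL_LOGGER_NAME);
    logger.setLevel(level);

    // ConsoleHandler defaults to INFO, so its level must be lowered as well.
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(level);
    logger.addHandler(handler);

    // Avoid duplicate output through the root logger's handlers.
    logger.setUseParentHandlers(false);

    // Return the logger so the caller can hold a strong reference to it.
    return logger;
  }

  public static void main(String[] args) {
    Logger logger = enableChannelLogs(Level.FINEST);
    // Stand-in records to confirm that both levels now reach the console.
    logger.finest("stand-in record at FINEST");
    logger.finer("stand-in record at FINER");
  }
}
```

Passing Level.FINER instead of Level.FINEST would keep only the ejection and unejection events and drop the more verbose messages.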

@temawi temawi self-requested a review February 7, 2023 15:56
@temawi temawi added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Feb 7, 2023
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Feb 7, 2023
Contributor

@temawi temawi left a comment

Thanks for adding this. Generally looks good, I just had a question on one of the log statements.

@@ -814,6 +854,8 @@ public void ejectOutliers(AddressTrackerMap trackerMap, long ejectionTimeNanos)
          // If the failure rate is above the threshold, we should eject...
          double maxFailureRate = ((double)config.failurePercentageEjection.threshold) / 100;
          if (tracker.failureRate() > maxFailureRate) {
            logger.log(ChannelLogLevel.DEBUG,
                "FailurePercentage algorithm detected outlier: {0}", tracker);
Contributor

Can we please get the parameters for the failure percentage algorithm logged the same way you did with success rate? After that I think we can get this merged.

Contributor Author

Done. b360b89

I added the failureRate parameter to the FailurePercentage algorithm log message, as this is the only dynamically calculated parameter. I also added successRate to the SuccessRate algorithm log message for consistency.
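
To make the check concrete, here is a self-contained sketch of the failure-percentage comparison from the hunk above, with the computed rate included in the message. It is an illustration under assumptions, not the code in this PR: the class, the helper names, the counters, the sample address, and the exact message layout are all hypothetical. Only the strict comparison against threshold / 100 and the {0}-style placeholder follow the snippet under review.

```java
import java.text.MessageFormat;

final class FailurePercentageSketch {
  /** Fraction of failed calls in the interval; 0.0 when there was no traffic. */
  static double failureRate(long successes, long failures) {
    long total = successes + failures;
    return total == 0 ? 0.0 : (double) failures / total;
  }

  /** Mirrors the check in the hunk: strictly above threshold / 100 counts as an outlier. */
  static boolean isOutlier(double failureRate, int thresholdPercent) {
    double maxFailureRate = ((double) thresholdPercent) / 100;
    return failureRate > maxFailureRate;
  }

  /** Hypothetical message layout with the dynamically calculated rate appended. */
  static String describe(String address, double failureRate) {
    // The rate is converted up front so the output does not depend on the default locale.
    return MessageFormat.format(
        "FailurePercentage algorithm detected outlier: {0}, failureRate={1}",
        address, String.valueOf(failureRate));
  }

  public static void main(String[] args) {
    double rate = failureRate(1, 3); // 1 success and 3 failures in the interval
    if (isOutlier(rate, 50)) {
      System.out.println(describe("10.0.0.1:50051", rate));
    }
  }
}
```

Logging the rate next to the address lets a reader of the log compare it directly with the configured threshold.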

@s-matyukevich
Contributor Author

Right now the CI is failing with javadoc: error - Error fetching URL: https://developers.google.com/protocol-buffers/docs/reference/java/, which I don't think is related to my change. I rebased on master and got the same error.

@s-matyukevich s-matyukevich force-pushed the s-matyukevich/add-outlierdetection-logger branch from b360b89 to 3f2638c Compare February 9, 2023 20:45
Contributor

@temawi temawi left a comment

Thank you!

@temawi temawi added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Feb 9, 2023
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Feb 9, 2023
@ejona86 ejona86 added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Feb 14, 2023
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Feb 14, 2023
@temawi temawi merged commit 67d6600 into grpc:master Feb 14, 2023
@temawi
Contributor

temawi commented Feb 14, 2023

> Right now the CI is failing with javadoc: error - Error fetching URL: https://developers.google.com/protocol-buffers/docs/reference/java/, which I don't think is related to my change. I rebased on master and got the same error.

@ejona pushed an empty commit to have the PR pick up the fix to the build from master. This is merged now, thanks for putting this together.
