Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: Add Outlier Detection configuration and CDS handling #5299

Merged
merged 6 commits into from
May 9, 2022

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Apr 6, 2022

This PR adds Outlier Detection configuration, with a placeholder OutlierDetectionBalancer that will be implemented in the future, and also the CDS handling of this (conversion from xdsclient representation of Outlier Detection data -> LB Configuration) as described in A50.
The next steps involve the Cluster Resolver handling of this configuration (Outlier Detection as top level configuration for the priority, current cluster_impl LB Configuration as child of this configuration). After Cluster Resolver handling comes the actual Outlier Detection LB Policy implementation. We won't be able to merge the Cluster Resolver changes until the Outlier Detection LB Policy is merged.

RELEASE NOTES: None

@zasweq zasweq requested a review from menghanl April 6, 2022 20:42
@zasweq zasweq added the Type: Feature New features or improvements in behavior label Apr 6, 2022
@zasweq zasweq added this to the 1.47 Release milestone Apr 6, 2022
// when an outlier status is detected through success rate statistics. This
// setting can be used to disable ejection or to ramp it up slowly. Defaults
// to 100.
EnforcingSuccessRate uint32 `json:"enforcingSuccessRate,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field name in the gRFC is enforcement_percentage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, moving target :/ from when I first implemented. Switched. (I will double check again)

// ejected when an outlier status is detected through failure percentage
// statistics. This setting can be used to disable ejection or to ramp it up
// slowly. Defaults to 0.
EnforcingFailurePercentage uint32 `json:"enforcingFailurePercentage,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also enforcement_percentage in the gRFC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

// algorithm. If set, failure rate ejections will be performed.
FailurePercentageEjection *FailurePercentageEjection `json:"failurePercentageEjection,omitempty"`
// ChildPolicy is the config for the child policy.
ChildPolicy/*[] why is this repeated in design?*/ *internalserviceconfig.BalancerConfig `json:"childPolicy,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the common practice to specify a list, and we will use the first valid one.
The BalancerConfig type you use here handles the list, so this field doesn't need to be a go slice: https://github.com/grpc/grpc-go/blob/master/internal/serviceconfig/serviceconfig.go#L65

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, noted. I remember Doug mentioning that now. Why is it designed as such?

@@ -224,11 +225,14 @@ func makeNewSubConn(ctx context.Context, edsCC balancer.ClientConn, parentCC *te
// the address attributes added as part of the intercepted NewSubConn() method
// indicate the use of fallback credentials.
func (s) TestSecurityConfigWithoutXDSCreds(t *testing.T) {
oldOutlierDetection := envconfig.XDSOutlierDetection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved to the setup functions? Like setupWithWatch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why do you need to make the new env var true for all the tests?
I was expecting one set of new tests that set this to true, but the others are untouched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair enough. I added it to everything to have the no-op Outlier Detection LB config be generated. I'll leave the env var off for all these tests, and then send nil LB Config.

// Name is the name of the outlier detection balancer.
const Name = "outlier_detection_experimental"

func init() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you probably don't need this balancer (or register()) for this PR.
Only the config type should be good enough (and you will need to move that test function to a config_test.go file maybe).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Deleted.

@menghanl menghanl assigned zasweq and unassigned menghanl Apr 7, 2022
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments :D!

// when an outlier status is detected through success rate statistics. This
// setting can be used to disable ejection or to ramp it up slowly. Defaults
// to 100.
EnforcingSuccessRate uint32 `json:"enforcingSuccessRate,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, moving target :/ from when I first implemented. Switched. (I will double check again)

// ejected when an outlier status is detected through failure percentage
// statistics. This setting can be used to disable ejection or to ramp it up
// slowly. Defaults to 0.
EnforcingFailurePercentage uint32 `json:"enforcingFailurePercentage,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

// algorithm. If set, failure rate ejections will be performed.
FailurePercentageEjection *FailurePercentageEjection `json:"failurePercentageEjection,omitempty"`
// ChildPolicy is the config for the child policy.
ChildPolicy/*[] why is this repeated in design?*/ *internalserviceconfig.BalancerConfig `json:"childPolicy,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, noted. I remember Doug mentioning that now. Why is it designed as such?

@@ -224,11 +225,14 @@ func makeNewSubConn(ctx context.Context, edsCC balancer.ClientConn, parentCC *te
// the address attributes added as part of the intercepted NewSubConn() method
// indicate the use of fallback credentials.
func (s) TestSecurityConfigWithoutXDSCreds(t *testing.T) {
oldOutlierDetection := envconfig.XDSOutlierDetection
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair enough. I added it to everything to have the no-op Outlier Detection LB config be generated. I'll leave the env var off for all these tests, and then send nil LB Config.

// Name is the name of the outlier detection balancer.
const Name = "outlier_detection_experimental"

func init() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Deleted.

Comment on lines +276 to +283
if od == nil {
// "If the outlier_detection field is not set in the Cluster message, a
// "no-op" outlier_detection config will be generated, with interval set
// to the maximum possible value and all other fields unset." - A50
return &outlierdetection.LBConfig{
Interval: 1<<63 - 1,
}
}
Copy link
Contributor

@menghanl menghanl Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we will always have a outlier detection balancer, even when the xDS response doesn't specify the config? And the outlier detection balancer will always keep the counts and everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IIUC. (another reason I asked about increasing complexity of system on Monday). The no-op balancer takes away the functionality of the Outlier Detection balancer, and simply acts as a forwarder.

Copy link
Contributor

@menghanl menghanl Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The no-op balancer

But is it a no-op balancer? Are you going to treat Interval:maxint differently?
Otherwise it's going to do all the counting, just that it will never use the numbers to eject any connections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I think I'll treat Interval:maxint the same, almost logically a no-op, but not quite. WDYT?

Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments. LGTM otherwise

@@ -654,15 +700,17 @@ func (s) TestClose(t *testing.T) {
// xdsClient, and makes sure that the CDS balancer registers a watch on the
// provided xdsClient.
xdsC, cdsB, edsB, _, cancel := setupWithWatch(t)
defer cancel()
defer func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 848 to 852
oldOutlierDetection := envconfig.XDSOutlierDetection
envconfig.XDSOutlierDetection = true
defer func() {
envconfig.XDSOutlierDetection = oldOutlierDetection
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to set env var for this test, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@menghanl menghanl assigned zasweq and unassigned menghanl May 9, 2022
@zasweq
Copy link
Contributor Author

zasweq commented May 9, 2022

Thanks for the comments :D!

@zasweq zasweq assigned menghanl and unassigned zasweq May 9, 2022
@menghanl menghanl assigned zasweq and unassigned menghanl May 9, 2022
@zasweq zasweq merged commit 462d867 into grpc:master May 9, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants