xds: Add support for Custom LB Policies #6224

Merged
merged 11 commits into from May 9, 2023

@zasweq zasweq commented Apr 26, 2023

This PR adds support for Custom LB Policies, configured through xDS. This implements the full functionality defined in https://github.com/grpc/proposal/blob/master/A52-xds-custom-lb-policies.md.

Summary of changes:

  • Delete the old LBPolicy field and switch to the new one throughout the xDS Client -> cds_balancer -> cluster_resolver flow.
  • In the cluster resolver, send this config downward as-is instead of branching on ring hash vs. round robin. Consolidate the logic that builds the addresses passed down, which previously differed between round robin and ring hash: locality weight is now added as an address attribute, the endpoint weight (ew) is always set to lw * ew (where lw is the locality weight), and the full hierarchy path is always set.
  • Complete the implementation of the wrr_locality balancer, whose only function is to prepare the weighted_target configuration for its child. Synchronization is trivial here, since it is not a ClientConn and balancer.Balancer operations are guaranteed to be called synchronously. The child balancer is built at build time, since it is never expected to change, and wrr_locality is simply a wrapper around the child balancer.
  • Force a state update (which will be Transient Failure) from the Weighted Target balancer when it is passed a config with no targets. This fixes a bug where Transient Failure was not reported in this case and failover did not happen correctly.
  • Improve the String() method on resolver.Address. It now prints only the non-deprecated fields and supports printing Attributes (to be printed, the key or value of a given attribute has to implement the fmt.Stringer interface). This was added to make failing cluster_resolver tests easier to debug.
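As a rough illustration of the address handling described in the second bullet, the sketch below flattens localities into a list of addresses, records the locality weight on each address, and sets the final weight to lw * ew. All type and function names here (endpoint, locality, weightedAddr, flatten) are hypothetical stand-ins and not the types used in grpc-go, and treating an unset endpoint weight as 1 is an assumption made for the example.

```go
package main

import "fmt"

// endpoint is a hypothetical stand-in for an endpoint received in EDS.
type endpoint struct {
	addr   string
	weight uint32 // endpoint weight (ew); 0 means "unset" in this sketch
}

// locality is a hypothetical stand-in for a locality and its endpoints.
type locality struct {
	name      string
	weight    uint32 // locality weight (lw)
	endpoints []endpoint
}

// weightedAddr is what gets passed down: the address, the locality it
// belongs to (standing in for the hierarchy path), the locality weight
// kept as an attribute, and the combined weight.
type weightedAddr struct {
	addr           string
	locality       string
	localityWeight uint32
	weight         uint32
}

// flatten applies the same rule to every address regardless of the child
// policy: weight = lw * ew.
func flatten(localities []locality) []weightedAddr {
	var out []weightedAddr
	for _, l := range localities {
		for _, e := range l.endpoints {
			ew := e.weight
			if ew == 0 {
				ew = 1 // assumption for this sketch: unset weight counts as 1
			}
			out = append(out, weightedAddr{
				addr:           e.addr,
				locality:       l.name,
				localityWeight: l.weight,
				weight:         l.weight * ew,
			})
		}
	}
	return out
}

func main() {
	addrs := flatten([]locality{
		{name: "locality-1", weight: 2, endpoints: []endpoint{{addr: "10.0.0.1:80", weight: 3}, {addr: "10.0.0.2:80"}}},
		{name: "locality-2", weight: 1, endpoints: []endpoint{{addr: "10.0.0.3:80", weight: 5}}},
	})
	for _, a := range addrs {
		fmt.Printf("%s locality=%s lw=%d weight=%d\n", a.addr, a.locality, a.localityWeight, a.weight)
	}
}
```

Keeping the locality weight on each address is what lets a wrapper such as wrr_locality rebuild per-locality weights later without a separate code path per child policy.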

RELEASE NOTES:

  • xds: Add support for Custom LB Policies as per gRFC A52

@easwars easwars left a comment

Small pass. Figured that I'm not going to be able to get to everything in one pass. So, this is going to involve a bit of back and forth. Sorry about that.

@easwars easwars assigned zasweq and unassigned easwars May 4, 2023
@easwars easwars modified the milestones: 1.55 Release, 1.56 Release May 4, 2023
@zasweq zasweq assigned easwars and unassigned zasweq May 4, 2023
@easwars easwars assigned zasweq and unassigned easwars May 5, 2023
zasweq commented May 5, 2023

Got to all the comments except trailing comma. I'll get to that after doing a pass on Doug's PR. Should be ready for another pass though :D.

@zasweq zasweq assigned easwars and unassigned zasweq May 5, 2023
@easwars easwars assigned zasweq and unassigned easwars May 5, 2023
@zasweq zasweq assigned easwars and unassigned zasweq May 8, 2023
@easwars easwars assigned zasweq and unassigned easwars May 8, 2023
"fmt"
"testing"

v3 "github.com/cncf/xds/go/xds/type/v3"
Contributor

This needs to match the string used in other imports (and specified in the copybara config).

Contributor Author

Done.

// be passed 5 ports, and the first two ports will be put in the first locality,
// and the last three will be put in the second locality. It also configures the
// proto message passed in as the Locality + Endpoint picking policy in CDS.
func clientResourcesNewFieldSpecifiedAndPortsInMultipleLocalities2(params e2e.ResourceParams, ports []uint32, m proto.Message) e2e.UpdateOptions {
Contributor

If we do enhance EndpointOptions to contain LocalityOptions, we wouldn't need this helper function. This can be written as part of the test, so that the reader will clearly be able to see which ports belong to which localities.

Contributor Author

Deleted this helper. Thanks.

Contributor Author

@zasweq zasweq May 8, 2023

Actually, I think you can delete the endpoint flow/helpers to build the EDS configuration, but this overall flow is much cleaner with a documented function (about the 5 ports passed in, proto.Message that gets plumbed as the Locality + Endpoint picking policy in CDS).

Contributor

I still think getting rid of the helper would be cleaner since the reader of the test will know exactly what is happening by reading the body of the test instead of having to navigate to this helper, and read its comment and then figure out what is happening.

Contributor Author

Fine. I don't agree, but switched.


managementServer, nodeID, _, r, cleanup := e2e.SetupManagementServer(t, e2e.ManagementServerOptions{})
defer cleanup()
backend1 := stubserver.StartTestService(t, nil)
Contributor

For loop instead?

Contributor Author

I prefer it this way because it keeps a reference to each backend, so I can grab backend.Address and also backend.Port, which I pass to the helper you commented on above. If you feel strongly about this, I'm willing to change it.

Contributor

You can still hold the reference to the address and the port in a slice. But if you don't want to do it, then that's fine. But do add a new line after this block and before the test table starts.

Contributor Author

Yeah, I forgot to mention that it would require creating two length-5 slices, which I don't like.
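For reference, the loop-plus-slices alternative raised in this thread could look roughly like the sketch below. The backend type and startBackend function are hypothetical placeholders for the real test server helper (they only fabricate an address and port); the point is just how the two slices mentioned above would be filled.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for a started test backend; only the
// two fields discussed in the thread are modeled.
type backend struct {
	Address string
	Port    uint32
}

// startBackend is a hypothetical placeholder that fabricates an address
// instead of starting a real server.
func startBackend(i int) backend {
	port := uint32(50051 + i)
	return backend{Address: fmt.Sprintf("localhost:%d", port), Port: port}
}

// startBackends starts n backends in a loop and keeps their addresses and
// ports in two slices for later use in the test.
func startBackends(n int) (addrs []string, ports []uint32) {
	for i := 0; i < n; i++ {
		b := startBackend(i)
		addrs = append(addrs, b.Address)
		ports = append(ports, b.Port)
	}
	return addrs, ports
}

func main() {
	addrs, ports := startBackends(5)
	fmt.Println(addrs)
	fmt.Println(ports)
}
```

The trade-off is the one debated above: the loop is shorter, while individually named backends avoid the two extra slices.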

@zasweq zasweq assigned easwars and unassigned zasweq May 8, 2023
xds/internal/balancer/wrrlocality/balancer.go
Comment on lines 95 to 98
return nil, fmt.Errorf("xds: invalid LBConfig for wrrlocality: %s, error: %v", string(s), err)
}
if lbCfg == nil || lbCfg.ChildPolicy == nil {
- return nil, errors.New("xds: invalidw LBConfig for wrrlocality: child policy field must be set")
+ return nil, errors.New("xds: invalid LBConfig for wrrlocality: child policy field must be set")
Contributor

Do we want the xds prefix here?

Contributor Author

Changed to the same prefix as the errors returned from UpdateClientConnState: xds_wrr_locality.

}
ai, ok := getAddrInfo(addr)
if !ok {
return fmt.Errorf("xds_wrr_locality: addr: %v is misisng locality weight information", addr)
Contributor

Could this be the following instead?
return fmt.Errorf("xds_wrr_locality: misisng locality weight information in address %q", addr)

Contributor Author

You kept my missing typo in your suggestion :). Switched, without the typo :).
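To show what the suggested %q verb does to the message, here is a minimal sketch. The address type is a hypothetical stand-in with a String() method, not resolver.Address itself; fmt calls String() for %q and then quotes the result, which makes the boundaries of the address obvious in logs.

```go
package main

import "fmt"

// address is a hypothetical stand-in for an address type that implements
// fmt.Stringer.
type address struct {
	Addr string
}

func (a address) String() string { return a.Addr }

// missingWeightErr builds the error in the shape suggested in the review,
// with the typo fixed.
func missingWeightErr(a address) error {
	return fmt.Errorf("xds_wrr_locality: missing locality weight information in address %q", a)
}

func main() {
	fmt.Println(missingWeightErr(address{Addr: "10.0.0.1:8080"}))
}
```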

wtCfgJSON, err := json.Marshal(wtCfg)
if err != nil {
// Shouldn't happen.
return fmt.Errorf("xds_wrr_locality: error marshalling prepared wtCfg: %v", wtCfg)
Contributor

Please make this error message more readable:
return fmt.Errorf("xds_wrr_locality: error marshalling prepared config: %v", wtCfg)

Contributor

And I remember having a big discussion between marshalling vs marshaling with Doug :). Maybe you should rekindle it.

Contributor Author

Ummmmmmm yeah I thought it was marshaling. What's the difference? Switched to what you had.

}
var sc serviceconfig.LoadBalancingConfig
if sc, err = b.childParser.ParseConfig(wtCfgJSON); err != nil {
return fmt.Errorf("xds_wrr_locality: config generated %v by wrr_locality_experimental is invalid: %v", wtCfgJSON, err)
Contributor

There is wrr_locality twice in this error string.

Contributor Author

@zasweq zasweq May 9, 2023

Deleted "by wrr_locality_experimental."

@easwars easwars left a comment

Some minor comments remaining. LGTM.

"google.golang.org/grpc/xds/internal"
)

var (
Contributor

const?

Contributor Author

Switched.

@easwars easwars assigned zasweq and unassigned easwars May 9, 2023
@zasweq zasweq merged commit 5e58734 into grpc:master May 9, 2023
11 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2023