
Filter endpoint in LB based on metadata #1279

Closed
htuch opened this issue Jul 18, 2017 · 14 comments

@htuch
Member

htuch commented Jul 18, 2017

As decided in https://github.com/lyft/envoy-api/issues/81, we added metadata support to EDS and matching in routes to the API in envoyproxy/data-plane-api@abcd5ba. This issue will track the implementation work around endpoint subsetting in the LB stage.

@mccv @9len @rshriram

@htuch htuch added the api/v2 label Jul 18, 2017
@9len

9len commented Jul 18, 2017

cc @zuercher too

@htuch
Member Author

htuch commented Jul 21, 2017

@9len @zuercher Will you folks be able to provide some implementation bandwidth on this feature?

@9len

9len commented Jul 21, 2017

Hi @htuch; we're focused on #128 right now, as a means to make development easier for us, but once we land (or time-out trying to land) that, @zuercher can take a look.

@htuch
Member Author

htuch commented Jul 21, 2017

Sure. It's going to be a few weeks until we've fully plumbed the protos into the CDS/EDS/RDS aspects of Envoy; I was just trying to get a provisional notion of who might own this one. Will assign to @zuercher for now.

@9len

9len commented Jul 21, 2017

Our general focus will be on implementing the xDS APIs with our stuff, but this is definitely something we'll need, so we can sign up for it if nobody else needs it sooner.

@rshriram
Member

rshriram commented Sep 3, 2017

Would also be nice to have fallback support, i.e., if no endpoints with the given labels are found, load balance among all endpoints. It would also be useful to be able to derive the metadata from request headers.

@zuercher
Member

zuercher commented Sep 7, 2017

I'm starting to look at this.

I think a reasonable first step is to plumb the LbEndpoint Metadata from EDS into Upstream::HostDescription/Impl. Then it should be possible to use the data from inside the load balancers.

@zuercher
Member

Ready to take the next steps. This is a proposal for some further modifications (including a change to data-plane-api).

It seems reasonable to me not to have to maintain alternate load balancer implementations (there are currently 5) that differ only by being configurable with subsets. So I propose to:

  1. Write a subsetting LB that wraps multiple underlying load
    balancers, one for each configured subset of hosts. A default subset
    may be configured as empty, all endpoints, or a specific
    subset of endpoints. Configuring subsets is optional and when unspecified the
    result is load balancers configured exactly as they are now. In the
    subsetting case, the underlying LBs all use the same lb_policy, health
    checks, and so forth as if subsetting were not enabled.

Add subset configuration to the Cluster message in cds.proto:

// Optionally divide the endpoints in this cluster into subsets defined by endpoint
// metadata.
message LbSubsetConfig {
  // The behavior used when no endpoint subset matches the LoadBalancerContext.
  // The options are no_fallback, any_endpoint, or default_subset. The value defaults
  // to no_fallback. If no_fallback is selected, a result equivalent to no healthy
  // hosts is reported. If any_endpoint is selected, any cluster endpoint may be
  // returned (subject to policy, health checks, etc). If default_subset is selected,
  // the load balancing is performed over the hosts matching the values in
  // default_subset.
  enum LbSubsetFallbackPolicy {
    NO_FALLBACK = 0;
    ANY_ENDPOINT = 1;
    DEFAULT_SUBSET = 2;
  }
  LbSubsetFallbackPolicy fallback_policy = 1;

  // Specifies the default subset used during fallback if fallback_policy is
  // default_subset. Each field in default_subset is compared to the matching
  // LbEndpoint.Metadata under the "envoy.lb" namespace. It is valid for no hosts to
  // match, in which case the behavior is the same as a fallback_policy of
  // no_fallback.
  google.protobuf.Struct default_subset = 2;

  // Specifications for subsets. For each entry, LbEndpoint.Metadata's "envoy.lb" 
  // namespace is traversed and a subset is created for each unique combination of 
  // key and value. For example:
  // { "subset_keys": [
  //     { "keys": [ "version" ] },
  //     { "keys": [ "stage", "hardware_type" ] }
  // ]}
  // Subsets may overlap. In the case of overlapping subsets, the first matching
  // subset is selected.
  message LbSubsetKeys {
    repeated string keys = 1;
  }
  repeated LbSubsetKeys subset_keys = 3;
}
LbSubsetConfig lb_subset_config = 22;
  2. Modify the LoadBalancerContext interface to provide a
    ProtobufWkt::Struct that represents the metadata for the desired
    subset. The subsetting LB will look for a subset that matches the
    given metadata and then delegate to the corresponding underlying LB to
    actually select a host.

  3. Wire up the Router to provide data from RouteAction.metadata_match
    and weighted_clusters in the LoadBalancerContext.

@htuch
Member Author

htuch commented Sep 14, 2017

We've discussed making LBs pluggable like filters before, @rshriram. We could add an opaque config for these so we don't need to keep cramming stuff into CDS.

@zuercher
Member

I'm not entirely sure I follow. Is this just adding

google.protobuf.Struct lb_options = 22;

to the Cluster message and placing the subset config stuff in a separate file?

@mattklein123
Member

@htuch I agree we should allow for pluggable LBs like we are now allowing for filters, stats, etc., but in this case I see no reason not to add the subsetting support directly, mainly because, as @zuercher points out, it can be done in a way that does not change existing behavior.

@zuercher ^ sounds reasonable/useful to me. My only concern is actually perf. It's not great if we have to compare the struct from the route to the struct from the subset via reflection during LB. I think we are going to have to pre-parse somehow, which we should be able to do when the route loads as well as when the LB refreshes. I'm not exactly sure how this would work though. @htuch any thoughts on how to best do this?

@zuercher
Member

zuercher commented Sep 15, 2017

I expect that the structs will usually be flat (e.g., the field values will be string/bool/number and not nested structs or lists). I'm fairly new to protobufs, but it looks like it's possible to treat the struct as map of field names to values where each value has a kind that determines how to extract the typed value. Nested lists and structs would be more expensive since you have to traverse the hierarchy.

That said, I think it makes sense to compute hashes to avoid doing many comparisons.

And just to clarify, I had been thinking that there might be more fields in the structs from the route than in the subset struct in the LB. For example, the route might specify {"version":"123", "stage":"prod"} where the subset struct only includes {"version":"123"}. In this case I'd expect the subset to be used. So the hashing would be one level down and there'd likely be a new type used to represent the route and subset structs to handle this.

@htuch
Member Author

htuch commented Sep 15, 2017

Yeah, I think we're going to have to invent our own thing here, protobuf's internal representation of Struct is verbose. I agree that flat hashing is what we want for the common case. One idea is that we could build an index hashtable for each LB subset query we encounter as well, so we only pay this penalty once.

@zuercher
Member

Ok. I'll put up a review for the data-plane-api change and then get started on the actual implementation.

htuch pushed a commit to envoyproxy/data-plane-api that referenced this issue Sep 28, 2017
Provides a configuration point for a load balancer that can divide endpoints into subsets addressable by route metadata.

See discussion in envoyproxy/envoy#1279.

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>
htuch pushed a commit that referenced this issue Oct 24, 2017
Fixes #1279, and is the final part of splitting #1735.

Provides the actual load balancer for subsets with basic stats for the number of subsets active, created, removed, selected (used for host selection), and fallbacks (used fallback path).

Signed-off-by: Stephan Zuercher <stephan@turbinelabs.io>