-
Notifications
You must be signed in to change notification settings - Fork 91
Routing proto revision #105
Conversation
Responded to feedback from @rshriram:
|
// ClusterIdentifier, that is used to uniquely identify a version of the | ||
// upstream service. | ||
// Destination declares policies that determine how to handle traffic for a | ||
// destination service (load balancing policies, failure recovery policies such |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kuat, thinking more about this, the problem with the current t destination is that its equivalent to envoy's virtual host. All routes under that host get the same policy. This might not be okay in several cases, especially in cases of retries. The retry policy is very much dependent on the API being called. Payments/checkcredit can be retried for any error code but payments/makepayment can't be retried if destination returns a 500 or example. That's why Envoy has retry policy per route entry. (Timeout as well). In your previous version, iirc destination was equivalent to the upstream cluster with tag based qualifiers and destination name. Right? Why was this removed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brought it back. I don't like the inconsistency that some policies apply per cluster while others apply per virtual host. Seems like Envoy-specific problem.
model/proxy/alphav1/config/cfg.proto
Outdated
CircuitBreaker circuit_breaker = 5; | ||
|
||
// L7 fault injection policy applies to L7 traffic | ||
HTTPFaultInjection http_fault = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of here? Either a http_fault or l4_fault. Not both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't apply L7 fault injection to L4 service ports and vice versa?
Service must declare all its ports and the protocol on each port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that the case?
And yes, we don't do L7 fault injection on L4. or vice versa. Envoy or nginx clearly separate stream connections (UDP/TCP) from HTTP and handle HTTP separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need oneof constraint since the fault injection policies don't apply to the same traffic. If you have L4 service, you write L4 fault policy; if you have both L4 and L7 ports, you write both fault policies?
|
||
// Set of HTTP match conditions based on the request metadata | ||
HTTPMatchAttributes http = 6; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also one of here? As per yesterday's discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here. If my service declares L4 and L7 ports, I shouldn't be writing two rules for them since destinations are likely to be the same.
model/proxy/alphav1/config/cfg.proto
Outdated
|
||
// Precedence is used to disambiguate the order of application of rules | ||
// for the same destination service. A higher number takes priority. | ||
int32 precedence = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice. The rules format is evolving into the existing amalgam8 constructs @frankb. We can't escape precedence.
I would also add a version field to the rule. So that it gives the end user the ability to rollback to a stable rule if they mess up something. this would certainly mean that manager needs to provide a rollback API. (Even if you don't want to do this now, can we just add this comment so that we can track this issue. Matter of fact, a CDN company I spoke to actually wanted this capability).
I would add twl more lines here.
E.g. for http, routes under a virtual host need to be ordered. Here precedence is priority.
Secondly, depending on the platform, the storage for the rules format may not support atomic replace (and concurrently will be a bigger mess). The easiest way to tackle this is to ask the user to create a new rule with higher priority and delete the old rule with lower priority. Priority essentially here becomes a rule version of some sort. E.g. send all traffic to helloworld/hello to v1. Then next rule is split traffic to helloworld/hello between v1 and v2. Both rules apply to same service and same route and virtual host.
To provide atomic update semantics to the user, the platform must support a transactional storage and patch/edit semantics as well. K8s may have etcd but unsure if mesos has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no transactions in k8s, there's only optimistic concurrency. Hence, we should be dealing with revision at a higher-level than this proto. Every proto has its etcd revision number implicitly. That way we can provide optimistic apply/edit like kubctl. I don't think Manager can provide strong guarantees about rollouts though, given how k8s is structured.
The priority can be used for this too, and there's nothing wrong with that. I'll add a comment.
// (for an unhealthy upstream cluster) number of consecutive requests that | ||
// should succeed before the upstream cluster is marked healthy. | ||
// (for an unhealthy upstream) number of consecutive requests that | ||
// should succeed before the upstream is marked healthy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a TODO to add all Envoy CB features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
model/proxy/alphav1/config/cfg.proto
Outdated
// Service instances - set of pods/VMs/containers where the service is running. | ||
// The members share one or more common attributes. For e.g., all pods in a | ||
// group could share a common set of labels, or be running the same version of | ||
// the application binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems confusing. I would say service instances share only one thing - they are all implementing some variation of the same service. I also think that before defining the concept of Service instances, we need to clearly define the term "Service" itself. Specifically, explain that a service is a name (string) of some functionality (e.g., ServiceA) representing part of an application, and point out that clients use this name to refer to the functionality being called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, deleted the old phrasing.
message Destination { | ||
// Service for which the service version is defined. | ||
string destination = 1; | ||
oneof route_rule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't "service" be a better that "destination" for this field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"service" is a keyword in protos. In the routing context, service name is the routing destination that the rules modify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Maybe it's not a big deal, but it just seems strange to have a field name "destination" inside the "Destination" structure. Given that service is a reserved word, I understand the problem, although I would prefer something like "service_name", myself, but that's just a personal nit, so feel free to ignore.
// Destination declares policies that determine how to handle traffic for a | ||
// destination service (load balancing policies, failure recovery policies such | ||
// as timeouts, retries, circuit breakers, etc). Policies are applicable per | ||
// individual service versions. It is an error to define multiple policies for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the Destination structure, it looks more like Policies are applicable to all versions of a service. How are Policies applicable per individual service version, given that the Destination structure only identifies the service and does not include any specific version info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. I see this has been fixed already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
model/proxy/alphav1/config/cfg.proto
Outdated
// itself from the evolution of dependent services. | ||
// | ||
// Service is a unit of an application with a unique name that other services | ||
// can refer to the functionality being called. Service instances are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/can/use to/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
model/proxy/alphav1/config/cfg.proto
Outdated
// | ||
// Service is a unit of an application with a unique name that other services | ||
// can refer to the functionality being called. Service instances are | ||
// pods/VMs/containers that comprise the service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/comprise/implement/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
bb38f2c
to
af30b78
Compare
I have a question about ingress: are we planning to represent ingress resource as a routing rule or should we keep them separate? |
model/proxy/alphav1/config/cfg.proto
Outdated
// | ||
// N.B. The map is used instead of pstruct due to lack of serialization support | ||
// in golang protobuf library (see https://github.com/golang/protobuf/pull/208) | ||
map<string, string> version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version should be = 2, and the following fields bumped by 1
model/proxy/alphav1/config/cfg.proto
Outdated
// service versions for a given service (see the discussion on versions above). | ||
// The proxy would choose the version based on various routing rules. | ||
// | ||
// Applications address only the destination service using without knowledge of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/using/d
An attempt to capture ingress routing as well:
|
Made up examples:
|
Changes:
|
Goal: