This repository has been archived by the owner on Jun 14, 2018. It is now read-only.
The manager code currently does not filter rules based on the source field. This is a bug as per the routing rule specifications (and will break the demo as well).
In order to do source-based routing, we need to know the service to which the pod belongs, so that we can filter out the rules that don't apply to the pod and generate the right envoy config (and also use that for service_cluster in the envoy command-line args).
Since pods can be late-bound to a service, and a pod can belong to multiple services, there is no way to derive the association automatically before generating the routing rules. There is a further complication: the pod need not belong to any service at all.
We cannot route by pod names because they do not correlate with the source service field in the route rules. Nor should we complicate the route rules further by asking users to specify pod names instead of the source field.
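To make the filtering step concrete, here is a minimal sketch, assuming a hypothetical rule shape ({"source": ..., "destination": ...}) and a set of service/version identities for the pod; none of these names come from the actual manager code.

```python
# Hypothetical sketch of the source-based filtering the manager should do.
# The rule dicts and "service/version" identifiers are illustrative only.

def filter_rules(rules, pod_services):
    """Keep rules whose source matches a service this pod belongs to.
    Rules with no source (or a wildcard) apply to every pod."""
    applicable = []
    for rule in rules:
        source = rule.get("source")
        if source is None or source == "*":
            applicable.append(rule)  # wildcard source: always applies
        elif source in pod_services:
            applicable.append(rule)  # pod belongs to the rule's source
    return applicable

rules = [
    {"source": "serviceA/v1", "destination": "serviceB/v1"},
    {"source": "serviceC/v1", "destination": "serviceB/v2"},
    {"destination": "serviceD"},  # no source field: applies everywhere
]
print(filter_rules(rules, {"serviceA/v1"}))
```

The hard part, as described above, is obtaining `pod_services` reliably in the first place.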
There are two solutions:
1. Delay generation of routing rules until the pod is bound to a service. This means that the proxy should not accept any traffic from the app until it has all the routing rules in place. Once we get the initial membership information, we also need to continuously re-evaluate the service(s) to which the pod belongs and regenerate rules and config as this relationship changes.
2. Ask the user to specify, via env vars or CLI args, the name of the k8s service to which the pod belongs.
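Option 2 could look something like the sketch below: the proxy agent reads the pod's service membership from a repeatable CLI flag, falling back to an env var. The flag name `--service` and the variable `SERVICE_NAME` are made-up placeholders, not an agreed-upon interface.

```python
# Sketch of option 2: the user explicitly declares the k8s service(s)
# the pod belongs to. Flag/env-var names are hypothetical.
import argparse
import os

def pod_services(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--service", action="append", default=[],
                        help="k8s service this pod belongs to (repeatable)")
    args = parser.parse_args(argv)
    # CLI flags take precedence; fall back to a comma-separated env var.
    if args.service:
        return set(args.service)
    env = os.environ.get("SERVICE_NAME", "")
    return {s for s in env.split(",") if s}

print(pod_services(["--service", "serviceA"]))
```

As noted in the discussion below, env vars have the drawback that changing them requires a pod restart.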
The latter option is the easiest to begin with, and is the mode in which we currently operate.
The former option will work only if the pod belongs to some service. If the pod does not belong to any service (i.e. a wildcard source, which means all routing rules apply), there is no way to determine this relationship automatically unless it is explicitly indicated by the user, which leads us back to option 2.
At the same time, imposing this automatic inference logic on the end user has correctness implications as well, because we cannot assume that a pod that does not belong to any service can route to all services. This might result in unexpected behavior that the user does not want.
A simple example to illustrate the issue:
User sets a routing rule: a (tag v1) goes to b (tag v1) only.
t=0: pod a-v1 is launched (serviceA is not bound yet). Envoy config at pod a-v1: empty.
t=1: pods b-v1 and b-v2 are launched and bound to serviceB. Envoy config at pod a-v1: route to b-v1/b-v2 for calls to serviceB. This is not an issue because pod a-v1 gets no traffic (hopefully).
t=2: serviceA is launched. The information has not propagated to pod a-v1, or the proxy agent in pod a-v1 has not picked up these changes yet. Envoy config at pod a-v1: route to b-v1/b-v2 for calls to serviceB. This is now a problem: traffic coming to serviceA enters pod a-v1, and a-v1 will route to either b-v1 or b-v2 (the user wants traffic to go only to b-v1).
t=3: the proxy agent at pod a-v1 picks up the change (that it is bound to serviceA). It recomputes the routing rules and sets up envoy such that it only routes to b-v1 (the route the user expects).
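The timeline above can be replayed as a toy sketch: the proxy recomputes its envoy routes from whatever service bindings it currently knows about, and between t=2 and t=3 serviceA traffic reaches pod a-v1 while its config still routes to both b versions. The names come from the example; the data structures are illustrative only.

```python
# Toy replay of the race in the timeline above. The rule and instance
# shapes are made up for illustration, not actual manager schemas.

RULE = {"source": "serviceA/v1", "destination": "serviceB", "to_tags": ["v1"]}
B_INSTANCES = {"v1": "b-v1", "v2": "b-v2"}

def envoy_routes(known_services_of_a_v1):
    # Until a-v1 knows it belongs to serviceA, the rule's source field
    # cannot match, so it defaults to routing to every version of serviceB.
    if "serviceA" in known_services_of_a_v1:
        return [B_INSTANCES[t] for t in RULE["to_tags"]]
    return sorted(B_INSTANCES.values())

print(envoy_routes(set()))         # t=0..t=2: binding not yet observed
print(envoy_routes({"serviceA"}))  # t=3: agent has picked up the binding
```

The transient window is exactly the stretch where the first call returns both b versions while serviceA traffic is already flowing.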
One could apply similar examples to the mixer logic as well (e.g., I could set an ACL saying serviceA should not talk to serviceB except when cookie is user=shriram).
It is certainly possible to argue that this is simply Kubernetes behavior, but it is unreasonable to assume that an average end user will be able to comprehend all this complexity related to concurrency, eventual consistency, etc.
Bottom line: telling the user that initial route settings (default routes) will be eventually consistent results in a lot of confusion on the end user's part.
FWIW, in the previous version of amalgam8, we asked users to explicitly specify the service to which the pod/container belonged, as a conscious decision to avoid this complexity. The current k8s integration (like the manager) has the same problem described above.
A possible middle ground is to set cluster-level policies at the mixer such that if a pod does not belong to any service, no traffic goes out of it via its proxy. This would result in the mixer needing to know about routing rules as well (at least service versions). (We would also need to revert PR #116, which uses the pod name as the sourceClusterName for the proxy in order to generate the service graph using prometheus.)
I wouldn't call it a bug; we haven't implemented source-based filtering yet. I would rule against env variables, since those cannot be changed without a pod restart. You're folding the service definition into the pod definition, which is an OK practice, but not the common case.
Regarding eventual consistency - isn't that inherent to the distributed proxy mesh? All operations are asynchronous and we need feedback loops for good usability, where the remote agent reports back on the success or failure of the operation.
Regarding source-based filtering - is the basic problem how to enforce that "only service a v1 can talk to service b v1"? This is impossible with the current default rule "a can talk to b"; we would need to revoke the default rule to begin with.
Let's say you want to apply that rule at the client side by overriding the default rule. Each proxy instance carries a list of the instances it belongs to (including service versions). It can then match the source field against that list and apply the rule only if it matches. The missing piece is enumerating all service versions from the rules (since those are declared dynamically by the rules) so that we can attach the list of service versions to each proxy instance.
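The client-side matching described above could be sketched roughly as follows, assuming each proxy carries a list of (service, version) pairs it belongs to; the shapes here are assumptions, not the actual manager schema.

```python
# Sketch of client-side source matching: a proxy applies a rule only if
# the rule's source matches one of the instances it belongs to.
# (service, version) tuples are an illustrative encoding.

def rule_applies(rule_source, proxy_instances):
    """rule_source is (service, version); version None means any version."""
    service, version = rule_source
    return any(svc == service and (version is None or ver == version)
               for svc, ver in proxy_instances)

instances = [("serviceA", "v1")]
print(rule_applies(("serviceA", "v1"), instances))   # True: exact match
print(rule_applies(("serviceA", "v2"), instances))   # False: wrong version
print(rule_applies(("serviceA", None), instances))   # True: any version
```

The open question from the comment remains: enumerating the service versions themselves, since those exist only in the dynamically declared rules.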