fix(kuma-cp) TrafficRoute in multizone issue #1979
Conversation
Signed-off-by: Ilya Lobkov <ilya.lobkov@konghq.com>
pkg/xds/cache/cla/cache.go (outdated)

```diff
-func (c *Cache) GetCLA(ctx context.Context, meshName, meshHash, service string, apiVersion envoy_common.APIVersion) (proto.Message, error) {
-	key := fmt.Sprintf("%s:%s:%s:%s", apiVersion, meshName, service, meshHash)
+func (c *Cache) GetCLA(ctx context.Context, meshName, meshHash string, cluster envoy_common.Cluster, apiVersion envoy_common.APIVersion) (proto.Message, error) {
+	key := fmt.Sprintf("%s:%s:%s:%s", apiVersion, meshName, cluster.Name(), meshHash)
```
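As a rough illustration of the key scheme in the new signature, here is a minimal sketch. `claKey` is a hypothetical helper name for this example; the real code builds the key inline in `GetCLA`.

```go
package main

import "fmt"

// claKey mirrors the cache-key format from the diff above: the Envoy API
// version, the mesh, the cluster name, and a hash of the mesh state are
// joined so that any change to one of them yields a different cache entry.
func claKey(apiVersion, meshName, clusterName, meshHash string) string {
	return fmt.Sprintf("%s:%s:%s:%s", apiVersion, meshName, clusterName, meshHash)
}

func main() {
	fmt.Println(claKey("v3", "default", "backend", "abc123"))
}
```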
How would this work with the following case?
```yaml
apiVersion: kuma.io/v1alpha1
kind: TrafficRoute
mesh: default
metadata:
  name: route-all-default
spec:
  sources:
    - match:
        kuma.io/service: 'web'
  destinations:
    - match:
        kuma.io/service: 'backend'
  conf:
    split:
      - weight: 50
        destination:
          kuma.io/service: backend
          version: 1
      - weight: 50
        destination:
          kuma.io/service: backend
          version: 2
---
apiVersion: kuma.io/v1alpha1
kind: TrafficRoute
mesh: default
metadata:
  name: route-all-default
spec:
  sources:
    - match:
        kuma.io/service: 'other'
  destinations:
    - match:
        kuma.io/service: 'backend'
  conf:
    split:
      - weight: 50
        destination:
          kuma.io/service: backend
          version: 3
      - weight: 50
        destination:
          kuma.io/service: backend
          version: 4
```
If the cache is maintained for the whole mesh, wouldn't we generate `_0_`, `_1_` and have a mismatch?
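To make the concern concrete: if the cache key only carries the service name, both TrafficRoutes above map their splits for `backend` onto the same entry, while a key that includes the full tag set keeps them apart. This is a self-contained sketch; `tagsHash` is a hypothetical stand-in for the hashing the real code does, not Kuma's actual `Hash()` implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// tagsHash hashes a cluster's full subset tag set in a deterministic
// (sorted) order, so clusters that differ only in tags get distinct keys.
func tagsHash(tags map[string]string) uint32 {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New32a()
	for _, k := range keys {
		h.Write([]byte(k + "=" + tags[k] + ";"))
	}
	return h.Sum32()
}

func main() {
	v1 := map[string]string{"kuma.io/service": "backend", "version": "1"}
	v3 := map[string]string{"kuma.io/service": "backend", "version": "3"}
	nameKey := func(tags map[string]string) string { return tags["kuma.io/service"] }

	// Keyed by service name only, the two splits collide on one cache entry;
	// keyed by a hash of the full tag set, they stay distinct.
	fmt.Println("name-only keys collide:", nameKey(v1) == nameKey(v3))
	fmt.Println("tag-hash keys collide:", tagsHash(v1) == tagsHash(v3))
}
```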
Yes, you're right. I added the method `Hash()` to use in MeshHash (instead of `cluster.Name()`).
```go
	}
	return errors.Errorf("resulted split values %v and %v aren't within original values %v and %v",
		echo1Resp/float64(maxAttempts)*100, echo2Resp/float64(maxAttempts)*100, v1Weight, v2Weight)
}
```
What is the advantage of this algorithm versus simply sending 100 requests and seeing how they are distributed?
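The simpler approach the comment suggests could look like this sketch: send a fixed number of requests, tally the observed percentages, and compare them to the configured weights within a tolerance. `withinTolerance` and the tolerance value are hypothetical, chosen just for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether each instance's observed share of
// the traffic is within tolPct percentage points of its target weight.
func withinTolerance(observed map[string]int, total int, want map[string]float64, tolPct float64) bool {
	for inst, wantPct := range want {
		gotPct := float64(observed[inst]) / float64(total) * 100
		if math.Abs(gotPct-wantPct) > tolPct {
			return false
		}
	}
	return true
}

func main() {
	// e.g. 100 requests observed against a configured 80/20 split
	observed := map[string]int{"echo-v1": 78, "echo-v2": 22}
	fmt.Println(withinTolerance(observed, 100, map[string]float64{"echo-v1": 80, "echo-v2": 20}, 10))
}
```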
```go
for _, ne := range notExpected {
	Expect(stdout).ToNot(ContainSubstring(ne))
}
```
Is it safe to call `Expect` from inside `Eventually`? I was under the impression this is not a good idea. Will this allow `Eventually` to do the next try, or will it fail the test?
```go
kuma.io/service: echo-server_kuma-test_svc_8080
version: v4
`
Expect(NewClusterSetup().Install(YamlUniversal(trafficRoute)).Setup(global)).To(Succeed())
```
nit: `YamlUniversal(trafficRoute)(global)` is shorter
```go
	err := waitRatio("echo-server_kuma-test_svc_8080.mesh", v1Weight, v2Weight, "echo-v1", "echo-v2")
	Expect(err).ToNot(HaveOccurred())
})
```
I feel like `waitRatio` and `collectResponses` are a bit cryptic. The names do not suggest that these are assertions, and I think both of them contain somewhat complicated logic. Here's what I would do:

- Write a method `collectResponses := func(destination string) map[string]int` (instance to responses) that only sends 10 (can be 100 if needed) requests and collects the responses; don't do any assertions in it.
- In `should access all instances of the service` I would write: eventually (`collectResponses`, then assert in the test that there is 1, 2, 4 and not 3).
- In `should route 100 percent of the traffic to the different service` I would write: eventually (`collectResponses`, then assert that the responses are from one instance).
- In `should route split traffic between the versions with 20/80 ratio` I would write: eventually (`collectResponses`, then count whether the traffic is propagated properly).

I think it will also be more flexible for adding new test cases, and `collectResponses` could be reused between both tests.
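The suggested helper can be sketched as follows. `sendRequest` stands in for the real test client call, and the round-robin fake in `main` exists only so the sketch runs on its own; neither is part of the actual test suite.

```go
package main

import "fmt"

// collectResponses sends n requests and tallies which instance answered,
// with no assertions inside -- each test does its own checks on the map.
func collectResponses(sendRequest func() string, n int) map[string]int {
	counts := map[string]int{}
	for i := 0; i < n; i++ {
		counts[sendRequest()]++
	}
	return counts
}

func main() {
	i := 0
	fake := func() string { // deterministic round-robin fake backend
		i++
		if i%2 == 0 {
			return "echo-v2"
		}
		return "echo-v1"
	}
	fmt.Println(collectResponses(fake, 10))
}
```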
That's some next-level sorcery with those chained assertions 👀
Summary
In some specific cases, TrafficRoutes in multizone didn't work correctly. This happened because envoy.Cluster doesn't support hosts with the same IP/port but different metadata for subsets.

This PR changes the way TrafficRoutes are implemented: instead of ClusterSubsets, we now use a separate cluster for each TrafficRoute.Split.

This change affected metrics because previously the Service name and the Cluster name were the same, but now each Cluster has an autogenerated name `service_name-_num_`, where `num` is the index of the Split. In order to support existing Grafana dashboards, the Kuma data plane hijacks metrics and merges the metrics for clusters that belong to the same service.

Full changelog
Issues resolved
Fix #XXX
Documentation