Constant e2e failures against unreleased Kong with Kuma #5498

Closed · 1 task done
randmonkey opened this issue Jan 29, 2024 · 3 comments · Fixed by #5507
Labels: bug (Something isn't working)
Milestone: KIC v3.1.x

Comments
randmonkey (Contributor) commented Jan 29, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Recently, the daily e2e run against unreleased Kong has been failing in the two cases running with Kuma:
https://github.com/Kong/kubernetes-ingress-controller/actions/runs/7683416231/job/20938806328#logs

    kuma_test.go:36: waiting for route from Ingress to be operational at http://172.18.128.1/echo and forward traffic to all 3 backends
    kuma_test.go:36:
        Error Trace: /home/runner/work/kubernetes-ingress-controller/kubernetes-ingress-controller/test/e2e/helpers_test.go:488
                     /opt/hostedtoolcache/go/1.21.6/x64/src/runtime/asm_amd64.s:1650
        Error:       "map[With IP address 10.244.0.15.

                     :{} With IP address 10.244.0.17.

                     :{}]" should have 3 item(s), but has 2

Expected Behavior

The e2e tests pass.

Steps To Reproduce

Run the e2e tests `TestDeployAllInOnePostgresKuma` or `TestDeployAllInOneDBLessKuma`.
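
A minimal sketch of how these two cases might be run in isolation; the build tag, timeout, and required environment are assumptions, and the repository's Makefile targets remain the canonical entry point for the e2e suite:

go test -v -timeout 45m -tags e2e_tests \
  -run 'TestDeployAllInOnePostgresKuma|TestDeployAllInOneDBLessKuma' \
  ./test/e2e/...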

Kong Ingress Controller version

`latest` image (3.0.2).

Kubernetes version

All supported k8s versions in this run:
https://github.com/Kong/kubernetes-ingress-controller/actions/runs/7675532503

- 1.29.0
- 1.28.0
- 1.27.4
- 1.26.6
- 1.25.11

Anything else?

No response

@randmonkey randmonkey added the bug Something isn't working label Jan 29, 2024
tao12345666333 (Member) commented:

ref: #5496

@mflendrich mflendrich added this to the KIC v3.1.x milestone Jan 29, 2024
@mflendrich mflendrich assigned mflendrich and czeslavo and unassigned mflendrich Jan 29, 2024
czeslavo (Contributor) commented:

Copy-pasting @rainest's investigation notes from Slack for posterity:

All Kuma tests failed on a check that uses multiple replicas for the test service. It should distribute requests across all three replicas, but only distributed them across two. This needs more investigation, but I don't know how it would break. As far as I know it should just be a standard upstream with three targets and no special LB configuration.

Distribution is ostensibly entirely up to Kuma in that test: those services don't use separate targets, they use the service hostname, so that Kuma can intercept requests and forward them on to some upstream. It does look like this was some change to the gateway, however: we started seeing failures on the latest dev image as of the 22nd (https://github.com/Kong/kubernetes-ingress-controller/actions/workflows/e2e_nightly.yaml), and there haven't been any changes to the Kuma chart (so no new version of the mesh proxy) since the 10th. That suggests some change in how the gateway sends requests that prompts Kuma to distribute them unevenly.
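
A hedged way to confirm this service-hostname setup on a live cluster is to read the generated configuration from the Kong Admin API (the deployment name, Admin API port, and scheme below are assumptions about the all-in-one manifests):

kubectl -n kong port-forward deploy/proxy-kong 8444:8444 &
curl -sk https://localhost:8444/services
# The echo service entry's "host" should be the Kubernetes Service DNS name
# rather than an upstream backed by three Pod-IP targets.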

It doesn't appear to be sticky based on which Kong instance originated the request. If I send through the pods individually, one distributes requests to only one (of three) upstream endpoints, and the other distributes to two:

root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.14:8000/echo >> /tmp/one; done
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.13:8000/echo >> /tmp/two; done
root@cool:/# sort /tmp/one | uniq -c
 1000 
  1000 With IP address 10.244.0.15.
root@cool:/# sort /tmp/two | uniq -c
  1000 
   504 With IP address 10.244.0.15.
   496 With IP address 10.244.0.17.

Scaling tests indicate that we consistently only see two of the upstream replicas per proxy instance, but each instance sees a different set of endpoints. If I wait around for a while, the chosen set of upstream endpoints can change, suggesting it persists for the keepalive timeout. Although scaling either the proxy or the upstream replicas has no noticeable effect, increasing the worker count does. My intuition from that is that a worker delivers requests to the same endpoint for the duration of the keepalive period, though I don't know what would have changed (were workers not maintaining keepalive connections to the mesh proxies before?).

// scaled echo replicas to 10
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.13:8000/echo >> /tmp/noot; done
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.14:8000/echo >> /tmp/poot; done
root@cool:/# sort /tmp/noot | uniq -c
1000
497 With IP address 10.244.0.19.
503 With IP address 10.244.0.20.
root@cool:/# sort /tmp/poot | uniq -c
1000
498 With IP address 10.244.0.17.
502 With IP address 10.244.0.20.

// scaled kong proxy replicas to 4
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.13:8000/echo >> /tmp/1; done
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.14:8000/echo >> /tmp/2; done
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.26:8000/echo >> /tmp/3; done
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.27:8000/echo >> /tmp/4; done
root@cool:/# sort /tmp/1 | uniq -c
1000
492 With IP address 10.244.0.21.
508 With IP address 10.244.0.23.
root@cool:/# sort /tmp/2 | uniq -c
1000
522 With IP address 10.244.0.21.
478 With IP address 10.244.0.22.
root@cool:/# sort /tmp/3 | uniq -c
1000
487 With IP address 10.244.0.16.
513 With IP address 10.244.0.17.
root@cool:/# sort /tmp/4 | uniq -c
1000
488 With IP address 10.244.0.17.
512 With IP address 10.244.0.19.

// worker count scaled to 5
root@cool:/# for i in `seq 1000`; do curl -s 10.244.0.31:8000/echo >> /tmp/ppp; done
root@cool:/# sort /tmp/ppp | uniq -c
1000
176 With IP address 10.244.0.19.
239 With IP address 10.244.0.20.
189 With IP address 10.244.0.22.
187 With IP address 10.244.0.23.
209 With IP address 10.244.0.24.
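
One way to probe the keepalive intuition above is to force Kong to close each upstream connection after a single request and repeat the distribution check. This is only a sketch: the deployment name and Pod label are assumptions, and KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS is the environment-variable form of Kong's upstream_keepalive_max_requests setting.

kubectl -n kong set env deployment/proxy-kong KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS=1
kubectl -n kong rollout status deployment/proxy-kong
# The Pod IP changes after the rollout, so re-resolve it before sending traffic.
PROXY_POD_IP=$(kubectl -n kong get pod -l app=proxy-kong -o jsonpath='{.items[0].status.podIP}')
for i in `seq 1000`; do curl -s "$PROXY_POD_IP":8000/echo >> /tmp/no_reuse; done
sort /tmp/no_reuse | uniq -c
# If per-worker connection reuse is what pins requests to a single endpoint, the
# counts should now spread across all three backends at the default worker count.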

czeslavo (Contributor) commented:

Reproduction steps:

# Install Kuma (assumes the Kuma Helm repo is already added: helm repo add kuma https://kumahq.github.io/charts).
helm install kuma kuma/kuma

# Create `kong` namespace
kubectl create ns kong

# Enable the mesh in the `kong` and `default` namespaces.
kubectl label ns kong kuma.io/sidecar-injection=enabled
kubectl label ns default kuma.io/sidecar-injection=enabled
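# Optional sanity check (not part of the original steps): once the workloads
# below are deployed, Pods in both namespaces should have a Kuma sidecar
# injected and report an extra container, e.g. READY 2/2.
# kubectl get pods -n kong
# kubectl get pods -n default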

# Install KIC+Kong 3.6 (with OpenResty bumped) using the all-in-one manifests.
curl https://raw.githubusercontent.com/Kong/kubernetes-ingress-controller/v3.0.2/test/e2e/manifests/all-in-one-dbless.yaml | \
  sed -r 's/kong:3.4/kong\/kong-gateway-dev:dc683d70ced4795471a5416773b14e9771ae8a6c/g' | \
  kubectl apply -f -

# Deploy an echo server Deployment with 3 replicas and expose it via Ingress.
echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
  labels:
    app: echo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: kong/go-echo:0.3.0
          ports:
            - containerPort: 1027
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: echo
  name: echo
spec:
  type: ClusterIP
  selector:
    app: echo
  ports:
    - port: 1027
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    konghq.com/strip-path: "true"
    konghq.com/methods: "GET"
spec:
  ingressClassName: kong
  rules:
    - http:
        paths:
        - path: /echo
          pathType: Prefix
          backend:
            service:
              name: echo
              port:
                number: 1027' | kubectl apply -f -

# Get Kong proxy IP.
PROXY_IP=$(kubectl get svc --namespace kong kong-proxy -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "$PROXY_IP"

# Send 100 requests to the /echo endpoint and count unique responses (the responses contain Pod IPs).
for i in `seq 100`; do curl -s "$PROXY_IP"/echo >> /tmp/hit_ips_with_bump; done
cat /tmp/hit_ips_with_bump | sort | uniq -c

# The output contains only 2 Pod IPs while 3 were expected:
# cat /tmp/hit_ips_with_bump | sort | uniq -c
#    100
#     40 With IP address 10.80.1.8.
#     60 With IP address 10.80.2.15.

# Change the Kong version to 3.6 (one commit before the OpenResty bump).
curl https://raw.githubusercontent.com/Kong/kubernetes-ingress-controller/v3.0.2/test/e2e/manifests/all-in-one-dbless.yaml | \
  sed -r 's/kong:3.4/kong\/kong-gateway-dev:eabdfbcedcd744546ee758275110a8790e038891/g' | \
  kubectl apply -f -

# Send 100 requests to the /echo endpoint and count unique responses (the responses contain Pod IPs).
for i in `seq 100`; do curl -s "$PROXY_IP"/echo >> /tmp/hit_ips_before_bump; done
cat /tmp/hit_ips_before_bump | sort | uniq -c

# The output contains 3 Pod IPs, as expected:
# cat /tmp/hit_ips_before_bump | sort | uniq -c
#    100
#     44 With IP address 10.80.0.8.
#     27 With IP address 10.80.1.8.
#     29 With IP address 10.80.2.15.

The issue was narrowed down to the Kong OpenResty bump in this commit and is being handled by the Gateway team in https://konghq.atlassian.net/browse/KAG-3633.

I think we can keep this issue open on our side and close it once a fixed nightly Gateway build passes our E2E tests.
