GlooEE helm upgrade fails when failurePolicy=Fail and extauth/ratelimit upstreams are used #6524

Closed
jenshu opened this issue Jun 2, 2022 · 2 comments
Labels: Type: Bug, zendesk


jenshu commented Jun 2, 2022

Gloo Edge Version

1.11.x (latest stable)

Kubernetes Version

No response

Describe the bug

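When the validation webhook's failurePolicy is set to Fail, we treat the CRs that come with the Gloo Helm chart (Gateways for OSS, extauth/ratelimit Upstreams for EE) as Helm hook resources so that we can ensure they are installed in a specific order (i.e. after the validation service is ready). Without this ordering, CRs created before the webhook can respond would be rejected, because failurePolicy=Fail makes the API server deny any request whose webhook call fails.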
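
For illustration, a chart resource becomes a Helm hook through annotations. The sketch below is hypothetical: the hook phases, weight, and Upstream spec (including the port) are assumptions, not the exact manifest in the GlooEE chart. The detail that matters for this bug is the hook delete policy: when `helm.sh/hook-delete-policy` is omitted, Helm applies `before-hook-creation`, which deletes the previous copy of the hook resource before creating the new one.

```yaml
# Hypothetical sketch of an Upstream rendered as a Helm hook resource.
# Annotation values and the spec below are assumptions for illustration.
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: extauth
  namespace: gloo-system
  annotations:
    # Create this resource only after the release's regular resources
    # (including the validation webhook) have been applied.
    "helm.sh/hook": post-install,post-upgrade
    # Lower weights run first among hooks of the same phase (assumed value).
    "helm.sh/hook-weight": "5"
    # Delete the previous copy before creating the new one on upgrade.
    # This is also Helm's default when the annotation is omitted.
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  kube:
    serviceName: extauth
    serviceNamespace: gloo-system
    servicePort: 8083   # placeholder port, assumed
```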

This can cause issues on upgrades when the extauth/ratelimit Upstreams are in use, because Helm deletes and recreates hook resources during an upgrade. When Helm tries to delete the Upstreams, the validation service is called to check whether the deletion would cause any translation errors. If a Proxy references the Upstreams, a translation error occurs, the validation service rejects the deletion, and the helm upgrade fails.
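
The deletion reaches the validation service because the webhook registration intercepts DELETE operations on Upstreams, as the denied deletions in the logs below show. The following is a simplified, hypothetical ValidatingWebhookConfiguration: the object name, service name, and admissionReviewVersions are assumptions, and only the rule relevant to this bug is shown. Note that failurePolicy only governs what happens when the webhook call itself fails (error or timeout); an explicit denial, like the ones in the logs below, is enforced under either policy.

```yaml
# Simplified, hypothetical sketch; not the exact manifest rendered by the chart.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gloo-gateway-validation-webhook   # hypothetical name
webhooks:
  - name: gateway.gloo-system.svc         # webhook name seen in the error log
    clientConfig:
      service:
        name: gateway                      # assumed validation service name
        namespace: gloo-system
    rules:
      # DELETE is intercepted too, so removing a referenced Upstream is validated.
      - operations: ["CREATE", "UPDATE", "DELETE"]
        apiGroups: ["gloo.solo.io"]
        apiVersions: ["v1"]
        resources: ["upstreams"]
    # Reject the request if the webhook call itself errors or times out.
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1beta1"]   # assumed
```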

Steps to reproduce the bug

Install GlooEE with a custom HTTP gateway config that references the Upstreams:

helm install -n gloo-system gloo-ee gloo-ee/gloo-ee --create-namespace \
  --set-string license_key=$GLOO_LICENSE_KEY \
  --version v1.11.2 --debug -f - <<EOF
gloo:
  gatewayProxies:
    gatewayProxy:
      gatewaySettings:
        customHttpGateway:
          options:
            extauth:
              extauthzServerRef:
                name: extauth
                namespace: gloo-system
            ratelimitServer:
              ratelimitServerRef:
                name: rate-limit
                namespace: gloo-system
EOF

Install the petstore service:

kubectl apply -f https://raw.githubusercontent.com/solo-io/gloo/v1.2.9/example/petstore/petstore.yaml

and a VirtualService that routes to it:

kubectl apply -f - <<EOF
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: vs1
  namespace: default
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-petstore-8080
            namespace: gloo-system
EOF

Upgrade GlooEE with failurePolicy=Fail:

helm upgrade -n gloo-system gloo-ee gloo-ee/gloo-ee --set-string license_key=$GLOO_LICENSE_KEY \
  --version v1.11.16 --debug -f - <<EOF
gloo:
  gateway:
    validation:
      failurePolicy: Fail
  gatewayProxies:
    gatewayProxy:
      gatewaySettings:
        customHttpGateway:
          options:
            extauth:
              extauthzServerRef:
                name: extauth
                namespace: gloo-system
            ratelimitServer:
              ratelimitServerRef:
                name: rate-limit
                namespace: gloo-system
EOF

Errors like the following are reported:

client.go:252: [debug] Deleting "extauth" in gloo-system...
client.go:267: [debug] Failed to delete "upstreams/extauth", err: admission webhook "gateway.gloo-system.svc" denied the request: resource incompatible with current Gloo snapshot: [failed to validate Proxy with Gloo validation server: HttpListener Error: ProcessingError. Reason: extauth server upstream not found name:"extauth" namespace:"gloo-system"]
client.go:252: [debug] Deleting "rate-limit" in gloo-system...
client.go:267: [debug] Failed to delete "upstreams/rate-limit", err: admission webhook "gateway.gloo-system.svc" denied the request: resource incompatible with current Gloo snapshot: [failed to validate Proxy with Gloo validation server: HttpListener Error: ProcessingError. Reason: ratelimit server upstream not found name:"rate-limit" namespace:"gloo-system"; HttpListener Error: ProcessingError. Reason: ratelimit server upstream not found name:"rate-limit" namespace:"gloo-system"]

Expected Behavior

The upgrade should succeed.

Additional Context

No response

jenshu added the Type: Bug label on Jun 2, 2022
jenshu self-assigned this on Jun 2, 2022
soloio-bot commented:

Zendesk ticket #753 has been linked to this issue.


jenshu commented Jul 7, 2022

The fix will be available in GlooEE v1.12.0-beta12 and v1.11.27.

jenshu closed this as completed on Jul 7, 2022