Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide option for gloo to work with rejected Upstreams #3797

Closed
lgadban opened this issue Oct 28, 2020 · 6 comments
Closed

Provide option for gloo to work with rejected Upstreams #3797

lgadban opened this issue Oct 28, 2020 · 6 comments
Labels
Type: Enhancement New feature or request

Comments

@lgadban
Copy link
Contributor

lgadban commented Oct 28, 2020

Is your feature request related to a problem? Please describe.
When operating a multi-tenant environment with many teams sharing the same gloo control plane, if one team provides an incorrect Upstream it can prevent any other tenant from updating their configuration.

Describe the solution you'd like
An option to treat incorrectly configured Upstreams as a warning and enable invalid route replacement for routes referencing that Upstream

Possibly a statistic for Upstreams in a rejected state

Describe alternatives you've considered
N/A

Additional context
Related: #3761

@lgadban lgadban added the Type: Enhancement New feature or request label Oct 28, 2020
@kdorosh kdorosh self-assigned this Nov 12, 2020
@lgadban
Copy link
Contributor Author

lgadban commented Nov 13, 2020

The primary scenario for this is when a k8s type Upstream references an invalid k8s Service, e.g. one that doesn't exist.

I am, at this point, unable to reproduce the problem of Gloo not being able to receive updates.

In the following scenario:

  • Route replacement correctly configured
  • An Upstream is created to reference a valid k8s Service
  • A VirtualService is created that references that Upstream
    • At this point, everything should work fine
  • The Upstream is modified to point to a non-existent Service
  • The VirtualService still shows healthy
    • Probably should be reevaluated if this is correct
  • The Upstream shows unhealthy
  • Route replacement does not work for this route, and envoy will still work with the "old" destination of the Upstream
    • Initial investigation shows that the Upstream resource that contains the bad Service reference does not actually make it through processing, which seems to results in the following bullet item:
    • The envoy cluster that corresponds to the (now rejected Upstream) still exists

Now that it is in the "broken" state, the assumption behind this issue is that other unrelated changes would not be propagated to Envoy, however...

  • Modify to VirtualService to contain a new route to a healthy upstream
  • The changes are correctly applied to envoy, i.e. I can call the newly added route successfully

@asadoughi
Copy link

The primary scenario for this is when a k8s type Upstream references an invalid k8s Service, e.g. one that doesn't exist.

I am, at this point, unable to reproduce the problem of Gloo not being able to receive updates.

If it helps, the scenario I encountered this issue included: deleting the k8s service behind a valid upstream/virtualservice (and not deleting the upstream and virtualservice), then creating a new k8s service (unrelated to the previous deleted k8s service), and creating a new upstream/virtualservice for that new k8s service. I am unable to reach the new k8s service because of the orphaned upstream/virtualservice.

@kdorosh
Copy link
Contributor

kdorosh commented Nov 16, 2020

Hi @asadoughi . I was unable to reproduce using the following steps:

  • glooctl install gateway --version 1.5.3
  • kubectl patch settings -n gloo-system default --patch '{"spec": {"gloo": {"invalidConfigPolicy": {"replaceInvalidRoutes": true, "invalidRouteResponseCode": 404, "invalidRouteResponseBody": "Gloo Gateway has invalid configuration. Administrators should run glooctl check to find and fix config errors."}}}}' --type=merge
  • kubectl apply -f https://raw.githubusercontent.com/solo-io/gloo/v1.2.9/example/petstore/petstore.yaml
  • Apply the following vs:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: default
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - 'foo.example.com'
    routes:
    - matchers:
      - exact: /all-pets
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore-8080
            namespace: gloo-system
  • Reachable after k port-forward -n gloo-system deploy/gateway-proxy 8080 then curl -H "Host: foo.example.com" localhost:8080/all-pets.
  • Delete the original k8s service, krm svc petstore
  • Add the new k8s service:
apiVersion: v1
kind: Service
metadata:
  name: petstore2
  namespace: default
  labels:
    service: petstore
spec:
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: petstore

and the new vs:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: default2
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - 'bar.example.com'
    routes:
    - matchers:
      - exact: /all-pets
      options:
        prefixRewrite: /api/pets
      routeAction:
        single:
          upstream:
            name: default-petstore2-8080
            namespace: gloo-system
  • Reachable after k port-forward -n gloo-system deploy/gateway-proxy 8080 then curl -H "Host: bar.example.com" localhost:8080/all-pets.

The original route now returns a 404 initially and then gets the invalid route config, but the new route is reachable.

@asadoughi
Copy link

@kdorosh Try again with gloo.discovery.enabled = false. In other words, manually create your upstreams.

@lgadban
Copy link
Contributor Author

lgadban commented Nov 17, 2020

@asadoughi I am unable to reproduce as well even when using manually created upstreams.

In your testing, have you ensured that a correct invalidConfigPolicy is specified, as outlined in @kdorosh 's steps above?
I.e. your Settings should have a config similar to:

  gloo:
    ...
    invalidConfigPolicy:
      invalidRouteResponseBody: Gloo Gateway has invalid configuration
      invalidRouteResponseCode: 404
      replaceInvalidRoutes: true

@kdorosh kdorosh removed their assignment Nov 19, 2020
@lgadban
Copy link
Contributor Author

lgadban commented Nov 19, 2020

Per slack conversations, it seems that as long as route replacement has been correctly configured (along with #3818 being fixed), we have been unable to reproduce the initially described issue.

We will close this issue for now but happy to take another look if we get a concrete reproducer.

@lgadban lgadban closed this as completed Nov 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants