Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks configuring envoy for IAP #903

Closed
jlewi opened this issue Jun 1, 2018 · 2 comments
Closed

Deadlocks configuring envoy for IAP #903

jlewi opened this issue Jun 1, 2018 · 2 comments

Comments

@jlewi
Copy link
Contributor

jlewi commented Jun 1, 2018

I'm noticing deadlocks and other problems configuring envoy using the IAP script. Some of the problems I observe

  1. iap.sh is never able to acquire the lock and therefore able to write the envoy-config.json
  2. envoy container is crash looping - prevents GCP loadbalancer from detecting the backend is heathy

I think we should make the following changes

  1. There should be a single pod responsible for enabling IAP and updating the envoy-config map as needed

    • We should move this out of the sidecar and into a separate deployment
    • Locking should be less important because there won't be contention
  2. The envoy sidecars are now just responsible for updating envoy config based on the config map

    • They no longer need to acquire a lock
    • They can periodically check the config map and compute a hash to know when it changes
  3. We should provide a default config that will allow envoy to startup but ensure non secure traffic is blocked

    • This way we can avoid the problems with the ingress thinking the backend is unhealthy.
jlewi added a commit to jlewi/kubeflow that referenced this issue Jun 5, 2018
* We want a single process to update the backend service and IAP.
  Right now we do this in the envoy side car which is replicated.

* The script gets a lock to prevent multiple services from trying to update
  IAP simultaneously.

* But until the process acquires the lock it can't update the local copy
  of the envoy config to do proper JWT validation.

* So we move updating the backend into a separate deployment that has a single
  replica 1 (although we still use a lock).

* envoy sidecars now do not modify the backend but just get relevant info
  and update the envoy config.

* To support getting the ksonnet app of the bootstrapper add source repository
  IAM role to the admin service account and mount the service account
  into the bootstrapper.
     This will allow use to push the ksonnet app from the bootstrapper to
     a cloud source repository.

Related to kubeflow#903
@jlewi
Copy link
Contributor Author

jlewi commented Jun 5, 2018

/assign @jlewi

k8s-ci-robot pushed a commit that referenced this issue Jun 7, 2018
* Prevent deadlocks trying to setup IAP.

* We want a single process to update the backend service and IAP.
  Right now we do this in the envoy side car which is replicated.

* The script gets a lock to prevent multiple services from trying to update
  IAP simultaneously.

* But until the process acquires the lock it can't update the local copy
  of the envoy config to do proper JWT validation.

* So we move updating the backend into a separate deployment that has a single
  replica 1 (although we still use a lock).

* envoy sidecars now do not modify the backend but just get relevant info
  and update the envoy config.

* To support getting the ksonnet app of the bootstrapper add source repository
  IAM role to the admin service account and mount the service account
  into the bootstrapper.
     This will allow use to push the ksonnet app from the bootstrapper to
     a cloud source repository.

Related to #903

* format jsonnet files.

* Address comments; no need to modify the envoy config.
@jlewi
Copy link
Contributor Author

jlewi commented Jun 11, 2018

Closing this issue because #924 should fix the deadlocks and we have open issues for the other issues.

@jlewi jlewi closed this as completed Jun 11, 2018
saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021
* Prevent deadlocks trying to setup IAP.

* We want a single process to update the backend service and IAP.
  Right now we do this in the envoy side car which is replicated.

* The script gets a lock to prevent multiple services from trying to update
  IAP simultaneously.

* But until the process acquires the lock it can't update the local copy
  of the envoy config to do proper JWT validation.

* So we move updating the backend into a separate deployment that has a single
  replica 1 (although we still use a lock).

* envoy sidecars now do not modify the backend but just get relevant info
  and update the envoy config.

* To support getting the ksonnet app of the bootstrapper add source repository
  IAM role to the admin service account and mount the service account
  into the bootstrapper.
     This will allow use to push the ksonnet app from the bootstrapper to
     a cloud source repository.

Related to kubeflow#903

* format jsonnet files.

* Address comments; no need to modify the envoy config.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant