+++
title = "Monitor IAP Setup"
description = "Instructions for monitoring and troubleshooting IAP"
weight = 5
+++

Using Identity-Aware Proxy (IAP) is the recommended way to access your Kubeflow deployment from outside the cluster.

This is a step-by-step guide to ensuring that your IAP-secured endpoint comes up, and to debugging problems when it doesn't.

While it requires some effort, the end result is well worth it:

- Users can easily log in using their GCP accounts
- You rely on Google's security expertise to protect your sensitive workloads

The first step is to ensure that the ingress and the GCP load balancer have been created

```
kubectl -n kubeflow describe ingress

Name:             envoy-ingress
Namespace:        kubeflow
Address:          35.244.132.160
Default backend:  default-http-backend:80 (10.20.0.10:8080)
Events:
   Type     Reason     Age                 From                     Message
   ----     ------     ----                ----                     -------
   Normal   ADD        12m                 loadbalancer-controller  kubeflow/envoy-ingress
   Warning  Translate  12m (x10 over 12m)  loadbalancer-controller  error while evaluating the ingress spec: could not find service "kubeflow/envoy"
   Warning  Translate  12m (x2 over 12m)   loadbalancer-controller  error while evaluating the ingress spec: error getting BackendConfig for port "8080" on service "kubeflow/envoy", err: no BackendConfig for service port exists.
   Warning  Sync       12m                 loadbalancer-controller  Error during sync: Error running backend syncing routine: received errors when updating backend service: googleapi: Error 400: The resource 'projects/code-search-demo/global/backendServices/k8s-be-32230--bee2fc38fcd6383f' is not ready, resourceNotReady
 googleapi: Error 400: The resource 'projects/code-search-demo/global/backendServices/k8s-be-32230--bee2fc38fcd6383f' is not ready, resourceNotReady
   Normal  CREATE  11m  loadbalancer-controller  ip: 35.244.132.160
...
```

- If the address isn't set, there was a problem creating the load balancer
- The CREATE event indicates that the load balancer was successfully created on the specified IP address

If there are any problems creating the load balancer, they are reported as Kubernetes events that show up when you run kubectl describe

- The most common error is running out of GCP quota
- If you run out of GCP quota, you need to either increase the quota on your project for that resource or delete some existing resources
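
The address and event checks can also be scripted. The following is a minimal sketch, assuming the ingress is named envoy-ingress in the kubeflow namespace as in the sample output; `ingress_ip`, `ingress_warnings`, and `check_ingress` are hypothetical helper names, not part of Kubeflow.

```shell
# Hypothetical helpers. The ingress name and namespace are taken from the
# sample output and may differ in your deployment.
NAMESPACE="${NAMESPACE:-kubeflow}"
INGRESS="${INGRESS:-envoy-ingress}"

# Print the load balancer IP assigned to the ingress (empty if none yet).
ingress_ip() {
  kubectl -n "${NAMESPACE}" get ingress "${INGRESS}" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Print only the Warning events recorded against the ingress.
ingress_warnings() {
  kubectl -n "${NAMESPACE}" get events \
    --field-selector "involvedObject.name=${INGRESS},type=Warning"
}

# Report the address, or list the warnings if no address has been assigned.
check_ingress() {
  ip="$(ingress_ip)"
  if [ -n "${ip}" ]; then
    echo "load balancer address: ${ip}"
  else
    echo "no address yet; recent warnings:"
    ingress_warnings
    return 1
  fi
}
```

Running `check_ingress` prints the address when one is set. Otherwise it prints the warning events, which is where quota errors would be expected to appear.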

Verify that a signed SSL certificate could be generated using Let's Encrypt

```
kubectl -n kubeflow get certificate envoy-ingress-tls -o yaml

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  annotations:
    ksonnet.io/managed: '{"pristine":"H4sIAAAAAAAA/6yRsW7zMAyE9/8xONv+463w2qlLhg5dggyMRDuCJVIQ6RSB4XcvlDQdCnRqN0EHfjzerYA5vFHRIAwDOCqWkHGi0s1P2gX5f+kx5jP20MAc2MMAz1QsjMGhETSQyNCjIQwrRDxR1PqaVZjJKsBJysLEBgMEzG3gqZAqbA0wJoIBiC9yffy3FhXukmZ0VZ+XE41R3uuIZnJ1Abo6uoITHsMEw2EFLwkDKwwHmMf2klCNSsu7viP2WQKbdg9U60LrKUe5JmLrXJTFd5PIBMcGzmZ511f6w+s3j7Btx60BJykJ7+9H/GJlA561Yv7Ae1BdqLzSeGvhs7C4VNzLTYKv2COZErtyzdbmIv4WL7lCtv+pl2379wEAAP//AQAA///uHVhQMgIAAA=="}'
    kubecfg.ksonnet.io/garbage-collect-tag: gc-tag
  creationTimestamp: 2019-04-02T22:49:43Z
  generation: 1
  labels:
    app.kubernetes.io/deploy-manager: ksonnet
    ksonnet.io/component: iap-ingress
  name: envoy-ingress-tls
  namespace: kubeflow
  resourceVersion: "4803"
  selfLink: /apis/certmanager.k8s.io/v1alpha1/namespaces/kubeflow/certificates/envoy-ingress-tls
  uid: 9b137b29-5599-11e9-a223-42010a8e020c
spec:
  acme:
    config:
    - domains:
      - mykubeflow.endpoints.myproject.cloud.goog
      http01:
        ingress: envoy-ingress
  commonName: kf-vmaster-n01.endpoints.kubeflow-ci-deployment.cloud.goog
  dnsNames:
  - mykubeflow.endpoints.myproject.cloud.goog
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  secretName: envoy-ingress-tls
status:
  acme:
    order:
      url: https://acme-v02.api.letsencrypt.org/acme/order/54483154/382580193
  conditions:
  - lastTransitionTime: 2019-04-02T23:00:28Z
    message: Certificate issued successfully
    reason: CertIssued
    status: "True"
    type: Ready
  - lastTransitionTime: null
    message: Order validated
    reason: OrderValidated
    status: "False"
    type: ValidateFailed
```

- The most recent condition should be Certificate issued successfully
- It can take around 10 minutes to provision a certificate after the GCP load balancer is created
- The most common error is hitting Let's Encrypt quota issues
  - Let's Encrypt enforces a quota of 5 duplicate certificates per week
  - The easiest fix for quota issues is to pick a different hostname by recreating and redeploying Kubeflow with a different name
  - For example, if you ran
```
kfctl init myapp --project=myproject --platform=gcp
```
  - Rerun kfctl with a different name that you have not used before
```
kfctl init myapp-unique --project=myproject --platform=gcp
```
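
Because provisioning can take around 10 minutes, it may be convenient to poll the certificate instead of rerunning the command by hand. The sketch below assumes the certificate name and namespace from the example output; `cert_ready_status` and `wait_for_cert` are hypothetical helper names, and the default of 20 attempts 30 seconds apart is only chosen to cover roughly 10 minutes.

```shell
# Hypothetical helper: print the status of the certificate's Ready condition
# ("True" once the certificate has been issued).
cert_ready_status() {
  kubectl -n kubeflow get certificate envoy-ingress-tls \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Usage: wait_for_cert [attempts] [seconds between attempts]
wait_for_cert() {
  attempts="${1:-20}"
  delay="${2:-30}"
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    if [ "$(cert_ready_status)" = "True" ]; then
      echo "certificate is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "certificate not ready after ${attempts} attempts" >&2
  return 1
}
```

If `wait_for_cert` times out, the message fields in the certificate's status conditions are the first place to look for a quota error.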

Wait for the load balancer to report the backends as healthy

```
# NAMESPACE is the Kubeflow namespace (kubeflow in this guide); PROJECT is your GCP project ID
NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc envoy -o jsonpath='{.spec.ports[0].nodePort}')
BACKEND_NAME=$(gcloud compute --project=${PROJECT} backend-services list --filter=name~k8s-be-${NODE_PORT}- --format='value(name)')
gcloud compute --project=${PROJECT} backend-services get-health --global ${BACKEND_NAME}

https://www.googleapis.com/compute/v1/projects/kubeflow-ci-deployment/zones/us-east1-b/instanceGroups/k8s-ig--686aad7559e1cf0e
status:
   healthStatus:
   - healthState: HEALTHY
     instance: https://www.googleapis.com/compute/v1/projects/kubeflow-ci-deployment/zones/us-east1-b/instances/gke-kf-vmaster-n01-kf-vmaster-n01-cpu-66360615-xjrc
     ipAddress: 10.142.0.8
     port: 32694
   - healthState: HEALTHY
     instance: https://www.googleapis.com/compute/v1/projects/kubeflow-ci-deployment/zones/us-east1-b/instances/gke-kf-vmaster-n01-kf-vmaster-n01-cpu-66360615-gmmx
     ipAddress: 10.142.0.13
     port: 32694
   kind: compute#backendServiceGroupHealth
```

- Both backends should be reported as healthy
- It can take several minutes for the load balancer to consider the backends healthy
- The service with port ${NODE_PORT} is the one that matters most, since it handles Kubeflow traffic
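
Since the health check can take several minutes to pass, a small polling loop may help. This is a sketch only: `all_backends_healthy` and `wait_for_backends` are hypothetical helper names, and the loop assumes that PROJECT and BACKEND_NAME are already set as in the commands above.

```shell
# Hypothetical helper: read `get-health` output on stdin and succeed only if
# it lists at least one instance and every healthState is HEALTHY.
all_backends_healthy() {
  awk '
    /healthState:/ { total++; if ($NF == "HEALTHY") healthy++ }
    END { exit !(total > 0 && healthy == total) }
  '
}

# Usage: wait_for_backends [attempts] [seconds between attempts]
wait_for_backends() {
  attempts="${1:-20}"
  delay="${2:-30}"
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    if gcloud compute --project="${PROJECT}" backend-services get-health \
         --global "${BACKEND_NAME}" | all_backends_healthy; then
      echo "all backends healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "backends still unhealthy after ${attempts} attempts" >&2
  return 1
}
```

Empty output counts as unhealthy here, so a failing gcloud call is not mistaken for success.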

If the backend is unhealthy, check the status of the envoy pods

```
kubectl -n kubeflow get pods -l service=envoy
NAME                     READY     STATUS    RESTARTS   AGE
envoy-69bf97959c-29dnw   2/2       Running   2          1d
envoy-69bf97959c-5w5rl   2/2       Running   3          1d
envoy-69bf97959c-9cjtg   2/2       Running   3          1d
```

- The envoy pods should have status Running
- A small number of restarts is expected, since the envoy containers are restarted as part of their configuration process
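
To pick out problem pods from a longer listing, the pod table can be filtered. The sketch below uses a hypothetical helper, `unhealthy_pods`, and assumes the column layout shown in the sample output (NAME, READY, STATUS).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the name of every pod that is not Running with all of its
# containers ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # READY column, for example "2/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example invocation:
#   kubectl -n kubeflow get pods -l service=envoy --no-headers | unhealthy_pods
```

No output means every listed pod is Running with all of its containers ready.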

If the pods are crash looping, look at the logs to figure out why
```
kubectl -n kubeflow logs ${POD}
```
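
Each envoy pod in the sample output runs two containers, and a crash-looping container often leaves its most useful output in the previous run. The following sketch collects both; `dump_pod_logs` is a hypothetical helper name, while `--all-containers` and `--previous` are standard kubectl logs flags.

```shell
# Hypothetical helper: print current and previous-run logs for every
# container in a pod. --previous fails when a container has not restarted,
# so that error is tolerated.
dump_pod_logs() {
  pod="$1"
  echo "== current logs for ${pod} =="
  kubectl -n kubeflow logs "${pod}" --all-containers=true
  echo "== logs from the previous run of ${pod} =="
  kubectl -n kubeflow logs "${pod}" --all-containers=true --previous ||
    echo "(no previous logs available)"
}

# Example invocation:
#   dump_pod_logs "${POD}"
```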

Now that the certificate exists, the ingress should report that it is serving on HTTPS as well

```
kubectl -n kubeflow get ingress
NAME            HOSTS                                       ADDRESS          PORTS     AGE
envoy-ingress   mykubeflow.endpoints.myproject.cloud.goog   35.244.132.159   80, 443   1d
```

If you don't see 443, look at the ingress events using kubectl describe to see if there are any errors
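
For scripted checks, the port test can be reduced to a one-line filter over the same output. This is a sketch with a hypothetical helper name, `ingress_serves_https`, and it assumes the PORTS column is printed as in the example above.

```shell
# Hypothetical helper: succeed if `kubectl get ingress` output read from
# stdin lists 443 as a separate value, as in the PORTS column "80, 443".
ingress_serves_https() {
  grep -Eq '(^|[ ,])443([ ,]|$)'
}

# Example invocation:
#   kubectl -n kubeflow get ingress envoy-ingress --no-headers | ingress_serves_https
```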

Try accessing IAP at the fully qualified domain name (FQDN) in your web browser
```
https://${FQDN}
```
- If you get SSL errors, this typically means your SSL certificate is still propagating; wait a bit and try again
  - SSL propagation can take up to 10 minutes
- If you are not asked to log in and you get a 404 error, IAP is still being configured
  - Keep retrying for up to 10 minutes
- If, after logging in, you get the error Error: redirect_uri_mismatch, the OAuth authorized redirect URIs do not include your domain
  - The full error message looks like the following and includes the relevant links
```
The redirect URI in the request, https://mykubeflow.endpoints.myproject.cloud.goog/_gcp_gatekeeper/authenticate, does not match the ones authorized for the OAuth client.
To update the authorized redirect URIs, visit: https://console.developers.google.com/apis/credentials/oauthclient/22222222222-7meeee7a9a76jvg54j0g2lv8lrsb4l8g.apps.googleusercontent.com?project=22222222222
```
  - Follow the link in the error message to navigate to the OAuth credential being used, and add the redirect URI listed in the error message to the list of authorized URIs
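
The same checks can be approximated from the command line. The sketch below uses hypothetical helper names (`endpoint_status`, `redirect_uri`, `explain_status`). The redirect URI pattern is taken from the error message above; treating a 302 as the IAP login redirect is an assumption about how IAP answers unauthenticated requests, not something stated in this guide.

```shell
# Print the HTTP status code returned by the endpoint without following
# redirects. curl prints 000 when the request itself fails, for example on
# an SSL error.
endpoint_status() {
  curl -s -o /dev/null -w '%{http_code}' "https://$1"
}

# Print the redirect URI that must be authorized on the OAuth client,
# following the pattern shown in the error message.
redirect_uri() {
  echo "https://$1/_gcp_gatekeeper/authenticate"
}

# Map a status code to the situations described in the list above.
explain_status() {
  case "$1" in
    000) echo "request failed; the SSL certificate may still be propagating" ;;
    404) echo "IAP is still being configured" ;;
    302) echo "redirected; assumed to be the IAP login redirect" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example invocation:
#   explain_status "$(endpoint_status "${FQDN}")"
```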
