Endpoint not coming up when Kfctl apply on GCP with IAP #3628

Closed
johnugeorge opened this issue Jul 10, 2019 · 7 comments · Fixed by kubeflow/manifests#219

@johnugeorge
Member

I tried installing Kubeflow with the latest kfctl (built using make build-kfctl), but I am not able to reach the Cloud IAP endpoint.

kfctl init kfapp --platform gcp --project ${PROJECT} -V 
kfctl generate all -V
kfctl apply all -V

The iap-enabler pod in the istio-system namespace is crash-looping. Its logs are given below.

Activated service account credentials for: [kfapp-admin@ai-kubeflow.iam.gserviceaccount.com]
[component_manager]
disable_update_check = true
[core]
account = kfapp-admin@ai-kubeflow.iam.gserviceaccount.com
disable_usage_reporting = true
project = ai-kubeflow
[metrics]
environment = github_docker_image

Your active configuration is: [default]
node port is 31380
fetching backends info with envoy-ingress: {"k8s-be-31380--d5461ad1876ddeba":"HEALTHY","k8s-be-31753--d5461ad1876ddeba":"HEALTHY","k8s-be-32135--d5461ad1876ddeba":"HEALTHY"}
backend name is k8s-be-31380--d5461ad1876ddeba
Waiting for backend id PROJECT=ai-kubeflow NAMESPACE=istio-system SERVICE=istio-ingressgateway filter=name~k8s-be-31380--d5461ad1876ddeba
BACKEND_ID=3213546390509118616
patch JWT audience: /projects/497879924778/global/backendServices/3213546390509118616
policy.authentication.istio.io/ingress-jwt not patched
Clearing lock on service annotation
service/istio-ingressgateway not patched
Wed Jul 10 08:13:50 UTC 2019 WARN: Backend check failed, restarting container.

/cc @lluunn
/cc @kunmingg
/cc @jlewi

@issue-label-bot

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.59. Please mark this comment with 👍 or 👎 to give our bot feedback!


@lluunn
Contributor

lluunn commented Jul 10, 2019

All backends seem healthy from the log.
What's the error message when you visit the endpoint?
Could you describe the ingress if your deployment is still there?
Thanks.

@lluunn
Contributor

lluunn commented Jul 10, 2019

I just noticed that in one of my running Kubeflow clusters (endpoint accessible), the iap-enabler pod is also crash-looping.
So the crash loop may not be what is blocking the endpoint (not saying we should not fix it).

@johnugeorge
Member Author

@lluunn were you able to access the Kubeflow endpoint?

@jlewi
Contributor

jlewi commented Jul 15, 2019

@johnugeorge did you try following the IAP troubleshooting guide? It might need some updating for Istio, but it's a good place to start.

@mattnworb

I am seeing the same thing, on a GKE cluster created via kfctl built from the v0.6.0-rc.0 tag. I created the cluster with:

~/code/kubeflow/bootstrap/bin/kfctl init v060-rc-test --platform gcp --project ${PROJECT}
~/code/kubeflow/bootstrap/bin/kfctl generate all -V --zone ${ZONE}
~/code/kubeflow/bootstrap/bin/kfctl apply all -V

This is in a GCP project where I previously created a GKE cluster with kfctl 0.5.1, with IAP and Cloud Endpoints. That cluster no longer exists, in case that makes any difference.

Kubeflow and IAP both work fine in the new cluster, but I am left with a crash-looping iap-enabler pod.

I think I know what the problem is: the istio-ingressgateway NodePort service lists many ports:

  ports:
  - name: status-port
    nodePort: 30777
    port: 15020
    protocol: TCP
    targetPort: 15020
  - name: http2
    nodePort: 31380
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    nodePort: 31390
    port: 443
    protocol: TCP
    targetPort: 443
  - name: tcp
    nodePort: 31400
    port: 31400
    protocol: TCP
    targetPort: 31400
  - name: https-kiali
    nodePort: 32007
    port: 15029
    protocol: TCP
    targetPort: 15029
  - name: https-prometheus
    nodePort: 30355
    port: 15030
    protocol: TCP
    targetPort: 15030
  - name: https-grafana
    nodePort: 31764
    port: 15031
    protocol: TCP
    targetPort: 15031
  - name: https-tracing
    nodePort: 30439
    port: 15032
    protocol: TCP
    targetPort: 15032
  - name: tls
    nodePort: 32472
    port: 15443
    protocol: TCP
    targetPort: 15443

The setup_backend.sh script that the iap-enabler pod runs uses gcloud to list a backend service whose name contains the nodePort value of the first port in the list. For me that value is 30777:

# If node port or backend id change, so does the JWT audience.
CURR_NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
read -ra toks <<<"$(gcloud compute --project=${PROJECT} backend-services list --filter=name~k8s-be-${CURR_NODE_PORT}- --format='value(id,timeoutSec)')"

$ kubectl --namespace=istio-system get svc istio-ingressgateway -o jsonpath='{.spec.ports[0].nodePort}'
30777

and I don't have a backend service whose name contains that port:

$ gcloud compute --project=${PROJECT} backend-services list
NAME                            BACKENDS                                            PROTOCOL
k8s-be-31380--ae2281ecbdbcadbf  us-east4-b/instanceGroups/k8s-ig--ae2281ecbdbcadbf  HTTP
k8s-be-32092--ae2281ecbdbcadbf  us-east4-b/instanceGroups/k8s-ig--ae2281ecbdbcadbf  HTTP

k8s-be-32092... points to kube-system's default-http-backend (which has nodePort 32092), and k8s-be-31380... points to istio-ingressgateway. The script selects .spec.ports[0], which ends up being the port named status-port rather than the http2 port that the load balancer actually uses.
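
To double-check which nodePort actually has a load-balancer backend, the names can be compared directly. This is only a sketch: it assumes backend-service names follow the k8s-be-<nodePort>--<suffix> pattern seen in the listing above, and the helper names backend_node_port and has_backend_for_port are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of setup_backend.sh.

# Print the node port embedded in a backend-service name such as
# k8s-be-31380--ae2281ecbdbcadbf (naming pattern assumed from the listing).
backend_node_port() {
  local name="$1"
  name="${name#k8s-be-}"        # drop the "k8s-be-" prefix
  printf '%s\n' "${name%%-*}"   # keep everything before the first dash
}

# Read backend-service names on stdin (one per line) and succeed if any
# of them embeds the node port given as $1.
has_backend_for_port() {
  local port="$1" name
  while IFS= read -r name; do
    if [ "$(backend_node_port "$name")" = "$port" ]; then
      return 0
    fi
  done
  return 1
}

# Possible usage against a real project (requires gcloud credentials):
# gcloud compute --project="${PROJECT}" backend-services list --format='value(name)' \
#   | has_backend_for_port 30777 || echo "no backend for node port 30777"
```

With the two backends listed above, the check would fail for 30777 and succeed for 31380.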

@mattnworb

I think changing the jsonpath in the kubectl command to -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}' would select the intended port.
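
A minimal sketch of the same idea with a guard, so the lookup fails loudly when no port with that name exists instead of passing an empty or wrong value on to gcloud. The function names are hypothetical and not part of setup_backend.sh; NAMESPACE and SERVICE are the variables from the script excerpt above.

```shell
#!/usr/bin/env bash
# Sketch only: select a Service nodePort by port name, not by list position.

# Read "name<TAB>nodePort" lines on stdin and print the nodePort whose
# name equals $1. Exits non-zero if no port with that name is present.
node_port_by_name() {
  awk -F '\t' -v want="$1" '
    $1 == want { print $2; found = 1; exit }
    END { exit !found }
  '
}

# Hypothetical wrapper: print every port of a Service as name<TAB>nodePort.
# $1 is the namespace, $2 is the Service name. Requires cluster access.
list_service_ports() {
  kubectl --namespace="$1" get svc "$2" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.nodePort}{"\n"}{end}'
}

# Possible usage inside a script:
# CURR_NODE_PORT=$(list_service_ports "${NAMESPACE}" "${SERVICE}" | node_port_by_name http2) \
#   || { echo "no port named http2 on ${SERVICE}" >&2; exit 1; }
```

Splitting the lookup from the kubectl call keeps the selection logic checkable without a cluster.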

mattnworb added a commit to mattnworb/kubeflow that referenced this issue Jul 18, 2019
The script selects the nodePort based on [the name of the port earlier in the script](https://github.com/kubeflow/kubeflow/blob/e6944f35149f7c362d32fb9d57fc9f3842a46347/kubeflow/gcp/setup_backend.sh#L26), but the query in the `checkBackend` function assumes that the port number is the first in the Service.

I believe this fixes kubeflow#3628
mattnworb added a commit to mattnworb/manifests that referenced this issue Jul 18, 2019
This should fix kubeflow/kubeflow#3628. See also kubeflow/kubeflow#3691 (which modified the wrong file).
mattnworb added a commit to mattnworb/manifests that referenced this issue Jul 19, 2019
This should fix kubeflow/kubeflow#3628. See also kubeflow/kubeflow#3691 (which modified the wrong file).
k8s-ci-robot pushed a commit to kubeflow/manifests that referenced this issue Jul 19, 2019