Prevent deadlocks trying to set up IAP. (#924)
* Prevent deadlocks trying to set up IAP.

* We want a single process to update the backend service and IAP.
  Right now we do this in the envoy sidecar, which is replicated.

* The script acquires a lock to prevent multiple replicas from trying to
  update IAP simultaneously.

* But until a process acquires the lock, it can't update its local copy
  of the envoy config to do proper JWT validation.

* So we move updating the backend into a separate deployment with a single
  replica (although we still use a lock).

* The envoy sidecars no longer modify the backend; they just fetch the
  relevant info and update the envoy config.

* To support getting the ksonnet app out of the bootstrapper, add the source
  repository IAM role to the admin service account and mount the service
  account into the bootstrapper.
     This will allow us to push the ksonnet app from the bootstrapper to
     a cloud source repository.

Related to #903

* format jsonnet files.

* Address comments; no need to modify the envoy config.
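The TTL-based annotation lock the bullets above describe can be sketched as follows. This is a minimal illustration of just the staleness check, with hypothetical values; the real script reads the `iaplock` annotation off the Service with `kubectl` and `jq`:

```shell
# Minimal sketch of the lock-staleness check (values are hypothetical).
# The real script stores "<hostname> <epoch-seconds>" in the Service's
# iaplock annotation and treats locks older than LOCK_TTL as stale.
NOW=$(date -u +%s)
LOCK="some-pod-abc123 $(( NOW - 300 ))"   # pretend the lock was taken 300s ago
LOCK_T=$(echo "${LOCK}" | cut -d' ' -f2)  # extract the timestamp field
LOCK_AGE=$(( NOW - LOCK_T ))
LOCK_TTL=120
if [ "${LOCK_AGE}" -gt "${LOCK_TTL}" ]; then
  echo "stale"   # safe to steal the lock and proceed
else
  echo "held"    # back off and retry later
fi
```

A holder that crashes without clearing the annotation therefore only blocks other replicas for at most the TTL.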
jlewi authored and k8s-ci-robot committed Jun 7, 2018
1 parent 476e150 commit e88ef99
Showing 5 changed files with 317 additions and 127 deletions.
16 changes: 15 additions & 1 deletion docs/gke/configs/cluster.jinja
@@ -300,6 +300,11 @@ TODO(jlewi): Do we need to serialize API activation
members:
- {{ 'serviceAccount:' + env['project_number'] + '@cloudservices.gserviceaccount.com' }}

{# Grant permissions needed to push the app to a cloud repository. #}
- role: roles/source.admin
members:
- {{ 'serviceAccount:' + env['project_number'] + '@cloudservices.gserviceaccount.com' }}

{# servicemanagement.admin is needed by CloudEndpoints controller
so we can create a service to get a hostname.
#}
@@ -495,19 +500,28 @@ the corresponding type provider.
- --apply=true
- --namespace=kubeflow
- --config=/etc/kubeflow/config.yaml
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
value: /var/run/secrets/sa/admin-gcp-sa.json
volumeMounts:
- name: kubeflow-ksonnet-pvc
mountPath: /opt/bootstrap
- name: kubeflow-bootstrapper
mountPath: /etc/kubeflow
- name: kubeflow-admin-sa
readOnly: true
mountPath: /var/run/secrets/sa
volumes:
- name: kubeflow-ksonnet-pvc
persistentVolumeClaim:
claimName: kubeflow-ksonnet-pvc
- name: kubeflow-bootstrapper
configMap:
name: kubeflow-bootstrapper

- name: kubeflow-admin-sa
secret:
secretName: admin-gcp-sa

metadata:
dependsOn:
- admin-namespace
2 changes: 2 additions & 0 deletions docs/gke/create_k8s_secrets.sh
@@ -8,6 +8,8 @@ export SA_EMAIL=${DEPLOYMENT_NAME}-admin@${PROJECT}.iam.gserviceaccount.com
# TODO(jlewi): We should name the secrets more consistently based on the service account name.
# We will need to update the component configs though
gcloud --project=${PROJECT} iam service-accounts keys create ${SA_EMAIL}.json --iam-account ${SA_EMAIL}

kubectl create secret generic --namespace=kubeflow-admin admin-gcp-sa --from-file=admin-gcp-sa.json=./${SA_EMAIL}.json
kubectl create secret generic --namespace=kubeflow admin-gcp-sa --from-file=admin-gcp-sa.json=./${SA_EMAIL}.json

export USER_EMAIL=${DEPLOYMENT_NAME}-user@${PROJECT}.iam.gserviceaccount.com
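For reference, the key filename passed to `kubectl create secret` above is derived from the deployment name and project. A minimal sketch with hypothetical values (`DEPLOYMENT_NAME` and `PROJECT` are placeholders, not values from this repo):

```shell
# Sketch of how create_k8s_secrets.sh derives the admin SA email and the
# key filename it hands to kubectl (values below are placeholders).
DEPLOYMENT_NAME=kubeflow
PROJECT=my-project
SA_EMAIL=${DEPLOYMENT_NAME}-admin@${PROJECT}.iam.gserviceaccount.com
KEY_FILE=./${SA_EMAIL}.json
echo "${KEY_FILE}"
```

The same secret (`admin-gcp-sa`) is created in both the `kubeflow-admin` and `kubeflow` namespaces so that both the bootstrapper and the iap-enabler can mount it.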
74 changes: 74 additions & 0 deletions kubeflow/core/configure_envoy_for_iap.sh
@@ -0,0 +1,74 @@
#!/bin/bash
#
# A script, run by the iap container, that modifies the envoy config to
# perform JWT validation for the given service. When finished, it writes
# out an envoy config with the JWT audience filled in.

[ -z "${CLIENT_ID}" ] && echo "Error: CLIENT_ID must be set" && exit 1
[ -z "${CLIENT_SECRET}" ] && echo "Error: CLIENT_SECRET must be set" && exit 1
[ -z "${NAMESPACE}" ] && echo "Error: NAMESPACE must be set" && exit 1
[ -z "${SERVICE}" ] && echo "Error: SERVICE must be set" && exit 1

apk add --update jq
curl https://storage.googleapis.com/kubernetes-release/release/v1.9.4/bin/linux/amd64/kubectl > /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl


PROJECT=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/project/project-id)
if [ -z "${PROJECT}" ]; then
  echo "Error: unable to fetch PROJECT from compute metadata"
  exit 1
fi

PROJECT_NUM=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/project/numeric-project-id)
if [ -z "${PROJECT_NUM}" ]; then
  echo "Error: unable to fetch PROJECT_NUM from compute metadata"
  exit 1
fi

# Activate the service account
gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}
# Print out the config for debugging
gcloud config list

NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
while [[ -z ${BACKEND_ID} ]];
do BACKEND_ID=$(gcloud compute --project=${PROJECT} backend-services list --filter=name~k8s-be-${NODE_PORT}- --format='value(id)');
echo "Waiting for backend id PROJECT=${PROJECT} NAMESPACE=${NAMESPACE} SERVICE=${SERVICE}...";
sleep 2;
done
echo BACKEND_ID=${BACKEND_ID}

NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
BACKEND_SERVICE=$(gcloud --project=${PROJECT} compute backend-services list --filter=name~k8s-be-${NODE_PORT}- --uri)

JWT_AUDIENCE="/projects/${PROJECT_NUM}/global/backendServices/${BACKEND_ID}"

# Saved so the periodic IAP check below can compare against them.
echo "JWT_AUDIENCE=${JWT_AUDIENCE}" > /var/shared/healthz.env
echo "NODE_PORT=${NODE_PORT}" >> /var/shared/healthz.env
echo "BACKEND_ID=${BACKEND_ID}" >> /var/shared/healthz.env

kubectl get configmap -n ${NAMESPACE} envoy-config -o jsonpath='{.data.envoy-config\.json}' | \
sed -e "s|{{JWT_AUDIENCE}}|${JWT_AUDIENCE}|g" > /var/shared/envoy-config.json

echo "Restarting envoy"
curl -s ${ENVOY_ADMIN}/quitquitquit

function checkIAP() {
# healthz.env is written earlier by this script.
. /var/shared/healthz.env

# If node port or backend id change, so does the JWT audience.
CURR_NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
CURR_BACKEND_ID=$(gcloud compute --project=${PROJECT} backend-services list --filter=name~k8s-be-${CURR_NODE_PORT}- --format='value(id)')
[ "$BACKEND_ID" == "$CURR_BACKEND_ID" ]
}

# Verify IAP every 10 seconds.
while true; do
if ! checkIAP; then
echo "$(date) WARN: IAP check failed, restarting container."
exit 1
fi
sleep 10
done
227 changes: 101 additions & 126 deletions kubeflow/core/iap.libsonnet
@@ -22,10 +22,11 @@
$.parts(namespace).service,
$.parts(namespace).ingress(secretName, ipName, hostname),
$.parts(namespace).certificate(secretName, hostname, issuer),
$.parts(namespace).initServiceAcount,
$.parts(namespace).initServiceAccount,
$.parts(namespace).initClusterRoleBinding,
$.parts(namespace).initClusterRole,
$.parts(namespace).deploy(envoyImage, oauthSecretName),
$.parts(namespace).iapEnabler(oauthSecretName),
$.parts(namespace).configMap(disableJwt),
$.parts(namespace).whoamiService,
$.parts(namespace).whoamiApp,
@@ -58,7 +59,7 @@
},
}, // service

initServiceAcount:: {
initServiceAccount:: {
apiVersion: "v1",
kind: "ServiceAccount",
metadata: {
@@ -190,7 +191,7 @@
image: "google/cloud-sdk:alpine",
command: [
"sh",
"/var/envoy-config/iap-init.sh",
"/var/envoy-config/configure_envoy_for_iap.sh",
],
env: [
{
@@ -271,6 +272,101 @@
},
}, // deploy

// Run the process to enable IAP
iapEnabler(oauthSecretName):: {
apiVersion: "extensions/v1beta1",
kind: "Deployment",
metadata: {
name: "iap-enabler",
namespace: namespace,
},
spec: {
replicas: 1,
template: {
metadata: {
labels: {
service: "iap-enabler",
},
},
spec: {
serviceAccountName: "envoy",
containers: [
{
name: "iap",
image: "google/cloud-sdk:alpine",
command: [
"sh",
"/var/envoy-config/setup_iap.sh",
],
env: [
{
name: "NAMESPACE",
value: namespace,
},
{
name: "CLIENT_ID",
valueFrom: {
secretKeyRef: {
name: oauthSecretName,
key: "CLIENT_ID",
},
},
},
{
name: "CLIENT_SECRET",
valueFrom: {
secretKeyRef: {
name: oauthSecretName,
key: "CLIENT_SECRET",
},
},
},
{
name: "SERVICE",
value: "envoy",
},
{
name: "ENVOY_ADMIN",
value: "http://localhost:" + envoyAdminPort,
},
{
name: "GOOGLE_APPLICATION_CREDENTIALS",
value: "/var/run/secrets/sa/admin-gcp-sa.json",
},
],
volumeMounts: [
{
mountPath: "/var/envoy-config/",
name: "config-volume",
},
{
name: "sa-key",
readOnly: true,
mountPath: "/var/run/secrets/sa",
},
],
},
],
restartPolicy: "Always",
volumes: [
{
configMap: {
name: "envoy-config",
},
name: "config-volume",
},
{
name: "sa-key",
secret: {
secretName: "admin-gcp-sa",
},
},
],
},
},
},
}, // iapEnabler

configMap(disableJwt):: {
apiVersion: "v1",
kind: "ConfigMap",
@@ -280,129 +376,8 @@
},
data: {
"envoy-config.json": std.manifestJson($.parts(namespace).envoyConfig(disableJwt)),
// Script executed by the iap container to configure IAP. When finished, the envoy config is created with the JWT audience.
"iap-init.sh": |||
[ -z ${CLIENT_ID} ] && echo Error CLIENT_ID must be set && exit 1
[ -z ${CLIENT_SECRET} ] && echo Error CLIENT_SECRET must be set && exit 1
[ -z ${NAMESPACE} ] && echo Error NAMESPACE must be set && exit 1
[ -z ${SERVICE} ] && echo Error SERVICE must be set && exit 1
apk add --update jq
curl https://storage.googleapis.com/kubernetes-release/release/v1.9.4/bin/linux/amd64/kubectl > /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl
# Stagger init of replicas when acquiring lock
sleep $(( $RANDOM % 5 + 1 ))
kubectl get svc ${SERVICE} -o json > service.json
LOCK=$(jq -r ".metadata.annotations.iaplock" service.json)
NOW=$(date -u +'%s')
if [[ -z "${LOCK}" || "${LOCK}" == "null" ]]; then
LOCK_T=$NOW
else
LOCK_T=$(echo "${LOCK}" | cut -d' ' -f2)
fi
LOCK_AGE=$(( $NOW - $LOCK_T ))
LOCK_TTL=120
if [[ -z "${LOCK}" || "${LOCK}" == "null" || "${LOCK_AGE}" -gt "${LOCK_TTL}" ]]; then
jq -r ".metadata.annotations.iaplock=\"$(hostname -s) ${NOW}\"" service.json > service_lock.json
kubectl apply -f service_lock.json 2>/dev/null
if [[ $? -eq 0 ]]; then
echo "Acquired lock on service annotation to update IAP."
else
echo "WARN: Failed to acquire lock on service annotation."
exit 1
fi
else
echo "WARN: Lock on service annotation already acquired by: $LOCK, age: $LOCK_AGE, TTL: $LOCK_TTL"
sleep 20
exit 1
fi
PROJECT=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/project/project-id)
if [ -z ${PROJECT} ]; then
echo Error unable to fetch PROJECT from compute metadata
exit 1
fi
PROJECT_NUM=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/project/numeric-project-id)
if [ -z ${PROJECT_NUM} ]; then
echo Error unable to fetch PROJECT_NUM from compute metadata
exit 1
fi
# Activate the service account
gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}
# Print out the config for debugging
gcloud config list
NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
while [[ -z ${BACKEND_ID} ]];
do BACKEND_ID=$(gcloud compute --project=${PROJECT} backend-services list --filter=name~k8s-be-${NODE_PORT}- --format='value(id)');
echo "Waiting for backend id PROJECT=${PROJECT} NAMESPACE=${NAMESPACE} SERVICE=${SERVICE}...";
sleep 2;
done
echo BACKEND_ID=${BACKEND_ID}
NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
BACKEND_SERVICE=$(gcloud --project=${PROJECT} compute backend-services list --filter=name~k8s-be-${NODE_PORT}- --uri)
# Enable IAP on the backend service:
gcloud --project=${PROJECT} compute backend-services update ${BACKEND_SERVICE} \
--global \
--iap=enabled,oauth2-client-id=${CLIENT_ID},oauth2-client-secret=${CLIENT_SECRET}
while [[ -z ${HEALTH_CHECK_URI} ]];
do HEALTH_CHECK_URI=$(gcloud compute --project=${PROJECT} health-checks list --filter=name~k8s-be-${NODE_PORT}- --uri);
echo "Waiting for the healthcheck resource PROJECT=${PROJECT} NODEPORT=${NODE_PORT} SERVICE=${SERVICE}...";
sleep 2;
done
# Since we create the envoy-ingress ingress object before creating the envoy
# deployment object, healthcheck will not be configured correctly in the GCP
# load balancer. It will default the healthcheck request path to a value of
# / instead of the intended /healthz.
# Manually update the healthcheck request path to /healthz
gcloud --project=${PROJECT} compute health-checks update http ${HEALTH_CHECK_URI} --request-path=/healthz
# Since JupyterHub uses websockets we want to increase the backend timeout
echo Increasing backend timeout for JupyterHub
gcloud --project=${PROJECT} compute backend-services update --global ${BACKEND_SERVICE} --timeout=3600
JWT_AUDIENCE="/projects/${PROJECT_NUM}/global/backendServices/${BACKEND_ID}"
# For healthcheck compare.
echo "JWT_AUDIENCE=${JWT_AUDIENCE}" > /var/shared/healthz.env
echo "NODE_PORT=${NODE_PORT}" >> /var/shared/healthz.env
echo "BACKEND_ID=${BACKEND_ID}" >> /var/shared/healthz.env
kubectl get configmap -n ${NAMESPACE} envoy-config -o jsonpath='{.data.envoy-config\.json}' | \
sed -e "s|{{JWT_AUDIENCE}}|${JWT_AUDIENCE}|g" > /var/shared/envoy-config.json
echo "Restarting envoy"
curl -s ${ENVOY_ADMIN}/quitquitquit
echo "Clearing lock on service annotation"
kubectl patch svc "${SERVICE}" -p "{\"metadata\": { \"annotations\": {\"iaplock\": \"\" }}}"
function checkIAP() {
# created by init container.
. /var/shared/healthz.env
# If node port or backend id change, so does the JWT audience.
CURR_NODE_PORT=$(kubectl --namespace=${NAMESPACE} get svc ${SERVICE} -o jsonpath='{.spec.ports[0].nodePort}')
CURR_BACKEND_ID=$(gcloud compute --project=${PROJECT} backend-services list --filter=name~k8s-be-${CURR_NODE_PORT}- --format='value(id)')
[ "$BACKEND_ID" == "$CURR_BACKEND_ID" ]
}
# Verify IAP every 10 seconds.
while true; do
if ! checkIAP; then
echo "$(date) WARN: IAP check failed, restarting container."
exit 1
fi
sleep 10
done
|||,
"setup_iap.sh": importstr "setup_iap.sh",
"configure_envoy_for_iap.sh": importstr "configure_envoy_for_iap.sh",
},
},

