Cannot run mnist example on TFJob on a fresh Kubeflow deployment on local MicroK8s cluster #5492

Closed
alessandroferrari opened this issue Jan 3, 2021 · 3 comments

alessandroferrari commented Jan 3, 2021

When running a TFJob as shown in the tutorial, I run into an error:
```
$ kubectl logs mnist-worker-0 -c tensorflow -n kubeflow
Traceback (most recent call last):
  File "/var/tf_mnist/mnist_with_summaries.py", line 212, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/var/tf_mnist/mnist_with_summaries.py", line 183, in main
    train()
  File "/var/tf_mnist/mnist_with_summaries.py", line 39, in train
    fake_data=FLAGS.fake_data)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 260, in read_data_sets
    source_url + TRAIN_IMAGES)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 252, in maybe_download
    temp_file_name, _ = urlretrieve_with_retry(source_url)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 205, in wrapped_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 233, in urlretrieve_with_retry
    return urllib.request.urlretrieve(url, filename)
  File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 443, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 859, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1270, in connect
    HTTPConnection.connect(self)
  File "/usr/lib/python2.7/httplib.py", line 836, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
IOError: [Errno socket error] [Errno 111] Connection refused
```

When running `docker run gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0` outside of K8s on the workstation, it works.
The KF central dashboard and notebook server look to work fine.
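
To rule out basic in-cluster networking first, a quick DNS/egress probe can be run from a throwaway pod (a sketch, not part of the tutorial; it uses the default namespace to keep Istio sidecar injection out of the picture, `busybox:1.28` is the image the upstream Kubernetes DNS-debugging docs use, and `storage.googleapis.com` is, as far as I know, the host of the default MNIST mirror used by `tf.contrib.learn`):

```bash
# check that pods can resolve external names at all
kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.28 -- nslookup storage.googleapis.com

# check plain TCP egress (any HTTP response, even an error page, proves connectivity)
kubectl run egress-test --rm -it --restart=Never \
  --image=busybox:1.28 -- wget -O /dev/null http://storage.googleapis.com/
```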

I have deployed KF on a fresh MicroK8s install on a single workstation. These are the detailed steps I followed, starting from the fresh install:

```bash
#install uk8s
MICROK8S_VERSION=1.18
sudo snap install microk8s --classic --channel=$MICROK8S_VERSION/stable

#setup net if for microk8s
sudo ufw allow in on cni0 && sudo ufw allow out on cni0

#setup port forward
sudo ufw default allow routed
sudo iptables -P FORWARD ACCEPT

#create and cd to deployment folder
BASE_DIR=/home/driveav/workspace/kubeflow/deployment
KF_NAME=kf-test
KF_DIR="$BASE_DIR/$KF_NAME"
mkdir -p $KF_DIR
cd $KF_DIR

#setup k8s config
mkdir -p $HOME/.kube
sudo microk8s.kubectl config view --raw > $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
sudo chmod 0766 $HOME/.kube/config

#enable components needed
KUBERNETES_MASTER=https://10.152.183.1:443
microk8s enable dns dashboard storage gpu

#setup token
token=$(microk8s kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
microk8s kubectl -n kube-system describe secret $token

#Setup args for trustworthy JWTs as suggested in https://gist.github.com/etheleon/80414516c7fbc7147a5718b9897b1518
echo "--- Updating kube-apiserver configuration ---"

KUBE_APISERVER_ARG1_KEY="service-account-signing-key-file"
#SNAP_DATA is assumed to point at the microk8s snap data dir,
#e.g. /var/snap/microk8s/current
KUBE_APISERVER_ARG1_VALUE="${SNAP_DATA}/certs/serviceaccount.key"
KUBE_APISERVER_ARG2_KEY="service-account-issuer"
KUBE_APISERVER_ARG2_VALUE="kubernetes.default.svc"
KUBE_APISERVER_FILE=/var/snap/microk8s/current/args/kube-apiserver

if ! grep -q "$KUBE_APISERVER_ARG1_KEY" $KUBE_APISERVER_FILE; then
  #add the arg if missing
  echo "--$KUBE_APISERVER_ARG1_KEY=$KUBE_APISERVER_ARG1_VALUE" >> $KUBE_APISERVER_FILE
else
  #replace it with the correct value if pre-existent
  #('|' as the sed delimiter, because the value contains slashes)
  sed -i "s|^--$KUBE_APISERVER_ARG1_KEY=.*|--$KUBE_APISERVER_ARG1_KEY=$KUBE_APISERVER_ARG1_VALUE|g" $KUBE_APISERVER_FILE
fi

if ! grep -q "$KUBE_APISERVER_ARG2_KEY" $KUBE_APISERVER_FILE; then
  #add the arg if missing
  echo "--$KUBE_APISERVER_ARG2_KEY=$KUBE_APISERVER_ARG2_VALUE" >> $KUBE_APISERVER_FILE
else
  #replace it with the correct value if pre-existent
  sed -i "s/^--$KUBE_APISERVER_ARG2_KEY=.*/--$KUBE_APISERVER_ARG2_KEY=$KUBE_APISERVER_ARG2_VALUE/g" $KUBE_APISERVER_FILE
fi

cat $KUBE_APISERVER_FILE
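
#after these edits the file should end with lines like (illustrative,
#assuming SNAP_DATA=/var/snap/microk8s/current):
#--service-account-signing-key-file=/var/snap/microk8s/current/certs/serviceaccount.key
#--service-account-issuer=kubernetes.default.svc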

KUBELET_ARG1_KEY="container-runtime"
KUBELET_ARG1_VALUE="docker"
KUBELET_ARG2_KEY="container-runtime-endpoint"
KUBELET_ARG3_KEY="image-pull-progress-deadline"
KUBELET_ARG3_VALUE="10m"
KUBELET_FILE=/var/snap/microk8s/current/args/kubelet

if ! grep -q "$KUBELET_ARG1_KEY=" $KUBELET_FILE; then
  #add the arg if missing
  echo "--$KUBELET_ARG1_KEY=$KUBELET_ARG1_VALUE" >> $KUBELET_FILE
else
  #replace it with the correct value if pre-existent
  sed -i "s/^--$KUBELET_ARG1_KEY=.*/--$KUBELET_ARG1_KEY=$KUBELET_ARG1_VALUE/g" $KUBELET_FILE
fi

#comment out the container-runtime-endpoint arg, since the runtime is docker now
sed -e "/^--$KUBELET_ARG2_KEY=.*/s/^/#/g" -i $KUBELET_FILE

#When setting image-pull-progress-deadline as suggested by
#https://gist.github.com/etheleon/80414516c7fbc7147a5718b9897b1518,
#the deployment does not work, so it is left commented out:

#if ! grep -q "$KUBELET_ARG3_KEY=" $KUBELET_FILE; then
#  #add the arg if missing
#  echo "--$KUBELET_ARG3_KEY=$KUBELET_ARG3_VALUE" >> $KUBELET_FILE
#else
#  #replace it with the correct value if pre-existent
#  sed -i "s/^--$KUBELET_ARG3_KEY=.*/--$KUBELET_ARG3_KEY=$KUBELET_ARG3_VALUE/g" $KUBELET_FILE
#fi

cat $KUBELET_FILE

#restart uk8s
microk8s.stop
microk8s.start
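
#optional (an addition, not in the original steps): block until the
#cluster is fully back up before deploying
microk8s status --wait-ready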

#deploy as shown in https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/
CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml"
kfctl apply -V -f $CONFIG_URI
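
#optional sanity check (an addition): wait for the kubeflow pods to
#become Ready before setting up the ingress
kubectl wait --for=condition=Ready pod --all -n kubeflow --timeout=600s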

#setup the nodeport for istio ingress-gateway-controller, as shown in
#https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/
#https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/#access-the-kubeflow-user-interface-ui
kubectl get svc istio-ingressgateway -n istio-system

INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
SECURE_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="https")].nodePort}')
TCP_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="tcp")].nodePort}')
INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "httpbin.example.com"
EOF

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "httpbin.example.com"
  gateways:
  - httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /status
    - uri:
        prefix: /delay
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
EOF

curl -s -I -HHost:httpbin.example.com "http://$INGRESS_HOST:$INGRESS_PORT/status/200"

echo "http://$INGRESS_HOST:$INGRESS_PORT"
```
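
For reference, the failing job was created from the mnist example and inspected roughly like this (a sketch; the manifest URL is the `mnist_with_summaries` example from kubeflow/tf-operator and is an assumption — substitute whatever the tutorial you followed points at):

```bash
# create the TFJob (assumed manifest path)
kubectl create -n kubeflow -f https://raw.githubusercontent.com/kubeflow/tf-operator/master/examples/v1/mnist_with_summaries/tf_job_mnist.yaml

# inspect its status and the worker logs
kubectl get tfjob -n kubeflow
kubectl describe tfjob mnist -n kubeflow
kubectl logs mnist-worker-0 -c tensorflow -n kubeflow
```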

Status of the k8s cluster:

```
$ microk8s inspect
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster

Building the report tarball
Report tarball is at /var/snap/microk8s/1854/inspection-report-20210103_114419.tar.gz

$ kubectl get all -n kubeflow
NAME READY STATUS RESTARTS AGE
pod/admission-webhook-bootstrap-stateful-set-0 1/1 Running 3 13h
pod/admission-webhook-deployment-5d9ccb5696-kfg6f 1/1 Running 0 58m
pod/application-controller-stateful-set-0 1/1 Running 1 13h
pod/argo-ui-684bcb587f-258r8 1/1 Running 1 13h
pod/cache-deployer-deployment-6667847478-6c24t 2/2 Running 7 13h
pod/cache-server-bd9c859db-69qgn 2/2 Running 2 13h
pod/centraldashboard-895c4c768-tdrm9 1/1 Running 1 13h
pod/jupyter-web-app-deployment-6588c6f544-hmzgq 1/1 Running 1 13h
pod/katib-controller-75c8d47f8c-wcv7b 1/1 Running 2 13h
pod/katib-db-manager-6c88c68d79-rp6c9 1/1 Running 1 13h
pod/katib-mysql-858f68f588-5b69t 1/1 Running 1 13h
pod/katib-ui-68f59498d4-gkh5l 1/1 Running 1 13h
pod/kfserving-controller-manager-0 2/2 Running 2 13h
pod/kubeflow-pipelines-profile-controller-69c94df75b-dr8xb 1/1 Running 1 13h
pod/metadata-db-757dc9c7b5-9c47k 1/1 Running 1 13h
pod/metadata-envoy-deployment-6ff58757f6-vssbt 1/1 Running 1 13h
pod/metadata-grpc-deployment-76d69f69c8-x9s84 1/1 Running 6 13h
pod/metadata-writer-6d94ffb7df-h6qjq 2/2 Running 2 13h
pod/minio-66c9cd74c9-tbp9s 1/1 Running 1 13h
pod/ml-pipeline-54989c9946-5ggrx 2/2 Running 2 13h
pod/ml-pipeline-persistenceagent-7f6bf7646-7h5xc 2/2 Running 2 13h
pod/ml-pipeline-scheduledworkflow-66db7bcf5d-4gkmv 2/2 Running 2 13h
pod/ml-pipeline-ui-756b58fb-tvhpk 2/2 Running 2 13h
pod/ml-pipeline-viewer-crd-58f59f87db-4jf4f 2/2 Running 6 13h
pod/ml-pipeline-visualizationserver-6f9ff4974-chmdg 2/2 Running 2 13h
pod/mnist-worker-0 1/2 Error 0 29m
pod/mpi-operator-77bb5d8f4b-gdjtx 1/1 Running 1 13h
pod/mxnet-operator-68b688bb69-4qnw5 1/1 Running 1 13h
pod/mysql-7694c6b8b7-nlsxx 2/2 Running 2 13h
pod/notebook-controller-deployment-58447d4b4c-gtb7z 1/1 Running 1 13h
pod/profiles-deployment-78d4549cbc-z2s7l 2/2 Running 2 13h
pod/pytorch-operator-b79799447-97wx2 1/1 Running 1 13h
pod/seldon-controller-manager-5fc5dfc86c-nblzl 1/1 Running 1 13h
pod/spark-operatorsparkoperator-67c6bc65fb-bkh56 1/1 Running 1 13h
pod/spartakus-volunteer-6ddc7b6676-254wc 1/1 Running 1 13h
pod/tf-job-operator-5c97f4bf7-bvpks 1/1 Running 1 13h
pod/workflow-controller-5c7cc7976d-lqqrv 1/1 Running 1 13h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/admission-webhook-service ClusterIP 10.152.183.253 443/TCP 13h
service/application-controller-service ClusterIP 10.152.183.42 443/TCP 13h
service/argo-ui NodePort 10.152.183.142 80:31140/TCP 13h
service/cache-server ClusterIP 10.152.183.123 443/TCP 13h
service/centraldashboard ClusterIP 10.152.183.231 80/TCP 13h
service/jupyter-web-app-service ClusterIP 10.152.183.54 80/TCP 13h
service/katib-controller ClusterIP 10.152.183.144 443/TCP,8080/TCP 13h
service/katib-db-manager ClusterIP 10.152.183.6 6789/TCP 13h
service/katib-mysql ClusterIP 10.152.183.53 3306/TCP 13h
service/katib-ui ClusterIP 10.152.183.108 80/TCP 13h
service/kfserving-controller-manager-metrics-service ClusterIP 10.152.183.169 8443/TCP 13h
service/kfserving-controller-manager-service ClusterIP 10.152.183.55 443/TCP 13h
service/kfserving-webhook-server-service ClusterIP 10.152.183.190 443/TCP 13h
service/kubeflow-pipelines-profile-controller ClusterIP 10.152.183.148 80/TCP 13h
service/metadata-db ClusterIP 10.152.183.229 3306/TCP 13h
service/metadata-envoy-service ClusterIP 10.152.183.248 9090/TCP 13h
service/metadata-grpc-service ClusterIP 10.152.183.137 8080/TCP 13h
service/minio-service ClusterIP 10.152.183.204 9000/TCP 13h
service/ml-pipeline ClusterIP 10.152.183.178 8888/TCP,8887/TCP 13h
service/ml-pipeline-ui ClusterIP 10.152.183.226 80/TCP 13h
service/ml-pipeline-visualizationserver ClusterIP 10.152.183.235 8888/TCP 13h
service/mnist-worker-0 ClusterIP None 2222/TCP 29m
service/mysql ClusterIP 10.152.183.41 3306/TCP 13h
service/notebook-controller-service ClusterIP 10.152.183.176 443/TCP 13h
service/profiles-kfam ClusterIP 10.152.183.166 8081/TCP 13h
service/pytorch-operator ClusterIP 10.152.183.145 8443/TCP 13h
service/seldon-webhook-service ClusterIP 10.152.183.168 443/TCP 13h
service/tf-job-operator ClusterIP 10.152.183.232 8443/TCP 13h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/admission-webhook-deployment 1/1 1 1 13h
deployment.apps/argo-ui 1/1 1 1 13h
deployment.apps/cache-deployer-deployment 1/1 1 1 13h
deployment.apps/cache-server 1/1 1 1 13h
deployment.apps/centraldashboard 1/1 1 1 13h
deployment.apps/jupyter-web-app-deployment 1/1 1 1 13h
deployment.apps/katib-controller 1/1 1 1 13h
deployment.apps/katib-db-manager 1/1 1 1 13h
deployment.apps/katib-mysql 1/1 1 1 13h
deployment.apps/katib-ui 1/1 1 1 13h
deployment.apps/kubeflow-pipelines-profile-controller 1/1 1 1 13h
deployment.apps/metadata-db 1/1 1 1 13h
deployment.apps/metadata-envoy-deployment 1/1 1 1 13h
deployment.apps/metadata-grpc-deployment 1/1 1 1 13h
deployment.apps/metadata-writer 1/1 1 1 13h
deployment.apps/minio 1/1 1 1 13h
deployment.apps/ml-pipeline 1/1 1 1 13h
deployment.apps/ml-pipeline-persistenceagent 1/1 1 1 13h
deployment.apps/ml-pipeline-scheduledworkflow 1/1 1 1 13h
deployment.apps/ml-pipeline-ui 1/1 1 1 13h
deployment.apps/ml-pipeline-viewer-crd 1/1 1 1 13h
deployment.apps/ml-pipeline-visualizationserver 1/1 1 1 13h
deployment.apps/mpi-operator 1/1 1 1 13h
deployment.apps/mxnet-operator 1/1 1 1 13h
deployment.apps/mysql 1/1 1 1 13h
deployment.apps/notebook-controller-deployment 1/1 1 1 13h
deployment.apps/profiles-deployment 1/1 1 1 13h
deployment.apps/pytorch-operator 1/1 1 1 13h
deployment.apps/seldon-controller-manager 1/1 1 1 13h
deployment.apps/spark-operatorsparkoperator 1/1 1 1 13h
deployment.apps/spartakus-volunteer 1/1 1 1 13h
deployment.apps/tf-job-operator 1/1 1 1 13h
deployment.apps/workflow-controller 1/1 1 1 13h

NAME DESIRED CURRENT READY AGE
replicaset.apps/admission-webhook-deployment-5d9ccb5696 1 1 1 13h
replicaset.apps/argo-ui-684bcb587f 1 1 1 13h
replicaset.apps/cache-deployer-deployment-6667847478 1 1 1 13h
replicaset.apps/cache-server-bd9c859db 1 1 1 13h
replicaset.apps/centraldashboard-895c4c768 1 1 1 13h
replicaset.apps/jupyter-web-app-deployment-6588c6f544 1 1 1 13h
replicaset.apps/katib-controller-75c8d47f8c 1 1 1 13h
replicaset.apps/katib-db-manager-6c88c68d79 1 1 1 13h
replicaset.apps/katib-mysql-858f68f588 1 1 1 13h
replicaset.apps/katib-ui-68f59498d4 1 1 1 13h
replicaset.apps/kubeflow-pipelines-profile-controller-69c94df75b 1 1 1 13h
replicaset.apps/metadata-db-757dc9c7b5 1 1 1 13h
replicaset.apps/metadata-envoy-deployment-6ff58757f6 1 1 1 13h
replicaset.apps/metadata-grpc-deployment-76d69f69c8 1 1 1 13h
replicaset.apps/metadata-writer-6d94ffb7df 1 1 1 13h
replicaset.apps/minio-66c9cd74c9 1 1 1 13h
replicaset.apps/ml-pipeline-54989c9946 1 1 1 13h
replicaset.apps/ml-pipeline-persistenceagent-7f6bf7646 1 1 1 13h
replicaset.apps/ml-pipeline-scheduledworkflow-66db7bcf5d 1 1 1 13h
replicaset.apps/ml-pipeline-ui-756b58fb 1 1 1 13h
replicaset.apps/ml-pipeline-viewer-crd-58f59f87db 1 1 1 13h
replicaset.apps/ml-pipeline-visualizationserver-6f9ff4974 1 1 1 13h
replicaset.apps/mpi-operator-77bb5d8f4b 1 1 1 13h
replicaset.apps/mxnet-operator-68b688bb69 1 1 1 13h
replicaset.apps/mysql-7694c6b8b7 1 1 1 13h
replicaset.apps/notebook-controller-deployment-58447d4b4c 1 1 1 13h
replicaset.apps/profiles-deployment-78d4549cbc 1 1 1 13h
replicaset.apps/pytorch-operator-b79799447 1 1 1 13h
replicaset.apps/seldon-controller-manager-5fc5dfc86c 1 1 1 13h
replicaset.apps/spark-operatorsparkoperator-67c6bc65fb 1 1 1 13h
replicaset.apps/spartakus-volunteer-6ddc7b6676 1 1 1 13h
replicaset.apps/tf-job-operator-5c97f4bf7 1 1 1 13h
replicaset.apps/workflow-controller-5c7cc7976d 1 1 1 13h

NAME READY AGE
statefulset.apps/admission-webhook-bootstrap-stateful-set 1/1 13h
statefulset.apps/application-controller-stateful-set 1/1 13h
statefulset.apps/kfserving-controller-manager 1/1 13h
statefulset.apps/metacontroller 0/1 13h

$ kubectl get all -n istio-system
NAME READY STATUS RESTARTS AGE
pod/cluster-local-gateway-84bb595449-dpxr9 1/1 Running 1 13h
pod/istio-citadel-7f66ddfcfb-7xvpg 1/1 Running 1 13h
pod/istio-galley-7976dd55cd-z8vkd 1/1 Running 1 13h
pod/istio-ingressgateway-c79f9f6f-4gqwt 1/1 Running 1 13h
pod/istio-nodeagent-q584k 1/1 Running 1 13h
pod/istio-pilot-7bd96d69d9-czwnq 2/2 Running 2 13h
pod/istio-policy-66b5d9887c-nmzpr 2/2 Running 7 13h
pod/istio-security-post-install-release-1.3-latest-daily-zkrqc 0/1 Completed 0 13h
pod/istio-sidecar-injector-56b6997f7d-l7jdf 1/1 Running 1 13h
pod/istio-telemetry-856f7bcff4-f4d9q 2/2 Running 7 13h
pod/prometheus-65fdcbc857-qc5nc 1/1 Running 1 13h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cluster-local-gateway ClusterIP 10.152.183.246 80/TCP,443/TCP,31400/TCP,15011/TCP,8060/TCP,15029/TCP,15030/TCP,15031/TCP,15032/TCP 13h
service/istio-citadel ClusterIP 10.152.183.121 8060/TCP,15014/TCP 13h
service/istio-galley ClusterIP 10.152.183.120 443/TCP,15014/TCP,9901/TCP 13h
service/istio-ingressgateway NodePort 10.152.183.64 15020:32535/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:30236/TCP,15030:31147/TCP,15031:30102/TCP,15032:30543/TCP,15443:30722/TCP 13h
service/istio-pilot ClusterIP 10.152.183.220 15010/TCP,15011/TCP,8080/TCP,15014/TCP 13h
service/istio-policy ClusterIP 10.152.183.52 9091/TCP,15004/TCP,15014/TCP 13h
service/istio-sidecar-injector ClusterIP 10.152.183.116 443/TCP,15014/TCP 13h
service/istio-telemetry ClusterIP 10.152.183.132 9091/TCP,15004/TCP,15014/TCP,42422/TCP 13h
service/prometheus ClusterIP 10.152.183.68 9090/TCP 13h

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/istio-nodeagent 1 1 1 1 1 13h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-local-gateway 1/1 1 1 13h
deployment.apps/istio-citadel 1/1 1 1 13h
deployment.apps/istio-galley 1/1 1 1 13h
deployment.apps/istio-ingressgateway 1/1 1 1 13h
deployment.apps/istio-pilot 1/1 1 1 13h
deployment.apps/istio-policy 1/1 1 1 13h
deployment.apps/istio-sidecar-injector 1/1 1 1 13h
deployment.apps/istio-telemetry 1/1 1 1 13h
deployment.apps/prometheus 1/1 1 1 13h

NAME DESIRED CURRENT READY AGE
replicaset.apps/cluster-local-gateway-84bb595449 1 1 1 13h
replicaset.apps/istio-citadel-7f66ddfcfb 1 1 1 13h
replicaset.apps/istio-galley-7976dd55cd 1 1 1 13h
replicaset.apps/istio-ingressgateway-c79f9f6f 1 1 1 13h
replicaset.apps/istio-pilot-7bd96d69d9 1 1 1 13h
replicaset.apps/istio-policy-66b5d9887c 1 1 1 13h
replicaset.apps/istio-sidecar-injector-56b6997f7d 1 1 1 13h
replicaset.apps/istio-telemetry-856f7bcff4 1 1 1 13h
replicaset.apps/prometheus-65fdcbc857 1 1 1 13h

NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/cluster-local-gateway Deployment/cluster-local-gateway /80% 1 5 1 13h
horizontalpodautoscaler.autoscaling/istio-ingressgateway Deployment/istio-ingressgateway /80% 1 5 1 13h
horizontalpodautoscaler.autoscaling/istio-pilot Deployment/istio-pilot /80% 1 5 1 13h
horizontalpodautoscaler.autoscaling/istio-policy Deployment/istio-policy /80% 1 5 1 13h
horizontalpodautoscaler.autoscaling/istio-telemetry Deployment/istio-telemetry /80% 1 5 1 13h

NAME COMPLETIONS DURATION AGE
job.batch/istio-security-post-install-release-1.3-latest-daily 1/1 15s 13h
```

Thanks in advance!

yanniszark (Contributor) commented Jan 5, 2021

cc @kubeflow/wg-training-leads

Should this issue be transferred to https://github.com/kubeflow/tf-operator?

gaocegege (Member) commented:

I think it is related to the network in your microk8s cluster.

alessandroferrari (Author) commented:

@gaocegege right. It was sufficient to simply:

```
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.bckp
sudo service docker restart
```

Afterwards it worked like a charm. I think I can close this one.
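
If anyone hits the same symptom: it may be worth checking what the moved-aside file contained before deleting it, since a stale `dns` or HTTP-proxy override in `daemon.json` is exactly the kind of setting that makes every outbound connection from containers fail with "connection refused". A minimal check, assuming the backup path from the commands above:

```bash
# look for dns/proxy overrides in the old config
cat /etc/docker/daemon.json.bckp

# confirm the restarted daemon carries no proxy settings (expect no output)
docker info | grep -i proxy
```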
