I cannot reach gcr.io, so I pulled cschen/tf-mnist-with-summaries from Docker Hub instead.
However, my TFJob pod ends up in status "CrashLoopBackOff", describing the pod shows "Error syncing pod", and the pod log shows "python: can't open file '/root/kubeflow/mnist-with-summaries.py': [Errno 2] No such file or directory".
All other files are the same as in the examples.
My tf_job_mnist.yaml:

    apiVersion: "kubeflow.org/v1beta1"
    kind: "TFJob"
    metadata:
      name: "mnist"
      namespace: kubeflow
    spec:
      cleanPodPolicy: None
      tfReplicaSpecs:
        Worker:
          replicas: 1
          restartPolicy: OnFailure
          template:
            spec:
              containers:
                - name: tensorflow
                  image: cschen/tf-mnist-with-summaries
                  command:
                    - "python"
                    - "/root/kubeflow/mnist-with-summaries.py"
                    - "--log_dir=/train"
                    - "--learning_rate=0.01"
                    - "--batch_size=150"
                  volumeMounts:
                    - mountPath: "/train"
                      name: tfevent-volume
              volumes:
                - name: tfevent-volume
                  persistentVolumeClaim:
                    claimName: "tfevent-volume"

kubectl -n kubeflow get pods
mnist-worker-0 0/1 CrashLoopBackOff 2 1m

kubectl -n kubeflow describe pod mnist-worker-0
Events:
Type Reason Age From Message
Normal Scheduled 3m default-scheduler Successfully assigned mnist-worker-0 to k8s-node-vm8o4o-ev15h9pjqd
Normal SuccessfulMountVolume 3m kubelet, k8s-node-vm8o4o-ev15h9pjqd MountVolume.SetUp succeeded for volume "default-token-pt7pz"
Normal SuccessfulMountVolume 3m kubelet, k8s-node-vm8o4o-ev15h9pjqd MountVolume.SetUp succeeded for volume "tfevent-volume"
Normal Pulled 2m (x3 over 2m) kubelet, k8s-node-vm8o4o-ev15h9pjqd Successfully pulled image "cschen/tf-mnist-with-summaries"
Normal Created 2m (x3 over 2m) kubelet, k8s-node-vm8o4o-ev15h9pjqd Created container
Normal Started 2m (x3 over 2m) kubelet, k8s-node-vm8o4o-ev15h9pjqd Started container
Warning BackOff 2m (x5 over 2m) kubelet, k8s-node-vm8o4o-ev15h9pjqd Back-off restarting failed container
Warning FailedSync 2m (x5 over 2m) kubelet, k8s-node-vm8o4o-ev15h9pjqd Error syncing pod
Normal Pulling 2m (x4 over 3m) kubelet, k8s-node-vm8o4o-ev15h9pjqd pulling image "cschen/tf-mnist-with-summaries"

kubectl -n kubeflow logs pod/mnist-worker-0
python: can't open file '/root/kubeflow/mnist-with-summaries.py': [Errno 2] No such file or directory
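
Since the error says the script is missing at the path given in the command, one way to narrow this down is to search the image's filesystem for the script and compare the result with that path. The following is only a sketch: the helper name find_script is hypothetical, and it assumes the image ships a POSIX shell and find, which I have not verified for cschen/tf-mnist-with-summaries.

```shell
#!/bin/sh
# Hypothetical helper: print every regular file under directory $1
# whose name matches the glob pattern $2.
find_script() {
    # Errors (e.g. unreadable directories) are silenced so that only
    # matching paths are printed.
    find "$1" -type f -name "$2" 2>/dev/null
}

# Assumed usage: open a shell in the image, define the function there,
# and search the whole filesystem:
#   docker run --rm -it --entrypoint sh cschen/tf-mnist-with-summaries
#   find_script / 'mnist*.py'
```

If the search prints a path different from /root/kubeflow/mnist-with-summaries.py, the command in tf_job_mnist.yaml would need to point to that path instead.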
kubectl -n kubeflow describe tfjob mnist
Events:
Type Reason Age From Message
Normal SuccessfulCreatePod 7m tf-operator Created pod: mnist-worker-0
Normal SuccessfulCreateService 7m tf-operator Created service: mnist-worker-0
Normal ExitedWithCode 1m (x10 over 7m) tf-operator Pod: kubeflow.mnist-worker-0 exited with code 2

kubectl -n kubeflow logs tfjob/mnist
error: no kind "TFJob" is registered for version "kubeflow.org/v1beta1"