PaddlePaddle Training: why can't find pods #1694

Closed
hecheng64 opened this issue Nov 23, 2022 · 2 comments

Comments

@hecheng64

Steps:

  1. kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone"

  2. kubectl create -f https://raw.githubusercontent.com/kubeflow/training-operator/master/examples/paddlepaddle/simple-cpu.yaml

Result:

NAMESPACE      NAME                                    READY   STATUS      RESTARTS   AGE
default        dist-mnist-for-e2e-test-worker-0        0/1     Completed   0          30m
default        dist-mnist-for-e2e-test-worker-1        0/1     Completed   0          30m
default        dist-mnist-for-e2e-test-worker-3        0/1     Completed   0          30m
istio-system   istio-egressgateway-6d985d9948-2gj5h    1/1     Running     0          18h
istio-system   istio-ingressgateway-579b7748cf-skhvm   1/1     Running     0          18h
istio-system   istiod-689f8b79b-s998f                  1/1     Running     0          18h
kube-system    coredns-598c94845b-kzfk9                1/1     Running     0          18h
kube-system    coredns-598c94845b-shn7z                1/1     Running     0          18h
kube-system    kube-apiserver-aifs-master-1            1/1     Running     0          18h
kube-system    kube-controller-manager-aifs-master-1   1/1     Running     0          18h
kube-system    kube-flannel-ds-fmft8                   1/1     Running     0          18h
kube-system    kube-flannel-ds-gsg2x                   1/1     Running     0          18h
kube-system    kube-flannel-ds-k6wzl                   1/1     Running     0          18h
kube-system    kube-proxy-7h49p                        1/1     Running     0          18h
kube-system    kube-proxy-dsmb7                        1/1     Running     0          18h
kube-system    kube-proxy-gkqhl                        1/1     Running     0          18h
kube-system    kube-scheduler-aifs-master-1            1/1     Running     0          18h
kube-system    metrics-server-64d876c7b6-nznpz         1/1     Running     0          18h
kube-system    node-local-dns-8h8dt                    1/1     Running     0          18h
kube-system    node-local-dns-bqw7w                    1/1     Running     0          18h
kube-system    node-local-dns-x6vhk                    1/1     Running     0          18h
kubeflow       training-operator-5759b8548c-4b7f6      1/1     Running     0          33m

No pods were created for the PaddleJob, but the PaddleJob itself exists, with an empty STATE:
root@aifs-worker-2:/home/lw/hecheng/TrainingExample/tensorflow/dist-mnist# kubectl get paddlejobs -A
NAMESPACE   NAME                STATE   AGE
kubeflow    paddle-simple-cpu           10m
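
One way to narrow down a PaddleJob that shows an empty STATE and no pods is sketched below. This is only a rough diagnostic under stated assumptions: the function name `inspect_paddlejob` is hypothetical, the deployment name `training-operator` is inferred from the operator pod name in the listing above, and the `training.kubeflow.org/job-name` pod label is an assumption about how the operator labels the pods it creates.

```shell
# Hypothetical helper: gather basic diagnostics for a PaddleJob.
# Usage: inspect_paddlejob <namespace> <job-name>
inspect_paddlejob() {
  ns="$1"
  job="$2"

  # Events and status conditions recorded on the job, if any.
  kubectl -n "$ns" describe paddlejob "$job"

  # Pods created for this job (the label key is an assumption).
  kubectl -n "$ns" get pods -l "training.kubeflow.org/job-name=$job"

  # Image the operator is currently running; assumes a deployment named
  # "training-operator" in the "kubeflow" namespace.
  kubectl -n kubeflow get deployment training-operator \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
  echo

  # Recent operator logs, to see whether the job is being reconciled at all.
  kubectl -n kubeflow logs deployment/training-operator --tail=50
}
```

For the job in this report, the call would be `inspect_paddlejob kubeflow paddle-simple-cpu`. Since `kubectl get paddlejobs` already works, the CRD is installed, so the operator image and its logs are the more informative part of the output.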

@johnugeorge
Member

@hecheng64 PaddlePaddle support was added only recently.

If you want to try it, you can use this image: kubeflow/training-operator:v1-dd77069
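
A minimal sketch of one way to switch an already installed operator to that image follows. It is not taken from the thread: the function name `use_operator_image` is hypothetical, and it assumes that both the deployment and its container are named `training-operator` in the `kubeflow` namespace, which should be checked against the actual manifests before use.

```shell
# Hypothetical helper: point the operator deployment at another image
# and wait for the rollout to finish.
# Assumes deployment and container are both named "training-operator"
# in the "kubeflow" namespace.
use_operator_image() {
  image="$1"
  kubectl -n kubeflow set image deployment/training-operator \
    "training-operator=$image" &&
  kubectl -n kubeflow rollout status deployment/training-operator
}
```

With the image suggested above, this would be called as `use_operator_image kubeflow/training-operator:v1-dd77069`, followed by `kubectl get pods -n kubeflow` to see whether pods for the PaddleJob appear.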

@hecheng64
Author

Thanks.
