All training runs on the first worker #721

Closed
zuowang opened this issue Apr 25, 2018 · 1 comment

zuowang commented Apr 25, 2018

@jiezhang could you take a look at this issue?

root@openmpi-master-0:/examples# ssh openmpi-worker-1.openmpi.kubeflow.svc.cluster.local
Warning: Permanently added '[openmpi-worker-1.openmpi.kubeflow.svc.cluster.local]:2022,[192.168.173.254]:2022' (ECDSA) to the list of known hosts.
Last login: Wed Apr 25 11:47:57 2018 from 192.168.201.150
root@openmpi-worker-1:~# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 08:09 ?        00:00:00 sh /kubeflow/openmpi/assets/init.sh
root        14     1  0 08:09 ?        00:00:00 /usr/sbin/sshd -D -e -f /kubeflow/openmpi/assets/sshd_config
root        27    14  0 11:46 ?        00:00:00 sshd: root@notty
root        28    27  0 11:47 ?        00:00:00 bash -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_L
root        29    28  0 11:47 ?        00:00:00 /usr/local/bin/orted -mca ess env -mca ess_base_jobid 924975104 -mca ess_base_vpid 2 -mca ess_base_num_procs 5 -mca
root        44    14  0 11:57 ?        00:00:00 sshd: root@pts/0
root        45    44  0 11:57 pts/0    00:00:00 -bash
root        51    45  0 11:57 pts/0    00:00:00 ps -ef
root@openmpi-worker-1:~# exit
logout
Connection to openmpi-worker-1.openmpi.kubeflow.svc.cluster.local closed.
root@openmpi-master-0:/examples# ssh openmpi-worker-0.openmpi.kubeflow.svc.cluster.local
Warning: Permanently added '[openmpi-worker-0.openmpi.kubeflow.svc.cluster.local]:2022,[192.168.99.142]:2022' (ECDSA) to the list of known hosts.
Last login: Wed Apr 25 11:47:43 2018 from 192.168.201.150
root@openmpi-worker-0:~# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 08:09 ?        00:00:00 sh /kubeflow/openmpi/assets/init.sh
root        14     1  0 08:09 ?        00:00:00 /usr/sbin/sshd -D -e -f /kubeflow/openmpi/assets/sshd_config
root       529    14  0 11:46 ?        00:00:00 sshd: root@notty
root       530   529  0 11:46 ?        00:00:00 bash -c     PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_L
root       531   530  0 11:46 ?        00:00:00 /usr/local/bin/orted -mca ess env -mca ess_base_jobid 924975104 -mca ess_base_vpid 1 -mca ess_base_num_procs 5 -mca
root       535   531  0 11:46 ?        00:00:00 sh -c LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs python /examples/keras_mn
root       536   531  0 11:46 ?        00:00:00 sh -c LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs python /examples/keras_mn
root       537   535 99 11:46 ?        00:27:23 python /examples/keras_mnist_advanced.py
root       538   531  0 11:46 ?        00:00:00 sh -c LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs python /examples/keras_mn
root       539   536 99 11:46 ?        00:27:17 python /examples/keras_mnist_advanced.py
root       540   531  0 11:46 ?        00:00:00 sh -c LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs python /examples/keras_mn
root       541   538 99 11:46 ?        00:27:23 python /examples/keras_mnist_advanced.py
root       542   540 99 11:46 ?        00:27:18 python /examples/keras_mnist_advanced.py
root      1119    14  0 11:57 ?        00:00:00 sshd: root@pts/4
root      1120  1119  0 11:57 pts/4    00:00:00 -bash
root      1126  1120  0 11:57 pts/4    00:00:00 ps -ef
root@openmpi-worker-0:~#
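
For anyone comparing per-worker `ps -ef` output like the above, here is a minimal Python sketch (not part of the original report) that counts how many training processes each worker is running. The function name `count_training_processes` and the two sample strings are assumptions made for illustration; the sample lines are copied from the output above, truncation included.

```python
# Sample `ps -ef` output, abbreviated from the listings above.
WORKER_0_PS = """
UID        PID  PPID  C STIME TTY          TIME CMD
root       531   530  0 11:46 ?        00:00:00 /usr/local/bin/orted -mca ess env -mca ess_base_jobid 924975104 -mca ess_base_vpid 1 -mca ess_base_num_procs 5 -mca
root       535   531  0 11:46 ?        00:00:00 sh -c LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs python /examples/keras_mn
root       537   535 99 11:46 ?        00:27:23 python /examples/keras_mnist_advanced.py
root       539   536 99 11:46 ?        00:27:17 python /examples/keras_mnist_advanced.py
root       541   538 99 11:46 ?        00:27:23 python /examples/keras_mnist_advanced.py
root       542   540 99 11:46 ?        00:27:18 python /examples/keras_mnist_advanced.py
""".strip()

WORKER_1_PS = """
UID        PID  PPID  C STIME TTY          TIME CMD
root        14     1  0 08:09 ?        00:00:00 /usr/sbin/sshd -D -e -f /kubeflow/openmpi/assets/sshd_config
root        29    28  0 11:47 ?        00:00:00 /usr/local/bin/orted -mca ess env -mca ess_base_jobid 924975104 -mca ess_base_vpid 2 -mca ess_base_num_procs 5 -mca
""".strip()


def count_training_processes(ps_output, script="keras_mnist_advanced.py"):
    """Count rows whose command is a direct `python <script>` invocation.

    The `sh -c ... python ...` wrapper rows are ignored because their
    command starts with `sh`, not `python`.
    """
    count = 0
    for line in ps_output.splitlines()[1:]:  # skip the UID/PID header row
        fields = line.split(None, 7)  # CMD is the 8th column and may contain spaces
        if len(fields) < 8:
            continue
        cmd = fields[7].split()
        if len(cmd) >= 2 and cmd[0].startswith("python") and cmd[1].endswith(script):
            count += 1
    return count


if __name__ == "__main__":
    print("worker-0:", count_training_processes(WORKER_0_PS))  # prints "worker-0: 4"
    print("worker-1:", count_training_processes(WORKER_1_PS))  # prints "worker-1: 0"
```

On the sample above this reports four training processes on worker-0 and none on worker-1, which matches the listings: worker-1 runs only `orted`.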


zuowang commented Apr 25, 2018

Not an issue.

zuowang closed this as completed Apr 25, 2018
yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021