Question: What is the recommended way for Data Scientists to run a distributed training job #1535

mChowdhury-91 · 2022-02-14T17:56:20Z

The Data Scientists do not have access to the K8s cluster, so they cannot use commands like `kubectl create -f -n kubeflow`. In that case, what is the recommended way to run distributed training jobs?
Is Kubeflow Pipelines the right approach, or Fairing?

johnugeorge · 2022-02-15T15:24:46Z

You can use the Python SDK for it: https://github.com/kubeflow/training-operator/tree/master/sdk/python

mChowdhury-91 · 2022-02-16T05:53:47Z

@johnugeorge How do we run an MPIJob using the Python SDK? Is there an API that takes the MPIJob YAML file?
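One way to submit an existing job YAML (MPIJob, TFJob, PyTorchJob) from Python without `kubectl` is sketched below. It uses the generic `kubernetes` client's `CustomObjectsApi` rather than the training-operator SDK's job-specific helpers, and it assumes that `kubernetes` and `PyYAML` are installed and that the caller's credentials (for example, a Kubeflow notebook's service account) may create that custom resource in the target namespace. The helper names `crd_coordinates` and `submit_manifest` are hypothetical, and deriving the plural by lowercasing the kind and adding "s" is an assumption that holds for these job kinds but is not a general Kubernetes rule.

```python
"""Submit a training-operator job manifest (e.g. an MPIJob YAML) from Python.

Sketch only: assumes the `kubernetes` and `PyYAML` packages are installed and
that the caller is allowed to create the custom resource in `namespace`.
"""
from typing import Any, Dict, Tuple


def crd_coordinates(manifest: Dict[str, Any]) -> Tuple[str, str, str]:
    """Split apiVersion into (group, version) and derive the plural resource name.

    Assumption: plural == lowercased kind + "s" (true for TFJob, PyTorchJob,
    MPIJob; not a general Kubernetes rule).
    """
    group, _, version = manifest["apiVersion"].partition("/")
    if not version:
        raise ValueError("expected a CRD apiVersion such as 'kubeflow.org/v1'")
    plural = manifest["kind"].lower() + "s"
    return group, version, plural


def submit_manifest(path: str, namespace: str) -> Dict[str, Any]:
    """Load a job YAML file and create it as a custom object in `namespace`."""
    import yaml  # PyYAML, imported lazily so the helper above has no deps
    from kubernetes import client, config

    with open(path) as f:
        manifest = yaml.safe_load(f)
    # Override any namespace hard-coded in the example YAML.
    manifest.setdefault("metadata", {})["namespace"] = namespace
    group, version, plural = crd_coordinates(manifest)

    try:
        config.load_incluster_config()  # e.g. running inside a notebook pod
    except config.ConfigException:
        config.load_kube_config()  # fall back to a local kubeconfig
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group=group,
        version=version,
        namespace=namespace,
        plural=plural,
        body=manifest,
    )
```

For example, `submit_manifest("pytorch_job_mnist_mpi.yaml", "my-profile")` would create the job in a hypothetical profile namespace `my-profile`; the group and version come from the file's own `apiVersion`, so the same call works for whichever operator version the YAML targets.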

mChowdhury-91 · 2022-02-23T16:11:39Z

@johnugeorge Can we directly provide the configuration yaml file https://github.com/kubeflow/training-operator/blob/master/examples/tensorflow/dist-mnist/tf_job_mnist.yaml or https://github.com/kubeflow/training-operator/blob/master/examples/pytorch/mnist/v1/pytorch_job_mnist_mpi.yaml to trigger the training via a Kubeflow pipeline?

johnugeorge · 2022-02-24T10:56:32Z

You can take a look at https://github.com/kubeflow/katib/blob/master/examples/v1beta1/kubeflow-pipelines/kubeflow-e2e-mnist.ipynb
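To use one of the example YAML files directly in a pipeline, one option (not necessarily what the linked notebook does) is to wrap the parsed manifest in a resource step. The sketch below assumes the KFP v1 SDK (`kfp<2`, where `dsl.ResourceOp` exists) and that the pipeline's service account may create training jobs. The success/failure conditions are an assumption about the job's status layout (`status.replicaStatuses.<ReplicaType>`); check them against the operator version you run. `status_conditions` and `build_pipeline` are hypothetical names.

```python
"""Wrap an existing training-operator job manifest in a KFP v1 pipeline step.

Sketch only: assumes kfp<2 (dsl.ResourceOp) is installed. The status paths used
for success/failure are assumptions about the operator's status fields.
"""
from typing import Any, Callable, Dict, Tuple


def status_conditions(replica_type: str) -> Tuple[str, str]:
    """Return (success, failure) conditions for an Argo resource template.

    Assumption: per-replica counts are reported under
    status.replicaStatuses.<ReplicaType>.{succeeded,failed}.
    """
    prefix = f"status.replicaStatuses.{replica_type}"
    return f"{prefix}.succeeded > 0", f"{prefix}.failed > 0"


def build_pipeline(manifest: Dict[str, Any], replica_type: str) -> Callable:
    """Build a one-step pipeline that creates `manifest` and waits on it."""
    from kfp import dsl  # KFP v1 SDK, imported lazily

    success, failure = status_conditions(replica_type)

    @dsl.pipeline(name="distributed-training", description="Launch a training job")
    def pipeline():
        # Argo creates the resource, then polls it until one condition matches.
        dsl.ResourceOp(
            name="launch-training-job",
            k8s_resource=manifest,
            action="create",
            success_condition=success,
            failure_condition=failure,
        )

    return pipeline
```

Usage would be to load a YAML such as `pytorch_job_mnist_mpi.yaml` with `yaml.safe_load`, then compile with `kfp.compiler.Compiler().compile(build_pipeline(manifest, "Master"), "train_pipeline.yaml")` and upload it through the Pipelines UI, so Data Scientists never need direct `kubectl` access. For an MPIJob the replica type to watch would presumably be `"Launcher"`. If the YAML hard-codes `metadata.namespace`, that namespace must be one the pipeline is allowed to write to.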

mChowdhury-91 closed this as completed Feb 27, 2022
