This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

How to run distributed training from Kubeflow Pipelines SDK? #42

Open
marrrcin opened this issue Feb 24, 2020 · 3 comments

Comments

@marrrcin

The example linked in the README, https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/xgboost-dist, shows that spawning a distributed training job requires running kubectl. I want to run distributed XGBoost training as part of a bigger Kubeflow pipeline. How can I achieve this? Is there a way to spawn the distributed job from Python code itself, or from the Kubeflow Pipelines SDK?
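
For reference, a minimal sketch of what a direct-from-Python submission could look like, creating the XGBoostJob custom resource with the Kubernetes Python client. The `apiVersion`, the `xgbReplicaSpecs` field names, and the container image are assumptions taken from the sample manifest, not verified against the installed CRD:

```python
# A minimal sketch (not an official operator client): submit an XGBoostJob
# custom resource directly from Python with the Kubernetes client.
# apiVersion, group/version, field names, and the image are assumptions taken
# from the xgboost-dist sample manifest -- check them against your installed CRD.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster

def replica_spec(replicas: int) -> dict:
    # Shared pod template for Master/Worker replicas; the image name is hypothetical.
    return {
        "replicas": replicas,
        "restartPolicy": "Never",
        "template": {"spec": {"containers": [
            {"name": "xgboostjob", "image": "my-registry/xgboost-dist:latest"}
        ]}},
    }

xgboost_job = {
    "apiVersion": "xgboostjob.kubeflow.org/v1alpha1",  # assumed; match the sample YAML
    "kind": "XGBoostJob",
    "metadata": {"name": "xgboost-dist-train", "namespace": "kubeflow"},
    "spec": {"xgbReplicaSpecs": {"Master": replica_spec(1), "Worker": replica_spec(2)}},
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="xgboostjob.kubeflow.org",  # assumed; must match apiVersion above
    version="v1alpha1",
    namespace="kubeflow",
    plural="xgboostjobs",
    body=xgboost_job,
)
```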

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

| Label | Probability |
| --- | --- |
| question | 0.86 |


@terrytangyuan
Member

I think you can run XGBoostJob as part of Kubeflow Pipelines, similar to other Kubeflow operators, but I am not familiar enough with Kubeflow Pipelines to be sure. Try it out and let us know if you encounter any issues.
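
As an illustration of that suggestion, here is a hedged sketch using the KFP v1 SDK, where `dsl.ResourceOp` submits the XGBoostJob custom resource as one pipeline step. The manifest fields, the image, and the success/failure condition expressions are assumptions rather than a verified recipe:

```python
# A hedged sketch: dsl.ResourceOp creates the XGBoostJob custom resource as a
# pipeline step (KFP v1 SDK). Manifest fields, the image, and the condition
# expressions are assumptions -- adjust to the status your operator reports.
import kfp
from kfp import dsl


def xgboost_job_manifest(name: str) -> dict:
    # Same structure as the xgboost-dist sample; apiVersion/kind assumed to match the CRD.
    def replica_spec(replicas: int) -> dict:
        return {
            "replicas": replicas,
            "restartPolicy": "Never",
            "template": {"spec": {"containers": [
                {"name": "xgboostjob", "image": "my-registry/xgboost-dist:latest"}
            ]}},
        }

    return {
        "apiVersion": "xgboostjob.kubeflow.org/v1alpha1",
        "kind": "XGBoostJob",
        "metadata": {"name": name},
        "spec": {"xgbReplicaSpecs": {"Master": replica_spec(1), "Worker": replica_spec(2)}},
    }


@dsl.pipeline(
    name="xgboost-dist-pipeline",
    description="Run a distributed XGBoostJob as one pipeline step",
)
def xgboost_pipeline():
    dsl.ResourceOp(
        name="xgboost-train",
        k8s_resource=xgboost_job_manifest("xgboost-dist-train"),
        action="create",
        # Placeholder condition expressions; the exact syntax depends on your Argo version.
        success_condition='status.conditions.#(type=="Succeeded").status == True',
        failure_condition='status.conditions.#(type=="Failed").status == True',
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(xgboost_pipeline, "xgboost_pipeline.yaml")
```

The compiled package can then be uploaded through the Kubeflow Pipelines UI or submitted with `kfp.Client().create_run_from_pipeline_func`.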

@pingsutw
Member

I think it's related to kubeflow/pipelines#973.
