Advice on adding parallelism #1710
Comments
Hello, @gregfriedland! A workflow that is currently supported for remote computing is to
There's no current effort to tackle this, as far as I know. By the way, @gregfriedland, thank you for your interest in contributing, it is really appreciated :)
Thank you for the feature request! Yeah, a few people have already asked for parallel execution (like #755). It looks like quite a heavy feature at first glance, but we would be really happy to help with the implementation if you are interested: narrowing the scope, breaking this feature request down into smaller ones, and implementing them. Please let us know if you are ready to commit to this. Another thing that I'd like to clarify...
It seems like the method that @MrOutis proposed might actually work quite well for you: create a "run-branch" with no execution, execute it on a remote machine, and then merge it on a local one. It would be great to hear your opinion about this approach. Are there any issues that you see?
Thanks for your responses. The main issue I see with the suggested approach is that then I need to
Thanks for your offer to help break down the tasks. After thinking about it more, I think
Hello,
First off, thanks for your work on this project! It seems very close to what I need for a project I'm working on at my company to build an automated, reproducible, versioned pipeline for training ML models to be used in our production workflows.
The only thing missing is parallelism, which we need because our model training takes up to 36h on an instance, and we have many of these training jobs we'd like to run simultaneously. My thought is that the pipeline jobs could be launched remotely (e.g. using Kubernetes) and we'd wait for completion before continuing on to the next pipeline step. Thus, no local resources would be used and many pipelines (without interdependencies, of course) could be run in parallel from a single node.
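To make the idea concrete, here is a rough Python sketch of that launch-and-wait pattern. It is only an illustration under assumptions, not anything DVC provides: `launch_remote_job` and `is_finished` are hypothetical stand-ins for whatever would actually submit a job to a cluster (e.g. Kubernetes) and query its status, and the pipeline and step names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def launch_remote_job(step: str) -> dict:
    """Hypothetical stand-in for submitting a job to a remote cluster."""
    # A real version would call the cluster's API; here the "job" simply
    # counts as finished once a short simulated duration has passed.
    return {"step": step, "submitted_at": time.monotonic(), "duration": 0.05}


def is_finished(job: dict) -> bool:
    """Hypothetical status check; a real one would query the cluster."""
    return time.monotonic() - job["submitted_at"] >= job["duration"]


def run_pipeline(name: str, steps: list[str], poll_interval: float = 0.01) -> list[str]:
    """Run one pipeline: launch each step remotely and wait before the next."""
    completed = []
    for step in steps:
        job = launch_remote_job(f"{name}/{step}")
        while not is_finished(job):
            time.sleep(poll_interval)  # the local node only waits, it does no work
        completed.append(step)
    return completed


def run_all(pipelines: dict[str, list[str]], max_workers: int = 8) -> dict[str, list[str]]:
    """Run independent pipelines in parallel from a single node."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_pipeline, name, steps)
            for name, steps in pipelines.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    pipelines = {
        "model_a": ["prepare", "train"],
        "model_b": ["prepare", "train", "evaluate"],
    }
    print(run_all(pipelines))
```

Threads should be enough here because the local side spends its time waiting on remote jobs rather than computing, so each pipeline stays strictly sequential while separate pipelines overlap.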
I saw some discussion of this in other issues, but it doesn't seem like there's a remote branch that tackles this yet, so I wanted to ask A) whether this work has already been started, and B) whether you have guidance on how to put together a PR that you would accept.
Best,
Greg