Would it make sense to use hivemind for distributed training/generation? #77
That would take quite a bit of work to onboard to the horde, but it's a promising thing. The problem is that the horde is asynchronous, so the latency could well be prohibitive, but I would be willing to consider it, especially if someone sends a PR.
I think the horde is primarily used for inference, not training. Do any jobs actually do training, or is that planned for the future? If not, it seems like this may provide limited benefit.
Speaking of using hivemind for distributed training/etc, just stumbled across the following on:
Doing some further googling/etc, it seems that the 'SD Training Labs' discord is: And things are being coordinated in the
It looks like the
A couple of snippets from skimming that discord channel
…ra-Org#77)

* Number of improvements to the main loop, improving performance:
  - job.exception() blocks until the job is done. Because of this, the main loop would always wait until all jobs were finished before executing the next iteration.
  - Because of this, the job > 180s code path wasn't reachable. This path still had some bugs that were fixed.
  - Added a queue in front of the running jobs. This way, we can already retrieve the next job while the previous one is still running, hiding the job-pop latency.
  - Log timestamps with microsecond precision + added debug logging for performance tuning.

  GPUs are actually not great at running multiple workloads at once; on a 3090 I see ~20% total throughput drop as soon as the second job starts. With these optimisations, it should be possible to run the worker with max_threads = 1 for optimal performance. Even for higher thread counts, the job.exception() block prevented these threads from getting the highest utilisation of the GPU, so even in that case performance should be significantly better.
* small bugfix
* Don't use the queue if queue_size = 0, make 0 the default. Use these defaults until we can prove that queue_size = 1 && max_threads = 1 is faster than queue_size = 0 && max_threads = 2.
* stylefix
* removed torch_gc
* removed torch_gc

Co-authored-by: Divided by Zer0 <mail@dbzer0.com>
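The core of that change is the prefetch queue: pop the next job from the horde while the GPU is still busy with the current one, instead of waiting on job.exception() before fetching more work. A minimal sketch of that pattern, with hypothetical `pop_job`/`run_job` helpers standing in for the worker's real calls (this is not the actual worker code):

```python
import queue
import threading

# Hypothetical stand-ins for the worker's real job-pop and inference calls;
# the names and signatures are illustrative, not the actual worker API.
def pop_job():
    """Fetch the next job from the horde API (network-bound)."""
    raise NotImplementedError

def run_job(job):
    """Run inference for one job on the GPU (GPU-bound)."""
    raise NotImplementedError

def main_loop(queue_size: int = 1):
    # A bounded queue in front of the running job: while the GPU is busy,
    # a background thread is already popping the next job, so the network
    # round-trip overlaps with inference instead of being serialised after it.
    prefetch: queue.Queue = queue.Queue(maxsize=max(queue_size, 1))

    def producer():
        while True:
            prefetch.put(pop_job())  # blocks once the queue is full

    threading.Thread(target=producer, daemon=True).start()

    while True:
        job = prefetch.get()  # the next job is usually already waiting here
        run_job(job)          # the GPU stays busy back-to-back
```

With queue_size = 0 this degenerates to the old behaviour (fetch, then run); the commit keeps 0 as the default until queue_size = 1 with a single thread is shown to beat two threads without a queue.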
Splitting this out from the unrelated issue:
Basically, I stumbled across the hivemind lib and thought that it could be a useful addition to AI-Horde. I'm not 100% sure how the current distributed process is implemented, but from a quick skim it looked like perhaps you had rolled your own.
Not sure if it's something you already considered and decided against, but wanted to bring it to your attention in case you hadn't seen it before.
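For reference, hivemind's documented approach to collaborative training is to join a peer-to-peer DHT and wrap a regular PyTorch optimizer in hivemind.Optimizer, so gradients are accumulated locally and averaged across peers. A minimal sketch along the lines of the library's quickstart; the run id, batch sizes, and the toy model are illustrative placeholders, not a proposal for how the horde would actually integrate it:

```python
import torch
import hivemind

# A tiny model purely for illustration.
model = torch.nn.Linear(16, 1)

# Join (or start) the peer-to-peer DHT that peers use to find each other.
# Passing no initial_peers starts a new, standalone swarm; other peers would
# normally pass the multiaddrs printed below.
dht = hivemind.DHT(start=True)
print("This peer is reachable at:", dht.get_visible_maddrs())

# Wrap a normal PyTorch optimizer. Peers average with each other once the
# swarm as a whole has processed target_batch_size samples; the numbers
# below are illustrative, not tuned values.
opt = hivemind.Optimizer(
    dht=dht,
    run_id="horde_experiment",  # placeholder run name
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    batch_size_per_step=32,     # samples this peer contributes per step
    target_batch_size=10_000,   # global batch size before peers average
    use_local_updates=True,
    verbose=True,
)

# The training loop itself stays ordinary PyTorch.
for _ in range(100):
    x = torch.randn(32, 16)
    loss = torch.nn.functional.mse_loss(model(x), torch.zeros(32, 1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Whether this fits the horde's asynchronous, pull-based job model is exactly the latency concern raised above; the point is only that the training loop itself would need very little change on the worker side.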