More detailed explanation of multi-GPU #373

I was wondering if it would be possible to add more explanation to the README.md for the multi-GPU section: specifically, a brief description of which functions and which parts of training and inference are parallelized across multiple GPUs, and to what extent.

From my basic understanding, inference runs entirely on a single GPU, with all forward passes on one device. During training, the batches of data are split across multiple GPUs and the resulting losses and gradients are summed. Could you please clarify a little? Thanks a lot.

Comments
I think what you mentioned is correct.
Multi-GPU inference is not implemented yet; only training is supported. Let's hope we see tensor- and pipeline-parallel implementations for inference in the near future 😁. Let's stay updated on @karpathy's roadmap.
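For intuition, here is a minimal, hypothetical illustration of the idea behind tensor parallelism (plain C, not code from this repo): each "device" owns a column slice of a weight matrix and computes only its slice of the output, and concatenating the slices reproduces the single-device matmul.

```c
/* Hypothetical sketch of tensor parallelism: split a weight matrix
 * column-wise across N_DEV pretend devices; concatenated partial
 * outputs equal the full matmul. Not code from this repository. */
#include <stdio.h>

#define D_IN  4
#define D_OUT 6
#define N_DEV 2   /* pretend GPUs */

/* y[j] = sum_i x[i] * W[i][j], W stored row-major (d_in x d_out) */
static void matmul(const float *x, const float *W, float *y,
                   int d_in, int d_out) {
    for (int j = 0; j < d_out; j++) {
        y[j] = 0.0f;
        for (int i = 0; i < d_in; i++) y[j] += x[i] * W[i * d_out + j];
    }
}

int main(void) {
    float x[D_IN] = {1, 2, 3, 4};
    float W[D_IN * D_OUT];
    for (int i = 0; i < D_IN * D_OUT; i++) W[i] = (float)(i % 5) * 0.1f;

    /* Reference: full matmul on one device. */
    float y_full[D_OUT];
    matmul(x, W, y_full, D_IN, D_OUT);

    /* Tensor-parallel: each device owns D_OUT / N_DEV columns of W. */
    float y_tp[D_OUT];
    int cols = D_OUT / N_DEV;
    for (int dev = 0; dev < N_DEV; dev++) {
        float W_shard[D_IN * cols];
        for (int i = 0; i < D_IN; i++)
            for (int j = 0; j < cols; j++)
                W_shard[i * cols + j] = W[i * D_OUT + dev * cols + j];
        matmul(x, W_shard, &y_tp[dev * cols], D_IN, cols);
    }

    for (int j = 0; j < D_OUT; j++)
        printf("%f %f\n", y_full[j], y_tp[j]);  /* columns should match */
    return 0;
}
```

The per-device work shrinks with N_DEV, which is why tensor parallelism helps inference latency; the cost is a communication step to gather (or reduce) the slices, which real implementations overlap with compute.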
Absolutely! @karpathy is doing a lot of great stuff and I have learned a lot here. Training is distributed across processes as separate batches, which is more than enough for speedups in the training phase. But for inference parallelism, I was hoping for something similar to what is implemented in https://github.com/ggerganov/llama.cpp.
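As a sketch of the data-parallel training pattern described above — each GPU computes gradients on its own batch, then an all-reduce averages them before the optimizer step — here is a minimal single-process NCCL illustration. This is not the repo's actual code, and it assumes NCCL >= 2.10 (for `ncclAvg`).

```c
/* Minimal sketch of data-parallel gradient averaging with NCCL,
 * one process driving all visible GPUs. Hypothetical illustration.
 * Build (roughly): nvcc allreduce_sketch.c -lnccl -o allreduce_sketch */
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define MAX_GPUS 8

int main(void) {
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus > MAX_GPUS) ngpus = MAX_GPUS;

    int devs[MAX_GPUS];
    for (int i = 0; i < ngpus; i++) devs[i] = i;
    ncclComm_t comms[MAX_GPUS];
    ncclCommInitAll(comms, ngpus, devs);   /* one communicator per GPU */

    const size_t count = 1 << 20;          /* stand-in for gradient size */
    float *grads[MAX_GPUS];
    cudaStream_t streams[MAX_GPUS];
    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        cudaMalloc((void**)&grads[i], count * sizeof(float));
        cudaMemset(grads[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
        /* ...in real training, GPU i fills grads[i] from its own batch... */
    }

    /* Average gradients across GPUs so every replica takes the same
     * optimizer step. Group the calls since one thread drives all GPUs. */
    ncclGroupStart();
    for (int i = 0; i < ngpus; i++)
        ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclAvg,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(grads[i]);
        cudaStreamDestroy(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("averaged gradients across %d GPU(s)\n", ngpus);
    return 0;
}
```

In a multi-process setup (one process per GPU), the same all-reduce would be built with `ncclCommInitRank`, exchanging the unique NCCL ID between ranks over MPI or sockets instead of using `ncclCommInitAll`.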