model parallel #1193

Hello, I just successfully ran "summarize_rlhf/trlx_gptj_text_summarization.py", but I am not sure whether it is implemented with model parallelism or not. I have to run a very large GPT-3-scale model, so I need to split the huge model across several GPUs using model parallelism.

Comments
Maybe ask on the repo where you picked that example? I have no idea what the script "summarize_rlhf/trlx_gptj_text_summarization.py" is, so I can't really help.
Cool, man. I am so sorry, my mistake: I used trlx to train a chat BLOOM model, so I posted my issue in the wrong place. I successfully trained bloom-560m. But trlx uses Accelerate to train huge chat models, and I am just not sure whether Accelerate can use model parallelism to train a 300 GB chat BLOOMZ model.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I wondered the same thing when I was using trlx. I found the relevant section in the Accelerate documentation ("Handling big models for inference"). It seems that model parallelism is still only partially supported.
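For context, that documentation section describes loading a checkpoint with `device_map="auto"` so that Accelerate spreads the layers over the available devices. A minimal sketch, assuming a multi-GPU machine with `accelerate` installed; the checkpoint name and dtype are illustrative stand-ins, not taken from this thread:

```python
# Sketch of Accelerate's big-model loading, assuming multiple GPUs are
# available. bloomz-560m is a small stand-in; a 300 GB checkpoint would
# be sharded across devices the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloomz-560m"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",        # Accelerate assigns each layer to a device (spilling to CPU/disk if needed)
    torch_dtype=torch.float16,
)

print(model.hf_device_map)    # mapping of submodule name -> device it was placed on
```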
We have never claimed to support pipeline parallelism (where a schedule splits your batches into micro-batches and makes sure all GPUs work at the same time), only sequential model parallelism (where GPU 1 waits for GPU 0 to finish, and so forth). This is still quite fast if you batch your inputs together.
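To make that distinction concrete, here is a hedged sketch of sequential model parallelism with batched inputs, again assuming a multi-GPU machine and a small checkpoint as a stand-in; the prompts and generation parameters are illustrative:

```python
# Sketch of sequential model parallelism during generation: the whole batch
# flows through GPU 0's layers, then GPU 1's, and so on. GPUs take turns
# (there is no micro-batch schedule), but batching prompts together means
# each GPU does a full batch of work per turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloomz-560m"  # stand-in checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, device_map="auto", torch_dtype=torch.float16
)

prompts = [
    "Summarize: Accelerate places layers on different GPUs.",
    "Summarize: GPU 1 waits for GPU 0 to finish its layers.",
]
# Inputs go to the device holding the first layers (the embeddings).
inputs = tok(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tok.batch_decode(outputs, skip_special_tokens=True))
```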
Thanks for the detailed explanation 😄
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.