[question] how to apply model parallelism to solve cuda memory error #1510
Comments
I have also tried examples/accelerate_configs/deepspeed_zero3.yaml.
With max_seq_length set to 4096 it trains in a model-parallel manner on 8 GPUs (A40, 48 GB), but increasing max_seq_length to a higher value, for example 10000, crashes with an OOM error.
My launching script:
Just changing max_seq_length from 4096 to 5000 or 5120, with no other changes, also causes an OOM error.
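For context, ZeRO-3 shards parameters, gradients, and optimizer states across GPUs, but each GPU still holds the activations for its own micro-batch, and activation memory grows with max_seq_length; that is consistent with 4096 fitting while 5120 or 10000 does not. Below is a minimal sketch (not the launch script from this thread) of the memory-saving knobs one would typically combine with a ZeRO-3 launch such as `accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml sft.py`. The model and dataset names are placeholders, and the exact argument names vary across TRL versions; this follows the API from around the time of this issue.

```python
# Minimal sketch: memory-saving settings for long-sequence SFT under ZeRO-3.
# Model/dataset names are placeholders, not the ones used in this thread.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder 7B checkpoint
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")  # placeholder data

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

training_args = TrainingArguments(
    output_dir="sft-output",
    per_device_train_batch_size=1,    # smallest micro-batch per GPU
    gradient_accumulation_steps=16,   # keep the effective batch size up
    gradient_checkpointing=True,      # trade compute for activation memory
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=4096,  # the value reported to fit; raising it grows activation memory
)
trainer.train()
```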
Besides training, model parallelism in the trl chat would be welcome too.
@yananchen1989 any suggestions here?
Any updates?
I went back to single-GPU training, especially for PPO. There are some issues when using multiple GPUs with deepspeed_zero3.yaml.
Hi team. I am using the SFT and PPO scripts to train my model (https://github.com/huggingface/trl/tree/main/examples/scripts).
Due to the long context length and the 7B model size, I am hitting CUDA out-of-memory errors on my single GPU.
Is there a straightforward way to use the multiple GPUs on my server to train the model with the SFT and PPO scripts,
such as splitting the model across GPUs (model parallelism)? Are there argument parameters I can pass directly into my training script?
Thanks a lot.
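For the "split the model across GPUs" part of the question, one common single-process approach is to load the model with `device_map="auto"` so that Accelerate places layers on the available GPUs (naive pipeline-style model parallelism), and then run the script with plain `python` rather than `accelerate launch`. A minimal sketch, assuming a placeholder 7B checkpoint and not the exact examples/scripts file referenced above:

```python
# Minimal sketch of "naive" model parallelism: Accelerate spreads the layers
# across all visible GPUs; activations then flow GPU to GPU during the forward pass.
# The model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder 7B checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",            # shard layers over the available GPUs
)
print(model.hf_device_map)        # shows which layers landed on which GPU
```

The resulting model can be passed to the SFT trainer as usual in a single process; note that only one GPU is active at a time with this layout, so it trades speed for fitting a model that would otherwise not fit, and PPO setups with multiple model copies may need additional care.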