bloom-176b CUDA out of memory on 8* A100 80g #17

Open

Niko-zyf opened this issue Jun 26, 2023 · 3 comments

@Niko-zyf

Thanks for your work on supporting the bloom model. I have already added the --parallel or --auto_parallel argument to my script, but I still can't compute AWQ on my 8x A100 80GB server.
python -m awq.entry_new_lambada --model_path $model_path/$MODEL \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/$MODEL-w4-g128.pt --parallel

How can I fix this problem?

@Sakits (Collaborator) commented Jun 27, 2023

Hi @415905716,
We have added CPU offloading support for run_awq in the dev/more_models branch. Now you should be able to run AWQ for bloom-176b on a single A100. You're welcome to try it out, and feel free to bring up any issues you might encounter!

Thanks for your interest in our work!
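
(For context: a minimal sketch of what CPU offloading looks like with Hugging Face transformers/accelerate, assuming the dev/more_models branch relies on a similar mechanism; the checkpoint name, memory budgets, and offload folder below are illustrative placeholders, not the actual run_awq code.)

```python
# Illustrative only: weights that do not fit in the GPU budget are kept in CPU RAM
# (or on disk) and streamed to the GPU when the corresponding layers run.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",                          # bloom-176b checkpoint (placeholder)
    torch_dtype=torch.float16,
    device_map="auto",                           # let accelerate place layers across GPU/CPU
    max_memory={0: "70GiB", "cpu": "400GiB"},    # assumed budgets: cap GPU 0, spill the rest to CPU RAM
    offload_folder="offload",                    # optional disk offload for anything that still doesn't fit
)
```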

@Niko-zyf (Author)

I appreciate it so much. I pulled the latest code from the dev/more_models branch and ran bloom-176b successfully. However, I noticed that only cuda:0 is used now, even though I have already enabled the --parallel argument. Did I miss some argument, or can AWQ only run on a single GPU?

@abhinavkulkarni

@415905716: You may want to pull changes from my PR: #22

You can use the --max_memory argument to specify which parts of your model should be loaded on which GPUs and the CPU.
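
(For context: the exact --max_memory syntax accepted by PR #22 isn't shown here, but the underlying idea in accelerate is a per-device memory budget that gets turned into a device map; the budgets and checkpoint name in this sketch are placeholders.)

```python
# Illustrative only: build a device map from per-device memory budgets without
# materializing any weights, then inspect which layers land on which device.
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigscience/bloom")     # placeholder checkpoint
with init_empty_weights():                                   # model skeleton, no weight allocation
    model = AutoModelForCausalLM.from_config(config)

# Assumed budgets: 8 GPUs at ~70GiB each, overflow goes to CPU RAM.
max_memory = {i: "70GiB" for i in range(8)}
max_memory["cpu"] = "200GiB"

device_map = infer_auto_device_map(model, max_memory=max_memory, dtype=torch.float16)
print(device_map)                                            # shows which layers land on which device
```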
