bloom-176b CUDA out of memory on 8* A100 80g #17

Open

Niko-zyf opened this issue Jun 26, 2023 · 3 comments

@Niko-zyf

Thanks for your work on supporting the bloom model. I have already added the --parallel or --auto_parallel argument to my script, but I still can't compute AWQ on my 8x A100 80GB server.
python -m awq.entry_new_lambada --model_path $model_path/$MODEL \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/$MODEL-w4-g128.pt --parallel

How can I fix this problem?

@Sakits (Collaborator) commented Jun 27, 2023

Hi @415905716,
We have added CPU offloading support for run_awq in the dev/more_models branch. Now you should be able to run AWQ for bloom-176b on a single A100. You're welcome to try it out, and feel free to bring up any issues you might encounter!

Thanks for your interest in our work!
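
(For context: a minimal sketch of what CPU offloading looks like with Hugging Face transformers/accelerate, assuming the dev/more_models branch relies on a similar mechanism; the checkpoint name, memory budgets, and offload folder below are illustrative placeholders, not the actual run_awq code.)

```python
# Illustrative only: weights that do not fit in the GPU budget are kept in CPU RAM
# (or on disk) and streamed to the GPU when the corresponding layers run.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",                          # bloom-176b checkpoint (placeholder)
    torch_dtype=torch.float16,
    device_map="auto",                           # let accelerate place layers across GPU/CPU
    max_memory={0: "70GiB", "cpu": "400GiB"},    # assumed budgets: cap GPU 0, spill the rest to CPU RAM
    offload_folder="offload",                    # optional disk offload for anything that still doesn't fit
)
```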

@Niko-zyf (Author)

I appreciate it so much. I pulled the latest code from the dev/more_models branch and ran bloom-176b successfully. However, I noticed that only cuda:0 is used now, even though I have already enabled the --parallel argument. Did I miss some argument, or can AWQ only run on a single GPU?

@abhinavkulkarni

@415905716: You may want to pull changes from my PR: #22

You can use the --max_memory argument to specify which parts of your model should be loaded on which GPUs and the CPU.
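
(For context: the exact --max_memory syntax accepted by PR #22 isn't shown here, but the underlying idea in accelerate is a per-device memory budget that gets turned into a device map; the budgets and checkpoint name in this sketch are placeholders.)

```python
# Illustrative only: build a device map from per-device memory budgets without
# materializing any weights, then inspect which layers land on which device.
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigscience/bloom")     # placeholder checkpoint
with init_empty_weights():                                   # model skeleton, no weight allocation
    model = AutoModelForCausalLM.from_config(config)

# Assumed budgets: 8 GPUs at ~70GiB each, overflow goes to CPU RAM.
max_memory = {i: "70GiB" for i in range(8)}
max_memory["cpu"] = "200GiB"

device_map = infer_auto_device_map(model, max_memory=max_memory, dtype=torch.float16)
print(device_map)                                            # shows which layers land on which device
```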
