Thanks for your work on supporting the BLOOM model. I have already added the --parallel or --auto_parallel argument to my script, but I still can't compute AWQ on my 8x A100 80GB server.
python -m awq.entry_new_lambada --model_path $model_path/$MODEL \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/$MODEL-w4-g128.pt --parallel
How can I fix this problem?
Hi @415905716,
We have added CPU offloading support for run_awq in the dev/more_models branch. You should now be able to run AWQ for bloom-176b on a single A100. You're welcome to try it out, and feel free to bring up any issues you encounter!
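For context, CPU offloading here generally means keeping the full model in host memory and streaming one transformer block at a time onto a single GPU during the per-layer scale search, so even a 176B model fits in 80GB. The snippet below is only a minimal sketch of that idea, not the actual llm-awq implementation; get_blocks, search_awq_scales, and apply_scales are hypothetical placeholders for illustration.

```python
# Illustrative sketch of layer-by-layer CPU offloading during an AWQ-style scale search.
# NOT the llm-awq code: get_blocks / search_awq_scales / apply_scales are placeholders.
import torch

@torch.no_grad()
def run_awq_with_offloading(model, hidden, device="cuda:0"):
    """Keep the whole model on CPU and move one decoder block at a time to the GPU."""
    model.cpu()

    for block in get_blocks(model):          # placeholder: iterate over decoder blocks
        block.to(device)                     # only this block occupies GPU memory
        hidden = hidden.to(device)

        scales = search_awq_scales(block, hidden)  # placeholder: per-channel scale search
        apply_scales(block, scales)                # placeholder: fold scales into weights

        hidden = block(hidden)[0]            # forward pass feeds the next block
        block.cpu()                          # release GPU memory before the next block
        torch.cuda.empty_cache()

    return model
```

Because each block is processed independently, peak GPU memory is bounded by a single block plus the calibration activations, which is why a single A100 suffices.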
I appreciate it so much. I pulled the latest code from the dev/more_models branch and ran bloom-176b successfully. However, I noticed that only cuda:0 is used now, even though I passed the --parallel argument. Did I miss some argument, or can AWQ only run on a single GPU?