Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I noticed that when the quantization config is not None and is_deepspeed_zero3_enabled() is True, the device map is set to 'cpu', so the quantization process runs on the CPU.
Why is this the case, if the quantization can be run on the GPUs?
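To make the reported branch concrete, here is a minimal sketch of the device-map selection being described. This is not the actual transformers source; the helper name and signature are hypothetical, and only the condition itself (quantization config present + ZeRO-3 enabled forces 'cpu') is taken from the report above.

```python
# Hypothetical, simplified sketch of the reported behavior -- NOT the
# actual transformers implementation.
def pick_device_map(quantization_config, zero3_enabled, cuda_available):
    """Return the device map under the branch described in the issue."""
    if quantization_config is not None and zero3_enabled:
        # Reported behavior: with a quantization config under DeepSpeed
        # ZeRO-3, the device map falls back to CPU, so weights are
        # quantized on the CPU.
        return "cpu"
    if cuda_available:
        # What the reporter expects to be possible: quantize on GPU.
        return "cuda:0"
    return "cpu"
```

Under this sketch, `pick_device_map(cfg, True, True)` returns `"cpu"` even though a GPU is available, which is exactly the behavior being questioned.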
Expected behavior
--
The GPU memory usage during from_pretrained stays very low, as the quantization process is running on the CPU.
The same happens with other quantization methods, such as EETQ and AWQ.
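One way to check the observation above is to sample device memory in a background thread while from_pretrained runs. The helper below is a generic, hypothetical sketch: `read_mem` is any callable returning current device memory in bytes (for example torch.cuda.memory_allocated, assuming a CUDA-enabled torch install). If the samples stay near zero for the whole load, the quantization work is not landing on the GPU.

```python
import threading
import time


def sample_while(fn, read_mem, interval=0.05):
    """Run fn() while polling read_mem() in a background thread.

    Returns (fn's result, list of memory samples).  Hypothetical helper
    for observing GPU memory during model loading; pass e.g.
    torch.cuda.memory_allocated as read_mem (assumption: torch present).
    """
    samples, stop = [], threading.Event()

    def poll():
        # Record one sample per interval until fn() finishes.
        while not stop.is_set():
            samples.append(read_mem())
            time.sleep(interval)

    t = threading.Thread(target=poll)
    t.start()
    try:
        result = fn()
    finally:
        stop.set()
        t.join()
    return result, samples
```

Usage would look like `model, mem = sample_while(lambda: AutoModelForCausalLM.from_pretrained(model_id, quantization_config=cfg), torch.cuda.memory_allocated)`; consistently near-zero `mem` values during the load would match the comment above.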
System Info

transformers version: 4.41.0.dev0

Who can help?
@SunMarc and @younesbelkada