Add support for loading GPTQ models on CPU
Currently, GPTQ-quantized models can only be loaded on a CUDA device. The `load_gptq_on_cpu` flag adds support for loading GPTQ models on the CPU. The larger model variants are hard to load, run, or trace on the GPU, which is the rationale for adding this flag.

Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
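A minimal sketch of the device-selection logic such a flag implies: when `load_gptq_on_cpu` is set, loading targets the CPU regardless of GPU availability; otherwise CUDA is required as before. The helper function name and its signature are illustrative assumptions, not the project's actual API.

```python
def select_device(load_gptq_on_cpu: bool, cuda_available: bool) -> str:
    """Return the device a GPTQ model loader would target.

    Hypothetical helper: only the flag name `load_gptq_on_cpu`
    comes from the commit; everything else is a sketch.
    """
    if load_gptq_on_cpu:
        # Force CPU loading, e.g. for large variants that do not
        # fit on the GPU for loading/running/tracing.
        return "cpu"
    if cuda_available:
        return "cuda"
    # Previous behavior: GPTQ models required a CUDA device.
    raise RuntimeError(
        "GPTQ models require CUDA unless load_gptq_on_cpu is set"
    )
```

For example, `select_device(load_gptq_on_cpu=True, cuda_available=True)` returns `"cpu"`, overriding the default CUDA path.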