ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object #2459
Comments
Hi, may I ask how you load the model? In my case, with a single GPU, I also had that problem and I had to use
It works, thanks. @aliozts
Disabling Exllama makes inference much slower. See AutoGPTQ/AutoGPTQ#406 for how to enable Exllama.
Why can't I run it on the GPU? I have an NVIDIA GeForce MX450.
I have an NVIDIA GTX 1650 and still get the same error.
Adding "disable_exllama": true under quantization_config in config.json solves the problem.
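A minimal sketch of that edit, assuming the model directory contains a standard config.json with a quantization_config section (the helper name and path argument are illustrative, not part of any library):

```python
import json

def set_disable_exllama(config_path: str) -> None:
    """Add "disable_exllama": true under quantization_config in a
    model's config.json, so the GPTQ loader skips the Exllama kernels
    (which require every module to live on the GPU)."""
    with open(config_path) as f:
        config = json.load(f)
    # Preserve any existing quantization settings; only add the flag.
    config.setdefault("quantization_config", {})["disable_exllama"] = True
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```

Note this trades the error for slower inference, as mentioned above; moving all modules onto the GPU keeps the Exllama backend usable.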
I ran into a similar problem running GPTQ in a Docker container. GPU: 1660 Ti. SOLUTION: for me, I fixed the
When trying to load quantized models I always get:

ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object