AMD quantize #6
Nothing is generated in the model folder? Can you provide more details on what's being printed?
I can run inference:
but when I try to quantize, the message is:
Trying GPTQ:
Contents of my folder: /checkpoints/meta-llama/Llama-2-7b-chat-hf> ls
pip list
python --version
The performance here is a lot lower than I'd expect. What GPU are you using? As for the quantization note, perhaps you're running out of CPU memory at some point during the process? I don't see any reason why the quantization script would stop in the middle.
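The out-of-memory hypothesis can be sanity-checked with a rough back-of-envelope estimate (a sketch, not from the project's code; it assumes the quantize step loads the full fp16/bf16 checkpoint into CPU RAM at 2 bytes per parameter):

```python
# Rough sketch: quantization first loads the full-precision checkpoint into
# CPU RAM, so a 7B fp16 model needs roughly 14 GB for the weights alone,
# before any working buffers GPTQ allocates on top.
def fp16_weight_gb(n_params: float) -> float:
    """Approximate RAM in GB to hold fp16 weights: 2 bytes per parameter."""
    return n_params * 2 / 1e9

print(f"Llama-2-7B weights: ~{fp16_weight_gb(7e9):.0f} GB")  # ~14 GB
```

On a desktop with 16 GB of RAM (typical for a Ryzen 5600G build), that leaves very little headroom, which is consistent with the process dying partway through.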
I am using the iGPU from a Ryzen 5600G CPU. Yes, to quantize I need more memory. Thanks.
Trying to quantize, but no model is generated.
My hardware is AMD.
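A quick way to confirm whether the quantize step actually wrote anything is to list quantized checkpoint files in the model folder. This is a sketch; the `model_int*.pth` naming is an assumption based on common gpt-fast conventions, not verified against this setup:

```python
# Hypothetical helper: check the checkpoint folder for quantized outputs.
# An empty list after running quantization suggests the script died early
# (e.g. from running out of CPU memory) rather than finishing silently.
from pathlib import Path

def quantized_outputs(checkpoint_dir: str) -> list[str]:
    """Return the names of any model_int*.pth files in checkpoint_dir."""
    return sorted(p.name for p in Path(checkpoint_dir).glob("model_int*.pth"))
```

For example, `quantized_outputs("/checkpoints/meta-llama/Llama-2-7b-chat-hf")` returning `[]` after the quantize run would confirm that no quantized model was produced.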