torch.cuda.OutOfMemoryError: CUDA out of memory #6
Comments
For me, the …
Oh ok, yeah, that would make sense. Do you think there might be a way to run the script so that it uses less VRAM?
You can try the 8-bit (lower-precision) model: https://github.com/tloen/llama-int8
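For reference, a minimal sketch of the 8-bit idea using Hugging Face Transformers with bitsandbytes (this is not llama-int8's exact code path; the checkpoint path below is hypothetical and assumes a LLaMA model already converted to the HF format):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-7b-hf"  # hypothetical path to an HF-format checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # let accelerate place layers across GPU/CPU
)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))

Weights stored in int8 take roughly half the VRAM of fp16, which is what makes 7B borderline-feasible on smaller GPUs.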
I found https://github.com/qwopqwop200/GPTQ-for-LLaMa, which quantizes the model to 4-bit, and I can run the benchmark in that repo. But the code there uses Hugging Face Transformers, and it seems that at least the model loading is different.
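To illustrate the loading difference mentioned above (paths are hypothetical): the original facebookresearch/llama code loads raw consolidated .pth shards with torch.load, while Transformers-based repos such as GPTQ-for-LLaMa expect a checkpoint converted to the HF format:

import torch

# Original LLaMA style: raw consolidated shard(s), loaded manually.
ckpt = torch.load("./llama-dl/7B/consolidated.00.pth", map_location="cpu")

# Hugging Face Transformers style: needs a converted checkpoint directory.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./llama-7b-hf")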
I changed line 72 at commit 450d686 …
Thanks for the tips!
Hello, I am using an NVIDIA GTX 1650 4 GB GPU. Is there any way to run the 7B model on it?
Sure. I am still working on it... so that we can run 13B on a 4GB GPU. |
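As a rough illustration of how that could work in principle (a toy sketch, not this repo's actual implementation): keep the weights in CPU RAM and stream one layer at a time through the GPU, so peak VRAM is bounded by a single layer rather than the whole model.

import torch
import torch.nn as nn

# Toy stand-in for a transformer: a stack of large linear layers on the CPU.
layers = nn.ModuleList(nn.Linear(4096, 4096) for _ in range(8))

def forward_offloaded(x):
    x = x.cuda()
    for layer in layers:
        layer.cuda()   # stream this layer's weights onto the GPU
        x = layer(x)
        layer.cpu()    # move them back so the next layer can reuse the VRAM
    torch.cuda.empty_cache()  # release cached blocks back to the driver
    return x.cpu()

out = forward_offloaded(torch.randn(1, 4096))

The trade-off is speed: the CPU-to-GPU transfers dominate, so this runs far slower than keeping the whole model resident.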
Thanks for making this repo! I was looking to run this on my own hardware and this is helping me do just that.
I first tried to run inference with Facebook's own instructions, but I was getting a memory error. I tried a few other modifications, but they did not work either.
Finally, I came to this repository to try and fix my problem. I'm still getting the same error, however.
Error: torch.cuda.OutOfMemoryError: CUDA out of memory
I have been consistently seeing this error every time I've tried to run inference, and I'm not sure how to fix it.
I run the inference with this command:
python inference.py --ckpt_dir ./llama-dl/7B --tokenizer_path ./llama-dl/tokenizer.model
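In borderline cases it can be worth trying PyTorch's allocator knob, which the OOM message itself points to; this only mitigates fragmentation and won't make a model fit that is simply too large for the card (the value here is just an example):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py --ckpt_dir ./llama-dl/7B --tokenizer_path ./llama-dl/tokenizer.model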
My Specs: …
With these specs it seems I should be able to run this version of inference, but it still does not work.
Before running the program I ran the free command: … So I definitely have more than the 8 GB of RAM shown in the README.
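Note that free reports system RAM, while the error above is about GPU VRAM, which is a separate budget; a quick way to check the GPU side from PyTorch:

import torch

# Total VRAM on the first GPU vs. what PyTorch has currently allocated.
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, total VRAM: {props.total_memory / 2**30:.1f} GiB")
print(f"allocated: {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")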
I would really appreciate your help, thanks!