Regarding your second run, the DefaultCPUAllocator one:
Before a 4-bit GPTQ model can be loaded into GPU memory (VRAM), it must first be loaded into main RAM. There it usually takes up about 1.5 times as much space as it does on disk (because reasons). Look at alpaca-30b-4bit.pt, measure its size on disk and multiply it by 1.5: that’s how much free RAM you need before Python starts loading it. You might just not have this much. Happened to me before. I can’t even use 13B models with my 16 GB of RAM.
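If you want to check this before launching, here is a rough sketch of the arithmetic above. The function names are made up for illustration, the 1.5 factor is just the rule of thumb from this answer rather than a measured constant, and the free-RAM lookup relies on `os.sysconf` names that exist on Linux but may be missing elsewhere (in which case it returns `None`).

```python
import os

# Rule-of-thumb multiplier from the answer above; an estimate, not a measured value.
RAM_FACTOR = 1.5


def estimated_ram_bytes(model_path, factor=RAM_FACTOR):
    """Estimate the free RAM needed to load the checkpoint at model_path."""
    return int(os.path.getsize(model_path) * factor)


def available_ram_bytes():
    """Return currently available physical RAM in bytes, or None if unknown.

    Assumption: os.sysconf exposes SC_AVPHYS_PAGES and SC_PAGE_SIZE (true on Linux).
    """
    try:
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


def can_load(model_path, available=None, factor=RAM_FACTOR):
    """Return True if the estimated RAM need fits into the available RAM."""
    if available is None:
        available = available_ram_bytes()
    if available is None:
        raise RuntimeError("Could not determine available RAM on this platform")
    return estimated_ram_bytes(model_path, factor) <= available


if __name__ == "__main__":
    path = "alpaca-30b-4bit.pt"  # the file mentioned above; adjust to your model
    if os.path.exists(path):
        need_gib = estimated_ram_bytes(path) / 2**30
        print(f"Estimated RAM needed: {need_gib:.1f} GiB")
        print("Should fit" if can_load(path) else "Probably not enough free RAM")
```

If it reports that the model probably does not fit, closing other programs or adding swap may help, though I have only tested the estimate itself, not those workarounds.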
Another thing to keep in mind when working with GPTQ models is that they cannot be split between VRAM, RAM and disk cache, so the --auto-devices, --gpu-memory, --cpu-memory and --disk flags will not help here.

Answer selected by tr7zw