Small VRAM System: Trimming VRAM allocation LocalAI + llama.cpp #9936
MattMalone
started this conversation in
General
OK, so I have been running models on my 6G 1060 video card for some time using Hugging Face, Python, PyTorch, and CUDA. I decided to move to LocalAI in hopes of getting an OpenAI-style web interface that I could call for further development. I have been stymied for days. Regardless of the settings I have been recommended to try, for the particular very small model I am attempting, the process always reports (DEBUG=true):
The model is 2.58G in size; it is tiny.
Nowhere in my YAML file have I ever set n_gpu_layers to be 99999999.
I have gradually added context_size, gpu_layers, no-mmap, mmap=false, everything Google is telling me to do to get rid of the memory allocation error. I even have gpu_layers=0 now, so it should run in CPU memory, without any CUDA allocation at all.
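For reference, a minimal sketch of the kind of model YAML described above. The model name, file name, and context_size value are placeholders for illustration, not my actual values, and the key names are the ones mentioned in this post, so they should be checked against the LocalAI documentation for the version in use.

```yaml
# Hypothetical LocalAI model config sketch; name and model path are placeholders.
name: small-model            # placeholder model name
context_size: 2048           # illustrative value only
gpu_layers: 0                # 0 is intended to keep every layer on the CPU
mmap: false                  # the mmap=false setting mentioned above
parameters:
  model: small-model.gguf    # placeholder file name for the 2.58G model
```

With gpu_layers set to 0 in a file like this, I would not expect any layers to be offloaded to the GPU.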
From the beginning, and completely unchanged in every attempt, there is an attempt to allocate 8G on a 6G card, as if 8G were a fixed default minimum. The allocation fails and the LLM does not load. To be clear, the LLM does not require 8G to run. It should run passably on 4G, and be able to at least load on 3G. But LocalAI + llama.cpp refuse to allocate less than 8G.
I have exhausted the YAML settings Google is familiar with for LocalAI. Is there any other setting that would make LocalAI attempt to allocate less than 8G?
I struggle to understand how no one has encountered this problem. There must be a few people trying to work with cards smaller than 10-12G (where 8G might be allocated alongside OS usage of VRAM), but nothing comes up when I search this discussion group.