On an RTX 2080 Ti with 11 GB of VRAM, loading deepseek-r1:14b already takes usage to about 9 GB. Once the knowledge base's embedding model is loaded as well, VRAM is nearly full, and it becomes impossible to chat with the model or perform any other operations.
Could the embedding model and the chat model be kept in system RAM while idle, and loaded into VRAM only when they are actually needed for computation? This would be a great option for consumer users with small GPU memory but plenty of system RAM.
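For reference, this kind of on-demand swapping can be done manually in application code today. Below is a minimal PyTorch/Transformers sketch of the idea (the model name `BAAI/bge-m3` is just an illustrative placeholder, not what the knowledge base necessarily uses): the embedding model stays resident in system RAM and is moved into VRAM only for the duration of each call.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder embedding model for illustration; substitute your own.
EMBED_MODEL = "BAAI/bge-m3"

# Load once into system RAM (weights live on the CPU while idle).
tokenizer = AutoTokenizer.from_pretrained(EMBED_MODEL)
model = AutoModel.from_pretrained(EMBED_MODEL)
model.eval()

def embed(texts: list[str]) -> torch.Tensor:
    # Move the weights into VRAM only for this call.
    model.to("cuda")
    inputs = tokenizer(
        texts, padding=True, truncation=True, return_tensors="pt"
    ).to("cuda")
    with torch.no_grad():
        # CLS-token pooling, as used by BERT-style embedding models.
        out = model(**inputs).last_hidden_state[:, 0]
    # Move the weights back to system RAM and release the VRAM.
    model.to("cpu")
    torch.cuda.empty_cache()
    return out.cpu()
```

The trade-off is latency: copying the weights across the PCIe bus on every call is much slower than keeping the model resident in VRAM, but it frees the GPU for the chat model in between embedding requests.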