Utilize System RAM to Extend VRAM and Overcome VRAM Limitations #2208

Closed
msjsc001 opened this issue Mar 2, 2024 · 2 comments

msjsc001 commented Mar 2, 2024

Dear Jan.ai Development Team,

Greetings!

I have come across a significant limitation while using the Jan.ai platform that I believe affects a wide range of users: the VRAM capacity limits imposed when running AI models. Because VRAM is soldered to the graphics card and cannot be upgraded on its own, users who want more VRAM must replace the entire card. Given that consumer-grade graphics cards seldom come with more than 8 to 12 GB of VRAM, this is a substantial hurdle.

Considering how much easier it is to upgrade system RAM than VRAM, I would like to propose a feature that could offer a solution. Would it be possible for Jan.ai to use system RAM to supplement VRAM when it falls short? This could significantly improve the experience for users with less capable hardware, enabling a broader user base to run AI models efficiently without investing in a high-end GPU.

I am cognizant of the technical complexities that such a feature may entail, but the benefits it could yield for users without the means to invest in advanced hardware could be tremendous. Enabling system RAM to assist with VRAM limitations would not only enhance the accessibility of AI models on the platform but also underscore Jan.ai's commitment to user inclusivity.

Thank you for considering this suggestion. I eagerly await your response and am hopeful that this proposal will be met with positive consideration, ultimately leading to an even more versatile and accessible Jan.ai community.

Warm regards,
msjsc001

@pabbuzgar

I support this request. I've been testing the Jan.ai application these past few days with both CPU and GPU, and there's a clear, significant difference in favor of the GPU. However, when using the GPU, the models that can be run are limited by VRAM. Running models on the GPU is almost instantaneous and the machine barely heats up, whereas running them on the CPU is very slow and causes high usage and overheating. So I believe implementing the functionality msjsc001 describes could be an excellent advancement for this application. I'm not sure whether there's an easy way to configure this, for example via the "config.json" file (I believe that's what it's called). I've attempted to "modify" the VRAM in that file, but the program seems to restore it every time it's opened. There might be another straightforward way to solve this.
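
For reference, llama.cpp (the engine the maintainer reply below refers to) already exposes a closely related knob: n_gpu_layers, which offloads only the first N transformer layers to VRAM and keeps the rest in system RAM, running them on the CPU. Below is a minimal sketch using the llama-cpp-python bindings; the model path and layer count are placeholders for illustration, and this shows the underlying mechanism rather than Jan's own configuration:

```python
# Minimal sketch of llama.cpp's partial GPU offload via llama-cpp-python.
# The model path and layer count are placeholders, not Jan settings.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # placeholder: any local GGUF model
    n_gpu_layers=20,          # offload 20 layers to VRAM; the remaining
                              # layers stay in system RAM and run on the CPU
    n_ctx=2048,               # context window size
)

out = llm("What does partial GPU offload do?", max_tokens=64)
print(out["choices"][0]["text"])
```

The tradeoff is what the closing comment below gets at: layers left in system RAM run on the CPU, so generation slows down as fewer layers fit in VRAM.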

0xSage (Contributor) commented Jun 11, 2024

We'll only support this if llamacpp has it. Though IIRC, it's not fast in practice.

0xSage closed this as completed Jun 11, 2024