GPT4All v2.6.1 leaks VRAM when switching models #1840

Closed
lexisdude opened this issue Jan 15, 2024 · 9 comments

Labels
bug Something isn't working chat gpt4all-chat issues

Comments

@lexisdude

System Info

Windows 10 22H2, 128 GB RAM - AMD Ryzen 7 5700X 8-Core Processor / Nvidia GeForce RTX 3060

Information

  • The official example notebooks/scripts
  • My own modified scripts

Reproduction

  1. Load GPT4ALL
  2. Change dataset (e.g. to Wizard-Vicuna-13B-Uncensored.Q8_0)
  3. Crash
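
The same switch can be driven outside the chat UI. Here is a minimal sketch using the gpt4all Python bindings (pip install gpt4all), assuming the GGUF files are already in the default models directory; the file names below are placeholders for whatever models you have on hand.

```python
# Hypothetical repro sketch; the model file names are placeholders for
# GGUF files already present in your models directory.
from gpt4all import GPT4All

for name in ["mistral-7b-openorca.Q4_0.gguf",
             "wizard-vicuna-13b-uncensored.Q8_0.gguf"]:
    model = GPT4All(name, device="gpu")           # load the model onto the GPU
    print(model.generate("Hello", max_tokens=8))  # exercise it once
    del model                                     # drop it before the next load
```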

Expected behavior

All my datasets were working fine before the update. After it started crashing, I even tried deleting everything and performing a full (clean) install - and it still crashes when switching to any dataset other than the one it loads with.

@GeniusBroccoli

GeniusBroccoli commented Jan 15, 2024

Same here, clean install of everything doesn't help.
Windows 11 23H2, i5-12400, RTX 3060.

@cebtenzzre
Member

cebtenzzre commented Jan 16, 2024

Could you try a smaller model, e.g. a Q4_0 or Q4_K_S? And please provide a link to where you're downloading it from. This model loads fine for me on CPU, but on my server with a new enough GPU it runs out of memory.

And please confirm the version of GPT4All you are using - and the last known good version, if you know it.

@cebtenzzre cebtenzzre added the chat gpt4all-chat issues label Jan 16, 2024
@GeniusBroccoli

> Could you try a smaller model, e.g. a Q4_0 or Q4_K_S? And please provide a link to where you're downloading it from. This model loads fine for me on CPU, but on my server with a new enough GPU it runs out of memory.
>
> And please confirm the version of GPT4All you are using - and the last known good version, if you know it.

In my case 2.5.4 worked fine. I download models from the application; I used Mistral and Falcon (both GPU) and WizardLM (CPU). After updating 2.5.4 -> 2.6.1 I experience the problems that @lexisdude described.

@lexisdude
Author

lexisdude commented Jan 17, 2024

> Could you try a smaller model, e.g. a Q4_0 or Q4_K_S? And please provide a link to where you're downloading it from. This model loads fine for me on CPU, but on my server with a new enough GPU it runs out of memory.
>
> And please confirm the version of GPT4All you are using - and the last known good version, if you know it.

Hello

Let's ignore the example I gave and instead use Hermes or Snoozy, downloaded directly from the GPT4ALL application - they do the very same thing when I switch to them. Mistral is the dataset that loads by default when I first start the application. Regardless of what, or how many, datasets I have in the models directory, switching to any other dataset causes GPT4ALL to crash.

The 2.5.4 version of the application works fine with anything I load into it; the 2.6.1 version crashes almost instantaneously when I select any other dataset, regardless of its size.

I have 3 datasets that I downloaded directly from the GPT4ALL application; the rest of the datasets came from the "https://huggingface.co/TheBloke/" repository.

I would not say my rig is uber powerful or incapable of running out of memory, but it has no problems with the 2.5.4 version of GPT4ALL, and it has no issues with Stable Diffusion XL, even processing requests from a 160 GB SDXL dataset.

@cebtenzzre
Member

I see what's going on here. GPT4All eventually runs out of VRAM if you switch models enough times, due to a memory leak. Using larger models on a GPU with less VRAM will exacerbate this, especially on an OS like Windows that tends to fragment VRAM (and we don't handle that as well as we should).
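
One way to make the leak visible is to load and drop the same model in a loop while polling used VRAM. Below is a sketch using the gpt4all Python bindings and NVML (pip install gpt4all nvidia-ml-py); the model file name is a placeholder. On an affected build, the used-VRAM figure climbs each cycle instead of returning to its baseline.

```python
# Sketch: repeatedly load and drop a model while polling used VRAM via NVML.
import pynvml
from gpt4all import GPT4All

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mib():
    # Used VRAM on GPU 0, in MiB
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used // (1024 * 1024)

print(f"baseline: {used_mib()} MiB")
for cycle in range(5):
    model = GPT4All("mistral-7b-openorca.Q4_0.gguf", device="gpu")  # placeholder
    del model  # the model's VRAM should be released here
    print(f"after unload {cycle + 1}: {used_mib()} MiB")  # climbs if leaking
```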

@cebtenzzre cebtenzzre added the bug Something isn't working label Jan 17, 2024
@cebtenzzre cebtenzzre changed the title Update today causes GPT4All to crash when loading any other model than the one it loads with (Mistral). GPT4All v2.6.1 leaks VRAM when switching models Jan 17, 2024
@lexisdude
Author

> I see what's going on here. GPT4All eventually runs out of VRAM if you switch models enough times, due to a memory leak. Using larger models on a GPU with less VRAM will exacerbate this, especially on an OS like Windows that tends to fragment VRAM (and we don't handle that as well as we should).

Awesome, hopefully a fix is coming soon, because there should be no doubt: I love using GPT4ALL. And Windows, for all intents and purposes, breaks more things than it fixes. But I didn't say that.

@Chris2000SP

Chris2000SP commented Jan 22, 2024

This is bad for the display server under Linux with only 1 GPU. I had to SysRq my way out because VRAM hit 100% when I didn't kill it fast enough. That's bad.
EDIT:
OK, sorry. It crashes the display server when more than 2 threads run on the GPU, not because VRAM is full. I'll open a separate issue for that.

@cebtenzzre
Member

This should finally be fixed as of eadc3b8

@cebtenzzre cebtenzzre added the awaiting-release issue is awaiting next release label Jan 31, 2024
@cebtenzzre
Member

Fixed in v2.6.2. It still seems like VRAM is leaked when model loading fails, but that's a separate issue.
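
A quick way to probe that remaining failure-path issue, assuming the gpt4all Python bindings and NVML as above; the oversized model name is a placeholder for any GGUF too large to fit in VRAM on the test GPU, and the exact failure behavior may vary by build.

```python
# Sketch: measure used VRAM before and after a load that is expected to fail
# (e.g. a model too large for the GPU). A nonzero delta afterwards would
# indicate VRAM still held by the failed load.
import pynvml
from gpt4all import GPT4All

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
before = pynvml.nvmlDeviceGetMemoryInfo(handle).used

try:
    # Placeholder: any GGUF that cannot fit in this GPU's VRAM
    GPT4All("some-model-too-big-for-this-gpu.Q8_0.gguf", device="gpu")
except Exception as exc:
    print("load failed:", exc)

after = pynvml.nvmlDeviceGetMemoryInfo(handle).used
print(f"VRAM delta after failed load: {(after - before) // (1024 * 1024)} MiB")
```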

@cebtenzzre cebtenzzre removed the awaiting-release issue is awaiting next release label Feb 1, 2024