dual GPU 8G/16G - CUDA error: out of memory with dolphin-mixtral #3460

Closed
sebastianlau opened this issue Apr 2, 2024 · 6 comments
Assignees
Labels
bug (Something isn't working) · gpu · nvidia (Issues relating to Nvidia GPUs and CUDA) · windows

Comments

@sebastianlau

What is the issue?

Ollama crashes entirely: it throws the error below, then terminates the process.

CUDA error: out of memory
  current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:445
  cudaMalloc((void **) &ptr, look_ahead_size)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"

What did you expect to see?

Output (any)

Steps to reproduce

  1. Start Ollama / Navigate to Open WebUI
  2. Enter any text

Notes:

  • used dolphin-mixtral as the model
  • CUDA_VISIBLE_DEVICES was used to set the GPU order (16GB, 8GB); see the sketch below
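
For illustration, a minimal sketch of pinning the GPU order before the server starts. The "1,0" ordering is an assumption, not taken from the report; the actual indices depend on how the driver enumerates the cards:

```python
import os
import subprocess

# Copy the current environment and pin the GPU enumeration order before
# the server process starts. "1,0" is a hypothetical example that would
# expose the 16GB P100 as CUDA device 0 and the 8GB GTX 1080 as device 1.
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "1,0"

# Launch the Ollama server with the pinned device order.
subprocess.run(["ollama", "serve"], env=env)
```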

Are there any recent changes that introduced the issue?

Updated from 0.1.29 to 0.1.30 (reverting to 0.1.29 fixed it).

OS

Windows

Architecture

amd64

Platform

No response

Ollama version

0.1.30

GPU

Nvidia

GPU info

GPU 0: NVIDIA GeForce GTX 1080 (8GB)
GPU 1: Tesla P100-PCIE-16GB

CPU

AMD

Other software

Windows Server 2022 Standard x64 21H2

@sebastianlau sebastianlau added the bug, needs-triage labels Apr 2, 2024
@Zig1375

Zig1375 commented Apr 3, 2024

I encounter the same issue from time to time when num_ctx is set to 2048.
With num_ctx set to 4096 or higher, the error occurs consistently (Nvidia 4070 with 12GB of VRAM, 64GB of system RAM).
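
For anyone reproducing this, a minimal sketch of overriding num_ctx per request through Ollama's REST API (assuming a default local install listening on localhost:11434). A smaller context window reserves less VRAM for the KV cache, which matches the error appearing consistently at 4096 but only intermittently at 2048:

```python
import requests

# Override the context window for a single request via the "options" field.
# Smaller num_ctx -> smaller KV cache -> less VRAM pressure.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "dolphin-mixtral",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_ctx": 2048},
    },
)
print(resp.json()["response"])
```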

@FonzieBonzo

Same issue here; it sounds like someone is working on it.

@darkdev04

It was running well previously, but after some time it started to show the same error:

requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Then I checked the "C:\Users\<username>\AppData\Local\Ollama\server.log" file and found the following error at the end of the file:

CUDA error: out of memory
  current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:532
  cuMemSetAccess(pool_addr + pool_size, reserve_size, &access, 1)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"

Then I tried the following solution: modifying the values of num_ctx and num_gpu resolved it.

(screenshot: modified num_ctx and num_gpu settings)

But after this it consumes a lot of RAM, about 90% of my RAM! Still, it's running 👍

(screenshot: RAM usage near 90%)
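
A minimal sketch of baking that workaround into a model variant via Ollama's /api/create endpoint so every client picks it up; the parameter values and the "dolphin-mixtral-lowmem" name are examples, not the exact settings from the screenshots above. Lowering num_gpu offloads fewer layers to VRAM and runs the rest on the CPU, which would explain system RAM climbing while the crash goes away:

```python
import requests

# Example Modelfile: shrink the context window and reduce the number of
# layers offloaded to the GPU. Both values are illustrative assumptions.
modelfile = """FROM dolphin-mixtral
PARAMETER num_ctx 2048
PARAMETER num_gpu 20
"""

# Create a named variant that any client (e.g. Open WebUI) can select.
resp = requests.post(
    "http://localhost:11434/api/create",
    json={"name": "dolphin-mixtral-lowmem", "modelfile": modelfile, "stream": False},
)
print(resp.json())
```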

@pdevine pdevine added the nvidia, gpu labels and removed the needs-triage label Apr 12, 2024
@dhiltgen dhiltgen assigned mxyng and unassigned dhiltgen Apr 12, 2024
@dhiltgen dhiltgen assigned dhiltgen and unassigned mxyng Jun 1, 2024
@dhiltgen
Collaborator

dhiltgen commented Jun 1, 2024

I don't have a test environment to verify this asymmetry, but PR #4517 may fix this.

@dhiltgen dhiltgen changed the title CUDA error: out of memory with 0.1.30 dual GPU 8G/16G - CUDA error: out of memory with 0.1.30 Jun 1, 2024
@dhiltgen dhiltgen changed the title dual GPU 8G/16G - CUDA error: out of memory with 0.1.30 dual GPU 8G/16G - CUDA error: out of memory with dolphin-mixtral Jun 1, 2024
@Zig1375

Zig1375 commented Jun 2, 2024

It works fine on my side now; I haven't seen this error for a few weeks, maybe a month.

@sebastianlau
Author

sebastianlau commented Jun 3, 2024

I just checked, and it "seems" to work with WebUI 0.2.2 and ollama 0.1.41.
I say "seems" because (a) it was incredibly slow (at least 2x slower than on 0.1.29), and (b) the UI had issues (not sure whether that's the UI or the API): the title didn't update, and the response was only visible after navigating away and back (or refreshing).

7 participants