dual GPU 8G/16G - CUDA error: out of memory with dolphin-mixtral #3460

Closed
sebastianlau opened this issue Apr 2, 2024 · 6 comments
Assignees
Labels
bug (Something isn't working) · gpu · nvidia (Issues relating to Nvidia GPUs and CUDA) · windows

Comments

@sebastianlau

What is the issue?

Ollama crashes entirely: it throws the error below, then terminates the process.

CUDA error: out of memory
  current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:445
  cudaMalloc((void **) &ptr, look_ahead_size)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"

What did you expect to see?

Output (any)

Steps to reproduce

  1. Start Ollama / Navigate to Open WebUI
  2. Enter any text

Notes:

  • used dolphin-mixtral as the model
  • CUDA_VISIBLE_DEVICES was used to set the GPU order (16GB, 8GB); see the sketch below
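
For illustration, a minimal sketch of pinning the GPU order before the server starts. The "1,0" ordering is an assumption, not taken from the report; the actual indices depend on how the driver enumerates the cards:

```python
import os
import subprocess

# Copy the current environment and pin the GPU enumeration order before
# the server process starts. "1,0" is a hypothetical example that would
# expose the 16GB P100 as CUDA device 0 and the 8GB GTX 1080 as device 1.
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "1,0"

# Launch the Ollama server with the pinned device order.
subprocess.run(["ollama", "serve"], env=env)
```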

Are there any recent changes that introduced the issue?

Updated from 0.1.29 to 0.1.30 (reverting to 0.1.29 fixed it).

OS

Windows

Architecture

amd64

Platform

No response

Ollama version

0.1.30

GPU

Nvidia

GPU info

GPU 0: NVIDIA GeForce GTX 1080 (8GB)
GPU 1: Tesla P100-PCIE-16GB

CPU

AMD

Other software

Windows Server 2022 Standard x64 21H2

@sebastianlau sebastianlau added the bug, needs-triage labels Apr 2, 2024
@Zig1375

Zig1375 commented Apr 3, 2024

I encounter the same issue from time to time when num_ctx is set to 2048.
With num_ctx set to 4096 or higher, the error occurs consistently (Nvidia 4070 with 12GB of VRAM, 64GB of system RAM).
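
For anyone reproducing this, a minimal sketch of overriding num_ctx per request through Ollama's REST API (assuming a default local install listening on localhost:11434). A smaller context window reserves less VRAM for the KV cache, which matches the error appearing consistently at 4096 but only intermittently at 2048:

```python
import requests

# Override the context window for a single request via the "options" field.
# Smaller num_ctx -> smaller KV cache -> less VRAM pressure.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "dolphin-mixtral",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_ctx": 2048},
    },
)
print(resp.json()["response"])
```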

@FonzieBonzo

Same issue here; it sounds like someone is working on it.

@darkdev04

It was running well previously, but after some time it started to show the same error:

requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Then I checked the "C:\Users\<username>\AppData\Local\Ollama\server.log" file and found the following error at the end of the file:

CUDA error: out of memory
  current device: 0, in function alloc at C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:532
  cuMemSetAccess(pool_addr + pool_size, reserve_size, &access, 1)
GGML_ASSERT: C:\a\ollama\ollama\llm\llama.cpp\ggml-cuda.cu:193: !"CUDA error"

Then I tried the following solution: modifying the values of num_ctx and num_gpu resolved it.

(screenshot: modified num_ctx and num_gpu settings)

But after this it consumes a lot of RAM, about 90% of my RAM! Still, it's running 👍

(screenshot: RAM usage near 90%)
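
A minimal sketch of baking that workaround into a model variant via Ollama's /api/create endpoint so every client picks it up; the parameter values and the "dolphin-mixtral-lowmem" name are examples, not the exact settings from the screenshots above. Lowering num_gpu offloads fewer layers to VRAM and runs the rest on the CPU, which would explain system RAM climbing while the crash goes away:

```python
import requests

# Example Modelfile: shrink the context window and reduce the number of
# layers offloaded to the GPU. Both values are illustrative assumptions.
modelfile = """FROM dolphin-mixtral
PARAMETER num_ctx 2048
PARAMETER num_gpu 20
"""

# Create a named variant that any client (e.g. Open WebUI) can select.
resp = requests.post(
    "http://localhost:11434/api/create",
    json={"name": "dolphin-mixtral-lowmem", "modelfile": modelfile, "stream": False},
)
print(resp.json())
```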

@pdevine pdevine added the nvidia, gpu labels and removed the needs-triage label Apr 12, 2024
@dhiltgen dhiltgen assigned mxyng and unassigned dhiltgen Apr 12, 2024
@dhiltgen dhiltgen assigned dhiltgen and unassigned mxyng Jun 1, 2024
@dhiltgen
Collaborator

dhiltgen commented Jun 1, 2024

I don't have a test environment to verify this asymmetry, but PR #4517 may fix this.

@dhiltgen dhiltgen changed the title CUDA error: out of memory with 0.1.30 dual GPU 8G/16G - CUDA error: out of memory with 0.1.30 Jun 1, 2024
@dhiltgen dhiltgen changed the title dual GPU 8G/16G - CUDA error: out of memory with 0.1.30 dual GPU 8G/16G - CUDA error: out of memory with dolphin-mixtral Jun 1, 2024
@Zig1375

Zig1375 commented Jun 2, 2024

It works fine on my side now; I haven't seen this error for a few weeks, maybe a month.

@sebastianlau
Author

sebastianlau commented Jun 3, 2024

I just checked, and it "seems" to work with WebUI 0.2.2 and ollama 0.1.41.
I say "seems" because (a) it was incredibly slow (at least 2x slower than on 0.1.29), and (b) the UI had issues (not sure whether that's the UI or the API): the title didn't update, and the response was only visible after navigating away and back (or refreshing).

7 participants