GPU support via NVBLAS #239
@ggerganov I just "ported" the good work from ggerganov/llama.cpp#1044 to whisper.cpp and the performance improvement is significant... shall I open a PR? Is this the way to go for whisper as well?
NVBLAS or cuBLAS?
Let me know if someone else is working on this feature; otherwise I'll put my hands on it. EDIT: I was too curious and implemented it already using cuBLAS: https://github.com/dakk/whisper.cpp. Let me know if I should create the PR.
I want to try your code, what should I do? I've cloned your repo and I have the ggml model. And now? (I'm on Linux, if that's relevant.)
@dakk I can confirm it works well with my GPU, but I noticed one thing: with standard whisper I can use the medium model on my RTX 2060 (6 GB), but with CUDA-enabled whisper.cpp it runs out of memory.
I forgot to free some memory; it should run fine now.
@DanielusG https://bitbucket.org/sensika/whisper.cpp/branch/cublas — the medium model consumes 580 MB of VRAM for me:
Yes! It works very well. Like a charm :)
I will add the cuBLAS support from the
Hi all, I am backporting the cuBLAS support from llama.cpp. Unfortunately, I don't observe a significant speed-up; it's quite small actually. Edit: never mind, it is much faster. I was using too old a GPU before.
Superseded by #834
ref #220
Work in progress