
GPU support via NVBLAS #239

Closed · wants to merge 1 commit

Conversation

ggerganov (Owner)

ref #220

Work in progress.
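
For context: NVBLAS accelerates existing programs by intercepting standard (Fortran-style) BLAS level-3 calls at runtime, so no source changes are needed - you preload libnvblas.so and point it at a CPU BLAS fallback through an nvblas.conf file. A minimal sketch of the mechanism (the demo program and paths are illustrative, not part of this PR):

/* sgemm_demo.c - a plain BLAS call; with libnvblas.so preloaded,
 * NVBLAS routes large SGEMMs to the GPU transparently.
 *
 * nvblas.conf (pointed to by NVBLAS_CONFIG_FILE):
 *   NVBLAS_CPU_BLAS_LIB /usr/lib/x86_64-linux-gnu/libopenblas.so
 *   NVBLAS_GPU_LIST ALL
 *
 * Build and run (paths illustrative):
 *   gcc sgemm_demo.c -lopenblas -o sgemm_demo
 *   LD_PRELOAD=libnvblas.so NVBLAS_CONFIG_FILE=./nvblas.conf ./sgemm_demo
 */
#include <stdio.h>
#include <stdlib.h>

/* NVBLAS exports the Fortran-style BLAS symbols, so we call sgemm_
 * directly; with LD_PRELOAD it resolves to NVBLAS instead of OpenBLAS. */
extern void sgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const float *alpha, const float *a, const int *lda,
                   const float *b, const int *ldb,
                   const float *beta, float *c, const int *ldc);

int main(void) {
    const int n = 1024;
    float *A = calloc((size_t)n * n, sizeof(float));
    float *B = calloc((size_t)n * n, sizeof(float));
    float *C = calloc((size_t)n * n, sizeof(float));
    const float alpha = 1.0f, beta = 0.0f;

    /* C = alpha*A*B + beta*C - the call NVBLAS intercepts. */
    sgemm_("N", "N", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n);

    printf("C[0] = %f\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}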

@ggerganov changed the title from "GPU support via NVLBLAS" to "GPU support via NVBLAS" on Dec 8, 2022
@vakkov commented Apr 19, 2023

@ggerganov I just "ported" the good work from ggerganov/llama.cpp#1044 to whisper.cpp and the performance improvement is significant... shall I open a PR? Is this the way to go for Whisper as well?

@Green-Sky (Contributor)

NVBLAS or cuBLAS?

@dakk commented Apr 20, 2023

Let me know if someone else is working on this feature; otherwise I'll start working on it.

EDIT: I was too curious and implemented it already: https://github.com/dakk/whisper.cpp using cuBLAS. Let me know if I can create the PR.

@DanielusG commented Apr 20, 2023

> Let me know if someone else is working on this feature; otherwise I'll start working on it.
>
> EDIT: I was too curious and implemented it already: https://github.com/dakk/whisper.cpp using cuBLAS. Let me know if I can create the PR.

I want to try your code, what should I do? I've cloned your repo and I have the ggml model. What now? (I'm on Linux, if that's relevant.)

@dakk commented Apr 20, 2023

> > Let me know if someone else is working on this feature; otherwise I'll start working on it.
> > EDIT: I was too curious and implemented it already: https://github.com/dakk/whisper.cpp using cuBLAS. Let me know if I can create the PR.
>
> I want to try your code, what should I do? I've cloned your repo and I have the ggml model. What now? (I'm on Linux, if that's relevant.)

mkdir build
cd build
cmake .. -DWHISPER_SUPPORT_CUBLAS=TRUE
make
cd ..
./build/bin/main -m models/ggml-tiny.en.bin -f samples/jfk.wav

@DanielusG

@dakk I can confirm it works well with my GPU, but I noticed one thing: with standard Whisper I can use the medium model on my RTX 2060 (6 GB), but with CUDA-enabled whisper.cpp it runs out of memory.

@dakk commented Apr 21, 2023

> @dakk I can confirm it works well with my GPU, but I noticed one thing: with standard Whisper I can use the medium model on my RTX 2060 (6 GB), but with CUDA-enabled whisper.cpp it runs out of memory.

Forgot to free some memory; it should run fine now.

git pull && cd build && make
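
For anyone curious what the explicit cuBLAS path looks like, and why a missing free can exhaust a 6 GB card: below is a minimal standalone sketch of the general pattern, not dakk's actual patch - every offloaded mat-mul copies its operands into device buffers that must be freed again afterwards.

/* cublas_sgemm_demo.c - minimal explicit cuBLAS mat-mul; a sketch of
 * the pattern a cuBLAS port uses, not the whisper.cpp code.
 * Build (illustrative): nvcc cublas_sgemm_demo.c -lcublas -o demo */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 512;
    const size_t bytes = (size_t)n * n * sizeof(float);
    float *hA = calloc((size_t)n * n, sizeof(float));
    float *hB = calloc((size_t)n * n, sizeof(float));
    float *hC = malloc(bytes);

    /* Stage the operands in device memory. */
    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    /* C = A * B on the GPU (cuBLAS uses column-major order). */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    /* The cleanup that matters: skip these frees and every mat-mul
     * leaks device memory, which is how a 6 GB card runs out. */
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cublasDestroy(handle);

    printf("hC[0] = %f\n", hC[0]);
    free(hA); free(hB); free(hC);
    return 0;
}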

@vakkov commented Apr 21, 2023

@DanielusG https://bitbucket.org/sensika/whisper.cpp/branch/cublas

The medium model consumes 580 MB of video memory for me, built with:

WHISPER_CUBLAS=1 make

@DanielusG

> > @dakk I can confirm it works well with my GPU, but I noticed one thing: with standard Whisper I can use the medium model on my RTX 2060 (6 GB), but with CUDA-enabled whisper.cpp it runs out of memory.
>
> Forgot to free some memory; it should run fine now.
>
> git pull && cd build && make

Yes! It works very well. Like a charm :)

@ggerganov (Owner, Author)

I will add the cuBLAS support from the llama.cpp repo to whisper.cpp for the next release.
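
One design note on that integration: host-device transfers add fixed overhead, so only sufficiently large matrix products are worth routing through the GPU. A simplified sketch of such a gate (hypothetical helper name, illustrative threshold; ggml applies a similar size check before taking its BLAS path):

/* Hypothetical sketch of a size gate before offloading a mat-mul to
 * cuBLAS. Small products stay on the CPU, since copying A, B and C
 * over PCIe can cost more than the multiply itself saves. */
static int should_offload_mul_mat(int m, int n, int k) {
    return m >= 32 && n >= 32 && k >= 32;
}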

@ggerganov (Owner, Author) commented Apr 29, 2023

Hi all, I am backporting the cuBLAS support from llama.cpp here: #834

Unfortunately, I don't observe a significant speed-up; it's quite small actually.
@vakkov I would appreciate it if you could take a look and check whether I did something wrong. Maybe it is just my system and it actually works better on others - not sure.

Edit: never mind - it is much faster actually. I was using too old a GPU before.

@ggerganov (Owner, Author)

Superseded by #834

@ggerganov closed this on Apr 30, 2023