GPU support via NVBLAS #239
@ggerganov I just "ported" the good work from ggerganov/llama.cpp#1044 to whisper.cpp and the performance improvement is significant... shall I open a PR? Is this the way to go for whisper as well?
NVBLAS or cuBLAS?
Let me know if someone else is working on this feature; otherwise I'll put my hands on it. EDIT: I was too curious and implemented it already using cuBLAS: https://github.com/dakk/whisper.cpp. Let me know if I should create the PR.
I want to try your code, what should I do? I've cloned your repo and I have the ggml model. And now? (I'm on Linux, if that's relevant.)
@dakk I can confirm it works well with my GPU, but I noticed one thing: with standard whisper I can use the medium model on my RTX 2060 (6 GB), but with CUDA-enabled whisper.cpp it runs out of memory.
I forgot to free some memory; it should run fine now.
@DanielusG https://bitbucket.org/sensika/whisper.cpp/branch/cublas — the medium model consumes 580 MB of VRAM for me:
Yes! It works very well. Like a charm :)
I will add the cuBLAS support from the
Hi all, I am backporting the cuBLAS support from llama.cpp. Unfortunately, I don't observe a significant speed-up; it's quite small actually. Edit: never mind, it is much faster. I was using too old a GPU before.
Superseded by #834
ref #220
Work in progress