Working with Whisper-large-v3 #547
Comments
For anyone else facing this problem, here's a hacky solution that worked for me:

```python
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)
```
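For context, here is a minimal NumPy sketch of the kind of Slaney-style mel filter bank that `get_mel_filters` builds (the same scheme librosa and Whisper use). This is an illustrative reimplementation, not faster-whisper's actual code; the key point is that large-v3 expects a 128-bin bank where earlier models used 80:

```python
import numpy as np

def hz_to_mel(freq):
    """Slaney-style Hz -> mel: linear below 1 kHz, logarithmic above."""
    freq = np.asanyarray(freq, dtype=float)
    f_sp = 200.0 / 3
    logstep = np.log(6.4) / 27.0
    return np.where(
        freq >= 1000.0,
        15.0 + np.log(np.maximum(freq, 1e-10) / 1000.0) / logstep,
        freq / f_sp,
    )

def mel_to_hz(mels):
    """Inverse of hz_to_mel."""
    mels = np.asanyarray(mels, dtype=float)
    f_sp = 200.0 / 3
    logstep = np.log(6.4) / 27.0
    return np.where(
        mels >= 15.0,
        1000.0 * np.exp(logstep * (mels - 15.0)),
        mels * f_sp,
    )

def mel_filters(sr, n_fft, n_mels):
    """Triangular mel filter bank of shape (n_mels, 1 + n_fft // 2)."""
    fft_freqs = np.linspace(0.0, sr / 2, 1 + n_fft // 2)
    # n_mels + 2 band edges, evenly spaced on the mel scale
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fdiff = np.diff(hz_pts)
    ramps = hz_pts[:, None] - fft_freqs[None, :]
    lower = -ramps[:-2] / fdiff[:-1, None]
    upper = ramps[2:] / fdiff[1:, None]
    weights = np.maximum(0.0, np.minimum(lower, upper))
    # Slaney normalization: scale each triangle to roughly constant energy per band
    enorm = 2.0 / (hz_pts[2:] - hz_pts[:-2])
    return weights * enorm[:, None]
```

With Whisper's `sr=16000` and `n_fft=400`, `mel_filters(16000, 400, 128)` yields a `(128, 201)` matrix, which is where the `(1, 128, 3000)` feature shape in large-v3's error messages comes from.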
@UmarRamzan Hey, I'm trying to use Whisper large-v3 with faster_whisper, but I don't think it's been officially added. How did you do it? Also, are you able to implement batching as shown here: https://huggingface.co/openai/whisper-large-v3 The only reason I want to use faster_whisper is that it provides things like SRT output, verbose mode, and word-level transcription.
Where exactly do I change this line of code?
After loading the model, add this line.
cool, thanks!
I'm trying to follow @UmarRamzan's tip, but the file for large-v3 is not found (I tried to patch `faster_whisper.utils._MODELS`):

```python
# NOTE: this code does not work!
import faster_whisper

# This repo does not exist:
faster_whisper.utils._MODELS["large-v3"] = "guillaumekln/faster-whisper-large-v3"

model_size = "large-v3"  # must be one of `faster_whisper.utils._MODELS.keys()`
device = "cuda"
compute_type = "float16"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)
if model_size == "large-v3":
    model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
        model.feature_extractor.sampling_rate,
        model.feature_extractor.n_fft,
        n_mels=128,
    )
# ...do stuff with `model`...
```

We probably need to wait for @guillaumekln to generate a new converted model.
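For what it's worth, the alias table is just a convenience map from size names to Hub repos; `WhisperModel`'s first argument can also be a local CTranslate2 directory or (in recent versions) a Hugging Face repo ID directly. A toy stand-in for the lookup, illustrative rather than faster-whisper's real code:

```python
# Toy stand-in for faster_whisper.utils._MODELS: size alias -> Hub repo id.
_MODELS = {"large-v2": "guillaumekln/faster-whisper-large-v2"}

def resolve(model_size_or_path: str) -> str:
    # Known aliases map to a repo; anything else is treated as a local path or
    # repo id as-is. Registering an alias that points at a nonexistent repo
    # still fails later, at download time.
    return _MODELS.get(model_size_or_path, model_size_or_path)
```

So once someone publishes a converted large-v3 repo, passing its full repo ID should work without touching `_MODELS` at all.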
Can we also use batching with faster_whisper? Batching makes inference really fast.
It does not work. There's no such model at https://huggingface.co/guillaumekln?sort_models=modified#models
For those planning to work with Whisper v3, kindly note that the current version of faster-whisper does not support it. You will either have to modify your copy of faster-whisper to make it work (which is what I did) or use someone else's fork of the project. I edited mine based on Bungerr's code and used Purfview's model.
Don't hold your breath. The big 'evil' corps usually don't want their employees contributing to the community, even on their own free time; some make crazy contracts where THEY own any line of code written by you, even if you wrote it in your sleep. ;)
Has anyone noticed that large-v3 translates when the task is set to "transcribe" and transcribes when set to "translate"? Seems odd to me, and I haven't changed anything else in my code that could explain it. 🤔
That's why.
Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?
Dunno what you changed on your side; everything you need is discussed in this PR -> #548
I've converted the large-v3 model, uploaded it to Hugging Face, and implemented the changes on my fork of faster-whisper, so the usage is the same, without any monkey-patch. More info on this PR.
Can we also use batching with faster_whisper?
Okay, I saw https://huggingface.co/turicas/faster-whisper-large-v3 and the issue is also mentioned in a comment there. So I am not crazy that large-v3 translates when the task is set to "transcribe" and (mostly) transcribes when the task is set to "translate". With the official whisper repo and large-v1 or large-v2 I don't see that behaviour. Sure, we could do something like:

```python
if model_size == "large-v3":
    if task == "transcribe":
        task = "translate"
    elif task == "translate":
        task = "transcribe"
```

but that just feels wrong. Okay, it's apparently because of the wrong tokenizer: when using the transformers tokenizer, it works correctly.
@Sharrnah you're not crazy! The problem is that the tokenizer for large-v3 is different, so the token IDs related to `<|transcribe|>`/`<|translate|>` no longer line up.
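To make the shift concrete: large-v3 adds a 100th language token, so every special token after the language block moves up by one ID. The IDs below are an assumption based on the published Whisper vocabularies; verify them against the actual tokenizer files before relying on them:

```python
# Assumed special-token ids: v2 has 99 language tokens after <|startoftranscript|>,
# v3 has 100, shifting everything after the language block by +1. Verify these
# ids against the real vocab files -- they are stated here for illustration.
V2 = {"<|translate|>": 50358, "<|transcribe|>": 50359}
V3 = {"<|translate|>": 50359, "<|transcribe|>": 50360}

# Encode "transcribe" with the v2 tokenizer, then see how a v3 model reads it:
sent_id = V2["<|transcribe|>"]
seen_by_v3 = {i: t for t, i in V3.items()}[sent_id]
# seen_by_v3 is "<|translate|>" -- exactly the swapped behaviour reported above
```

This is why using the correct large-v3 tokenizer (or a regenerated tokenizer.json) fixes the transcribe/translate swap.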
Why another PR, is #548 not working for you? You can test it -> https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper It's not exactly like PR #548, but it was similar up until they went the "preprocessor_config.json" way.
I need a codebase closer to the original one so it can be used as a drop-in replacement on some systems (not using Windows, though), and there's the tokenizer thing I'm not sure about the correct way to handle, so I preferred to force the use of the one from OpenAI.
Thanks, I already changed it to use the OpenAI tokenizer. But are you sure the only token IDs that changed are the ones for `<|transcribe|>`/`<|translate|>`? I would not be 100% sure, because it still behaved a little strangely even when I set it to "translate" (and as such it was more or less transcribing). I will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.
I don't get it. I didn't pick any token by hand; my fork just uses the openai/whisper-large-v3 tokenizer directly, and transcribe works as expected (i.e. it does not translate).
Sorry, I meant that if only the `<|transcribe|>`/`<|translate|>` IDs have changed, we could just update an existing tokenizer.json; and I was trying to say that, based on my testing, I am not sure that would be the only change. Anyway, everything is fine now and transcription/translation also works for me with v3. :) And someone already exported a tokenizer.json: https://huggingface.co/bababababooey/faster-whisper-large-v3/tree/main Maybe all the whisper-v3 issues and PRs should be merged (if that's possible).
#578 added v3 support |
```
Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
```

Could an option be added to change the input size of the feature extractor?
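That error means the features were computed with an 80-bin extractor while the large-v3 model expects 128 bins. A hypothetical guard sketch (the function name and message are illustrative, not part of faster-whisper's API) that catches the mismatch before inference:

```python
import numpy as np

def check_features(features: np.ndarray, expected_mels: int = 128) -> None:
    """Raise early if the mel dimension doesn't match what the model expects."""
    if features.ndim != 3 or features.shape[1] != expected_mels:
        raise ValueError(
            f"expected features of shape (batch, {expected_mels}, frames), "
            f"got {features.shape}; rebuild mel_filters with n_mels={expected_mels}"
        )

# Features produced by an 80-bin extractor trigger the same mismatch as the error above:
features = np.zeros((1, 80, 3000), dtype=np.float32)
```

Calling `check_features(features)` here raises, mirroring the CTranslate2 error; features of shape `(1, 128, 3000)` pass.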