
Working with Whisper-large-v3 #547

Closed
UmarRamzan opened this issue Nov 7, 2023 · 25 comments

@UmarRamzan

Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead

Could an option be added to change the input size of the feature extractor?

@UmarRamzan UmarRamzan changed the title Working with Whisper-largge-v3 Working with Whisper-large-v3 Nov 7, 2023
@UmarRamzan
Author

For anyone else facing this problem, here's a hacky solution that worked for me.

model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)
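For context, here is a minimal sketch (illustrative names, not the faster-whisper API) of why the shapes disagree: Whisper's feature extractor turns each 30-second chunk of 16 kHz audio into a log-mel spectrogram with 3000 frames, and large-v3 raised the mel-bin count from 80 to 128.

```python
SAMPLING_RATE = 16_000
HOP_LENGTH = 160       # 10 ms STFT hop used by Whisper
CHUNK_SECONDS = 30

def feature_shape(n_mels: int) -> tuple[int, int]:
    """Shape of the mel spectrogram for one 30-second chunk."""
    n_frames = CHUNK_SECONDS * SAMPLING_RATE // HOP_LENGTH
    return (n_mels, n_frames)

print(feature_shape(80))   # earlier checkpoints produce (80, 3000)
print(feature_shape(128))  # large-v3 expects (128, 3000)
```

That is exactly the mismatch in the error message above: the model wants (1, 128, 3000) but the old extractor hands it (1, 80, 3000).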

@souvikqb

souvikqb commented Nov 8, 2023

@UmarRamzan Hey, I'm trying to use Whisper large-v3 with faster_whisper, but I don't think it's been officially added. How did you do it?

Also, were you able to implement batching as shown here: https://huggingface.co/openai/whisper-large-v3? Batching speeds up transcription a lot.

The only reason I want to use faster_whisper is that it provides things like SRT output, verbose output, and word-level transcription.
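For reference, the batching on that model card uses the transformers pipeline rather than faster_whisper; a sketch along the lines of the card (parameter values are illustrative, and the audio path is a placeholder):

```python
from transformers import pipeline

# Batched long-form transcription via transformers (not faster_whisper),
# roughly as shown on the openai/whisper-large-v3 model card.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,   # split long audio into 30 s chunks
    batch_size=16,       # decode several chunks per forward pass
    device="cuda:0",
)

result = pipe("audio.mp3", return_timestamps=True)
print(result["text"])
```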

@skanda1005

model.feature_extractor.mel_filters

Where exactly do I change this line of code?

@souvikqb

souvikqb commented Nov 9, 2023

model.feature_extractor.mel_filters

Where exactly do I change this line of code?

Run that line right after instantiating the model.
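A sketch of the placement (assumes faster-whisper is installed and a large-v3 CTranslate2 model is available locally; the model path and audio file are placeholders):

```python
from faster_whisper import WhisperModel

# Load a locally converted large-v3 model (path is a placeholder).
model = WhisperModel(
    "path/to/faster-whisper-large-v3",
    device="cuda",
    compute_type="float16",
)

# Apply the patch immediately after construction, before transcribing,
# so the extractor produces 128 mel bins instead of 80.
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)

segments, info = model.transcribe("audio.wav")
```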

@skanda1005

cool, thanks!

@turicas

turicas commented Nov 11, 2023

I'm trying to follow @UmarRamzan's tip, but the file for large-v3 is not found (I also tried patching faster_whisper.utils._MODELS to add it, but I don't know which model/file to point to):

# this code does not work!
import faster_whisper

faster_whisper.utils._MODELS["large-v3"] = "guillaumekln/faster-whisper-large-v3"  # does not exist
model_size = "large-v3"  # can be one of `faster_whisper.utils._MODELS.keys()`
device = "cuda"
compute_type = "float16"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)
if model_size == "large-v3":
    model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)

# ...do stuff with `model`...

We probably need to wait for @guillaumekln to generate a new model.bin for large-v3 and upload it to Hugging Face before these patches can be used. =)

@anthonyyuan

from faster_whisper import WhisperModel
#model = WhisperModel(model_size, device="cuda", compute_type="float16")
model = WhisperModel("faster-whisper-large-v3")

@souvikqb

from faster_whisper import WhisperModel #model = WhisperModel(model_size, device="cuda", compute_type="float16") model = WhisperModel("faster-whisper-large-v3")

Can we also use batching with faster_whisper? Batching makes inference much faster.

@turicas

turicas commented Nov 11, 2023

from faster_whisper import WhisperModel #model = WhisperModel(model_size, device="cuda", compute_type="float16") model = WhisperModel("faster-whisper-large-v3")

It does not work. There's no such model in https://huggingface.co/guillaumekln?sort_models=modified#models

@blackpolarz

For those planning to work with Whisper large-v3, kindly note that the current version of faster-whisper does not support it. Instead, you will have to either modify your copy of faster-whisper to make it work (which is what I did) or use someone else's fork of the project. I edited mine based on Bungerr's code and used Purfview's model.

@Purfview
Contributor

Purfview commented Nov 11, 2023

We probably need to wait for @guillaumekln to generate a new model.bin for large-v3 and upload it on hugging face before using these patches. =)

Don't hold your breath; usually the big 'evil' corps don't want their employees contributing to the community, even on their own free time. Some make crazy contracts saying THEY own every line of code you write, even if you wrote it in your sleep. ;)

@Sharrnah

Has anyone noticed that large-v3 translates when set to "transcribe" and transcribes when set to "translate"?

Seems odd to me, and I haven't changed anything else in my code that could explain it. 🤔

@Purfview
Contributor

...i haven't changed anything else in my code...

That's why.

@Sharrnah

Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?

@Purfview
Contributor

Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?

Dunno what you changed on your side, everything you need is discussed in this PR -> #548

@turicas

turicas commented Nov 12, 2023

I've converted the large-v3 model, uploaded it to Hugging Face, and implemented the changes in my fork of faster-whisper so that the usage stays the same, without any monkey-patching. More info in this PR.
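For anyone who wants to do the conversion themselves, CTranslate2 ships a converter CLI; a sketch of the kind of invocation involved (flags per the CTranslate2 documentation; the output directory name is arbitrary, and which files to copy depends on what the checkpoint repo provides):

```shell
pip install ctranslate2 "transformers[torch]"

ct2-transformers-converter \
    --model openai/whisper-large-v3 \
    --output_dir faster-whisper-large-v3 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization float16
```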

@souvikqb

I've converted the large-v3 model, uploaded it to Hugging Face, and implemented the changes in my fork of faster-whisper so that the usage stays the same, without any monkey-patching. More info in this PR.

Can we also use batching with faster_whisper?

@Sharrnah

Sharrnah commented Nov 12, 2023

Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?

Dunno what you changed on your side, everything you need is discussed in this PR -> #548

Okay, I saw https://huggingface.co/turicas/faster-whisper-large-v3, and the issue is also mentioned there in the comment # TODO: for some reason it's translating, not transcribing

So I am not crazy: large-v3 translates when the task is set to "transcribe" and (mostly) transcribes when the task is set to "translate".

With the official whisper repo and large-v1/large-v2 I don't see that behaviour.

Sure, we could do something like

if model_size == "large-v3":
  if task == "transcribe":
      task = "translate"
  elif task == "translate":
      task = "transcribe"

but that just feels wrong.

Okay, it's apparently because of the wrong tokenizer. When the transformers tokenizer is used, it works correctly.
See here: master...PythonicCafe:faster-whisper:feature/large-v3#diff-e027202a963607f9c883ac3365c78f2939f3e41bd8fb9b119ac718e0d3e2314cR144

@turicas

turicas commented Nov 12, 2023

Okay. I saw https://huggingface.co/turicas/faster-whisper-large-v3 and there, the issue is also mentioned in the comment # TODO: for some reason it's translating, not transcribing

So i am not crazy that large-v3 translates when task is set to "transcribe" and transcribes (mostly) when task is set to "translate".

with the official whisper repo and large-v1 and large-v2 i don't have that behaviour.

sure we can do something like

if model_size == "large-v3":
  if task == "transcribe":
      task = "translate"
  elif task == "translate":
      task = "transcribe"

but that just feels wrong.

@Sharrnah you're not crazy! The problem is that the tokenizer for large-v3 is different, so the tokens for <|transcribe|>/<|translate|> have different IDs. I've just committed a new README to the HF repository citing this. If you use my fork directly, it will work as a drop-in replacement for faster-whisper; if you'd like to use the original faster-whisper, monkey patches are provided to swap the tokenizer, so you don't need that if/elif that feels wrong. :)
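The shift is easy to see with a little arithmetic. As I understand the released vocabularies (worth verifying against the actual tokenizer files), large-v3 adds one language token (<|yue|>) after the existing 99, which pushes every later special token up by one:

```python
SOT = 50258  # <|startoftranscript|> in the multilingual vocabularies

def task_token_ids(n_languages: int) -> dict[str, int]:
    """Language tokens directly follow <|startoftranscript|>,
    then come <|translate|> and <|transcribe|>."""
    translate = SOT + 1 + n_languages
    return {"<|translate|>": translate, "<|transcribe|>": translate + 1}

print(task_token_ids(99))   # large-v1/v2 vocabularies
print(task_token_ids(100))  # large-v3 vocabulary: both IDs shift by one
```

So a v3 model decoded with a v2 tokenizer sees the ID it was trained to read as <|transcribe|> where <|translate|> used to sit, which is exactly the swapped behaviour described above.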

@Purfview
Contributor

Purfview commented Nov 12, 2023

I've converted the large-v3 model, uploaded to Hugging Face and implemented the changes on my fork of faster-whisper so the usage is the same, without any monkey-patch. More info on this PR.

Why another PR, is #548 not working for you?

You can test it -> https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

Not exactly like PR #548, but similar, until they went the "preprocessor_config.json" way.

@turicas

turicas commented Nov 12, 2023

I've converted the large-v3 model, uploaded to Hugging Face and implemented the changes on my fork of faster-whisper so the usage is the same, without any monkey-patch. More info on this PR.

Why another PR, is #548 not working for you?

You can test it -> https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

Not exact like PR548, but similar till they went "preprocessor_config.json" way.

I need a codebase closer to the original so it can be used as a drop-in replacement on some systems (not Windows, though), and I'm not sure about the correct way to handle the tokenizer, so I preferred to force the one from OpenAI.

@Sharrnah

Okay. I saw https://huggingface.co/turicas/faster-whisper-large-v3 and there, the issue is also mentioned in the comment # TODO: for some reason it's translating, not transcribing
So i am not crazy that large-v3 translates when task is set to "transcribe" and transcribes (mostly) when task is set to "translate".
with the official whisper repo and large-v1 and large-v2 i don't have that behaviour.
sure we can do something like

if model_size == "large-v3":
  if task == "transcribe":
      task = "translate"
  elif task == "translate":
      task = "transcribe"

but that just feels wrong.

@Sharrnah you're not crazy! The problem is that the tokenizer for large-v3 is different and so the tokens related to <|transcribe|>/<|translate|> have different IDs. I've commited a new README to hf repository right now citing this - if you use my fork directly it'll work as a drop-in replacement of faster-whisper, but if you'd like to use the original faster-whisper, there are some monkey patches provided to change the tokenizer so you don't need to use this if/elif that feels wrong. :)

Thanks, I already changed it to use the OpenAI tokenizer. But are you sure the only token IDs that changed are the ones for <|transcribe|>/<|translate|>? I wouldn't be 100% sure, because it still behaved a little strangely even when I set the task to "translate" (and as such it was more or less transcribing).

I will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.

@turicas

turicas commented Nov 13, 2023

Thanks. i already changed it to use the OpenAI tokenizer. But you are sure the only token ids that have changed are the ones for <|transcribe|>/<|translate|> ? I would not be 100% sure because it still behaved a little bit strange even when i set it to "translate" (and as such it was more or less transcribing)

Will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.

I don't get it. I didn't pick any tokens by hand; my fork just uses the openai/whisper-large-v3 tokenizer directly, and transcribe works as expected (i.e., it does not translate).

@Sharrnah

Thanks. i already changed it to use the OpenAI tokenizer. But you are sure the only token ids that have changed are the ones for <|transcribe|>/<|translate|> ? I would not be 100% sure because it still behaved a little bit strange even when i set it to "translate" (and as such it was more or less transcribing)
Will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.

I don't get it. I didn't pick any token by hand - my fork just uses the openai/whisper-large-v3 tokenizer directly and transcribe works as expected (ie does not translate).

Sorry, I meant that if only the <|transcribe|>/<|translate|> IDs had changed, we could just update an existing tokenizer.json; I was trying to say that, based on my testing, I'm not sure that's the only change.

Anyway, everything is fine now, and transcription/translation also works for me with v3. :) And someone has already exported a tokenizer.json: https://huggingface.co/bababababooey/faster-whisper-large-v3/tree/main

Maybe all the whisper-v3 issues and PRs should be merged (if that's possible).

@jnnnnn

jnnnnn commented Nov 30, 2023

#578 added v3 support


9 participants