
Working with Whisper-large-v3 #547

Closed
UmarRamzan opened this issue Nov 7, 2023 · 25 comments

@UmarRamzan

Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead

Could an option be added to change the input size of the feature extractor?

@UmarRamzan UmarRamzan changed the title Working with Whisper-largge-v3 Working with Whisper-large-v3 Nov 7, 2023
@UmarRamzan
Author

For anyone else facing this problem, here's a hacky solution that worked for me.

model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)
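For context, here is a minimal sketch (illustrative names, not the faster-whisper API) of why the shapes disagree: Whisper's feature extractor turns each 30-second chunk of 16 kHz audio into a log-mel spectrogram with 3000 frames, and large-v3 raised the mel-bin count from 80 to 128.

```python
SAMPLING_RATE = 16_000
HOP_LENGTH = 160       # 10 ms STFT hop used by Whisper
CHUNK_SECONDS = 30

def feature_shape(n_mels: int) -> tuple[int, int]:
    """Shape of the mel spectrogram for one 30-second chunk."""
    n_frames = CHUNK_SECONDS * SAMPLING_RATE // HOP_LENGTH
    return (n_mels, n_frames)

print(feature_shape(80))   # earlier checkpoints produce (80, 3000)
print(feature_shape(128))  # large-v3 expects (128, 3000)
```

That is exactly the mismatch in the error message above: the model wants (1, 128, 3000) but the old extractor hands it (1, 80, 3000).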

@souvikqb

souvikqb commented Nov 8, 2023

@UmarRamzan Hey, I'm trying to use Whisper large-v3 with faster_whisper, but I don't think it's been officially added. How did you do it?

Also, were you able to implement batching as shown here: https://huggingface.co/openai/whisper-large-v3? Batching speeds up transcription a lot.

The only reason I want to use faster_whisper is that it provides things like SRT output, verbose output, and word-level transcription.
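For reference, the batching on that model card uses the transformers pipeline rather than faster_whisper; a sketch along the lines of the card (parameter values are illustrative, and the audio path is a placeholder):

```python
from transformers import pipeline

# Batched long-form transcription via transformers (not faster_whisper),
# roughly as shown on the openai/whisper-large-v3 model card.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,   # split long audio into 30 s chunks
    batch_size=16,       # decode several chunks per forward pass
    device="cuda:0",
)

result = pipe("audio.mp3", return_timestamps=True)
print(result["text"])
```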

@skanda1005

model.feature_extractor.mel_filters

Where exactly do I change this line of code?

@souvikqb

souvikqb commented Nov 9, 2023

model.feature_extractor.mel_filters

Where exactly do I change this line of code?

Run that line right after instantiating the model.
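A sketch of the placement (assumes faster-whisper is installed and a large-v3 CTranslate2 model is available locally; the model path and audio file are placeholders):

```python
from faster_whisper import WhisperModel

# Load a locally converted large-v3 model (path is a placeholder).
model = WhisperModel(
    "path/to/faster-whisper-large-v3",
    device="cuda",
    compute_type="float16",
)

# Apply the patch immediately after construction, before transcribing,
# so the extractor produces 128 mel bins instead of 80.
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)

segments, info = model.transcribe("audio.wav")
```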

@skanda1005

cool, thanks!

@turicas

turicas commented Nov 11, 2023

I'm trying to follow @UmarRamzan's tip, but the file for large-v3 is not found (I also tried patching faster_whisper.utils._MODELS to add it, but I don't know which model/file to point to):

# this code does not work!
import faster_whisper

faster_whisper.utils._MODELS["large-v3"] = "guillaumekln/faster-whisper-large-v3"  # does not exist
model_size = "large-v3"  # can be one of `faster_whisper.utils._MODELS.keys()`
device = "cuda"
compute_type = "float16"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)
if model_size == "large-v3":
    model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)

# ...do stuff with `model`...

We probably need to wait for @guillaumekln to generate a new model.bin for large-v3 and upload it to Hugging Face before these patches can be used. =)

@anthonyyuan

from faster_whisper import WhisperModel
#model = WhisperModel(model_size, device="cuda", compute_type="float16")
model = WhisperModel("faster-whisper-large-v3")

@souvikqb

from faster_whisper import WhisperModel #model = WhisperModel(model_size, device="cuda", compute_type="float16") model = WhisperModel("faster-whisper-large-v3")

Can we also use batching with faster_whisper? Batching makes inference much faster.

@turicas

turicas commented Nov 11, 2023

from faster_whisper import WhisperModel #model = WhisperModel(model_size, device="cuda", compute_type="float16") model = WhisperModel("faster-whisper-large-v3")

It does not work. There's no such model in https://huggingface.co/guillaumekln?sort_models=modified#models

@blackpolarz

For those planning to work with Whisper large-v3, kindly note that the current version of faster-whisper does not support it. Instead, you will have to either modify your copy of faster-whisper to make it work (which is what I did) or use someone else's fork of the project. I edited mine based on Bungerr's code and used Purfview's model.

@Purfview
Contributor

Purfview commented Nov 11, 2023

We probably need to wait for @guillaumekln to generate a new model.bin for large-v3 and upload it on hugging face before using these patches. =)

Don't hold your breath; usually the big 'evil' corps don't want their employees contributing to the community, even on their own free time. Some make crazy contracts saying THEY own every line of code you write, even if you wrote it in your sleep. ;)

@Sharrnah

Has anyone noticed that large-v3 translates when set to "transcribe" and transcribes when set to "translate"?

Seems odd to me, and I haven't changed anything else in my code that could explain it. 🤔

@Purfview
Contributor

...i haven't changed anything else in my code...

That's why.

@Sharrnah

Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?

@Purfview
Contributor

Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?

Dunno what you changed on your side, everything you need is discussed in this PR -> #548

@turicas

turicas commented Nov 12, 2023

I've converted the large-v3 model, uploaded it to Hugging Face, and implemented the changes in my fork of faster-whisper so that the usage stays the same, without any monkey-patching. More info in this PR.
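For anyone who wants to do the conversion themselves, CTranslate2 ships a converter CLI; a sketch of the kind of invocation involved (flags per the CTranslate2 documentation; the output directory name is arbitrary, and which files to copy depends on what the checkpoint repo provides):

```shell
pip install ctranslate2 "transformers[torch]"

ct2-transformers-converter \
    --model openai/whisper-large-v3 \
    --output_dir faster-whisper-large-v3 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization float16
```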

@souvikqb

I've converted the large-v3 model, uploaded it to Hugging Face, and implemented the changes in my fork of faster-whisper so that the usage stays the same, without any monkey-patching. More info in this PR.

Can we also use batching with faster_whisper?

@Sharrnah

Sharrnah commented Nov 12, 2023

Well, I updated to the latest CTranslate2 library and changed the mel_filters. Is there anything else I should change?

Dunno what you changed on your side, everything you need is discussed in this PR -> #548

Okay, I saw https://huggingface.co/turicas/faster-whisper-large-v3, and the issue is also mentioned there in the comment # TODO: for some reason it's translating, not transcribing

So I am not crazy: large-v3 translates when the task is set to "transcribe" and (mostly) transcribes when the task is set to "translate".

With the official whisper repo and large-v1/large-v2 I don't see that behaviour.

Sure, we could do something like

if model_size == "large-v3":
  if task == "transcribe":
      task = "translate"
  elif task == "translate":
      task = "transcribe"

but that just feels wrong.

Okay, it's apparently because of the wrong tokenizer. When the transformers tokenizer is used, it works correctly.
See here: master...PythonicCafe:faster-whisper:feature/large-v3#diff-e027202a963607f9c883ac3365c78f2939f3e41bd8fb9b119ac718e0d3e2314cR144

@turicas

turicas commented Nov 12, 2023

Okay. I saw https://huggingface.co/turicas/faster-whisper-large-v3 and there, the issue is also mentioned in the comment # TODO: for some reason it's translating, not transcribing

So i am not crazy that large-v3 translates when task is set to "transcribe" and transcribes (mostly) when task is set to "translate".

with the official whisper repo and large-v1 and large-v2 i don't have that behaviour.

sure we can do something like

if model_size == "large-v3":
  if task == "transcribe":
      task = "translate"
  elif task == "translate":
      task = "transcribe"

but that just feels wrong.

@Sharrnah you're not crazy! The problem is that the tokenizer for large-v3 is different, so the tokens for <|transcribe|>/<|translate|> have different IDs. I've just committed a new README to the HF repository citing this. If you use my fork directly, it will work as a drop-in replacement for faster-whisper; if you'd like to use the original faster-whisper, monkey patches are provided to swap the tokenizer, so you don't need that if/elif that feels wrong. :)
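The shift is easy to see with a little arithmetic. As I understand the released vocabularies (worth verifying against the actual tokenizer files), large-v3 adds one language token (<|yue|>) after the existing 99, which pushes every later special token up by one:

```python
SOT = 50258  # <|startoftranscript|> in the multilingual vocabularies

def task_token_ids(n_languages: int) -> dict[str, int]:
    """Language tokens directly follow <|startoftranscript|>,
    then come <|translate|> and <|transcribe|>."""
    translate = SOT + 1 + n_languages
    return {"<|translate|>": translate, "<|transcribe|>": translate + 1}

print(task_token_ids(99))   # large-v1/v2 vocabularies
print(task_token_ids(100))  # large-v3 vocabulary: both IDs shift by one
```

So a v3 model decoded with a v2 tokenizer sees the ID it was trained to read as <|transcribe|> where <|translate|> used to sit, which is exactly the swapped behaviour described above.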

@Purfview
Contributor

Purfview commented Nov 12, 2023

I've converted the large-v3 model, uploaded to Hugging Face and implemented the changes on my fork of faster-whisper so the usage is the same, without any monkey-patch. More info on this PR.

Why another PR, is #548 not working for you?

You can test it -> https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

Not exactly like PR #548, but similar, until they went the "preprocessor_config.json" way.

@turicas

turicas commented Nov 12, 2023

I've converted the large-v3 model, uploaded to Hugging Face and implemented the changes on my fork of faster-whisper so the usage is the same, without any monkey-patch. More info on this PR.

Why another PR, is #548 not working for you?

You can test it -> https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

Not exact like PR548, but similar till they went "preprocessor_config.json" way.

I need a codebase closer to the original so it can be used as a drop-in replacement on some systems (not Windows, though), and I'm not sure about the correct way to handle the tokenizer, so I preferred to force the one from OpenAI.

@Sharrnah

Okay. I saw https://huggingface.co/turicas/faster-whisper-large-v3 and there, the issue is also mentioned in the comment # TODO: for some reason it's translating, not transcribing
So i am not crazy that large-v3 translates when task is set to "transcribe" and transcribes (mostly) when task is set to "translate".
with the official whisper repo and large-v1 and large-v2 i don't have that behaviour.
sure we can do something like

if model_size == "large-v3":
  if task == "transcribe":
      task = "translate"
  elif task == "translate":
      task = "transcribe"

but that just feels wrong.

@Sharrnah you're not crazy! The problem is that the tokenizer for large-v3 is different and so the tokens related to <|transcribe|>/<|translate|> have different IDs. I've commited a new README to hf repository right now citing this - if you use my fork directly it'll work as a drop-in replacement of faster-whisper, but if you'd like to use the original faster-whisper, there are some monkey patches provided to change the tokenizer so you don't need to use this if/elif that feels wrong. :)

Thanks, I already changed it to use the OpenAI tokenizer. But are you sure the only token IDs that changed are the ones for <|transcribe|>/<|translate|>? I wouldn't be 100% sure, because it still behaved a little strangely even when I set the task to "translate" (and as such it was more or less transcribing).

I will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.

@turicas

turicas commented Nov 13, 2023

Thanks. i already changed it to use the OpenAI tokenizer. But you are sure the only token ids that have changed are the ones for <|transcribe|>/<|translate|> ? I would not be 100% sure because it still behaved a little bit strange even when i set it to "translate" (and as such it was more or less transcribing)

Will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.

I don't get it. I didn't pick any tokens by hand; my fork just uses the openai/whisper-large-v3 tokenizer directly, and transcribe works as expected (i.e., it does not translate).

@Sharrnah

Thanks. i already changed it to use the OpenAI tokenizer. But you are sure the only token ids that have changed are the ones for <|transcribe|>/<|translate|> ? I would not be 100% sure because it still behaved a little bit strange even when i set it to "translate" (and as such it was more or less transcribing)
Will have to test with the OpenAI tokenizer to see if that prevents the occasional strange behaviour.

I don't get it. I didn't pick any token by hand - my fork just uses the openai/whisper-large-v3 tokenizer directly and transcribe works as expected (ie does not translate).

Sorry, I meant that if only the <|transcribe|>/<|translate|> IDs had changed, we could just update an existing tokenizer.json; I was trying to say that, based on my testing, I'm not sure that's the only change.

Anyway, everything is fine now, and transcription/translation also works for me with v3. :) And someone has already exported a tokenizer.json: https://huggingface.co/bababababooey/faster-whisper-large-v3/tree/main

Maybe all the whisper-v3 issues and PRs should be merged (if that's possible).

@jnnnnn

jnnnnn commented Nov 30, 2023

#578 added v3 support


9 participants