I wanted to use WhisperX to do forced alignment on the Mozilla Common Voice German dataset, but the words are often cut off or the segments do not align at all.
Additionally, some audio tracks are recognized as Farsi instead of German.
Is this because of the short duration of these clips (2-5 seconds each)?
And how can I improve the accuracy?
Is the accuracy of the English models (for English audio) better?
I am using the Python API (result = model.transcribe(audio_file)) and was not aware of a parameter of the transcribe function that would let me enforce a specific language.
I was able to improve the performance to a usable level by setting the extend_duration parameter to 0.1, but it still cuts off the beginning of a word from time to time.
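For reference, a minimal sketch of the two changes described above, assuming the WhisperX Python API as it existed at the time of this issue (the load_model / load_align_model / align names are from WhisperX, and the extend_duration parameter has been removed or renamed in some later releases, so check your installed version):

```python
# Sketch: force the transcription language and pad alignment segments
# in WhisperX. Assumes a WhisperX version that still exposes
# extend_duration on whisperx.align (true for the version in this issue).
import whisperx

device = "cuda"
audio_file = "clip_de.wav"  # hypothetical Common Voice clip

model = whisperx.load_model("large-v2", device)

# Passing language="de" skips automatic language detection, which is
# what misidentifies short German clips as Farsi.
result = model.transcribe(audio_file, language="de")

# Load the German alignment model and run forced alignment.
model_a, metadata = whisperx.load_align_model(language_code="de", device=device)
aligned = whisperx.align(
    result["segments"], model_a, metadata, audio_file, device,
    extend_duration=0.1,  # pad each segment by 0.1 s to reduce clipped word onsets
)
```

This requires downloading both the Whisper model and a German wav2vec2 alignment model on first run, so it is not a drop-in snippet for offline environments.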