Low Accuracy on German #66

Closed
MaxS1996 opened this issue Feb 5, 2023 · 3 comments
Comments


MaxS1996 commented Feb 5, 2023

I wanted to use WhisperX to do forced alignment on the Mozilla Common Voice German dataset, but the words are often cut off or the segments do not align at all.

Additionally, some audio tracks are recognized as Farsi instead of German.

Is it because of the short duration of these clips (less than 2-5 seconds each)?
And how can I improve the accuracy?

Is the accuracy of the English models (for English audio) better?

Owner

m-bain commented Feb 6, 2023

Are you passing in `--language de`, so that it knows the audio is German?

Author

MaxS1996 commented Feb 6, 2023

I am using the Python API (`result = model.transcribe(audio_file)`) and was not aware of a parameter for the transcribe function that would let me enforce a certain language.
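For reference, a minimal sketch of forcing the language from Python instead of relying on auto-detection. It assumes that `model.transcribe` accepts a `language` keyword, as the `transcribe` function in openai-whisper does; the helper name `transcribe_with_language` is hypothetical and not part of WhisperX, and the model name, device, and file name in the usage comment are placeholders.

```python
from typing import Any, Dict


def transcribe_with_language(model: Any, audio_file: str, language: str = "de") -> Dict[str, Any]:
    """Transcribe audio_file with the language fixed instead of auto-detected.

    Assumption: model.transcribe accepts a `language` keyword argument.
    This helper is illustrative only and is not part of WhisperX.
    """
    # Passing the language explicitly skips language detection, which can
    # misfire on very short clips.
    return model.transcribe(audio_file, language=language)


# Usage (assumes whisperx is installed; names below are placeholders):
# import whisperx
# model = whisperx.load_model("large", "cuda")
# result = transcribe_with_language(model, "clip.mp3", language="de")
```

This should be the Python counterpart of passing `--language de` on the command line, under the assumption stated above.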

I was able to improve the performance to a usable level by adding the extend_duration parameter with a value of 0.1, but it still cuts off the beginning of a word from time to time.
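To illustrate what padding a segment by 0.1 seconds means, here is a rough sketch. It is not WhisperX's implementation of extend_duration; `pad_segment` is a hypothetical helper that widens a (start, end) window on both sides and clamps it to the clip boundaries.

```python
from typing import Optional, Tuple


def pad_segment(start: float, end: float, pad: float = 0.1,
                audio_duration: Optional[float] = None) -> Tuple[float, float]:
    """Widen a segment by `pad` seconds on each side (illustrative only)."""
    # Never move the start before the beginning of the clip.
    new_start = max(0.0, start - pad)
    new_end = end + pad
    # If the clip length is known, never move the end past it.
    if audio_duration is not None:
        new_end = min(audio_duration, new_end)
    return new_start, new_end
```

On clips of only a few seconds, a window like this can quickly hit the clip boundaries, so the padding may not always recover a clipped word onset.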

Owner

m-bain commented Apr 4, 2023

The new VAD filtering feature should fix this; feel free to re-open if not.

m-bain closed this as completed Apr 4, 2023