Model not providing an accurate transcription, mixing in some other language #23

Closed
Bharadwajsai-121 opened this issue Apr 22, 2023 · 1 comment

Comments

@Bharadwajsai-121

I tried to get a transcription for a video from David Silver's reinforcement learning playlist on YouTube.
The model generated very good transcriptions at some timestamps, but at many others it produced text in a language other than English. I haven't changed any settings; I just pasted the URL of the video and clicked transcribe. The result came back in 23.4 seconds but wasn't accurate.

For more information, please have a look at the image I'm attaching below:

[Image: whisper_accuracy_test]

In the image, you can clearly see that the model is generating transcriptions in another language, even though English was requested. Part of the output is in English and the rest is in some other language.

@sanchit-gandhi
Owner

Hey @Bharadwajsai-121! Looks like it went into Welsh! This is because the Whisper model makes a prediction for the most likely language for each batch, and then predicts in that language for that batch.

We can actually pass the language to the Flax Whisper Pipeline:

pred_txt = pipeline("audio.mp3", task="transcribe", language="English", return_timestamps=True)

This will force the model to predict in the language that you provide it, which should resolve your issue.
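For a slightly fuller picture, here is a rough sketch of how the call above could be wrapped in a script. It is not taken verbatim from the repo: the helper names `transcription_kwargs` and `transcribe` are made up for illustration, the `openai/whisper-large-v2` checkpoint is an assumption (use whichever checkpoint you normally load), and reading the result from a `"text"` key is assumed as well.

```python
def transcription_kwargs(language=None, return_timestamps=True):
    """Build keyword arguments for the pipeline call (hypothetical helper)."""
    kwargs = {"task": "transcribe", "return_timestamps": return_timestamps}
    if language is not None:
        # Passing a language asks the model to decode in that language
        # instead of predicting the language itself.
        kwargs["language"] = language
    return kwargs


def transcribe(audio_path, language="English"):
    """Transcribe one audio file with a fixed language (hypothetical helper)."""
    # Imported lazily so transcription_kwargs can be used and inspected
    # without whisper-jax installed. The class name really is spelled "Pipline".
    from whisper_jax import FlaxWhisperPipline

    # Assumed checkpoint; swap in the one you actually use.
    pipeline = FlaxWhisperPipline("openai/whisper-large-v2")
    return pipeline(audio_path, **transcription_kwargs(language))


# Example usage (needs whisper-jax and a local audio file):
# outputs = transcribe("audio.mp3", language="English")
# print(outputs["text"])
```

Leaving `language` as `None` in the helper falls back to the default behaviour described above, where the model picks the language on its own.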

You can use the model for yourself using the Kaggle notebook: https://www.kaggle.com/code/sgandhi99/whisper-jax-tpu

If you pass the language argument as detailed above, you should be able to get perfect transcriptions for the David Silver lectures in nearly the same transcription time.
