Use DeepSpeech for verification #1666
Comments
Interesting idea. @lissyx do you have a take on whether this could work?
@Gregoor It might be tricky to do, especially since Common Voice data is used to train the model. cc @kdavis-mozilla
Makes sense, thanks! I guess we'd only do it for clips that have not been released yet, then.
But this requires setting up and maintaining infrastructure to handle it, which seems non-trivial to me.
For reference: this idea was also posted on the forum. https://discourse.mozilla.org/t/use-deepspeech-as-one-positive-validation/41144 We could also use other open datasets to train that DeepSpeech instance.
Closing this, given that there is a Discourse post.
There are now two high-quality speech recognition systems available:
We could run them BOTH over ALL unvalidated clips in the languages they support well. When both validate a clip successfully, only one human vote would be required. If one agrees and the other disagrees, a human should also decide. If both disagree, two humans must validate the clip. We should monitor them closely and could publish a study: Which one is more accurate? Which languages work reliably? ... Then we would have everything validated in no time (at least for popular languages, which would widen the gap with the others).
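The voting rule proposed in the comment above could be sketched as a small function that maps the two engines' verdicts to the number of human votes still needed. This is a minimal, hypothetical sketch: `AsrVerdict` and `required_human_votes` are illustrative names, not part of Common Voice or DeepSpeech, and treating the split case as a single human tie-breaker is an assumption, since the comment only says "a human should decide".

```python
from enum import Enum


class AsrVerdict(Enum):
    """Hypothetical verdict: did an engine's transcript match the expected sentence?"""
    AGREE = "agree"
    DISAGREE = "disagree"


def required_human_votes(verdict_a: AsrVerdict, verdict_b: AsrVerdict) -> int:
    """Return how many human validations a clip still needs, given two ASR verdicts."""
    agreements = [verdict_a, verdict_b].count(AsrVerdict.AGREE)
    if agreements == 2:
        return 1  # both engines validate the clip: one human confirms
    if agreements == 1:
        return 1  # engines split: one human breaks the tie (assumption)
    return 2  # both engines reject: fall back to the usual two human votes
```

For example, `required_human_votes(AsrVerdict.AGREE, AsrVerdict.DISAGREE)` returns `1`. Keeping the rule in one pure function would make it easy to log each decision for the monitoring study the comment suggests.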
Could DeepSpeech be used for verification? When two verifications are needed, one could come from DeepSpeech if its recognition confidence is very high.
It should be able to detect when an entirely different sentence was recorded and flag it. Relates to #272.
It could also detect offensive words.
The automatic verification results could be saved, compared with how users verified the same clips, and used to further train the model.
It would be nice to have a recording UX where I just speak the sentences and the system detects that I have spoken them, without my having to press any button. Other STT training tools work this way.
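The confidence-gated verification and the "different sentence" flag from the issue description could be sketched by comparing the engine's transcript with the expected sentence using word error rate (word-level edit distance divided by the reference length). This is a hypothetical sketch: the function names and all three thresholds are illustrative assumptions, and it assumes a confidence score already normalized to [0, 1]. The raw score an engine such as DeepSpeech reports may be on a different scale, so thresholds would need calibrating against human votes.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase and keep word characters/apostrophes so punctuation doesn't count as errors.
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def auto_verdict(expected: str, transcript: str, confidence: float,
                 min_confidence: float = 0.9,    # assumed threshold
                 max_match_wer: float = 0.1,     # assumed threshold
                 min_mismatch_wer: float = 0.6,  # assumed threshold
                 ) -> str:
    """Classify a clip as auto-valid, a likely different sentence, or needing humans."""
    if confidence < min_confidence:
        return "needs_human"  # engine is unsure: don't let it vote either way
    wer = word_error_rate(expected, transcript)
    if wer <= max_match_wer:
        return "auto_valid"  # counts as one of the two required verifications
    if wer >= min_mismatch_wer:
        return "flag_different_sentence"  # likely a whole different sentence was read
    return "needs_human"
```

For example, `auto_verdict("Hello, world.", "hello world", 0.95)` returns `"auto_valid"`, while a confident transcript that shares no words with the expected sentence is flagged. Clips in the middle band stay with human validators rather than being decided automatically.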