Use DeepSpeech for verification #1666

davidak · 2018-12-04T13:35:24Z

Could DeepSpeech be used for verification? When two verifications are needed, one of them could come from DeepSpeech if its recognition confidence is very high.
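A minimal sketch of that rule. It assumes the ASR score has already been normalized to a 0 to 1 confidence (DeepSpeech's raw transcript confidence is not necessarily on that scale) and that a separate check has decided whether the transcript matches the prompt. The threshold value and the name `human_votes_needed` are hypothetical.

```python
REQUIRED_VOTES = 2  # current process: two verifications per clip
CONFIDENCE_THRESHOLD = 0.95  # hypothetical; would need tuning per model and language


def human_votes_needed(asr_matches_prompt: bool, asr_confidence: float,
                       threshold: float = CONFIDENCE_THRESHOLD) -> int:
    """Return how many human votes a clip still needs.

    The ASR engine counts as at most one vote, and only when its transcript
    matches the prompt and its (normalized) confidence is high enough.
    """
    if asr_matches_prompt and asr_confidence >= threshold:
        return REQUIRED_VOTES - 1
    return REQUIRED_VOTES


print(human_votes_needed(True, 0.98))   # 1
print(human_votes_needed(True, 0.60))   # 2
print(human_votes_needed(False, 0.99))  # 2
```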

It should be able to detect when an entirely different sentence has been recorded and flag it. Relates to #272.
It could also detect offensive words.
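One common way to detect a different sentence is to compare the ASR transcript with the prompt using word error rate (WER): the word-level edit distance divided by the number of prompt words. This is a hypothetical sketch: the 0.5 cutoff, the placeholder blocklist, and the function names are assumptions, and a plain word list is only a rough stand-in for real offensive-language detection.

```python
import re

MISMATCH_THRESHOLD = 0.5  # hypothetical cutoff for "different sentence"
BLOCKLIST = {"badword"}   # placeholder; a real list would be per language


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def flag_clip(prompt: str, transcript: str,
              threshold: float = MISMATCH_THRESHOLD,
              blocklist: set[str] = BLOCKLIST) -> list[str]:
    """Return review flags for a clip based on its ASR transcript."""
    flags = []
    if word_error_rate(prompt, transcript) > threshold:
        flags.append("different_sentence")
    if blocklist & set(normalize(transcript)):
        flags.append("offensive_word")
    return flags


print(flag_clip("The cat sat on the mat.", "the cat sat on the mat"))      # []
print(flag_clip("The cat sat on the mat.", "completely unrelated words"))  # ['different_sentence']
```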

The verification status could be saved and used to further train the model, by comparing the model's result with how users verified the clip.
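A sketch of how saved model verdicts might be compared with the human outcome, assuming each record stores both (the `Verdict` type and its fields are hypothetical). Clips where the two disagree are natural candidates for review or further training, though whether they actually improve the model would need testing.

```python
from collections import Counter
from typing import NamedTuple


class Verdict(NamedTuple):
    clip_id: str
    model_valid: bool  # hypothetical: ASR-based verdict saved when the clip came in
    human_valid: bool  # final outcome of the human votes


def compare(verdicts):
    """Tally model/human agreement and collect clips where they disagree."""
    counts = Counter()
    disagreements = []
    for v in verdicts:
        if v.model_valid == v.human_valid:
            counts["agree"] += 1
        else:
            key = ("model_accepted_humans_rejected" if v.model_valid
                   else "model_rejected_humans_accepted")
            counts[key] += 1
            disagreements.append(v.clip_id)
    return counts, disagreements


counts, to_review = compare([
    Verdict("a", True, True),
    Verdict("b", True, False),
    Verdict("c", False, True),
    Verdict("d", False, False),
])
print(counts["agree"], to_review)  # 2 ['b', 'c']
```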

It would be nice if recording had a UX where I just speak the sentences and the system detects that I have spoken them, without needing to press any button. Other STT training tools work this way.
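Hands-free recording needs endpoint detection, i.e. deciding when the speaker has finished. The sketch below uses plain frame-energy thresholding, a deliberately simple stand-in for a trained voice activity detector. The sample rate, frame size, energy threshold, and silence duration are all assumptions, and samples are assumed to be floats in [-1, 1].

```python
import math

SAMPLE_RATE = 16_000        # assumed sample rate
FRAME_SIZE = 480            # 30 ms frames at 16 kHz
ENERGY_THRESHOLD = 0.02     # hypothetical RMS level separating speech from silence
SILENCE_FRAMES_TO_END = 20  # about 600 ms of trailing silence ends the utterance


def frame_rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_utterance_end(samples, frame_size=FRAME_SIZE,
                       threshold=ENERGY_THRESHOLD,
                       silence_frames=SILENCE_FRAMES_TO_END):
    """Return the sample index where speech ended, or None if still speaking.

    The utterance counts as finished once speech has been heard and is
    followed by `silence_frames` consecutive frames below the threshold.
    """
    heard_speech = False
    silent_run = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        if frame_rms(samples[start:start + frame_size]) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames:
                # sample index where the trailing silent run began
                return start - (silent_run - 1) * frame_size
    return None


# 5 silent frames, 10 loud frames, then 25 silent frames
audio = [0.0] * (5 * FRAME_SIZE) + [0.5] * (10 * FRAME_SIZE) + [0.0] * (25 * FRAME_SIZE)
end = find_utterance_end(audio)
print(end, end / SAMPLE_RATE)  # 7200 0.45
```

Once an end point is found, the clip could be cut there and passed to a transcript check before moving on to the next sentence.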

Gregoor · 2019-01-30T10:43:31Z

Interesting idea. @lissyx do you have a take on whether this could work?

lissyx · 2019-01-30T10:50:09Z

@Gregoor That might be tricky to do, especially since Common Voice data is used to train the model? cc @kdavis-mozilla

Gregoor · 2019-01-30T10:51:22Z

Makes sense, thanks! I guess we'd only do it for clips that have not been released yet then.

lissyx · 2019-01-30T11:51:54Z

But this requires setting up and maintaining infrastructure to handle it, which seems non-trivial to me.

davidak · 2019-06-08T14:34:54Z

For reference: This idea was also posted on the forum.

https://discourse.mozilla.org/t/use-deepspeech-as-one-positive-validation/41144

We can also use other open datasets to train that DeepSpeech instance.

MichaelKohler · 2020-02-22T00:53:08Z

Closing this, given that there is a Discourse post.

davidak · 2022-09-23T10:28:45Z

There are now two high-quality speech recognition systems available:

https://github.com/coqui-ai/STT (active fork of DeepSpeech)
https://github.com/openai/whisper (I had perfect results with German text and the medium model)

We could run BOTH of them over ALL unvalidated clips in languages that are well supported. WHEN both validate a clip successfully, only one human vote is required. If one agrees and the other disagrees, a human should decide as well. If both disagree, two humans must validate.
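The proposed rule as a small function. The two booleans are assumed to come from some per-engine check, such as a WER threshold, and all names are hypothetical. The comment only says a human should decide when the engines split, so `split_votes` defaults to the usual two votes as a conservative assumption.

```python
DEFAULT_VOTES = 2  # current process: two human votes per clip


def human_votes_required(coqui_ok: bool, whisper_ok: bool,
                         language_supported: bool = True,
                         split_votes: int = DEFAULT_VOTES) -> int:
    """Human votes needed for a clip, given two independent ASR verdicts.

    coqui_ok / whisper_ok: whether each engine judged the recording to match
    the prompt (hypothetical inputs, e.g. from a WER check).
    """
    if not language_supported:
        return DEFAULT_VOTES  # engines not trusted for this language
    if coqui_ok and whisper_ok:
        return 1              # both engines agree: one human vote suffices
    if coqui_ok != whisper_ok:
        return split_votes    # engines split: a human decides (count assumed)
    return DEFAULT_VOTES      # both engines reject: two humans must validate


for verdicts in [(True, True), (True, False), (False, False)]:
    print(verdicts, human_votes_required(*verdicts))
```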

We should monitor them closely and could publish a study: Which one is more accurate? Which languages work reliably? ...
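For such a study, one simple metric is each engine's per-language agreement rate with the final human outcome. The record layout below is an assumption.

```python
from collections import defaultdict


def agreement_by_language(records):
    """Fraction of clips where each engine's verdict matched the humans'.

    records: iterable of (language, engine, engine_valid, human_valid) tuples.
    Returns {(language, engine): agreement rate}.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for language, engine, engine_valid, human_valid in records:
        key = (language, engine)
        totals[key] += 1
        matches[key] += int(engine_valid == human_valid)
    return {key: matches[key] / totals[key] for key in totals}


rates = agreement_by_language([
    ("de", "whisper", True, True),
    ("de", "whisper", False, True),
    ("de", "stt", True, True),
    ("en", "whisper", True, False),
])
for (language, engine), rate in sorted(rates.items()):
    print(f"{language} {engine}: {rate:.0%}")
```

Agreement with humans is only a proxy for correctness, since human votes can be wrong too.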

Then we would have everything validated in no time! (At least for popular languages, which would widen the gap.)

MichaelKohler closed this as completed Feb 22, 2020
