
asr model evaluator addition + doc #378

Merged

Conversation

bayartsogt-ya (Contributor):

@lewtun

As discussed in #324, I am adding an automatic-speech-recognition evaluator here.

Since the automatic-speech-recognition pipeline already exists and supports both Wav2Vec2 and Whisper, I think it is a safe addition.

I did have one concern: the wer and cer metrics both return a float, while the contract says they should return a dict. I worked around this by checking the return type, but I would definitely like to hear your opinion on it.
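Concretely, the workaround amounts to something like this (an illustrative sketch with assumed names, not the exact diff):

```python
import evaluate

# `wer` (and `cer`) return a bare float from `compute`, unlike most
# metrics, which return a dict keyed by metric name.
metric_name = "wer"
metric = evaluate.load(metric_name)

result = metric.compute(
    predictions=["hello world"],
    references=["hello word"],
)

# Normalize a bare float into the dict format the Evaluator contract expects.
if not isinstance(result, dict):
    result = {metric_name: result}

print(result)  # e.g. {'wer': 0.5}
```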

Thanks!

HuggingFaceDocBuilderDev commented Dec 8, 2022:

The documentation is not available anymore as the PR was closed or merged.

lvwerra (Member) left a comment:

Hi @bayartsogt-ya, thank you for this incredibly clean PR! Just one small nit and I'll fix the output format of WER/CER.

Comment on lines 73 to 88
"""
Examples:
```python
>>> from evaluate import evaluator
>>> from datasets import load_dataset
>>> task_evaluator = evaluator("automatic-speech-recognition")
>>> data = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="validation[:40]")
>>> results = task_evaluator.compute(
>>> model_or_pipeline="https://huggingface.co/openai/whisper-tiny.en",
>>> data=data,
>>> input_column="path",
>>> label_column="sentence",
>>> metric="wer",
>>> )
```
"""
lvwerra (Member):

For the other Evaluators we added the example to a string at the beginning, which we then attached with @add_end_docstrings. Could we do the same here for uniformity?
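For reference, the pattern looks roughly like this (a self-contained sketch; the class and docstring names are illustrative, and the real `add_end_docstrings` helper ships with the library):

```python
def add_end_docstrings(*docstr):
    """Append the given strings to the decorated function's docstring
    (illustrative reimplementation of the library helper)."""
    def decorator(fn):
        fn.__doc__ = (fn.__doc__ or "") + "".join(docstr)
        return fn
    return decorator


TASK_DOCUMENTATION = r"""
Examples:

>>> from evaluate import evaluator
>>> task_evaluator = evaluator("automatic-speech-recognition")
"""


class AutomaticSpeechRecognitionEvaluator:
    @add_end_docstrings(TASK_DOCUMENTATION)
    def compute(self, **kwargs):
        """Compute metrics for the given model and dataset."""


# The task example is now appended to the method's docstring.
print(AutomaticSpeechRecognitionEvaluator.compute.__doc__)
```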

@@ -267,6 +267,12 @@ def compute(
random_state=random_state,
)

# TODO: To clarify why `wer` and `cer` return float
lvwerra (Member):

That's an oversight in the standardization of the metrics. Let me fix that in a separate PR so we can remove this here.

lvwerra (Member):

Actually, that will probably break a few things, so let's keep your workaround for now.

bayartsogt-ya (Contributor, Author):

This sounds great, and thanks for making this change! Let me merge with your branch and make the changes accordingly.

lvwerra (Member):

Let's actually keep your workaround for now.

lvwerra mentioned this pull request Dec 8, 2022.
lvwerra (Member) commented Dec 8, 2022:

Hi @bayartsogt-ya, see my comment above: can you keep your workaround and just fix the docstring? Also, there is no need to merge the upstream branch into yours then. Thanks!

lvwerra (Member) commented Dec 9, 2022:

Awesome, thanks for this addition!

lvwerra merged commit 81d34e4 into huggingface:main on Dec 9, 2022.
bayartsogt-ya deleted the speech-to-text-evaluator-branch branch on December 10, 2022.