
Compute WER metric iteratively #2111

Merged 3 commits on Apr 6, 2021

Conversation

albertvillanova (Member)

Compute WER metric iteratively to avoid MemoryError.

Fix #2078.

@albertvillanova albertvillanova linked an issue Mar 25, 2021 that may be closed by this pull request
@lhoestq (Member) left a comment:


LGTM thank you !

cc @patrickvonplaten any opinion on this ?

```python
# Inside the metric's _compute, using jiwer's compute_measures per pair:
incorrect = 0
total = 0
for prediction, reference in zip(predictions, references):
    measures = compute_measures(reference, prediction)
    incorrect += measures["substitutions"] + measures["deletions"] + measures["insertions"]
    total += measures["substitutions"] + measures["deletions"] + measures["hits"]
return incorrect / total
```
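For context, the accumulation under review can be sketched end to end. A small dynamic-programming word aligner stands in for jiwer's `compute_measures` here; `word_measures` and `iterative_wer` are hypothetical names for illustration, not the library's API:

```python
def word_measures(reference, prediction):
    """Count hits/substitutions/deletions/insertions between two sentences
    (hypothetical stand-in for jiwer.compute_measures)."""
    ref, hyp = reference.split(), prediction.split()
    m, n = len(ref), len(hyp)
    # Standard word-level Levenshtein table.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
    # Backtrack through the table to classify each aligned step.
    hits = subs = dels = ins = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]:
            hits, i, j = hits + 1, i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            subs, i, j = subs + 1, i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels, i = dels + 1, i - 1
        else:
            ins, j = ins + 1, j - 1
    return {"hits": hits, "substitutions": subs,
            "deletions": dels, "insertions": ins}

def iterative_wer(references, predictions):
    """Accumulate error counts pair by pair, as in the merged change."""
    incorrect = total = 0
    for reference, prediction in zip(references, predictions):
        m = word_measures(reference, prediction)
        incorrect += m["substitutions"] + m["deletions"] + m["insertions"]
        total += m["substitutions"] + m["deletions"] + m["hits"]
    return incorrect / total
```

Note that this still divides by `total` at the end, so an empty input raises `ZeroDivisionError`, which is exactly the edge case raised in the review below.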
@elgeish left a comment:

Just for safety, you may want to handle the case of total = 0 (e.g., empty set). Unit tests to verify the change would be great as well. Thanks for making this change!

@albertvillanova (Member, Author):

Thanks @elgeish for your comments.

The reasons why I have not explicitly handled the edge case total = 0 are:

Regarding unit tests: currently we do not test scripts. Maybe @lhoestq can give more insight on this.

@elgeish replied:

Sounds good; makes sense!

@lhoestq (Member) commented Mar 31, 2021

I discussed this with Patrick, and I think we could make a nice addition: a parameter concatenate_texts that, if True, uses the old implementation.

By default, concatenate_texts would be False, so that sentences are evaluated independently and resources are saved (the WER computation has quadratic complexity).

Some users might still want to use the old implementation.
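The proposed flag could look roughly like the sketch below. `difflib` stands in for jiwer's alignment here (it is not guaranteed to produce a minimal alignment on pathological inputs), and `wer_score` / `_measures` are made-up names for illustration:

```python
import difflib

def _measures(ref_words, hyp_words):
    """Hit/sub/del/ins counts from a difflib alignment (stand-in for jiwer)."""
    hits = subs = dels = ins = 0
    sm = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            hits += i2 - i1
        elif op == "replace":
            # A replace block may pair unequal spans; the overlap counts as
            # substitutions, the remainder as deletions or insertions.
            k = min(i2 - i1, j2 - j1)
            subs += k
            dels += (i2 - i1) - k
            ins += (j2 - j1) - k
        elif op == "delete":
            dels += i2 - i1
        else:  # "insert"
            ins += j2 - j1
    return hits, subs, dels, ins

def wer_score(references, predictions, concatenate_texts=False):
    if concatenate_texts:
        # Old behaviour: one alignment over the concatenation of all texts.
        pairs = [(" ".join(references), " ".join(predictions))]
    else:
        # New default: align each pair independently; alignment cost is
        # quadratic in sentence length, so short pairs need far less memory.
        pairs = list(zip(references, predictions))
    incorrect = total = 0
    for ref, pred in pairs:
        hits, subs, dels, ins = _measures(ref.split(), pred.split())
        incorrect += subs + dels + ins
        total += subs + dels + hits
    return incorrect / total
```

With concatenate_texts=True the alignment runs over one long pair, reproducing the old behaviour and its memory profile; the default evaluates each pair independently, as merged in this PR.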

@lhoestq lhoestq mentioned this pull request Mar 31, 2021
@albertvillanova (Member, Author):

@lhoestq @patrickvonplaten are you sure about the parameter name concatenate_texts? I was thinking of something like iter...

@lhoestq (Member) commented Apr 1, 2021

Not sure about the name; if you can improve it, feel free to do so ^^'
The old implementation computes the WER on the concatenation of all the input texts, while the new one computes the WER measures independently for each reference/prediction pair.
That's why I thought of concatenate_texts

@albertvillanova (Member, Author):

@lhoestq yes, but the end user does not necessarily know the implementation details of the WER computation.

From the end user's perspective, I think the question is rather: how do you want to compute the metric?

  • all at once, with higher RAM requirements?
  • iteratively, with lower RAM requirements?

That is why I was thinking of something like iter or iterative...

@lhoestq (Member) commented Apr 6, 2021

Personally I like concatenate_texts better, since iter or iterate feel quite vague.

@albertvillanova (Member, Author):

Therefore, you can merge... ;)

@lhoestq (Member) commented Apr 6, 2021

Ok ! merging :)

@lhoestq lhoestq merged commit 549cd55 into huggingface:master Apr 6, 2021
Successfully merging this pull request may close these issues.

MemoryError when computing WER metric
4 participants