Compute WER metric iteratively #2111
LGTM thank you !
cc @patrickvonplaten any opinion on this ?
metrics/wer/wer.py
measures = compute_measures(reference, prediction)
incorrect += measures["substitutions"] + measures["deletions"] + measures["insertions"]
total += measures["substitutions"] + measures["deletions"] + measures["hits"]
return incorrect / total
Just for safety, you may want to handle the case of total = 0 (e.g., empty set). Unit tests to verify the change would be great as well. Thanks for making this change!
Thanks @elgeish for your comments.
The reasons why I have not explicitly handled the edge case total = 0 are:
- compute_measures already raises a ValueError exception if any of the references has length = 0: see https://github.com/jitsi/jiwer/blob/f8e5404e4ddb7259081191443def1b6670480244/jiwer/measures.py#L242
- if references and predictions have length = 0, the previous behavior was also to raise the same ValueError; the current implementation will raise a ZeroDivisionError in this case
In relation to unit tests, currently we do not test scripts. Maybe @lhoestq can give more insight on this.
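For illustration only, here is a minimal sketch of what an explicit guard for total = 0 could look like if one wanted to keep raising a ValueError instead of a ZeroDivisionError. The helper name `safe_error_rate` and its error message are hypothetical and are not part of this PR or of jiwer.

```python
def safe_error_rate(incorrect: int, total: int) -> float:
    """Return incorrect / total, raising ValueError when there is nothing to score.

    Hypothetical helper: `total` is assumed to be the accumulated number of
    reference words (substitutions + deletions + hits).
    """
    if total == 0:
        # Assumption: mirror the ValueError raised by the previous behavior
        # rather than letting a ZeroDivisionError propagate.
        raise ValueError("cannot compute WER: no reference words were provided")
    return incorrect / total
```

Whether such a guard is worth adding is a judgment call, since empty references are already rejected upstream.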
Sounds good; makes sense!
I discussed with Patrick and I think we could have a nice addition: have a parameter By default Some users might still want to use the old implementation.
@lhoestq @patrickvonplaten are you sure of the parameter name
Not sure about the name, if you can improve it feel free to do so ^^'
@lhoestq yes, but the end user does not necessarily know the details of the implementation of the WER computation. From the end user perspective I think it might make more sense: how do you want to compute the metric?
Because of that I was thinking of something like
Personally like
Therefore, you can merge... ;)
Ok ! merging :)
Compute WER metric iteratively to avoid MemoryError.
Fix #2078.
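
As a rough, self-contained illustration of the iterative approach described above: error counts are accumulated pair by pair instead of processing all texts at once. This is only a sketch under stated assumptions. It replaces jiwer's compute_measures with a plain word-level Levenshtein distance (the hypothetical helper `word_edit_distance`), relying on the fact that substitutions + deletions + hits equals the number of reference words. The function name `iterative_wer` is also an assumption, not the metric script's actual API.

```python
from typing import Iterable, List


def word_edit_distance(ref_words: List[str], hyp_words: List[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion of a reference word
                    current[j - 1] + 1,      # insertion of a hypothesis word
                    previous[j - 1] + cost,  # hit (cost 0) or substitution (cost 1)
                )
            )
        previous = current
    return previous[-1]


def iterative_wer(references: Iterable[str], predictions: Iterable[str]) -> float:
    """Accumulate errors and reference lengths one (reference, prediction) pair at a time."""
    incorrect = 0
    total = 0
    for reference, prediction in zip(references, predictions):
        ref_words = reference.split()
        incorrect += word_edit_distance(ref_words, prediction.split())
        total += len(ref_words)  # equals substitutions + deletions + hits
    # Like the snippet under review, this raises ZeroDivisionError when total is 0.
    return incorrect / total
```

Only one pair of texts is held in the distance table at any time, which is the property that avoids the large allocation behind the MemoryError.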