CharacTER: MT metric #286
Conversation
The documentation is not available anymore as the PR was closed or merged.
Experiencing some isort issues.
Thanks @BramVanroy, this is cool! Will add to my list to review mid-next week :)
@mathemakitten I fixed the isort issues. |
Hi @BramVanroy, thanks so much for putting this together! I added some comments and questions, please have a look and let me know if I can clarify anything 🤗
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
The corpus version now only adds attributes, but cer_scores will always be present and always be a list.
Thanks for having such a detailed look @mathemakitten! I incorporated your suggestions. I am running into an issue with the required format again; I have seen this behavior before and I am not sure how to solve it. It seems that when I pass a single sentence (not a list), the string is treated as a sequence of letters, which leads to the following errors:
You can trigger this error by running compute with a single string instead of a list of strings.
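As a side note, here is a minimal sketch of why a bare string can be misread this way (a hypothetical reproduction, not the exact failing call from the thread):

```python
# A Python string is itself iterable, so code that expects a list of sentences
# will happily iterate a bare string character by character.
prediction = "my example"

print(list(prediction))    # ['m', 'y', ' ', 'e', 'x', 'a', 'm', 'p', 'l', 'e']
print(list([prediction]))  # ['my example'] -- wrapping in a list keeps the sentence intact
```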
So `compute` always expects a list of samples. If your feature consists of a string, you should pass:

```python
metric.compute(predictions=["my example"], references=["your example"])
```

If you only want to pass a single example you can use `add`:

```python
metric.add(predictions="my example", references="your example")
```

Finally, if your features themselves are lists (e.g. tokenized strings), you should pass them as follows:

```python
metric.compute(predictions=[["my", "example"]], references=[["your", "example"]])
```

Does that make sense? In your case it indeed seems that a string is interpreted as a list, and that might be causing the mismatch, so I am guessing you passed the string without wrapping it in a list.
Now correctly accepts single strings and lists as input. Now only returns cer_scores and not other statistics as this seems rather uncommon and might be confusing for users.
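Roughly, the behaviour described in this commit could be exercised as below (the load name "character" and the exact output shape are assumptions, not taken from the PR diff):

```python
import evaluate

# Assumption: the metric is registered under the name "character".
character = evaluate.load("character")

# Both a single string and a list of strings are meant to be accepted here.
results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat is sitting on the mat"],
)

# At this point in the PR only the per-example scores are returned.
print(results["cer_scores"])  # a list with one CharacTER score per example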
Looks good. I wonder if it would make sense to extend it to the case where you have multiple references? E.g. take the minimum in that case, which corresponds to the score against the most similar reference. This would make it easy to combine it with all the other text metrics. What do you think?
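In code, the proposal boils down to something like the following sketch (`score_fn` is a stand-in for the underlying CharacTER scoring function, not a real API):

```python
# Sketch of the proposed multi-reference handling: score the hypothesis against
# every reference and keep the minimum, i.e. the score of the most similar reference.
def multi_reference_score(score_fn, hypothesis, references):
    return min(score_fn(hypothesis, reference) for reference in references)
```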
Sure, I can do that. Should it also return the most similar reference in that case, or just the score?
Just the score would be enough. Looking at the code of the underlying library I noticed that you are returning individual scores. To keep the metric in line with the other metrics it would be great if the primary returned value is a scalar, not a list of scores per example. It's ok to have it as a secondary score - even better would be to have an optional kwarg for it. Both points (generalize to multiple refs and return only a scalar) also apply to #290.
Okay. Is there a canonical way to aggregate the scores when multiple references are used? I was thinking of an aggregate argument to control how the per-example scores are collapsed into a single value, and a secondary return_all_scores kwarg to also expose the individual scores. What do you think?
add aggregate and return_all_scores arguments
I implemented my suggestion above. Please let me know if that is in line with what you expected @lvwerra, so I can update the PR in charcut in a similar way.
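For reference, the resulting interface presumably looks roughly like the sketch below (the argument names come from the commit above; the registry name, the default values, and the aggregated key name are assumptions):

```python
import evaluate

character = evaluate.load("character")  # assumed registry name

results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat is sitting on the mat"],
    aggregate="mean",         # how the per-example scores are collapsed into one scalar
    return_all_scores=True,   # additionally keep the individual scores
)

print(results["cer_score"])   # assumed key for the aggregated scalar value
print(results["cer_scores"])  # per-example scores, exposed as a secondary output
```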
That's perfect, thanks for adding these features! 🚀
Looks like the tests should be done via doctest instead.
Done!
Do tests via doctest instead
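Moving the checks into doctests means the examples in the metric docstring double as tests; a sketch of what such a doctest might look like (the output key and the expected value here are illustrative assumptions):

```python
"""
Examples:
    >>> import evaluate
    >>> character = evaluate.load("character")  # assumed registry name
    >>> results = character.compute(predictions=["the cat"], references=["the cat"])
    >>> round(results["cer_score"], 2)  # identical strings need no edits
    0.0
"""
```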
I saw that there are some suggestions from @mathemakitten left; would you mind adding them?
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
Done.
Add the CharacTER MT evaluation metric, introduced in "CharacTer: Translation Edit Rate on Character Level".
Specifically, this implementation uses the repackaged version of the original for usability reasons.