CharacTER: MT metric #286
Conversation
The documentation is not available anymore as the PR was closed or merged.
Experiencing some isort issues.
Thanks @BramVanroy, this is cool! Will add to my list to review mid-next week :)
@mathemakitten I fixed the isort issues. |
Hi @BramVanroy, thanks so much for putting this together! I added some comments and questions, please have a look and let me know if I can clarify anything 🤗
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
The corpus version now only adds attributes, but cer_scores will always be present and always be a list.
Thanks for having such a detailed look @mathemakitten! I incorporated your suggestions. I am running into an issue with the required format again; I have seen this behavior before and I am not sure how to solve it. It seems that when I pass a single sentence (not a list), the string is treated as a sequence of letters, which leads to the following errors:
You can trigger this error by running compute with a single string instead of a list of strings.
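As a side note, here is a minimal sketch of why a bare string can be misread this way (a hypothetical reproduction, not the exact failing call from the thread):

```python
# A Python string is itself iterable, so code that expects a list of sentences
# will happily iterate a bare string character by character.
prediction = "my example"

print(list(prediction))    # ['m', 'y', ' ', 'e', 'x', 'a', 'm', 'p', 'l', 'e']
print(list([prediction]))  # ['my example'] -- wrapping in a list keeps the sentence intact
```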
So `compute` always expects a list of samples. If your feature consists of a string, you should pass:

```python
metric.compute(predictions=["my example"], references=["your example"])
```

If you only want to pass a single example you can use `add`:

```python
metric.add(predictions="my example", references="your example")
```

Finally, if your features themselves are lists (e.g. tokenized strings), you should pass them as follows:

```python
metric.compute(predictions=[["my", "example"]], references=[["your", "example"]])
```

Does that make sense? In your case it indeed seems that a string is interpreted as a list, and that might be causing the mismatch, so I am guessing you passed the string without wrapping it in a list.
Now correctly accepts single strings and lists as input. Now only returns cer_scores and not other statistics as this seems rather uncommon and might be confusing for users.
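Roughly, the behaviour described in this commit could be exercised as below (the load name "character" and the exact output shape are assumptions, not taken from the PR diff):

```python
import evaluate

# Assumption: the metric is registered under the name "character".
character = evaluate.load("character")

# Both a single string and a list of strings are meant to be accepted here.
results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat is sitting on the mat"],
)

# At this point in the PR only the per-example scores are returned.
print(results["cer_scores"])  # a list with one CharacTER score per example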
Looks good. I wonder if it would make sense to extend it to the case where you have multiple references? E.g. take the minimum in that case, which corresponds to the score against the most similar reference. This would make it easy to combine it with all the other text metrics. What do you think?
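In code, the proposal boils down to something like the following sketch (`score_fn` is a stand-in for the underlying CharacTER scoring function, not a real API):

```python
# Sketch of the proposed multi-reference handling: score the hypothesis against
# every reference and keep the minimum, i.e. the score of the most similar reference.
def multi_reference_score(score_fn, hypothesis, references):
    return min(score_fn(hypothesis, reference) for reference in references)
```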
Sure, I can do that. Should it also return the most similar reference in that case, or just the score?
Just the score would be enough. Looking at the code of the underlying library I noticed that you are returning individual scores. To keep the metric in line with the other metrics it would be great if the primary returned value is a scalar, not a list of scores per example. It's ok to have it as a secondary score - even better would be to have an optional kwarg for it. Both points (generalize to multiple refs and return only a scalar) also apply to #290.
Okay. Is there a canonical way to aggregate the scores when multiple references are used? I was thinking of an aggregate argument to control how the per-example scores are collapsed into a single value, and a secondary return_all_scores kwarg to also expose the individual scores. What do you think?
add aggregate and return_all_scores arguments
I implemented my suggestion above. Please let me know if that is in line with what you expected @lvwerra, so I can update the PR in charcut in a similar way.
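For reference, the resulting interface presumably looks roughly like the sketch below (the argument names come from the commit above; the registry name, the default values, and the aggregated key name are assumptions):

```python
import evaluate

character = evaluate.load("character")  # assumed registry name

results = character.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat is sitting on the mat"],
    aggregate="mean",         # how the per-example scores are collapsed into one scalar
    return_all_scores=True,   # additionally keep the individual scores
)

print(results["cer_score"])   # assumed key for the aggregated scalar value
print(results["cer_scores"])  # per-example scores, exposed as a secondary output
```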
That's perfect, thanks for adding these features! 🚀
Looks like the tests should be done via doctest instead.
Done!
Do tests via doctest instead
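Moving the checks into doctests means the examples in the metric docstring double as tests; a sketch of what such a doctest might look like (the output key and the expected value here are illustrative assumptions):

```python
"""
Examples:
    >>> import evaluate
    >>> character = evaluate.load("character")  # assumed registry name
    >>> results = character.compute(predictions=["the cat"], references=["the cat"])
    >>> round(results["cer_score"], 2)  # identical strings need no edits
    0.0
"""
```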
I saw that there are some suggestions from @mathemakitten left; would you mind adding them?
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
Done.
Add the CharacTER MT evaluation metric, introduced in "CharacTer: Translation Edit Rate on Character Level".
Specifically, this implementation uses the repackaged version of the original for usability reasons.