fix: correct reference length calculation #195

yuxqiu · 2024-04-27T13:44:17Z

Summary

This PR fixes how the brevity penalty (specifically the effective reference corpus length) is calculated in BLEU.

Previously, len_reference was calculated as min([len(ref) for ref in references_tokenized]). This is incorrect: according to the BLEU paper, the brevity penalty uses the "best match length", i.e. the reference length closest to the candidate length, not the minimum reference length.

For more information, see the Wikipedia section on the brevity penalty and the NLTK implementation.
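
To illustrate the difference, here is a minimal sketch of the best match length and the resulting brevity penalty. It is not the code from this PR: the function names are hypothetical, lengths are plain token counts, and ties are resolved toward the shorter reference, which is an assumption modelled on NLTK's behaviour.

```python
import math


def closest_ref_length(ref_lens, hyp_len):
    """Return the reference length closest to the candidate length.

    Ties are broken in favour of the shorter reference (assumption,
    chosen to mirror NLTK).
    """
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - hyp_len), ref_len))


def effective_reference_length(hyp_lens, refs_lens):
    """Sum the best match length over every candidate in the corpus."""
    return sum(
        closest_ref_length(ref_lens, hyp_len)
        for hyp_len, ref_lens in zip(hyp_lens, refs_lens)
    )


def brevity_penalty(hyp_len, ref_len):
    """Brevity penalty: 1 if the candidate is longer, else exp(1 - r / c)."""
    if hyp_len > ref_len:
        return 1.0
    if hyp_len == 0:
        return 0.0
    return math.exp(1 - ref_len / hyp_len)


# One candidate of 12 tokens, three references of 10, 13 and 20 tokens.
hyp_len = 12
ref_lens = [10, 13, 20]

old_ref_len = min(ref_lens)                          # previous behaviour: 10
new_ref_len = closest_ref_length(ref_lens, hyp_len)  # best match length: 13

print(brevity_penalty(hyp_len, old_ref_len))  # no penalty, candidate is longer than 10
print(brevity_penalty(hyp_len, new_ref_len))  # penalised, candidate is shorter than 13
```

With the minimum reference length the 12-token candidate is not penalised at all, while the best match length of 13 yields a penalty of about 0.92, so the two approaches can produce different BLEU scores whenever the shortest reference is not the closest one.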

Test plan

I added another unit test to test_bleu.py and compared the results of the calculations to the results of the nltk.translate.bleu_score.corpus_bleu function to make sure the implementation is correct.

facebook-github-bot · 2024-05-01T20:46:56Z

@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

yuxqiu · 2024-05-08T01:58:15Z

Hi, I am wondering if there is a way for me to get the reason why the test is failing so that I can fix the problem.

JKSenthil · 2024-05-08T22:46:11Z

Hi @yuxqiu, thanks for this contribution! It seems some files unrelated to BLEU have been reformatted in a way that causes our linter to error. Do you mind undoing those changes?

yuxqiu · 2024-05-09T03:01:32Z

@JKSenthil I've finished undoing all those changes.

facebook-github-bot · 2024-05-09T20:42:52Z

@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

JKSenthil · 2024-05-10T14:12:13Z

Hi @yuxqiu, thanks for reverting! We have identified the linter issue to be on our end. We'll land a fix first, then rerun these tests :)

facebook-github-bot · 2024-05-14T00:59:34Z

@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

yuxqiu added 2 commits April 27, 2024 09:39

fix: correct reference length calculation

db000dc

style: format

abe02fd

facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 27, 2024

revert: format

64ff52c

This reverts commit abe02fd.

facebook-github-bot closed this in ea813d3 May 14, 2024
