Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correct reference length calculation #195

Closed
wants to merge 3 commits into from

Conversation

yuxqiu
Copy link
Contributor

@yuxqiu yuxqiu commented Apr 27, 2024

Summary

This PR fixes the way brevity penalty (specifically the effective reference corpus length) is calculated in BLEU.

Previously, len_reference was calculated as min([len(ref) for ref in references_tokenized]). However, this is incorrect, because according to the paper, we need to find the "best match length", not the minimum reference length.

For more information, see wikipedia - brevity penalty and nltk implementation.

Test plan

I added another unit test to test_bleu.py and compared the results of the calculations to the results of the nltk.translate.bleu_score.corpus_bleu function to make sure the implementation is correct.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 27, 2024
@facebook-github-bot
Copy link
Contributor

@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@yuxqiu
Copy link
Contributor Author

yuxqiu commented May 8, 2024

Hi, I am wondering if there is a way for me to get the reason why the test is failing so that I can fix the problem.

@JKSenthil
Copy link
Contributor

Hi @yuxqiu, thanks for this contribution! it seems some files unrelated to BLEU have been formatted in a way in which causes our linter to error, do you mind undo-ing those changes?

This reverts commit abe02fd.
@yuxqiu
Copy link
Contributor Author

yuxqiu commented May 9, 2024

@JKSenthil I've finished undo-ing all those changes.

@facebook-github-bot
Copy link
Contributor

@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@JKSenthil
Copy link
Contributor

Hi @yuxqiu, thanks for reverting! We have identified the linter issue to be on our end, we'll land a fix first then rerun these tests again :)

@facebook-github-bot
Copy link
Contributor

@JKSenthil has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants