
Understanding BLEU Score ('bleu_n') #58

Closed
Axe-- opened this issue Nov 14, 2021 · 4 comments · Fixed by #59
Labels
bug Something isn't working

Comments

Axe-- commented Nov 14, 2021

Hey, how are the different BLEU scores calculated?

For the given snippet, why are all bleu_n scores identical?
And how do they relate to nltk's sentence_bleu (and its weights)?

from jury import Jury

scorer = Jury()
# Each inner list holds the candidate predictions / references for one item.
predictions = [
    ["the cat is on the mat", "There is cat playing on the mat"],
    ["Look!    a wonderful day."]
]
references = [
    ["the cat is playing on the mat.", "The cat plays on the mat."],
    ["Today is a wonderful day", "The weather outside is wonderful."]
]
scores = scorer(predictions=predictions, references=references)

Output:

{'empty_predictions': 0,
 'total_items': 2,
 'bleu_1': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'bleu_2': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'bleu_3': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'bleu_4': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'meteor': {'score': 0.5420511682934044},
 'rouge': {'rouge1': 0.7783882783882783,
  'rouge2': 0.5925324675324675,
  'rougeL': 0.7426739926739926,
  'rougeLsum': 0.7426739926739926}}
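
For context on the nltk part of the question: nltk selects BLEU-n through the weights argument of sentence_bleu, using uniform weights over the first n n-gram orders. A minimal sketch, assuming whitespace tokenization and reusing the first item from the snippet above:

from nltk.translate.bleu_score import sentence_bleu

# Whitespace tokenization is assumed here purely for illustration.
references = [
    "the cat is playing on the mat .".split(),
    "The cat plays on the mat .".split(),
]
hypothesis = "the cat is on the mat".split()

for n in range(1, 5):
    # Uniform weights over orders 1..n, e.g. (0.5, 0.5) for BLEU-2.
    weights = tuple(1 / n for _ in range(n))
    print(f"BLEU-{n}: {sentence_bleu(references, hypothesis, weights=weights):.4f}")

# Without smoothing, a hypothesis with no matching 4-grams scores 0.0 on
# BLEU-4 (nltk emits a warning), so the four values normally differ.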

devrimcavusoglu (Member) commented Nov 14, 2021

Hey @Axe--, there seems to be a bug in bleu that doesn't respect max_order currently. I'll look into it ASAP. Nice catch, btw.
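
For illustration, a minimal sketch of how max_order is supposed to enter the score under the standard BLEU definition (this is not Jury's actual code), plugging in the precisions and brevity penalty reported in the output above:

import math

# Modified n-gram precisions and brevity penalty from the output above.
precisions = [0.8823529411764706, 0.6428571428571429,
              0.45454545454545453, 0.125]
brevity_penalty = 1.0

def bleu_n(precisions, bp, max_order):
    # BLEU-n = BP * exp(uniform mean of log p_i for i = 1..max_order)
    log_mean = sum(math.log(p) for p in precisions[:max_order]) / max_order
    return bp * math.exp(log_mean)

for n in range(1, 5):
    print(f"bleu_{n}: {bleu_n(precisions, brevity_penalty, n):.4f}")
# bleu_1: 0.8824, bleu_2: 0.7531, bleu_3: 0.6365, bleu_4: 0.4237

All four scores in the reported output equal the bleu_4 value (0.4237), which is consistent with max_order being ignored.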

devrimcavusoglu (Member) commented

It is currently fixed in #59. I'll draft a new release today.

devrimcavusoglu (Member) commented Nov 14, 2021

@Axe--, the release is out. It is fixed in the new version (2.1.2). Thanks for bringing it up 👍

Axe-- (Author) commented Nov 14, 2021

Thanks! :-)

devrimcavusoglu added the bug label on Nov 18, 2021