
Understanding BLEU Score ('bleu_n') #58

Closed
Axe-- opened this issue Nov 14, 2021 · 4 comments · Fixed by #59
Labels
bug Something isn't working

Comments

Axe-- commented Nov 14, 2021

Hey, how are the different BLEU scores calculated?

For the given snippet, why are all bleu_n scores identical?
And how do they relate to nltk's sentence_bleu (and its weights)?

from jury import Jury

scorer = Jury()
# Each inner list holds the candidate predictions / references for one item.
predictions = [
    ["the cat is on the mat", "There is cat playing on the mat"],
    ["Look!    a wonderful day."]
]
references = [
    ["the cat is playing on the mat.", "The cat plays on the mat."],
    ["Today is a wonderful day", "The weather outside is wonderful."]
]
scores = scorer(predictions=predictions, references=references)

Output:

{'empty_predictions': 0,
 'total_items': 2,
 'bleu_1': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'bleu_2': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'bleu_3': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'bleu_4': {'score': 0.42370250917168295,
  'precisions': [0.8823529411764706,
   0.6428571428571429,
   0.45454545454545453,
   0.125],
  'brevity_penalty': 1.0,
  'length_ratio': 1.0,
  'translation_length': 11,
  'reference_length': 11},
 'meteor': {'score': 0.5420511682934044},
 'rouge': {'rouge1': 0.7783882783882783,
  'rouge2': 0.5925324675324675,
  'rougeL': 0.7426739926739926,
  'rougeLsum': 0.7426739926739926}}
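
For context on the nltk part of the question: nltk selects BLEU-n through the weights argument of sentence_bleu, using uniform weights over the first n n-gram orders. A minimal sketch, assuming whitespace tokenization and reusing the first item from the snippet above:

from nltk.translate.bleu_score import sentence_bleu

# Whitespace tokenization is assumed here purely for illustration.
references = [
    "the cat is playing on the mat .".split(),
    "The cat plays on the mat .".split(),
]
hypothesis = "the cat is on the mat".split()

for n in range(1, 5):
    # Uniform weights over orders 1..n, e.g. (0.5, 0.5) for BLEU-2.
    weights = tuple(1 / n for _ in range(n))
    print(f"BLEU-{n}: {sentence_bleu(references, hypothesis, weights=weights):.4f}")

# Without smoothing, a hypothesis with no matching 4-grams scores 0.0 on
# BLEU-4 (nltk emits a warning), so the four values normally differ.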

devrimcavusoglu (Member) commented Nov 14, 2021

Hey @Axe--, there seems to be a bug in bleu that doesn't respect max_order currently. I'll look into it ASAP. Nice catch, btw.
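
For illustration, a minimal sketch of how max_order is supposed to enter the score under the standard BLEU definition (this is not Jury's actual code), plugging in the precisions and brevity penalty reported in the output above:

import math

# Modified n-gram precisions and brevity penalty from the output above.
precisions = [0.8823529411764706, 0.6428571428571429,
              0.45454545454545453, 0.125]
brevity_penalty = 1.0

def bleu_n(precisions, bp, max_order):
    # BLEU-n = BP * exp(uniform mean of log p_i for i = 1..max_order)
    log_mean = sum(math.log(p) for p in precisions[:max_order]) / max_order
    return bp * math.exp(log_mean)

for n in range(1, 5):
    print(f"bleu_{n}: {bleu_n(precisions, brevity_penalty, n):.4f}")
# bleu_1: 0.8824, bleu_2: 0.7531, bleu_3: 0.6365, bleu_4: 0.4237

All four scores in the reported output equal the bleu_4 value (0.4237), which is consistent with max_order being ignored.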

devrimcavusoglu (Member) commented

It is currently fixed in #59. I'll draft a new release today.

devrimcavusoglu (Member) commented Nov 14, 2021

@Axe--, the release is out. It is fixed in the new version (2.1.2). Thanks for bringing it up 👍

Axe-- (Author) commented Nov 14, 2021

Thanks! :-)

devrimcavusoglu added the bug label on Nov 18, 2021