<a href="https://colab.research.google.com/github/gcunhase/NLPMetrics/blob/master/notebooks/gleu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## GLEU: Google-BLEU

*NLP evaluation metric used in Machine Translation tasks*

*Suitable for measuring sentence-level similarity*

*Range: 0 (no match) to 1 (exact match)*

### 1. Libraries
*Import the necessary libraries and fetch the NLTK tokenizer data if it is missing*


In [0]:
import nltk
import nltk.translate.gleu_score as gleu

import numpy
import os

# Download the 'punkt' tokenizer data only if it is not already available locally
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

### 2. Dataset
*Lists of words: the candidate (hypothesis) and reference sentences, each split on whitespace*

In [0]:
hyp = 'she read the book because she was interested in world history'.split()
ref_a = 'she read the book because she was interested in world history'.split()
ref_b = 'she was interested in world history because she read the book'.split()

### 3. *Sentence* score calculation
*Compares one hypothesis (the candidate sentence) with one or more reference sentences. When several references are given, the highest score across them is returned.*

In [12]:
score_ref_a = gleu.sentence_gleu([ref_a], hyp)
print("Hyp and ref_a are the same: {}".format(score_ref_a))
score_ref_b = gleu.sentence_gleu([ref_b], hyp)
print("Hyp and ref_b are different: {}".format(score_ref_b))
score_ref_ab = gleu.sentence_gleu([ref_a, ref_b], hyp)
print("Hyp vs multiple refs: {}".format(score_ref_ab))

Hyp and ref_a are the same: 1.0
Hyp and ref_b are different: 0.7894736842105263
Hyp vs multiple refs: 1.0
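The scores above can be reproduced by hand. The following is a minimal sketch, assuming GLEU is the minimum of $n$-gram precision and recall over all 1- to 4-grams, taken for the best-matching reference. The helper names `ngram_counts` and `sentence_gleu_sketch` are hypothetical and are not part of NLTK.

```python
from collections import Counter

def ngram_counts(tokens, min_len=1, max_len=4):
    """Count every n-gram of order min_len..max_len in a list of tokens."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def sentence_gleu_sketch(references, hypothesis, min_len=1, max_len=4):
    """Hypothetical helper: best min(precision, recall) over the references."""
    hyp_counts = ngram_counts(hypothesis, min_len, max_len)
    hyp_total = sum(hyp_counts.values())
    best = 0.0
    for reference in references:
        ref_counts = ngram_counts(reference, min_len, max_len)
        ref_total = sum(ref_counts.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # nothing to compare against
        # Counter intersection keeps the smaller count of each shared n-gram
        matches = sum((hyp_counts & ref_counts).values())
        precision = matches / hyp_total
        recall = matches / ref_total
        best = max(best, min(precision, recall))
    return best

hyp = 'she read the book because she was interested in world history'.split()
ref_a = 'she read the book because she was interested in world history'.split()
ref_b = 'she was interested in world history because she read the book'.split()

print(sentence_gleu_sketch([ref_a], hyp))         # all 38 n-grams match
print(sentence_gleu_sketch([ref_b], hyp))         # 30 of 38 n-grams match
print(sentence_gleu_sketch([ref_a, ref_b], hyp))  # best reference wins
```

For `ref_b`, the hypothesis and the reference each contain 38 $n$-grams (11 + 10 + 9 + 8) and share 30 of them, which gives 30/38 ≈ 0.789.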


### 4. *Corpus* score calculation
*Compares one candidate document containing multiple sentences with one or more reference documents, each also containing multiple sentences.*

In [13]:
score_ref_a = gleu.corpus_gleu([[ref_a]], [hyp])
print("1 document with 1 reference sentence: {}".format(score_ref_a))
score_ref_a = gleu.corpus_gleu([[ref_a, ref_b]], [hyp])
print("1 document with 2 reference sentences: {}".format(score_ref_a))
score_ref_a = gleu.corpus_gleu([[ref_a], [ref_b]], [hyp, hyp])
print("2 documents with 1 reference sentence each: {}".format(score_ref_a))

1 document with 1 reference sentence: 1.0
1 document with 2 reference sentences: 1.0
2 documents with 1 reference sentence each: 0.8947368421052632
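The last value (0.8947) is consistent with pooling match counts over all sentence pairs before dividing, not with averaging per-sentence scores. The sketch below illustrates that pooling under the same assumptions as the sentence-level definition (1- to 4-grams, best reference per hypothesis). The helpers `match_stats` and `corpus_gleu_sketch`, and the short sentence pair, are hypothetical and added only for illustration.

```python
from collections import Counter

def ngram_counts(tokens, min_len=1, max_len=4):
    """Count every n-gram of order min_len..max_len in a list of tokens."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def match_stats(references, hypothesis):
    """Return (matches, denominator) for the best-scoring reference."""
    hyp_counts = ngram_counts(hypothesis)
    hyp_total = sum(hyp_counts.values())
    best, best_score = (0, 0), -1.0
    for reference in references:
        ref_counts = ngram_counts(reference)
        matches = sum((hyp_counts & ref_counts).values())
        # Dividing by the larger total equals min(precision, recall)
        denom = max(hyp_total, sum(ref_counts.values()))
        score = matches / denom if denom else 0.0
        if score > best_score:
            best_score, best = score, (matches, denom)
    return best

def corpus_gleu_sketch(list_of_references, hypotheses):
    """Hypothetical helper: pool counts over all pairs, then divide once."""
    total_matches = total_denom = 0
    for references, hypothesis in zip(list_of_references, hypotheses):
        matches, denom = match_stats(references, hypothesis)
        total_matches += matches
        total_denom += denom
    return total_matches / total_denom if total_denom else 0.0

hyp = 'she read the book because she was interested in world history'.split()
ref_a = 'she read the book because she was interested in world history'.split()
ref_b = 'she was interested in world history because she read the book'.split()

# Same setup as the third call above: (38 + 30) / (38 + 38)
print(corpus_gleu_sketch([[ref_a], [ref_b]], [hyp, hyp]))

# Illustrative short pair: pooling weights sentences by their n-gram count
short = 'world history'.split()
pooled = corpus_gleu_sketch([[ref_b], [short]], [hyp, short])
print(pooled)  # (30 + 3) / (38 + 3), lower than the plain mean of 30/38 and 1.0
```

Because longer sentences contribute more $n$-grams, they carry more weight in the pooled score than in a simple mean of sentence scores.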


### 5. GLEU-$n$
*GLEU-$n$ restricts the score to a chosen range of $n$-gram orders. Both the **sentence** and **corpus** functions accept the **min_len** and **max_len** parameters for this.*

* *min_len*: smallest $n$-gram order to extract (default 1)
* *max_len*: largest $n$-gram order to extract (default 4)


In [17]:
score_1to4grams = gleu.sentence_gleu([ref_b], hyp, min_len=1, max_len=4)
score_1to2grams = gleu.sentence_gleu([ref_b], hyp, min_len=1, max_len=2)
print("1 to 4 grams: {}".format(score_1to4grams))
print("1 to 2 grams: {}".format(score_1to2grams))

1 to 4 grams: 0.7894736842105263
1 to 2 grams: 0.9523809523809523
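To see why the 1-to-2-gram score is higher than the 1-to-4-gram score, it can help to look at each order separately. This is a minimal sketch with a hypothetical helper, `order_n_gleu`, that counts matches for a single $n$-gram order against one reference. It does not call NLTK and assumes the same min(precision, recall) definition used for the combined score.

```python
from collections import Counter

def order_n_gleu(reference, hypothesis, n):
    """Hypothetical helper: GLEU restricted to n-grams of one order n."""
    hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    matches = sum((hyp & ref).values())
    # Dividing by the larger total equals min(precision, recall)
    denom = max(sum(hyp.values()), sum(ref.values()))
    return matches / denom if denom else 0.0

hyp = 'she read the book because she was interested in world history'.split()
ref_b = 'she was interested in world history because she read the book'.split()

# Matches per order for this pair: 11/11, 9/10, 6/9, 4/8
for n in range(1, 5):
    print("{}-grams: {}".format(n, order_n_gleu(ref_b, hyp, n)))
```

Reordering the two clauses leaves every unigram intact but breaks progressively more of the longer $n$-grams, so including orders 3 and 4 lowers the combined score from 20/21 to 30/38.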
