<a href="https://colab.research.google.com/github/mrhamedani/LLM-Agents/blob/main/10_bleu_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div>
    <h1>Large Language Models Projects</h1>
    <h3>Apply and Implement Strategies for Large Language Models</h3>
    <h2>4.1-BLEU, ROUGE and N-Grams.</h2>
    <h3>Evaluating translations with BLEU</h3>
</div>

by [Pere Martra](https://www.linkedin.com/in/pere-martra/)
________
Models: nllb-200-distilled-600M

Colab environment: CPU.

Keys:
* BLEU Evaluation.
* Translation Pipeline.
* Google Translator API.
______



In this notebook, we will use the BLEU metric to compare the quality of two different approaches for performing translations.

As my primary language is Spanish, I will translate a few lines from the beginning of Chapter 4, Evaluating Models, of my book [Large Language Models Projects](https://www.amazon.com/Pere-Martra-Manonelles/dp/B0D6XQ44ZP) from English to Spanish.

My translations will serve as the reference translations, that is, the baseline against which the quality of the automatic translations is measured.



In [None]:
!pip install -q googletrans==3.1.0a0
!pip install -q evaluate==0.4.2
!pip install -q transformers==4.42.4


In [None]:
from googletrans import Translator
import transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import evaluate

In [None]:
#Sentences to Translate.
sentences = [
    "In the previous chapters, you've mainly seen how to work with OpenAI models, and you've had a very practical introduction to Hugging Face's open-source models, the use of embeddings, vector databases, and agents.",
    "These have been very practical chapters in which I've tried to gradually introduce concepts that have allowed you, or at least I hope so, to scale up your knowledge and start creating projects using the current technology stack of large language models."
    ]

In [None]:
#Spanish Translation References.
reference_translations = [
    ["En los capítulos anteriores has visto mayoritariamente como trabajar con los modelos de OpenAI, y has tenido una introducción muy práctica a los modelos Open Source de Hugging Face, al uso de embeddings, las bases de datos vectoriales, los agentes."],
    ["Han sido capítulos muy prácticos en los que he intentado ir introduciendo conceptos que te han permitido, o eso espero, ir escalando en tus conocimientos y empezar a crear proyectos usando el stack tecnológico actual de los grandes modelos de lenguaje."]
    ]

We will perform the first translation with NLLB, a small model specialized in translation, which we download from Hugging Face.

In [None]:
model_id = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)


When creating the pipeline, we pass it the source and target languages of the translation as language codes: `eng_Latn` for English and `spa_Latn` for Spanish.

In [None]:
translator = pipeline('translation', model=model, tokenizer=tokenizer,
                        src_lang="eng_Latn", tgt_lang="spa_Latn")

In [None]:
translations_nllb = []

for text in sentences:
  print("to translate: " + text)
  # The pipeline returns a list with one dict per input text
  translation = translator(text)

  # Add the translated text to the translations list
  translations_nllb.append(translation[0]["translation_text"])

to translate: In the previous chapters, you've mainly seen how to work with OpenAI models, and you've had a very practical introduction to Hugging Face's open-source models, the use of embeddings, vector databases, and agents.
to translate: These have been very practical chapters in which I've tried to gradually introduce concepts that have allowed you, or at least I hope so, to scale up your knowledge and start creating projects using the current technology stack of large language models.


Now we have the translations stored in the list 'translations_nllb'.

In [None]:
translations_nllb

['En los capítulos anteriores, han visto principalmente cómo trabajar con modelos OpenAI, y han tenido una introducción muy práctica a los modelos de código abierto de Hugging Face, el uso de embebidos, bases de datos vectoriales y agentes.',
 'Estos han sido capítulos muy prácticos en los que he intentado introducir gradualmente conceptos que han permitido, o al menos espero que lo hagan, ampliar sus conocimientos y comenzar a crear proyectos utilizando la tecnología actual de los modelos de lenguaje grande.']

## Create translations with Google Translator

As a second source for translations, we will use Google Translate, accessed through the `googletrans` library.

In [None]:
translator_google = Translator()

In [None]:
translations_google = []

for text in sentences:
  print("to translate: " + text)
  translation = translator_google.translate(text, dest="es")

  # Add the translated text to the translations list
  translations_google.append(translation.text)
  print(translation.text)

to translate: In the previous chapters, you've mainly seen how to work with OpenAI models, and you've had a very practical introduction to Hugging Face's open-source models, the use of embeddings, vector databases, and agents.
En los capítulos anteriores, vio principalmente cómo trabajar con modelos OpenAI y tuvo una introducción muy práctica a los modelos de código abierto de Hugging Face, el uso de incrustaciones, bases de datos vectoriales y agentes.
to translate: These have been very practical chapters in which I've tried to gradually introduce concepts that have allowed you, or at least I hope so, to scale up your knowledge and start creating projects using the current technology stack of large language models.
Estos han sido capítulos muy prácticos en los que he intentado introducir gradualmente conceptos que te han permitido, o al menos eso espero, ampliar tus conocimientos y empezar a crear proyectos utilizando la tecnología actual de grandes modelos de lenguaje.


In this list, we have the translations created by Google.

In [None]:
translations_google

['En los capítulos anteriores, vio principalmente cómo trabajar con modelos OpenAI y tuvo una introducción muy práctica a los modelos de código abierto de Hugging Face, el uso de incrustaciones, bases de datos vectoriales y agentes.',
 'Estos han sido capítulos muy prácticos en los que he intentado introducir gradualmente conceptos que te han permitido, o al menos eso espero, ampliar tus conocimientos y empezar a crear proyectos utilizando la tecnología actual de grandes modelos de lenguaje.']

## Evaluate translations with BLEU

We will use the BLEU implementation from the Evaluate library by Hugging Face.

In [None]:
bleu = evaluate.load('bleu')


In [None]:
results_nllb = bleu.compute(predictions=translations_nllb, references=reference_translations)


To obtain the metrics, we pass the translated texts and the reference texts to `bleu.compute`.

Note that the translated texts are a flat list of translations:

["Translation1", "Translation2"]

Whereas the reference texts are a list of lists. This allows providing multiple references per translation:

[["reference1 Translation1", "reference2 Translation1"],
["reference1 Translation2", "reference2 Translation2"]]
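To illustrate why several references per translation can matter, here is a minimal pure-Python sketch of the clipped ("modified") n-gram precision that BLEU builds on. It is not the code used by the Evaluate library: the helper names `ngrams` and `modified_precision` are hypothetical, tokenization is a plain whitespace split (an assumption; the library applies its own tokenizer), and the sentences are toy data rather than the notebook's translations.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(prediction, references, n=1):
    """Clipped n-gram precision of one prediction against several references."""
    pred_counts = ngrams(prediction.split(), n)
    # For each n-gram, the clip limit is its highest count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # A predicted n-gram is credited at most as often as it appears in a reference.
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in pred_counts.items())
    total = sum(pred_counts.values())
    return clipped / total if total else 0.0

# Same shapes as in the notebook: a flat list of predictions
# and one list of references per prediction.
predictions = ["the the the the the the the"]
references = [["the cat is on the mat", "there is a cat on the mat"]]

for pred, refs in zip(predictions, references):
    print(modified_precision(pred, refs, n=1))
```

In this toy case the word "the" appears seven times in the prediction but at most twice in any single reference, so only two of the seven unigrams are credited and the precision is 2/7.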


In [None]:
results_google = bleu.compute(predictions=translations_google, references=reference_translations)

In [None]:
print(results_nllb)

{'bleu': 0.3686324165619373, 'precisions': [0.7159090909090909, 0.47674418604651164, 0.30952380952380953, 0.18292682926829268], 'brevity_penalty': 0.988700685876667, 'length_ratio': 0.9887640449438202, 'translation_length': 88, 'reference_length': 89}


In [None]:
print(results_google)

{'bleu': 0.44975901966417653, 'precisions': [0.7710843373493976, 0.5679012345679012, 0.4177215189873418, 0.2987012987012987], 'brevity_penalty': 0.9302618655343314, 'length_ratio': 0.9325842696629213, 'translation_length': 83, 'reference_length': 89}


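On these two sentences, the translation from Google scores higher than the one from NLLB: a BLEU of about 0.45 versus about 0.37, with higher precision at every n-gram order despite a stronger brevity penalty (0.93 versus 0.99). With only two sentences and a single reference each, this is an indication rather than a general ranking of the two systems.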
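The fields in the result dictionaries are related: the `bleu` value is the brevity penalty multiplied by the geometric mean of the four n-gram precisions. The following sketch recomputes both scores from the values printed above, assuming the standard BLEU formula with uniform weights. The function name `bleu_from_parts` is hypothetical and not part of the Evaluate library.

```python
import math

def bleu_from_parts(precisions, translation_length, reference_length):
    """Combine n-gram precisions and lengths into a BLEU score."""
    # Brevity penalty: 1 if the translation is longer than the reference,
    # otherwise exp(1 - reference_length / translation_length).
    if translation_length > reference_length:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - reference_length / translation_length)
    # Geometric mean of the precisions (uniform weights); zero if any precision is zero.
    if min(precisions) > 0:
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    else:
        geo_mean = 0.0
    return brevity_penalty * geo_mean

# Values copied from the printed results above.
nllb = bleu_from_parts(
    [0.7159090909090909, 0.47674418604651164, 0.30952380952380953, 0.18292682926829268],
    translation_length=88, reference_length=89)
google = bleu_from_parts(
    [0.7710843373493976, 0.5679012345679012, 0.4177215189873418, 0.2987012987012987],
    translation_length=83, reference_length=89)

print(nllb, google)
```

Both results should match the `bleu` values reported by `bleu.compute` up to floating-point rounding. The Google translation is shorter (83 tokens against 89), so it is penalized more for brevity, but its higher precisions more than compensate.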