<a href="https://colab.research.google.com/github/finardi/WatSpeed_LLM_foundation/blob/main/Module4%3A%20Multilingual_Question_Answering_with_Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 4 - Multilingual Question Answering with Transformers

In this notebook, we will explore multilingual question answering using Transformers. Specifically, we will be using the XLM-RoBERTa model, which is a powerful language model pre-trained on an extensive dataset of 2.5TB of filtered CommonCrawl data, encompassing 100 different languages. The model was introduced in the paper "[Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116)" by Conneau et al. and initially released in the repository associated with the paper.

Our focus will be on an XLM-RoBERTa model that has been fine-tuned for question answering on the SQuAD 2.0 dataset: [xlm-roberta-large-squad2](https://huggingface.co/deepset/xlm-roberta-large-squad2).
SQuAD 2.0 consists of question answering examples in English. However, our ultimate goal is to evaluate the model's performance on a diverse range of languages. For this purpose, we will employ the [MLQA dataset](https://github.com/facebookresearch/MLQA), which contains multilingual and cross-lingual examples from seven different languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese.

By conducting question answering experiments on the MLQA dataset, we aim to assess the XLM-RoBERTa model's ability to handle questions and provide accurate answers across multiple languages. 


# Installing required packages

In this example, we have to install three libraries: `transformers` and `datasets`, both from Hugging Face, and `sentencepiece`, from Google. [Hugging Face](https://huggingface.co/) is an AI company whose open-source tools and libraries are widely used in the NLP community, making it easy for developers and researchers to work with state-of-the-art NLP models.

**`transformers`**:

Transformers is an open-source library for NLP developed by Hugging Face. It provides state-of-the-art pre-trained models for various NLP tasks, such as text classification, sentiment analysis, question-answering, named entity recognition, etc. The library is built on top of PyTorch and TensorFlow and provides easy-to-use interfaces to access pre-trained models and fine-tune them on specific tasks. The library also provides tools for training custom models and sharing them with the community.

**`datasets`**:

Datasets is another open-source library developed by Hugging Face that provides a collection of preprocessed datasets for various NLP tasks, such as sentiment analysis, natural language inference, machine translation, and many more. The library provides a unified API to access these datasets, making it easy to load, process, and analyze them.

**`sentencepiece`**:

Sentencepiece is an open-source library developed by Google for subword text processing. It is an unsupervised learning method that constructs a fixed-size vocabulary of subword units for a given language corpus. Sentencepiece enables the generation of a custom tokenization scheme that divides text into smaller subword units, which can better handle rare words and out-of-vocabulary words than traditional word-based tokenization.
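
To make the idea of subword units concrete, here is a minimal sketch that splits text into pieces from a small vocabulary. It is a toy: the vocabulary `TOY_VOCAB` and the function `toy_segment` are hypothetical names invented for this illustration, and the greedy longest-match rule is a simplification. SentencePiece itself learns its vocabulary from a corpus and segments with BPE or a unigram language model; the only detail borrowed from it here is the `▁` marker for whitespace.

```python
# Toy illustration only: greedy longest-match segmentation over a hand-written
# vocabulary. This is NOT the SentencePiece algorithm or its API.

TOY_VOCAB = {"▁multi", "lingual", "▁question", "▁answer", "ing", "▁", "s"}  # hypothetical


def toy_segment(text, vocab):
    """Split text into the longest vocabulary pieces, left to right."""
    # Mark word boundaries with "▁", as SentencePiece does for whitespace.
    marked = "▁" + text.replace(" ", "▁")
    pieces = []
    i = 0
    while i < len(marked):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(marked), i, -1):
            if marked[i:j] in vocab:
                pieces.append(marked[i:j])
                i = j
                break
        else:
            # No vocabulary piece matches: fall back to a single character.
            pieces.append(marked[i])
            i += 1
    return pieces


pieces = toy_segment("multilingual question answering", TOY_VOCAB)
print(pieces)
```

A word such as "answering", which is not in the vocabulary as a whole, is still represented by the known pieces `▁answer` and `ing`. This is how subword tokenization avoids out-of-vocabulary words.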

In [None]:
!pip install transformers
!pip install datasets
!pip install sentencepiece

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.29.2
Looking in i

# Setting the device

In this example, we will use a GPU to speed up the processing of our model. GPUs (Graphics Processing Units) are specialized processors that are optimized for performing large-scale computations in parallel. By using a GPU, we can accelerate the training and inference of a machine learning model, which can significantly reduce the time required to complete these tasks.

Before we begin, we need to check whether a GPU is available and select it as the default device for our PyTorch operations. This is because PyTorch can use either a CPU or a GPU to perform computations, and by default, it will use the CPU. 

To use a GPU in Google Colab:
1. Click on the "Runtime" menu at the top of the screen.
2. From the dropdown menu, click on "Change runtime type".
3. In the popup window that appears, select "GPU" as the hardware accelerator.
4. Click on the "Save" button.

That's it! Now you can use the GPU for faster computations in your notebook. 

In [None]:
!nvidia-smi

Sat May 20 17:57:59 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    23W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
import torch

if torch.cuda.is_available(): 
   dev = "cuda:0"
else: 
   dev = "cpu"
device = torch.device(dev)
print('Using {}'.format(device))

Using cuda:0


# Download the model

To download the XLM-RoBERTa model fine-tuned for question answering, you can use the code below.

In this code snippet, we utilize the **`AutoTokenizer`** and **`AutoModelForQuestionAnswering`** classes from the Transformers library. The **`AutoTokenizer`** is responsible for loading the appropriate tokenizer for the model, while **`AutoModelForQuestionAnswering`** loads the pre-trained XLM-RoBERTa model fine-tuned for question answering.

The model is then moved to the **`device`** selected in the previous section with **`model.to(device)`**, so inference runs on the GPU when one is available and on the CPU otherwise.

In [None]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("deepset/xlm-roberta-large-squad2")

model = AutoModelForQuestionAnswering.from_pretrained("deepset/xlm-roberta-large-squad2")

model.to(device)

Downloading (…)okenizer_config.json:   0%|          | 0.00/179 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/606 [00:00<?, ?B/s]

Downloading (…)tencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/150 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/2.24G [00:00<?, ?B/s]

XLMRobertaForQuestionAnswering(
  (roberta): XLMRobertaModel(
    (embeddings): XLMRobertaEmbeddings(
      (word_embeddings): Embedding(250002, 1024, padding_idx=1)
      (position_embeddings): Embedding(514, 1024, padding_idx=1)
      (token_type_embeddings): Embedding(1, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): XLMRobertaEncoder(
      (layer): ModuleList(
        (0-23): 24 x XLMRobertaLayer(
          (attention): XLMRobertaAttention(
            (self): XLMRobertaSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): XLMRobertaSelfOutput(
              (dense): Linear(in_features=1024, out_feature

# Download dataset

In this code snippet, we use the **`load_dataset`** function from the Hugging Face **`datasets`** library to load the MLQA dataset. We specify the language code for the desired language in the **`language`** variable. The code will download the MLQA dataset for the specified language and assign it to the **`dataset`** variable.

In [None]:
from datasets import load_dataset

language = "es" # @param ["es","ar","de","en","hi","vi","zh"]

dataset = load_dataset("mlqa", f"mlqa.{language}.{language}")

Downloading builder script:   0%|          | 0.00/8.44k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/114k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/34.9k [00:00<?, ?B/s]

Downloading and preparing dataset mlqa/mlqa.es.es to /root/.cache/huggingface/datasets/mlqa/mlqa.es.es/1.0.0/224fde9ea61350ffb013e4beff31d44c6e125ce82c3aa4af70298eceabc8f7f7...


Downloading data:   0%|          | 0.00/75.7M [00:00<?, ?B/s]

Generating test split:   0%|          | 0/5253 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/500 [00:00<?, ? examples/s]

Dataset mlqa downloaded and prepared to /root/.cache/huggingface/datasets/mlqa/mlqa.es.es/1.0.0/224fde9ea61350ffb013e4beff31d44c6e125ce82c3aa4af70298eceabc8f7f7. Subsequent calls will reuse this data.


  0%|          | 0/2 [00:00<?, ?it/s]
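
The configuration name passed to `load_dataset` above contains two language codes. A small helper can build these names and catch mistyped codes before a download is attempted. This is only a sketch: `mlqa_config_name` and `MLQA_LANGUAGES` are hypothetical names, and the reading of the first code as the context language and the second as the question language is an assumption that should be checked against the dataset card.

```python
# The seven MLQA language codes; note that Simplified Chinese is "zh".
MLQA_LANGUAGES = ("en", "ar", "de", "es", "hi", "vi", "zh")


def mlqa_config_name(context_lang, question_lang):
    """Build a config name such as "mlqa.es.es" (hypothetical helper)."""
    for code in (context_lang, question_lang):
        if code not in MLQA_LANGUAGES:
            raise ValueError(f"Unknown MLQA language code: {code!r}")
    # Assumption: the first code is the context language, the second the question language.
    return f"mlqa.{context_lang}.{question_lang}"


print(mlqa_config_name("es", "es"))  # same language for context and question
print(mlqa_config_name("es", "en"))  # a cross-lingual pairing
```

The returned string can be passed as the second argument of `load_dataset("mlqa", ...)` in place of the f-string used above.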

# Evaluation

To compute metrics, we will use the official MLQA (Multilingual Question Answering) evaluation script provided by Facebook Research. We clone the MLQA repository with the command `!git clone https://github.com/facebookresearch/MLQA.git` to obtain the script. Using the official script keeps the metric calculation standardized and consistent when assessing our XLM-RoBERTa model in a multilingual question answering setting.

In [None]:
!git clone https://github.com/facebookresearch/MLQA.git

Cloning into 'MLQA'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 19 (delta 0), reused 0 (delta 0), pack-reused 17
Unpacking objects: 100% (19/19), 12.49 KiB | 2.50 MiB/s, done.


The code below calculates the F1 score and exact match (EM) score for a set of predictions and their corresponding reference answers in the context of the MLQA evaluation. The `compute_metrics` function uses the official MLQA script to obtain the average F1 and EM scores for a given language.

In [None]:
from MLQA.mlqa_evaluation_v1 import f1_score, exact_match_score
import numpy as np

def compute_metrics(references, predictions, language):
  # Score each prediction against its reference with the official MLQA functions
  f1s = []
  ems = []
  for prediction, reference in zip(predictions, references):
    f1s.append(f1_score(prediction, reference, lang=language))
    ems.append(exact_match_score(prediction, reference, lang=language))
  
  return {
      "f1": np.mean(f1s),
      "em": np.mean(ems)
  }
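
To build intuition for what these two metrics measure, here is a simplified, self-contained sketch. It is not the official MLQA script: `simple_f1` and `simple_exact_match` are hypothetical helpers that only lowercase and split on whitespace, whereas the official script applies additional language-specific normalization before comparing answers.

```python
from collections import Counter


def simple_exact_match(prediction, reference):
    """Return 1.0 if both strings match after lowercasing and whitespace cleanup."""
    return float(prediction.lower().split() == reference.lower().split())


def simple_f1(prediction, reference):
    """Token-overlap (bag-of-words) F1 between a prediction and a reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # The multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "Raimundo Ulisses Penaforte"
reference = "Cônego Raimundo Ulisses Penaforte"
score = simple_f1(prediction, reference)
print(f"F1: {score:.2f}, EM: {simple_exact_match(prediction, reference)}")
```

In this example the prediction misses one of the four reference tokens, so precision is 1.0, recall is 0.75, and exact match is 0. F1 rewards partially correct answers that exact match rejects outright.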


The code below evaluates the XLM-RoBERTa question answering model on the validation split of the MLQA dataset.

1. **Importing Dependencies**: The code imports the necessary dependencies, including `torch` for tensor operations and deep learning, and `tqdm` for displaying progress bars during the evaluation.

2. **Defining the Evaluation Function**: The function `evaluate` takes three parameters: `model`, `tokenizer`, and `dataset`. `model` represents the question answering model, `tokenizer` is responsible for encoding the input, and `dataset` contains the validation examples.

3. **Initializing Lists**: Two empty lists, `predictions` and `references`, are created to store the predicted answers and reference answers, respectively.

4. **Performing Evaluation**: The code iterates over the examples in the validation subset of the dataset using `tqdm` to display a progress bar. For each example, it retrieves the context, question, and reference answer.

5. **Tokenizing the Input**: The input is tokenized using the tokenizer's `encode_plus` method. The question and context are concatenated and passed as input. The resulting tensors are returned as a dictionary and moved to the `device` (assumed to be previously defined).

6. **Performing Question Answering Inference**: The model is called on the input tensors inside `torch.no_grad()`, since gradients are not needed for inference. The `outputs` object exposes `start_logits` and `end_logits`, which hold one score per input token for being the start and the end of the answer. The `argmax` of each gives the predicted start and end positions in the input sequence.

7. **Decoding Predicted Answers**: The predicted answer is obtained by decoding the token IDs corresponding to the predicted start and end positions. The `tokenizer.decode` method is used to convert the token IDs back into text.

8. **Collecting Predictions and References**: The predicted answer and the reference answer are added to the `predictions` and `references` lists, respectively. Because each example is processed as a batch of one, `predicted_answer` is a one-element list, and the reference answer is repeated once per prediction so that both lists stay aligned.

9. **Returning the Metrics**: The `compute_metrics` function is called to compute the evaluation metrics (F1 score and exact match) based on the collected predictions and references. The `references`, `predictions`, and `language` parameters are passed to this function.

10. **Printing the Results**: The computed metrics are printed to the console. The F1 score is displayed as "F1-bow", and the exact match score is displayed as "Exact match".
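
Taking the `argmax` of the start and end logits independently can produce an end position that comes before the start, which yields an empty answer. A common alternative is to search for the best-scoring valid pair. The sketch below illustrates this on made-up logits with plain Python lists. `best_span` and `max_answer_len` are hypothetical names, and this is not what the evaluation cell in this notebook does.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair, end inclusive, that maximizes
    start_logits[start] + end_logits[end] with start <= end and a
    capped answer length (hypothetical helper)."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


# Toy logits for a 6-token input (made-up numbers, not model output).
start_logits = [0.1, 0.2, 3.0, 0.3, 0.1, 0.0]
end_logits = [0.0, 4.0, 0.2, 0.5, 2.5, 0.1]

# Independent argmax picks the end before the start, giving an empty span.
naive = (start_logits.index(max(start_logits)), end_logits.index(max(end_logits)))
span = best_span(start_logits, end_logits)
print(f"independent argmax: {naive}, best valid span: {span}")
```

Here the returned end index is inclusive, so slicing the token IDs would use `end + 1`, just as the evaluation cell adds 1 to the end position.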

In [None]:
import torch
from tqdm import tqdm

# Define the evaluation function
def evaluate(model, tokenizer, dataset):
    predictions = []
    references = []

    for example in tqdm(dataset["validation"]):
        context = example["context"]
        question = example["question"]
        reference = example["answers"]["text"][0]  # Only consider the first answer as the ground truth

        # Tokenize the input
        inputs = tokenizer.encode_plus(question, context, return_tensors="pt", truncation=True, padding="max_length").to(device)

        # Perform the question answering inference
        with torch.no_grad():
            outputs = model(**inputs)

        # Move the outputs to CPU and decode the predicted answer
        answer_start = torch.argmax(outputs.start_logits, dim=1).squeeze().cpu()
        answer_end = torch.argmax(outputs.end_logits, dim=1).squeeze().cpu() + 1
        predicted_answer = [tokenizer.decode(ids) for ids in inputs["input_ids"][:, answer_start:answer_end]]

        # Collect predictions and references
        predictions.extend(predicted_answer)
        references.extend([reference] * len(predicted_answer))

    return compute_metrics(references, predictions, language)

# Evaluate the model
result = evaluate(model, tokenizer, dataset)

print()
print(f"F1-bow: {result['f1']:.2f}")
print(f"Exact match: {result['em']:.2f}")

100%|██████████| 500/500 [00:26<00:00, 19.01it/s]


F1-bow: 0.62
Exact match: 0.46





# Try your own context and question

In the code below, you can try multilingual or cross-lingual question answering using XLM-RoBERTa.

Try providing the context of one language and the question of another.

In [None]:
context="O município de Penaforte inicialmente denominou-se de Baixio do Couro, mais tarde, chamou-se Presidente Vargas e finalmente recebeu o nome de Penaforte, em homenagem ao ilustre filho de Jardim, o Cônego Raimundo Ulisses Penaforte. Era este, figura destacada do clero cearense, jornalista, orador primoroso, autor de vários trabalhos de real mérito sobre assuntos religiosos, filosóficos e históricos, além de pertencer a inúmeras associações culturais brasileiras e estrangeiras. O município de Penaforte foi desmembrado do de Jati, que também fizera parte do território de Jardim. Seu povoamento está ligado ao intercâmbio entre Pernambuco e Ceará, graças à sua posição fronteiriça e de parada para muitos viajantes que enfrentavam as poeirentes estradas em busca do Cariri cearense. Dentre as famílias dos primeiros povoadores destacam-se os Matias, Ângelo, Leite e Ferreira. Penaforte é o município mais meridional do Estado do Ceará. Gentílico: peanafortense" # @param
question = "Who does the name of the city honor?" # @param

inputs = tokenizer.encode_plus(question, context, return_tensors="pt", truncation=True, padding="max_length").to(device)

# Perform the question answering inference
with torch.no_grad():
    outputs = model(**inputs)

# Move the outputs to CPU and decode the predicted answer
answer_start = torch.argmax(outputs.start_logits, dim=1).squeeze().cpu()
answer_end = torch.argmax(outputs.end_logits, dim=1).squeeze().cpu() + 1
predicted_answer = [tokenizer.decode(ids) for ids in inputs["input_ids"][:, answer_start:answer_end]]
print(f"Answer: {predicted_answer[0]}")

Answer: Cônego Raimundo Ulisses Penaforte
