<a href="https://colab.research.google.com/github/mmaguero/diploma_fpuna_nlp_ia/blob/master/2025/guarani_wiki_question_answering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Transformers installation
! pip install transformers datasets evaluate accelerate
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


# Question answering

In [2]:
#@title
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/ajPx5LwJD-I?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')



Question answering tasks return an answer given a question. If you've ever asked a virtual assistant like Alexa, Siri or Google what the weather is, then you've used a question answering model before. There are two common types of question answering tasks:

- Extractive: extract the answer from the given context.
- Abstractive: generate an answer from the context that correctly answers the question.

This guide will show you how to:

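1. Finetune [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) on the [SQuAD](https://huggingface.co/datasets/squad) dataset for extractive question answering. This notebook adapts the recipe to Guarani, swapping in the `mmaguero/gn-bert-tiny-cased` checkpoint and the `gn` configuration of `alexandrainst/multi-wiki-qa`.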
2. Use your finetuned model for inference.

<Tip>

To see all architectures and checkpoints compatible with this task, we recommend checking the [task-page](https://huggingface.co/tasks/question-answering).

</Tip>

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install transformers datasets evaluate
```

We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:

In [24]:
from huggingface_hub import notebook_login

notebook_login()


## Load SQuAD dataset

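Start by loading the dataset from the 🤗 Datasets library. The original guide uses SQuAD; this notebook uses the Guarani (`gn`) configuration of `alexandrainst/multi-wiki-qa` instead, which has the same fields and is small enough to experiment with quickly. The SQuAD loading code is kept below, commented out, for reference: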

In [4]:
#from datasets import load_dataset

#squad = load_dataset("squad")

Load the Guarani configuration, which ships a single `train` split. It is divided into train, validation and test sets further below with the [train_test_split](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.train_test_split) method:

In [5]:
from datasets import load_dataset

# 1. Load the Guarani ("gn") configuration of the multi-wiki-qa dataset
squad = load_dataset("alexandrainst/multi-wiki-qa", "gn")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Then take a look at an example:

In [31]:
import random
rnd = random.randrange(len(squad["train"]))  # randint(0, n) could return n, which is out of range
rnd, squad["train"][rnd]

(2253,
 {'id': 'https://gn.wikipedia.org/wiki/Ant%C3%B4nio%20Fagundes',
  'title': 'Antônio Fagundes',
  'context': 'Antônio da Silva Fagundes Filho (Rio de Janeiro, 18 jasyrundy ary 1949-pe) ha\'e peteĩ karai mba\'eapohára Pindorama retã pegua, hembiapo hetáre opu\'aka heta jopói omomba\'eguasu chupe.\n\nHembiapo\n\nTa\'angambyrýpe \nTembiasagua\'u - Hekoha\'ãngandy\n 1968 - Antonio Maria\n 1969 - Nenhum Homem é Deus.... Netinho\n 1972 - A Revolta dos Anjos.... Vítor\n 1972 - Bel-Ami.... Cadu\n 1973 - Mulheres de Areia.... Alaor\n 1974 - O Machão.... Petruchio\n 1976 - Saramandaia.... Lua Viana\n 1977 - Nina - Bruno\n 1978 - Dancin\' Days.... Cacá\n 1979/1981 - Carga Pesada.... Pedro\n 1981 - Amizade Colorida.... Edu\n 1982 - Avenida Paulista.... Alex Torres\n 1983 - Champagne.... João Maria\n 1983 - Louco Amor.... Jorge Augusto\n 1984 - Corpo a Corpo.... Osmar\n 1988 - Vale Tudo.... Ivan Meireles\n 1990 - Rainha da Sucata.... Caio Szimanski\n 1991 - Mundo da Lua.... Rog

Next, carve validation and test sets out of the single `train` split:

In [7]:

# 2. Split the original 'train' split into:
# 80% for new 'train' and 20% for 'temp_test'
squad_temp = squad["train"].train_test_split(test_size=0.2, seed=42)
print("Initial split (train and temp_test):")
print(squad_temp)

# 3. Hold out 20% of the remaining training data as the validation set
squad_final = squad_temp["train"].train_test_split(test_size=0.2, seed=42)
squad_final["validation"] = squad_final.pop("test")
squad_final["test"] = squad_temp["test"]

print("Final dataset splits (train, validation, test):")
print(squad_final)

Initial split (train and temp_test):
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 4002
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 1001
    })
})
Final dataset splits (train, validation, test):
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 3201
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 801
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 1001
    })
})
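The row counts printed above follow from applying the 20% hold-out twice. A minimal sketch of the arithmetic, assuming the held-out share is rounded up to a whole example (an assumption that agrees with the printed counts; the helper `split_sizes` is hypothetical):

```python
import math

def split_sizes(n_rows, test_fraction):
    """Return (kept_rows, held_out_rows), rounding the held-out share up (assumed)."""
    held_out = math.ceil(n_rows * test_fraction)
    return n_rows - held_out, held_out

total = 5003                                  # rows in the original `train` split
rest, test = split_sizes(total, 0.2)          # first split: hold out the test set
train, validation = split_sizes(rest, 0.2)    # second split: hold out validation

print(train, validation, test)
```

The final proportions are therefore roughly 64% train, 16% validation and 20% test.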


There are several important fields here:

- `answers`: the starting character position of the answer within `context` (`answer_start`) and the answer text (`text`).
- `context`: background information from which the model needs to extract the answer.
- `question`: the question a model should answer.
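
To make the `answers` field concrete, the sketch below uses a made-up record in the same layout (the record is illustrative, not taken from the dataset) and checks that `answer_start` is a character offset into `context`:

```python
# Hypothetical record in the SQuAD-style layout described above.
example = {
    "context": "The capital of Paraguay is Asunción.",
    "question": "What is the capital of Paraguay?",
    "answers": {"text": ["Asunción"], "answer_start": [27]},
}

start_char = example["answers"]["answer_start"][0]
answer_text = example["answers"]["text"][0]
end_char = start_char + len(answer_text)

# Slicing the context with the character offsets recovers the answer text.
recovered = example["context"][start_char:end_char]
print(recovered)
```

The preprocessing step later relies on exactly this relationship to turn character offsets into token positions.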

## Preprocess

In [8]:
#@title
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/qgaM0weJHpA?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')



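The next step is to load a tokenizer to process the `question` and `context` fields. The original guide uses DistilBERT; here the tokenizer of the Guarani BERT checkpoint `mmaguero/gn-bert-tiny-cased` is loaded instead: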

In [9]:
from transformers import AutoTokenizer

# The original guide loads "distilbert/distilbert-base-uncased" here.
tokenizer = AutoTokenizer.from_pretrained("mmaguero/gn-bert-tiny-cased")

There are a few preprocessing steps particular to question answering tasks you should be aware of:

1. Some examples in a dataset may have a very long `context` that exceeds the maximum input length of the model. To deal with longer sequences, truncate only the `context` by setting `truncation="only_second"`.
2. Next, map the start and end positions of the answer to the original `context` by setting
   `return_offsets_mapping=True`.
3. With the mapping in hand, now you can find the start and end tokens of the answer. Use the [sequence_ids](https://huggingface.co/docs/tokenizers/main/en/api/encoding#tokenizers.Encoding.sequence_ids) method to
   find which part of the offset corresponds to the `question` and which corresponds to the `context`.

Here is how you can create a function to truncate and map the start and end tokens of the `answer` to the `context`:

In [10]:
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        answer = answers[i]
        start_char = answer["answer_start"][0]
        end_char = answer["answer_start"][0] + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer is not fully inside the context, label it (0, 0)
        if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
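
To see the token search in isolation, here is a minimal sketch on hand-written offsets. It follows the same loop structure as `preprocess_function`; the helper name and the toy offsets are illustrative, and a real tokenizer would produce the offsets via `return_offsets_mapping=True`:

```python
def char_span_to_token_span(offsets, context_start, context_end, start_char, end_char):
    """Map a character span onto token indices.

    `offsets` holds one (char_start, char_end) pair per token; the context
    occupies tokens context_start..context_end inclusive.
    """
    # Answer not fully inside this context window: point at token 0 ([CLS]).
    if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
        return 0, 0

    # Walk forward to the last token that starts at or before the answer start.
    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1

    # Walk backward to the first token that ends at or after the answer end.
    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return start_token, end_token


# Toy layout: [CLS] question [SEP] "Asunción is the capital" [SEP]
# Special and question tokens get placeholder offsets in this sketch.
offsets = [(0, 0), (0, 8), (0, 0), (0, 8), (9, 11), (12, 15), (16, 23), (0, 0)]
span = char_span_to_token_span(offsets, 3, 6, start_char=16, end_char=23)
print(span)
```

Here the answer "capital" covers characters 16 to 23 of the context, which is the single context token at index 6.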

To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [map](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) function. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once. Remove any columns you don't need:

In [11]:
tokenized_squad = squad_final.map(preprocess_function, batched=True, remove_columns=squad_final["train"].column_names)
print("Tokenized dataset structure:")
print(tokenized_squad)

Tokenized dataset structure:
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'token_type_ids', 'attention_mask', 'start_positions', 'end_positions'],
        num_rows: 3201
    })
    validation: Dataset({
        features: ['input_ids', 'token_type_ids', 'attention_mask', 'start_positions', 'end_positions'],
        num_rows: 801
    })
    test: Dataset({
        features: ['input_ids', 'token_type_ids', 'attention_mask', 'start_positions', 'end_positions'],
        num_rows: 1001
    })
})


Now create a batch of examples using [DefaultDataCollator](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.DefaultDataCollator). Unlike other data collators in 🤗 Transformers, it does not apply any additional preprocessing such as padding. That is sufficient here because `preprocess_function` already pads every example to `max_length`.

In [12]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

## Evaluate

Evaluation for question answering requires a significant amount of postprocessing, which the original guide skips. The cells below add it: the validation and test sets are re-tokenized into overlapping windows, and the model's start and end logits are mapped back to answer text and scored with the SQuAD metric. Because these evaluation features carry no `start_positions` or `end_positions`, the [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) cannot compute a validation loss on them during training.

For a step-by-step explanation of this postprocessing, take a look at the [Question answering](https://huggingface.co/course/chapter7/7?fw=pt#post-processing) chapter from the 🤗 Hugging Face Course!

In [13]:
def preprocess_validation_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
        return_overflowing_tokens=True,
        stride=128,
    )

    sample_map = inputs.pop("overflow_to_sample_mapping")
    inputs["example_id"] = []

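    for i in range(len(inputs["input_ids"])):
        sample_idx = sample_map[i]
        inputs["example_id"].append(examples["id"][sample_idx])

        # Keep offsets only for context tokens so postprocessing can skip
        # question, special and padding tokens (it checks for None).
        sequence_ids = inputs.sequence_ids(i)
        inputs["offset_mapping"][i] = [
            o if sequence_ids[k] == 1 else None
            for k, o in enumerate(inputs["offset_mapping"][i])
        ]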

    return inputs


In [14]:
tokenized_squad_train = squad_final["train"].map(
    preprocess_function,
    batched=True,
    remove_columns=squad_final["train"].column_names
)

# For validation and test sets, remove all original columns. The preprocess_validation_function
# will add new tokenized features including `example_id`, which will be used to link back to
# the original `squad_final` datasets to retrieve `id` and `answers` for evaluation.
tokenized_squad_validation = squad_final["validation"].map(
    preprocess_validation_function,
    batched=True,
    remove_columns=squad_final["validation"].column_names
)

tokenized_squad_test = squad_final["test"].map(
    preprocess_validation_function,
    batched=True,
    remove_columns=squad_final["test"].column_names
)

from datasets import DatasetDict
tokenized_squad = DatasetDict({
    "train": tokenized_squad_train,
    "validation": tokenized_squad_validation,
    "test": tokenized_squad_test
})

print("Tokenized dataset structure:")
print(tokenized_squad)

Tokenized dataset structure:
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'token_type_ids', 'attention_mask', 'start_positions', 'end_positions'],
        num_rows: 3201
    })
    validation: Dataset({
        features: ['input_ids', 'token_type_ids', 'attention_mask', 'offset_mapping', 'example_id'],
        num_rows: 3013
    })
    test: Dataset({
        features: ['input_ids', 'token_type_ids', 'attention_mask', 'offset_mapping', 'example_id'],
        num_rows: 3976
    })
})
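
The validation set grows from 801 examples to 3013 features because `return_overflowing_tokens=True` splits each long context into overlapping windows. A simplified sketch of that windowing follows; it ignores the question and special tokens that the real tokenizer repeats in every window, and the helper `sliding_windows` is hypothetical:

```python
def sliding_windows(tokens, window, stride):
    """Split `tokens` into chunks of at most `window` items.

    Consecutive chunks share `stride` items: `stride` is the size of the
    overlap, not the step between chunk starts.
    """
    if not 0 <= stride < window:
        raise ValueError("stride must be non-negative and smaller than window")
    step = window - stride
    chunks, sample_map = [], []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        sample_map.append(0)  # every chunk points back to the same source example
        if start + window >= len(tokens):
            break
        start += step
    return chunks, sample_map


context_tokens = list(range(10))  # stand-in for a 10-token context
chunks, sample_map = sliding_windows(context_tokens, window=4, stride=2)
for chunk in chunks:
    print(chunk)
```

The `sample_map` list plays the role of `overflow_to_sample_mapping`: it records which original example each window came from, which is what `example_id` preserves for postprocessing.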


In [15]:
n_best_size = 20
max_answer_length = 30

print(f"Set n_best_size: {n_best_size}")
print(f"Set max_answer_length: {max_answer_length}")

Set n_best_size: 20
Set max_answer_length: 30


In [16]:
import collections
import numpy as np

def postprocess_qa_predictions(examples, features, raw_predictions, n_best_size = 20, max_answer_length = 30):
    all_start_logits, all_end_logits = raw_predictions
    # Build a map from example to its corresponding features.
    example_id_to_index = {k: i for i, k in enumerate(examples["id"])}
    features_per_example = collections.defaultdict(list)
    for i, feature in enumerate(features):
        features_per_example[example_id_to_index[feature["example_id"]]].append(i)

    predictions = collections.OrderedDict()

    # Loop through all the examples to get the predictions for each example.
    for example_index, example in enumerate(examples):
        feature_indices = features_per_example[example_index]
        min_null_score = None # Only used if squad_v2 is True.
        valid_answers = []

        context = example["context"]
        # Looping through all the features associated with the current example.
        for feature_index in feature_indices:
            # We grab the predictions of the model for this feature.
            start_logits = all_start_logits[feature_index]
            end_logits = all_end_logits[feature_index]
            offset_mapping = features[feature_index]["offset_mapping"]

            # Update minimum null score.
            cls_index = features[feature_index]["input_ids"].index(tokenizer.cls_token_id)
            feature_null_score = start_logits[cls_index] + end_logits[cls_index]
            if min_null_score is None or min_null_score < feature_null_score:
                min_null_score = feature_null_score

            # Go through all possibilities for the `n_best_size` greater start and end logits.
            start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
            end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist()
            for start_index in start_indexes:
                for end_index in end_indexes:
                    # Don't consider out-of-scope answers, either because the indices are out of bounds or correspond to part of the input_ids that are not in the context.
                    if (
                        start_index >= len(offset_mapping)
                        or end_index >= len(offset_mapping)
                        or offset_mapping[start_index] is None
                        or offset_mapping[end_index] is None
                    ):
                        continue
                    # Don't consider answers with a length that is either negative or too long.
                    if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                        continue

                    start_char = offset_mapping[start_index][0]
                    end_char = offset_mapping[end_index][1]
                    valid_answers.append(
                        {
                            "offsets": {"start": start_char, "end": end_char},
                            "score": start_logits[start_index] + end_logits[end_index],
                            "text": context[start_char:end_char]
                        }
                    )

        if len(valid_answers) > 0:
            best_answer = sorted(valid_answers, key=lambda x: x["score"], reverse=True)[0]
        else:
            # If no valid span was found for this example, fall back to an empty answer.
            best_answer = {"text": "", "score": 0.0}

        # Let's pick the best answer or the null answer (which is the empty string).
        predictions[example["id"]] = best_answer["text"]

    return predictions
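
The core of the span search can be exercised on toy logits. The following is a minimal sketch in plain Python; the helper `best_span` and the numbers are illustrative, and it leaves out the offset and `[CLS]` handling of the full function:

```python
def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Return (score, start, end) of the highest-scoring valid span, or None."""
    def top_indices(logits):
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[:n_best_size]

    candidates = []
    for start in top_indices(start_logits):
        for end in top_indices(end_logits):
            # Skip spans that end before they start or that are too long.
            if end < start or end - start + 1 > max_answer_length:
                continue
            candidates.append((start_logits[start] + end_logits[end], start, end))
    return max(candidates) if candidates else None


start_logits = [0.1, 0.2, 3.0, 0.5, 0.1]
end_logits = [2.5, 0.1, 0.3, 2.0, 0.2]
result = best_span(start_logits, end_logits, n_best_size=3, max_answer_length=3)
print(result)
```

In this toy input the largest end logit sits at index 0, before the best start at index 2, so taking each argmax independently would give an invalid span. Scoring start and end jointly avoids that.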

In [17]:
import evaluate
import numpy as np
import collections

# Load the SQuAD metric
metric = evaluate.load("squad")

def compute_metrics(eval_preds):
    start_logits, end_logits = eval_preds.predictions

    # Assuming squad_final and tokenized_squad are globally accessible
    # These are needed for postprocess_qa_predictions
    val_examples = squad_final["validation"]
    val_features = tokenized_squad["validation"]

    # Post-process predictions to get formatted answers
    predictions = postprocess_qa_predictions(
        val_examples,
        val_features,
        (start_logits, end_logits),
        n_best_size=n_best_size, # Using globally defined parameter
        max_answer_length=max_answer_length # Using globally defined parameter
    )

    # Prepare references in the SQuAD format
    references = [{
        "id": ex["id"],
        "answers": ex["answers"]
    } for ex in val_examples]

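    # The SQuAD metric expects a list of {"id", "prediction_text"} dicts
    formatted_predictions = [
        {"id": example_id, "prediction_text": text}
        for example_id, text in predictions.items()
    ]

    # Compute and return the SQuAD metric scores
    result = metric.compute(predictions=formatted_predictions, references=references)
    result["combined"] = result["exact_match"] * 0.5 + result["f1"] * 0.5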

    return result

print("Modified 'compute_metrics' function to use SQuAD metric.")

Modified 'compute_metrics' function to use SQuAD metric.
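
For intuition about what the `squad` metric reports, here is a simplified re-implementation of exact match and token-level F1 for a single prediction. It follows the commonly described SQuAD v1.1 normalization (lowercasing, stripping punctuation and the English articles a/an/the). Treat it as a sketch rather than the library's code; the article rule is English-specific and has little effect on Guarani text:

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)

def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    return float(normalize_answer(prediction) == normalize_answer(reference))

def f1_score(prediction, reference):
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("Buenos Aires, Argentina", "Buenos Aires")
f1 = f1_score("Buenos Aires, Argentina", "Buenos Aires")
combined = 0.5 * em + 0.5 * f1  # same weighting as `combined`, on a 0-1 scale
print(em, round(f1, 2), round(combined, 2))
```

A prediction that contains the reference plus extra words scores zero on exact match but still earns partial F1 credit, which is why the two are averaged into `combined`. The `squad` metric itself reports both values as percentages rather than fractions.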


## Train

<Tip>

If you aren't familiar with finetuning a model with the [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer), take a look at the basic tutorial [here](https://huggingface.co/docs/transformers/main/en/tasks/../training#train-with-pytorch-trainer)!

</Tip>

You're ready to start training your model now! Load the checkpoint with [AutoModelForQuestionAnswering](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForQuestionAnswering). The original guide loads DistilBERT; this notebook loads `mmaguero/gn-bert-tiny-cased`:

In [18]:
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

# The original guide loads "distilbert/distilbert-base-uncased" here.
model = AutoModelForQuestionAnswering.from_pretrained("mmaguero/gn-bert-tiny-cased")

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at mmaguero/gn-bert-tiny-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


At this point, only three steps remain:

1. Define your training hyperparameters in [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments). The only required parameter is `output_dir`, which specifies where to save your model. Setting `push_to_hub=True` would upload the model to the Hub during training (you need to be signed in to Hugging Face to upload your model); this notebook leaves it `False` and pushes once at the end with `trainer.push_to_hub()`.
2. Pass the training arguments to [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) along with the model, dataset, tokenizer, and data collator.
3. Call [train()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train) to finetune your model.

In [19]:
training_args = TrainingArguments(
    output_dir="multi-wiki-qa-gn-bert-tiny-cased",
    eval_strategy="epoch", # "no"
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=20,
    weight_decay=0.01,
    save_total_limit=3,
    #metric_for_best_model="combined",
    push_to_hub=False,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    #compute_metrics=compute_metrics,

)

trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 0}.

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | No log | No log |
| 2 | No log | No log |
| 3 | 4.937200 | No log |
| 4 | 4.937200 | No log |
| 5 | 4.227500 | No log |
| 6 | 4.227500 | No log |
| 7 | 4.227500 | No log |
| 8 | 3.954900 | No log |
| 9 | 3.954900 | No log |
| 10 | 3.747300 | No log |


TrainOutput(global_step=4020, training_loss=3.8123631624440057, metrics={'train_runtime': 258.6483, 'train_samples_per_second': 247.518, 'train_steps_per_second': 15.542, 'total_flos': 194391516211200.0, 'train_loss': 3.8123631624440057, 'epoch': 20.0})
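
The `global_step=4020` in the output can be cross-checked from the dataset size and batch size. This is a quick sketch of the arithmetic, assuming a single device and no gradient accumulation (both assumptions, consistent with the arguments shown above):

```python
import math

train_rows = 3201   # rows in the tokenized train split
batch_size = 16     # per_device_train_batch_size
epochs = 20         # num_train_epochs

steps_per_epoch = math.ceil(train_rows / batch_size)  # the last batch is partial
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```

With the default logging interval of 500 steps, the training loss would be refreshed only every two to three epochs, which would explain why the same value repeats across rows of the table.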

In [26]:
try:
    # Post-process the validation predictions and print the SQuAD scores
    validation_metrics = compute_metrics(trainer.predict(tokenized_squad["validation"]))
    print(validation_metrics)
except Exception as e:
    # Include the exception type so failures are easy to diagnose
    print(repr(e))


In [21]:
trainer.evaluate(tokenized_squad["test"])

{'eval_runtime': 4.311,
 'eval_samples_per_second': 922.292,
 'eval_steps_per_second': 57.759,
 'epoch': 20.0}

In [27]:
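# Score the test split directly: compute_metrics is hard-wired to the validation split.
test_output = trainer.predict(tokenized_squad["test"])
test_predictions = postprocess_qa_predictions(
    squad_final["test"], tokenized_squad["test"], test_output.predictions,
    n_best_size=n_best_size, max_answer_length=max_answer_length,
)
test_results = metric.compute(
    predictions=[{"id": k, "prediction_text": v} for k, v in test_predictions.items()],
    references=[{"id": ex["id"], "answers": ex["answers"]} for ex in squad_final["test"]],
)
print(test_results)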


Once training is completed, share your model to the Hub with the [push_to_hub()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.push_to_hub) method so everyone can use your model:

In [28]:
trainer.push_to_hub()

CommitInfo(commit_url='https://huggingface.co/mmaguero/multi-wiki-qa-gn-bert-tiny-cased/commit/b83f3da5ed510bf464d80139f5dc24f9434ee605', commit_message='End of training', commit_description='', oid='b83f3da5ed510bf464d80139f5dc24f9434ee605', pr_url=None, repo_url=RepoUrl('https://huggingface.co/mmaguero/multi-wiki-qa-gn-bert-tiny-cased', endpoint='https://huggingface.co', repo_type='model', repo_id='mmaguero/multi-wiki-qa-gn-bert-tiny-cased'), pr_revision=None, pr_num=None)

<Tip>

For a more in-depth example of how to finetune a model for question answering, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).

</Tip>

## Inference

Great, now that you've finetuned a model, you can use it for inference!

Come up with a question and some context you'd like the model to predict:

In [1]:
question = "Mba’e árapepa heñóikuri José Carlos Cabrera?"
context = """José Carlos Cabrera (Sapucai, 1 jasypokõi ary 1989 -pe) ha'e peteĩ artista paraguayo concierto guitarra clásica rehegua.

Mba'apokuaa teéva
Ary 2010 guive oiko Buenos Aires, Argentina-pe, upépe oñemotenonde licenciatura de Música orekóva especialización Guitarra-pe, oñemoarandúvo Javier Bravo ndive Departamento de Artes Musicales y Sonido "Carlos López Buchardo" Universidad Nacional de Artes-pe.

Ko'ãga, ojeguereko ha'eha peteĩva umi omomba'eguasúva Agustín Pío Barrios "Mangoré" purahéi, ha, jepémo imitã, ha'e guitarrista paraguayo omoingevéva Barrios rembiapokue irrepertorio-pe.

Marzo 2010 jave, ojere Europa-pe, ombovy'ávo público-pe umi obra "Mangoré" Francia ha Holanda-pe.

Ome'ê heta concierto Argentina-pe, umíva apytépe Festival Internacional Guitarras del Mundo, 2o Encuentro Internacional Guitarra, ha Festival Internacional TSONAMI de Música Contemporánea, orepresentáva Paraguay orekóva estreno obra contemporánea "Mangoré" compositor paraguayo Nicolás Pérez González. Avei oime kuri Paraguái representante ramo Feria Internacional del Libro Buenos Aires 2011-pe.

Ojekuaa solista invitado ramo heta orquesta ndive: Orquesta Sinfónica Nacional de Argentina, Orquesta Sinfónica Ciudad de Asunción, Camerata Miranda, ha Orquesta Cámara Centro Cultural Paraguayo-Americano, upépe oestrena concierto guitarra, flauta ha orquesta "Homage to Mangoré" Maestro Luis Szarán. Omotenonde director nacional ha internacional, ha'eháicha Diego Sánchez Haase, César Manuel "Lito" Barrios, Miguel A. Gilardi, ha Javier Aquino Maidana, ambue apytépe.

Paraguáipe ombosako’i mokõi programa amplio orekóva Agustín Barrios rembiapo, ojejapóva opáichagua tendáre tetã pukukue. Peteîva umi concierto oiko Mangoré mansión San Juan Bautista-pe, oiporúvo guitarra Morant ha'eva'ekue Agustín Barrios mba'e. Avei omimbi oparticipávo 5o Festival Internacional de Guitarra "Homage a Mangoré", oñemotenondéva Asunción-pe. Ome'ë actuaciones significativas, ha'eháicha concierto omotenondéva estreno mundial único obra guitarra clásica-pe guarã ilustre compositor paraguayo Carlos Lara Bareiro, 22 ary omanóha. Ohupyty peteîha jopói concurso internacional interpretación "Momento Musical Opus 2009" agosto upe arýpe Asunción, Paraguay-pe, ha oime juez ramo upe competencia-pe guarã ambue arýpe. Avei ohupyty mokõiha jopói "Musicampus 2007 Guitarra Clásica Concurso" Córdoba, Argentina-pe.

Oike mundo de la música-pe orekópe 11 ary, música folklórica paraguaya rupive. Orekópe 14 ary, oñepyrũ ijestudio violín rehegua, ha orekópe 15 ary, oiporavo definitivamente guitarra clásica instrumento principal ramo. Oñembokatupyry iñepyrûhápe umi conservatorio ojeguerohorýva mbo'ehára paraguayo ojekuaáva ndive, omohu'ãvo honores orekóva 18 ary orekóva carrera Actuación de Guitarra Clásica ha Teoría de la Música ha Solfège. Ojapo opáichagua curso avanzado guitarrista herakuãitéva ndive, umíva apytépe Pablo Márquez, Eduardo Fernández, José Antonio Escobar, Berta Rojas, ha Víctor Villadangos, ambue apytépe.

...“Rohecha hína peteĩ talento excepcional, peteĩ mitãrusu, además de italento, orekóva virtudes ha’eháicha humildad, seriedad, dedicación ha peteĩ capacidad expresiva ha’éva peteĩ joya ojejuhúva mbovyeterei intérprete-pe” (Berta Rojas).
"""

The simplest way to try out your finetuned model for inference is to use it in a [pipeline()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline). Instantiate a `pipeline` for question answering with your model, and pass your text to it:

In [2]:
from transformers import pipeline

question_answerer = pipeline("question-answering", model="mmaguero/multi-wiki-qa-gn-bert-tiny-cased")
question_answerer(question=question, context=context)


Device set to use cpu


{'score': 0.018730811774730682,
 'start': 30,
 'end': 54,
 'answer': '1 jasypokõi ary 1989 -pe'}

In [19]:
question = "Moo oiko Jos√© Carlos Cabrera?"
question_answerer(question=question, context=context)

{'score': 0.010301382280886173,
 'start': 160,
 'end': 172,
 'answer': 'Buenos Aires'}

In [20]:
question_answerer(question=question, context=context, top_k=3)

[{'score': 0.010301382280886173,
  'start': 160,
  'end': 172,
  'answer': 'Buenos Aires'},
 {'score': 0.00930881081148982,
  'start': 2465,
  'end': 2524,
  'answer': '14 ary, oñepyrũ ijestudio violín rehegua, ha orekópe 15 ary'},
 {'score': 0.00651614461094141,
  'start': 2465,
  'end': 2471,
  'answer': '14 ary'}]
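
The scores above are small (around 0.01), which suggests the model is far from confident in any span. As a rough illustration of where such a number could come from, the sketch below assumes that a span's score is the product of the softmax probability of its start token and that of its end token. This mirrors how the pipeline is generally understood to score spans, but it is an assumption here, and the helper names and logits are made up:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def span_score(start_logits, end_logits, start, end):
    """Probability-style score for the span start..end (assumed product rule)."""
    return softmax(start_logits)[start] * softmax(end_logits)[end]

# Nearly flat logits over 100 tokens: no position stands out.
flat_start = [0.0] * 99 + [1.0]
flat_end = [0.0] * 99 + [1.0]
# Peaked logits: one position clearly dominates.
peaked_start = [0.0] * 99 + [10.0]
peaked_end = [0.0] * 99 + [10.0]

low = span_score(flat_start, flat_end, 99, 99)
high = span_score(peaked_start, peaked_end, 99, 99)
print(f"{low:.4f} {high:.4f}")
```

Under this assumption, a score near 0.01 means the probability mass is spread over many candidate positions rather than concentrated on one span.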

You can also manually replicate the results of the `pipeline` if you'd like:

Tokenize the text and return PyTorch tensors:

In [61]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mmaguero/multi-wiki-qa-gn-bert-tiny-cased")
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384)

Pass your inputs to the model and return the `logits`:

In [62]:
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("mmaguero/multi-wiki-qa-gn-bert-tiny-cased")
with torch.no_grad():
    outputs = model(**inputs)

Get the highest probability from the model output for the start and end positions:

In [63]:
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

Slice out the token ids of the predicted answer span (passing them to `tokenizer.decode` would turn them back into text):

In [66]:
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
#tokenizer.decode(predict_answer_tokens)
predict_answer_tokens

tensor([2])