# SQuAD and MNLI on IPUs using BART-LARGE - Inference
This notebook covers two natural language understanding (NLU) tasks, sequence classification and question answering, using pretrained models from the Hugging Face Hub, with [Facebook BART-Large](https://huggingface.co/facebook/bart-large-mnli) as the sequence classification model. It demonstrates how such models can be run for inference on IPUs with only a small amount of code.

The two NLU tasks covered in this notebook are:
- Multi-Genre Natural Language Inference (MNLI) - a sentence-pair classification task
- Stanford Question Answering Dataset (SQuAD) - an extractive question answering task
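
Hardware requirements: The configuration used in this notebook runs the BART-Large model across four IPUs (`ipus_per_replica=4`), which matches the IPU-POD4 machine selected by the launch link below.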

[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/internetoftim/Gradient-HuggingFace?machine=Free-IPU-POD4&container=graphcore/pytorch-jupyter%3A3.2.1-ubuntu-20.04-20230531&file=natural-language-processing%2Fother-use-cases%2Fbart-large-mnli-notebook.ipynb)  [![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)



##### Optimum Graphcore
The notebook also demonstrates [Optimum Graphcore](https://github.com/huggingface/optimum-graphcore), the interface between the Hugging Face Transformers library and [Graphcore IPUs](https://www.graphcore.ai/products/ipu). It takes a more explicit approach to running Hugging Face models on the IPU, which is particularly useful when the task in question is not supported by the Hugging Face pipelines API.

The easiest way to run a Hugging Face inference model is to instantiate a pipeline as follows:

```
from transformers import pipeline

oracle = pipeline(model="Palak/microsoft_deberta-base_squad")
oracle(question="Where do I live?", context="My name is Wolfgang and I live in Berlin")
```

However, in some cases, such as MNLI, there is no off-the-shelf pipeline ready to use. In that case, you can:
- Instantiate the model with the correct execution mode
- Use the optimum-specific call `to_pipelined` to return the model with changes and annotations for running on the IPU
- Set the model to run in `eval` mode and use the `parallelize` method on the new model to parallelize it across IPUs
- Prepare it for inference using `poptorch.inferenceModel()`

```
import poptorch
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined
from transformers import DebertaForQuestionAnswering

model = DebertaForQuestionAnswering.from_pretrained("Palak/microsoft_deberta-base_squad")

ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.2, executable_cache_dir="./exe_cache")
pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()
pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))
```

This notebook uses this method, as MNLI inference is not available as an off-the-shelf Hugging Face pipeline.

## Setup
Install the Optimum Graphcore library:

In [1]:
%pip install "optimum-graphcore==0.6.1"

Collecting optimum-graphcore==0.6.1
  Downloading optimum_graphcore-0.6.1-py3-none-any.whl (212 kB)
Collecting transformers==4.25.1 (from optimum-graphcore==0.6.1)
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting optimum==1.6.1 (from optimum-graphcore==0.6.1)
  Downloading optimum-1.6.1-py3-none-any.whl (222 kB)
Collecting diffusers[torch]==0.12.1 (from optimum-graphcore==0.6.1)
  Downloading diffusers-0.12.1-py3-none-any.whl (604 kB)
Collecting data

We read some configuration from the environment to support environments like Paperspace Gradient.

In [2]:
import os

executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache")

Imports

In [3]:
import os
import torch
from datasets import load_dataset, Dataset

import poptorch
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

from transformers import BartForConditionalGeneration, BartTokenizerFast, BartForSequenceClassification
from transformers import DebertaForSequenceClassification, DebertaTokenizerFast
from transformers import DebertaForQuestionAnswering, AutoTokenizer

## Multi-Genre Natural Language Inference (MNLI)

MNLI is a sentence-pair classification task, where the goal is to predict whether a given premise entails a hypothesis (entailment), contradicts it (contradiction), or does neither (neutral). It is widely used as a benchmark for evaluating natural language understanding models.

In this notebook, we use the `facebook/bart-large-mnli` model to classify pairs of sentences on the MNLI task. We first load the model and the tokenizer, then prepare an example input. Finally, we execute the model on IPUs using PopTorch and obtain the predicted probability of entailment.


First, load the model and tokenizer from the Hugging Face Model Hub:

In [8]:
# tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base-mnli")
# model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base-mnli")
# model.half()


from transformers import BartForSequenceClassification

model_checkpoint = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = BartForSequenceClassification.from_pretrained(model_checkpoint)

Create some example inputs and encode them using the tokenizer:

In [5]:
premise = "A man inspects the uniform of a figure in some East Asian country."
hypothesis = "The man is in an East Asian country."

# `truncation` replaces the deprecated `truncation_strategy` argument
inputs = tokenizer.encode(
    premise, hypothesis, return_tensors="pt", truncation="only_first"
)
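
To make the `only_first` option more concrete, here is a minimal sketch in plain Python. It is not the Transformers implementation. It assumes the pair is laid out as `<s> premise </s></s> hypothesis </s>` (four special tokens), and the helper name `truncate_only_first` and the toy token lists are hypothetical.

```python
def truncate_only_first(premise_tokens, hypothesis_tokens, max_length, num_special_tokens=4):
    """Toy illustration: trim only the first sequence (the premise) so the pair fits.

    Hypothetical helper for illustration; the real tokenizer handles this internally.
    """
    # Room left for the premise once the hypothesis and special tokens are counted
    budget = max_length - num_special_tokens - len(hypothesis_tokens)
    if budget < 0:
        raise ValueError("The hypothesis alone does not fit within max_length")
    # The hypothesis is never shortened under this strategy
    return premise_tokens[:budget], hypothesis_tokens


# Toy word-level "tokens"; a real tokenizer would produce subword IDs
premise_tokens = ["A", "man", "inspects", "the", "uniform", "of", "a", "figure"]
hypothesis_tokens = ["The", "man", "is", "here"]

kept_premise, kept_hypothesis = truncate_only_first(
    premise_tokens, hypothesis_tokens, max_length=12
)
print(kept_premise)
print(kept_hypothesis)
```

Truncating only the premise keeps the hypothesis intact, which matters here because the hypothesis is the statement being judged.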



Configure the instantiated model to run on IPUs

In [9]:
# ipu_config = IPUConfig(ipus_per_replica=2, matmul_proportion=0.6, executable_cache_dir=executable_cache_dir)
ipu_config = IPUConfig(layers_per_ipu=[0,12,6,6], ipus_per_replica=4, matmul_proportion=0.6, executable_cache_dir=executable_cache_dir)
# ipu_config = IPUConfig(layers_per_ipu=[8,16], ipus_per_replica=2,replication_factor=2, matmul_proportion=0.6, executable_cache_dir=executable_cache_dir)

pipelined_model = to_pipelined(model, ipu_config).eval().parallelize()
pipelined_model = poptorch.inferenceModel(pipelined_model, options=ipu_config.to_options(for_inference=True))
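
One way to read `layers_per_ipu=[0, 12, 6, 6]` is as the number of transformer layers placed on each of the four IPUs. BART-Large has 12 encoder and 12 decoder layers, so the counts sum to 24. The sketch below expands the list into a per-layer IPU index, assuming layers are assigned in order. The helper name `expand_layers_per_ipu` is hypothetical, and the actual placement (including embeddings and the classification head) is decided by Optimum Graphcore, not by this code.

```python
def expand_layers_per_ipu(layers_per_ipu):
    """Expand per-IPU layer counts into a list giving the IPU index of each layer.

    Hypothetical helper for illustration only.
    """
    layer_to_ipu = []
    for ipu_index, num_layers in enumerate(layers_per_ipu):
        # Each IPU receives the next `num_layers` consecutive layers
        layer_to_ipu.extend([ipu_index] * num_layers)
    return layer_to_ipu


layers_per_ipu = [0, 12, 6, 6]
layer_to_ipu = expand_layers_per_ipu(layers_per_ipu)

print(len(layer_to_ipu))  # total number of layers covered by the configuration
print(layer_to_ipu)
```

Under this reading, the first IPU holds no transformer layers, the second holds 12, and the last two hold 6 each.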

Run the MNLI model and print the probability of entailment. We calculate this by discarding the neutral logit (index 1) and running softmax over the remaining contradiction (index 0) and entailment (index 2) logits.

In [None]:
logits = pipelined_model(inputs)[0]
entail_contradiction_logits = logits[:, [0, 2]]
prob_label_is_true = entail_contradiction_logits.softmax(dim=1)[:, 1]
print(prob_label_is_true)
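
The arithmetic in this cell can be checked by hand. The following sketch uses plain Python and made-up logits (they are not outputs of the model) to show a softmax restricted to the contradiction and entailment entries. The function name `entailment_probability` is hypothetical.

```python
import math


def entailment_probability(logits):
    """Return P(entailment) from [contradiction, neutral, entailment] logits.

    Hypothetical helper: the neutral logit is ignored and a two-way softmax
    is taken over contradiction and entailment.
    """
    contradiction, _neutral, entailment = logits
    # Subtract the maximum before exponentiating for numerical stability
    largest = max(contradiction, entailment)
    exp_contradiction = math.exp(contradiction - largest)
    exp_entailment = math.exp(entailment - largest)
    return exp_entailment / (exp_contradiction + exp_entailment)


# Made-up logits for illustration only
example_logits = [-2.0, 0.5, 2.0]
prob = entailment_probability(example_logits)
print(prob)
```

With only two classes left, this is equivalent to applying a sigmoid to the difference between the entailment and contradiction logits.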
