# Text generation with GPT-J 6B

[GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B) is a causal decoder-only transformer model which can be used for text generation.
Causal means that a causal mask is used in the decoder attention, so that each token can only attend to itself and the tokens before it.
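To make the masking concrete, here is a minimal plain-Python sketch. It is an illustration only, not how GPT-J or PopXL actually build the mask: entry `[i][j]` is `True` when token `i` is allowed to attend to token `j`, which gives a lower-triangular pattern.

```python
def causal_mask(seq_len: int) -> list:
    """Return a seq_len x seq_len mask; mask[i][j] is True when token i
    may attend to token j, i.e. when j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# Print the mask for a 4-token sequence: '1' = visible, '.' = masked out
for row in causal_mask(4):
    print("".join("1" if visible else "." for visible in row))
```

Each row adds one more visible position, so token 0 only sees itself while the last token sees the whole sequence.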

Language models are very powerful because a huge variety of tasks can be formulated as text-to-text problems and thus adapted to fit the generative setup, where the model is asked to correctly predict future tokens. This idea has been widely explored in the T5 paper, [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf).

In 5 lines of code, we show how GPT-J can be run on the Graphcore IPU and used to complete arbitrary NLP tasks using examples and structured prompting.
For more complex tasks, we show the benefit of fine-tuning by loading a checkpoint from the Hugging Face Hub which achieves much better performance on the specific task.

While no fine-tuning is performed in this notebook, you can find out how to fine-tune the model on your own dataset in the [fine-tuning notebook](finetuning.ipynb).

In [1]:
%load_ext autoreload
%autoreload 2

## Environment setup

To run this notebook, you need an environment with the Poplar SDK and PopART installed and enabled.

To ensure smooth execution of the notebook, we load and check environment variables.

In [2]:
import os

number_of_ipus = int(os.getenv("NUM_AVAILABLE_IPU", 16))
if number_of_ipus < 16:
    raise ValueError("This notebook is designed to run with at least 16 IPUs")

executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/")
os.environ["POPART_CACHE_DIR"] = executable_cache_dir
checkpoint_directory = os.getenv("CHECKPOINT_DIR")

## Running GPT-J on the IPU

This notebook demonstrates an interactive interface to GPT-J which can be used to generate text from arbitrary prompts.
While this application implements GPT-J in Graphcore's PopXL framework, no knowledge of the framework is required to use or train this application, as all parameters are controlled through configuration options.

 <!-- PopXL is a framework which provides fine grained control of execution, memory and parallelism.  -->
Several model configurations are available in the `config/` folder. They can be loaded as follows:

In [3]:
# --- Setup ---
import run_inference

config, *_ = run_inference.gptj_config_setup(
    "config/inference.yml", "release", "gpt-j-mnli"
)
print(config.dumps_yaml())

2023-02-15 17:22:03 INFO: Starting. Process id: 4054544
checkpoint:
  load: null
  optim_state: true
  save: null
  steps: 0
  to_keep: 4
execution:
  attention_serialisation: 1
  available_memory_proportion:
  - 0.4
  code_load: false
  data_parallel: 1
  device_iterations: 1
  io_tiles: 1
  loss_scaling: 1
  micro_batch_size: 16
  tensor_parallel: 16
inference:
  output_length: 5
model:
  attention:
    heads: 16
    rotary_dim: 64
    rotary_positional_embeddings_base: 10000
  dropout_prob: 0.0
  embedding:
    vocab_size: 50400
  eval: true
  hidden_size: 4096
  layers: 28
  precision: float16
  seed: 42
  sequence_length: 1024
training:
  global_batch_size: 32
  optimizer:
    beta1: 0.9
    beta2: 0.999
    gradient_clipping: 1.0
    learning_rate:
      maximum: 0.01
      warmup_proportion: 0.0
    name: adamw
    weight_decay: 0.01
  steps: 1
  stochastic_rounding: true



This configuration object can be edited and stored in a new file to suit your needs. It contains all the arguments that control the execution of the application on the IPU, some of which might need to be tuned to get the best performance for the application you want to deploy.
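To get a feel for the memory involved in `tensor_parallel: 16`, the sketch below estimates the parameter count and float16 weight memory from the `model` fields of the config. It assumes the Hugging Face GPT-J layout (bias-free attention projections, a 4x MLP with biases, one LayerNorm per block, an LM head with bias) and ignores activations, the KV cache and how PopXL actually shards each tensor, so treat the numbers as an order of magnitude only. `approx_gptj_params` is a hypothetical helper, not part of this application.

```python
def approx_gptj_params(layers: int, hidden: int, vocab: int) -> int:
    """Rough GPT-J parameter count under the assumed Hugging Face layout."""
    attention = 4 * hidden * hidden  # q, k, v and output projections, no bias
    mlp = 2 * hidden * (4 * hidden) + 4 * hidden + hidden  # fc_in/fc_out + biases
    layer_norm = 2 * hidden  # one LayerNorm (weight + bias) per block
    per_layer = attention + mlp + layer_norm
    embedding = vocab * hidden
    final_norm = 2 * hidden
    lm_head = vocab * hidden + vocab  # LM head weight + bias
    return layers * per_layer + embedding + final_norm + lm_head


# Values taken from the `model` and `execution` sections of the config above
params = approx_gptj_params(layers=28, hidden=4096, vocab=50400)
bytes_fp16 = params * 2  # float16 uses 2 bytes per parameter
tensor_parallel = 16
print(f"~{params / 1e9:.2f}B parameters")
print(f"~{bytes_fp16 / 1e9:.1f} GB of float16 weights in total")
print(f"~{bytes_fp16 / tensor_parallel / 1e9:.2f} GB per IPU if split evenly")
```

With these values the estimate comes out at roughly 6 billion parameters, consistent with the checkpoint name.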

Next, we combine this IPU configuration with pre-trained weights. The `pipeline` utility directly accepts the name of a pre-trained checkpoint on the Hugging Face Hub.
In this case we load the [6 billion parameter GPT-J checkpoint from EleutherAI](https://huggingface.co/EleutherAI/gpt-j-6B), which was trained on [the Pile](https://pile.eleuther.ai/), an open-source dataset.

This checkpoint is a general language modelling checkpoint and has not been fine-tuned on a specific task.

We create an `IPUGPTJPipeline`. In this first step, the weights are downloaded from the Hugging Face Hub and a PopXL session is created; this takes a few minutes:

In [4]:
from utils import pipeline

general_model = pipeline.IPUGPTJPipeline(
    config,
    "EleutherAI/gpt-j-6B",
    sequence_length=256,
    print_live=True,
)

2023-02-15 17:22:08 INFO: Creating session
2023-02-15 17:22:08 INFO: Starting PopXL IR construction
2023-02-15 17:22:35 INFO: PopXL IR construction duration: 0.44 mins
2023-02-15 17:22:35 INFO: Starting PopXL compilation
2023-02-15 17:22:38 INFO: PopXL compilation duration: 0.05 mins
2023-02-15 17:22:38 INFO: Downloading 'EleutherAI/gpt-j-6B' pretrained weights and tokenizer


2023-02-15T17:22:36.112930Z popart:popart 4054544.4054544 W: [Ir::setIsPrepared] setIsPrepared was already called. It should only be called once.


2023-02-15 17:23:55 INFO: Starting Loading HF pretrained model to IPU
2023-02-15 17:25:13 INFO: Loading HF pretrained model to IPU duration: 1.31 mins


Attributes of the `general_model` can be explored:

- `model` contains the `GPTJForCausalLM` model from the Transformers library, which is used to load the weights,
- `tokenizer` contains the tokenizer loaded with the pre-trained checkpoint from the Hugging Face Hub,
- `config` contains the input configuration,
- `session` is the PopXL session which controls the IPU executions.

In [5]:
general_model.tokenizer

PreTrainedTokenizerFast(name_or_path='EleutherAI/gpt-j-6B', vocab_size=50257, model_max_len=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': '<|extratoken_1|>'})

The pipeline class can be used for standard text generation. Here we ask it a simple question:

In [6]:
out = general_model("What is the capital of France?")




2023-02-15 17:29:10 INFO: Attach to IPUs
2023-02-15 17:31:10 INFO: Start inference
Prompt: 'What is the capital of France?'

The capital of France is Paris.

What is the capital of Germany?

The capital of Germany is Berlin.

What is the capital of Italy?

The capital of Italy is Rome.

What is the capital of Spain?

The capital of Spain is Madrid.

What is the capital of the United Kingdom?

The capital of the United Kingdom is London.

What is the capital of the United States?

The capital of the United States is Washington, D.C.

What is the capital of Canada?

The capital of Canada is Ottawa

## Using prompt structure for improved results

While the model does get the correct answer, it embeds it in a long-form answer and then continues generating similar questions and answers.
To refine this format, we can give the answer a structure in the prompt.
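The structured prompt in the next cell follows a fixed template of `Question:`/`Answer:` pairs, ending with an open `Answer:` for the model to complete. A small helper that builds the same string from example pairs makes the pattern reusable; `few_shot_prompt` is a hypothetical name, not part of `utils.pipeline`.

```python
def few_shot_prompt(examples: list, question: str) -> str:
    """Build a 'Question:/Answer:' prompt from (question, answer) example
    pairs, ending with an open 'Answer:' for the model to complete."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Question: {example_question}")
        lines.append(f"Answer: {example_answer}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt = few_shot_prompt(
    [("What is the capital of Country?", "City")],
    "What is the capital of France?",
)
print(prompt)
```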

In [7]:
out = general_model(
    """Question: What is the capital of Country?
Answer: City
Question: What is the capital of France?
Answer:""",
)
out


2023-02-15 17:32:11 INFO: Attach to IPUs
2023-02-15 17:32:11 INFO: Start inference
Prompt: 'Question: What is the capital of Country?
Answer: City
Question: What is the capital of France?
Answer:' Paris
Question: What is the capital of Germany?
Answer: Berlin
Question: What is the capital of Italy?
Answer: Rome
Question: What is the capital of Spain?
Answer: Madrid
Question: What is the capital of the United States?
Answer: Washington
Question: What is the capital of the United Kingdom?
Answer: London
Question: What is the capital of Australia?
Answer: Canberra
Question: What is the capital of New Zealand?
Answer: Wellington
Question: What is the capital of Canada?
Answer: Ottawa
Question: What is the capital of Japan?

[' Paris\nQuestion: What is the capital of Germany?\nAnswer: Berlin\nQuestion: What is the capital of Italy?\nAnswer: Rome\nQuestion: What is the capital of Spain?\nAnswer: Madrid\nQuestion: What is the capital of the United States?\nAnswer: Washington\nQuestion: What is the capital of the United Kingdom?\nAnswer: London\nQuestion: What is the capital of Australia?\nAnswer: Canberra\nQuestion: What is the capital of New Zealand?\nAnswer: Wellington\nQuestion: What is the capital of Canada?\nAnswer: Ottawa\nQuestion: What is the capital of Japan?']

The format of the answer is now more predictable: if we needed to extract the answer for a downstream task, it would be much easier with this prompt.
However, the model still continues to generate questions and answers after it has answered.

The pipeline supports a callback which lets us specify, with the `terminate_on_string` argument, a string on which generation terminates. We are going to stop on the string `Question:`.
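Conceptually, a stop string works by checking the generated text after each new token. The sketch below assumes the model is exposed as a simple stream of decoded tokens; `generate_until` and the token stream are hypothetical, not the pipeline's API, and the pipeline's own handling of the stop string may differ from this one.

```python
from typing import Iterable


def generate_until(tokens: Iterable[str], stop: str, max_tokens: int = 50) -> str:
    """Append tokens one at a time and stop as soon as `stop` appears in the
    generated text (or after `max_tokens` tokens). The stop string and
    anything after it are removed from the returned text."""
    text = ""
    for count, token in enumerate(tokens):
        if count >= max_tokens:
            break
        text += token
        if stop in text:
            return text[: text.index(stop)]
    return text


# A toy token stream standing in for the model's next-token predictions
stream = [" Paris", "\n", "Question", ":", " What", " is"]
print(repr(generate_until(stream, stop="Question:")))
```

Checking the accumulated text rather than each token matters because a stop string such as `Question:` can be split across several tokens.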

In [8]:
out = general_model(
    """Question: What is the capital of Country?
Answer: city
Question: What is the capital of France?
Answer:""",
    terminate_on_string="Question:",
)
out


2023-02-15 17:39:25 INFO: Attach to IPUs
2023-02-15 17:39:25 INFO: Start inference
Prompt: 'Question: What is the capital of Country?
Answer: city
Question: What is the capital of France?
Answer:' Paris
Question<|endoftext|>

[' Paris\nQuestion<|endoftext|>']

Our output is now the correct answer followed by some standard strings that can easily be removed.

In [9]:
capitals = [answer.splitlines()[0].strip() for answer in out]
capitals

['Paris']
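The list comprehension above assumes every answer has at least one line: `''.splitlines()` is an empty list, so an empty generation would raise an `IndexError`. It also leaves `<|endoftext|>` attached when the answer ends on its first line (for example `' contradiction<|endoftext|>'`). A slightly more defensive helper follows; `clean_answer` is our own name, not part of the pipeline.

```python
def clean_answer(text: str, eos: str = "<|endoftext|>") -> str:
    """Keep the first line of a generated answer, treating the end-of-text
    marker as a line break, and strip surrounding whitespace."""
    lines = text.replace(eos, "\n").splitlines()
    return lines[0].strip() if lines else ""


print([clean_answer(answer) for answer in [" Paris\nQuestion<|endoftext|>", ""]])
```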

This model configuration supports batched generation: with a micro batch size of 16, this model can generate answers to 16 questions at a time. Let's generate a list of countries to ask about:
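Because the configuration uses a fixed `micro_batch_size` of 16, a longer list of prompts (such as the 26 countries generated below) has to be processed in more than one batch. A minimal sketch of that chunking follows; `make_batches` is a hypothetical helper, and whether `IPUGPTJPipeline` pads the final batch in exactly this way is an assumption.

```python
def make_batches(prompts: list, batch_size: int, pad: str = "") -> list:
    """Split prompts into fixed-size batches, padding the last one with `pad`
    so every batch has exactly `batch_size` entries."""
    batches = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start : start + batch_size]  # a copy, not a view
        batch += [pad] * (batch_size - len(batch))
        batches.append(batch)
    return batches


batches = make_batches([f"prompt {i}" for i in range(26)], batch_size=16)
print([len(batch) for batch in batches], batches[1][-1] == "")
```

Padding keeps every batch the same shape, which is what a compiled, fixed-size executable expects; the padded outputs are simply discarded.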

In [10]:
prompt_for_countries = "List countries in Europe, America, Asia and Africa: France, "
out = general_model(prompt_for_countries, print_live=True)


2023-02-15 17:39:31 INFO: Attach to IPUs
2023-02-15 17:39:31 INFO: Start inference
Prompt: 'List countries in Europe, America, Asia and Africa: France, '
Germany, 
Italy, 
Spain, 
United Kingdom, 
United States, 
Canada, 
Mexico, 
Argentina, 
Brazil, 
Chile, 
Colombia, 
Costa Rica, 
Ecuador, 
El Salvador, 
Guatemala, 
Honduras, 
Haiti, 
Jamaica, 
Mexico, 
Nicaragua, 
Panama, 
Paraguay, 
Peru, 
Puerto Rico, 
Uruguay

In [11]:
countries = [
    c.strip() for c in (prompt_for_countries + out[0]).split(":")[1].split(",")
]
countries

['France',
 'Germany',
 'Italy',
 'Spain',
 'United Kingdom',
 'United States',
 'Canada',
 'Mexico',
 'Argentina',
 'Brazil',
 'Chile',
 'Colombia',
 'Costa Rica',
 'Ecuador',
 'El Salvador',
 'Guatemala',
 'Honduras',
 'Haiti',
 'Jamaica',
 'Mexico',
 'Nicaragua',
 'Panama',
 'Paraguay',
 'Peru',
 'Puerto Rico',
 'Uruguay']
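The generated list contains `Mexico` twice, so the next cell asks the model about it twice. If duplicates matter for your use case, an order-preserving de-duplication is a one-liner. This step is our own addition, not part of the original notebook; you would apply it as `countries = unique_in_order(countries)`.

```python
def unique_in_order(items):
    """Drop repeated entries, keeping the first occurrence of each
    (dicts preserve insertion order in Python 3.7+)."""
    return list(dict.fromkeys(items))


generated = ["France", "Germany", "Mexico", "Argentina", "Mexico", "Uruguay"]
print(unique_in_order(generated))
```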

Using these generated countries, we compose a structured prompt for each one and pass them all to the model:

In [12]:
out = general_model(
    [
        f"""Question: What is the capital of China?
Answer: Beijing
Question: What is the capital of {country}?
Answer:"""
        for country in countries
    ],
    print_live=False,
    terminate_on_string="Question:",
)
capitals = [answer.splitlines()[0].strip() for answer in out]
list(zip(countries, capitals))


2023-02-15 17:41:00 INFO: Attach to IPUs
2023-02-15 17:41:00 INFO: Start inference


[('France', 'Paris'),
 ('Germany', 'Berlin'),
 ('Italy', 'Rome'),
 ('Spain', 'Madrid'),
 ('United Kingdom', 'London'),
 ('United States', 'Washington'),
 ('Canada', 'Ottawa'),
 ('Mexico', 'Mexico City'),
 ('Argentina', 'Buenos Aires'),
 ('Brazil', 'Brasilia'),
 ('Chile', 'Santiago'),
 ('Colombia', 'Bogota'),
 ('Costa Rica', 'San Jose'),
 ('Ecuador', 'Quito'),
 ('El Salvador', 'San Salvador'),
 ('Guatemala', 'Guatemala City'),
 ('Honduras', 'Tegucigalpa'),
 ('Haiti', 'Port-au-Prince'),
 ('Jamaica', 'Kingston'),
 ('Mexico', 'Mexico City'),
 ('Nicaragua', 'Managua'),
 ('Panama', 'Panama City'),
 ('Paraguay', 'Asunción'),
 ('Peru', 'Lima'),
 ('Puerto Rico', 'San Juan'),
 ('Uruguay', 'Montevideo')]

## Determining entailment

One common language task is to determine whether statements agree, disagree or are neutral relative to each other.
This task is commonly referred to as determining entailment, and there are datasets for it such as the [MNLI task of the GLUE dataset](https://huggingface.co/datasets/glue).

The MNLI dataset consists of pairs of sentences, a *premise* and a *hypothesis*.
The task is to predict the relation between the premise and the hypothesis, which can be:
- `entailment`: hypothesis follows from the premise,
- `contradiction`: hypothesis contradicts the premise,
- `neutral`: hypothesis neither follows from nor contradicts the premise.
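In the GLUE version of MNLI, these labels are stored as integers. The mapping below is the same order the notebook uses later in `mnli_id_2_class`, written here as dictionaries in both directions; the first validation example (label `2`) is a contradiction.

```python
# GLUE MNLI integer labels, in the same order as `mnli_id_2_class` later on
MNLI_ID_TO_LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}
MNLI_LABEL_TO_ID = {label: idx for idx, label in MNLI_ID_TO_LABEL.items()}

# The first example of the validation_mismatched split, as loaded further down
example = {
    "premise": "Your contribution helped make it possible for us to provide our students with a quality education.",
    "hypothesis": "Your contributions were of no help with our students' education.",
    "label": 2,
}
print(MNLI_ID_TO_LABEL[example["label"]])  # contradiction
```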

We can use our generative model to tackle this task by creating prompts with the following format:

In [13]:
def entailment_prompt(hypothesis, premise, target=""):
    sep = ".\n" if target else ""
    return f"hypothesis: {hypothesis} premise: {premise} target: {target}{sep}"


entailment_prompt("The person is leaving.", "Hello, welcome to the country.")

'hypothesis: The person is leaving. premise: Hello, welcome to the country. target: '

We are attempting to get the model to correctly recognise that the statements "The person is leaving." and "Hello, welcome." disagree with each other.

This task is more complicated than the previous one, so we provide instructions and two examples to get the model to perform the task:

In [14]:
entailment_instructions = (
    "Tell me if the statements agree, disagree, neutral.\n"
    + "Example - "
    + entailment_prompt("Goodbye.", "Hey there.", "disagree")
    + "Example - "
    + entailment_prompt("Hello.", "Hey there.", "agree")
)
general_model(
    entailment_instructions
    + entailment_prompt("The person is leaving.", "Hello, welcome."),
    print_live=True,
    output_length=40,
)


2023-02-15 17:41:17 INFO: Attach to IPUs
2023-02-15 17:41:17 INFO: Start inference
Prompt: 'Tell me if the statements agree, disagree, neutral.
Example - hypothesis: Goodbye. premise: Hey there. target: disagree.
Example - hypothesis: Hello. premise: Hey there. target: agree.
hypothesis: The person is leaving. premise: Hello, welcome. target: ' neutral.

A:

I would say that the first two are neutral, the third is a disagreement, and the fourth is a disagreement.
The first two are neutral because they are

[' neutral.\n\nA:\n\nI would say that the first two are neutral, the third is a disagreement, and the fourth is a disagreement.\nThe first two are neutral because they are']

While the model does pick one of the three options, it does not select the right one. If we let generation continue, the response seems related to the task but is quite imprecise.

To do more thorough testing we can load the MNLI task from the GLUE dataset using the 🤗 Datasets library.

In [15]:
import datasets
from data import mnli_data

dataset = datasets.load_dataset("glue", "mnli", split="validation_mismatched")

dataset[0]



{'premise': 'Your contribution helped make it possible for us to provide our students with a quality education.',
 'hypothesis': "Your contributions were of no help with our students' education.",
 'label': 2,
 'idx': 0}

In [16]:
out = general_model(
    [
        entailment_instructions + entailment_prompt(hypothesis, premise)
        for hypothesis, premise in zip(
            dataset[:16]["hypothesis"], dataset[:16]["premise"]
        )
    ],
    print_live=True,
    output_length=10,
)
# Keep only the first line of each generated answer
mnli_id_2_class = ["entailment", "neutral", "contradiction"]
[
    (mnli_id_2_class[label], answer.splitlines()[0].strip())
    for label, answer in zip(dataset[:16]["label"], out)
]


2023-02-15 17:41:43 INFO: Attach to IPUs
2023-02-15 17:41:43 INFO: Start inference
Prompt: 'Tell me if the statements agree, disagree, neutral.
Example - hypothesis: Goodbye. premise: Hey there. target: disagree.
Example - hypothesis: Hello. premise: Hey there. target: agree.
hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target: ' neutral.

A:

I would

[('contradiction', 'neutral.'),
 ('contradiction', ''),
 ('entailment', 'We serve a classic Tuscan meal that includes a'),
 ('contradiction', 'neutral.'),
 ('entailment', 'neutral.'),
 ('entailment', 'neutral.'),
 ('contradiction', 'neutral.'),
 ('contradiction', ''),
 ('neutral', ''),
 ('contradiction', 'neutral.'),
 ('neutral', ''),
 ('contradiction', 'neutral.'),
 ('contradiction', 'neutral.'),
 ('contradiction', ''),
 ('contradiction', 'agree.'),
 ('neutral', 'neutral.')]

As we can see, our prompts are not sufficient for this general-purpose model to complete the task. Before loading another model, we detach this one from the IPUs:
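To put a number on "not sufficient", the answers can be scored against the labels. The sketch below normalises each generated answer and computes accuracy. Mapping the instruction words `agree`/`disagree` onto `entailment`/`contradiction` is our own assumption, added because the few-shot prompt used that vocabulary rather than the MNLI label names; `normalise` and `accuracy` are hypothetical helpers.

```python
# Assumption: map the words used in our few-shot instructions onto MNLI labels
SYNONYMS = {"agree": "entailment", "disagree": "contradiction"}


def normalise(answer: str) -> str:
    """Keep the first line of a generated answer, drop end-of-text markers and
    trailing full stops, lower-case it and map instruction words to labels."""
    lines = answer.replace("<|endoftext|>", "\n").splitlines()
    word = lines[0].strip().rstrip(".").lower() if lines else ""
    return SYNONYMS.get(word, word)


def accuracy(pairs: list) -> float:
    """Fraction of (label, generated answer) pairs where the answer matches."""
    if not pairs:
        return 0.0
    return sum(label == normalise(answer) for label, answer in pairs) / len(pairs)


# A few (label, answer) pairs copied from the general model's output above
sample = [
    ("contradiction", "neutral."),
    ("contradiction", ""),
    ("contradiction", "agree."),
    ("neutral", "neutral."),
]
print(accuracy(sample))
```

Applied to all 16 pairs printed above, only the last one matches, giving an accuracy of 1/16.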

In [17]:
general_model.detach()

## Using a fine-tuned model

To complete the entailment task, we are going to use a model fine-tuned on the MNLI task of the GLUE dataset.

The checkpoint we will be using was fine-tuned on the Graphcore IPU and is hosted on the 🤗 Hub at [Graphcore/gptj-mnli](https://huggingface.co/Graphcore/gptj-mnli). As before, we can load it with a single command:

In [18]:
mnli_model = pipeline.IPUGPTJPipeline(
    config,
    "Graphcore/gptj-mnli",
    sequence_length=256,
    print_live=True,
)

2023-02-15 17:41:53 INFO: Creating session
2023-02-15 17:41:53 INFO: Starting PopXL IR construction
2023-02-15 17:42:14 INFO: PopXL IR construction duration: 0.34 mins
2023-02-15 17:42:14 INFO: Starting PopXL compilation
2023-02-15 17:42:16 INFO: PopXL compilation duration: 0.04 mins
2023-02-15 17:42:16 INFO: Downloading 'Graphcore/gptj-mnli' pretrained weights and tokenizer


2023-02-15T17:42:14.717167Z popart:popart 4054544.4054544 W: [Ir::setIsPrepared] setIsPrepared was already called. It should only be called once.


2023-02-15 17:43:07 INFO: Starting Loading HF pretrained model to IPU
2023-02-15 17:44:19 INFO: Loading HF pretrained model to IPU duration: 1.19 mins


Just like the previous checkpoint, the model can handle arbitrary text generation prompts:

In [19]:
mnli_model("Hey there")


2023-02-15 17:44:25 INFO: Attach to IPUs
2023-02-15 17:46:26 INFO: Start inference
Prompt: 'Hey there', I'm a new member to the forum and I'm looking for some advice. I'm a college student and I'm looking to buy a new car. I'm not sure what kind of car I want to buy, but I'm leaning towards a Honda Accord. I'm looking for a car that's reliable, but I also want something that's fun to drive. I'm not sure if I want a V6 or a V8, but I'm leaning towards the V6. I'm also looking for a car that's fairly inexpensive. I'm not sure what kind of car I want to buy, but I'm leaning

[", I'm a new member to the forum and I'm looking for some advice. I'm a college student and I'm looking to buy a new car. I'm not sure what kind of car I want to buy, but I'm leaning towards a Honda Accord. I'm looking for a car that's reliable, but I also want something that's fun to drive. I'm not sure if I want a V6 or a V8, but I'm leaning towards the V6. I'm also looking for a car that's fairly inexpensive. I'm not sure what kind of car I want to buy, but I'm leaning"]

However, on the entailment task, performance should be much better, even without instructions:

In [20]:
mnli_model(entailment_prompt("The person is leaving.", "Hello, welcome."))


2023-02-15 17:52:55 INFO: Attach to IPUs
2023-02-15 17:52:55 INFO: Start inference
Prompt: 'hypothesis: The person is leaving. premise: Hello, welcome. target: ' contradiction<|endoftext|>

[' contradiction<|endoftext|>']

It got it right! Those sentences are contradictory. Now let's try our samples from the GLUE dataset:

In [63]:
out = mnli_model(
    [
        entailment_prompt(hypothesis, premise)
        for hypothesis, premise in zip(
            dataset[:16]["hypothesis"], dataset[:16]["premise"]
        )
    ],
    print_live=True,
    output_length=10,
)
# Keep only the first line of each generated answer
mnli_id_2_class = ["entailment", "neutral", "contradiction", "unknown"]
[
    (
        mnli_id_2_class[label],
        answer.splitlines()[0].strip().replace("<|endoftext|>", ""),
    )
    for label, answer in zip(dataset[:16]["label"], out)
]


2023-02-15 18:54:18 INFO: Attach to IPUs
2023-02-15 18:54:18 INFO: Start inference
Prompt: 'mnli hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target: ' contradiction<|endoftext|><|endoftext|>

[('contradiction', 'contradiction'),
 ('contradiction', 'neutral'),
 ('entailment', 'entailment'),
 ('contradiction', 'contradiction'),
 ('entailment', 'entailment'),
 ('entailment', 'neutral'),
 ('contradiction', 'contradiction'),
 ('contradiction', 'contradiction'),
 ('neutral', 'entailment'),
 ('contradiction', 'neutral'),
 ('neutral', 'neutral'),
 ('contradiction', 'neutral'),
 ('contradiction', 'neutral'),
 ('contradiction', 'contradiction'),
 ('contradiction', 'contradiction'),
 ('neutral', 'neutral')]

It gets 10 of these 16 examples right, compared with only 1 of 16 for the general model with hand-crafted prompts, so it clearly has some knowledge of the task we need it to complete.
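To see where the remaining errors are, we can count (label, prediction) pairs. The pairs below are copied by hand from the output above; with the live objects you would build them from `dataset[:16]["label"]` and `out` instead.

```python
from collections import Counter

# (label, prediction) pairs copied from the fine-tuned model's output above
pairs = [
    ("contradiction", "contradiction"), ("contradiction", "neutral"),
    ("entailment", "entailment"), ("contradiction", "contradiction"),
    ("entailment", "entailment"), ("entailment", "neutral"),
    ("contradiction", "contradiction"), ("contradiction", "contradiction"),
    ("neutral", "entailment"), ("contradiction", "neutral"),
    ("neutral", "neutral"), ("contradiction", "neutral"),
    ("contradiction", "neutral"), ("contradiction", "contradiction"),
    ("contradiction", "contradiction"), ("neutral", "neutral"),
]

confusion = Counter(pairs)
correct = sum(n for (label, pred), n in confusion.items() if label == pred)
print(f"{correct}/{len(pairs)} correct")
for (label, pred), n in sorted(confusion.items()):
    print(f"{label:>13} -> {pred:<13} {n}")
```

In this small sample, the most common mistake is predicting `neutral` for a contradiction.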

To take that one step further, we can create a pipeline specific to this task which handles the prompt pre-processing and the post-processing of the generated text.

In that way, the generative model can be used as a much simpler classifier. To avoid creating a new IPU session with the same model, we can use the `from_gptj_pipeline` factory method, which gives us a ready-to-use pipeline for MNLI:
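The source of `GPTJEntailmentPipeline` is not shown here, so the class below is only a hypothetical illustration of the pattern: a pre-processing step turns a (premise, hypothesis) pair into a prompt, a generator produces text, and a post-processing step keeps the first line without the end-of-text marker. The prompt format is copied from the prompt the pipeline prints in the next cell; the `generate` callable stands in for the IPU model.

```python
from typing import Callable, List, Union


class EntailmentClassifier:
    """Hypothetical sketch of a task-specific wrapper around a text generator.
    It is NOT the implementation of GPTJEntailmentPipeline."""

    def __init__(self, generate: Callable[[List[str]], List[str]]):
        # `generate` stands in for the IPU model: prompts in, generated text out
        self.generate = generate

    @staticmethod
    def preprocess(premise: str, hypothesis: str) -> str:
        # Prompt format copied from the prompt printed by the entailment pipeline
        return f"mnli hypothesis: {hypothesis} premise: {premise} target:"

    @staticmethod
    def postprocess(generated: str) -> str:
        # Keep the first line and drop the end-of-text marker
        lines = generated.replace("<|endoftext|>", "\n").splitlines()
        return lines[0].strip() if lines else ""

    def __call__(
        self, premise: Union[str, List[str]], hypothesis: Union[str, List[str]]
    ) -> List[str]:
        if isinstance(premise, str):
            premise, hypothesis = [premise], [hypothesis]
        prompts = [self.preprocess(p, h) for p, h in zip(premise, hypothesis)]
        return [self.postprocess(text) for text in self.generate(prompts)]


def fake_model(prompts: List[str]) -> List[str]:
    # Stand-in generator that always answers "contradiction"
    return [" contradiction<|endoftext|>" for _ in prompts]


classifier = EntailmentClassifier(fake_model)
print(classifier("Hey there.", "Goodbye."))
```

Keeping the generator injectable means the same pre- and post-processing can be reused with any text generation backend.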

In [22]:
mnli_pipeline = pipeline.GPTJEntailmentPipeline.from_gptj_pipeline(mnli_model)
mnli_pipeline("Hey there.", "Goodbye.")


1



2023-02-15 17:53:09 INFO: Attach to IPUs
2023-02-15 17:53:09 INFO: Start inference
Prompt: 'mnli hypothesis: Goodbye. premise: Hey there. target:' contradiction<|endoftext|>

['contradiction']

In [69]:
sample_size = 200
out = mnli_pipeline(
    premise=dataset[:sample_size]["premise"],
    hypothesis=dataset[:sample_size]["hypothesis"],
)

import pandas as pd

results = pd.DataFrame(
    [
        (mnli_id_2_class[label] == answer, mnli_id_2_class[label], answer)
        for label, answer in zip(dataset[:sample_size]["label"], out)
    ],
    columns=["correct", "label", "prediction"],
)
results.head(16)


200



2023-02-15 19:02:06 INFO: Attach to IPUs
2023-02-15 19:02:06 INFO: Start inference
Prompt: 'mnli hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target:' contradiction<|endoftext|> entailment<|endoftext|> entailment<|endoftext|> contradiction<|endoftext|> entailment<|endoftext|> entailment<|endoftext|> entailment<|endoftext|> contradiction<|endoftext|> entailment<|endoftext|> entailment<|endoftext|> entailment<|endoftext|><|endoftext|>

|   | correct | label | prediction |
|---|---|---|---|
| 0 | True | contradiction | contradiction |
| 1 | True | contradiction | contradiction |
| 2 | True | entailment | entailment |
| 3 | True | contradiction | contradiction |
| 4 | True | entailment | entailment |
| 5 | True | entailment | entailment |
| 6 | True | contradiction | contradiction |
| 7 | True | contradiction | contradiction |
| 8 | False | neutral | entailment |
| 9 | True | contradiction | contradiction |


In [70]:
print(f"Got {results['correct'].sum()}/{len(results)} correct")

Got 160/200 correct
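160/200 is an accuracy of 80%, but it is measured on only 200 examples (the first 200 of the `validation_mismatched` split, not a random sample), so the figure is noisy. A normal-approximation (Wald) interval is a quick way to gauge that uncertainty; it is a rough sketch rather than a rigorous evaluation.

```python
import math

correct, total = 160, 200  # result printed by the cell above
acc = correct / total
# Normal-approximation (Wald) standard error of a proportion
se = math.sqrt(acc * (1 - acc) / total)
low, high = acc - 1.96 * se, acc + 1.96 * se
print(f"accuracy {acc:.2f}, approximate 95% interval [{low:.2f}, {high:.2f}]")
```

The interval spans roughly 74% to 86%, so small differences between runs or prompt formats on a sample of this size should be read with care.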


Unlike the prompts we hand-crafted earlier in this notebook, the entailment pipeline uses the same prompt processing that was used when fine-tuning the MNLI model.
By using the same prompt format as during training, we get the best performance from the model, and by limiting the prompt to its strict minimum we also maximise throughput.
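To make the difference in prompt size concrete, the sketch below compares the hand-crafted few-shot prompt with the minimal MNLI prompt format printed by the entailment pipeline, for the example used earlier. It counts whitespace-separated words as a rough stand-in for tokens; actual GPT-J tokenizer counts will differ.

```python
def entailment_prompt(hypothesis, premise, target=""):
    # Same helper as defined earlier in this notebook
    sep = ".\n" if target else ""
    return f"hypothesis: {hypothesis} premise: {premise} target: {target}{sep}"


entailment_instructions = (
    "Tell me if the statements agree, disagree, neutral.\n"
    + "Example - "
    + entailment_prompt("Goodbye.", "Hey there.", "disagree")
    + "Example - "
    + entailment_prompt("Hello.", "Hey there.", "agree")
)

example_hypothesis, example_premise = "The person is leaving.", "Hello, welcome."
instructed_prompt = entailment_instructions + entailment_prompt(
    example_hypothesis, example_premise
)
# Format copied from the prompt printed by the entailment pipeline
mnli_prompt = f"mnli hypothesis: {example_hypothesis} premise: {example_premise} target:"

# Whitespace-separated words are only a rough proxy for tokenizer tokens
for name, text in [("few-shot", instructed_prompt), ("MNLI", mnli_prompt)]:
    print(f"{name:>8}: {len(text.split())} words, {len(text)} characters")
```

The fine-tuned format needs about a third of the words, which leaves more of the 256-token sequence length for longer premises and hypotheses.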


In [None]:
mnli_pipeline.detach()


## Conclusion

