First, make sure that the latest version of [🤗 Optimum Graphcore](https://github.com/huggingface/optimum-graphcore) and the other dependencies are installed in your environment:


In order to improve usability and support for future users, Graphcore would like to collect information about the
applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext gc_logger` from any cell.

In [1]:
# %pip install "optimum-graphcore>=0.6.0, <0.7.0" rouge-score nltk
%pip install git+https://github.com/huggingface/optimum-graphcore@v0.6.1-release rouge-score nltk 
%pip install examples-utils[common]@git+https://github.com/graphcore/examples-utils@latest_stable
from examples_utils import notebook_logging

# %load_ext gc_logger



To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!). Then execute the following cell and enter your access token when prompted:

In [2]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Then you need to install Git-LFS:

In [4]:
!apt install git-lfs

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?


Let's print out the versions of Transformers and Optimum Graphcore:

In [5]:
import transformers
import optimum.graphcore

print(transformers.__version__)
print(optimum.graphcore.__version__)

4.25.1
0.6.1


Values for machine size and cache directories can be configured through environment variables or directly in the notebook:

In [6]:
import os

pod_type = os.getenv("GRAPHCORE_POD_TYPE", "pod4")
executable_cache_dir = (
    os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/") + "/summarization"
)
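
The same lookup can be wrapped in a small helper. The following is a minimal sketch, not part of the notebook: `resolve_cache_dir` and its `env` parameter are hypothetical names introduced here, and `os.path.join` is used so that a trailing slash in the base directory does not produce a double slash.

```python
import os


def resolve_cache_dir(task, env=None):
    """Return the executable cache directory for a task (hypothetical helper).

    `env` defaults to the process environment; passing a plain dict makes
    the fallback behaviour easy to check without touching real variables.
    """
    env = os.environ if env is None else env
    # Fall back to the notebook's default when the variable is not set.
    base = env.get("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/")
    return os.path.join(base, task)


print(resolve_cache_dir("summarization", env={}))
```

With an empty environment the default base directory is used; setting `POPLAR_EXECUTABLE_CACHE_DIR` before starting the notebook changes the base without editing any cell.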

# Fine-tuning a model on a summarization task

In this notebook, we will see how to fine-tune one of the [🤗 Transformers](https://github.com/huggingface/transformers) models for a summarization task. We will use the [XSum dataset](https://arxiv.org/pdf/1808.08745.pdf) (for extreme summarization), which contains BBC articles accompanied by single-sentence summaries.

![Widget inference on a summarization task](images/summarization.png)

We will see how to easily load the dataset for this task using 🤗 Datasets and how to fine-tune a model on it using the `IPUSeq2SeqTrainer` API.

In [7]:
model_checkpoint = "t5-small"

This notebook is built to run with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a sequence-to-sequence version in the Transformers library and is supported by Optimum Graphcore. Here we picked the [`t5-small`](https://huggingface.co/t5-small) checkpoint.

## Loading the dataset

We will use the [🤗 Datasets](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.  

In [8]:
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("xsum")
metric = load_metric("rouge")

Found cached dataset xsum (/nethome/mwizak/.cache/huggingface/datasets/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)



The `raw_datasets` object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key each for the training, validation and test sets:

In [9]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})

To access an actual element, you need to select a split first, then give an index:

In [10]:
raw_datasets["train"][0]

 'summary': 'Clean-up operations are continuing across the Scottish Borders and Dumfries and Galloway after flooding caused by Storm Frank.',
 'id': '35232142'}

In [15]:
import nltk

nltk.sent_tokenize(raw_datasets["train"][0]["document"].strip())

['The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.',
 'Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.',
 'Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.',
 'Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.',
 'First Minister Nicola Sturgeon visited the area to inspect the damage.',
 'The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.',
 'Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.',
 'However, she said more preventative work could have been carried out to ensure the retaining wall did not fail.',
 '"It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I tota

To get a sense of what the data looks like, the following function shows some examples picked at random from the dataset.

In [11]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML


def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(
        dataset
    ), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset) - 1)
        while pick in picks:
            pick = random.randint(0, len(dataset) - 1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
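
The retry loop above draws indices until it has enough distinct ones. As an alternative sketch (not the notebook's code), the standard library's `random.sample` draws without replacement in a single call. `pick_unique_indices` and its `seed` parameter are hypothetical names added here for illustration.

```python
import random


def pick_unique_indices(n_items, num_examples=5, seed=None):
    """Pick `num_examples` distinct indices in [0, n_items) (hypothetical helper)."""
    if num_examples > n_items:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    # A dedicated generator keeps the choice reproducible when a seed is given.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no duplicates are possible.
    return rng.sample(range(n_items), num_examples)


picks = pick_unique_indices(204045, num_examples=5, seed=0)
print(picks)
```

The resulting list could be passed to `dataset[picks]` in the same way as the `picks` list built by the loop.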

In [12]:
show_random_elements(raw_datasets["train"])

Unnamed: 0,document,summary,id
0,"That's 70 every day. The vast majority were men.\nThose figures do not make Japan's the highest suicide rate in the world in a developed nation.\nThat dubious title belongs to South Korea. But it is still far, far higher than virtually all other wealthy countries.\nIt is three times the suicide rate in the United Kingdom.\nThe grim self-immolation of a 71-year-old man aboard a Japanese bullet train on Tuesday has once again rammed the issue back in to the headlines here.\nWhat drove a quiet, elderly man, to douse himself with fuel and set fire to it in a packed carriage on a speeding train?\nAs he tipped the liquid over himself he is reported to have shooed away other passengers, telling them it was dangerous.\nSome said there were tears in his eyes as he did so.\nNow, as they start to dig in to his background, members of the Japanese media are turning up the tell-tale signs of a man on the edge. He lived alone and had no job. He spent his days collecting aluminium cans to sell for recycling.\nNeighbours told reporters they had heard him smash a window after locking himself out of his dilapidated apartment.\nOthers said they rarely saw him outside, but could often hear the sound of a television playing. Poor, old and alone. It is an all too familiar tale.\n""Isolation is the number one precursor for depression and suicide,"" says Wataru Nishida, a psychologist at Tokyo's Temple University.\n""Now it's more and more common to read stories about old people dying alone in their apartments,"" he says. ""They are being neglected. 
Kids used to take care of their parents in old age in Japan, but not any more.""\nPeople often cite Japan's long tradition of ""honourable suicide"" as a reason for the high rate here.\nThey point to the Samurai practice of committing ""seppuku"" or to the young ""kamikaze"" pilots of 1945, to show there are distinct cultural reasons why Japanese are more likely to take their own lives.\nTo an extent Mr Nishida agrees.\n""Japan has no history of Christianity,"" he says ""so here suicide is not a sin. In fact, some look at it as a way of taking responsibility.""\nKen Joseph from the Japan Helpline agrees. He says their experience over the last 40 years shows that elderly people who are in financial trouble may see suicide as a way out of their problems.\n""The insurance system in Japan is very lax when it comes to paying out for suicide,"" he says.\n""So when all else fails - some people feel - you can just kill yourself and the insurance will pay out.\n""There is sometimes an intolerable pressure on the elderly that the most loving thing they can do is take their lives and thereby provide for their family.""\nBecause of this, some experts think Japan's suicide rate is actually much higher than reported.\nA lot of lone deaths of elderly people are never fully investigated by the police.\nAccording to Ken Joseph, the almost universal practice of cremating bodies here also means that any evidence is quickly destroyed.\nBut it is not only elderly men in financial trouble who are taking their own lives.\nThe fastest growing suicide demographic is young men. It is now the single biggest killer of men in Japan aged 20-44.\nAnd the evidence suggests these young people are killing themselves because they have lost hope and are incapable of seeking help.\nThe numbers first began to rise after the Asian financial crisis in 1998. 
They climbed again after the 2008 worldwide financial crisis.\nExperts think those rises are directly linked to the increase in ""precarious employment"", the practice of employing young people on short-term contracts.\nJapan was once known as the land of lifetime employment.\nBut while many older people still enjoy job security and generous benefits, nearly 40% of young people in Japan are unable to find stable jobs.\nFinancial anxiety and insecurity are compounded by Japan's culture of not complaining.\n""There are not many ways to express anger or frustration in Japan,"" says Mr Nishida.\n""This is a rule-oriented society. Young people are moulded to fit in to a very small box. They have no way to express their true feelings.\n""If they feel under pressure from their boss and get depressed, some feel the only way out is to die.""\nTechnology may be making things worse, increasing young people's isolation. Japan is famous for a condition called hikikomori, a type of acute social withdrawal.\nWhat is hikikomori?\nMore about hikkomori\nThe young person affected may completely shut himself - it is most often a male - off from the outside world, withdrawing in to a room and not coming out for months or even years.\nBut that is only the most extreme form of what is now a widespread loss of direct face-to-face socialising.\nA recent survey of young Japanese people's attitudes to relationships and sex turned up some extraordinary results. Published in January by the Japan Family Planning Association, it found that 20% of men aged 25-29 had little or no interest in having a sexual relationship.\nWataru Nishida points to the internet and the pervasive influence of online pornography.\n""Young people in Japan have a lot of knowledge,"" Mr Nishida says, ""But they have no life experience. They have no idea how to express their emotions.\n""They have forgotten what it's like to touch a person. 
When they think about sex they have high anxiety and no idea how to deal with it.""\nAnd when young people do find themselves isolated and depressed, they have few places to turn to.\nMental illness is still very much a taboo here. There is little popular understanding of depression. Those suffering its symptoms are often too scared to talk about it.\nJapan's mental healthcare system is also a mess.\nThere is an acute shortage of psychiatrists. There is also no tradition of psychiatrists working together with clinical psychologists.\nPeople suffering from mental illness may be prescribed powerful psychotropic medicines but unlike in the West, this will often not be accompanied by a recommendation that the patient seek counselling.\nThe counselling industry itself is a free-for-all.\nUnlike in America or Europe, there is no government-mandated system of training and qualifying clinical psychologists.\nAnybody can set him or herself up as a ""counsellor"" and it's very hard for someone seeking help to know whether they actually know what they are doing.\nIt is not a happy picture, and while the suicide rate has actually begun to decline in the last three years, it is still woefully high.\nWataru Nishida says Japan needs to start talking about mental illness much more, and not just as something scary and strange that afflicts a few.\n""When you see a television discussion on mental illness in Japan they still talk as if 'depression equals suicide',"" he says. ""That needs to change.""","Last year in Japan, more than 25,000 people took their own lives.",33362387
1,"For them, Super Tuesday could become Black Tuesday. Friday must have been gloomy enough, when Chris Christie, supposedly a card-carrying member of the establishment, kissed Donald Trump's hands and gave this political outsider his endorsement.\nChristie's blessing came as a bolt from the blue, and taught us once more to expect the unexpected. But shouldn't the establishment - and us in the media, for that matter - have seen the billionaire coming? After all, for years the Republican standard bearers have been vulnerable to a challenge from an anti-establishment candidate.\nBefore going on, we should say what we mean by the Republican Party establishment, a term regularly bandied around but rarely explained. Fifty years ago, it was easier to identify.\nIt was an eastern establishment dominated by Wall Street bankers and corporate executives, who were strongly pro-business, ideologically moderate and politically pragmatic.\nNelson Rockefeller, the scion of the banking dynasty and Governor of New York - who lived, like Donald Trump, in great splendour on Fifth Avenue - was their figurehead.\nThese days, however, the Republican establishment is harder to define and more diffuse, which also explains why it is easier to topple.\nMore on Trump and the Republican race for the White House:\nAnthony Zurcher: Day one of the Republican civil war\nThree things Donald Trump always says\nWhat Mexicans think of Trump and his proposed wall\nCommonly it is broadly taken to mean the Republican National Committee, senior office-holders (like Chris Christie), present and past, conservative lobbyists, like the US Chamber of Commerce, big-money donors and opinion-formers, who write for publications like the Weekly Standard, the National Review and op-ed pages of the Wall Street Journal. But that definition is open to debate. 
Its disparate membership explains its inability to exert control.\nThe most obvious reason for the decline of the Republican establishment has been the rise of anti-establishment adversaries. The Tea Party, an insurgent grassroots movement that emerged after Barack Obama's inauguration, has posed the most serious threat.\nIts hatred of the president is matched almost by its loathing for establishment Republicans in Washington, like the Senate Majority leader Mitch McConnell, who activists complain could have done more to thwart the White House.\nTea Party primary challengers have ousted senior establishment fixtures, like Senator Richard Lugar, who represented Indiana for 36 years.\nThe ""Hell No"" Caucus on Capitol Hill, a rump of 50 or so Tea Party-backed Republican hardliners in the House of Representatives, was strong enough to push the former House Speaker John Boehner to the point of resignation.\nAs for opinion formers, most of the loudest and dominant voices in the modern-day conservative movement, like the talk show hosts Rush Limbaugh and Glenn Beck and the commentator Ann Coulter, are vehement critics of the establishment.\nThe Fox News channel, even though it has often given a platform for anti-establishment voices, doesn't fall into that same category. But it has become a rival power centre, outside the control of the GOP high command.\nGoing into the 2016 campaign, there were big clues that establishment candidates would be vulnerable. Eric Cantor, the House Republican majority leader, was ousted, unexpectedly, ahead of the 2014 congressional mid-terms. Boehner was pressured to resign as House Speaker.\nHowever, most of us made the mistake of interpreting the results of the congressional mid-term elections as a major setback for insurgents, because they failed to make more breakthroughs.\nTheir attempt, for example, to oust the Republican Senator Thad Cochran in Mississippi, then a six-term incumbent, was unsuccessful. 
In Kentucky, Mitch McConnell also crushed a Tea Party challenge in the Republican primary.\nAccording to polls, Tea Party favourites, like Sarah Palin and Michelle Bachmann, also lost their lustre.\nEven though the Tea Party was waning - in October last year, a Gallup poll suggested its support had dwindled to just 17% - the anger and rage that gave rise to it had not gone away.\nConservative insurgents just needed a better candidate and more effective mouthpiece.\nThe most obvious figure was Ted Cruz, a long-time darling of the Tea Party. But Donald Trump has proved more adept at giving voice to the politics of frustration and rage, even though he is not a Tea Party candidate per se.\nLong before announcing his presidential bid, the billionaire had already burnished his reputation among Tea Party devotees by becoming the most prominent ""birther"" - claiming, falsely, that Barack Obama is not a natural-born citizen of the United States. His outspoken attacks on Mexicans and Muslims, combined with his contempt for political correctness, are music to insurgents' ears.\nAnother analytical failure was to assume that the Republican establishment could do in 2016 what it has done successfully in the past seven presidential elections: to see its anointed favourite become the nominee.\nGeorge Herbert Walker Bush, Bob Dole, George W Bush, John McCain and Mitt Romney. All were Republican establishment favourites. What's perhaps most remarkable about that run of success for the party's high command is that it continued so long.\nWe should have paid more attention to the difficulty Mitt Romney had securing the nomination in 2012 and also the extent to which he was assisted by the absence of a strong establishment rival.\nA central problem for the GOP high command this year, of course, has been that Marco Rubio, John Kasich, Jeb Bush and Chris Christie have split the vote.\nNot only that, we should have been more mindful of Rick Santorum's surprise showing four years ago. 
The right-wing former Pennsylvania Senator won 11 states and four million votes, even though he was viewed at the outset of the race as a woefully weak candidate.\nIt suggested that the Republican establishment would face a more serious problem in 2016 if a more compelling right-winger emerged.\nBesides, one of the reasons why anti-establishment fervour is so strong this time round is because the grass roots is so fed up with being saddled with establishment moderates, like Romney.\nHad we reached further back into Republican Party history we would have seen that hostile takeovers have succeeded in the past.\nIn 1980, Reagan ran as an anti-establishment candidate, beating the blue-blood Republican George HW Bush, a scion of the establishment.\nThen there was Barry Goldwater's success in 1964, when he scored that highly symbolic victory over Nelson Rockefeller, the great pillar of the establishment.\nThe victory of an Arizonian right-wing firebrand over a New York moderate personified the shift in the Republican Party's centre of gravity during the civil rights era from the north-east to the south and south-west.\nIt changed the character of the party, setting it on its present course.\nRevulsion right now of the permanent political class and party elites seems to be a global phenomenon, but in America it is particularly pronounced, on the left as well as the right.\nBut an anti-establishment figure like Donald Trump would not have become so strong had not the party establishment become so weak. The GOP, the Grand Old Party, has been ripe for a takeover for years.","On Super Tuesday Donald Trump's hostile takeover of the Republican Party should come even closer, and like stiff-collared executives in some wood-panelled boardroom trying belatedly to fight off a corporate raid, the GOP high command seems incapable of stopping him.",35662836
2,"Arborhill Ltd has permission to build a 100-bedroom hotel on the Hillsborough Road.\nFilings at Companies House suggest the firm also owns an industrial property in east Belfast.\nThe firm's last set of accounts, for 2014, show it had assets of around £4m and liabilities of around £5m.\nIt had borrowing with the Bank of Ireland.\nThe firm was controlled by the businessman Ken Cleland.\nMr Cleland is a board member of the Maze Long Kesh Development Corporation",A property firm which had been planning to develop a hotel in Lisburn has been put into administration.,34626100
3,"The 19-year-old, from the St Paul's club in Belfast, beat Germany's Hamza Touba on a unanimous decision to move into the flyweight quarter-finals.\nIrvine needs to come in the top three in Istanbul to be assured of a spot on the Ireland boxing team for Rio.\nOlympic champion Katie Taylor had an easy win over Martina Schmaranzova of the Czech Republic at the qualifiers.\nShe will now face Yvonne Rasmussen of Denmark in the quarter-finals of the lightweight division on Wednesday.\nCork's Christina Desmond beat top seed Nouchka Fontijan of the Netherlands at middleweight, while Ceire Smith saw off Hungary's Virginia Barankas in the flyweight division.\nWexford's Dean Walsh suffered a split decision defeat by top seeded light-welterweight Lorenzo Sotomayor of Azerbaijan while Clonmel super-heavyweight Dean Gardiner was outpointed by Mahammadrausl Majidor, also from Azerbaijan.",Belfast boxer Brendan Irvine has cleared his first hurdle at the European Olympic qualifiers in Turkey.,36020078
4,"The accolade was awarded by cycling's international governing body, the UCI.\nThe Scottish event was held in June and drew a crowd of almost 20,000 people.\nThe competition forms the third stage of the UCI World Cup Downhill championships and was first held 14 years ago.\nThe competitions are held on a course at Nevis Range, near Fort William.\nThis year, British downhill rider Rachel Atherton won the women's final for the ninth consecutive time.\nSalisbury-born Atherton finished 12 seconds ahead of second-placed Tracey Hannah, from Australia.\nManon Carpenter, from Caerphilly, South Wales, recovered from a crash to finish third.\nSouth African Greg Minaar won the men's final. The USA's Aaron Gwin was second and Danny Hart, from Redcar, third.",This year's Fort William Mountain Bike World Cup and Buff 4X Pro Tour weekend has been named the best downhill event in the world in 2016.,38156620


The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric):

In [13]:
metric

Metric(name: "rouge", features: {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}, usage: """
Calculates average rouge scores for a list of hypotheses and references
Args:
    predictions: list of predictions to score. Each prediction
        should be a string with tokens separated by spaces.
    references: list of reference for each prediction. Each
        reference should be a string with tokens separated by spaces.
    rouge_types: A list of rouge types to calculate.
        Valid names:
        `"rouge{n}"` (e.g. `"rouge1"`, `"rouge2"`) where: {n} is the n-gram based scoring,
        `"rougeL"`: Longest common subsequence based scoring.
        `"rougeLSum"`: rougeLsum splits text using `"\n"`.
        See details in https://github.com/huggingface/datasets/issues/617
    use_stemmer: Bool indicating whether Porter stemmer should be used to strip word suffixes.
    use_aggregator: Return aggregates if this is set to True
Retu

You can call its `compute` method with your predictions and labels, which need to be lists of decoded strings:

In [14]:
fake_preds = ["hello there", "general kenobi"]
fake_labels = ["hello there", "general kenobi"]
metric.compute(predictions=fake_preds, references=fake_labels)

{'rouge1': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rouge2': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeL': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeLsum': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0))}
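
To make the `precision`, `recall` and `fmeasure` fields concrete, here is a simplified sketch of a unigram-overlap (ROUGE-1 style) score in plain Python. It is an illustration only: `rouge1_sketch` is a hypothetical helper, it splits on whitespace, and it does no stemming and no aggregation over several examples, so its numbers will not match the metric above in general.

```python
from collections import Counter


def rouge1_sketch(prediction, reference):
    """Unigram-overlap precision, recall and F-measure for one pair of strings."""
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    # Counter intersection keeps, per token, the smaller of the two counts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


print(rouge1_sketch("hello there", "hello there"))
print(rouge1_sketch("hello there", "hello world"))
```

Identical strings score 1.0 on all three fields, which is consistent with the output shown above for the fake predictions.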

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 Transformers `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put them in a format the model expects, as well as generate the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

By default, the call above will use one of the fast tokenizers (backed by Rust) from the 🤗 Tokenizers library.

You can directly call this tokenizer on one sentence or a pair of sentences:

In [None]:
tokenizer("Hello, this one sentence!")

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later). You can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

Instead of one sentence, we can pass along a list of sentences:

In [None]:
tokenizer(["Hello, this one sentence!", "This is another sentence."])

To prepare the targets for our model, we need to tokenize them inside the `as_target_tokenizer` context manager. This will make sure the tokenizer uses the special tokens corresponding to the targets:

In [None]:
with tokenizer.as_target_tokenizer():
    print(tokenizer(["Hello, this one sentence!", "This is another sentence."]))

If you are using one of the five T5 checkpoints, you have to prefix the inputs with "summarize:" (the model can also translate, so it needs the prefix to know which task it has to perform).

In [None]:
if model_checkpoint in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "summarize: "
else:
    prefix = ""
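
The check can also be written as a small function. This is a sketch under stated assumptions: `summarization_prefix` and `T5_CHECKPOINTS` are hypothetical names introduced here, and stripping an organisation prefix with `split("/")[-1]` is an assumption about how a namespaced checkpoint id should be treated, not something the notebook does at this step.

```python
# The five original T5 checkpoint names that expect a task prefix.
T5_CHECKPOINTS = frozenset({"t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"})


def summarization_prefix(checkpoint):
    """Return the task prefix for a checkpoint name (hypothetical helper)."""
    # Drop any "organisation/" part so that only the model name is compared.
    name = checkpoint.split("/")[-1]
    return "summarize: " if name in T5_CHECKPOINTS else ""


print(repr(summarization_prefix("t5-small")))
print(repr(summarization_prefix("facebook/bart-base")))
```

A set lookup behaves like the list membership test above; the main difference is that the names live in one place and can be checked in isolation.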

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with three arguments: `padding="max_length"` pads any input shorter than the maximum length up to that length, `truncation=True` truncates any input longer than the maximum length, and `max_length` (set to `max_input_length` for the documents and `max_target_length` for the summaries) sets the maximum length of a sequence.

Note that it is necessary to pad all the sentences to the same length since currently Graphcore's PyTorch implementation only runs in static mode.
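
The effect of combining padding and truncation can be sketched on plain lists of token IDs. This is an illustration, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, the padding ID of 0 is an assumed default, and a real tokenizer also handles special tokens such as the end-of-sequence marker when it truncates.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return a list of exactly `max_length` IDs (hypothetical helper)."""
    # Truncation: keep at most `max_length` IDs.
    clipped = token_ids[:max_length]
    # Padding: fill the remainder with the padding ID.
    return clipped + [pad_id] * (max_length - len(clipped))


short = pad_or_truncate([21, 22, 23], max_length=5)
long = pad_or_truncate([21, 22, 23, 24, 25, 26, 27], max_length=5)
print(short)  # [21, 22, 23, 0, 0]
print(long)   # [21, 22, 23, 24, 25]
```

Every sequence leaves the function with the same length, which is the property a statically compiled model relies on.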

In [None]:
max_input_length = 1024
max_target_length = 128


def preprocess_function(examples):
    inputs = [prefix + doc for doc in examples["document"]]
    model_inputs = tokenizer(
        inputs, max_length=max_input_length, padding="max_length", truncation=True
    )

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            examples["summary"],
            max_length=max_target_length,
            padding="max_length",
            truncation=True,
        )

    # Since we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
    # padding in the loss.
    labels["input_ids"] = [
        [(l if l != tokenizer.pad_token_id else -100) for l in label]
        for label in labels["input_ids"]
    ]

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
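
The label-masking step can be looked at in isolation. The sketch below uses plain Python lists; `mask_padding` and `count_scored_positions` are hypothetical helpers introduced here. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, which is why positions carrying it do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_padding(label_ids, pad_id):
    """Replace padding IDs with IGNORE_INDEX so the loss skips them."""
    return [tok if tok != pad_id else IGNORE_INDEX for tok in label_ids]


def count_scored_positions(label_ids):
    """Count the positions that still contribute to the loss."""
    return sum(1 for tok in label_ids if tok != IGNORE_INDEX)


# A summary of 3 tokens padded to length 6 with an assumed padding ID of 0.
labels = mask_padding([71, 72, 1, 0, 0, 0], pad_id=0)
print(labels)                          # [71, 72, 1, -100, -100, -100]
print(count_scored_positions(labels))  # 3
```

Without the masking, the model would also be trained to predict the padding token at every padded position of the target.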

This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists for each key:

In [None]:
preprocess_function(raw_datasets["train"][:2])

To apply this function to all the document/summary pairs in our dataset, we just use the `map` method of the `raw_datasets` object we created earlier. This will apply the function to all the elements of all the splits in `raw_datasets`, so our training, validation and testing data will be preprocessed in one single command.

In [None]:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

Even better, the results are automatically cached by the 🤗 Datasets library to avoid spending time on this step the next time you run your notebook. The 🤗 Datasets library is normally smart enough to detect when the function you pass to `map` has changed (and thus that the cached data must not be used). For instance, it will properly detect if you change the task in the first cell and rerun the notebook. 🤗 Datasets warns you when it uses cached files; you can pass `load_from_cache_file=False` in the call to `map` to ignore the cached files and force the preprocessing to be applied again.

Note that we passed `batched=True` to encode the texts in batches. This leverages the full benefit of the fast tokenizer we loaded earlier, which uses multi-threading to process the texts in a batch concurrently.
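
To make the effect of `batched=True` more concrete, here is a pure-Python sketch of the shape of the data that the mapped function receives. `apply_batched` is a hypothetical stand-in written for this illustration, not the 🤗 Datasets implementation. The batch size of 2 is arbitrary, and unlike the real `map` it returns only the new columns.

```python
def apply_batched(rows, function, batch_size=2):
    """Call `function` on dicts of lists (one list per column), batch by batch."""
    columns = list(rows[0].keys())
    outputs = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        # In batched mode the function sees one list per column, not a single row.
        batch = {col: [row[col] for row in chunk] for col in columns}
        result = function(batch)
        # Convert the returned dict of lists back into one dict per row.
        n_out = len(next(iter(result.values())))
        for i in range(n_out):
            outputs.append({key: values[i] for key, values in result.items()})
    return outputs


def add_prefix(examples):
    # Same access pattern as preprocess_function: examples["document"] is a list.
    return {"inputs": ["summarize: " + doc for doc in examples["document"]]}


rows = [
    {"document": "first text"},
    {"document": "second text"},
    {"document": "third text"},
]
processed = apply_batched(rows, add_prefix)
print(processed)
```

Because the function receives whole lists, a tokenizer called inside it can process many texts in one call instead of one call per row.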

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is of the sequence-to-sequence kind, we use the `AutoModelForSeq2SeqLM` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us.

In [None]:
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq
from optimum.graphcore import IPUConfig, IPUSeq2SeqTrainer, IPUSeq2SeqTrainingArguments

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Note that we don't get a warning like in our classification example. This means we used all the weights of the pretrained model and there is no randomly initialized head in this case.

To instantiate an `IPUSeq2SeqTrainer`, we will need to define four more things. The first thing we need to define is the `IPUConfig`, which is a class that specifies attributes and configuration parameters to compile and put the model on the device. We initialize it with a config name or path:

In [None]:
ipu_config_name = "Graphcore/t5-small-ipu"

# Below, `inference_layers_per_ipu` uses the -1 wildcard
# to split encoder and decoder layers evenly across IPUs
# for inference
ipu_config = IPUConfig.from_pretrained(
    ipu_config_name,
    executable_cache_dir=executable_cache_dir,
    inference_layers_per_ipu=[-1],
)

The next thing we need to define is the `IPUSeq2SeqTrainingArguments`, which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [None]:
micro_batch_size = 1
gradient_accumulation_steps = 16

model_name = model_checkpoint.split("/")[-1]
args = IPUSeq2SeqTrainingArguments(
    f"{model_name}-finetuned-xsum",
    evaluation_strategy="steps",
    eval_steps=3,
    learning_rate=2e-5,
    per_device_train_batch_size=micro_batch_size,
    per_device_eval_batch_size=micro_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    pod_type=pod_type,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    predict_with_generate=True,
    generation_max_length=max_target_length,
    dataloader_drop_last=True,
    logging_steps=100,
    push_to_hub=False,
)

Here we set the evaluation to run every `eval_steps` steps (3 in this example), tweak the learning rate, use the batch-size-related arguments `micro_batch_size` and `gradient_accumulation_steps` defined at the top of the cell, together with `pod_type`, which this cell expects to be defined already, and customize the weight decay. Since the `IPUSeq2SeqTrainer` will save the model regularly and our dataset is quite large, we tell it to keep at most three checkpoints.
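
As a rough guide to how the batch-size arguments combine, the number of samples contributing to one weight update is the micro batch size multiplied by the gradient accumulation steps and by the replication factor. The replication factor depends on the IPU configuration and on `pod_type`. The value of 4 below is an assumption made for this illustration, not a value read from `Graphcore/t5-small-ipu`, and `global_batch_size` is a hypothetical helper.

```python
def global_batch_size(micro_batch_size, gradient_accumulation_steps, replication_factor):
    """Number of samples that contribute to a single weight update."""
    return micro_batch_size * gradient_accumulation_steps * replication_factor


micro_batch_size = 1
gradient_accumulation_steps = 16
assumed_replication_factor = 4  # assumption: depends on the IPU config and pod type

samples_per_update = global_batch_size(
    micro_batch_size, gradient_accumulation_steps, assumed_replication_factor
)
print(samples_per_update)  # 64 under the assumed replication factor
```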

The last argument, `push_to_hub`, controls whether the model is pushed to the [Hub](https://huggingface.co/models) regularly during training. It is set to `False` here; enable it only if you followed the installation steps at the top of the notebook. If you want to save your model locally under a name that is different from the name of the repository it will be pushed to, or if you want to push your model under an organization rather than your own namespace, use the `hub_model_id` argument to set the repo name (it needs to be the full name, including your namespace: for instance `"sgugger/t5-finetuned-xsum"` or `"huggingface/t5-finetuned-xsum"`).

Then, we need a special kind of data collator, which will prepare the `decoder_input_ids`:

In [None]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
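
For intuition, T5-style models build `decoder_input_ids` by shifting the labels one position to the right. The sketch below mimics that idea in pure Python and is not the collator's actual code: `shift_right` is a hypothetical helper, and the ids (pad id 0, decoder start id 0) are assumptions matching the usual T5 configuration rather than values read from this checkpoint.

```python
def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Prepend the start token, drop the last label, and undo the -100 masking."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 is only meaningful to the loss, so the decoder input uses the pad id instead.
    return [pad_token_id if tok == -100 else tok for tok in shifted]


labels = [71, 1712, 1, -100, -100]
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [0, 71, 1712, 1, 0]
```

At each position the decoder therefore sees the previous target token and is trained to predict the current one.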

The last thing to define for our `IPUSeq2SeqTrainer` is how to compute the metrics from the predictions. We need to define a function for this, which will just use the `metric` we loaded earlier, and we have to do a bit of pre-processing to decode the predictions into texts:

In [None]:
import nltk
import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Rouge expects a newline after each sentence
    decoded_preds = [
        "\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds
    ]
    decoded_labels = [
        "\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels
    ]

    result = metric.compute(
        predictions=decoded_preds, references=decoded_labels, use_stemmer=True
    )
    # Extract a few results
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}

    # Add mean generated length
    prediction_lens = [
        np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions
    ]
    result["gen_len"] = np.mean(prediction_lens)

    return {k: round(v, 4) for k, v in result.items()}
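
Two bookkeeping steps in `compute_metrics`, restoring padding ids in the labels and measuring the generated length, can be checked on toy data. This sketch uses plain Python lists instead of NumPy arrays; the ids (with a pad id of 0) are invented for illustration, and both helper names are hypothetical.

```python
PAD_TOKEN_ID = 0  # illustrative: the real value comes from tokenizer.pad_token_id


def restore_padding(labels, pad_token_id=PAD_TOKEN_ID):
    """Turn -100 back into the pad id so the labels can be decoded."""
    return [[tok if tok != -100 else pad_token_id for tok in row] for row in labels]


def mean_generated_length(predictions, pad_token_id=PAD_TOKEN_ID):
    """Average number of non-padding tokens per prediction."""
    lengths = [sum(1 for tok in pred if tok != pad_token_id) for pred in predictions]
    return sum(lengths) / len(lengths)


labels = [[71, 1712, 1, -100], [37, 1, -100, -100]]
predictions = [[0, 71, 1712, 1], [0, 37, 55, 0]]

restored = restore_padding(labels)
gen_len = mean_generated_length(predictions)
print(restored, gen_len)
```

The first prediction has three non-padding tokens and the second has two, so the mean generated length here is 2.5.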

Then we just need to pass all of this along with our datasets to the `IPUSeq2SeqTrainer`:

In [None]:
trainer = IPUSeq2SeqTrainer(
    model,
    ipu_config,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

We can now fine-tune our model by just calling the `train` method:

In [None]:
trainer.train()

You can now upload the result of the training to the Hub by uncommenting and executing this instruction:

In [None]:
# trainer.push_to_hub()

You can now share this model with all your friends, family and favorite pets: they can all load it with the identifier `"your-username/the-name-you-picked"`, for instance:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("sgugger/my-awesome-model")
```