If you're opening this notebook on Colab, you will probably need to install 🤗 Transformers and 🤗 Datasets. Uncomment the following cell and run it.

In [1]:
#! pip install datasets transformers

If you're opening this notebook locally, make sure your environment has the latest version of those libraries installed.

# Fine-tuning a model on a summarization task

In this notebook, we will see how to fine-tune one of the [🤗 Transformers](https://github.com/huggingface/transformers) models for a summarization task. We will use the [XSum dataset](https://arxiv.org/pdf/1808.08745.pdf) (for extreme summarization), which contains BBC articles accompanied by single-sentence summaries.

We will see how to easily load the dataset for this task using 🤗 Datasets and how to fine-tune a model on it using the `Trainer` API.

In [2]:
model_checkpoint = "t5-small"

This notebook is built to run with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a sequence-to-sequence version in the Transformers library. Here we picked the [`t5-small`](https://huggingface.co/t5-small) checkpoint.

## Loading the dataset

We will use the [🤗 Datasets](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.  

In [3]:
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("xsum")
metric = load_metric("rouge")

Using custom data configuration default
Reusing dataset xsum (/home/sgugger/.cache/huggingface/datasets/xsum/default/1.2.0/f9abaabb5e2b2a1e765c25417264722d31877b34ec34b437c53242f6e5c30d6d)


The `raw_datasets` object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key for each of the training, validation and test sets:

In [4]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})

To access an actual element, you need to select a split first, then give an index:

In [5]:
raw_datasets["train"][0]

{'document': 'Recent reports have linked some France-based players with returns to Wales.\n"I\'ve always felt - and this is with my rugby hat on now; this is not region or WRU - I\'d rather spend that money on keeping players in Wales," said Davies.\nThe WRU provides £2m to the fund and £1.3m comes from the regions.\nFormer Wales and British and Irish Lions fly-half Davies became WRU chairman on Tuesday 21 October, succeeding deposed David Pickering following governing body elections.\nHe is now serving a notice period to leave his role as Newport Gwent Dragons chief executive after being voted on to the WRU board in September.\nDavies was among the leading figures among Dragons, Ospreys, Scarlets and Cardiff Blues officials who were embroiled in a protracted dispute with the WRU that ended in a £60m deal in August this year.\nIn the wake of that deal being done, Davies said the £3.3m should be spent on ensuring current Wales-based stars remain there.\nIn recent weeks, Racing Metro fla

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [6]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
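
As a side note, the loop above draws distinct indices one at a time and retries on duplicates. The standard library's `random.sample` does the same in a single call. The sketch below uses a plain list as a stand-in for a dataset split; `toy_dataset` and `pick_distinct_indices` are names made up for this illustration.

```python
import random

# Stand-in for a dataset split; anything with a length behaves the same way here.
toy_dataset = [f"example {i}" for i in range(100)]

def pick_distinct_indices(dataset, num_examples=5):
    """Return `num_examples` distinct random indices into `dataset`."""
    if num_examples > len(dataset):
        raise ValueError("Can't pick more elements than there are in the dataset.")
    # random.sample draws without replacement, so no duplicate check is needed.
    return random.sample(range(len(dataset)), num_examples)

picks = pick_distinct_indices(toy_dataset)
print(picks)
```

The resulting list can be used for indexing in the same way as `picks` in the function above.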

In [7]:
show_random_elements(raw_datasets["train"])

Unnamed: 0,document,id,summary
0,"The 21-year-old scored his first try with a clever chip-and-chase against Northampton and followed that with a dancing effort against Harlequins the week after.\nThe ex-England Under-19 international made his debut in April 2013, but made just four outings before this campaign.\n""It's good to get my chance and grab it with both hands,"" he told BBC Sport.\n""Sale give a lot of opportunities for young boys, so taking those opportunities is a step forward .""\nThe former Manchester Rugby Club player was a fly-half growing up, but now hopes to cement a position in the Sharks midfield.\n""Anywhere in the centre, I don't mind either inside or outside,"" he said.\n""I'm kind of leaning away from fly-half now as there is too much responsibility for me.\n""I'm getting plenty of advice off Jonny [Leota] and Sammy [Tuitupou] who are great role models for me, I couldn't ask for anything better.\n""Every day you're reminded boys pushing through like last year with Mike Haley - he was in my year so to see him go through to the first-team motivates you more to push on as a player.""",34768054,Sale Sharks centre Sam James aims to continue taking his opportunities after his run in the first-team this season.
1,"Edwin Poots has been given leave by the Court of Appeal to appeal its ruling that any ban on gay and lesbian couples adopting is unlawful.\nThe Attorney General had his request for clarification on the issue refused.\nThe case is now expected to go before the Supreme Court in London.\nThe department of health's legal team can now petition the higher court directly to hear its case.\nIn October last year, the ban based on relationship status was held to discriminate against those in civil partnerships and to breach their human rights.\nPreviously, a single gay or lesbian person could adopt children in NI, but a couple in a civil partnership could not.\nAfter the Court of Appeal ruling, adoption agencies were told they were able to accept applications from same-sex and unmarried couples and those in civil partnerships.\nAt the time, the Human Rights Commission (NIHRC) said the ruling would bring NI into line with the rest of the UK.\nRepresentatives of the Rainbow Project, Northern Ireland's largest lesbian, gay, bisexual and transgender advocacy organisation, expressed dismay that the department is now seeking to go to the Supreme Court over the issue.\nRainbow Project director, John O'Doherty said they were disappointed with the minister's decision.\n""Both the High Court and the Court of Appeal have noted the practice of banning same-sex and unmarried couples from adopting is discriminatory,"" he said.\n""Enough public money has been spent on this fool's errand. The minister should focus his time on ensuring the best available homes for children in care in Northern Ireland.""",24081255,The health minister is set to take his fight against the extension of adoption rights to Northern Ireland's gay and unmarried couples to the UK's highest court.
2,"Many parents have struggled to register on Childcare Choices, which offers access to two new childcare schemes.\nNicky Morgan, who now chairs the Treasury Committee, has demanded answers from the boss of Revenue & Customs, which runs the website.\nHMRC apologised for the inconvenience, saying it had improved the website.\nBut nursery providers fear it could further jeopardise the success of the schemes.\nParents first reported glitches while using the website in mid-May - just weeks after it was made live.\nThe website is the key way in which parents access help with two new government-funded childcare schemes:\nOne mother, writing in the Financial Times, said things kept going wrong no matter what time of day or night she logged on.\nAnother tweeted: ""I am having a nightmare with the 30 hours' free childcare application.""\nMrs Morgan wrote to permanent secretary Jon Thompson asking numerous questions about the issue.\n""It's concerning that some parents have struggled to apply for childcare funding, due to technical issues with the government's childcare service website,"" she said.\n""To make matters worse, it appears that the childcare service helpline, for parents suffering problems with the website, is also experiencing technical difficulties.""\nMrs Morgan asked for information about:\nAn HMRC spokesman said: ""We know that some parents and childcare providers have experienced difficulties accessing the service, and we are sorry about the inconvenience.\n""We've now made significant improvements based on customer feedback, and on average more than 2,000 parents are applying successfully every single day.""\n""We continue to make updates to improve the customer experience and ensure everyone can easily access their account.\n""Parents and providers experiencing difficulties or needing technical support can phone the childcare service helpline on 0300 123 4097.""\nBut the Pre-School Learning Alliance said it was not acceptable that so many parents were 
still struggling to actually use the system.\nChief executive Neil Leitch said: ""The government has claimed that working families could benefit from thousands of pounds of savings as a result of the tax-free childcare and the 30-hour offers, but of course, for that to happen, the IT system underpinning both schemes has to actually work.\n""Indeed, the recent 30-hour pilot evaluation report explicitly stated that the effective implementation of the childcare service system was vital for the policy to have any chance of succeeding.""\nNational Day Nurseries Association chief executive Purnima Tanuku said: ""With the rollout of the government's 30 hour funded childcare places from 1 September, some parents are getting worried that they won't even be able to access their places in time because they are being let down by the IT system.\n""HMRC must ensure they resolve problems with the website as a matter of urgency and reassure parents and providers.""",40812455,Concerns about technical problems with the website parents in England have to use to apply for help with childcare have been raised with the government.
3,"Kevin Devaney scored a close-range opener in the 28th minute and Vinny Faherty wrapped it up near the end.\nKenny Shiels fielded a much-changed City side, giving opportunities to reserve team players keeper Eric Grimes and 18-year-old striker Cathal Farren.\nIt was a rare win for Galway who are bottom of the Premier Division table.\nThey have had four draws and four defeats in their first eight league matches.\nDerry started well, dominating possession without really troubling home keeper Ciaran Nugent.\nBut they fell behind when Devaney stabbed in at the back post from Faherty's flick-on following a Gary Kinneen corner.\nDerry will be disappointed at how they defended the set piece with Aaron McEneff appearing to be caught out.\nFarren, Harry Monaghan and McEneff had chances in the second half but the visitors could not find an equaliser.\nGalway made sure of their progress when Faherty made it 2-0 in the 90th minute with a shot which bounced over keeper Grimes.",39614711,Derry City lost for a fourth successive match as Galway United secured a deserved victory in Monday's EA Sports Cup second round match.
4,"The paper on the UK's role in the world was being compiled by the Labour politician in the weeks before she was killed in June.\nBrendan Cox said his wife felt strongly the UK must ""stand up"" for civilians threatened by war and genocide.\nThe Cost of Doing Nothing has been published by think-tank Policy Exchange.\nIt warns the reaction to the Iraq war had prompted a rise in ""knee-jerk isolationism, unthinking pacifism and anti-interventionism"".\nWithdrawing from the world stage posed ""dangerous"" implications for security and increased the risk of further global instability, it added.\nMr Cox said his wife had been passionately committed to the work, which was forged by her experiences of meeting survivors of genocide.\n""Last week I was clearing some of Jo's things and found the first draft of the report that she had scribbled all over,"" he said.\n""At the top she had written 'Britain must lead again'.""\nThe report offers examples of successful interventions, such as the introduction of a no-fly zone in northern Iraq in 1991 to protect Kurds.\nIt also points to the consequences of doing nothing, such as the 1994 Rwandan genocide.\nMrs Cox had been working on a draft of the paper with Tom Tugendhat, the Conservative MP for Tonbridge and Malling.\nHe said: ""To stand aside would not make us or the world safer, but leave us vulnerable to the whims of others rather than doing what we have always done - shape our own destiny and be a force for good.""\nThe document was completed by Mrs Cox's friend Alison McGovern, the Labour MP for Wirral South.\nShe added: ""Jo never believed that simply doing nothing in the face of atrocities was good enough, and neither should we.""\nMrs Cox was shot and stabbed by Thomas Mair in Birstall in her West Yorkshire constituency. Her killer was jailed for life in November 2016.",38755467,"Britain must not ""shy away"" from military action, according to a report worked on by murdered MP Jo Cox."


The metric is an instance of [`datasets.Metric`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric):

In [8]:
metric

Metric(name: "rouge", features: {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}, usage: """
Calculates average rouge scores for a list of hypotheses and references
Args:
    predictions: list of predictions to score. Each predictions
        should be a string with tokens separated by spaces.
    references: list of reference for each prediction. Each
        reference should be a string with tokens separated by spaces.
    rouge_types: A list of rouge types to calculate.
        Valid names:
        `"rouge{n}"` (e.g. `"rouge1"`, `"rouge2"`) where: {n} is the n-gram based scoring,
        `"rougeL"`: Longest common subsequence based scoring.
        `"rougeLSum"`: rougeLsum splits text using `"
"`.
        See details in https://github.com/huggingface/datasets/issues/617
    use_stemmer: Bool indicating whether Porter stemmer should be used to strip word suffixes.
    use_agregator: Return aggregates if this is set to True
Retu

You can call its `compute` method with your predictions and labels, which need to be lists of decoded strings:

In [9]:
fake_preds = ["hello there", "general kenobi"]
fake_labels = ["hello there", "general kenobi"]
metric.compute(predictions=fake_preds, references=fake_labels)

{'rouge1': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rouge2': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeL': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0)),
 'rougeLsum': AggregateScore(low=Score(precision=1.0, recall=1.0, fmeasure=1.0), mid=Score(precision=1.0, recall=1.0, fmeasure=1.0), high=Score(precision=1.0, recall=1.0, fmeasure=1.0))}
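
To make these numbers less opaque, here is a simplified sketch of what a ROUGE-1 F-measure captures: the unigram overlap between a prediction and a reference. It assumes plain whitespace tokenization and skips the stemming and the aggregation into `low`/`mid`/`high` values that the real metric performs, so it is an illustration rather than the library's implementation. `rouge1_fmeasure` is a hypothetical helper name.

```python
from collections import Counter

def rouge1_fmeasure(prediction, reference):
    """Simplified ROUGE-1 F-measure based on whitespace-separated unigrams."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps, for each word, the smaller of the two counts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

same = rouge1_fmeasure("hello there", "hello there")            # identical strings
disjoint = rouge1_fmeasure("hello there", "general kenobi")     # no shared words
partial = rouge1_fmeasure("hello there general", "hello there")  # one extra word
print(same, disjoint, partial)
```

Identical strings score 1.0, strings with no words in common score 0.0, and the prediction with one extra word lands in between because its precision drops while its recall stays perfect.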

## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 Transformers `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put them in a format the model expects, as well as generate the other inputs the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [10]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

By default, the call above will use one of the fast tokenizers (backed by Rust) from the 🤗 Tokenizers library.

You can directly call this tokenizer on one sentence or a pair of sentences:

In [11]:
tokenizer("Hello, this one sentence!")

{'input_ids': [8774, 6, 48, 80, 7142, 55, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later). You can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

Instead of one sentence, we can pass along a list of sentences:

In [12]:
tokenizer(["Hello, this one sentence!", "This is another sentence."])

{'input_ids': [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}

To prepare the targets for our model, we need to tokenize them inside the `as_target_tokenizer` context manager. This will make sure the tokenizer uses the special tokens corresponding to the targets:

In [13]:
with tokenizer.as_target_tokenizer():
    print(tokenizer(["Hello, this one sentence!", "This is another sentence."]))

{'input_ids': [[8774, 6, 48, 80, 7142, 55, 1], [100, 19, 430, 7142, 5, 1]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}


We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This ensures that an input longer than what the selected model can handle is truncated to the maximum length accepted by the model. The padding will be dealt with later on (in a data collator), so that we pad examples to the longest length in the batch and not in the whole dataset.

In [14]:
max_input_length = 1024
max_target_length = 128

def preprocess_function(examples):
    model_inputs = tokenizer(examples["document"], max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["summary"], max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists for each key:

In [15]:
preprocess_function(raw_datasets['train'][:2])

{'input_ids': [[17716, 2279, 43, 5229, 128, 1410, 18, 390, 1508, 28, 5146, 12, 10256, 5, 96, 196, 31, 162, 373, 1800, 3, 18, 11, 48, 19, 28, 82, 22209, 3, 547, 30, 230, 117, 48, 19, 59, 1719, 42, 549, 8503, 3, 18, 27, 31, 26, 1066, 1492, 24, 540, 30, 2627, 1508, 16, 10256, 976, 243, 28571, 5, 37, 549, 8503, 795, 17586, 51, 12, 8, 3069, 11, 3996, 13606, 51, 639, 45, 8, 6266, 5, 18263, 10256, 11, 2390, 11, 7262, 10371, 7, 3971, 18, 17114, 28571, 1632, 549, 8503, 13404, 30, 2818, 1401, 1797, 6, 7229, 53, 20, 12151, 1955, 8356, 49, 53, 826, 3, 19585, 643, 9768, 5, 216, 19, 230, 3122, 3, 9, 2103, 1059, 12, 1175, 112, 1075, 38, 24260, 350, 16103, 10282, 7, 5752, 4297, 227, 271, 3, 11060, 30, 12, 8, 549, 8503, 1476, 16, 1600, 5, 28571, 47, 859, 8, 1374, 5638, 859, 10282, 7, 6, 411, 7, 2026, 63, 7, 6, 14586, 7677, 11, 26911, 2419, 7, 4298, 113, 130, 10960, 52, 26786, 16, 3, 9, 813, 11674, 11044, 28, 8, 549, 8503, 24, 3492, 16, 3, 9, 3996, 3328, 51, 1154, 16, 1660, 48, 215, 5, 86, 8, 7178, 13, 

To apply this function to all the examples in our dataset, we just use the `map` method of the `raw_datasets` object we created earlier. This will apply the function to all the elements of all the splits in `raw_datasets`, so our training, validation and testing data will be preprocessed in one single command.

In [16]:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)

Loading cached processed dataset at /home/sgugger/.cache/huggingface/datasets/xsum/default/1.2.0/f9abaabb5e2b2a1e765c25417264722d31877b34ec34b437c53242f6e5c30d6d/cache-4ea29f204e9f1dd6.arrow
Loading cached processed dataset at /home/sgugger/.cache/huggingface/datasets/xsum/default/1.2.0/f9abaabb5e2b2a1e765c25417264722d31877b34ec34b437c53242f6e5c30d6d/cache-7b6296e39606cd94.arrow
Loading cached processed dataset at /home/sgugger/.cache/huggingface/datasets/xsum/default/1.2.0/f9abaabb5e2b2a1e765c25417264722d31877b34ec34b437c53242f6e5c30d6d/cache-5b5e3594573a10e1.arrow


Even better, the results are automatically cached by the 🤗 Datasets library to avoid spending time on this step the next time you run your notebook. The 🤗 Datasets library is normally smart enough to detect when the function you pass to `map` has changed (and thus that the cached data should not be used). For instance, it will properly detect if you change the model checkpoint at the top of the notebook and rerun it. 🤗 Datasets warns you when it uses cached files; you can pass `load_from_cache_file=False` in the call to `map` to ignore the cached files and force the preprocessing to be applied again.

Note that we passed `batched=True` to encode the texts by batches together. This is to leverage the full benefit of the fast tokenizer we loaded earlier, which will use multi-threading to treat the texts in a batch concurrently.
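
With `batched=True`, the function passed to `map` receives a dictionary of lists (one list per column) and returns a dictionary of lists as well. The sketch below mimics that contract in plain Python to show the shape of the data. `toy_tokenize`, `toy_preprocess` and `map_batched` are hypothetical stand-ins, not 🤗 Datasets or 🤗 Tokenizers APIs, and the chunk size of 2 is arbitrary.

```python
def toy_tokenize(text, max_length):
    # Hypothetical tokenizer: one "token ID" per word (its length), then truncation.
    return [len(word) for word in text.split()][:max_length]

def toy_preprocess(examples):
    # `examples` is a dict of lists, one entry per row in the current batch.
    return {
        "input_ids": [toy_tokenize(doc, max_length=8) for doc in examples["document"]],
        "labels": [toy_tokenize(summ, max_length=4) for summ in examples["summary"]],
    }

def map_batched(columns, function, batch_size=2):
    # Slice every column into chunks, call `function` once per chunk,
    # and concatenate the returned lists column by column.
    num_rows = len(next(iter(columns.values())))
    output = {}
    for start in range(0, num_rows, batch_size):
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in function(batch).items():
            output.setdefault(name, []).extend(values)
    return output

toy_columns = {
    "document": ["a long article about rugby", "another article", "a third one"],
    "summary": ["rugby news", "more news", "short"],
}
processed = map_batched(toy_columns, toy_preprocess)
print(processed["input_ids"])  # [[1, 4, 7, 5, 5], [7, 7], [1, 5, 3]]
print(processed["labels"])     # [[5, 4], [4, 4], [5]]
```

Unlike this sketch, the real `map` also keeps the original columns alongside the new ones, which is why `tokenized_datasets` still contains `document`, `summary` and `id`.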

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is of the sequence-to-sequence kind, we use the `AutoModelForSeq2SeqLM` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us.

In [17]:
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Note that we don't get a warning like in our classification example. This means we used all the weights of the pretrained model and there is no randomly initialized head in this case.

To instantiate a `Seq2SeqTrainer`, we will need to define three more things. The most important is the [`Seq2SeqTrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.Seq2SeqTrainingArguments), which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [18]:
batch_size = 16
args = Seq2SeqTrainingArguments(
    "test-summarization",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    predict_with_generate=True,
    fp16=True,
)

Here we set the evaluation to be done at the end of each epoch, tweak the learning rate, use the `batch_size` defined at the top of the cell and customize the weight decay. Since the `Seq2SeqTrainer` will save the model regularly and our dataset is quite large, we tell it to make three saves maximum. Lastly, we use the `predict_with_generate` option (to properly generate summaries) and activate mixed precision training (to go a bit faster).

Then, we need a special kind of data collator, which will not only pad the inputs to the maximum length in the batch, but also the labels:

In [19]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
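
As a rough picture of what this dynamic padding does, the sketch below pads a batch to its own longest sequence in plain Python. It assumes right-padding, a pad token ID of 0 for the inputs, and -100 as the padding value for the labels (the value that `compute_metrics` later replaces before decoding). `pad_batch` is a hypothetical helper, not the collator's actual implementation, which also converts the result to tensors.

```python
def pad_batch(features, pad_token_id=0, label_pad_token_id=-100):
    """Pad inputs and labels to the longest sequence in this batch only."""
    max_input_len = max(len(f["input_ids"]) for f in features)
    max_label_len = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        input_pad = max_input_len - len(f["input_ids"])
        label_pad = max_label_len - len(f["labels"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * input_pad)
        # Real tokens get 1 in the attention mask, padding positions get 0.
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * input_pad)
        # Labels are padded with -100 so those positions can be ignored by the loss.
        batch["labels"].append(f["labels"] + [label_pad_token_id] * label_pad)
    return batch

features = [
    {"input_ids": [8774, 6, 48, 80, 7142, 55, 1], "labels": [100, 19, 1]},
    {"input_ids": [100, 19, 430, 7142, 5, 1], "labels": [8774, 1]},
]
batch = pad_batch(features)
print(batch["input_ids"][1])       # [100, 19, 430, 7142, 5, 1, 0]
print(batch["attention_mask"][1])  # [1, 1, 1, 1, 1, 1, 0]
print(batch["labels"][1])          # [8774, 1, -100]
```

Because the target length is computed per batch, a batch of short articles stays short instead of being padded to the longest article in the whole dataset.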

The last thing to define for our `Seq2SeqTrainer` is how to compute the metrics from the predictions. We need to define a function for this, which will just use the `metric` we loaded earlier, and we have to do a bit of pre-processing to decode the predictions into texts:

In [20]:
import nltk
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    
    # Rouge expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    # Extract a few results
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    
    # Add mean generated length
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)
    
    return {k: round(v, 4) for k, v in result.items()}
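
The `np.where` step in the function above matters because -100 is only a placeholder for padded label positions and is not a valid vocabulary ID. The toy sketch below illustrates the idea with a made-up four-entry vocabulary and a pad token ID of 0; `toy_vocab` and `toy_decode` are hypothetical and far simpler than a real tokenizer.

```python
PAD_TOKEN_ID = 0  # assumed pad ID for this toy vocabulary
toy_vocab = {0: "<pad>", 1: "</s>", 2: "rugby", 3: "news"}

def toy_decode(ids, skip_special_tokens=True):
    # Looking up -100 here would raise a KeyError: it is not in the vocabulary.
    tokens = [toy_vocab[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in {"<pad>", "</s>"}]
    return " ".join(tokens)

padded_labels = [[2, 3, 1, -100, -100], [3, 1, -100, -100, -100]]

# Same idea as np.where(labels != -100, labels, tokenizer.pad_token_id).
cleaned = [[tok if tok != -100 else PAD_TOKEN_ID for tok in row] for row in padded_labels]
decoded = [toy_decode(row) for row in cleaned]
print(decoded)  # ['rugby news', 'news']
```

Once the placeholders are swapped for the pad ID, skipping special tokens removes them from the decoded text.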

Then we just need to pass all of this along with our datasets to the `Seq2SeqTrainer`:

In [21]:
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

You might wonder why we pass along the `tokenizer` when we already preprocessed our data. This is because we will use it one last time to make all the samples we gather the same length by applying padding, which requires knowing the model's preferences regarding padding (to the left or right? with which token?). The `tokenizer` has a `pad` method that will do all of this for us, and the `Trainer` will use it. You can customize this part by defining and passing your own `data_collator`, which will receive samples like the dictionaries seen above and will need to return a dictionary of tensors.

We can now fine-tune our model by just calling the `train` method:

In [22]:
trainer.train()

Epoch,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum,Gen Len,Runtime,Samples Per Second
1,2.7346,2.489542,27.8736,7.4565,21.876,21.877,18.8148,361.4274,31.353


TrainOutput(global_step=12753, training_loss=2.781377320494136, metrics={'train_runtime': 4818.9718, 'train_samples_per_second': 2.646, 'total_flos': 7.775233923431731e+16, 'epoch': 1.0, 'init_mem_cpu_alloc_delta': 320370, 'init_mem_gpu_alloc_delta': 242026496, 'init_mem_cpu_peaked_delta': 18306, 'init_mem_gpu_peaked_delta': 0, 'train_mem_cpu_alloc_delta': 2637525, 'train_mem_gpu_alloc_delta': 726786560, 'train_mem_cpu_peaked_delta': 138241123, 'train_mem_gpu_peaked_delta': 14676730368})
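
The `global_step` value in this output can be sanity-checked from numbers already shown in this notebook: the training split has 204045 rows and the per-device batch size is 16. The calculation below assumes a single device and no gradient accumulation, which is an assumption about the run rather than something the log states.

```python
import math

num_train_rows = 204045           # size of the "train" split printed earlier
batch_size = 16                   # per_device_train_batch_size
num_train_epochs = 1
num_devices = 1                   # assumption: single GPU
gradient_accumulation_steps = 1   # assumption: no gradient accumulation

# The last, smaller batch still counts as one optimizer step, hence the ceiling.
effective_batch_size = batch_size * num_devices * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_train_rows / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(total_steps)  # 12753
```

Under those assumptions the result matches the reported `global_step=12753`.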

Check out all our Hugging Face [notebook examples](https://huggingface.co/transformers/notebooks.html) and the regular [script examples](https://github.com/huggingface/transformers/tree/master/examples).