Copyright (c) 2023 Graphcore Ltd. All rights reserved.

# Text Summarization on IPUs using BART-L - Inference

This notebook demonstrates a text summarization task with a BART-L model using an inference pipeline from 🤗 Optimum, run on Graphcore IPUs.

### Summary table
|  Domain | Tasks | Model | Datasets | Workflow |   Number of IPUs   | Execution time |
|---------|-------|-------|----------|----------|--------------|--------------|
| NLP  | Text summarization | BART-L | - | Inference | Recommended: 2 | 5 min    |


## Environment setup

The best way to run this demo is on Paperspace Gradient's cloud IPUs because everything is already set up for you. 

To run the demo using other IPU hardware, you need to have the Poplar SDK enabled and the relevant PopTorch wheels installed. Refer to the [getting started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for details on how to enable the Poplar SDK and install the PopTorch wheels.

## Requirements

Before running the model on IPUs you have to install the Python dependencies:

In [None]:
%pip install optimum-graphcore==0.7.0
%pip install git+https://github.com/graphcore/graphcore-cloud-tools.git
%pip install wikipedia

%load_ext graphcore_cloud_tools.notebook_logging.gc_logger

In order to improve usability and support for future users, Graphcore would like to collect information about the applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell.

In [None]:
import os

exec_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/")

## Model preparation

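We start by preparing the model. First, we define the configuration needed to run the model on the IPU. `IPUConfig` is a class that specifies the attributes and configuration parameters used to compile the model and load it onto the device. Here, `layers_per_ipu=[12, 12]` places 12 layers on each of two IPUs, and `matmul_proportion` sets the fraction of IPU memory available as temporary memory for matrix multiplications: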

In [None]:
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    layers_per_ipu=[12, 12],
    matmul_proportion=0.15,
    executable_cache_dir=exec_cache_dir,
)
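BART-L has 12 encoder layers and 12 decoder layers, so `layers_per_ipu=[12, 12]` splits the 24 layers evenly over two IPUs. The sketch below is not part of Optimum Graphcore. It uses a hypothetical helper, `assign_layers_to_ipus`, and assumes that layers are numbered consecutively (encoder first, then decoder). Under that assumption, it shows how such a list maps each layer to an IPU index:

```python
from typing import List


def assign_layers_to_ipus(layers_per_ipu: List[int]) -> List[int]:
    """Return the IPU index for each layer, in layer order.

    Hypothetical helper: assumes layers are numbered consecutively
    (encoder layers first, then decoder layers).
    """
    placement: List[int] = []
    for ipu_index, num_layers in enumerate(layers_per_ipu):
        # Every layer in this group is placed on the same IPU.
        placement.extend([ipu_index] * num_layers)
    return placement


placement = assign_layers_to_ipus([12, 12])
print(f"{len(placement)} layers over {len(set(placement))} IPUs")
print("Encoder layers on IPU(s):", sorted(set(placement[:12])))
print("Decoder layers on IPU(s):", sorted(set(placement[12:])))
```

With this split, the encoder sits entirely on the first IPU and the decoder on the second. That is why the summary table recommends two IPUs.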

Next, let's import `pipeline` from `optimum.graphcore` and create our summarization pipeline:

In [None]:
from optimum.graphcore import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    tokenizer="facebook/bart-large-cnn",
    ipu_config=ipu_config.to_dict(),
    config="facebook/bart-large-cnn",
    max_input_length=1024,
    truncation=True,
    parallelize_kwargs={
        "max_length": 150,
        "num_beams": 3,
        "use_encoder_output_buffer": True,
        "on_device_generation_steps": 16,
    },
)
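The `parallelize_kwargs` fix the generation settings when the model is compiled. As we understand it, `on_device_generation_steps=16` lets the IPU run 16 decoding steps per call before returning control to the host, which reduces host-device round trips. The following is a rough back-of-the-envelope sketch. It assumes every call runs exactly that many steps and that generation continues all the way to `max_length`, so the result is an upper bound. The helper `max_host_round_trips` is our own illustration:

```python
import math


def max_host_round_trips(max_length: int, steps_per_call: int) -> int:
    """Upper bound on device calls needed to generate max_length tokens.

    Assumes each call runs exactly steps_per_call decoding steps and that
    generation only stops at max_length (early stopping needs fewer calls).
    """
    return math.ceil(max_length / steps_per_call)


print(max_host_round_trips(150, 16))  # 10 calls
print(max_host_round_trips(150, 1))   # one call per generated token: 150
```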

We define an input to test the model.

In [None]:
input_test = 'In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language). The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language (e.g. assembly language, object code, or machine code) to create an executable program.'
input_test

Compilation time for the first run: about 2.5 minutes.

In [None]:
%%time
summarizer(input_test, max_length=150, num_beams=3)

## A fairy tale long story short...

The first call to the pipeline was slow because the model is compiled for the IPU on the first call. Subsequent calls reuse the compiled model, so they are much faster:

In [None]:
the_princess_and_the_pea = 'Once upon a time there was a prince who wanted to marry a princess; but she would have to be a real princess. He travelled all over the world to find one, but nowhere could he get what he wanted. There were princesses enough, but it was difficult to find out whether they were real ones. There was always something about them that was not as it should be. So he came home again and was sad, for he would have liked very much to have a real princess. One evening a terrible storm came on; there was thunder and lightning, and the rain poured down in torrents. Suddenly a knocking was heard at the city gate, and the old king went to open it. It was a princess standing out there in front of the gate. But, good gracious! what a sight the rain and the wind had made her look. The water ran down from her hair and clothes; it ran down into the toes of her shoes and out again at the heels. And yet she said that she was a real princess. Well, we\'ll soon find that out, thought the old queen. But she said nothing, went into the bed-room, took all the bedding off the bedstead, and laid a pea on the bottom; then she took twenty mattresses and laid them on the pea, and then twenty eider-down beds on top of the mattresses. On this the princess had to lie all night. In the morning she was asked how she had slept. "Oh, very badly!" said she. "I have scarcely closed my eyes all night. Heaven only knows what was in the bed, but I was lying on something hard, so that I am black and blue all over my body. It\'s horrible!" Now they knew that she was a real princess because she had felt the pea right through the twenty mattresses and the twenty eider-down beds. Nobody but a real princess could be as sensitive as that. So the prince took her for his wife, for now he knew that he had a real princess; and the pea was put in the museum, where it may still be seen, if no one has stolen it. There, that is a true story.'
the_princess_and_the_pea

In [None]:
%%time
summarizer(the_princess_and_the_pea, max_length=150, num_beams=3)
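`%%time` reports the wall time of a whole cell. To compare individual calls within one cell, such as the first (compiling) call and a later one, a small helper based on `time.perf_counter` can wrap any callable. The name `timed_call` is our own and is not part of Optimum. The demo uses a stand-in function, `fake_summarizer`, so the sketch runs without an IPU:

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for the pipeline so the sketch runs anywhere.
def fake_summarizer(text: str, max_length: int = 150, num_beams: int = 3):
    return [{"summary_text": text[:max_length]}]


output, seconds = timed_call(fake_summarizer, "Once upon a time...", max_length=10)
print(output, f"{seconds:.6f} s")
```

With the real pipeline, the equivalent call would be `timed_call(summarizer, the_princess_and_the_pea, max_length=150, num_beams=3)`.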

## Summarization of Wikipedia articles
Now let's use the Wikipedia API to search for some long text that can be summarized:

In [None]:
import wikipedia

# TRY IT YOURSELF BY CHANGING THE PAGE TITLE BELOW
page_title = "Queen (band)"
text = wikipedia.page(page_title).content
text

In [None]:
%%time
summarizer(
    text,  # NOTE: the input is truncated to max_input_length=1024 tokens
    max_length=150,
    num_beams=3,
)
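Because the input is truncated to 1024 tokens, most of a long Wikipedia article never reaches the model. One workaround, which is not part of the original notebook, is to split the text into chunks and summarize each chunk separately. The following is a minimal sketch with a hypothetical helper, `chunk_text`. It uses word count as a rough proxy for token count. BART's subword tokenizer usually produces more tokens than words, so the default limit is kept well below 1024:

```python
from typing import List


def chunk_text(text: str, max_words: int = 600) -> List[str]:
    """Split text into chunks of at most max_words words.

    Word count is only a rough proxy for the model's token count.
    Line breaks are used as preferred split points where possible.
    """
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in text.split("\n"):
        words = paragraph.split()
        # An overlong paragraph is cut into fixed-size word windows.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this paragraph would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("a b c\nd e\nf g h i", max_words=4))
```

The following usage assumes the pipeline returns the standard 🤗 Transformers summarization output (a list of dictionaries with a `"summary_text"` key): `[summarizer(chunk, max_length=150, num_beams=3)[0]["summary_text"] for chunk in chunk_text(text)]`.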

## Summarization of medical health records
Summarization can also be useful for condensing medical health records (MHR). Let's load an open-source dataset of medical transcription samples.

In [None]:
from datasets import load_dataset

dataset = load_dataset("rungalileo/medical_transcription_40")
dataset

We use the medical report stored in the `"text"` field and select a random sample from the training split.

In [None]:
import random

# RUN THIS CELL AGAIN TO SELECT ANOTHER REPORT
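# randrange excludes the upper bound, so the index is always valid
# (randint(0, len(...)) could return len(...) and raise an IndexError)
random_patient_id = random.randrange(len(dataset["train"]))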

exemplary_medical_report = dataset["train"][random_patient_id]["text"]
exemplary_medical_report

In [None]:
%%time
summarizer(exemplary_medical_report, max_length=150, num_beams=3)
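The pipeline is expected to return the usual 🤗 Transformers format: a list with one dictionary per input, each holding a `"summary_text"` key. We assume Optimum Graphcore keeps this format. The hypothetical helper `summary_stats` extracts the summary and reports how much shorter it is than the source, measured in words. The demo uses a hand-written stand-in output rather than a real model result:

```python
from typing import Dict, List


def summary_stats(source: str, pipeline_output: List[Dict[str, str]]) -> Dict[str, float]:
    """Return word counts and compression ratio for one summarized input."""
    summary = pipeline_output[0]["summary_text"]
    source_words = len(source.split())
    summary_words = len(summary.split())
    return {
        "source_words": source_words,
        "summary_words": summary_words,
        # Fraction of the original length kept in the summary.
        "ratio": summary_words / source_words if source_words else 0.0,
    }


# Stand-in output, not a real model result.
example_source = "The quick brown fox jumps over the lazy dog near the river bank."
example_output = [{"summary_text": "A fox jumps over a dog."}]
print(summary_stats(example_source, example_output))
```

With the real pipeline, the equivalent call would be `summary_stats(exemplary_medical_report, summarizer(exemplary_medical_report, max_length=150, num_beams=3))`.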

## Optional - Release IPUs in use

The IPython kernel holds a lock on the IPUs used to run the model, which prevents other users from using them. For example, if you want to use other notebooks after working through this one, you may need to run the cell below manually to release the IPUs. This happens by default if you use the "Run All" option. More information on the topic can be found at [Managing IPU Resources](https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/useful-tips/managing_ipu_resources.ipynb).

In [None]:
summarizer.model.detachFromDevice()

## Conclusions and next steps

This notebook demonstrated running a text summarization task on Graphcore IPUs, with BART-L using an inference pipeline from 🤗 Optimum.

Try out the other [IPU-powered Jupyter Notebooks](https://www.graphcore.ai/ipu-jupyter-notebooks) to see how IPUs perform on other tasks.