# Summarisation Notebook

This notebook compares a standard BART model with a Longformer Encoder Decoder (LED) model. The aim is to show the pros and cons of using a long-document transformer versus a standard pre-trained language model.

To run this notebook, make sure you have set up the appropriate virtual environment and installed the packages listed in the `requirements.txt` file located in `ALTA2021_tutorial/summarisation/`.

In case you are unfamiliar with the models, HuggingFace, or PyTorch Lightning, see the links below:

* **BART**: https://aclanthology.org/2020.acl-main.703/
* **Longformer Encoder Decoder**: https://arxiv.org/abs/2004.05150

* **HuggingFace**: https://huggingface.co/
* **PyTorch Lightning**: https://www.pytorchlightning.ai/

### Setup
#### (Only for Google Colab Execution)

If you are running the notebook in Google Colab, run the cell below to download the repository with the files required to run the models, including the requirements file.

In [None]:
!git clone https://github.com/ijauregiCMCRC/ALTA2021_tutorial.git
%cd ALTA2021_tutorial/summarisation

#### Install requirements
__Note__: You may have to restart the runtime environment in Google Colab after
installing the required packages.

In [None]:
!pip install -r requirements.txt

## Task 1
Covers:
- How to train and evaluate a `BART-base` and `LED-base` model.
- Understand how to use the `pl.LightningModule` and `pl.Trainer`.
- Understand the arguments required to define the model architecture and the model training.

### 1.1 Import Packages

In [None]:
import os
os.getcwd()
import random
import numpy as np
import textwrap  # for inference example
from rouge_score import rouge_scorer  # for inference example

import torch
import pytorch_lightning as pl
from pytorch_lightning.loggers import TestTubeLogger
from pytorch_lightning.callbacks import ModelCheckpoint
import nlp  # to load dataset

from src.summarisation_lightning_model import LmForSummarisation

### 1.2. Define Parameters

Here we define a dictionary of arguments which we pass to the Lightning script when loading the model. These will differ depending on the model you use, and the task you are performing.

For this tutorial, LED is an architecture that is built on top of BART's pre-trained model weights, so model-specific arguments will be very similar.

The remainder of the arguments are set by default in the Trainer function in the following section, and are specific to PyTorch Lightning. 

In [None]:
args = {
    'max_input_len': 512,  # Maximum number of tokens in the source documents, 512 for BART-base, 2048 for LED-base
    'max_output_len': 256,  # Maximum number of tokens in the summary
    'save_dir': '../models/summarisation_bart',  # Path to save the model and logs, 'models/summarisation_bart' for BART, 'models/summarisation_led' for LED
    'tokenizer': 'facebook/bart-base',  # Pretrained tokenizer
    'model_path': 'facebook/bart-base',  # Pretrained model (facebook/bart-base for BART, allenai/led-base-16384)
    'label_smoothing': 0.0, # Label smoothing (not required)
    'epochs': 1,  # Number of epochs during training
    'batch_size': 4,  # Batch size (1 for LED, 4 for BART)
    'grad_accum': 1,  # Gradient accumulation (4 for LED for effective batch size, 1 for BART to keep consistent)
    'lr': 0.00003,  # Training learning rate
    'warmup': 1000,  # Number of warmup steps
    'gpus': 1,  # Number of gpus. 0 for CPU
    'precision': 16,  # Double precision (64), full precision (32) 
                      # or half precision (16). Can be used on CPU, GPU or TPUs.
    'cache_dir': '../datasets/cache/', # Path to dataset cache where dataset is converted
    'attention_dropout': 0.1,  # default
    'adafactor': True,  # use Adafactor optimizer, else Adam
    'debug': False,  # debug run
    'num_workers': 0,  # number of data loader workers
    'grad_ckpt': True,  # gradient checkpointing to save memory
    'attention_mode': 'sliding_chunks',  # Longformer attention mode
    'attention_window': 512  # Longformer attention window
}
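
The comments in `args` list the values to change when switching from BART to LED. As a minimal sketch, those overrides can be collected in one place. The helper names `make_led_args` and `effective_batch_size` are hypothetical and not part of the tutorial code, and only a subset of the keys is shown:

```python
# Subset of the BART arguments defined above (other keys omitted for brevity).
bart_args = {
    'max_input_len': 512,
    'save_dir': '../models/summarisation_bart',
    'model_path': 'facebook/bart-base',
    'batch_size': 4,
    'grad_accum': 1,
}

# LED values taken from the comments in `args`.
LED_OVERRIDES = {
    'max_input_len': 2048,
    'save_dir': '../models/summarisation_led',
    'model_path': 'allenai/led-base-16384',
    'batch_size': 1,
    'grad_accum': 4,
}


def make_led_args(base_args):
    """Return a copy of `base_args` with the LED-specific values applied."""
    led = dict(base_args)  # copy, so the BART arguments are left untouched
    led.update(LED_OVERRIDES)
    return led


def effective_batch_size(a):
    """Number of examples contributing to each optimizer step (per GPU)."""
    return a['batch_size'] * a['grad_accum']


led_args = make_led_args(bart_args)
print(effective_batch_size(bart_args), effective_batch_size(led_args))
```

Both configurations give the same effective batch size, which is the point of raising `grad_accum` when `batch_size` is reduced for LED.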

### 1.3. Initialize Lightning Module

In this section, we load the model and dataset we choose to use. In this instance, we are loading a dataset that is stored on the HuggingFace Datasets repository. If you have your own dataset you wish to use for training and testing, please review the link <a href="https://huggingface.co/docs/datasets/" target="_blank">here</a>.

- We seed `random`, NumPy, PyTorch and CUDA with the same value.
- We initialize our custom LightningModule (`LmForSummarisation`).
- We initialize the logger to capture training information.
- We create a checkpointing callback to save the best model during training.
- We define the PyTorch Lightning trainer.

In [None]:
# Initialize with a seed
seed = 1234
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

# Define PyTorch Lightning model
model = LmForSummarisation(args)
# Include datasets
model.hf_datasets = nlp.load_dataset('multi_news', cache_dir=args['cache_dir'])

# Dataset size - needed to compute number of steps for the lr scheduler
# Sum of training and validation data
args['dataset_size'] = model.hf_datasets['train'].num_rows + model.hf_datasets['validation'].num_rows

# Define logger
logger = TestTubeLogger(
    save_dir=args['save_dir'],
    name='training',
    version=0  # always use version=0
)

# Define checkpoint saver
checkpoint_callback = ModelCheckpoint(
    dirpath=os.path.join(args['save_dir'], "training", "checkpoints"),  # Dir path
    filename='check-{epoch:02d}-{validation_loss:.2f}',
    save_top_k=1,
    verbose=True,
    monitor='validation_loss',
    mode='min',
    period=1
)

print(args)


# Define lightning trainer
trainer = pl.Trainer(gpus=args['gpus'], distributed_backend='dp' if torch.cuda.is_available() else None,
                     track_grad_norm=-1,
                     max_epochs=args['epochs'],
                     max_steps=None,
                     replace_sampler_ddp=False,
                     accumulate_grad_batches=args['grad_accum'],
                     gradient_clip_val=1.0,  # Max grad_norm
                     val_check_interval=1.0,  # Float = fraction of a training epoch; 1.0 validates once per epoch
                     num_sanity_val_steps=2,  # Validation steps for sanity check
                     check_val_every_n_epoch=1,  # Run validation every N epochs
                     logger=logger,
                     callbacks=checkpoint_callback,
                     progress_bar_refresh_rate=10,  # Progress bar for printing (updates every N)
                     precision=args['precision'],
                     amp_backend='native', amp_level='O2'
                     )
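
The comment on `dataset_size` notes that it is needed to compute the number of steps for the learning-rate scheduler. How `LmForSummarisation` does this internally is not shown in this notebook, so the following is only a sketch under assumptions: one optimizer step per `batch_size * grad_accum` examples per GPU, and a linear warmup followed by a linear decay. The function names are hypothetical.

```python
import math


def total_training_steps(dataset_size, batch_size, grad_accum, epochs, num_gpus=1):
    """Approximate number of optimizer steps over the whole run (assumed formula)."""
    examples_per_step = batch_size * grad_accum * max(1, num_gpus)
    return math.ceil(dataset_size / examples_per_step) * epochs


def linear_warmup_decay(step, warmup, total):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup:
        return step / max(1, warmup)  # ramp up from 0 to 1
    return max(0.0, (total - step) / max(1, total - warmup))  # decay from 1 to 0


# Toy numbers chosen for illustration only (not the Multi-News dataset size).
steps = total_training_steps(dataset_size=20000, batch_size=4, grad_accum=1, epochs=1)
halfway_through_warmup = linear_warmup_decay(500, warmup=1000, total=steps)
```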

### 1.4. Train Model

In this section and the following, we look to train the BART or LED model, and evaluate it over a test set. A few points to note about the comparison between the two models:

* Training time will differ significantly, because LED processes much longer inputs than BART. Longformer's sliding-window attention scales linearly rather than quadratically with input length, which lets LED fit in memory, but processing more tokens still takes longer to train.

* We showcase only a fraction of the capacity of BART and LED by restricting their input lengths to 512 and 2048 tokens respectively. This is purely to keep training and testing manageable. Note that LED can process up to 16x the maximum input length of BART (16384 vs. 1024 tokens).

* ROUGE scores here are modest, but should improve with longer training, better-chosen hyperparameters (e.g. longer inputs, larger batch size), and the -large variants of these models.

* Summarisation is a computationally heavy task, and often requires a GPU with a large amount of memory to get the full benefit of either model.
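
To make the memory argument in the first point concrete, the sketch below counts attention-score entries per head and per layer for full self-attention (quadratic in the input length) and for a sliding window (linear in the input length). It is a back-of-the-envelope estimate that assumes each token attends to `window` neighbours and ignores global attention tokens, sequence boundaries and implementation overheads. The function names are illustrative.

```python
def full_attention_entries(seq_len):
    """Every token attends to every token: seq_len * seq_len scores."""
    return seq_len * seq_len


def windowed_attention_entries(seq_len, window):
    """Every token attends to a fixed-size window: seq_len * window scores."""
    return seq_len * min(window, seq_len)


window = 512  # 'attention_window' in the arguments above
for seq_len in (512, 2048, 16384):
    full = full_attention_entries(seq_len)
    local = windowed_attention_entries(seq_len, window)
    print(f'{seq_len:>6} tokens: full={full:>11,}  windowed={local:>10,}  ratio={full / local:.0f}x')
```

Under these assumptions the windowed cost grows in proportion to the input length, while the full-attention cost grows with its square.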

Simply call `trainer.fit()` with your lightning model and the training will start.

In [None]:
# Train model
trainer.fit(model)

### 1.5. Test Model

In [None]:
# Test model
trainer.test(model)

In [None]:
BART_valscores = [0, 0, 0]
LED_valscores = [0, 0, 0]
print(f'BART Val Scores: {BART_valscores[0]} / {BART_valscores[1]} / {BART_valscores[2]}')
print(f'LED Val Scores: {LED_valscores[0]} / {LED_valscores[1]} / {LED_valscores[2]}')

print()

BART_testscores = [0, 0, 0]
LED_testscores = [0, 0, 0]
print(f'BART Test Scores: {BART_testscores[0]} / {BART_testscores[1]} / {BART_testscores[2]}')
print(f'LED Test Scores: {LED_testscores[0]} / {LED_testscores[1]} / {LED_testscores[2]}')

## Task 2: Comparing Models
- Compare different summarisation models that have been previously trained by us with the same code and same dataset.

### Our Models:
- __BART_base__:
    - Tokenizer and model: `facebook/bart-base` ([Huggingface link](https://huggingface.co/facebook/bart-base))
- __Longformer_Encoder_Decoder_base__:
    - Tokenizer and model: `facebook/bart-base` + `allenai/led-base-16384` ([Huggingface link](https://huggingface.co/allenai/led-base-16384))
    
### 2.1. Inference

In this final section, we will load the trained model from its saved directory and use it purely for inference. We can then compare empirically the benefit of using a long-document Transformer such as LED for document summarisation.

#### Document Example

In [None]:
# Example from Multi-News
document = 'GOP Eyes Gains As Voters In 11 States Pick Governors      Enlarge this image toggle caption Jim Cole/AP Jim Cole/AP      Voters in 11 states will pick their governors tonight, and Republicans appear on track to increase their numbers by at least one, with the potential to extend their hold to more than two-thirds of the nation\'s top state offices.      Eight of the gubernatorial seats up for grabs are now held by Democrats; three are in Republican hands. Republicans currently hold 29 governorships, Democrats have 20, and Rhode Island\'s Gov. Lincoln Chafee is an Independent.      Polls and race analysts suggest that only three of tonight\'s contests are considered competitive, all in states where incumbent Democratic governors aren\'t running again: Montana, New Hampshire and Washington.      While those state races remain too close to call, Republicans are expected to wrest the North Carolina governorship from Democratic control, and to easily win GOP-held seats in Utah, North Dakota and Indiana.      Democrats are likely to hold on to their seats in West Virginia and Missouri, and are expected to notch safe wins in races for seats they hold in Vermont and Delaware.      Holding Sway On Health Care      While the occupant of the governor\'s office is historically far less important than the party that controls the state legislature, top state officials in coming years are expected to wield significant influence in at least one major area.      And that\'s health care, says political scientist Thad Kousser, co-author of The Power of American Governors.      "No matter who wins the presidency, national politics is going to be stalemated on the Affordable Care Act," says Kousser, of the University of California, San Diego.      A recent U.S. 
Supreme Court decision giving states the ability to opt out of the law\'s expansion of Medicaid, the federal insurance program for poor, disabled and elderly Americans, confers "incredible power" on the states and their governors, Kousser says.      Just look at what happened when the Obama administration in 2010 offered federal stimulus money to states to begin building a high-speed rail network. Three Republican governors, including Rick Scott of Florida and Scott Walker of Wisconsin, rejected a share of the money citing debt and deficit concerns.      "A [Mitt] Romney victory would dramatically empower Republican governors," Kousser says.      State-By-State View      North Carolina: One-term incumbent Democratic Gov. Beverly Perdue, the first woman to hold the state\'s top office, announced in January that she would not seek re-election after polls showed her with high disapproval ratings and trailing Republican candidate Pat McCrory.      The seat is expected to be won by McCrory, a former Charlotte mayor, who is facing Perdue\'s lieutenant governor, Walter Dalton. McCrory lost a close race to Perdue in 2008, when then-presidential candidate Barack Obama became the first Democrat to win North Carolina in more than three decades. The Real Clear Politics average for the race has McCrory maintaining a 14.3 percentage point lead.      Montana: Popular Democratic Gov. Brian Schweitzer — he won his last election with 65 percent of the vote — has reached his two-term limit. The state\'s Democratic Attorney General Steve Bullock is trying to keep the seat in his party\'s column by associating himself with Schweitzer\'s legacy. He\'s in a tough race with former two-term GOP Rep. Rick Hill.      New Hampshire: Former Democratic state Sen. Maggie Hassan has also promised a continuation of the policies of her predecessor, retiring Democratic Gov. John Lynch. 
Her opponent is lawyer Ovide Lamontagne, a Tea Party conservative who ran unsuccessfully for governor in 1996 and for the U.S. Senate in 2010. The national parties have invested in the campaigns, which have focused on fiscal and women\'s health care issues.      Washington: The state\'s governorship has been in Democratic hands for 32 years, and former U.S. Rep. Jay Inslee is in a dead-heat battle to keep it that way. His opponent is the state\'s Republican Attorney General Rob McKenna. McKenna has a proven ability to win statewide, but working in Inslee\'s favor are Obama\'s poll numbers. The Real Clear Politics average shows Obama with an average 13.6 percentage point lead over Romney; Inslee is leading McKenna by an average of 1 percentage point.      Pretty Much Sure Things      Republican Govs. Jack Dalrymple in North Dakota and Gary Herbert in Utah, and GOP Rep. Mike Pence in Indiana are expected to win. So are Democratic Govs. Peter Shumlin in Vermont and Jack Markell in Delaware.      Democrats are also hoping to hold on to the governorship in Missouri, where Jay Nixon is running for a second term against Republican Dave Spence; and in West Virginia, where Gov. Earl Ray Tomblin, former state senate president, is running for his first full term after winning a special election in 2011. GOP businessman Bill Maloney is his opponent, as he was last year.      Nixon has been consistently outpolling Spence by an average of about 7 points in Missouri. Tomblin is seen as likely to retain his seat, even in a state where Romney is leading Obama by double digits. ||||| GOP Eyes Gains As Voters In 11 States Pick Governors      Jim Cole / AP i Jim Cole / AP      Voters in 11 states will pick their governors tonight, and Republicans appear on track to increase their numbers by at least one, and with the potential to extend their hold to more than two-thirds of the nation\'s top state offices.      
Eight of the gubernatorial seats up for grabs today are now held by Democrats; three are in Republican hands. Republicans currently hold 29 governorships, Democrats have 20; and Rhode Island\'s Gov. Lincoln Chafee is an Independent.      Polls and race analysts suggest that only three of tonight\'s contests are considered competitive, all in states where incumbent Democratic governors aren\'t running again: Montana, New Hampshire and Washington.      While those state races remain too close to call, Republicans are expected to wrest the North Carolina governorship from Democratic control, and to easily win GOP-held seats in Utah, North Dakota and Indiana.      Democrats are likely hold on to their seats in West Virginia and Missouri; and expected to notch safe wins in races for seats they hold in Vermont and Delaware.      Holding Sway On Health Care      While the occupant of the governor\'s office is historically far less important than the party that controls the state legislature, top state officials in coming years are expected to wield significant influence in at least one major area.      And that\'s health care, says political scientist Thad Kousser, co-author of The Power of American Governors.      "No matter who wins the presidency, national politics is going to be stalemated on the Affordable Care Act," says Kousser, of the University of California-Berkeley.      A recent U.S. Supreme Court decision giving states the ability to opt out of the law\'s expansion of Medicaid, the federal insurance program for poor, disabled and elderly Americans, confers "incredible power" on the states and their governors, Kousser says.      Just look at what happened when the Obama administration in 2010 offered federal stimulus money to states to begin building a high-speed rail network. Three Republican governors, including Rick Scott of Florida and Scott Walker of Wisconsin, rejected a share of the money citing debt and deficit concerns.      
"A [Mitt] Romney victory would dramatically empower Republican governors," Kousser says.      State-by-State View      North Carolina: One-term incumbent Democratic Gov. Beverly Perdue, the first woman to hold the state\'s top office, announced in January she would not seek re-election after polls showed her with high disapproval ratings and trailing Republican candidate Pat McCrory.      The seat is expected to be won by McCrory, a former Charlotte mayor, who is facing Perdue\'s lieutenant governor, Walter Dalton. McCrory lost a close race to Perdue in 2008, when then-presidential candidate Barack Obama became the first Democrat to win North Carolina in more than three decades. The Real Clear Politics average for the race has McCrory maintaining a 14.3 percentage point lead.      Montana: Popular Democratic Gov. Brian Schweitzer — he won his last election with 65 percent of the vote — has reached his two-term limit. The state\'s Democratic Attorney General Steve Bullock is trying to keep the seat in his party\'s column by associating himself with Schweitzer\'s legacy. He\'s in a tough race with former two-term GOP Rep. Rick Hill.      New Hampshire: Former Democratic state Sen. Maggie Hassan has also promised a continuation of the policies of her predecessor, retiring Democratic Gov. John Lynch. Her opponent is lawyer Ovide Lamontagne, a Tea Party conservative who ran unsuccessfully for governor in 1996 and for the U.S. Senate in 2010. The national parties have invested in the campaigns, which have focused on fiscal and women\'s health care issues.      Washington: The state\'s governorship has been in Democratic hands for 32 years, and former Rep. Jay Inslee is in a dead-heat battle to keep it that way. His opponent is the state\'s Republican Attorney General Rob McKenna. McKenna has a proven ability to win statewide, but working in Inslee\'s favor are Obama\'s poll numbers. 
The Real Clear Politics average shows Obama with an average 13.6 point lead over Romney; Inslee\'s leading McKenna by an average of 1 percentage point.      Pretty Much Sure Things      Republican governors Jack Dalrymple in North Dakota and Gary Herbert in Utah, and GOP Rep. Mike Pence in Indiana are expected to win. So are Democratic governors Peter Shumlin in Vermont and Jack Markell in Delaware.      Democrats are also hoping to hold on to the governorship in Missouri, where Jay Nixon is running for a second term against Republican Dave Spence; and in West Virginia, where Gov. Earl Ray Tomblin, former state senate president, is running for his first full term after willing a special election in 2011. GOP businessman Bill Maloney is his opponent, as he was last year.      Nixon has been consistently out-polling Spence by an average of about 7 points in Missouri. Tomblin is seen as likely to retain his seat, even in a state where Romney is leading Obama by double digits. |||||'
ground_truth = '– It\'s a race for the governor\'s mansion in 11 states today, and the GOP could end the night at the helm of more than two-thirds of the 50 states. The GOP currently controls 29 of the country\'s top state offices; it\'s expected to keep the three Republican ones that are up for grabs (Utah, North Dakota, and Indiana), and wrest North Carolina from the Dems. That brings its toll to 30, with the potential to take three more, reports NPR. Races in Montana, New Hampshire, and Washington are still too close to call, and in all three, Democrat incumbents aren\'t seeking reelection. The results could have a big impact on health care, since a Supreme Court ruling grants states the ability to opt out of ObamaCare\'s Medicaid expansion. "A Romney victory would dramatically empower Republican governors," said one analyst. Click for NPR\'s state-by-state breakdown of what could happen.'
print(textwrap.fill(ground_truth, 100))

In [None]:
# Load ROUGE Scorer
scorer = rouge_scorer.RougeScorer(rouge_types=['rouge1', 'rouge2', 'rougeL', 'rougeLsum'], use_stemmer=False)
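
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a reference and a candidate summary, and the F-measure combines precision and recall. The following is a simplified sketch using lowercasing and whitespace tokenisation. The `rouge_score` package applies its own tokenisation, so its numbers may differ, and the function name `rouge1_f` is hypothetical.

```python
from collections import Counter


def rouge1_f(reference, candidate):
    """Simplified ROUGE-1 F-measure on whitespace-separated, lowercased tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy example: both candidate tokens appear in the six-token reference,
# so precision is 1 and recall is 2/6.
toy_score = rouge1_f('the cat sat on the mat', 'the cat')
print(round(toy_score * 100, 1))
```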

#### Load BART Model

In [None]:
!ls ../models/summarisation_bart/training/checkpoints/

In [None]:
# Define PyTorch Lightning model
bart_model = LmForSummarisation.load_from_checkpoint('../models/summarisation_bart/training/checkpoints/check-epoch=00-validation_loss=.ckpt')
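
The checkpoint file name embeds the epoch and the validation loss, so it varies between runs. As a small sketch, a hypothetical helper `latest_checkpoint` (not part of the tutorial code) can pick the most recently modified matching file from a directory instead of relying on a typed file name. It is demonstrated here on a temporary directory containing a dummy file:

```python
import glob
import os
import tempfile


def latest_checkpoint(ckpt_dir, pattern='check-*.ckpt'):
    """Return the most recently modified checkpoint file in `ckpt_dir`."""
    paths = glob.glob(os.path.join(glob.escape(ckpt_dir), pattern))
    if not paths:
        raise FileNotFoundError(f'No checkpoint matching {pattern!r} in {ckpt_dir}')
    return max(paths, key=os.path.getmtime)


# Demonstration with a dummy, empty file.
# The loss value in the file name is made up for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    dummy_name = 'check-epoch=00-validation_loss=2.50.ckpt'
    open(os.path.join(tmp_dir, dummy_name), 'w').close()
    found_name = os.path.basename(latest_checkpoint(tmp_dir))

print(found_name)
```

In the notebook, the returned path could then be passed to `LmForSummarisation.load_from_checkpoint` in place of a hard-coded file name.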

In [None]:
bart_summary = bart_model.summarise_example(document)
print(textwrap.fill(bart_summary[0], 100))

In [None]:
# ROUGE score
score = scorer.score(ground_truth, bart_summary[0])

print('ROUGE-1: ', score['rouge1'].fmeasure*100)
print('ROUGE-2: ', score['rouge2'].fmeasure*100)
print('ROUGE-L: ', score['rougeL'].fmeasure*100)

#### Load LED Model

In [None]:
!ls ../models/summarisation_led/training/checkpoints/

In [None]:
# Define PyTorch Lightning model
led_model = LmForSummarisation.load_from_checkpoint('../models/summarisation_led/training/checkpoints/check-epoch=00-validation_loss=.ckpt')

In [None]:
led_summary = led_model.summarise_example(document)
print(textwrap.fill(led_summary[0], 100))

In [None]:
# ROUGE score
score = scorer.score(ground_truth, led_summary[0])

print('ROUGE-1: ', score['rouge1'].fmeasure*100)
print('ROUGE-2: ', score['rouge2'].fmeasure*100)
print('ROUGE-L: ', score['rougeL'].fmeasure*100)

#### Well done!

You have reached the end of the summarisation notebook. Now feel free to change and play with it as much as you like
(hyperparameters, datasets, models...). Have fun training your own summarisation models!