Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

## Extractive Summarization on CNN/DM Dataset using Transformer Version of BertSum


### Summary

This notebook demonstrates how to fine-tune Transformers for extractive text summarization. Utility functions and classes in the NLP Best Practices repo are used to facilitate data preprocessing, model training, model scoring, result postprocessing, and model evaluation.

BertSum refers to [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) and its [published example](https://github.com/nlpyang/BertSum/). The Transformer version of BertSum refers to our modification of BertSum; its source code is available at https://github.com/daden-ms/BertSum/.

Extractive summarization is usually used in document summarization, where each input document consists of multiple sentences. Preprocessing the training data involves assigning a label of 0 or 1 to each document sentence based on the given summary. The summarization problem is thereby simplified to classifying whether a document sentence should be included in the summary.
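To make the labeling idea concrete, here is a minimal sketch of a greedy labeler. It is an illustration under stated assumptions, not the code in the repo: `unigram_f1` is a rough stand-in for ROUGE, the whitespace tokenization is a simplification, and the function names and the `max_selected` parameter are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """F1 over clipped unigram overlap (a rough stand-in for ROUGE-1)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_labels(doc_sentences, summary, max_selected=3):
    """Return one 0/1 label per sentence.

    Sentences are added one at a time; each round picks the sentence that
    most improves the score of the selected set against the summary, and
    the loop stops as soon as no sentence improves it.
    """
    reference = summary.lower().split()
    tokenized = [s.lower().split() for s in doc_sentences]
    selected, best_score = [], 0.0
    for _ in range(max_selected):
        best_idx = None
        for i in range(len(tokenized)):
            if i in selected:
                continue
            candidate = [t for j in selected + [i] for t in tokenized[j]]
            score = unigram_f1(candidate, reference)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(doc_sentences))]


doc = [
    "the cat sat on the mat .",
    "stocks fell sharply on monday .",
    "the dog chased the cat .",
]
labels = greedy_labels(doc, "the cat sat on the mat .")
print(labels)
```

Only the first sentence is labeled 1 here, because adding either of the other sentences would lower the overlap score.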

The figure below illustrates how BERTSum can be fine-tuned for the extractive summarization task. A [CLS] token is inserted at the beginning of each sentence and a [SEP] token at the end. Interval segment embeddings and positional embeddings are added to the token embeddings to form the input of the BERT model. The [CLS] token representation is used as the sentence embedding, and only the [CLS] tokens are used as input to the summarization model. The summarization layer predicts the probability of each sentence being included in the summary. Techniques like trigram blocking can be used to improve model accuracy.

<img src="https://nlpbp.blob.core.windows.net/images/BertSum.PNG">
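Trigram blocking can be sketched in a few lines. The version below is an assumption-laden illustration, not the repo's implementation: it ranks sentences by a given score, skips any sentence that shares a trigram with the sentences already chosen, and returns the result in document order. The function names, the whitespace tokenization, and the choice of document order are all hypothetical.

```python
def trigrams(sentence):
    """Set of word trigrams in a sentence (whitespace tokenization)."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_with_trigram_blocking(sentences, scores, max_sentences=3):
    """Pick the highest-scoring sentences, skipping trigram overlaps."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, seen = [], set()
    for idx in ranked:
        grams = trigrams(sentences[idx])
        if grams & seen:  # shares a trigram with the summary so far
            continue
        chosen.append(idx)
        seen |= grams
        if len(chosen) == max_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "a turkish court blocked access to twitter",
    "a turkish court blocked youtube as well",
    "the prosecutor died in hospital",
    "officials declined to comment",
]
scores = [0.9, 0.8, 0.7, 0.1]
summary = select_with_trigram_blocking(sentences, scores, max_sentences=2)
print(summary)
```

The second sentence has the second-highest score but is skipped because it repeats the trigram "a turkish court", so the third sentence is selected instead.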


### Before You Start

The running times shown in this notebook were measured on a Standard_NC24s_v3 Azure Ubuntu Virtual Machine with 4 NVIDIA Tesla V100 GPUs. 
> **Tip**: If you want to run through the notebook quickly, you can set the **`QUICK_RUN`** flag in the cell below to **`True`** to run the notebook on a small subset of the data and a smaller number of epochs. 

Using only 1 NVIDIA Tesla V100 GPU with 16GB of GPU memory:
- Data preprocessing takes around 1 minute for a quick run and ~2 hours otherwise. This estimate assumes that the chosen transformer model is "distilbert-base-uncased" and the sentence selection method is "greedy", which is the default. Preprocessing can take significantly longer if the sentence selection method is "combination", which can achieve better model performance.

- Model fine-tuning takes around 10 minutes for a quick run and ~3 hours otherwise. This estimate assumes the chosen encoder method is "transformer". Fine-tuning can be faster with another encoder method, which may result in worse model performance.


In [1]:
## Set QUICK_RUN = True to run the notebook on a small subset of data and a smaller number of epochs.
QUICK_RUN = True
## Set USE_PREPROCSSED_DATA = True to skip the data preprocessing
USE_PREPROCSSED_DATA = True

### Configuration


In [2]:
%load_ext autoreload

In [3]:
%autoreload 2

In [4]:
import os
import shutil
import sys
from tempfile import TemporaryDirectory
import torch

nlp_path = os.path.abspath("../../")
if nlp_path not in sys.path:
    sys.path.insert(0, nlp_path)

from utils_nlp.dataset.cnndm import CNNDMBertSumProcessedData, CNNDMSummarizationDataset
from utils_nlp.eval.evaluate_summarization import get_rouge
from utils_nlp.models.transformers.extractive_summarization import (
    ExtractiveSummarizer,
    ExtSumProcessedData,
    ExtSumProcessor,
)

import pandas as pd
import scrapbook as sb

[nltk_data] Downloading package punkt to /home/daden/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
I0116 02:46:45.948552 140512908457792 file_utils.py:35] PyTorch version 1.3.0 available.



### Configuration: choose the transformer model to be used

Several pretrained models have been made available by [Hugging Face](https://github.com/huggingface/transformers). For extractive summarization, the following pretrained models are supported. 

In [5]:
pd.DataFrame({"model_name": ExtractiveSummarizer.list_supported_models()})

| | model_name |
|---|---|
| 0 | bert-base-uncased |
| 1 | distilbert-base-uncased |


In [6]:
# Transformer model being used
MODEL_NAME = "distilbert-base-uncased"

We also need to install the dependencies for pyrouge.

#### Dependencies for ROUGE-1.5.5.pl
Run the following commands in your terminal to install the XML parsing C library.

1. `sudo apt-get update`
2. `sudo apt-get install expat`
3. `sudo apt-get install libexpat-dev -y`

Run the following commands in your terminal to install the other prerequisites for using pyrouge.
1. `sudo cpan install XML::Parser`
2. `sudo cpan install XML::Parser::PerlSAX`
3. `sudo cpan install XML::DOM`

Download ROUGE-1.5.5 from https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5.
Then run the following command in your terminal.
* `pyrouge_set_rouge_path $ABSOLUTE_DIRECTORY_TO_ROUGE-1.5.5.pl`

### Data Preprocessing

The dataset used in this notebook is the CNN/DM dataset, which contains the documents and accompanying questions from the news articles of CNN and Daily Mail. The highlights in each article are used as the summary. The dataset consists of ~289K training examples, ~11K validation examples and ~11K test examples. You can choose [Option 1] below to preprocess the data, or [Option 2] to use the preprocessed version from the [BERTSum published example](https://github.com/nlpyang/BertSum/). You don't need to manually download either of these two datasets, as the code below handles the downloading. The functions defined in [cnndm.py](../../utils_nlp/dataset/cnndm.py) are specific to the CNN/DM dataset preprocessed by harvardnlp. However, they provide a skeleton for how to preprocess text into the format that the model preprocessor takes: sentence tokenization and word tokenization.

##### Details of Data Preprocessing

The purpose of preprocessing is to convert the input articles into the format required for model fine-tuning. Assuming you have (1) all articles and (2) target summaries, each in a file with one entry per line, the steps to preprocess the data are:
1. sentence tokenization
2. word tokenization
3. **label** the sentences in the article, with 1 meaning the sentence is selected and 0 meaning it is not. The sentence selection algorithms are "greedy" and "combination" and can be found in [sentence_selection.py](../../utils_nlp/dataset/sentence_selection.py)
4. convert each example to the desired format for extractive summarization
    - filter the sentences in the example based on the min_src_ntokens argument. If the number of remaining sentences is less than min_nsents, the example is discarded.
    - truncate each sentence in the example if its length is greater than max_src_ntokens
    - truncate the sentences in the example and the labels if the total number of sentences is greater than max_nsents
    - [CLS] and [SEP] are inserted before and after each sentence
    - WordPiece tokenization or Byte Pair Encoding (BPE) subword tokenization
    - truncate the example to 512 tokens
    - convert the tokens into token indices corresponding to the transformer tokenizer's vocabulary
    - segment ids are generated and added
    - [CLS] token positions are logged
    - labels of [CLS] tokens are dropped if the token position exceeds 512, which is the maximum input length the transformer model accepts
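The conversion steps listed above can be sketched as follows. This is a simplified illustration rather than the behavior of `ExtSumProcessor`: it uses whitespace splitting instead of WordPiece or BPE, and the function name `build_bertsum_input` and the toy vocabulary are hypothetical. The output keys mirror the `src`, `segs`, `clss` and `labels` fields of the processed examples.

```python
def build_bertsum_input(sentences, labels, vocab, max_len=512):
    """Turn sentences into token ids, interval segment ids and [CLS] positions."""
    tokens = []
    for sent in sentences:
        # Wrap every sentence in [CLS] ... [SEP].
        tokens += ["[CLS]"] + sent.lower().split() + ["[SEP]"]
    tokens = tokens[:max_len]  # truncate to the model's maximum input length

    # Map tokens to vocabulary indices; unknown tokens fall back to [UNK].
    src = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]

    segs, clss = [], []
    for pos, tok in enumerate(tokens):
        if tok == "[CLS]":
            clss.append(pos)  # record where each sentence starts
        # Interval segments: 0 for the 1st sentence, 1 for the 2nd, 0 for the 3rd, ...
        segs.append((len(clss) - 1) % 2)

    # Keep only the labels of sentences whose [CLS] survived truncation.
    return {"src": src, "segs": segs, "clss": clss, "labels": labels[:len(clss)]}


# Toy vocabulary, for illustration only.
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4, "good": 5, "morning": 6}
example = build_bertsum_input(["hello world", "good morning all"], [1, 0], vocab)
print(example)

truncated = build_bertsum_input(["hello world", "good morning all"], [1, 0], vocab, max_len=4)
print(truncated["clss"], truncated["labels"])
```

With `max_len=4`, the second sentence is cut off entirely, so only one [CLS] position and one label remain.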
    
    
Note that the original BERTSum paper uses Stanford CoreNLP for data preprocessing, whereas here we use NLTK.

##### [Option 1] Preprocess data (Please skip this part if you choose to use preprocessed data)
The code in the following cell downloads the CNN/DM dataset listed at https://github.com/harvardnlp/sent-summary/.

In [11]:
# the data path used to save the downloaded data file
DATA_PATH = TemporaryDirectory().name
# The number of lines at the head of data file used for preprocessing. -1 means all the lines.
TOP_N = 1000
CHUNK_SIZE=200
if not QUICK_RUN:
    TOP_N = -1
    CHUNK_SIZE = 2000

In [19]:
train_dataset, test_dataset = CNNDMSummarizationDataset(top_n=TOP_N, local_cache_path=DATA_PATH)

I0103 05:29:37.485339 140060135520064 utils.py:173] Opening tar file /tmp/tmpjd6tv6g9/cnndm.tar.gz.
I0103 05:29:37.487093 140060135520064 utils.py:181] /tmp/tmpjd6tv6g9/test.txt.src already extracted.
I0103 05:29:37.777695 140060135520064 utils.py:181] /tmp/tmpjd6tv6g9/test.txt.tgt.tagged already extracted.
I0103 05:29:37.804513 140060135520064 utils.py:181] /tmp/tmpjd6tv6g9/train.txt.src already extracted.
I0103 05:29:45.345131 140060135520064 utils.py:181] /tmp/tmpjd6tv6g9/train.txt.tgt.tagged already extracted.
I0103 05:29:45.963999 140060135520064 utils.py:181] /tmp/tmpjd6tv6g9/val.txt.src already extracted.
I0103 05:29:46.300785 140060135520064 utils.py:181] /tmp/tmpjd6tv6g9/val.txt.tgt.tagged already extracted.


Preprocess the data and save the data to disk.

In [20]:
processor = ExtSumProcessor(model_name=MODEL_NAME)
ext_sum_train = processor.preprocess(train_dataset, train_dataset.get_target(), oracle_mode="greedy")
ext_sum_test = processor.preprocess(test_dataset, test_dataset.get_target(), oracle_mode="greedy")

I0103 05:29:49.643863 140060135520064 tokenization_utils.py:379] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at ./26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084


In [21]:
save_path = os.path.join(DATA_PATH, "processed")
train_files = ExtSumProcessedData.save_data(
    ext_sum_train, is_test=False, save_path=save_path, chunk_size=CHUNK_SIZE
)
test_files = ExtSumProcessedData.save_data(
    ext_sum_test, is_test=True, save_path=save_path, chunk_size=CHUNK_SIZE
)

In [22]:
train_files

['/tmp/tmpjd6tv6g9/processed/0_train',
 '/tmp/tmpjd6tv6g9/processed/1_train',
 '/tmp/tmpjd6tv6g9/processed/2_train',
 '/tmp/tmpjd6tv6g9/processed/3_train',
 '/tmp/tmpjd6tv6g9/processed/4_train']

In [23]:
test_files

['/tmp/tmpjd6tv6g9/processed/0_test',
 '/tmp/tmpjd6tv6g9/processed/1_test',
 '/tmp/tmpjd6tv6g9/processed/2_test',
 '/tmp/tmpjd6tv6g9/processed/3_test',
 '/tmp/tmpjd6tv6g9/processed/4_test']

In [24]:
train_dataset, test_dataset = ExtSumProcessedData().splits(root=save_path)

#### Inspect Data

In [25]:
import torch
bert_format_data = torch.load(train_files[0])
print(len(bert_format_data))
bert_format_data[0].keys()

200


dict_keys(['src', 'labels', 'segs', 'clss', 'src_txt', 'tgt_txt'])

In [26]:
bert_format_data[0]['labels']

[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
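To see which sentences the labels point to, the labels can be paired with the sentence texts. The helper below is a small sketch, and `oracle_sentences` is a hypothetical name. It assumes each example is a dict whose `src_txt` is a list of sentences and whose `labels` is a 0/1 list aligned with it, and it uses a toy example instead of the real data.

```python
def oracle_sentences(example):
    """Return the source sentences whose label is 1."""
    # zip stops at the shorter list, so truncated labels are handled safely.
    return [
        sentence
        for sentence, label in zip(example["src_txt"], example["labels"])
        if label == 1
    ]


# Toy example with the same field names as the processed data.
toy_example = {
    "src_txt": ["first sentence .", "second sentence .", "third sentence ."],
    "labels": [0, 0, 1],
}
selected = oracle_sentences(toy_example)
print(selected)
```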

##### [Option 2] Reuse Preprocessed  data from [BERTSUM Repo](https://github.com/nlpyang/BertSum)

In [8]:
# The path used to download the preprocessed data from the BERTSUM repo.
# If you have already downloaded the dataset, change the code to point to that path.
PROCESSED_DATA_PATH = TemporaryDirectory().name
data_path = "./temp_data5/"
PROCESSED_DATA_PATH = data_path

In [9]:
if USE_PREPROCSSED_DATA:
    CNNDMBertSumProcessedData.download(local_path=PROCESSED_DATA_PATH)
    train_dataset, test_dataset = ExtSumProcessedData().splits(root=PROCESSED_DATA_PATH)
    

### Model training
To start model training, we need to create an instance of ExtractiveSummarizer.
#### Choose the transformer model.
Currently ExtractiveSummarizer supports two models:
- distilbert-base-uncased
- bert-base-uncased

Potentially, RoBERTa-based models and XLNet can be supported, but this needs to be tested.
#### Choose the encoder algorithm.
There are four options:
- baseline: uses a smaller transformer model in place of the BERT model, with a transformer summarization layer
- classifier: uses pretrained BERT and fine-tunes it with a **simple logistic classification** summarization layer
- transformer: uses pretrained BERT and fine-tunes it with a **transformer** summarization layer
- rnn: uses pretrained BERT and fine-tunes it with an **LSTM** summarization layer

In [10]:
# notebook parameters
# the cache data path used during fine-tuning
CACHE_DIR = TemporaryDirectory().name

# batch size, unit is the number of tokens
BATCH_SIZE = 3000

# GPU used for training
NUM_GPUS = 2

# Encoder name. Options are: baseline, classifier, transformer, rnn.
ENCODER = "transformer"

# Learning rate
LEARNING_RATE=2e-3

# How often the statistics reports show up in training, unit is step.
REPORT_EVERY=100

# total number of steps for training
MAX_STEPS=1e3
# number of steps for warm up
WARMUP_STEPS=5e2
    
if not QUICK_RUN:
    MAX_STEPS=5e4
    WARMUP_STEPS=5e3
 

In [11]:
summarizer = ExtractiveSummarizer(MODEL_NAME, ENCODER, CACHE_DIR)

I0116 02:49:16.893831 140512908457792 file_utils.py:362] https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-config.json not found in cache or force_download set to True, downloading to /tmp/tmpellptok2
I0116 02:49:17.050156 140512908457792 file_utils.py:377] copying /tmp/tmpellptok2 to cache at /tmp/tmpuzp1saxl/a41e817d5c0743e29e86ff85edc8c257e61bc8d88e4271bb1b243b6e7614c633.1ccd1a11c9ff276830e114ea477ea2407100f4a3be7bdc45d37be9e37fa71c7e
I0116 02:49:17.051342 140512908457792 file_utils.py:381] creating metadata file for /tmp/tmpuzp1saxl/a41e817d5c0743e29e86ff85edc8c257e61bc8d88e4271bb1b243b6e7614c633.1ccd1a11c9ff276830e114ea477ea2407100f4a3be7bdc45d37be9e37fa71c7e
I0116 02:49:17.052677 140512908457792 file_utils.py:390] removing temp file /tmp/tmpellptok2
I0116 02:49:17.053515 140512908457792 configuration_utils.py:185] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-config.json from cache at /tmp/tmpuzp1s

In [32]:
summarizer.fit(
            train_dataset,
            num_gpus=NUM_GPUS,
            batch_size=BATCH_SIZE,
            gradient_accumulation_steps=2,
            max_steps=MAX_STEPS,
            learning_rate=LEARNING_RATE,
            warmup_steps=WARMUP_STEPS,
            verbose=True,
            report_every=REPORT_EVERY,
            clip_grad_norm=False,
        )



loss: 10.797444, time: 48.076998, number of examples in current step: 5, step 100 out of total 1000
loss: 10.033221, time: 36.946141, number of examples in current step: 5, step 200 out of total 1000
loss: 9.813506, time: 37.003569, number of examples in current step: 5, step 300 out of total 1000
loss: 9.743949, time: 36.684293, number of examples in current step: 5, step 400 out of total 1000
loss: 9.624907, time: 36.727618, number of examples in current step: 5, step 500 out of total 1000
loss: 9.359334, time: 36.721974, number of examples in current step: 5, step 600 out of total 1000
loss: 8.998051, time: 36.738466, number of examples in current step: 6, step 700 out of total 1000
loss: 8.392073, time: 36.622983, number of examples in current step: 5, step 800 out of total 1000
loss: 7.814545, time: 36.219987, number of examples in current step: 5, step 900 out of total 1000
loss: 6.793788, time: 36.647171, number of examples in current step: 5, step 1000 out of total 1000
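The log reports five or six examples per step because `BATCH_SIZE` is measured in tokens rather than examples. The sketch below shows one way token-budget batching can work. It is an assumption for illustration, not the repo's batching code: the cost of a batch is taken to be the number of examples times the longest example in it (the padded size), and `batch_by_token_budget` is a hypothetical name.

```python
def batch_by_token_budget(lengths, max_tokens=3000):
    """Group example indices so that each padded batch fits the token budget."""
    batches, current, longest = [], [], 0
    for idx, length in enumerate(lengths):
        new_longest = max(longest, length)
        # Padded cost if this example joined the current batch.
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, new_longest = [], length
        current.append(idx)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


# Twelve examples, each truncated to 512 tokens.
batches = batch_by_token_budget([512] * 12, max_tokens=3000)
print([len(b) for b in batches])
```

Under this assumption, five examples of 512 tokens cost 2560 tokens and fit the budget of 3000, while a sixth would bring the cost to 3072, which is consistent with the five examples per step seen in most log lines.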


In [33]:
summarizer.save_model("extsum_modelname_{0}_usepreprocess{1}_steps_{2}.pt".format(MODEL_NAME, USE_PREPROCSSED_DATA, MAX_STEPS))

I0103 05:38:19.590131 140060135520064 extractive_summarization.py:729] Saving model checkpoint to /tmp/tmp_b2wqaou/fine_tuned/extsum_modelname_distilbert-base-uncased_usepreprocessFalse_steps_1000.0.pt


In [12]:
# Load a previously saved model
import torch
#summarizer.model = torch.load("/tmp/tmp2wg41gb5/fine_tuned/dis_sum_model.pt") #"cnndm_transformersum_distilbert-base-uncased_bertsum_processed_data.pt")
summarizer.model = torch.load("./tmp2wg41gb5/fine_tuned/dis_sum_model.pt")

### Model Evaluation

[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)), or Recall-Oriented Understudy for Gisting Evaluation, is commonly used for evaluating text summarization.
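As a rough intuition for what ROUGE-N measures, the sketch below computes recall, precision and F-score from clipped n-gram overlap between a candidate and a reference summary. It is a simplified illustration with a hypothetical `rouge_n` helper. The ROUGE-1.5.5 script used through pyrouge supports further options and reports confidence intervals, so its numbers will not match this sketch exactly.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Recall, precision and F-score from clipped n-gram overlap."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(cand, ref) times
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f_score": f_score}


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
rouge_1 = rouge_n(candidate, reference, n=1)
rouge_2 = rouge_n(candidate, reference, n=2)
print(rouge_1)
print(rouge_2)
```

In this example, five of the six unigrams and three of the five bigrams overlap, so ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5.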

In [13]:
target = [i['tgt_txt'] for i in test_dataset]

In [14]:
len(target)

11489

In [15]:
test_dataset[0].keys()

dict_keys(['src', 'labels', 'segs', 'clss', 'src_txt', 'tgt_txt'])

In [16]:
%%time
prediction = summarizer.predict(test_dataset, num_gpus=NUM_GPUS, batch_size=128)

Evaluating: 100%|██████████| 90/90 [00:41<00:00,  2.66it/s]


CPU times: user 4min 42s, sys: 3min 2s, total: 7min 45s
Wall time: 1min 8s


In [17]:
len(prediction)

11489

In [18]:
RESULT_DIR = TemporaryDirectory().name

In [19]:
rouge_score = get_rouge(prediction, target, RESULT_DIR)

11489
11489


2020-01-16 02:52:07,209 [MainThread  ] [INFO ]  Writing summaries.
I0116 02:52:07.209927 140512908457792 pyrouge.py:525] Writing summaries.
2020-01-16 02:52:07,211 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /tmp/tmp6omxc5sj/tmpv6bdr_c8/system and model files to /tmp/tmp6omxc5sj/tmpv6bdr_c8/model.
I0116 02:52:07.211637 140512908457792 pyrouge.py:518] Processing summaries. Saving system files to /tmp/tmp6omxc5sj/tmpv6bdr_c8/system and model files to /tmp/tmp6omxc5sj/tmpv6bdr_c8/model.
2020-01-16 02:52:07,212 [MainThread  ] [INFO ]  Processing files in /tmp/tmp6omxc5sj/rouge-tmp-2020-01-16-02-52-06/candidate/.
I0116 02:52:07.212683 140512908457792 pyrouge.py:43] Processing files in /tmp/tmp6omxc5sj/rouge-tmp-2020-01-16-02-52-06/candidate/.
2020-01-16 02:52:08,401 [MainThread  ] [INFO ]  Saved processed files to /tmp/tmp6omxc5sj/tmpv6bdr_c8/system.
I0116 02:52:08.401605 140512908457792 pyrouge.py:53] Saved processed files to /tmp/tmp6omxc5sj/tmpv6bdr_c8/system.
20

---------------------------------------------
1 ROUGE-1 Average_R: 0.43005 (95%-conf.int. 0.42776 - 0.43242)
1 ROUGE-1 Average_P: 0.32518 (95%-conf.int. 0.32306 - 0.32755)
1 ROUGE-1 Average_F: 0.35676 (95%-conf.int. 0.35493 - 0.35871)
---------------------------------------------
1 ROUGE-2 Average_R: 0.15683 (95%-conf.int. 0.15477 - 0.15889)
1 ROUGE-2 Average_P: 0.11916 (95%-conf.int. 0.11742 - 0.12089)
1 ROUGE-2 Average_F: 0.13038 (95%-conf.int. 0.12864 - 0.13213)
---------------------------------------------
1 ROUGE-L Average_R: 0.38632 (95%-conf.int. 0.38403 - 0.38857)
1 ROUGE-L Average_P: 0.29290 (95%-conf.int. 0.29087 - 0.29508)
1 ROUGE-L Average_F: 0.32097 (95%-conf.int. 0.31919 - 0.32291)



In [34]:
rouge_score = get_rouge(prediction, target, RESULT_DIR)

11489
11489


2020-01-14 21:54:32,107 [MainThread  ] [INFO ]  Writing summaries.
I0114 21:54:32.107936 140559393421120 pyrouge.py:525] Writing summaries.
2020-01-14 21:54:32,109 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /tmp/tmp606wcgbu/tmp47v2f3j3/system and model files to /tmp/tmp606wcgbu/tmp47v2f3j3/model.
I0114 21:54:32.109679 140559393421120 pyrouge.py:518] Processing summaries. Saving system files to /tmp/tmp606wcgbu/tmp47v2f3j3/system and model files to /tmp/tmp606wcgbu/tmp47v2f3j3/model.
2020-01-14 21:54:32,110 [MainThread  ] [INFO ]  Processing files in /tmp/tmp606wcgbu/rouge-tmp-2020-01-14-21-54-29/candidate/.
I0114 21:54:32.110743 140559393421120 pyrouge.py:43] Processing files in /tmp/tmp606wcgbu/rouge-tmp-2020-01-14-21-54-29/candidate/.
2020-01-14 21:54:33,255 [MainThread  ] [INFO ]  Saved processed files to /tmp/tmp606wcgbu/tmp47v2f3j3/system.
I0114 21:54:33.255218 140559393421120 pyrouge.py:53] Saved processed files to /tmp/tmp606wcgbu/tmp47v2f3j3/system.
20

---------------------------------------------
1 ROUGE-1 Average_R: 0.52314 (95%-conf.int. 0.52026 - 0.52623)
1 ROUGE-1 Average_P: 0.34694 (95%-conf.int. 0.34464 - 0.34935)
1 ROUGE-1 Average_F: 0.40336 (95%-conf.int. 0.40121 - 0.40552)
---------------------------------------------
1 ROUGE-2 Average_R: 0.22709 (95%-conf.int. 0.22443 - 0.22983)
1 ROUGE-2 Average_P: 0.14956 (95%-conf.int. 0.14784 - 0.15153)
1 ROUGE-2 Average_F: 0.17423 (95%-conf.int. 0.17230 - 0.17631)
---------------------------------------------
1 ROUGE-L Average_R: 0.47414 (95%-conf.int. 0.47143 - 0.47701)
1 ROUGE-L Average_P: 0.31482 (95%-conf.int. 0.31261 - 0.31694)
1 ROUGE-L Average_F: 0.36584 (95%-conf.int. 0.36375 - 0.36793)



In [31]:
rouge_score = get_rouge(prediction, target, RESULT_DIR)

11489
11489


2020-01-14 21:31:38,192 [MainThread  ] [INFO ]  Writing summaries.
I0114 21:31:38.192723 140559393421120 pyrouge.py:525] Writing summaries.
2020-01-14 21:31:38,194 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /tmp/tmp606wcgbu/tmpt8hpdt2b/system and model files to /tmp/tmp606wcgbu/tmpt8hpdt2b/model.
I0114 21:31:38.194494 140559393421120 pyrouge.py:518] Processing summaries. Saving system files to /tmp/tmp606wcgbu/tmpt8hpdt2b/system and model files to /tmp/tmp606wcgbu/tmpt8hpdt2b/model.
2020-01-14 21:31:38,195 [MainThread  ] [INFO ]  Processing files in /tmp/tmp606wcgbu/rouge-tmp-2020-01-14-21-31-37/candidate/.
I0114 21:31:38.195538 140559393421120 pyrouge.py:43] Processing files in /tmp/tmp606wcgbu/rouge-tmp-2020-01-14-21-31-37/candidate/.
2020-01-14 21:31:39,295 [MainThread  ] [INFO ]  Saved processed files to /tmp/tmp606wcgbu/tmpt8hpdt2b/system.
I0114 21:31:39.295857 140559393421120 pyrouge.py:53] Saved processed files to /tmp/tmp606wcgbu/tmpt8hpdt2b/system.
20

---------------------------------------------
1 ROUGE-1 Average_R: 0.41913 (95%-conf.int. 0.41589 - 0.42222)
1 ROUGE-1 Average_P: 0.29780 (95%-conf.int. 0.29547 - 0.30013)
1 ROUGE-1 Average_F: 0.33357 (95%-conf.int. 0.33129 - 0.33590)
---------------------------------------------
1 ROUGE-2 Average_R: 0.15268 (95%-conf.int. 0.15024 - 0.15515)
1 ROUGE-2 Average_P: 0.10379 (95%-conf.int. 0.10201 - 0.10557)
1 ROUGE-2 Average_F: 0.11848 (95%-conf.int. 0.11663 - 0.12035)
---------------------------------------------
1 ROUGE-L Average_R: 0.37687 (95%-conf.int. 0.37383 - 0.37983)
1 ROUGE-L Average_P: 0.26805 (95%-conf.int. 0.26590 - 0.27017)
1 ROUGE-L Average_F: 0.30009 (95%-conf.int. 0.29799 - 0.30222)



In [24]:
rouge_score = get_rouge(prediction, target, RESULT_DIR)

11489
11489


2020-01-14 20:32:33,669 [MainThread  ] [INFO ]  Writing summaries.
I0114 20:32:33.669380 140559393421120 pyrouge.py:525] Writing summaries.
2020-01-14 20:32:33,671 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /tmp/tmp606wcgbu/tmp47u21nov/system and model files to /tmp/tmp606wcgbu/tmp47u21nov/model.
I0114 20:32:33.671078 140559393421120 pyrouge.py:518] Processing summaries. Saving system files to /tmp/tmp606wcgbu/tmp47u21nov/system and model files to /tmp/tmp606wcgbu/tmp47u21nov/model.
2020-01-14 20:32:33,672 [MainThread  ] [INFO ]  Processing files in /tmp/tmp606wcgbu/rouge-tmp-2020-01-14-20-32-32/candidate/.
I0114 20:32:33.672239 140559393421120 pyrouge.py:43] Processing files in /tmp/tmp606wcgbu/rouge-tmp-2020-01-14-20-32-32/candidate/.
2020-01-14 20:32:34,765 [MainThread  ] [INFO ]  Saved processed files to /tmp/tmp606wcgbu/tmp47u21nov/system.
I0114 20:32:34.765755 140559393421120 pyrouge.py:53] Saved processed files to /tmp/tmp606wcgbu/tmp47u21nov/system.
20

---------------------------------------------
1 ROUGE-1 Average_R: 0.52314 (95%-conf.int. 0.52026 - 0.52623)
1 ROUGE-1 Average_P: 0.34694 (95%-conf.int. 0.34464 - 0.34935)
1 ROUGE-1 Average_F: 0.40336 (95%-conf.int. 0.40121 - 0.40552)
---------------------------------------------
1 ROUGE-2 Average_R: 0.22709 (95%-conf.int. 0.22443 - 0.22983)
1 ROUGE-2 Average_P: 0.14956 (95%-conf.int. 0.14784 - 0.15153)
1 ROUGE-2 Average_F: 0.17423 (95%-conf.int. 0.17230 - 0.17631)
---------------------------------------------
1 ROUGE-L Average_R: 0.47414 (95%-conf.int. 0.47143 - 0.47701)
1 ROUGE-L Average_P: 0.31482 (95%-conf.int. 0.31261 - 0.31694)
1 ROUGE-L Average_F: 0.36584 (95%-conf.int. 0.36375 - 0.36793)



In [25]:
test_dataset[0]['tgt_txt']

"turkish court imposed blocks as images of siege shared on social media<q>images ` deeply upset ' wife and children of hostage mehmet selim kiraz<q>prosecutor , 46 , died in hospital after hostages stormed a courthouse<q>two of his captors were killed when security forces took back the building"

In [26]:
prediction[0]

"turkey has blocked access to twitter and youtube after they refused a request to remove pictures of a prosecutor held during an armed siege last week .<q>a turkish court imposed the blocks because images of the deadly siege were being shared on social media and ` deeply upset ' the wife and children of mehmet selim kiraz , the hostage who was killed .<q>the 46-year-old turkish prosecutor died in hospital when members of the revolutionary people 's liberation party-front ( dhkp-c ) stormed a courthouse and took him hostage ."

In [27]:
test_dataset[0]['src_txt']

['turkey has blocked access to twitter and youtube after they refused a request to remove pictures of a prosecutor held during an armed siege last week .',
 "a turkish court imposed the blocks because images of the deadly siege were being shared on social media and ` deeply upset ' the wife and children of mehmet selim kiraz , the hostage who was killed .",
 "the 46-year-old turkish prosecutor died in hospital when members of the revolutionary people 's liberation party-front ( dhkp-c ) stormed a courthouse and took him hostage .",
 'the dhkp-c is considered a terrorist group by turkey , the european union and us .',
 'a turkish court has blocked access to twitter and youtube after they refused a request to remove pictures of prosecutor mehmet selim kiraz held during an armed siege last week',
 'grief : the family of mehmet selim kiraz grieve over his coffin during his funeral at eyup sultan mosque in istanbul , turkey .',
 'he died in hospital after he was taken hostage by the far-lef

In [44]:
# for testing
sb.glue("rouge_2_f_score", rouge_score['rouge_2_f_score'])

## Clean up temporary folders

In [None]:
if os.path.exists(DATA_PATH):
    shutil.rmtree(DATA_PATH, ignore_errors=True)
if os.path.exists(PROCESSED_DATA_PATH):
    shutil.rmtree(PROCESSED_DATA_PATH, ignore_errors=True)
if os.path.exists(CACHE_DIR):
    shutil.rmtree(CACHE_DIR, ignore_errors=True)
if os.path.exists(RESULT_DIR):
    shutil.rmtree(RESULT_DIR, ignore_errors=True)