## Extractive Text Summarization on CNN/DM Dataset using BertSum

### Summary

This notebook demonstrates how to fine-tune BERT for extractive text summarization. Utility functions and classes in the NLP Best Practices repo are used to facilitate data preprocessing, model training, model scoring, result postprocessing, and model evaluation.

BertSum refers to the paper [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) and its [published implementation](https://github.com/nlpyang/BertSum/).



### Configuration

Before running the notebook, set the following environment variables so that the GPUs on your machine are visible and indexed in PCI bus order.

In [1]:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="0,1,2,3"

### Data Preprocessing

The CNN/DM dataset used here can be downloaded from https://github.com/harvardnlp/sent-summary. The rest of this notebook assumes the dataset has been unzipped to the folder ./harvardnlp_cnndm.

In [2]:
import sys
sys.path.insert(0, '/dadendev/BertSum/src')
sys.path.insert(0, '/dadendev/textsum/wrapper')

In [3]:
from data_preprocessing import harvardnlp_cnndm_preprocess,  bertsum_formatting

[nltk_data] Downloading package punkt to /home/daden/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


The functions called inside harvardnlp_cnndm_preprocess are specific to the CNN/DM dataset as processed by harvardnlp. However, they provide a skeleton for preprocessing data into the format that BertSum expects. Assuming all articles are in one file and all target summaries in another, one per line, the preprocessing steps are:
1. sentence tokenization
2. word tokenization
3. formatting to bertdata
    1. an algorithm labels the sentences in the article, with 1 meaning the sentence is selected for the summary
    2. [CLS] and [SEP] are inserted before and after each sentence
    3. segment ids are inserted
    4. the positions of the [CLS] tokens are recorded
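
The [CLS]/[SEP] insertion, segment ids, and [CLS] positions from step 3 can be illustrated with a short, self-contained sketch. It assumes sentences are already word-tokenized and alternates the segment id between 0 and 1 for consecutive sentences, as described in the BertSum paper. `format_for_bertsum` is a hypothetical helper written for illustration, not the `BertData` implementation, and it skips wordpiece tokenization, vocabulary lookup, and truncation.

```python
def format_for_bertsum(sentences):
    """Illustrative only: wrap each tokenized sentence in [CLS] ... [SEP].

    Returns (tokens, segs, clss), where segs alternates 0/1 per sentence
    and clss holds the index of every [CLS] token in `tokens`.
    """
    tokens, segs, clss = [], [], []
    for i, sentence in enumerate(sentences):
        clss.append(len(tokens))                 # position of this sentence's [CLS]
        piece = ["[CLS]"] + list(sentence) + ["[SEP]"]
        tokens.extend(piece)
        segs.extend([i % 2] * len(piece))        # interval segment ids: 0, 1, 0, 1, ...
    return tokens, segs, clss


tokens, segs, clss = format_for_bertsum([["a", "b"], ["c"]])
print(tokens)
print(segs)
print(clss)
```

The `clss` list plays the same role as the `clss` field inspected later in this notebook: it tells the model where to read one vector per sentence.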


In [35]:
QUICK_RUN = True
max_job_number = None  # None keeps every job; -1 as a slice end would drop the last one
#if QUICK_RUN:
#    max_job_number = 100

In [43]:
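import multiprocessing
from prepro.data_builder import BertData
from bertsum_config import args

# Note: despite the "train_" prefix, these variables point at the test split.
train_src_file = "./harvardnlp_cnndm/test.txt.src"
train_tgt_file = "./harvardnlp_cnndm/test.txt.tgt.tagged"

# Leave one core free for the rest of the system.
n_cpus = multiprocessing.cpu_count() - 1

# Tokenize the articles and summaries using n_cpus worker processes.
jobs = harvardnlp_cnndm_preprocess(n_cpus, train_src_file, train_tgt_file)
print("total length of training data:", len(jobs))

# Convert the tokenized jobs to the BertSum format and write them to output_file.
output_file = "./harvardnlp_cnndm/test.bertdata"
bertdata = BertData(args)
bertsum_formatting(n_cpus, bertdata, "combination", jobs[0:max_job_number], output_file)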

[2019-10-03 13:53:40,310 INFO] loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/daden/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084


total length of training data: 11490


In [6]:
import torch
bert_format_data = torch.load("./bert_train_data_all_none_excluded")
print(len(bert_format_data))
bert_format_data[0].keys()

287085


dict_keys(['src', 'labels', 'segs', 'clss', 'src_txt', 'tgt_txt'])

In [7]:
bert_format_data[0]['tgt_txt']

"mentally ill inmates in miami are housed on the `` forgotten floor `` judge steven leifman says most are there as a result of `` avoidable felonies `` while cnn tours facility , patient shouts : `` i am the son of the president `` leifman says the system is unjust and he 's fighting for change ."

In [8]:
bert_format_data[0]['clss']

[0,
 31,
 54,
 80,
 119,
 142,
 180,
 194,
 250,
 278,
 289,
 307,
 337,
 362,
 372,
 399,
 415,
 433,
 457,
 484]

In [9]:
bert_format_data[0]['labels']

[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
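
The labels above mark which sentences an oracle selected as the extractive summary. The sketch below shows one simple way such labels could be produced: greedily adding the sentence that most improves the overlap with the reference summary. It is a simplified illustration that scores with unigram F1; it is not the "combination" oracle mode used in this notebook, which scores candidates with ROUGE, and `unigram_f1` and `greedy_labels` are hypothetical names.

```python
def unigram_f1(candidate, reference):
    """F1 over the sets of unique tokens in candidate and reference."""
    cand, ref = set(candidate), set(reference)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def greedy_labels(sentences, summary, max_selected=3):
    """Return a 0/1 label per sentence; 1 means the sentence was selected."""
    selected, best = [], 0.0
    for _ in range(max_selected):
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the summary formed by the current selection plus sentence i.
            tokens = [t for j in selected + [i] for t in sentences[j]]
            score = unigram_f1(tokens, summary)
            if score > best:
                best, best_idx = score, i
        if best_idx is None:      # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(sentences))]


article = [["the", "cat"], ["dogs", "bark", "loudly"], ["rain", "falls"]]
print(greedy_labels(article, ["dogs", "bark"]))
```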

In [10]:
bert_format_data[0]['src_txt']

["editor 's note : in our behind the scenes series , cnn correspondents share their experiences in covering news and analyze the stories behind the events .",
 "here , soledad o'brien takes users inside a jail where many of the inmates are mentally ill .",
 'an inmate housed on the `` forgotten floor , `` where many mentally ill inmates are housed in miami before trial .',
 'miami , florida -lrb- cnn -rrb- -- the ninth floor of the miami-dade pretrial detention facility is dubbed the `` forgotten floor . ``',
 "here , inmates with the most severe mental illnesses are incarcerated until they 're ready to appear in court .",
 'most often , they face drug charges or charges of assaulting an officer -- charges that judge steven leifman says are usually `` avoidable felonies . ``',
 'he says the arrests often result from confrontations with police .',
 "mentally ill people often wo n't do what they 're told when police arrive on the scene -- confrontation seems to exacerbate their illness a

### Model training
To start model training, we need to create an instance of BertSumExtractiveSummarizer, a wrapper for running BertSum-based fine-tuning. You can select any device ID on your machine, but make sure that you pass the string version of the device ID in the gpu_ranks argument. Some of the default arguments of BertSumExtractiveSummarizer are defined in the bertsum_config file.




In [4]:
from extractive_text_summerization import BertSumExtractiveSummarizer

In [5]:
device_id = 2
gpu_ranks = str(device_id)

In [6]:
model_base_path = './models/'
log_base_path = './logs/'
encoder = 'baseline'
from random import random
random_number = random()

In [7]:
bertsum_model = BertSumExtractiveSummarizer(encoder = encoder,
                                            model_path = model_base_path+encoder+str(random_number),
                                            log_file = log_base_path+encoder+str(random_number),
                                            device_id = device_id,
                                            gpu_ranks = gpu_ranks)

['2']
{2: 0}
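
The two lines printed above look like the gpu_ranks string split into a list and a mapping from device ID to process rank. A minimal sketch of how such a mapping could be derived, assuming gpu_ranks is a comma-separated string of device IDs; `build_gpu_ranks_map` is a hypothetical helper and not the wrapper's actual code.

```python
def build_gpu_ranks_map(gpu_ranks):
    """Map each device ID in a comma-separated string to its rank (its position in the list)."""
    device_ids = [part.strip() for part in gpu_ranks.split(",") if part.strip()]
    return {int(device_id): rank for rank, device_id in enumerate(device_ids)}


print(build_gpu_ranks_map("2"))
print(build_gpu_ranks_map("0,1,2,3"))
```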


Here we use the fully processed CNN/DM dataset to train the model. You can stop training at any time and later resume from a previously saved checkpoint.
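
The training log in this notebook shows checkpoints saved as model_step_<N>.pt inside the model directory. To resume, one would need the most recent of these files. The sketch below finds it using only the standard library; `latest_checkpoint` is a hypothetical helper, and passing its result as train_from assumes the wrapper accepts a checkpoint path there, as the original BertSum scripts do.

```python
import os
import re


def latest_checkpoint(model_dir):
    """Return the path of the model_step_<N>.pt file with the largest N, or "" if none exists."""
    pattern = re.compile(r"model_step_(\d+)\.pt$")
    best_step, best_path = -1, ""
    if not os.path.isdir(model_dir):
        return best_path
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_path = os.path.join(model_dir, name)
    return best_path


# An empty string is returned when the directory is missing or has no checkpoints,
# which matches the train_from="" convention used in this notebook.
print(repr(latest_checkpoint("./models/does_not_exist")))
```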

In [None]:
training_data_file = './bert_train_data_all_none_excluded'
bertsum_model.fit(device_id, [training_data_file], train_steps=50000, train_from="")

[2019-10-03 05:08:36,132 INFO] Device ID 2
[2019-10-03 05:08:36,136 INFO] loading archive file /dadendev/textsum/temp/bert-base-uncased
[2019-10-03 05:08:36,137 INFO] Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}



{'min_nsents': 3, 'max_nsents': 100, 'max_src_ntokens': 200, 'min_src_ntokens': 10, 'oracle_mode': 'combination', 'temp_dir': './temp', 'param_init': 0.0, 'param_init_glorot': True, 'dropout': 0.1, 'optim': 'adam', 'lr': 0.002, 'beta1': 0.9, 'beta2': 0.999, 'decay_method': 'noam', 'max_grad_norm': 0, 'use_interval': True, 'accum_count': 2, 'report_every': 50, 'save_checkpoint_steps': 500, 'batch_size': 3000, 'warmup_steps': 10000, 'block_trigram': True, 'recall_eval': False, 'report_rouge': True, 'encoder': 'baseline', 'hidden_size': 128, 'ff_size': 512, 'heads': 4, 'inter_layers': 2, 'rnn_size': 512, 'world_size': 1, 'visible_gpus': '0', 'gpu_ranks': '2', 'seed': 42, 'test_all': False, 'train_from': '', 'test_from': '', 'mode': 'train', 'model_path': './models/baseline0.7355433644584792', 'log_file': './logs/baseline0.7355433644584792', 'bert_config_path': './bert_config_uncased_base.json', 'worls_size': 1, 'gpu_ranks_map': {2: 0}}


[2019-10-03 05:08:38,152 INFO] Summarizer(
  (bert): Bert(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(30522, 128, padding_idx=0)
        (position_embeddings): Embedding(512, 128)
        (token_type_embeddings): Embedding(2, 128)
        (LayerNorm): BertLayerNorm()
        (dropout): Dropout(p=0.1)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=128, out_features=128, bias=True)
                (key): Linear(in_features=128, out_features=128, bias=True)
                (value): Linear(in_features=128, out_features=128, bias=True)
                (dropout): Dropout(p=0.1)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=128, out_features=128, bias=True)
                (LayerNorm): BertLayerNorm()
                (

device_id 2
gpu_rank 0


[2019-10-03 05:09:21,414 INFO] Step 50/50000; xent: 4.22; lr: 0.0000001; 243 docs/s;      4 sec
[2019-10-03 05:09:25,293 INFO] Step 100/50000; xent: 4.15; lr: 0.0000002; 257 docs/s;      8 sec
[2019-10-03 05:09:29,123 INFO] Step 150/50000; xent: 4.15; lr: 0.0000003; 262 docs/s;     12 sec
[2019-10-03 05:09:32,985 INFO] Step 200/50000; xent: 3.98; lr: 0.0000004; 262 docs/s;     16 sec
[2019-10-03 05:09:36,827 INFO] Step 250/50000; xent: 3.93; lr: 0.0000005; 261 docs/s;     20 sec
[2019-10-03 05:09:40,678 INFO] Step 300/50000; xent: 3.90; lr: 0.0000006; 260 docs/s;     23 sec
[2019-10-03 05:09:44,546 INFO] Step 350/50000; xent: 3.86; lr: 0.0000007; 260 docs/s;     27 sec
[2019-10-03 05:09:48,368 INFO] Step 400/50000; xent: 3.78; lr: 0.0000008; 264 docs/s;     31 sec
[2019-10-03 05:09:52,208 INFO] Step 450/50000; xent: 3.81; lr: 0.0000009; 263 docs/s;     35 sec
[2019-10-03 05:09:56,065 INFO] Step 500/50000; xent: 3.78; lr: 0.0000010; 260 docs/s;     39 sec
[2019-10-03 05:09:56,068 INFO] 

[2019-10-03 05:14:19,662 INFO] Step 3900/50000; xent: 3.35; lr: 0.0000078; 258 docs/s;    302 sec
[2019-10-03 05:14:23,526 INFO] Step 3950/50000; xent: 3.45; lr: 0.0000079; 261 docs/s;    306 sec
[2019-10-03 05:14:27,364 INFO] Step 4000/50000; xent: 3.43; lr: 0.0000080; 260 docs/s;    310 sec
[2019-10-03 05:14:27,367 INFO] Saving checkpoint ./models/baseline0.7355433644584792/model_step_4000.pt
[2019-10-03 05:14:31,280 INFO] Step 4050/50000; xent: 3.49; lr: 0.0000081; 262 docs/s;    314 sec
[2019-10-03 05:14:35,131 INFO] Step 4100/50000; xent: 3.39; lr: 0.0000082; 260 docs/s;    318 sec
[2019-10-03 05:14:39,008 INFO] Step 4150/50000; xent: 3.45; lr: 0.0000083; 260 docs/s;    322 sec
[2019-10-03 05:14:42,872 INFO] Step 4200/50000; xent: 3.45; lr: 0.0000084; 260 docs/s;    326 sec
[2019-10-03 05:14:46,737 INFO] Step 4250/50000; xent: 3.35; lr: 0.0000085; 261 docs/s;    329 sec
[2019-10-03 05:14:50,588 INFO] Step 4300/50000; xent: 3.44; lr: 0.0000086; 258 docs/s;    333 sec
[2019-10-03 05
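
The learning rates in the log above are consistent with the "noam" decay method selected in the configuration (lr 0.002, warmup_steps 10000): the rate grows linearly during warm-up and then decays with the inverse square root of the step. A minimal sketch, assuming the standard Noam formula; `noam_lr` is a hypothetical helper, not the optimizer class used during training.

```python
def noam_lr(step, base_lr=0.002, warmup_steps=10000):
    """Noam schedule: linear warm-up followed by inverse square-root decay (step >= 1)."""
    return base_lr * min(step ** -0.5, step * warmup_steps ** -1.5)


# Compare with the logged values at steps 50, 500 and 4000.
for step in (50, 500, 4000, 10000, 40000):
    print(step, "{:.7f}".format(noam_lr(step)))
```

With these settings the rate peaks at the end of warm-up (step 10000) and falls afterwards, so the first few thousand steps in the log run at very small learning rates.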

### Model Evaluation

[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)), or Recall-Oriented Understudy for Gisting Evaluation, is commonly used to evaluate text summarization.
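
To make the Average_R, Average_P and Average_F figures reported below easier to read, here is a simplified sketch of ROUGE-N for a single candidate and reference pair, based on clipped n-gram overlap. It assumes whitespace tokenization and no stemming, so it will not reproduce the scores of the ROUGE toolkit used by get_rouge; `rouge_n` is a hypothetical helper.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())          # each n-gram counted at most min(cand, ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1))
print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2))
```

Recall measures how much of the reference is covered by the prediction, precision how much of the prediction appears in the reference, which is why long extractive summaries tend to show higher recall than precision.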

In [8]:
import os
import torch
from models.data_loader import DataIterator, Batch, Dataloader
from bertsum_config import Bunch

# Test set produced by the preprocessing step earlier in this notebook.
test_dataset = torch.load("./harvardnlp_cnndm/test.bertdata")

# Preprocessed test shards cnndm.test.0.bert.pt ... cnndm.test.5.bert.pt under ./bert_data/.
dataset = []
for i in range(0, 6):
    filename = "cnndm.test.{0}.bert.pt".format(i)
    dataset.extend(torch.load(os.path.join("./bert_data/", filename)))

def get_data_iter(dataset, batch_size=300):
    """Wrap a list of BertSum-formatted examples in an unshuffled, unsorted test iterator."""
    args = Bunch({})
    args.use_interval = True
    args.batch_size = batch_size
    test_data_iter = DataIterator(args, dataset, args.batch_size, 'cuda', is_test=True, shuffle=False, sort=False)
    return test_data_iter

In [74]:
model_for_test = "./models/baseline0.7355433644584792/model_step_50000.pt"
target = [test_dataset[i]['tgt_txt'] for i in range(len(test_dataset))]
prediction = bertsum_model.predict(device_id, get_data_iter(test_dataset),
                                   test_from=model_for_test)
from utils import get_rouge
#rouge_baseline = get_rouge(prediction, target, "/dadendev/textsum/results/rougetemp")

[2019-10-03 18:10:19,387 INFO] Device ID 2
[2019-10-03 18:10:19,389 INFO] Loading checkpoint from ./models/baseline0.7355433644584792/model_step_50000.pt
[2019-10-03 18:10:20,964 INFO] * number of parameters: 5179137


device_id 2
gpu_rank 0


ValueError: max() arg is an empty sequence

In [72]:
len(prediction)

11486

In [54]:
model_for_test = "./models/rnn/model_step_50000.pt"
target = [test_dataset[i]['tgt_txt'] for i in range(len(test_dataset))]
prediction = bertsum_model.predict(device_id, get_data_iter(test_dataset),
                                   test_from=model_for_test)
from utils import get_rouge
rouge_rnn = get_rouge(prediction, target, "/dadendev/textsum/results/rougetemp")

[2019-10-03 17:46:35,071 INFO] Device ID 2
[2019-10-03 17:46:35,082 INFO] Loading checkpoint from ./models/rnn/model_step_50000.pt
[2019-10-03 17:46:37,559 INFO] * number of parameters: 113041921


device_id 2
gpu_rank 0
11486
11486


2019-10-03 17:48:59,222 [MainThread  ] [INFO ]  Writing summaries.
[2019-10-03 17:48:59,222 INFO] Writing summaries.
2019-10-03 17:48:59,224 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmp9w889acx/system and model files to /dadendev/textsum/results/rougetemp/tmp9w889acx/model.
[2019-10-03 17:48:59,224 INFO] Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmp9w889acx/system and model files to /dadendev/textsum/results/rougetemp/tmp9w889acx/model.
2019-10-03 17:48:59,225 [MainThread  ] [INFO ]  Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-17-48-58/candidate/.
[2019-10-03 17:48:59,225 INFO] Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-17-48-58/candidate/.
2019-10-03 17:49:00,406 [MainThread  ] [INFO ]  Saved processed files to /dadendev/textsum/results/rougetemp/tmp9w889acx/system.
[2019-10-03 17:49:00,406 INFO] Saved processed files

---------------------------------------------
1 ROUGE-1 Average_R: 0.53540 (95%-conf.int. 0.53270 - 0.53815)
1 ROUGE-1 Average_P: 0.37945 (95%-conf.int. 0.37709 - 0.38191)
1 ROUGE-1 Average_F: 0.42948 (95%-conf.int. 0.42737 - 0.43167)
---------------------------------------------
1 ROUGE-2 Average_R: 0.24836 (95%-conf.int. 0.24555 - 0.25117)
1 ROUGE-2 Average_P: 0.17692 (95%-conf.int. 0.17482 - 0.17919)
1 ROUGE-2 Average_F: 0.19954 (95%-conf.int. 0.19734 - 0.20177)
---------------------------------------------
1 ROUGE-L Average_R: 0.34366 (95%-conf.int. 0.34118 - 0.34616)
1 ROUGE-L Average_P: 0.24172 (95%-conf.int. 0.23978 - 0.24381)
1 ROUGE-L Average_F: 0.27432 (95%-conf.int. 0.27234 - 0.27626)



In [55]:
model_for_test = "./models/transformer/model_step_50000.pt"
target = [test_dataset[i]['tgt_txt'] for i in range(len(test_dataset))]
prediction = bertsum_model.predict(device_id, get_data_iter(test_dataset),
                                   test_from=model_for_test)
from utils import get_rouge
rouge_transformer = get_rouge(prediction, target, "/dadendev/textsum/results/rougetemp")

[2019-10-03 17:50:56,970 INFO] Device ID 2
[2019-10-03 17:50:56,973 INFO] Loading checkpoint from ./models/transformer/model_step_50000.pt
[2019-10-03 17:50:59,066 INFO] * number of parameters: 115790849


device_id 2
gpu_rank 0
11486
11486


2019-10-03 17:53:15,587 [MainThread  ] [INFO ]  Writing summaries.
[2019-10-03 17:53:15,587 INFO] Writing summaries.
2019-10-03 17:53:15,590 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmp6gynufns/system and model files to /dadendev/textsum/results/rougetemp/tmp6gynufns/model.
[2019-10-03 17:53:15,590 INFO] Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmp6gynufns/system and model files to /dadendev/textsum/results/rougetemp/tmp6gynufns/model.
2019-10-03 17:53:15,591 [MainThread  ] [INFO ]  Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-17-53-14/candidate/.
[2019-10-03 17:53:15,591 INFO] Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-17-53-14/candidate/.
2019-10-03 17:53:16,773 [MainThread  ] [INFO ]  Saved processed files to /dadendev/textsum/results/rougetemp/tmp6gynufns/system.
[2019-10-03 17:53:16,773 INFO] Saved processed files

---------------------------------------------
1 ROUGE-1 Average_R: 0.53732 (95%-conf.int. 0.53457 - 0.54013)
1 ROUGE-1 Average_P: 0.37491 (95%-conf.int. 0.37254 - 0.37733)
1 ROUGE-1 Average_F: 0.42705 (95%-conf.int. 0.42484 - 0.42920)
---------------------------------------------
1 ROUGE-2 Average_R: 0.24855 (95%-conf.int. 0.24582 - 0.25123)
1 ROUGE-2 Average_P: 0.17405 (95%-conf.int. 0.17196 - 0.17624)
1 ROUGE-2 Average_F: 0.19768 (95%-conf.int. 0.19553 - 0.19985)
---------------------------------------------
1 ROUGE-L Average_R: 0.34365 (95%-conf.int. 0.34113 - 0.34615)
1 ROUGE-L Average_P: 0.23772 (95%-conf.int. 0.23571 - 0.23975)
1 ROUGE-L Average_F: 0.27163 (95%-conf.int. 0.26963 - 0.27360)



In [16]:
model_for_test = "./models/transformer/model_step_50000.pt"
target = [dataset[i]['tgt_txt'] for i in range(len(dataset))]
prediction = bertsum_model.predict(device_id, get_data_iter(dataset, 3000),sentence_seperator="<q>",
                                   test_from=model_for_test)
from utils import get_rouge
rouge_transformer = get_rouge(prediction, target, "/dadendev/textsum/results/rougetemp")

[2019-10-03 18:41:50,554 INFO] Device ID 2
[2019-10-03 18:41:50,555 INFO] Loading checkpoint from ./models/transformer/model_step_50000.pt
[2019-10-03 18:41:52,881 INFO] * number of parameters: 115790849


device_id 2
gpu_rank 0
11489
11489


2019-10-03 18:44:25,911 [MainThread  ] [INFO ]  Writing summaries.
[2019-10-03 18:44:25,911 INFO] Writing summaries.
2019-10-03 18:44:25,919 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmpjsn2arpn/system and model files to /dadendev/textsum/results/rougetemp/tmpjsn2arpn/model.
[2019-10-03 18:44:25,919 INFO] Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmpjsn2arpn/system and model files to /dadendev/textsum/results/rougetemp/tmpjsn2arpn/model.
2019-10-03 18:44:25,920 [MainThread  ] [INFO ]  Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-18-44-24/candidate/.
[2019-10-03 18:44:25,920 INFO] Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-18-44-24/candidate/.
2019-10-03 18:44:27,124 [MainThread  ] [INFO ]  Saved processed files to /dadendev/textsum/results/rougetemp/tmpjsn2arpn/system.
[2019-10-03 18:44:27,124 INFO] Saved processed files

---------------------------------------------
1 ROUGE-1 Average_R: 0.53487 (95%-conf.int. 0.53211 - 0.53764)
1 ROUGE-1 Average_P: 0.38059 (95%-conf.int. 0.37813 - 0.38318)
1 ROUGE-1 Average_F: 0.42995 (95%-conf.int. 0.42787 - 0.43226)
---------------------------------------------
1 ROUGE-2 Average_R: 0.24873 (95%-conf.int. 0.24604 - 0.25157)
1 ROUGE-2 Average_P: 0.17760 (95%-conf.int. 0.17536 - 0.17989)
1 ROUGE-2 Average_F: 0.20002 (95%-conf.int. 0.19784 - 0.20247)
---------------------------------------------
1 ROUGE-L Average_R: 0.48956 (95%-conf.int. 0.48697 - 0.49221)
1 ROUGE-L Average_P: 0.34903 (95%-conf.int. 0.34667 - 0.35158)
1 ROUGE-L Average_F: 0.39396 (95%-conf.int. 0.39183 - 0.39629)



In [15]:
model_for_test = "./models/transformer/model_step_50000.pt"
target = [dataset[i]['tgt_txt'] for i in range(len(dataset))]
prediction = bertsum_model.predict(device_id, get_data_iter(dataset, 3000),sentence_seperator="<q>",
                                   test_from=model_for_test, cal_lead=True)
from utils import get_rouge
rouge_transformer = get_rouge(prediction, target, "/dadendev/textsum/results/rougetemp")

[2019-10-03 18:27:21,995 INFO] Device ID 2
[2019-10-03 18:27:21,996 INFO] Loading checkpoint from ./models/transformer/model_step_50000.pt
[2019-10-03 18:27:24,295 INFO] * number of parameters: 115790849


device_id 2
gpu_rank 0
11489
11489


2019-10-03 18:27:27,926 [MainThread  ] [INFO ]  Writing summaries.
[2019-10-03 18:27:27,926 INFO] Writing summaries.
2019-10-03 18:27:27,932 [MainThread  ] [INFO ]  Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmp8evwxrll/system and model files to /dadendev/textsum/results/rougetemp/tmp8evwxrll/model.
[2019-10-03 18:27:27,932 INFO] Processing summaries. Saving system files to /dadendev/textsum/results/rougetemp/tmp8evwxrll/system and model files to /dadendev/textsum/results/rougetemp/tmp8evwxrll/model.
2019-10-03 18:27:27,934 [MainThread  ] [INFO ]  Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-18-27-26/candidate/.
[2019-10-03 18:27:27,934 INFO] Processing files in /dadendev/textsum/results/rougetemp/rouge-tmp-2019-10-03-18-27-26/candidate/.
2019-10-03 18:27:29,138 [MainThread  ] [INFO ]  Saved processed files to /dadendev/textsum/results/rougetemp/tmp8evwxrll/system.
[2019-10-03 18:27:29,138 INFO] Saved processed files

---------------------------------------------
1 ROUGE-1 Average_R: 0.52371 (95%-conf.int. 0.52067 - 0.52692)
1 ROUGE-1 Average_P: 0.34716 (95%-conf.int. 0.34484 - 0.34937)
1 ROUGE-1 Average_F: 0.40370 (95%-conf.int. 0.40154 - 0.40592)
---------------------------------------------
1 ROUGE-2 Average_R: 0.22740 (95%-conf.int. 0.22473 - 0.23047)
1 ROUGE-2 Average_P: 0.14969 (95%-conf.int. 0.14781 - 0.15166)
1 ROUGE-2 Average_F: 0.17444 (95%-conf.int. 0.17241 - 0.17662)
---------------------------------------------
1 ROUGE-L Average_R: 0.47465 (95%-conf.int. 0.47167 - 0.47766)
1 ROUGE-L Average_P: 0.31501 (95%-conf.int. 0.31282 - 0.31717)
1 ROUGE-L Average_F: 0.36614 (95%-conf.int. 0.36399 - 0.36826)



In [58]:
len(prediction)

11486

In [59]:
test_dataset[0]['src_txt']

['marseille , france -lrb- cnn -rrb- the french prosecutor leading an investigation into the crash of germanwings flight 9525 insisted wednesday that he was not aware of any video footage from on board the plane .',
 'marseille prosecutor brice robin told cnn that `` so far no videos were used in the crash investigation . ``',
 'he added , `` a person who has such a video needs to immediately give it to the investigators . ``',
 "robin 's comments follow claims by two magazines , german daily bild and french paris match , of a cell phone video showing the harrowing final seconds from on board germanwings flight 9525 as it crashed into the french alps .",
 'paris match and bild reported that the video was recovered from a phone at the wreckage site .',
 'the two publications described the supposed video , but did not post it on their websites .',
 'the publications said that they watched the video , which was found by a source close to the investigation .',
 "`` one can hear cries of ` 

In [60]:
target[0]

'marseille prosecutor says `` so far no videos were used in the crash investigation `` despite media reports . journalists at bild and paris match are `` very confident `` the video clip is real , an editor says . andreas lubitz had informed his lufthansa training school of an episode of severe depression , airline says .'

In [61]:
prediction[0]

'marseille prosecutor brice robin told cnn that `` so far no videos were used in the crash investigation . ``paris match and bild reported that the video was recovered from a phone at the wreckage site .marseille , france -lrb- cnn -rrb- the french prosecutor leading an investigation into the crash of germanwings flight 9525 insisted wednesday that he was not aware of any video footage from on board the plane .'
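
In the prediction above, the selected sentences are concatenated without a delimiter, so the sentence boundaries are lost. Some earlier cells pass sentence_seperator="<q>" to predict; the sketch below shows the effect of such a separator on plain Python strings. `join_sentences` is a hypothetical helper for illustration, not the wrapper's code.

```python
def join_sentences(sentences, separator=""):
    """Join the selected sentences into one summary string."""
    return separator.join(sentences)


selected = ["first sentence .", "second sentence ."]

merged = join_sentences(selected)                    # boundaries cannot be recovered
tagged = join_sentences(selected, separator="<q>")   # boundaries are preserved
recovered = tagged.split("<q>")

print(merged)
print(tagged)
print(recovered)
```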

In [None]:
articles = [test_dataset[0]['src_txt']]
get_data_iter(articles, batch_size=30000)