## ONLY if running on Colaboratory, run this cell first (once)

In [None]:
!git clone https://github.com/pie3636/newsjam.git
!mv newsjam/* .

## Install missing modules if needed (only run once)

In [None]:
!python -m pip install -r requirements.txt
!python -m spacy download fr_core_news_sm
!python -m spacy download en_core_web_sm
# Note: You'll have to restart the kernel/runtime after running this cell

## Imports (only run once)

In [1]:
# MLSUM Corpus & CNN/Daily Mail Corpus
from datasets import load_dataset

# Loading article data
import json

# Our packages
from eval.rouge_l import RougeLEval
from eval.bert_eval import BERT_Eval
from eval.time import TimeEval

from summ.lsa import LSASummarizer
from summ.bert_embed import BertEmbeddingsSummarizer

from tqdm import tqdm

dataset_fr = load_dataset('mlsum', 'fr')
dataset_en = load_dataset('cnn_dailymail', '3.0.0')

rouge_l = RougeLEval()
bert = BERT_Eval()
timer = TimeEval()
lsa_summ = LSASummarizer()
flaubert_summ = BertEmbeddingsSummarizer('flaubert/flaubert_large_cased')
camembert_summ = BertEmbeddingsSummarizer('camembert/camembert-large')
roberta_summ = BertEmbeddingsSummarizer('roberta-base')

Reusing dataset mlsum (/Users/josephkeenan/.cache/huggingface/datasets/mlsum/fr/1.0.0/77f23eb185781f439927ac2569ab1da1083195d8b2dab2b2f6bbe52feb600688)



Reusing dataset cnn_dailymail (/Users/josephkeenan/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/3cb851bf7cf5826e45d49db2863f627cba583cbc32342df7349dfe6c38060234)



Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.bias', 'lm_h

## Summarize a single article

In [3]:
# Pick an article and its reference summary
article_fr = dataset_fr['test']['text'][54]
ref_summ_fr = dataset_fr['test']['summary'][54]

article_en = dataset_en['test']['article'][43]
ref_summ_en = dataset_en['test']['highlights'][43]

# Compute the summary and its evaluation (uncomment to run)
# timer.evaluate_one(article_fr, BertEmbeddingsSummarizer, 'camembert/camembert-large')

## Summarize a series of French articles

In [5]:
# Number of test articles to summarize and evaluate
num_articles = 15

texts = dataset_fr['test']['text'][:num_articles]
ref_summs = dataset_fr['test']['summary'][:num_articles]

gen_summs = []
for text in tqdm(texts):  # texts is already limited to num_articles
    gen_summs.append(flaubert_summ.get_summary(text))

scores1, scores2 = rouge_l.evaluate_many(ref_summs, gen_summs, num_articles)
results = rouge_l.get_results(scores1, scores2)

for k, v in results.items():
    print(k.ljust(25), round(v*100, 3), '%')

#timer.evaluate_many(texts, LSASummarizer)

100%|███████████████████████████████████████████| 15/15 [05:42<00:00, 22.81s/it]
100%|███████████████████████████████████████████| 15/15 [00:00<00:00, 75.80it/s]

Long precision avg        10.413 %
Long recall avg           11.466 %
Long F1-score avg         10.771 %
Keyword precision avg     6.782 %
Keyword recall avg        8.435 %
Keyword F1-score avg      7.314 %
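
The precision, recall and F1 figures above are ROUGE-L scores, which are based on the longest common subsequence (LCS) between a reference summary and a generated one. The following is a minimal sketch of that computation, not the code inside `RougeLEval`: the function names `lcs_length` and `rouge_l_scores` are hypothetical, tokenization is a plain lowercase whitespace split, and F1 is taken as the unweighted harmonic mean.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_scores(reference, candidate):
    """Return (precision, recall, f1) over whitespace tokens.

    Assumption: simple lowercase whitespace tokenization, which is
    likely cruder than what a real evaluation pipeline would use.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    precision = lcs / len(cand_tokens)  # share of the candidate that matches
    recall = lcs / len(ref_tokens)      # share of the reference that is covered
    if lcs == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Calling `rouge_l_scores(ref_summs[0], gen_summs[0][0])` would score the first long summary under these assumptions; the values will not necessarily match those reported by `rouge_l.evaluate_many`.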





In [21]:
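num = 12  # index of the article to inspect

print(ref_summs[num])     # reference summary
print()
print(gen_summs[num][0])  # generated long summary
print()
print(gen_summs[num][1])  # generated keyword summary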

Tsitsipas, Halep, Fognini et Zverev ont gagné leurs matchs. « Le Monde » livre les résultats en détail au fil de la journée.

Fabio Fognini est venu à bout de l’Espagnol en quatre sets (7-6, 6-4, 4-6, 6-1).
Le jeune Grec, classé 6e mondial à 20 ans, a fini par s’imposer 7-5, 6-3, 6-7 [5], 7-6 [6] contre le Serbe.

Fabio Fognini venir bout Espagnol set 7 6 6 4 4 6 6 1
jeune Grec classer sixième mondial 20 an finir imposer 7 5 6 3 6 7 5 7 6 6 contre serbe


## Summarize a series of English articles

In [None]:
# Number of test articles to summarize and evaluate
num_articles = 15

texts = dataset_en['test']['article'][:num_articles]
ref_summs = dataset_en['test']['highlights'][:num_articles]

gen_summs = []
for text in tqdm(texts):  # texts is already limited to num_articles
    gen_summs.append(roberta_summ.get_summary(text, lang='en'))

scores1, scores2 = rouge_l.evaluate_many(ref_summs, gen_summs, num_articles)
results = rouge_l.get_results(scores1, scores2)

for k, v in results.items():
    print(k.ljust(25), round(v*100, 3), '%')

 47%|████████████████████▌                       | 7/15 [00:37<00:42,  5.35s/it]

In [61]:
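num = 12  # index of the article to inspect

print(ref_summs[num])     # reference summary
print()
print(gen_summs[num][0])  # generated long summary
print()
print(gen_summs[num][1])  # generated keyword summary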

Saudi general says more than 1,200 airstrikes since campaign began March 26 .
Three Saudis were killed in attack on border position, source tells CNN .

(CNN)More than 500 Houthi rebels have been killed since the start of Saudi-led military operations against Yemeni Shia fighters, a Saudi Defense Ministry official said Saturday, according to the state-run Saudi Press Agency.

CNN)More 500 Houthi rebel kill start Saudi lead military operation yemeni Shia fighter Saudi Defense Ministry official say Saturday accord state run Saudi Press Agency


#### Optional: Save generated summaries to file

In [None]:
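# Write each (long, keyword) summary pair, separated by blank lines.
# An explicit encoding keeps accented characters portable across platforms.
with open('generated.txt', 'w', encoding='utf-8') as f: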
    for summ1, summ2 in tqdm(gen_summs):
        f.write(summ1)
        f.write('\n\n')
        f.write(summ2)
        f.write('\n\n')
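
To reuse the saved summaries later without recomputing them, the file can be read back into the same list-of-pairs shape as `gen_summs`. This is a sketch under several assumptions: the helper name `load_summary_pairs` is hypothetical, the file is UTF-8, and every summary is non-empty and contains no blank line of its own (otherwise the pairing would shift).

```python
def load_summary_pairs(path):
    """Read back (long, keyword) summary pairs written with blank-line separators.

    Assumption: summaries are non-empty and never contain a blank line.
    """
    with open(path, encoding='utf-8') as f:
        chunks = [chunk for chunk in f.read().split('\n\n') if chunk.strip()]
    # Even positions hold long summaries, odd positions hold keyword summaries.
    return list(zip(chunks[0::2], chunks[1::2]))
```

For example, `gen_summs = load_summary_pairs('generated.txt')` would restore the pairs as tuples.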

## Summarize a series of scraped articles

In [37]:
with open('data/actu_preliminary.json', 'r', encoding='utf-8') as jsonfile:
    data = json.load(jsonfile)

texts = [article['text'] for article in data]
ref_summs = [article['summary'] for article in data]

gen_summs = []
for text in tqdm(texts):
    gen_summs.append(lsa_summ.get_summary(text))

scores1, scores2 = rouge_l.evaluate_many(ref_summs, gen_summs)
results = rouge_l.get_results(scores1, scores2)

for k, v in results.items():
    print(k.ljust(25), round(v*100, 3), '%')

100%|███████████████████████████████████████████| 47/47 [01:16<00:00,  1.63s/it]
100%|██████████████████████████████████████████| 47/47 [00:00<00:00, 101.73it/s]

Long precision avg        49.216 %
Long recall avg           62.04 %
Long F1-score avg         54.236 %
Keyword precision avg     48.435 %
Keyword recall avg        61.659 %
Keyword F1-score avg      53.56 %





## Implementation of BERTScore

Splitting summaries

In [11]:
long_summs, short_summs, ref_summs, key_ref_summs = bert.split_summs(gen_summs, ref_summs, gen_keys=True)

Computation of BERTScore

In [12]:
bert.bert_score(long_summs, short_summs, ref_summs, key_ref_summs)

calculating scores...
computing bert embedding.



computing greedy matching.



done in 18.72 seconds, 2.51 sentences/sec
calculating scores...
computing bert embedding.



computing greedy matching.



done in 12.90 seconds, 3.64 sentences/sec


{'Long precision avg': '0.2326',
 'Long recall avg': '0.3982',
 'Long F1-score avg': '0.3125',
 'Keyword precision avg': '0.2440',
 'Keyword recall avg': '0.3905',
 'Keyword F1-score avg': '0.3151'}
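
The "computing greedy matching" step in the log refers to how BERTScore pairs tokens: each token embedding in one text is matched to its most similar token embedding in the other, using cosine similarity. The sketch below illustrates that idea on toy vectors. It is not the code behind `bert.bert_score`: the names `cosine` and `greedy_match_scores` are hypothetical, the inputs are assumed to be non-zero vectors, and optional refinements such as IDF weighting and baseline rescaling are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    cand_vecs / ref_vecs: one embedding per token (toy stand-ins here
    for contextual embeddings produced by a BERT-style model).
    """
    sim = [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: each candidate token takes its best match in the reference.
    precision = sum(max(row) for row in sim) / len(cand_vecs)
    # Recall: each reference token takes its best match in the candidate.
    recall = sum(
        max(sim[i][j] for i in range(len(cand_vecs)))
        for j in range(len(ref_vecs))
    ) / len(ref_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With a two-token candidate `[[1.0, 0.0], [0.0, 1.0]]` and a one-token reference `[[1.0, 0.0]]`, only one candidate token has a match, so precision is lower than recall.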

Optional: matrix for a single score

In [None]:
bert.get_matrix(long_summs, ref_summs, 4)

Experimenting with BERTScore

In [32]:
test_gen_summ = []
test_gen_summ.append('B')
print(len(test_gen_summ[0]))

print()

test_ref_summ = []
test_ref_summ.append('汉字')
print(len(test_ref_summ[0]))

1

2


In [33]:
bert.bert_score(test_gen_summ, test_ref_summ)

calculating scores...
computing bert embedding.



computing greedy matching.



done in 0.13 seconds, 7.84 sentences/sec


{'Long precision avg': '0.3093',
 'Long recall avg': '0.2388',
 'Long F1-score avg': '0.2743'}