This is the implementation of Algorithm 1 found in project paper that calculates the average of Rouge scores of each models using the first 100 BBC business news articles.  In order to perform an experimental comparison with four extractive summarization techniques, we extract six sentences from each article in order to compose the summary. 
1. the value of the evaluation measure F1, Recall, Precision were calculated for each article
2. take the average of those scores to arrive at a consolidated F1, Recall and Precision scores for each Rouge

In [124]:
! pip install bert-extractive-summarizer
! pip install spacy==2.1.3
! pip install transformers==2.2.2
! pip install neuralcoref
! pip install rouge
! pip install sumy
! pip install newspaper3k





Averaging algorithm on first 100 BBC business news articles dataset using different extractive summarizer

In [230]:
import os,glob
from statistics import mean 
#from summarizer import Summarizer
from rouge import Rouge 
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer 
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


directory = 'BBC News Summary/News Articles/business'
directory_gs = 'BBC News Summary/Summaries/business'

rouge = Rouge()
language = "english"
news_list = []
gs_list = []

for filename in glob.glob(os.path.join(directory, '*.txt')):
    with open(filename, 'r') as f:
        text = f.read()
        news_list.append(text)
        
for filename in glob.glob(os.path.join(directory_gs, '*.txt')):
    with open(filename, 'r') as f:
        text = f.read()
        gs_list.append(text)

In [231]:
def avg_rouge(model, sentence_cnt):
    r1_f = r1_p = r1_r = r2_f = r2_p = r2_r = rl_f = rl_p = rl_r = []
    summary = ""
    stemmer = Stemmer(language) 
    if model == 'lsa':
        #sumy_model = LsaSummarizer()
        sumy_model = LsaSummarizer(stemmer)
    elif model == 'luhn':
        #sumy_model = LuhnSummarizer()
        sumy_model = LuhnSummarizer(language)
    elif model == 'lex':
        #sumy_model = LexRankSummarizer()
        sumy_model = LexRankSummarizer(language)
    elif model == 'bert':
        model_bert = Summarizer()
        
    for news, gs in zip(news_list, gs_list):
        if model == 'lsa' or 'luhn' or 'lex':
            parser = PlaintextParser(news, Tokenizer(language))
                   
            #sumy_model.stop_words = get_stop_words(language)
            result = sumy_model(parser.document, sentence_cnt)
            for i in result:
                summary = summary + str(i) + " " 
        elif model == 'bert':
            result = model_bert(news, num_sentences = sentence_cnt)
            summary = "".join(result)   

        scores = rouge.get_scores(summary, gs)
        r1_f.append(scores[0]['rouge-1']['f'])
        r1_p.append(scores[0]['rouge-1']['p'])
        r1_r.append(scores[0]['rouge-1']['r'])
        r2_f.append(scores[0]['rouge-2']['f'])
        r2_p.append(scores[0]['rouge-2']['p'])
        r2_r.append(scores[0]['rouge-2']['r'])
        rl_f.append(scores[0]['rouge-l']['f'])
        rl_p.append(scores[0]['rouge-l']['p'])
        rl_r.append(scores[0]['rouge-l']['r'])

    print(model + ": " + "Average of Rouge-1 F1 =", round(mean(r1_f), 3))  
    print(model + ": " + "Average of Rouge-1 p =", round(mean(r1_p), 3))  
    print(model + ": " + "Average of Rouge-1 r =", round(mean(r1_r), 3))  
    print(model + ": " + "Average of Rouge-2 F1 =", round(mean(r2_f), 3))  
    print(model + ": " + "Average of Rouge-2 p =", round(mean(r2_p), 3))  
    print(model + ": " + "Average of Rouge-2 r =", round(mean(r2_r), 3))  
    print(model + ": " + "Average of Rouge-l F1 =", round(mean(rl_f), 3))  
    print(model + ": " + "Average of Rouge-l p =", round(mean(rl_p), 3))  
    print(model + ": " + "Average of Rouge-l r =", round(mean(rl_r), 3))  

In [232]:
avg_rouge('lsa', 6)

lsa: Average of Rouge-1 F1 = 0.271
lsa: Average of Rouge-1 p = 0.271
lsa: Average of Rouge-1 r = 0.271
lsa: Average of Rouge-2 F1 = 0.271
lsa: Average of Rouge-2 p = 0.271
lsa: Average of Rouge-2 r = 0.271
lsa: Average of Rouge-l F1 = 0.271
lsa: Average of Rouge-l p = 0.271
lsa: Average of Rouge-l r = 0.271


In [227]:
avg_rouge('luhn', 6)

luhn: Average of Rouge-1 F1 = 0.061
luhn: Average of Rouge-1 p = 0.036
luhn: Average of Rouge-1 r = 0.906
luhn: Average of Rouge-2 F1 = 0.048
luhn: Average of Rouge-2 p = 0.027
luhn: Average of Rouge-2 r = 0.739
luhn: Average of Rouge-l F1 = 0.093
luhn: Average of Rouge-l p = 0.054
luhn: Average of Rouge-l r = 0.884


In [214]:
avg_rouge('lex', 6)

lex: Average of Rouge-1 F1 = 0.067
lex: Average of Rouge-1 p = 0.041
lex: Average of Rouge-1 r = 0.865
lex: Average of Rouge-2 F1 = 0.048
lex: Average of Rouge-2 p = 0.029
lex: Average of Rouge-2 r = 0.623
lex: Average of Rouge-l F1 = 0.094
lex: Average of Rouge-l p = 0.056
lex: Average of Rouge-l r = 0.834


In [142]:
! pip install spacy==2.2.2
! pip install transformers==2.11.0
! pip install spacy-transformers[cuda100]==0.5.1
#! pip install spacy==2.1.3
#! pip install transformers==2.2.2

Collecting spacy-transformers[cuda100]==0.5.1
  Downloading spacy-transformers-0.5.1.tar.gz (59 kB)
Collecting transformers<2.1.0,>=2.0.0
  Downloading transformers-2.0.0-py3-none-any.whl (290 kB)


ERROR: Could not find a version that satisfies the requirement torch>=1.0.0 (from spacy-transformers[cuda100]==0.5.1) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.0.0 (from spacy-transformers[cuda100]==0.5.1)


In [148]:
from summarizer import Summarizer

model_bert = Summarizer()

NameError: name 'BertModel' is not defined

In [6]:
sentence_cnt = 8
result = model_bert(text, num_sentences = sentence_cnt - 1)
summary_bert = "".join(result)
pprint(summary_bert)

('Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) '
 'for the three months to December, from $639m year-earlier. TimeWarner said '
 'fourth quarter sales rose 2% to $11.1bn from $10.9bn. But its own internet '
 'business, AOL, had has mixed fortunes. It lost 464,000 subscribers in the '
 'fourth quarter profits were lower than in the preceding three quarters. But '
 'its film division saw profits slump 27% to $284m, helped by box-office flops '
 'Alexander and Catwoman, a sharp contrast to year-earlier, when the third and '
 'final film in the Lord of the Rings trilogy boosted results. Our financial '
 'performance was strong, meeting or exceeding all of our full-year objectives '
 'and greatly enhancing our flexibility," chairman and chief executive Richard '
 'Parsons said. For 2005, TimeWarner is projecting operating earnings growth '
 'of around 5%, and also expects higher revenue and wider profit margins. '
 'TimeWarner is to restate its accounts as pa

In [112]:
import requests
from newspaper import fulltext
from sumy.parsers.html import HtmlParser
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

In [16]:
def bert_summarize(url, num_sentences_out):

    text = fulltext(requests.get(url).text)
    model = Summarizer()
    result = model(text, num_sentences = num_sentences_out)
    summary = "".join(result)
    return summary

In [17]:
def lsa_summarize(url, num_sentences_out):
    # for plain text files
    # text = fulltext(requests.get(url).text)
    summary_lsa = ""
    language = "english"
    parser = HtmlParser.from_url(url, Tokenizer(language))
    # for plain text files
    # parser = PlaintextParser(text, Tokenizer("english"))
    stemmer = Stemmer(language)

    model_lsa = LsaSummarizer(stemmer)
    #model_lsa.stop_words = get_stop_words(language)
    result_lsa = model_lsa(parser.document, num_sentences_out)
    
    for i in result_lsa:
        summary_lsa = summary_lsa + str(i) + " " 
    return summary_lsa

In [18]:
def luhn_summarize(url, num_sentences_out):
    # for plain text files
    #text = fulltext(requests.get(url).text)
    summary_luhn = ""
    language = "english"
    parser = HtmlParser.from_url(url, Tokenizer(language))
    # for plain text files
    #parser = PlaintextParser(text, Tokenizer("english"))
    stemmer = Stemmer(language)

    model_luhn = LuhnSummarizer(stemmer)
    #model_luhn.stop_words = get_stop_words(language)
    result_luhn = model_luhn(parser.document, num_sentences_out)
    
    for i in result_luhn:
        summary_luhn = summary_luhn + str(i) + " "
    return summary_luhn

In [110]:
def lex_summarize(url, num_sentences_out):

    summary_lex = ""
    language = "english"
    parser = HtmlParser.from_url(url, Tokenizer(language))
    # for plain text files
    #parser = PlaintextParser(text, Tokenizer("english"))
    stemmer = Stemmer(language)
    
    model_lex = LexRankSummarizer(stemmer)
    #model_lex.stop_words = get_stop_words(language)
    result_lex = model_lex(parser.document, num_sentences_out)
    
    for i in result_lex:
        summary_lex = summary_lex + str(i) + " "
    return summary_lex

In [20]:
bert_summary = bert_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(bert_summary)

('Auckland (CNN) New Zealand Prime Minister Jacinda Ardern says she is '
 "delaying the country's parliamentary election by four weeks to October 17 "
 'after the reemergence of Covid-19 in the country last week. This comes after '
 "around 100 days without community spread. Ardern said that New Zealand's "
 'Electoral Commission had assured her that a safe and accessible election '
 'would be possible on the new date. "Ultimately I want to ensure we have a '
 'well-run election that gives all voters the best chance to receive all the '
 'information about parties and candidates and delivers certainty for the '
 'future," she said. "In the end what matters most is what is in the best '
 'interests of voters and our democracy," she said. " Any decision to review '
 'the election date must be as free from partisan political interests as '
 'possible." "Even if I had not picked up the phone and contacted anyone, I '
 'believe this is the outcome I would have arrived at," she said. " We ha

In [21]:
lsa_summary = lsa_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(lsa_summary)

('"Ultimately I want to ensure we have a well-run election that gives all '
 'voters the best chance to receive all the information about parties and '
 'candidates and delivers certainty for the future," she said. "Confirmation '
 'of the date provides certainty to the public about when the election will be '
 'held," Chief Electoral Officer Alicia Wright said in a statement. The '
 'commission said that it had always prepared for the election to be run as if '
 'under Alert Level 2 lockdown restrictions, with planned measures including '
 'contact tracing, provision of hand sanitizer and physical distancing. In a '
 'statement after Ardern\'s announcement Tuesday she said: "It was always '
 "National's view that to have a fair, democratic election, we needed to deal "
 'with this second wave of Covid-19 so politicians from all parties had a '
 'reasonable chance to present their policies, and the public felt comfortable '
 'engaging with the campaign without putting their health at r

In [23]:
luhn_summary = luhn_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(luhn_summary)

('Ardern said while the decision to change the election date rested solely '
 'with her as Prime Minister, she consulted with other party leaders as '
 '"moving an election date especially this late in an electoral cycle is a '
 'significant decision." Ardern said New Zealand\'s Electoral Commission had '
 'been preparing for a range of circumstances, such as holding an election in '
 'level two or three lockdown, and that she did not intend to change the '
 'election date again. Leader of the opposition, National Party Leader Judith '
 'Collins, had previously called for the election to be delayed until '
 'November. In a statement after Ardern\'s announcement Tuesday she said: "It '
 "was always National's view that to have a fair, democratic election, we "
 'needed to deal with this second wave of Covid-19 so politicians from all '
 'parties had a reasonable chance to present their policies, and the public '
 'felt comfortable engaging with the campaign without putting their health 

In [113]:
lex_summary = lex_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(lex_summary)

('Ardern said while the decision to change the election date rested solely '
 'with her as Prime Minister, she consulted with other party leaders as '
 '"moving an election date especially this late in an electoral cycle is a '
 'significant decision." "In the end what matters most is what is in the best '
 'interests of voters and our democracy," she said. Ardern said New Zealand\'s '
 'Electoral Commission had been preparing for a range of circumstances, such '
 'as holding an election in level two or three lockdown, and that she did not '
 'intend to change the election date again. "Covid is the world\'s new normal. '
 '"Holding an election during a Covid outbreak has the risk of serious '
 'interference in our democracy. It returned to level one on June 9 , with '
 'border controls remaining in place but most citizens living life as normal '
 '-- until last week. "If we look at the number of new cases today -- given '
 'the extent of testing that has happened over the last few days