This is the implementation of Algorithm 1 found in project paper that calculates the average of Rouge scores of each models using the first 100 BBC business news articles.  In order to perform an experimental comparison with four extractive summarization techniques, we extract six sentences from each article in order to compose the summary. 
1. the value of the evaluation measure F1, Recall, Precision were calculated for each article
2. take the average of those scores to arrive at a consolidated F1, Recall and Precision scores for each Rouge

In [1]:
! pip install bert-extractive-summarizer
! pip install spacy==2.1.3
! pip install transformers==2.2.2
! pip install neuralcoref
! pip install rouge
! pip install sumy
! pip install newspaper3k

Collecting spacy==2.1.3
  Using cached spacy-2.1.3-cp36-cp36m-win_amd64.whl (26.9 MB)
Collecting preshed<2.1.0,>=2.0.1
  Using cached preshed-2.0.1-cp36-cp36m-win_amd64.whl (73 kB)
Collecting blis<0.3.0,>=0.2.2
  Using cached blis-0.2.4-cp36-cp36m-win_amd64.whl (3.1 MB)
Collecting thinc<7.1.0,>=7.0.2
  Using cached thinc-7.0.8-cp36-cp36m-win_amd64.whl (1.9 MB)
Installing collected packages: preshed, blis, thinc, spacy
  Attempting uninstall: preshed
    Found existing installation: preshed 3.0.2
    Uninstalling preshed-3.0.2:
      Successfully uninstalled preshed-3.0.2
  Attempting uninstall: blis
    Found existing installation: blis 0.4.1
    Uninstalling blis-0.4.1:
      Successfully uninstalled blis-0.4.1
  Attempting uninstall: thinc
    Found existing installation: thinc 7.3.1
    Uninstalling thinc-7.3.1:
      Successfully uninstalled thinc-7.3.1
  Attempting uninstall: spacy
    Found existing installation: spacy 2.2.2
    Uninstalling spacy-2.2.2:
      Successfully uninst

ERROR: en-core-web-sm 2.2.5 has requirement spacy>=2.2.2, but you'll have spacy 2.1.3 which is incompatible.


Collecting transformers==2.2.2
  Using cached transformers-2.2.2-py3-none-any.whl (387 kB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 2.11.0
    Uninstalling transformers-2.11.0:
      Successfully uninstalled transformers-2.11.0
Successfully installed transformers-2.2.2


Averaging algorithm on first 100 BBC business news articles dataset using different extractive summarizer

In [2]:
import os,glob
from statistics import mean 
#from summarizer import Summarizer
from rouge import Rouge 
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer 
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


directory = 'BBC News Summary/News Articles/business'
directory_gs = 'BBC News Summary/Summaries/business'

rouge = Rouge()
language = "english"
news_list = []
gs_list = []

for filename in glob.glob(os.path.join(directory, '*.txt')):
    with open(filename, 'r') as f:
        text = f.read()
        news_list.append(text)
        
for filename in glob.glob(os.path.join(directory_gs, '*.txt')):
    with open(filename, 'r') as f:
        text = f.read()
        gs_list.append(text)

In [3]:
def avg_rouge(model, sentence_cnt):
    r1_f = [] 
    r1_p = []
    r1_r = []
    r2_f = []
    r2_p = []
    r2_r = []
    rl_f = []
    rl_p = []
    rl_r = []
    summary = ""
    stemmer = Stemmer(language) 
    if model == 'lsa':
        sumy_model = LsaSummarizer()
        #sumy_model = LsaSummarizer(stemmer)
    elif model == 'luhn':
        sumy_model = LuhnSummarizer()
        #sumy_model = LuhnSummarizer(stemmer)
    elif model == 'lex':
        sumy_model = LexRankSummarizer()
        #sumy_model = LexRankSummarizer(stemmer)
    elif model == 'bert':
        model_bert = Summarizer()
        
    for news, gs in zip(news_list, gs_list):
        if model == 'lsa' or 'luhn' or 'lex':
            parser = PlaintextParser(news, Tokenizer(language))
                   
            #sumy_model.stop_words = get_stop_words(language)
            result = sumy_model(parser.document, sentence_cnt)
            for i in result:
                summary = summary + str(i) + " " 
        elif model == 'bert':
            result = model_bert(news, num_sentences = sentence_cnt)
            summary = "".join(result)   

        scores = rouge.get_scores(summary, gs)
        r1_f.append(scores[0]['rouge-1']['f'])
        r1_p.append(scores[0]['rouge-1']['p'])
        r1_r.append(scores[0]['rouge-1']['r'])
        r2_f.append(scores[0]['rouge-2']['f'])
        r2_p.append(scores[0]['rouge-2']['p'])
        r2_r.append(scores[0]['rouge-2']['r'])
        rl_f.append(scores[0]['rouge-l']['f'])
        rl_p.append(scores[0]['rouge-l']['p'])
        rl_r.append(scores[0]['rouge-l']['r'])

    print(model + ": " + "Average of Rouge-1 F1 =", round(mean(r1_f), 3))  
    print(model + ": " + "Average of Rouge-1 p =", round(mean(r1_p), 3))  
    print(model + ": " + "Average of Rouge-1 r =", round(mean(r1_r), 3))  
    print(model + ": " + "Average of Rouge-2 F1 =", round(mean(r2_f), 3))  
    print(model + ": " + "Average of Rouge-2 p =", round(mean(r2_p), 3))  
    print(model + ": " + "Average of Rouge-2 r =", round(mean(r2_r), 3))  
    print(model + ": " + "Average of Rouge-l F1 =", round(mean(rl_f), 3))  
    print(model + ": " + "Average of Rouge-l p =", round(mean(rl_p), 3))  
    print(model + ": " + "Average of Rouge-l r =", round(mean(rl_r), 3))  

In [4]:
avg_rouge('lsa', 6)

lsa: Average of Rouge-1 F1 = 0.064
lsa: Average of Rouge-1 p = 0.04
lsa: Average of Rouge-1 r = 0.834
lsa: Average of Rouge-2 F1 = 0.04
lsa: Average of Rouge-2 p = 0.025
lsa: Average of Rouge-2 r = 0.54
lsa: Average of Rouge-l F1 = 0.085
lsa: Average of Rouge-l p = 0.051
lsa: Average of Rouge-l r = 0.799


In [5]:
avg_rouge('luhn', 6)

luhn: Average of Rouge-1 F1 = 0.061
luhn: Average of Rouge-1 p = 0.036
luhn: Average of Rouge-1 r = 0.906
luhn: Average of Rouge-2 F1 = 0.048
luhn: Average of Rouge-2 p = 0.027
luhn: Average of Rouge-2 r = 0.739
luhn: Average of Rouge-l F1 = 0.093
luhn: Average of Rouge-l p = 0.054
luhn: Average of Rouge-l r = 0.884


In [6]:
avg_rouge('lex', 6)

lex: Average of Rouge-1 F1 = 0.067
lex: Average of Rouge-1 p = 0.041
lex: Average of Rouge-1 r = 0.865
lex: Average of Rouge-2 F1 = 0.048
lex: Average of Rouge-2 p = 0.029
lex: Average of Rouge-2 r = 0.623
lex: Average of Rouge-l F1 = 0.094
lex: Average of Rouge-l p = 0.056
lex: Average of Rouge-l r = 0.834


In [7]:
! pip install spacy==2.2.2
! pip install transformers==2.11.0
! pip install spacy-transformers[cuda100]==0.5.1
#! pip install spacy==2.1.3
#! pip install transformers==2.2.2

Collecting spacy==2.2.2
  Using cached spacy-2.2.2-cp36-cp36m-win_amd64.whl (9.4 MB)
Collecting thinc<7.4.0,>=7.3.0
  Using cached thinc-7.3.1-cp36-cp36m-win_amd64.whl (2.0 MB)
Collecting blis<0.5.0,>=0.4.0
  Using cached blis-0.4.1-cp36-cp36m-win_amd64.whl (5.0 MB)
Collecting preshed<3.1.0,>=3.0.2
  Using cached preshed-3.0.2-cp36-cp36m-win_amd64.whl (105 kB)
Installing collected packages: blis, preshed, thinc, spacy
  Attempting uninstall: blis
    Found existing installation: blis 0.2.4
    Uninstalling blis-0.2.4:
      Successfully uninstalled blis-0.2.4
  Attempting uninstall: preshed
    Found existing installation: preshed 2.0.1
    Uninstalling preshed-2.0.1:
      Successfully uninstalled preshed-2.0.1
  Attempting uninstall: thinc
    Found existing installation: thinc 7.0.8
    Uninstalling thinc-7.0.8:
      Successfully uninstalled thinc-7.0.8
  Attempting uninstall: spacy
    Found existing installation: spacy 2.1.3
    Uninstalling spacy-2.1.3:
      Successfully uninst

ERROR: Could not find a version that satisfies the requirement torch>=1.0.0 (from spacy-transformers[cuda100]==0.5.1) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.0.0 (from spacy-transformers[cuda100]==0.5.1)


In [8]:
# from summarizer import Summarizer

# model_bert = Summarizer()

In [9]:
# sentence_cnt = 6
# result = model_bert(text, num_sentences = sentence_cnt - 1)
# summary_bert = "".join(result)
# pprint(summary_bert)

In [10]:
import requests
from newspaper import fulltext
from sumy.parsers.html import HtmlParser
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words

In [11]:
# def bert_summarize(url, num_sentences_out):

#     text = fulltext(requests.get(url).text)
#     model = Summarizer()
#     result = model(text, num_sentences = num_sentences_out)
#     summary = "".join(result)
#     return summary

In [12]:
def lsa_summarize(url, num_sentences_out):
    # for plain text files
    # text = fulltext(requests.get(url).text)
    summary_lsa = ""
    language = "english"
    parser = HtmlParser.from_url(url, Tokenizer(language))
    # for plain text files
    # parser = PlaintextParser(text, Tokenizer("english"))
    stemmer = Stemmer(language)

    model_lsa = LsaSummarizer(stemmer)
    #model_lsa.stop_words = get_stop_words(language)
    result_lsa = model_lsa(parser.document, num_sentences_out)
    
    for i in result_lsa:
        summary_lsa = summary_lsa + str(i) + " " 
    return summary_lsa

In [13]:
def luhn_summarize(url, num_sentences_out):
    # for plain text files
    #text = fulltext(requests.get(url).text)
    summary_luhn = ""
    language = "english"
    parser = HtmlParser.from_url(url, Tokenizer(language))
    # for plain text files
    #parser = PlaintextParser(text, Tokenizer("english"))
    stemmer = Stemmer(language)

    model_luhn = LuhnSummarizer(stemmer)
    #model_luhn.stop_words = get_stop_words(language)
    result_luhn = model_luhn(parser.document, num_sentences_out)
    
    for i in result_luhn:
        summary_luhn = summary_luhn + str(i) + " "
    return summary_luhn

In [14]:
def lex_summarize(url, num_sentences_out):

    summary_lex = ""
    language = "english"
    parser = HtmlParser.from_url(url, Tokenizer(language))
    # for plain text files
    #parser = PlaintextParser(text, Tokenizer("english"))
    stemmer = Stemmer(language)
    
    model_lex = LexRankSummarizer(stemmer)
    #model_lex.stop_words = get_stop_words(language)
    result_lex = model_lex(parser.document, num_sentences_out)
    
    for i in result_lex:
        summary_lex = summary_lex + str(i) + " "
    return summary_lex

In [15]:
from pprint import pprint

In [16]:
# bert_summary = bert_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
# pprint(bert_summary)

In [17]:
lsa_summary = lsa_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(lsa_summary)

('Ardern said while the decision to change the election date rested solely '
 'with her as Prime Minister, she consulted with other party leaders as '
 '"moving an election date especially this late in an electoral cycle is a '
 'significant decision." Ardern said New Zealand\'s Electoral Commission had '
 'been preparing for a range of circumstances, such as holding an election in '
 'level two or three lockdown, and that she did not intend to change the '
 "election date again. New Zealand's Parliament will now reconvene Tuesday and "
 'be dissolved on September 6 ahead of the October poll. The commission said '
 'that it had always prepared for the election to be run as if under Alert '
 'Level 2 lockdown restrictions, with planned measures including contact '
 'tracing, provision of hand sanitizer and physical distancing. In a statement '
 'after Ardern\'s announcement Tuesday she said: "It was always National\'s '
 'view that to have a fair, democratic election, we needed to deal 

In [18]:
luhn_summary = luhn_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(luhn_summary)

('"Ultimately I want to ensure we have a well-run election that gives all '
 'voters the best chance to receive all the information about parties and '
 'candidates and delivers certainty for the future," she said. Ardern said '
 'while the decision to change the election date rested solely with her as '
 'Prime Minister, she consulted with other party leaders as "moving an '
 'election date especially this late in an electoral cycle is a significant '
 'decision." Ardern said New Zealand\'s Electoral Commission had been '
 'preparing for a range of circumstances, such as holding an election in level '
 'two or three lockdown, and that she did not intend to change the election '
 "date again. In a statement after Ardern's announcement Tuesday she said: "
 '"It was always National\'s view that to have a fair, democratic election, we '
 'needed to deal with this second wave of Covid-19 so politicians from all '
 'parties had a reasonable chance to present their policies, and the public '

In [19]:
lex_summary = lex_summarize('https://edition.cnn.com/2020/08/16/asia/new-zealand-ardern-election-delay-coronavirus/index.html', 10)
pprint(lex_summary)

("Ardern said that New Zealand's Electoral Commission had assured her that a "
 'safe and accessible election would be possible on the new date. "Ultimately '
 'I want to ensure we have a well-run election that gives all voters the best '
 'chance to receive all the information about parties and candidates and '
 'delivers certainty for the future," she said. Ardern said while the decision '
 'to change the election date rested solely with her as Prime Minister, she '
 'consulted with other party leaders as "moving an election date especially '
 'this late in an electoral cycle is a significant decision." Here in New '
 'Zealand, we are all working as hard as we can to make sure that our new '
 'normal disrupts our lives as little as possible." In a statement after '
 'Ardern\'s announcement Tuesday she said: "It was always National\'s view '
 'that to have a fair, democratic election, we needed to deal with this second '
 'wave of Covid-19 so politicians from all parties had a reasona