# News Summarization Approaches for NLP

Text summarization in NLP is the process of summarizing the information in large texts for quicker consumption. In this notebook, I will walk you through the traditional extractive as well as the advanced generative methods to implement News summarization.

When you open news sites, do you just start reading every news article? Probably not. We typically glance the short news summary and then read more details if interested. Short, informative summaries of the news is now everywhere like magazines, news aggregator apps, research sites, etc.
It is essential for the summary to be a fluent, continuous and depict the significant.

Text summarization methods can be grouped into two main categories: 
- Extractive 
- Abstractive 





# import necessary libraries


In [1]:
## for uploading data
import pandas as pd
import os
from pathlib import Path


## for uploadinf data
import pandas as pd
import os
import numpy as np


## for plotting
import matplotlib.pyplot as plt
import seaborn as sns

## import necessary nlp libraries
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
lem = WordNetLemmatizer()
from nltk import sent_tokenize


from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

# TextRank 
from sumy.summarizers.text_rank import TextRankSummarizer

# LexRank
from sumy.summarizers.lex_rank import LexRankSummarizer

#BART 
from transformers import BartTokenizer, BartForConditionalGeneration
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

import torch

# Evaluation
from rouge import Rouge
rouge = Rouge()

#Our Application
from app import Summarizer
from utils import Article


# Main :



## 1) Get the Data


#### a ) Loading the Data from multi documents

In [2]:
DATA_PATH="./dataset/BBC/BBC News Summary/"
newsDF=Article.loading(DATA_PATH)
newsDF[['Article','Summary']].head()

Unnamed: 0,Article,Summary
0,"EU-US seeking deal on air dispute , The EU and...",Both sides hope to reach a negotiated deal ove...
1,"Trade gap narrows as exports rise , The UK's t...",Overall UK exports - including both goods and ...
2,"Economy 'strong' in election year , UK busines...",The BDO optimism index - a leading indicator o...
3,"Brazil approves bankruptcy reform , A major re...","The new legislation changes this, giving prior..."
4,"Yangtze Electric's profits double , Yangtze El...","Yangtze Electric Power, the operator of China'..."


#### b) Extract text from URL

In [3]:
#  Extract text from URL
URL="https://assets.msn.com/labs/mind/AAJwoxD.html"
textFromUrl=Article.scraper(URL)
textFromUrl

'There won’t be a chill down to your bones this Halloween in Orlando, unless you count the sweat dripping from your armpits.\nHalloween temperatures are supposed to come near or tie the record for the hottest Halloween in Orlando, but the month of October has already beaten the record for the hottest October ever recorded in the City Beautiful, according to the National Weather Service.\nThe record to beat was an average of 80.2 degrees; with two days remaining 2019′s October is on track to record 80.9 degrees, said NWS meteorologist Derrick Weitlich.\n“Yeah with just two days left, there’s no way it won’t break the record. This October has been above normal,” Weitlich said.\nDaytona and Vero Beach are also expected to hit record breaking months.\nSo why is it so hot? Was Central Florida cursed by a coven of witches?\nThe answer is less magical and more meteorological as a ridge, or an area of blocking high pressure, has been sitting over Florida preventing frontal passages of cooler a

## 2 ) Clean the Data

In [4]:
article=Article.cleaningArticle("article",newsDF['Article'][0])
summary=Article.cleaningArticle("summary",newsDF['Summary'][0])
print("Article :\n\n",article,"\n","-"*115," Our Ref Summary :\n\n",summary)

Article :

 the eu and us have agreed to begin talks on ending subsidies given to aircraft makers , eu trade commissioner peter mandelson has announced . , both sides hope to reach a negotiated deal over state aid received by european aircraft maker airbus and its us rival boeing , mr mandelson said . airbus and boeing accuse each other of benefiting from illegal subsidies . mr mandelson said the eu and us hoped to avoid having to resolve the dispute at the world trade organisation wto . , with this agreement the eu and us have confirmed their willingness to resolve the dispute which has arisen between them , mr mandelson said . i hope our negotiations in the next three months will lead to an agreement ending subsidies to development and production of large civil aircraft . last year , the us terminated an agreement with the eu , reached in 1992 , which limits the subsidies countries can hand over to civil aircraft makers . the us filed a complaint against brussels with the wto over st

# 3) Text Summarization

### a)  Extractive

####  I ) TextRank

In [5]:
#Applying TextRank summarization on data

TextRankSummary=Summarizer.extracitve(article,TextRankSummarizer,3)

In [6]:

print("The Whole Article : \n------------------\n",article)
print("------------------------------------------------------------------------------------------------------------------")
print("Summary (TextRank) : \n------------------\n",TextRankSummary)
print("------------------------------------------------------------------------------------------------------------------")
print("Our Summary Ref : \n------------------\n",summary)


The Whole Article : 
------------------
 the eu and us have agreed to begin talks on ending subsidies given to aircraft makers , eu trade commissioner peter mandelson has announced . , both sides hope to reach a negotiated deal over state aid received by european aircraft maker airbus and its us rival boeing , mr mandelson said . airbus and boeing accuse each other of benefiting from illegal subsidies . mr mandelson said the eu and us hoped to avoid having to resolve the dispute at the world trade organisation wto . , with this agreement the eu and us have confirmed their willingness to resolve the dispute which has arisen between them , mr mandelson said . i hope our negotiations in the next three months will lead to an agreement ending subsidies to development and production of large civil aircraft . last year , the us terminated an agreement with the eu , reached in 1992 , which limits the subsidies countries can hand over to civil aircraft makers . the us filed a complaint against 

In [7]:
pd.DataFrame(Summarizer.summaryScore(TextRankSummary,summary,Avg=True))

Unnamed: 0,rouge-1,rouge-2,rouge-l
r,0.638095,0.532051,0.638095
p,0.985294,0.932584,0.985294
f,0.774566,0.677551,0.774566


####  II ) LexRank

In [8]:
#Applying LexRank on data
LexRankSummary=Summarizer.extracitve(article,LexRankSummarizer,3)

In [9]:

print("The Whole Article : \n------------------\n",article)
print("------------------------------------------------------------------------------------------------------------------")
print("Summary (LexRank) : \n------------------\n",LexRankSummary)
print("------------------------------------------------------------------------------------------------------------------")
print("Our Summary Ref : \n------------------\n",summary)


The Whole Article : 
------------------
 the eu and us have agreed to begin talks on ending subsidies given to aircraft makers , eu trade commissioner peter mandelson has announced . , both sides hope to reach a negotiated deal over state aid received by european aircraft maker airbus and its us rival boeing , mr mandelson said . airbus and boeing accuse each other of benefiting from illegal subsidies . mr mandelson said the eu and us hoped to avoid having to resolve the dispute at the world trade organisation wto . , with this agreement the eu and us have confirmed their willingness to resolve the dispute which has arisen between them , mr mandelson said . i hope our negotiations in the next three months will lead to an agreement ending subsidies to development and production of large civil aircraft . last year , the us terminated an agreement with the eu , reached in 1992 , which limits the subsidies countries can hand over to civil aircraft makers . the us filed a complaint against 

In [10]:
pd.DataFrame(Summarizer.summaryScore(article,summary, Avg=True))

Unnamed: 0,rouge-1,rouge-2,rouge-l
r,0.961905,0.935897,0.961905
p,0.528796,0.447853,0.528796
f,0.682432,0.605809,0.682432


## b) Abstractive 

####  BART Transformer

In [11]:
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-xsum")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-xsum")

In [12]:
BARTSummary=Summarizer.abstractive(article,tokenizer,model)

  next_indices = next_tokens // vocab_size


In [13]:

print("The Whole Article : \n------------------\n",article)
print("------------------------------------------------------------------------------------------------------------------")
print("BARTSummary : \n------------------\n",BARTSummary)
print("------------------------------------------------------------------------------------------------------------------")
print("Our Summary Ref : \n------------------\n",summary)


The Whole Article : 
------------------
 the eu and us have agreed to begin talks on ending subsidies given to aircraft makers , eu trade commissioner peter mandelson has announced . , both sides hope to reach a negotiated deal over state aid received by european aircraft maker airbus and its us rival boeing , mr mandelson said . airbus and boeing accuse each other of benefiting from illegal subsidies . mr mandelson said the eu and us hoped to avoid having to resolve the dispute at the world trade organisation wto . , with this agreement the eu and us have confirmed their willingness to resolve the dispute which has arisen between them , mr mandelson said . i hope our negotiations in the next three months will lead to an agreement ending subsidies to development and production of large civil aircraft . last year , the us terminated an agreement with the eu , reached in 1992 , which limits the subsidies countries can hand over to civil aircraft makers . the us filed a complaint against 

In [14]:
pd.DataFrame(Summarizer.summaryScore(BARTSummary,summary, Avg=True))


Unnamed: 0,rouge-1,rouge-2,rouge-l
r,0.342857,0.179487,0.333333
p,0.521739,0.282828,0.507246
f,0.413793,0.219608,0.402299
