# News Summarization

### Scrape the article you want to summarize

In [1]:
from scraping import return_single_article
article = return_single_article('https://www.cnn.com/2020/10/20/politics/joe-biden-tax-plan/index.html')
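The `scraping` module's internals aren't shown here. The sketch below illustrates the kind of dictionary `return_single_article` returns (keys `title`, `authors`, `source`, `article`). It parses already-downloaded HTML with Python's standard-library `html.parser`. The helper `parse_article_html` is hypothetical, and real news pages usually need site-specific handling or a dedicated extraction library.

```python
from html.parser import HTMLParser


class _ArticleParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>, <b>) within a <p> is kept too.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buffer.append(data)


def parse_article_html(html, source=""):
    """Hypothetical helper: returns a dict shaped like return_single_article's output."""
    parser = _ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "authors": "",  # author extraction is site-specific and omitted in this sketch
        "source": source,
        "article": "\n\n".join(parser.paragraphs),
    }
```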

In [4]:
print(article['title'], '\n\n')
print(article['authors'], '\n\n')
print(article['source'], '\n\n')
print(article['article'], '\n\n')

What you need to know about Joe Biden's tax plan 


By Analysis Katie Lobosco 


CNN 


Washington (CNN) Democratic presidential candidate Joe Biden has put forth several proposals that would change the tax code.

In general, he's proposing to raise taxes on the wealthy and on corporations by reversing some of the Republican-backed tax cuts that President Donald Trump signed into law in 2017.

It's unlikely that Biden's campaign plans would come to fruition just as he's proposed them, even if he wins the election. He'd have an easier time getting them passed if Democrats also take back the Senate and maintain control of the House.

Here's what you need to know:

Biden pledges not raise taxes on anyone earning less than $400,000 




### Summarization

In [5]:
from hf_summarizer import bart_summarize

In [6]:
# this function uses the model at https://huggingface.co/sshleifer/distilbart-cnn-12-6,
# a distilled BART model (a sequence-to-sequence transformer) fine-tuned on the CNN/DailyMail dataset
# it's one of the best summarization models available in transformers, and it is fast, which makes it a good fit for deployment
summary = bart_summarize(article['article'])
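The sketch below shows what a function like `bart_summarize` could look like. It assumes the function wraps the Hugging Face `transformers` summarization pipeline with the same model. The name `bart_summarize_sketch`, its `max_length`/`min_length` defaults (in tokens), and the injectable `summarizer` argument are all hypothetical choices for illustration, not the contents of hf_summarizer.py.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_distilbart():
    # Imported lazily so this module can be loaded without transformers installed;
    # the pipeline is built once and cached for later calls.
    from transformers import pipeline
    return pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")


def bart_summarize_sketch(text, summarizer=None, max_length=130, min_length=30):
    """Summarize `text` with distilbart-cnn-12-6, or with an injected callable."""
    summarizer = summarizer or _load_distilbart()
    # The pipeline returns one dict per input, e.g. [{"summary_text": "..."}].
    # truncation=True cuts inputs longer than the model's maximum input length.
    result = summarizer(text, max_length=max_length, min_length=min_length, truncation=True)
    return result[0]["summary_text"]
```

Passing `summarizer` lets you reuse an already-loaded pipeline (or a stub in tests). Otherwise the model is downloaded and loaded on first use.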

In [7]:
summary

" Democratic presidential candidate Joe Biden has put forth several proposals that would change the tax code. He's proposing to raise taxes on the wealthy and on corporations by reversing some of the Republican-backed tax cuts signed into law in 2017. It's unlikely that Biden's campaign plans would come to fruition just as he's proposed them, even if he wins the election."

#### hf_summarizer.py includes many other summarization options, such as several Pegasus models and T5, as well as methods that work around these models' limitations (for example, some of them accept only inputs of limited length)
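One common way to work around the input-length limit is to split the text into sentence-aligned chunks, summarize each chunk, and join the partial summaries. This is not necessarily how hf_summarizer.py does it. The sketch assumes word count is an acceptable proxy for token count. It isn't exact, so `max_words` should sit well below the model's token limit. `split_into_chunks` and `summarize_long_text` are hypothetical names.

```python
import re


def split_into_chunks(text, max_words=400):
    """Group whole sentences into chunks of at most `max_words` words.

    Word count is only a rough proxy for the model's token limit; a single
    sentence longer than `max_words` becomes a chunk on its own.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long_text(text, summarize, max_words=400):
    """Summarize each chunk with the `summarize` callable and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in split_into_chunks(text, max_words))
```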

### Statistical Summarization

In [8]:
from statistical_summarize import run_statistical_summarizers

In [9]:
run_statistical_summarizers(text=article['article'], num_sentences=5)

**********Statistical Summarizations**********


TF IDF Summary:
Washington (CNN) Democratic presidential candidate Joe Biden has put forth several proposals that would change the tax code.

In general, he's proposing to raise taxes on the wealthy and on corporations by reversing some of the Republican-backed tax cuts that President Donald Trump signed into law in 2017.

It's unlikely that Biden's campaign plans would come to fruition just as he's proposed them, even if he wins the election. He'd have an easier time getting them passed if Democrats also take back the Senate and maintain control of the House.

Here's what you need to know:

Biden pledges not raise taxes on anyone earning less than $400,000


Word Frequency Summary:
Washington (CNN) Democratic presidential candidate Joe Biden has put forth several proposals that would change the tax code.

In general, he's proposing to raise taxes on the wealthy and on corporations by reversing some of the Republican-backed tax cuts that

#### These statistical techniques do not use machine learning, but they are very fast and give decent results. Both are extractive: they select sentences from the article rather than generating new text
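The notebook doesn't show how `run_statistical_summarizers` scores sentences. The sketch below is one simple way to build both kinds of extractive summary. It rests on three assumptions: a naive regex sentence splitter, a tiny illustrative stop-word list, and each sentence treated as a "document" when computing IDF. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real implementation would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "it", "that", "for", "by"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def top_sentences(sentences, score, num_sentences):
    """Pick the highest-scoring sentences, returned in their original order."""
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


def word_frequency_summary(text, num_sentences=2):
    """Score a sentence by the mean frequency (across the whole text) of its words."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return top_sentences(sentences, score, num_sentences)


def tfidf_summary(text, num_sentences=2):
    """Treat each sentence as a document; score it by the mean TF-IDF weight of its words."""
    sentences = split_sentences(text)
    n = len(sentences)
    # Document frequency: the number of sentences each word appears in.
    df = Counter(w for s in sentences for w in set(content_words(s)))

    def score(sentence):
        words = content_words(sentence)
        if not words:
            return 0.0
        tf = Counter(words)
        return sum(count * math.log(n / df[w]) for w, count in tf.items()) / len(words)

    return top_sentences(sentences, score, num_sentences)
```

Consider the text `"Taxes rise. Taxes fall. Cats sleep."` with `num_sentences=1`. Word frequency picks "Taxes rise." because "taxes" is the most repeated word. TF-IDF picks "Cats sleep." because "taxes" appears in most sentences and so gets a low IDF weight.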

### Sentiment Analysis

In [10]:
from sentiment_analysis import hf_topn_sentiment

In [11]:
# this function returns the top positive and top negative sentences from a piece of text;
# it is not summarization as such, but it surfaces the most polarizing lines of an article
top_positive, top_negative = hf_topn_sentiment(article['article'])
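The sketch below shows how a function like `hf_topn_sentiment` might work. It assumes the function runs a `transformers` sentiment-analysis pipeline over individual sentences and keeps the highest-scoring POSITIVE and NEGATIVE ones. `topn_sentiment_sketch`, its `n` parameter, and the regex sentence splitter are hypothetical. This version sorts from highest to lowest score, which need not match the order `hf_topn_sentiment` returns.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_sentiment_pipeline():
    # Imported lazily; loads the pipeline's default English sentiment model once.
    from transformers import pipeline
    return pipeline("sentiment-analysis")


def topn_sentiment_sketch(text, n=2, classifier=None):
    """Return (top_positive, top_negative) lists of (score, sentence) pairs."""
    classifier = classifier or _load_sentiment_pipeline()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Called on a list, the pipeline returns one {"label": ..., "score": ...} dict per sentence.
    results = classifier(sentences)
    positive = [(r["score"], s) for r, s in zip(results, sentences) if r["label"] == "POSITIVE"]
    negative = [(r["score"], s) for r, s in zip(results, sentences) if r["label"] == "NEGATIVE"]
    # Highest-confidence sentences first.
    positive.sort(reverse=True)
    negative.sort(reverse=True)
    return positive[:n], negative[:n]
```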

In [12]:
top_positive

[(0.5798031091690063,
  "In general, he's proposing to raise taxes on the wealthy and on corporations by reversing some of the Republican-backed tax cuts that President Donald Trump signed into law in 2017."),
 (0.9804840087890625,
  'Washington (CNN) Democratic presidential candidate Joe Biden has put forth several proposals that would change the tax code.')]

In [13]:
top_negative

[(0.9953889846801758,
  "It's unlikely that Biden's campaign plans would come to fruition just as he's proposed them, even if he wins the election."),
 (0.999197244644165,
  "He'd have an easier time getting them passed if Democrats also take back the Senate and maintain control of the House.")]

### There are several other useful methods in this repo, such as subjectivity analysis and a plagiarism checker. All of this is wrapped in an easy-to-use web app in my `jweissenberger/newsletter` repo