# Text Summarization

Text summarization is the task of shortening a long piece of text into a coherent, fluent summary that keeps only the main points of the document.
Automatic text summarization is a common problem in machine learning and natural language processing (NLP).

There are broadly two approaches to text summarization:

* Extractive Summarization
* Abstractive Summarization


In this notebook, we will build an extraction-based text summarizer using Python.

### Extractive Summarization
The name gives away what this approach does: we identify the important sentences or phrases in the original text and extract only those. The extracted sentences form the summary. The diagram below illustrates extractive summarization:

![Extractive Summarization](images/Extractive_Summarization.png)
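
To make the idea concrete, here is a minimal sketch of an extractive summarizer that scores each sentence by the average frequency of its words and keeps the top `n`. It is not how PyTeaser works; the function names, the tiny stop-word list, and the regex-based sentence splitter are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
              "that", "on", "for", "was"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(text, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore document order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:n])]
```

Calling `extractive_summary(article_text, n=3)` returns three sentences copied verbatim from the input, which is the defining property of the extractive approach.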

# PyTeaser

PyTeaser takes any news article and extracts a brief summary from it. It's based on the original Scala project [TextTeaser](https://github.com/MojoJolo/textteaser).

TextTeaser is an automatic summarization algorithm that combines natural language processing and machine learning techniques.

Summaries are created by ranking the sentences in a news article according to how relevant they are to the entire text. The top 5 sentences form the summary. Each sentence is ranked on four criteria:

* Relevance to the title
* Relevance to keywords in the article
* Position of the sentence
* Length of the sentence
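
The four criteria above can be combined into a single score per sentence. The sketch below shows one way to do that; the individual formulas, the equal weighting, and the `ideal=20` word length are assumptions made for illustration and are not taken from PyTeaser's source.

```python
import re


def words(s):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", s.lower())


def title_score(sentence_words, title_words):
    """Fraction of distinct title words that appear in the sentence."""
    if not title_words:
        return 0.0
    title_set = set(title_words)
    return len(set(sentence_words) & title_set) / len(title_set)


def keyword_score(sentence_words, keyword_set):
    """Fraction of sentence words that are article keywords."""
    if not sentence_words:
        return 0.0
    return sum(w in keyword_set for w in sentence_words) / len(sentence_words)


def position_score(index, total):
    """Assumed heuristic: earlier sentences score higher."""
    return 1.0 - index / total


def length_score(sentence_words, ideal=20):
    """1.0 at the assumed ideal length, falling linearly to 0."""
    return max(0.0, 1.0 - abs(ideal - len(sentence_words)) / ideal)


def rank_sentences(title, sentences, keyword_set, top_n=5):
    """Return the top_n sentences by the equally weighted mean of the four scores."""
    title_words = words(title)
    scored = []
    for i, s in enumerate(sentences):
        w = words(s)
        total = (title_score(w, title_words)
                 + keyword_score(w, keyword_set)
                 + position_score(i, len(sentences))
                 + length_score(w)) / 4
        scored.append((total, i, s))
    # Highest score first; ties broken by position in the article.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [s for _, _, s in scored[:top_n]]
```

When an article has no more than `top_n` sentences, every sentence is returned, so ranking only shortens longer articles.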

# Text Summarization using PyTeaser
This notebook uses the Python 3 version of PyTeaser: https://github.com/alanbuxton/PyTeaserPython3

In [None]:
# Get the pyteaser.py file from github

# import package

In [13]:
from pyteaser import SummarizeUrl, Summarize, keywords
import pandas as pd

In [8]:
# ! wget http://mlg.ucd.ie/files/datasets/bbc-fulltext.zip
# ! unzip bbc-fulltext.zip

In [28]:
text = "AgustaWestland chopper scam co-accused Rajiv Saxena was extradited to India from UAE on Wednesday. He had been evading the Enforcement Directorate's summons claiming he was suffering from leukaemia but had moved an anti-money laundering court for anticipatory bail in December, stating he had never been summoned at his Dubai address. Saxena's lawyers alleged he had been illegally extradited. "

In [29]:
title = "AgustaWestland scam accused Rajiv Saxena extradited to India"

In [23]:
# Summarize takes the title first and the article text second. With the
# arguments swapped, PyTeaser treats the title as the article and returns it.
summaries = Summarize(title, text)

In [24]:
print(summaries)


In [26]:
# url = 'http://www.huffingtonpost.com/2013/11/22/twitter-forward-secrecy_n_4326599.html'
# summaries = SummarizeUrl(url)
# print(summaries)