# Text Summarization

Summarization condenses a longer document into a shorter version while retaining its core information. When a computer does this, it is called automatic text summarization. The process can be seen as a form of lossy compression, so some information is inevitably lost. It is nonetheless essential for coping with information overload: the abundance of text available on the internet is only useful if it can be summarized effectively.

In [20]:
# Import necessary libraries
# Note: gensim.summarization was removed in Gensim 4.0, so this import requires gensim < 4.0
from gensim.summarization.summarizer import summarize
from nltk.tokenize import sent_tokenize  # needs NLTK's Punkt tokenizer data

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer


### Load Data

In [9]:
with open('../data/text_summary1.txt') as f:
    text = f.read()

print(text)

A vaccine for the coronavirus will likely be ready by early 2021 but rolling it out safely across India's 1.3 billion people will be the country's biggest challenge in fighting its surging epidemic, a leading vaccine scientist told Bloomberg.
India, which is host to some of the front-runner vaccine clinical trials, currently has no local infrastructure in place to go beyond immunizing babies and pregnant women, said Gagandeep Kang, professor of microbiology at the Vellore-based Christian Medical College and a member of the WHO's Global Advisory Committee on Vaccine Safety.
The timing of the vaccine is a contentious subject around the world. In the U.S., President Donald Trump has contradicted a top administration health expert by saying a vaccine would be available by October. In India, Prime Minister Narendra Modi's government had promised an indigenous vaccine as early as mid-August, a claim the government and its apex medical research body has since walked back.


In [16]:
# Count the number of sentences
sentences = sent_tokenize(text)
print(len(sentences))

5


## Extractive

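### Gensim

Gensim's `summarize` function ranks sentences with a variation of TextRank that uses BM25 as the sentence similarity function. The `ratio` argument sets the fraction of the original sentences to keep.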

In [18]:
# Summarize text using gensim
gen_summary = summarize(text, ratio=0.5)
print(gen_summary)

A vaccine for the coronavirus will likely be ready by early 2021 but rolling it out safely across India's 1.3 billion people will be the country's biggest challenge in fighting its surging epidemic, a leading vaccine scientist told Bloomberg.
In India, Prime Minister Narendra Modi's government had promised an indigenous vaccine as early as mid-August, a claim the government and its apex medical research body has since walked back.


### TextRank

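TextRank builds a graph with sentences as nodes and word-overlap similarity as edge weights, then ranks the sentences with PageRank.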
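
The graph-ranking step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not sumy's implementation: the helper names (`word_set`, `overlap_similarity`, `rank_sentences`, `summarize_top_k`), the regex tokenizer, the damping factor of 0.85, and the fixed number of iterations are all choices made for this example. It also uses sets of unique words, which simplifies the word-overlap measure.

```python
import math
import re


def word_set(sentence):
    """Lowercase a sentence and return its unique word tokens (simplified tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def overlap_similarity(words_a, words_b):
    """Shared words, normalized by the log of both sentence lengths."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style power iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    words = [word_set(s) for s in sentences]
    # Edge weights: similarity between every pair of distinct sentences.
    weights = [
        [overlap_similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize_top_k(sentences, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


demo_sentences = [
    "The cat sat on the mat.",
    "The cat lay on the mat all day.",
    "Stock prices fell sharply on Monday.",
]
print(summarize_top_k(demo_sentences, k=1))
```

Sentences that share many words with the rest of the text collect more score, which is why an off-topic sentence tends to rank last.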

In [33]:
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = TextRankSummarizer()

# Keep the 2 highest-ranked sentences
summary = summarizer(parser.document, 2)
text_summary = " ".join(str(sentence) for sentence in summary)

print(text_summary)

A vaccine for the coronavirus will likely be ready by early 2021 but rolling it out safely across India's 1.3 billion people will be the country's biggest challenge in fighting its surging epidemic, a leading vaccine scientist told Bloomberg. India, which is host to some of the front-runner vaccine clinical trials, currently has no local infrastructure in place to go beyond immunizing babies and pregnant women, said Gagandeep Kang, professor of microbiology at the Vellore-based Christian Medical College and a member of the WHO's Global Advisory Committee on Vaccine Safety.


### LexRank
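
LexRank also ranks sentences on a similarity graph, but it measures similarity as the cosine between TF-IDF vectors. The sketch below shows only that similarity measure, not the ranking step and not sumy's code. The function names (`tokenize_words`, `tfidf_vectors`, `cosine_similarity`), the regex tokenizer, and the smoothed IDF formula are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize_words(sentence):
    """Lowercase a sentence and split it into word tokens (simplified tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict of word -> weight) per sentence."""
    tokenized = [tokenize_words(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
            word: count * (math.log((1 + n) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


lex_demo = [
    "The cat sat on the mat.",
    "The cat lay on the mat all day.",
    "Stock prices fell sharply on Monday.",
]
lex_vectors = tfidf_vectors(lex_demo)
print(round(cosine_similarity(lex_vectors[0], lex_vectors[1]), 3))
print(round(cosine_similarity(lex_vectors[0], lex_vectors[2]), 3))
```

Because IDF down-weights words that occur in many sentences, two sentences need to share distinctive words, not just common ones, to count as similar.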

In [26]:
from sumy.summarizers.lex_rank import LexRankSummarizer

summarizer_lex = LexRankSummarizer()

# Summarize using sumy LexRank, keeping the 2 highest-ranked sentences
summary = summarizer_lex(parser.document, 2)

# Join with a space so that sentences do not run together
lex_summary = " ".join(str(sentence) for sentence in summary)

print(lex_summary)

A vaccine for the coronavirus will likely be ready by early 2021 but rolling it out safely across India's 1.3 billion people will be the country's biggest challenge in fighting its surging epidemic, a leading vaccine scientist told Bloomberg. India, which is host to some of the front-runner vaccine clinical trials, currently has no local infrastructure in place to go beyond immunizing babies and pregnant women, said Gagandeep Kang, professor of microbiology at the Vellore-based Christian Medical College and a member of the WHO's Global Advisory Committee on Vaccine Safety.
