# TEXT SUMMARY USING NLTK

## _MINOR PROJECT BY RITBIK BHARTI_

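```python
# IMPORT IMPORTANT LIBRARIES
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from string import punctuation

# Download the NLTK data used below (tokenizer models, stop words, WordNet);
# 'punkt_tab' is the tokenizer resource used by recent NLTK releases
for resource in ['punkt', 'punkt_tab', 'stopwords', 'wordnet']:
    nltk.download(resource, quiet=True)
```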

```python
text = """
On 23 November 2008, the first public acknowledgement of an unmanned mission to Mars was announced by
then-ISRO chairman G. Madhavan Nair.The MOM mission concept began with a feasibility study in 2010 by
the Indian Institute of Space Science and Technology after the launch of lunar satellite Chandrayaan-1
in 2008. Prime Minister Manmohan Singh approved the project on 3 August 2012, after the Indian Space
Research Organisation completed ₹125 crore (US$18 million) of required studies for the orbiter.The
total project cost may be up to ₹454 crore (US$66 million). The satellite costs ₹153 crore (US$22 million)
and the rest of the budget has been attributed to ground stations and relay upgrades that will be used for
other ISRO projects. The space agency had planned the launch on 28 October 2013 but was postponed to 5 November
following the delay in ISRO's spacecraft tracking ships to take up pre-determined positions due to poor
weather in the Pacific Ocean. Launch opportunities for a fuel-saving Hohmann transfer orbit occur every 26
months, in this case the next two would be in 2016 and 2018. Assembly of the PSLV-XL launch vehicle, designated
C25, started on 5 August 2013. The mounting of the five scientific instruments was completed at Indian Space
Research Organisation Satellite Centre, Bengaluru, and the finished spacecraft was shipped to Sriharikota on
2 October 2013 for integration to the PSLV-XL launch vehicle. The satellite's development was fast-tracked
and completed in a record 15 months. Despite the US federal government shutdown, NASA reaffirmed on 5 October
2013 it would provide communications and navigation support to the mission. During a meeting on 30 September
2014, NASA and ISRO officials signed an agreement to establish a pathway for future joint missions to explore Mars.
One of the working group's objectives will be to explore potential coordinated observations and science analysis
between the MAVEN orbiter and MOM, as well as other current and future Mars missions.
"""
```

```python
# CREATE SENTENCES
sentences = sent_tokenize(text)
```
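`sent_tokenize` expects whitespace after sentence-ending punctuation. In the sample text, "Nair.The" and "orbiter.The" have no space after the full stop, so each of those pairs is likely to come back as a single sentence (the printed summary at the end shows the first case). A minimal pre-processing sketch, assuming a simple regex heuristic is good enough for this text; `fix_missing_spaces` is a hypothetical helper, not part of NLTK:

```python
import re

# Hypothetical helper: insert a space when a lowercase letter is followed by
# a full stop and then an uppercase letter (e.g. "Nair.The" -> "Nair. The").
# Abbreviations such as "U.S." and decimals such as "3.5" are left unchanged.
_MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Z])")


def fix_missing_spaces(text):
    return _MISSING_SPACE.sub(". ", text)


sample = "announced by then-ISRO chairman G. Madhavan Nair.The MOM mission"
print(fix_missing_spaces(sample))
# → announced by then-ISRO chairman G. Madhavan Nair. The MOM mission
```

With this helper one could call `sent_tokenize(fix_missing_spaces(text))` instead. The heuristic is deliberately crude: any lowercase-dot-uppercase sequence, such as a file name like `report.PDF`, would be split as well.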

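```python
# SET STOPWORDS INCLUDING PUNCTUATION
# This rebinds the name `stopwords` (the nltk.corpus module imported above) to a set,
# so run the import cell again before re-running this one.
stopwords = set(stopwords.words('english') + list(punctuation))
```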
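Adding `punctuation` to the stop-word set means single-character tokens such as `,`, `.` and `(` are discarded together with common words, so they never receive a weight. The effect can be seen in isolation with a tiny, hypothetical stop list that stands in for NLTK's English list, so the snippet runs without downloading any corpus:

```python
from string import punctuation

# Hypothetical mini stop list standing in for stopwords.words('english'),
# extended with every single punctuation character
stop_set = set(["the", "of", "to", "an", "was", "by"] + list(punctuation))

tokens = ["the", "first", "public", "acknowledgement", "of", "an",
          "unmanned", "mission", "to", "mars", ",", "announced", "."]

# Keep only tokens that are neither stop words nor punctuation
content = [t for t in tokens if t not in stop_set]
print(content)
# → ['first', 'public', 'acknowledgement', 'unmanned', 'mission', 'mars', 'announced']
```

Multi-character tokens that `word_tokenize` can emit, such as `''` or `...`, are not single punctuation characters and would survive this filter.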

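### Now we'll tokenise the text into words (lemmatising each one with WordNet, so that e.g. "missions" and "mission" count as the same term) and give every word a weight. This weight is our metric for how important a word is in the corpus. Here, we'll use TF-IDF, a metric that takes into account a term's frequency both in a single document and across the whole corpus: a term gets a high TF-IDF score when it is frequent in one document but rare in the rest of the corpus. Because the vectorizer here is fitted on a single document (the whole text), every term receives the same IDF, so the scores effectively become normalised term frequencies.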
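To make the definition concrete, here is a from-scratch sketch of TF-IDF weighting. It assumes scikit-learn's default `TfidfVectorizer` settings (raw term counts, smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1 where n is the number of documents and df(t) the number of documents containing t, and L2-normalised rows) and a naive whitespace tokenizer; `tfidf_weights` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return (idf, weights) for a list of documents.

    Mirrors scikit-learn's TfidfVectorizer defaults: raw term counts,
    smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalised rows.
    Tokenisation here is a naive lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears
    # (dict.fromkeys de-duplicates while keeping first-seen order)
    df = Counter(term for tokens in tokenized for term in dict.fromkeys(tokens))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    weights = []
    for tokens in tokenized:
        raw = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        weights.append({t: v / norm for t, v in raw.items()})
    return idf, weights


corpus = ["mars orbiter mission", "lunar orbiter mission", "mars rover"]
idf, weights = tfidf_weights(corpus)
print(round(idf["orbiter"], 4), round(idf["lunar"], 4))
# → 1.2877 1.6931

# With a single document every term has df = n = 1, so idf is 1 for all terms
single_idf, _ = tfidf_weights(["mars mission mission"])
print(single_idf)
# → {'mars': 1.0, 'mission': 1.0}
```

In the second document, "lunar" (which appears in only one document) ends up with a higher weight than "orbiter" (which appears in two), which is exactly the "frequent here, rare elsewhere" behaviour described above.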

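```python
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.wordnet import WordNetLemmatizer

# Create the lemmatizer once instead of once per token
lemmatizer = WordNetLemmatizer()

def tokenize(text):
    # Split into word tokens and reduce each one to its WordNet lemma
    tokens = nltk.word_tokenize(text)
    return [lemmatizer.lemmatize(token) for token in tokens]

# token_pattern=None: the custom tokenizer replaces the default regex.
# stop_words must be a list (recent scikit-learn versions reject a set).
tfidf = TfidfVectorizer(tokenizer=tokenize,
                        token_pattern=None,
                        stop_words=list(stopwords))
tfs = tfidf.fit_transform([text])

# FREQUENCIES: map each vocabulary term to its TF-IDF weight in the text
freqs = {}

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = tfidf.get_feature_names_out()
for col in tfs.nonzero()[1]:
    freqs[feature_names[col]] = tfs[0, col]
```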

### Now we calculate which sentences are the most important, based on the important words they contain: each sentence's score is the sum of the TF-IDF weights of its words.
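The scoring rule is simply a per-sentence sum of word weights. A toy sketch with made-up weights (the `weights` dict and the three sentences below are hypothetical, not output from this notebook):

```python
from collections import defaultdict

# Hypothetical word weights, standing in for the freqs dict built from TF-IDF
weights = {"mars": 0.4, "mission": 0.3, "orbiter": 0.2}

sentences = [
    "the mission to mars",
    "the orbiter reached mars on its mission",
    "it was a record",
]

scores = defaultdict(float)
for i, sentence in enumerate(sentences):
    for token in sentence.split():
        # Words without a weight (stop words, unknown words) add nothing
        scores[i] += weights.get(token, 0.0)

print({i: round(s, 2) for i, s in scores.items()})
# → {0: 0.7, 1: 0.9, 2: 0.0}
```

Because scores are summed, a longer sentence containing more weighted words tends to outrank a shorter one; dividing each score by the sentence's token count is one possible alternative if that bias is unwanted.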

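```python
from collections import defaultdict

important_sentences = defaultdict(int)

# Score each sentence as the sum of the TF-IDF weights of its words.
# Use the same tokenize() (lowercase + lemmatize) as the vectorizer so that
# tokens such as "missions" match the vocabulary entry "mission".
for i, sentence in enumerate(sentences):
    for token in tokenize(sentence.lower()):
        if token in freqs:
            important_sentences[i] += freqs[token]
```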

### The last step is to build the summary. An important decision here is how many sentences the summary should contain. I have decided to build a summary with 10% of the original number of sentences.
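The selection step can be sketched on its own: take the `k` highest-scoring sentence indices with `heapq.nlargest`, then sort them so the summary keeps the original sentence order. The `summarize` helper and its `max(1, ...)` floor are assumptions added here, so that a text with fewer than ten sentences still yields one sentence (`int(n * 0.10)` alone would give 0); the sentences and scores are made up.

```python
from heapq import nlargest


def summarize(sentences, scores, ratio=0.10):
    """Hypothetical helper: keep the top `ratio` share of sentences
    (at least one), returned in their original order."""
    k = max(1, int(len(sentences) * ratio))
    top = nlargest(k, scores, key=scores.get)  # indices of the k best scores
    return [sentences[i] for i in sorted(top)]


sentences = [f"Sentence {i}." for i in range(12)]
scores = dict(enumerate([0.1, 0.9, 0.2, 0.3, 0.1, 0.5,
                         0.8, 0.1, 0.2, 0.1, 0.4, 0.7]))

print(summarize(sentences, scores))
# → ['Sentence 1.']
print(summarize(sentences, scores, ratio=0.25))
# → ['Sentence 1.', 'Sentence 6.', 'Sentence 11.']
```

Sorting the selected indices is what makes the summary read in document order rather than in score order.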

```python
from heapq import nlargest

# Choose 10% of the sentences to show (at least one, so that
# short texts still produce a summary)
number_sentences = max(1, int(len(sentences) * 0.10))

# Create an index with the most important sentences
index_important_sentences = nlargest(number_sentences,
                                     important_sentences,
                                     key=important_sentences.get)

# Create summary, keeping the sentences in their original order
print('\nSummary:\n')
for i in sorted(index_important_sentences):
    print(sentences[i] + '\n')
```

```
Summary:

On 23 November 2008, the first public acknowledgement of an unmanned mission to Mars was announced by
then-ISRO chairman G. Madhavan Nair.The MOM mission concept began with a feasibility study in 2010 by
the Indian Institute of Space Science and Technology after the launch of lunar satellite Chandrayaan-1
in 2008.
```

