# Extractive Text Summarization

* Select the sentences from a document that best represent its content and arrange them to form a summary.

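* Uses an unsupervised approach based on cosine similarity. Cosine similarity is the cosine of the angle between two vectors: it is 1 when the vectors point in the same direction (for example, when they are identical) and 0 when they are orthogonal. The matching cosine distance, 1 minus the similarity, is 0 for identical vectors.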
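
As a quick check of those values, here is a minimal sketch that computes cosine similarity directly with NumPy. The helper name `cosine_similarity` is only for illustration and is not used elsewhere in this notebook.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

same = cosine_similarity([1, 2, 0], [2, 4, 0])        # same direction
orthogonal = cosine_similarity([1, 0, 0], [0, 1, 0])  # no shared component
print(same, orthogonal)
```

Vectors pointing the same way score (up to rounding) 1, and vectors with nothing in common score 0.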


In [0]:
from nltk.cluster.util import cosine_distance
from nltk.corpus import stopwords
import numpy as np
import networkx as nx

In [0]:
maintext='''
In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills.\n
Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services.\n
As part of the program, the Redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT Hub for the selected campuses.\n
The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning.\n
According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, "With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset.\n
This will require more collaborations and training and working with AI.\n
That’s why it has become more critical than ever for educational institutions to integrate new cloud and AI technologies.\n
The program is an attempt to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of tomorrow.\n
" The program aims to build up the cognitive skills and in-depth understanding of developing intelligent cloud connected solutions for applications across industry.\n
Earlier in April this year, the company announced Microsoft Professional Program In AI as a learning track open to the public.\n
The program was developed to provide job ready skills to programmers who wanted to hone their skills in AI and data science with a series of online courses which featured hands-on labs and expert instructors as well.\n
This program also included developer-focused AI school that provided a bunch of assets to help build AI skills.\n
'''

In [39]:
sents=maintext.split("\n")
sents[:5]

['',
 'In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has been launched to empower the next generation of students with AI-ready skills.',
 '',
 'Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services.',
 '']
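
The empty strings appear because every line of `maintext` ends with a `\n` escape followed by a real line break, so each sentence is separated by two newlines. One way to drop the blanks at split time is sketched below, with a short made-up string standing in for `maintext`:

```python
text = "First sentence.\n\nSecond sentence.\n\n"  # hypothetical stand-in for maintext

# Keep only non-blank lines, trimming surrounding whitespace.
sentences = [line.strip() for line in text.split("\n") if line.strip()]
print(sentences)
```

The notebook instead removes the blanks later, when it filters out short sentences.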

In [0]:
import re
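def cleaner(sentence):
  # Keep runs of ASCII letters and spaces; digits and punctuation are dropped.
  # The range must be A-Z: 'A-z' would also match the characters [ \ ] ^ _ `
  tokens=re.findall(r'[a-z A-Z]+',sentence)
  return ' '.join([token.lower() for token in tokens])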
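
The character class in `cleaner` includes a space, so each match is a run of letters and spaces, and removed punctuation or digits leave doubled spaces behind. If single-spaced output is preferred, a word-only pattern is one option. This is a sketch, and `clean_sentence` is a hypothetical name rather than part of the notebook:

```python
import re

def clean_sentence(sentence):
    """Lowercase a sentence and keep only runs of ASCII letters."""
    tokens = re.findall(r"[a-zA-Z]+", sentence)
    return " ".join(token.lower() for token in tokens)

cleaned = clean_sentence("AI-ready skills, around 100 institutions!")
print(cleaned)  # ai ready skills around institutions
```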

In [0]:
cleaned_sentences=list(map(cleaner,sents))

In [42]:
cleaned_sentences[:20]

['',
 'in an attempt to build an ai ready workforce  microsoft announced intelligent cloud hub which has been launched to empower the next generation of students with ai ready skills',
 '',
 'envisioned as a three year collaborative program  intelligent cloud hub will support around   institutions with ai infrastructure  course content and curriculum  developer support  development tools and give students access to cloud and ai services',
 '',
 'as part of the program  the redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in india with the program will set up the core ai infrastructure and iot hub for the selected campuses',
 '',
 'the company will provide ai development tools and azure ai services such as microsoft cognitive services  bot services and azure machine learning',
 '',
 'according to manish prakash  country general manager ps  health and education  microsoft india  said   with ai being the defining technology of our time  i

In [43]:
len(cleaned_sentences)

26

In [0]:
cleaned_sentences=[sent for sent in cleaned_sentences if len(sent.split(' '))>=5]
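
Because the cleaned sentences can contain doubled spaces, `split(' ')` counts the empty strings between them as words, so the `>=5` filter is slightly more lenient than it looks. A small illustration on a made-up string:

```python
sentence = "ai ready  skills"  # two spaces after "ready"

by_single_space = sentence.split(' ')  # the doubled space yields an empty token
by_whitespace = sentence.split()       # runs of whitespace are collapsed

print(len(by_single_space), len(by_whitespace))  # 4 3
```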

In [45]:
cleaned_sentences[:10]

['in an attempt to build an ai ready workforce  microsoft announced intelligent cloud hub which has been launched to empower the next generation of students with ai ready skills',
 'envisioned as a three year collaborative program  intelligent cloud hub will support around   institutions with ai infrastructure  course content and curriculum  developer support  development tools and give students access to cloud and ai services',
 'as part of the program  the redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in india with the program will set up the core ai infrastructure and iot hub for the selected campuses',
 'the company will provide ai development tools and azure ai services such as microsoft cognitive services  bot services and azure machine learning',
 'according to manish prakash  country general manager ps  health and education  microsoft india  said   with ai being the defining technology of our time  it is transforming lives a

In [46]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [0]:
stopwords_list=stopwords.words('english')

In [0]:
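def sentence_similarity(sent1,sent2):
  # NOTE: sent1 and sent2 are plain strings here, so the loops below iterate
  # over characters, not words; the score reflects shared letters and spaces.
  all_words=list(set(sent1+sent2))
  vector1=[0]*len(all_words)
  vector2=[0]*len(all_words)
  # count vector for the first sentence, skipping entries in stopwords_list
  for w in sent1:
    if w not in stopwords_list:
      vector1[all_words.index(w)]+=1
  # count vector for the second sentence
  for w in sent2:
    if w not in stopwords_list:
      vector2[all_words.index(w)]+=1
  return 1 - cosine_distance(vector1, vector2)  # similarity = 1 - cosine distance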

In [49]:
sentence_similarity('hi my name is hamza','hi my name is hamza')

1.0000000000000002

In [50]:
sentence_similarity('I like cats','Donald Trump is the president of the united states')

0.6750123615163116
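
A score of about 0.68 for two unrelated sentences follows from the inputs being strings: iterating over a string yields characters, so the vectors count letters and spaces rather than words. A word-level variant might look like the sketch below. It assumes whitespace tokenization and lowercasing, uses a tiny hypothetical stopword set in place of NLTK's list, and `word_similarity` is an illustrative name, not the function used in the rest of this notebook.

```python
import numpy as np

# Hypothetical stand-in for stopwords.words('english').
STOPWORDS = {"i", "my", "is", "the", "of", "a", "an", "and", "to", "in"}

def word_similarity(sent1, sent2, stopwords=STOPWORDS):
    """Cosine similarity between word-count vectors of two sentences."""
    words1 = [w for w in sent1.lower().split() if w not in stopwords]
    words2 = [w for w in sent2.lower().split() if w not in stopwords]
    vocab = sorted(set(words1) | set(words2))
    index = {w: i for i, w in enumerate(vocab)}
    v1 = np.zeros(len(vocab))
    v2 = np.zeros(len(vocab))
    for w in words1:
        v1[index[w]] += 1
    for w in words2:
        v2[index[w]] += 1
    norm = np.linalg.norm(v1) * np.linalg.norm(v2)
    # Sentences with no content words get similarity 0 instead of NaN.
    return float(v1 @ v2 / norm) if norm else 0.0

identical = word_similarity("hi my name is hamza", "hi my name is hamza")
unrelated = word_similarity("I like cats",
                            "Donald Trump is the president of the united states")
print(identical, unrelated)
```

With word-level vectors the two unrelated sentences share no terms, so their similarity is 0.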

In [0]:
def build_similarity_matrix(sentences):
  n=len(sentences)
  similarity_matrix=np.zeros((n,n))
  for index1 in range(n):
    for index2 in range(n):
      # leave the diagonal at 0 so a sentence is not compared with itself
      if index1!=index2:
        similarity_matrix[index1][index2]=sentence_similarity(sentences[index1],sentences[index2])
  return similarity_matrix

In [0]:
similarity_matrix=build_similarity_matrix(cleaned_sentences)

In [0]:
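def generate_summary(similarity_matrix,sentences,top_n=5):
  # build a weighted graph: nodes are sentences, edge weights are similarities
  sentence_similarity_graph=nx.from_numpy_array(similarity_matrix)
  # PageRank returns a dict mapping node index to score
  scores=nx.pagerank(sentence_similarity_graph)
  ranked_sentence=sorted(((scores[i],s) for i,s in enumerate(sentences)),reverse=True)
  # return the top_n (score, sentence) pairs, highest score first
  return ranked_sentence[:top_n]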
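
To show roughly what the PageRank step does with the similarity matrix, here is a power-iteration sketch in plain NumPy. It is an illustration under stated assumptions (damping factor 0.85, rows with no similarity spread uniformly), not a reimplementation of `nx.pagerank`, and `pagerank_scores` is a hypothetical helper name.

```python
import numpy as np

def pagerank_scores(similarity, damping=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank over a non-negative similarity matrix."""
    sim = np.asarray(similarity, dtype=float)
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalise into transition probabilities; a sentence with no
    # similar neighbours jumps uniformly to any sentence.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, sim / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1 - damping) / n + damping * (transition.T @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores

# Toy matrix: sentence 0 is strongly similar to both other sentences.
toy = np.array([[0.0, 0.9, 0.8],
                [0.9, 0.0, 0.1],
                [0.8, 0.1, 0.0]])
scores = pagerank_scores(toy)
print(scores)
```

The sentence that is most similar to the others ends up with the highest score, which is the property the summarizer relies on.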

In [0]:
ranked=generate_summary(similarity_matrix,cleaned_sentences)

In [57]:
summary_sentences=[sentence for score,sentence in ranked[:5]]
summary_sentences

['as part of the program  the redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in india with the program will set up the core ai infrastructure and iot hub for the selected campuses',
 'the program was developed to provide job ready skills to programmers who wanted to hone their skills in ai and data science with a series of online courses which featured hands on labs and expert instructors as well',
 'in an attempt to build an ai ready workforce  microsoft announced intelligent cloud hub which has been launched to empower the next generation of students with ai ready skills',
 'according to manish prakash  country general manager ps  health and education  microsoft india  said   with ai being the defining technology of our time  it is transforming lives and industry and the jobs of tomorrow will require a different skillset',
 'that s why it has become more critical than ever for educational institutions to integrate new cloud and ai 

In [61]:
print('.\n'.join(summary_sentences))

as part of the program  the redmond giant which wants to expand its reach and is planning to build a strong developer ecosystem in india with the program will set up the core ai infrastructure and iot hub for the selected campuses.
the program was developed to provide job ready skills to programmers who wanted to hone their skills in ai and data science with a series of online courses which featured hands on labs and expert instructors as well.
in an attempt to build an ai ready workforce  microsoft announced intelligent cloud hub which has been launched to empower the next generation of students with ai ready skills.
according to manish prakash  country general manager ps  health and education  microsoft india  said   with ai being the defining technology of our time  it is transforming lives and industry and the jobs of tomorrow will require a different skillset.
that s why it has become more critical than ever for educational institutions to integrate new cloud and ai technologies
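
The summary above lists sentences from highest to lowest score. A summary often reads more naturally when the selected sentences keep their original document order. One way to do that is sketched below on toy data. `top_sentences_in_order` is a hypothetical helper, and `scores` is assumed to be indexable by sentence position, as the PageRank result is.

```python
def top_sentences_in_order(scores, sentences, top_n=5):
    """Pick the top_n highest-scoring sentences, then restore document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:top_n])
    return [sentences[i] for i in chosen]

toy_sentences = ["first", "second", "third", "fourth"]
toy_scores = [0.1, 0.4, 0.2, 0.3]
summary = top_sentences_in_order(toy_scores, toy_sentences, top_n=2)
print(summary)  # ['second', 'fourth']
```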
