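# Summarize text using a graph

In this project we summarize a text with a graph-based approach: each sentence becomes a node, edges are weighted by the cosine similarity between sentence embeddings, and PageRank scores the sentences.
We use word2vec to vectorize the words.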
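
As a minimal sketch of the similarity measure used later in the notebook, the cell below computes cosine similarity in plain Python. The function name `cosine_similarity`, the zero-vector guard, and the toy vectors are assumptions made for illustration; the notebook itself relies on `1 - spatial.distance.cosine(...)` from SciPy.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption of this sketch: treat an all-zero vector as "not similar"
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, not real word2vec output
same_direction = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0, which is why the measure suits comparing sentences of different lengths.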

# Import the libraries

In [1]:
pip install nltk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import numpy as np
import pandas as pd
import re
from gensim.models import Word2Vec
from scipy import spatial
import networkx as nx
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


# Summarize the text
### We use OOP to build this summarizer

In [14]:
class summarize:
  # every attribute assigned below must be listed here; with __slots__, assigning an unlisted name raises AttributeError
  __slots__ = ['text', 'sentences', 'sentences_clean', 'sentence_tokens', 'w2v', 'sentence_embeddings', 'max_len', 'similarity_matrix', 'nx_graph', 'scores', 'top_sentence', 'top']
  def __init__(self):
    pass
  
  
  # function to get the input paragraph
  def add_paragraph(self, Text):
    self.text = Text
  
  
  # function to process the summarizer
  def preprocessing(self):
    self.sentences = sent_tokenize(self.text)       # split the paragraph into sentences
    self.sentences_clean=[re.sub(r'[^\w\s]','',sentence.lower()) for sentence in self.sentences]      # lowercase and strip punctuation
    stop_words = set(stopwords.words('english'))
    # split() without an argument also drops the empty tokens left behind by removed punctuation
    self.sentence_tokens = [[word for word in sentence.split() if word not in stop_words] for sentence in self.sentences_clean]         # removing stopwords
    # gensim 4 parameter names; gensim 3 called these size= and iter=
    self.w2v=Word2Vec(self.sentence_tokens,vector_size=1,min_count=1,epochs=1000)
    # each word is a 1-dimensional vector, so a sentence becomes a list of scalars (vectors are read through .wv)
    self.sentence_embeddings = [[self.w2v.wv[word][0] for word in words] for words in self.sentence_tokens]
    self.max_len=max([len(tokens) for tokens in self.sentence_tokens])
    # zero-pad so that every sentence vector has the same length
    self.sentence_embeddings=[np.pad(embedding,(0,self.max_len-len(embedding)),'constant') for embedding in self.sentence_embeddings]
    # pairwise cosine similarity between sentence vectors
    self.similarity_matrix = np.zeros([len(self.sentence_tokens), len(self.sentence_tokens)])
    for i,row_embedding in enumerate(self.sentence_embeddings):
      for j,column_embedding in enumerate(self.sentence_embeddings):
        self.similarity_matrix[i][j]=1-spatial.distance.cosine(row_embedding,column_embedding)
    # sentences are nodes, similarities are edge weights; PageRank scores each node
    self.nx_graph = nx.from_numpy_array(self.similarity_matrix)
    self.scores = nx.pagerank(self.nx_graph)
    self.top_sentence = {sentence:self.scores[index] for index,sentence in enumerate(self.sentences)}
  
  
  
  # keep the `size` highest-scoring sentences
  def summary_size(self,size):
    self.top=dict(sorted(self.top_sentence.items(), key=lambda x: x[1], reverse=True)[:size])
  
  
  
  # print the selected sentences in their original order
  def get_summary(self):
    for sent in self.sentences:
      if sent in self.top:
        print(sent)
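
To make the ranking step less of a black box, here is a rough plain-Python sketch of PageRank by power iteration on a small similarity matrix. It is not networkx's implementation; `pagerank_scores`, the damping value of 0.85, the fixed iteration count, and the toy matrix are all assumptions chosen for illustration.

```python
def pagerank_scores(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a non-negative weight matrix (list of lists)."""
    n = len(weights)
    # Each row is normalized so that a node's outgoing weights sum to 1
    row_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Score flowing into node j from every node i, in proportion to edge weight
            incoming = sum(
                scores[i] * weights[i][j] / row_sums[i]
                for i in range(n) if row_sums[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

# Hypothetical similarity matrix for three sentences (not produced by the class above)
toy_similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
toy_scores = pagerank_scores(toy_similarity)
ranking = sorted(range(len(toy_scores)), key=lambda i: toy_scores[i], reverse=True)
print(ranking)
```

In this toy matrix, sentence 1 is the most strongly connected to the others, so it ends up ranked first; a sentence that resembles few others, like sentence 2, ranks last.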
    


# Now let's test our model

In [7]:
text1 = '''Bitcoin is a cryptocurrency, a virtual currency designed to act as money and a form of payment outside the control of any one person, group, or entity, and thus removing the need for third-party involvement in financial transactions. It is rewarded to blockchain miners for the work done to verify transactions and can be purchased on several exchanges.

Bitcoin was introduced to the public in 2009 by an anonymous developer or group of developers using the name Satoshi Nakamoto.

It has since become the most well-known cryptocurrency in the world. Its popularity has inspired the development of many other cryptocurrencies. These competitors either attempt to replace it as a payment system or are used as utility or security tokens in other blockchains and emerging financial technologies.'''

In [5]:
summ = summarize()
summ.add_paragraph("Quantum computing is a rapidly-emerging technology that harnesses the laws of quantum mechanics to solve problems too complex for classical computers. Today, IBM Quantum makes real quantum hardware -- a tool scientists only began to imagine three decades ago -- available to thousands of developers. Our engineers deliver ever-more-powerful superconducting quantum processors at regular intervals, building toward the quantum computing speed and capacity necessary to change the world. These machines are very different from the classical computers that have been around for more than half a century. Here's a primer on this transformative technology.")
summ.preprocessing()
summ.summary_size(4)
summ.get_summary()

Quantum computing is a rapidly-emerging technology that harnesses the laws of quantum mechanics to solve problems too complex for classical computers.
Today, IBM Quantum makes real quantum hardware -- a tool scientists only began to imagine three decades ago -- available to thousands of developers.
Our engineers deliver ever-more-powerful superconducting quantum processors at regular intervals, building toward the quantum computing speed and capacity necessary to change the world.
These machines are very different from the classical computers that have been around for more than half a century.


  


In [9]:

summary = summarize()
summary.add_paragraph(text1)
summary.preprocessing()
summary.summary_size(4)
summary.get_summary()

It is rewarded to blockchain miners for the work done to verify transactions and can be purchased on several exchanges.
Bitcoin was introduced to the public in 2009 by an anonymous developer or group of developers using the name Satoshi Nakamoto.
Its popularity has inspired the development of many other cryptocurrencies.
These competitors either attempt to replace it as a payment system or are used as utility or security tokens in other blockchains and emerging financial technologies.


  


In [10]:
text2 = '''Banks are a very important part of the economy because they provide vital services for both consumers and businesses. As financial services providers, they give you a safe place to store your cash. Through a variety of account types such as checking and savings accounts and certificates of deposit (CDs), you can conduct routine banking transactions like deposits, withdrawals, check writing, and bill payments. You can also save your money and earn interest on your investment. The money stored in most bank accounts is federally insured by the Federal Deposit Insurance Corporation (FDIC), up to a limit of $250,000 for individual depositors and $500,000 for jointly held deposits.

Banks also provide credit opportunities for people and corporations. The bank lends the money you deposit at the bank—short-term cash—to others for long-term debt such as car loans, credit cards, mortgages, and other debt vehicles. This process helps create liquidity in the market—which creates money and keeps the supply going.

Just like any other business, the goal of a bank is to earn a profit for its owners. For most banks, the owners are their shareholders. Banks do this by charging more interest on the loans and other debt they issue to borrowers than what they pay to people who use their savings vehicles. For a simple example, a bank that pays 1% interest on savings accounts and charges 6% interest for loans earns a gross profit of 5% for its owners.'''

In [11]:
summary = summarize()
summary.add_paragraph(text2)
summary.preprocessing()
summary.summary_size(4)
summary.get_summary()

Banks are a very important part of the economy because they provide vital services for both consumers and businesses.
As financial services providers, they give you a safe place to store your cash.
This process helps create liquidity in the market—which creates money and keeps the supply going.
Banks do this by charging more interest on the loans and other debt they issue to borrowers than what they pay to people who use their savings vehicles.
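
The summaries above are printed in document order even though the sentences were chosen by score. A small sketch of that selection step, written on indices rather than sentence text, might look like the following. `select_summary` and the toy data are hypothetical and not part of the class.

```python
def select_summary(sentences, scores, size):
    """Pick the `size` highest-scoring sentences, returned in original order."""
    # Rank sentence indices from highest to lowest score
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top `size` indices, then restore document order
    keep = sorted(ranked[:size])
    return [sentences[i] for i in keep]

# Hypothetical sentences and scores
toy_sentences = ["A.", "B.", "C.", "D."]
toy_scores = [0.1, 0.4, 0.2, 0.3]
picked = select_summary(toy_sentences, toy_scores, 2)
print(picked)
```

Working with indices avoids a subtle issue with using sentences as dictionary keys: two identical sentences would collapse into a single entry.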


  
