## What is the purpose of Text Summarization in NLP?
<code><b>"Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning"</b></code>

In general, two different approaches are used for text summarization:
#### 1. Extractive Summarization

Only the most important sentences or phrases are extracted from the original text.

#### 2. Abstractive Summarization

Unlike extractive summarization, abstractive summarization does not copy sentences verbatim. It identifies the most important information in the original text and generates new sentences to express it. These generated sentences may not appear anywhere in the original text.
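To make the extractive idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the spaCy implementation used in this notebook: the function name `extractive_summary`, the tiny stop-word set, and the regex-based sentence splitting are all assumptions made for the example.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list)
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "on", "the", "to", "in"}


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count every non-stop word across the whole text
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    if not freq:
        return ""

    # Score a sentence by summing the counts of the words it contains
    # (stop words are not in the counter, so they contribute 0)
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Pick the top-scoring sentence indices, then restore document order
    best = nlargest(num_sentences, range(len(sentences)),
                    key=lambda i: score(sentences[i]))
    return " ".join(sentences[i] for i in sorted(best))


sample = (
    "Cats sleep a lot. "
    "Cats and dogs are popular pets, and cats are easy to keep. "
    "The weather is nice."
)
print(extractive_summary(sample, num_sentences=1))
# prints: Cats and dogs are popular pets, and cats are easy to keep.
```

Every sentence in the result is copied unchanged from the input, which is what makes the approach extractive. An abstractive system would instead generate new wording, which typically requires a trained language model and is not sketched here.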

In [1]:
# 1. Install the spaCy library and download the small English model (comment these lines out if already installed)
!pip install -U spacy
!python -m spacy download en_core_web_sm

Collecting spacy
  Downloading spacy-3.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.6 MB)
Collecting weasel<0.4.0,>=0.1.0 (from spacy)
  Downloading weasel-0.3.4-py3-none-any.whl (50 kB)
Collecting cloudpathlib<0.17.0,>=0.7.0 (from weasel<0.4.0,>=0.1.0->spacy)
  Downloading cloudpathlib-0.16.0-py3-none-any.whl (45 kB)
Installing collected packages: cloudpathlib, weasel, spacy
  Attempting uninstall: spacy
    Found existing installation: spacy 3.6.1
    Uninstalling spacy-3.6.1:
      Successfully uninstalled spacy-3.6.1
ERROR: pip's dependency resolver does not currently take into account all the packages t

In [4]:
# 2. Importing required libraries

import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from heapq import nlargest

# 3. Input text
# Paste the text of any article when prompted
text=input()

print("Length of original text:",len(text))

# 4. Defining the textSummarizer function

def textSummarizer(text, percentage):

    # load the model into spaCy
    nlp = spacy.load('en_core_web_sm')

    # pass the text into the nlp function
    doc= nlp(text)

    # Frequency table: maps each word to its score
    freq_of_word = dict()

    # Text cleaning: count content words, skipping stop words, punctuation and whitespace
    for word in doc:
        token = word.text.lower()
        if token in STOP_WORDS or token in punctuation or word.is_space:
            continue
        # Key on the lowercase form so the lowercase lookup during sentence scoring matches
        freq_of_word[token] = freq_of_word.get(token, 0) + 1

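    # Nothing to score if the text has no content words (max() would fail on an empty table)
    if not freq_of_word:
        return ""

    # Maximum frequency of any word
    max_freq = max(freq_of_word.values())

    # Normalize each word frequency by the maximum, so scores fall in (0, 1]
    for word in freq_of_word:
        freq_of_word[word] = freq_of_word[word] / max_freq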

    # Score each sentence by summing the normalized frequencies of the words it contains
    sent_tokens = list(doc.sents)
    sent_scores = dict()
    for sent in sent_tokens:
        for word in sent:
            token = word.text.lower()
            if token in freq_of_word:
                sent_scores[sent] = sent_scores.get(sent, 0) + freq_of_word[token]


    # Number of sentences to keep (at least one, so short inputs still yield a summary)
    num_sentences = max(1, int(len(sent_tokens) * percentage))

    # Pick the highest-scoring sentences; each item is a spaCy Span, ordered by score
    top_sentences = nlargest(num_sentences, sent_scores, key=sent_scores.get)

    # Collect the text of each selected sentence
    final_summary = [sent.text for sent in top_sentences]

    # Join the sentences into a single string
    summary = " ".join(final_summary)

    # Return final summary
    return summary

# 5. Calling the textSummarizer with two arguments (input text, fraction of sentences to keep)
final_summary = textSummarizer(text, 0.3)

# 6. Printing the summary and its length
print("#"*50)
print("Summary of the text")
print("Length of summarized text:",len(final_summary))
print("#"*50)
print()
print(final_summary)

There are many techniques available to generate extractive summarization to keep it simple, I will be using an unsupervised learning approach to find the sentences similarity and rank them. Summarization can be defined as a task of producing a concise and fluent summary while preserving key information and overall meaning. One benefit of this will be, you don’t need to train and build a model prior start using it for your project. It’s good to understand Cosine similarity to make the best use of the code you are going to see. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Its measures cosine of the angle between vectors. The angle will be 0 if sentences are similar.
Length of original text: 779
##################################################
Summary of the text
Length of summarized text: 341
##################################################

Cosine similarity is a measure of sim