# Text Summarization 
**Text summarization** is the process of generating a concise, coherent summary of a longer text while preserving its key information and main ideas.

Text summarization can be classified into two main categories:

1. **Extractive Summarization**: Extractive summarization involves identifying and selecting important sentences or phrases from the original text and assembling them to form a summary. The selected sentences are typically the ones that contain crucial information, such as key terms, facts, or opinions. Extractive summarization does not generate new sentences but instead extracts and rearranges the existing content.

2. **Abstractive Summarization**: Abstractive summarization aims to generate a summary that may contain words, phrases, or sentences not present in the original text. It involves understanding the meaning and context of the text and generating new sentences that capture the essence of the original content. Abstractive summarization often requires advanced natural language processing techniques, such as language generation models or neural networks.
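To make the extractive idea concrete, here is a minimal, dependency-free sketch. It is only an illustration under simplifying assumptions: the `STOP` set is a tiny hypothetical stop-word list, the sentence splitter is a naive regular expression, and `extractive_summary` is a name made up for this example. It scores each sentence by the frequencies of the words it contains and returns the top sentences unchanged.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on"}


def extractive_summary(text, n=1):
    """Return the n highest-scoring sentences, copied verbatim from text."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole text, ignoring stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    freq = Counter(words)

    def score(sentence):
        # Stop words are absent from freq, so they contribute 0.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    return nlargest(n, sentences, key=score)


sample = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
top = extractive_summary(sample, n=1)
print(top)
```

Every sentence in the result appears word for word in the input, which is the defining property of an extractive summary. An abstractive system would instead be free to produce wording that does not occur in the source.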

Text summarization has several applications:

- Information Retrieval: Summaries provide condensed versions of documents, making it easier for users to quickly assess the relevance and key points of a document before delving deeper.

- News and Article Summarization: News agencies and content platforms often use summarization techniques to generate short summaries of news articles, blog posts, or other textual content to engage readers and provide a quick overview.

- Document Summarization: For lengthy documents or reports, summarization can save time and effort by providing concise summaries that capture the main ideas and important details.

- Meeting or Conversation Summarization: In corporate or professional settings, text summarization can be used to generate summaries of meeting transcripts or chat conversations, allowing participants to review key discussions and decisions efficiently.

- Social Media and Online Reviews: On social media platforms or review websites, summarization techniques can be employed to generate brief overviews or snippets of user-generated content to provide users with quick insights or recommendations.


The sample below is adapted from [this Kaggle notebook](https://www.kaggle.com/code/itsmohammadshahid/nlp-text-summarizer-using-spacy).

In [None]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from collections import Counter
from heapq import nlargest

import pandas as pd

To use spaCy's pretrained models, you need to download them first. Run the following commands in a terminal (the sample below only loads `en_core_web_sm`):
- `python -m spacy download en_core_web_sm`
- `python -m spacy download en_core_web_lg`
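If you are unsure whether a model is already available, a quick check like the following may help. This is a sketch that relies on the fact that spaCy models are installed as regular Python packages; `model_installed` is a hypothetical helper name, not part of spaCy.

```python
import importlib.util


def model_installed(name):
    """Return True if a package with this name can be found for import."""
    return importlib.util.find_spec(name) is not None


for model in ("en_core_web_sm", "en_core_web_lg"):
    status = "installed" if model_installed(model) else "missing"
    print(f"{model}: {status}")
```

A model reported as missing can be installed with the corresponding download command above.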

In [None]:
# Load the small English language model
nlp = spacy.load("en_core_web_sm")

We'll use this short blurb about Microsoft's Intelligent Cloud Hub:

In [None]:
text = """In an attempt to build an AI-ready workforce, Microsoft announced Intelligent Cloud Hub which has 
been launched to empower the next generation of students with AI-ready skills. Envisioned as a three-year 
collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, 
course content and curriculum, developer support, development tools and give students access to cloud and 
AI services. As part of the program, the Redmond giant which wants to expand its reach and is planning to 
build a strong developer ecosystem in India with the program will set up the core AI infrastructure and IoT 
Hub for the selected campuses. The company will provide AI development tools and Azure AI services such as 
Microsoft Cognitive Services, Bot Services and Azure Machine Learning. According to Manish Prakash, 
Country General Manager-PS, Health and Education, Microsoft India, said, With AI being the defining technology
of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset. 
This will require more collaborations and training and working with AI. That’s why it has become more critical 
than ever for educational institutions to integrate new cloud and AI technologies. The program is an attempt 
to ramp up the institutional set-up and build capabilities among the educators to educate the workforce of 
tomorrow. The program aims to build up the cognitive skills and in-depth understanding of developing 
intelligent cloud connected solutions for applications across industry. Earlier in April this year, 
the company announced Microsoft Professional Program In AI as a learning track open to the public. 
The program was developed to provide job ready skills to programmers who wanted to hone their skills 
in AI and data science with a series of online courses which featured hands-on labs and expert instructors 
as well. This program also included developer-focused AI school that provided a bunch of assets to help
build AI skills."""

In [None]:
doc = nlp(text)

In [None]:
# Fraction of sentences to keep in the summary
percentage = 0.2

# Build a word-frequency table, skipping stop words, punctuation, and whitespace.
# Keys are lowercased so they match the lowercase lookups used when scoring sentences.
freq_of_word = Counter(
    token.text.lower()
    for token in doc
    if token.text.lower() not in STOP_WORDS
    and token.text not in punctuation
    and not token.is_space
)

# Normalize each frequency by the maximum frequency, so scores fall in (0, 1]
max_freq = max(freq_of_word.values())
for word in freq_of_word:
    freq_of_word[word] /= max_freq

# Score each sentence as the sum of the normalized frequencies of its words
sent_tokens = list(doc.sents)
sent_scores = {}
for sent in sent_tokens:
    for word in sent:
        key = word.text.lower()
        if key in freq_of_word:
            sent_scores[sent] = sent_scores.get(sent, 0) + freq_of_word[key]

# Number of sentences to keep (at least one)
num_sentences = max(1, int(len(sent_tokens) * percentage))

# Pick the highest-scoring sentences; each one is a spaCy Span
summary = nlargest(num_sentences, sent_scores, key=sent_scores.get)

# Extract the sentence text and collapse line breaks inside each sentence
final_summary = [" ".join(sent.text.split()) for sent in summary]

# Convert to a single string
summary = " ".join(final_summary)
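One design point worth noting: `nlargest` returns the selected sentences ranked by score, not in the order they appear in the text, so a multi-sentence summary can read out of sequence. A possible variation, sketched here on made-up sentences and scores with a hypothetical helper `top_in_document_order`, is to select by score and then sort the winners by position.

```python
from heapq import nlargest


def top_in_document_order(scores, n):
    """Return indices of the n highest scores, sorted by position in the document."""
    best = nlargest(n, range(len(scores)), key=scores.__getitem__)
    return sorted(best)


# Made-up example data: sentences in document order with their scores.
sentences = ["First point.", "Minor aside.", "Key finding.", "Closing remark."]
scores = [2.0, 0.5, 3.5, 1.0]

picked = top_in_document_order(scores, n=2)
ordered_summary = " ".join(sentences[i] for i in picked)
print(ordered_summary)
```

With the spaCy sample, the same effect could be achieved by sorting the selected spans on their `start` attribute before joining them.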

In [None]:
print(f"A summary of the text generated by spaCy: {summary}")

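In addition to `spaCy`, other frameworks such as `NLTK`, `gensim`, and `transformers` (from Hugging Face) can be used for text summarization. Note that `gensim` removed its `summarization` module in version 4.0, so that option only applies to older releases.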