<a href="https://colab.research.google.com/github/ougrid/my-knowledge-resource/blob/master/Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
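# Download the Punkt sentence tokenizer models used by nltk.sent_tokenize
# (recent NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')
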
# Input text to be summarized
text = """
Text summarization is the process of automatically generating a concise and coherent summary from a longer piece of text, such as an article, document, or web page. It aims to capture the main ideas, important information, and key points of the original text in a shorter form, allowing users to quickly grasp the essence of the text without having to read the entire content.

There are generally two approaches to text summarization:

1. Extractive Summarization: In this approach, key sentences or phrases are selected from the original text and combined to form a summary. These selected sentences or phrases are usually considered to be the most important or representative ones in the original text. Extractive summarization involves identifying and ranking sentences based on their relevance and importance to the overall content.

2. Abstractive Summarization: In this approach, a summary is generated by paraphrasing and rephrasing the content of the original text, using natural language generation techniques. Abstractive summarization involves generating new sentences that may not be present in the original text, but still capture the main ideas and essence of the original content. This approach requires a higher level of natural language understanding and generation capabilities compared to extractive summarization.

Text summarization has various applications, including but not limited to:

- News summarization: Summarizing news articles to provide concise summaries for readers who are short on time.
- Document summarization: Generating summaries of lengthy documents or reports for quick review or reference.
- Content curation: Automatically summarizing and curating content from multiple sources for social media or content marketing purposes.
- Information retrieval: Summarizing search results or query results to provide brief descriptions of documents or web pages.
- Chatbots: Summarizing user inputs or system responses in conversational interfaces for better user experience.

Text summarization techniques can be implemented using various natural language processing (NLP) and machine learning (ML) algorithms, such as algorithms based on statistical methods, deep learning, or transformer models like BERT, GPT-2, etc. These techniques can be implemented in Python using libraries such as NLTK, Gensim, Transformers, and others.
"""

# Tokenize the text into sentences
sentences = nltk.sent_tokenize(text)

# Convert each sentence into a bag-of-words count vector (one row per sentence)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)

# Compute pairwise cosine similarity: similarity_matrix[i][j] compares sentence i with sentence j
similarity_matrix = cosine_similarity(X, X)

# Score each sentence by its total similarity to every other sentence.
# Subtracting the diagonal entry removes the sentence's similarity to itself.
sentence_scores = []
for i in range(len(sentences)):
    sentence_score = sum(similarity_matrix[i]) - similarity_matrix[i][i]
    sentence_scores.append(sentence_score)

# Rank sentence indices from highest to lowest score, then look up the sentences
ranked_indices = sorted(range(len(sentences)), key=lambda i: sentence_scores[i], reverse=True)
sorted_sentences = [sentences[i] for i in ranked_indices]

# Select the top 3 sentences as the summary (joined in score order, not document order)
summary = ' '.join(sorted_sentences[:3])

# Print the summary
print("Summary:")
print(summary)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Summary:
Extractive Summarization: In this approach, key sentences or phrases are selected from the original text and combined to form a summary. Abstractive Summarization: In this approach, a summary is generated by paraphrasing and rephrasing the content of the original text, using natural language generation techniques. It aims to capture the main ideas, important information, and key points of the original text in a shorter form, allowing users to quickly grasp the essence of the text without having to read the entire content.
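The cell above relies on `CountVectorizer` and `cosine_similarity` to do the scoring. As a rough illustration of what that scoring amounts to, here is a minimal pure-Python sketch. The helper names (`bag_of_words`, `cosine`, `centrality_scores`) and the three demo sentences are hypothetical, and the tokenizer only approximates `CountVectorizer`'s default behavior (lowercasing, words of two or more characters), so scores on real text may differ slightly from the scikit-learn version.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase word tokens of two or more characters."""
    # Approximation of CountVectorizer's default token pattern.
    return Counter(re.findall(r"\b\w\w+\b", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def centrality_scores(sentences):
    """Score each sentence by its summed similarity to all other sentences."""
    vectors = [bag_of_words(s) for s in sentences]
    return [
        sum(cosine(v, other) for j, other in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]


# Hypothetical demo input: two related sentences and one unrelated sentence.
demo = [
    "Cats chase mice.",
    "Dogs chase cats.",
    "Stock prices fell sharply today.",
]
scores = centrality_scores(demo)
print(scores)
```

The two sentences that share words score higher than the unrelated one, which is why this kind of score tends to favor sentences that overlap in vocabulary with much of the text.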

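In the printed summary, the sentence beginning "It aims to capture" comes last even though it is the second sentence of the input, because the top sentences are joined in score order. One possible variation, sketched below under the assumption that document order reads more naturally, is to pick the top-scoring indices and then sort them back into their original positions. The function name `top_k_in_document_order` and the example data are hypothetical and are not part of the notebook.

```python
def top_k_in_document_order(sentences, scores, k=3):
    """Return the k highest-scoring sentences, ordered as they appear in the text."""
    # Rank indices by score, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top k, then restore their original positions.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


# Hypothetical example: "D." has the highest score but appears after "B.".
example_sentences = ["A.", "B.", "C.", "D."]
example_scores = [0.2, 0.5, 0.1, 0.9]
picked = top_k_in_document_order(example_sentences, example_scores, k=2)
print(" ".join(picked))
```

With the notebook's variables, the equivalent call would be `top_k_in_document_order(sentences, sentence_scores, k=3)`.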