Here's a step-by-step guide to implementing a simple frequency-based extractive text summarizer in Python using the Natural Language Toolkit (NLTK) library.

In [1]:
# Step 1: Install the NLTK library

!pip install nltk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
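# Step 2: Import the necessary libraries and download the required NLTK packages
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # recent NLTK releases look up 'punkt_tab' for sent_tokenize()/word_tokenize()
nltk.download('stopwords')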

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [3]:
# Step 3: Load the text to be summarized
text = """
The Quick Brown Fox Jumps Over The Lazy Dog. 
A red sun rises, blood has been spilled this night. 
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
"""

In [4]:
# Step 4: Tokenize the text into sentences and words using NLTK's sent_tokenize() and word_tokenize() functions
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize(text)
words = [word_tokenize(sentence) for sentence in sentences]
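After this cell, `sentences` is a flat list of sentence strings and `words` is a list of token lists with one entry per sentence; Steps 5 to 8 rely on `words[i]` lining up with `sentences[i]`. As a rough, dependency-free illustration of that shape, the sketch below swaps in two regular expressions (the helpers `naive_sent_tokenize` and `naive_word_tokenize` are hypothetical, not part of NLTK). They are much cruder than NLTK's Punkt model, which is designed to avoid splitting after abbreviations such as "Mr.". The `demo_` prefix keeps the notebook's own variables from being overwritten.

```python
import re

# Hypothetical stand-ins for sent_tokenize()/word_tokenize(), for illustration only.
def naive_sent_tokenize(text):
    # Split after '.', '!' or '?' when followed by whitespace; drop empty pieces.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def naive_word_tokenize(sentence):
    # Runs of word characters, or single non-space punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

demo_text = "The Quick Brown Fox Jumps Over The Lazy Dog. A red sun rises, blood has been spilled this night."
demo_sentences = naive_sent_tokenize(demo_text)
demo_words = [naive_word_tokenize(s) for s in demo_sentences]  # list of token lists

print(len(demo_sentences))  # 2
print(demo_words[0])
# ['The', 'Quick', 'Brown', 'Fox', 'Jumps', 'Over', 'The', 'Lazy', 'Dog', '.']
```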

In [5]:
# Step 5: Remove stop words and punctuation from the tokenized words using NLTK's stopwords corpus
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))

# Keep a token only if it is not a stop word (case-insensitive) and consists only of letters
filtered_words = []
for sentence in words:
    filtered_words.append([word for word in sentence
                           if word.lower() not in stop_words and word.isalpha()])
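The filter compares the lowercase token against the stop word list, so `The` is removed as `the`, and `isalpha()` drops punctuation tokens such as `.` and `,`. A side effect worth knowing: `isalpha()` also drops tokens containing digits or hyphens, such as `1991` or `high-level` (which `word_tokenize()` typically keeps as a single token). The sketch below applies the same predicate, wrapped in a hypothetical `keep_token` helper, to the tokens expected from the first sample sentence; `demo_stop_words` is a tiny hand-picked subset, an assumption standing in for the full NLTK list.

```python
# Tokens as word_tokenize() is expected to produce them for the first sample sentence.
demo_tokens = ['The', 'Quick', 'Brown', 'Fox', 'Jumps', 'Over', 'The', 'Lazy', 'Dog', '.']

# Tiny hand-picked subset of English stop words (assumption; the notebook uses
# the full list from stopwords.words('english')).
demo_stop_words = {'the', 'over', 'a', 'of'}

def keep_token(token, stop_words):
    # Same test as Step 5: not a stop word (case-insensitive) and letters only.
    return token.lower() not in stop_words and token.isalpha()

demo_kept = [t for t in demo_tokens if keep_token(t, demo_stop_words)]
print(demo_kept)  # ['Quick', 'Brown', 'Fox', 'Jumps', 'Lazy', 'Dog']

# Hyphenated words and numbers are filtered out as well.
print(keep_token('high-level', demo_stop_words), keep_token('1991', demo_stop_words))  # False False
```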

In [6]:
# Step 6: Count how often each filtered word occurs, using collections.Counter from the standard library
from collections import Counter
word_frequencies = Counter()
for sentence in filtered_words:
    for word in sentence:
        word_frequencies[word] += 1
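`Counter` is a `dict` subclass that returns 0 for missing keys, which is why `word_frequencies[word] += 1` works without initializing each word first. It also accepts any iterable, so the nested loop can be collapsed into a single generator expression. The sketch below runs both versions on a hypothetical `demo_filtered` list and checks that they agree. Note that the counts are case-sensitive (`Python` and `python` would be separate keys), because Step 5 lowercases words only for the stop word check.

```python
from collections import Counter

# Hypothetical output of Step 5 for two short sentences.
demo_filtered = [['Python', 'interpreted', 'language'],
                 ['Python', 'design', 'emphasizes', 'readability', 'language']]

# Loop version, as in Step 6: a missing key starts at 0.
loop_counts = Counter()
for sentence in demo_filtered:
    for word in sentence:
        loop_counts[word] += 1

# Equivalent one-pass version: Counter consumes the flattened words directly.
flat_counts = Counter(word for sentence in demo_filtered for word in sentence)

print(flat_counts == loop_counts)  # True
print(flat_counts.most_common(2))  # [('Python', 2), ('language', 2)]
```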

In [7]:
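# Step 7: Score each sentence as the average frequency of its filtered words
# (sum of the word frequencies divided by the number of filtered words)
sentence_scores = {}
for i, sentence in enumerate(filtered_words):
    score = 0
    for word in sentence:
        score += word_frequencies[word]
    # A sentence with no words left after filtering gets 0 instead of raising ZeroDivisionError
    sentence_scores[i] = score / len(sentence) if sentence else 0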
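The score of a sentence is the mean frequency of its filtered words: add up `word_frequencies[w]` for every word `w` in the sentence and divide by the number of such words. Dividing by length keeps long sentences from winning simply because they contain more words. If a sentence had no words left after filtering (for example, one made only of stop words and punctuation), dividing by `len(sentence)` would raise `ZeroDivisionError`, so the sketch below, using a hypothetical `score_sentences` helper and toy data, scores such sentences 0.

```python
from collections import Counter

def score_sentences(filtered_sentences, frequencies):
    """Mean word frequency per sentence, keyed by sentence index (hypothetical helper)."""
    scores = {}
    for i, sentence in enumerate(filtered_sentences):
        # An empty sentence has nothing to average, so score it 0.0 instead of dividing by 0.
        scores[i] = sum(frequencies[w] for w in sentence) / len(sentence) if sentence else 0.0
    return scores

demo_filtered = [['fox', 'dog'], ['dog', 'dog', 'cat'], []]
demo_freqs = Counter(w for s in demo_filtered for w in s)  # fox=1, dog=3, cat=1
demo_scores = score_sentences(demo_filtered, demo_freqs)

print(demo_scores)  # {0: 2.0, 1: 2.3333333333333335, 2: 0.0}
```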

In [8]:
# Step 8: Select the top N sentences with the highest scores to generate the summary
import heapq
N = 2  # number of sentences in the summary
summary_sentences = heapq.nlargest(N, sentence_scores, key=sentence_scores.get)

# Join the selected sentences in their original order so the summary reads naturally
summary = ' '.join(sentences[i].strip() for i in sorted(summary_sentences))

print(summary)


The Quick Brown Fox Jumps Over The Lazy Dog. A red sun rises, blood has been spilled this night. 
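`heapq.nlargest(n, iterable, key=...)` iterates over the dictionary's keys (the sentence indices) and ranks them with `sentence_scores.get`, so it returns indices ordered by score, highest first; among equal scores the earlier index comes first. Since the summary is read as prose, it usually reads better when the chosen sentences keep their original order, which sorting the selected indices achieves. A sketch with hypothetical sentences and scores:

```python
import heapq

demo_sentences = ['First sentence.', 'Second sentence.', 'Third sentence.', 'Fourth sentence.']
demo_scores = {0: 1.0, 1: 2.5, 2: 1.0, 3: 3.0}  # hypothetical scores from Step 7

demo_top = heapq.nlargest(2, demo_scores, key=demo_scores.get)
print(demo_top)  # [3, 1], ranked by score

# Re-sort the selected indices so the summary follows the original sentence order.
demo_summary = ' '.join(demo_sentences[i] for i in sorted(demo_top))
print(demo_summary)  # Second sentence. Fourth sentence.
```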


In [9]:
# In this example, the summary consists of the 2 highest-scoring sentences; change that number in Step 8 to get a longer or shorter summary.

# To test the code on a longer passage, assign the following text and then re-run Steps 4 to 8:
text = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.
"""
