### Importing the dependencies

In [44]:
import math
import spacy
import string

### The Sample Text

In [2]:
# sample text to summarise
text = '''Sherlock (Benedict Cumberbatch) and John (Martin Freeman) receive a visit from Henry Knight (Russell Tovey), who witnessed his father's death by a "gigantic hound" at Dartmoor 20 years ago. After years of therapy, Henry revisited the site, only to see the hound again, prompting his request for help. Though initially dismissive, Sherlock is soon interested in Henry's use of "hound" instead of "dog". Sherlock and John arrive in Dartmoor to find the hound is a local legend. They visit Baskerville, a nearby Ministry of Defence research base, using Mycroft's (Mark Gatiss) security pass. After Mycroft's credentials cause a security alert, Dr. Bob Frankland (Clive Mantle) vouches for Sherlock's identity, despite knowing the truth. Frankland says he was a friend of Henry's father and is concerned for Henry's well-being.

Henry tells John and Sherlock about the words "Liberty" and "In" in his dreams. Sherlock, John, and Henry then visit the hollow in the hope of finding the hound. On the way, John notices what seems to be Morse code signals (these were unrelated; they were headlight flashes from a group of doggers). When Sherlock and Henry arrive at the hollow, they see the hound. At a local inn, Sherlock is visibly shaken and confesses he saw the hound. John tries calming him, suggesting he imagines things. Sherlock reacts with anger, denying there is something wrong with him. John tries to interview Henry's therapist, Louise Mortimer (Sasha Behar). However, they are interrupted by Frankland, who blows his cover. Meanwhile, Henry hallucinates the hound is stalking his home.

The next morning Sherlock realises "hound" may be an acronym rather than a word. The pair run into DI Lestrade (Rupert Graves) who was sent by Mycroft to keep an eye on Sherlock. They interrogate the innkeepers about a past order for meat that John has spotted, which struck him as odd for a vegetarian restaurant. The innkeepers kept a dog on the moor to boost the tourist trade but assured the investigators they had put it down. This explanation satisfies Lestrade but not Sherlock, who insists the dog he saw was monstrous. Calling Mycroft, Sherlock gains access to Baskerville again. Searching the lower levels of the genetics labs, John finds himself trapped and then hears growling, which he assumes is the hound. Locking himself in an empty cage, he calls Sherlock, who rescues him. Sherlock deduces a chemical weapon designed to trigger violent hallucinations was responsible. Retreating into his "mind palace", a memory technique, Sherlock realises "Liberty" and "In" stands for Liberty, Indiana. After viewing confidential files, he sees "H.O.U.N.D." was a secret C.I.A. project aimed at creating a hallucinatory anti-personnel chemical weapon. Nonetheless, the project was abandoned several years before. Sherlock realises Frankland, who participated in the project, has continued it secretly.

After John receives a call from Mortimer that Henry has run off with a gun, John, Sherlock, and Lestrade run to the hollow to find Henry about to commit suicide. Sherlock explains the hound was a hallucination; his father was killed by Frankland, wearing a gas mask and a sweatshirt with "H.O.U.N.D. Liberty, In" on it; a child could not cope with this, so his mind tricked him. Every time Henry came back, Frankland gassed him with the hallucinogen; the chemical agent is the fog they encountered at the hollow, triggered by pressure pads in the area. As Henry calms down, they all see the innkeepers' dog affected by the gas; John shoots it. Sherlock finds and catches Frankland at the scene. Henry realises that Frankland murdered Henry's father because he found him testing the drug. Frankland flees into the base's minefield and gets blown up. As Sherlock and John prepare to leave the following day, John wonders why he saw the hound in the lab despite not having inhaled the gas from the hollow. Sherlock surmises that the leaking pipes poisoned John in the laboratory. John realises Sherlock locked him in the labs to test his theory. He also points out Sherlock was wrong for once; he believed the drug was in Henry's sugar and put it in John's coffee.

In the closing scenes, Mycroft oversees the release of Jim Moriarty (Andrew Scott) from a holding cell in which he has written Sherlock's name all over the walls.'''
# printing the text
print(text)

### Text Processing

In [3]:
# loading the spaCy language model
nlp = spacy.load('en_core_web_sm')
# inputting the text
doc = nlp(text)
# loading the stopwords
STOP_WORDS = nlp.Defaults.stop_words

In [4]:
# word tokenization, lemmatization and lower-casing
words = [x.lemma_.lower() for x in doc]
# removing punctuation
punct_removed = [x for x in words if x not in string.punctuation]
# removing the stop words
sw_removed = [x for x in punct_removed if x not in STOP_WORDS]
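
One caveat about the punctuation filter above: `x not in string.punctuation` is a substring test against a single string, so multi-character tokens such as `...` or the paragraph break `\n\n` are not filtered out (which is why `'\n\n'` shows up in the frequency dictionary further down). A minimal sketch of a stricter check, using only the standard library; `is_noise` and the toy token list are hypothetical names for illustration and are not part of the notebook:

```python
import string

PUNCT = set(string.punctuation)

def is_noise(token: str) -> bool:
    """True for tokens made up only of punctuation and/or whitespace."""
    stripped = token.strip()
    return stripped == "" or all(ch in PUNCT for ch in stripped)

# toy tokens standing in for the lemmatised spaCy output
tokens = ["sherlock", "(", "...", "\n\n", "dr.", "hound"]

# the substring test used in the notebook lets "..." and "\n\n" through
substring_filtered = [t for t in tokens if t not in string.punctuation]
# the character-wise test drops them but keeps "dr."
strict_filtered = [t for t in tokens if not is_noise(t)]

print(substring_filtered)
print(strict_filtered)
```

Swapping this in would change the counts shown below, so it is offered only as an option.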

### Creating a Word Frequency Dictionary

In [5]:
# blank frequency dictionary
freq_dict = {}
for x in sw_removed:
    # count each lemma, starting from 0 the first time it is seen
    freq_dict[x] = freq_dict.get(x, 0) + 1
freq_dict

{'sherlock': 25,
 'benedict': 1,
 'cumberbatch': 1,
 'john': 17,
 'martin': 1,
 'freeman': 1,
 'receive': 2,
 'visit': 3,
 'henry': 17,
 'knight': 1,
 'russell': 1,
 'tovey': 1,
 'witness': 1,
 'father': 4,
 'death': 1,
 'gigantic': 1,
 'hound': 12,
 'dartmoor': 2,
 '20': 1,
 'year': 3,
 'ago': 1,
 'therapy': 1,
 'revisit': 1,
 'site': 1,
 'prompt': 1,
 'request': 1,
 'help': 1,
 'initially': 1,
 'dismissive': 1,
 'soon': 1,
 'interested': 1,
 'use': 2,
 'instead': 1,
 'dog': 4,
 'arrive': 2,
 'find': 6,
 'local': 2,
 'legend': 1,
 'baskerville': 2,
 'nearby': 1,
 'ministry': 1,
 'defence': 1,
 'research': 1,
 'base': 2,
 'mycroft': 5,
 'mark': 1,
 'gatiss': 1,
 'security': 2,
 'pass': 1,
 'credential': 1,
 'cause': 1,
 'alert': 1,
 'dr.': 1,
 'bob': 1,
 'frankland': 9,
 'clive': 1,
 'mantle': 1,
 'vouche': 1,
 'identity': 1,
 'despite': 2,
 'know': 1,
 'truth': 1,
 'friend': 1,
 'concern': 1,
 '\n\n': 4,
 'tell': 1,
 'word': 2,
 'liberty': 4,
 'dream': 1,
 'hollow': 5,
 'hope': 1,
 'w
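
The same counting step can also be written with `collections.Counter` from the standard library. This is a sketch on a small hand-made token list (an assumption for illustration, not the notebook's `sw_removed`):

```python
from collections import Counter

# toy token list standing in for sw_removed
tokens = ["sherlock", "john", "hound", "sherlock", "hound", "sherlock"]

# Counter builds the word -> count mapping in one call
counts = Counter(tokens)

# most_common(n) returns the n most frequent words with their counts
top_two = counts.most_common(2)
print(top_two)
```

A `Counter` is a `dict` subclass, so the normalisation step that follows would work on it unchanged.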

In [6]:
# maximum frequency that any word has
max_freq = max(freq_dict.values())
# normalising the frequencies
freq_dict = {word: freq/max_freq for word, freq in freq_dict.items()}
freq_dict

{'sherlock': 1.0,
 'benedict': 0.04,
 'cumberbatch': 0.04,
 'john': 0.68,
 'martin': 0.04,
 'freeman': 0.04,
 'receive': 0.08,
 'visit': 0.12,
 'henry': 0.68,
 'knight': 0.04,
 'russell': 0.04,
 'tovey': 0.04,
 'witness': 0.04,
 'father': 0.16,
 'death': 0.04,
 'gigantic': 0.04,
 'hound': 0.48,
 'dartmoor': 0.08,
 '20': 0.04,
 'year': 0.12,
 'ago': 0.04,
 'therapy': 0.04,
 'revisit': 0.04,
 'site': 0.04,
 'prompt': 0.04,
 'request': 0.04,
 'help': 0.04,
 'initially': 0.04,
 'dismissive': 0.04,
 'soon': 0.04,
 'interested': 0.04,
 'use': 0.08,
 'instead': 0.04,
 'dog': 0.16,
 'arrive': 0.08,
 'find': 0.24,
 'local': 0.08,
 'legend': 0.04,
 'baskerville': 0.08,
 'nearby': 0.04,
 'ministry': 0.04,
 'defence': 0.04,
 'research': 0.04,
 'base': 0.08,
 'mycroft': 0.2,
 'mark': 0.04,
 'gatiss': 0.04,
 'security': 0.08,
 'pass': 0.04,
 'credential': 0.04,
 'cause': 0.04,
 'alert': 0.04,
 'dr.': 0.04,
 'bob': 0.04,
 'frankland': 0.36,
 'clive': 0.04,
 'mantle': 0.04,
 'vouche': 0.04,
 'identity
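
As a worked check of the normalisation, the sketch below divides a few of the raw counts shown earlier by the largest count (25 for `sherlock`). The `normalise` helper is a hypothetical name; the guard for an empty dictionary is an addition, since `max()` raises `ValueError` on an empty sequence:

```python
def normalise(counts):
    """Scale raw counts so the most frequent word has weight 1.0."""
    max_freq = max(counts.values(), default=0)
    if max_freq == 0:
        # nothing to normalise (e.g. text with only stop words)
        return {}
    return {word: freq / max_freq for word, freq in counts.items()}

# a few raw counts taken from the frequency dictionary above
raw = {"sherlock": 25, "john": 17, "hound": 12, "frankland": 9}
weights = normalise(raw)
print(weights)
```

With these inputs the weights come out as 1.0, 0.68, 0.48 and 0.36, in line with the output above.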

### Ranking sentences by how many important words they contain

In [7]:
# isolating the sentences 
original_sents = [x for x in doc.sents]
# processing the sentences 
processed_sents = []
for sent in original_sents:
    new_sent = [word.lemma_.lower() for word in sent if word.text.lower() not in string.punctuation and word.text.lower() not in STOP_WORDS]
    processed_sents.append(new_sent)
# scoring the sentences 
sent_score = {}
for i in range(len(original_sents)):
    sent_score[original_sents[i]] = 0
    for word in processed_sents[i]:
        if word in freq_dict:
            sent_score[original_sents[i]] += freq_dict[word] 
# sorting the sentences according to the sentence scores
sent_score = {sent: score for sent, score in sorted(sent_score.items(), key=lambda item: item[1], reverse=True)}
sent_score

{After John receives a call from Mortimer that Henry has run off with a gun, John, Sherlock, and Lestrade run to the hollow to find Henry about to commit suicide.: 4.800000000000001,
 Sherlock (Benedict Cumberbatch) and John (Martin Freeman) receive a visit from Henry Knight (Russell Tovey), who witnessed his father's death by a "gigantic hound" at Dartmoor 20 years ago.: 3.8800000000000012,
 As Sherlock and John prepare to leave the following day, John wonders why he saw the hound in the lab despite not having inhaled the gas from the hollow.: 3.680000000000001,
 Sherlock, John, and Henry then visit the hollow in the hope of finding the hound.: 3.440000000000001,
 Sherlock explains the hound was a hallucination; his father was killed by Frankland, wearing a gas mask and a sweatshirt with "H.O.U.N.D. Liberty, In" on it; a child could not cope with this, so his mind tricked him.: 2.880000000000001,
 He also points out Sherlock was wrong for once; he believed the drug was in Henry's suga
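
To make the scoring concrete, here is a small sketch that recomputes the score of one sentence by hand. The counts are the ones shown in the frequency dictionary above; `score_sentence` is a hypothetical helper, and the token list is what remains of "Sherlock, John, and Henry then visit the hollow in the hope of finding the hound." after lemmatisation and stop-word removal:

```python
def score_sentence(tokens, weights):
    """Sum the normalised frequency of every token; unknown tokens add 0."""
    return sum(weights.get(tok, 0.0) for tok in tokens)

# raw counts from the frequency dictionary, normalised by the maximum (25)
raw = {"sherlock": 25, "john": 17, "henry": 17, "visit": 3,
       "hollow": 5, "hope": 1, "find": 6, "hound": 12}
weights = {word: count / 25 for word, count in raw.items()}

tokens = ["sherlock", "john", "henry", "visit", "hollow", "hope", "find", "hound"]
score = score_sentence(tokens, weights)
print(round(score, 2))  # about 3.44, the score listed above for this sentence
```

Because the score is a plain sum, longer sentences tend to score higher; dividing by the number of tokens would be one way to offset that, though the notebook does not do so.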

### Deciding the summary length

In [40]:
# number of tokens (words and punctuation) in the original text
n_word_orig = len([x for x in doc])
# target number of tokens in the summary (30% of the original)
n_word_summ = math.floor(0.3*n_word_orig)
# adding the highest-scoring sentences until the target length is reached
summary_sents = []
for sent in sent_score.keys():
    summary_sents.append(sent)
    temp_summ = nlp(' '.join([x.text for x in summary_sents]))
    word_count = len([x for x in temp_summ])
    if word_count >= n_word_summ:
        break
summary_sents

[After John receives a call from Mortimer that Henry has run off with a gun, John, Sherlock, and Lestrade run to the hollow to find Henry about to commit suicide.,
 Sherlock (Benedict Cumberbatch) and John (Martin Freeman) receive a visit from Henry Knight (Russell Tovey), who witnessed his father's death by a "gigantic hound" at Dartmoor 20 years ago.,
 As Sherlock and John prepare to leave the following day, John wonders why he saw the hound in the lab despite not having inhaled the gas from the hollow.,
 Sherlock, John, and Henry then visit the hollow in the hope of finding the hound.,
 Sherlock explains the hound was a hallucination; his father was killed by Frankland, wearing a gas mask and a sweatshirt with "H.O.U.N.D. Liberty, In" on it; a child could not cope with this, so his mind tricked him.,
 He also points out Sherlock was wrong for once; he believed the drug was in Henry's sugar and put it in John's coffee.
 ,
 Henry tells John and Sherlock about the words "Liberty" and "
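
The selection loop can be sketched without spaCy by working on sentence indices, scores and token counts. Everything below (the `select_top` name and the example numbers) is made up for illustration. Unlike the cell above, it keeps a running token total instead of re-parsing the partial summary on every iteration:

```python
import math

def select_top(scores, lengths, ratio=0.3):
    """Pick highest-scoring sentence indices until the token budget is met."""
    budget = math.floor(ratio * sum(lengths))
    # indices ordered from highest to lowest score
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        chosen.append(i)
        used += lengths[i]
        if used >= budget:
            # same stopping rule as the notebook: stop once the budget is reached
            break
    return chosen

# illustrative scores and token counts for five sentences
scores = [3.88, 1.2, 3.44, 4.8, 0.9]
lengths = [40, 20, 18, 35, 12]
chosen = select_top(scores, lengths)
print(chosen)
```

As in the original loop, the sentence that crosses the budget is still included, so the summary can end up somewhat longer than 30% of the text.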

### Arranging the selected sentences in the order they appear in the original text

In [41]:
# getting the indices of the selected sentences in the list of original sentences 
idx = [original_sents.index(sent) for sent in summary_sents]
# sorting the index
sorted_idx = sorted(idx)
# getting the selected sentences in their original order
final_summary_sents = [original_sents[i].text for i in sorted_idx]
# final_summary
summary = ' '.join(final_summary_sents)
print(summary)

Sherlock (Benedict Cumberbatch) and John (Martin Freeman) receive a visit from Henry Knight (Russell Tovey), who witnessed his father's death by a "gigantic hound" at Dartmoor 20 years ago. Though initially dismissive, Sherlock is soon interested in Henry's use of "hound" instead of "dog". Sherlock and John arrive in Dartmoor to find the hound is a local legend. Henry tells John and Sherlock about the words "Liberty" and "In" in his dreams. Sherlock, John, and Henry then visit the hollow in the hope of finding the hound. After John receives a call from Mortimer that Henry has run off with a gun, John, Sherlock, and Lestrade run to the hollow to find Henry about to commit suicide. Sherlock explains the hound was a hallucination; his father was killed by Frankland, wearing a gas mask and a sweatshirt with "H.O.U.N.D. Liberty, In" on it; a child could not cope with this, so his mind tricked him. Henry realises that Frankland murdered Henry's father because he found him testing the drug. A
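
If the ranking step keeps track of sentence positions rather than the sentences themselves, restoring the original order is just a sort of the chosen indices. A small sketch with placeholder sentences and scores (all values are assumptions for illustration):

```python
sentences = ["First.", "Second.", "Third.", "Fourth."]
scores = [0.4, 1.7, 0.9, 2.1]

# positions ordered by score, highest first
ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
# keep the two best; they come out in score order, not text order
chosen = ranked[:2]

# sorting the positions restores the order of the original text
summary = " ".join(sentences[i] for i in sorted(chosen))
print(summary)
```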

### Creating a Summary class that follows all of the summarization steps above

In [43]:
# defining a class for extraction-based text summarization
class Summary:
    def __init__(self, text):
        '''basic class variables'''
        self.text = text
        self.words = self.get_words()
        self.len_orig = len(self.words)
        self.freq_dict = self.get_freq_dict()
        self.original_sents = self.get_sents()
        self.sent_score = self.get_sent_score()
        self.summary = self.get_summary()
        self.summ_len = self.get_summary_len()

    def get_words(self):
        '''word tokenization'''
        doc = nlp(self.text)
        words = [x for x in doc]
        return words
    
    def get_freq_dict(self):
        '''normalised frequency of each word'''
        freq_dict = {}
        doc = nlp(self.text)
        words = [x.lemma_.lower() for x in doc if x.text.lower() not in string.punctuation and x.text.lower() not in STOP_WORDS]
        for x in words:
            # count each lemma, starting from 0 the first time it is seen
            freq_dict[x] = freq_dict.get(x, 0) + 1
        max_freq = max(freq_dict.values())
        freq_dict = {word: freq/max_freq for word, freq in freq_dict.items()}
        return freq_dict
    
    def get_sents(self):
        '''sentence tokenization'''
        doc = nlp(self.text)
        original_sents = [x for x in doc.sents]
        return original_sents

    def get_sent_score(self):
        '''getting the sentence score'''
        processed_sents = []
        for sent in self.original_sents:
            new_sent = [word.lemma_.lower() for word in sent if word.text.lower() not in string.punctuation and word.text.lower() not in STOP_WORDS]
            processed_sents.append(new_sent)
        sent_score = {}
        for i in range(len(self.original_sents)):
            sent_score[self.original_sents[i]] = 0
            for word in processed_sents[i]:
                if word in self.freq_dict:
                    sent_score[self.original_sents[i]] += self.freq_dict[word] 
        sent_score = {sent: score for sent, score in sorted(sent_score.items(), key=lambda item: item[1], reverse=True)}
        return sent_score 
    
    def get_summary(self):
        '''getting the actual summary'''
        n_word_summ = math.floor(self.len_orig * 0.3)
        summary_sents = []
        for sent in self.sent_score.keys():
            summary_sents.append(sent)
            temp_summ = nlp(' '.join([x.text for x in summary_sents]))
            word_count = len([x for x in temp_summ])
            if word_count >= n_word_summ:
                break 
        idx = [self.original_sents.index(sent) for sent in summary_sents]
        sorted_idx = sorted(idx)
        final_summary_sents = [self.original_sents[i].text for i in sorted_idx]
        summary = ' '.join(final_summary_sents)
        return summary
    
    def get_summary_len(self):
        '''getting the summary length'''
        doc = nlp(self.summary)
        summ_len = len([x for x in doc])
        return summ_len
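
For readers who want to try the overall idea without installing spaCy, here is a rough standard-library sketch of the same pipeline in a single function. It is not the class above: sentence splitting is a naive regular expression, there is no lemmatisation, and `STOP` is a tiny stand-in for spaCy's stop-word list. The names `tokenize`, `summarise` and `STOP` are hypothetical.

```python
import math
import re
import string
from collections import Counter

# tiny illustrative stop-word list; spaCy's list is much larger
STOP = {"the", "a", "an", "of", "is", "in", "to", "and", "it", "was", "by"}

def tokenize(sentence):
    """Lower-cased words with surrounding punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return [w for w in words if w]

def summarise(text, ratio=0.3):
    # naive split after ., ! or ? followed by whitespace
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [tokenize(s) for s in sents]
    counts = Counter(t for sent in tokens for t in sent if t not in STOP)
    if not counts:
        return ""
    top = max(counts.values())
    # sentence score = sum of normalised frequencies of its non-stop words
    scores = [sum(counts[t] / top for t in sent if t not in STOP) for sent in tokens]
    budget = math.floor(ratio * sum(len(sent) for sent in tokens))
    ranked = sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        chosen.append(i)
        used += len(tokens[i])
        if used >= budget:
            break
    # put the chosen sentences back in their original order
    return " ".join(sents[i] for i in sorted(chosen))

demo = "The wall is long. The wall was built by many dynasties. Tourists visit it."
print(summarise(demo))
```

The text is processed once here, whereas the class calls `nlp(self.text)` separately in `get_words`, `get_freq_dict` and `get_sents`; parsing once and reusing the result would be a possible refinement of the class as well.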

### Some Examples

In [46]:
# Example 1
with open('./sample_texts/sample_text_1.txt', 'r') as fp:
    text1 = Summary(fp.read())

print(f'''
============================================================
Original Text
============================================================
{text1.text}
============================================================
Word Count: {text1.len_orig}

============================================================
Summary
============================================================
{text1.summary}
============================================================
Word Count: {text1.summ_len}''') 


Original Text
The evolution of the horse is a classic example of natural selection in action. Over millions of years, horses evolved from small, four-toed creatures to the large, powerful animals we know today. This evolution was driven by the need for horses to adapt to their changing environment.

Early horses lived in forests, where they were preyed upon by large predators. To survive, these horses needed to be able to run fast. They also needed to be able to see well in low light conditions. Over time, horses evolved longer legs and necks, which helped them to run faster and see better.

As the climate changed and forests gave way to grasslands, horses needed to adapt again. Grasslands are more open than forests, so horses needed to be able to see predators from a distance. They also needed to be able to run long distances to find food and water. Over time, horses evolved larger bodies and faster running speeds.

The evolution of the horse is a testament to the power of natural se

In [48]:
# Example 2
with open('./sample_texts/sample_text_2.txt', 'r') as fp:
    text2 = Summary(fp.read())

print(f'''
============================================================
Original Text
============================================================
{text2.text}
============================================================
Word Count: {text2.len_orig}

============================================================
Summary
============================================================
{text2.summary}
============================================================
Word Count: {text2.summ_len}''') 


Original Text
The Great Wall of China is one of the most impressive feats of engineering in human history. It stretches for over 13,000 miles, making it the longest man-made structure in the world. The wall was built over centuries by different dynasties, and it served as a defense against invaders from the north.

The Great Wall is not a single, continuous structure. It is made up of a series of walls and fortifications that were built over time. The walls vary in height and width, and they are made from different materials, such as stone, brick, and earth. The most well-known section of the Great Wall is the Badaling section, which is located near Beijing.

The Great Wall is not just a physical barrier. It is also a cultural icon that is synonymous with China. The wall has been featured in many films and television shows, and it is a popular tourist destination. The Great Wall is a reminder of China's rich history and its unique culture.
Word Count: 193

Summary
The Great Wall of Ch

In [49]:
# Example 3
with open('./sample_texts/sample_text_3.txt', 'r') as fp:
    text3 = Summary(fp.read())

print(f'''
============================================================
Original Text
============================================================
{text3.text}
============================================================
Word Count: {text3.len_orig}

============================================================
Summary
============================================================
{text3.summary}
============================================================
Word Count: {text3.summ_len}''') 


Original Text
The Industrial Revolution was a period of major industrialization that took place during the late 18th and early 19th centuries. It brought significant changes in agriculture, manufacturing, mining, transportation, and technology, which had a profound impact on the socioeconomic and cultural conditions of the time. The revolution began in Great Britain and then spread to other parts of Europe and North America. Key innovations during this period included the steam engine, textile machinery, iron production, and the development of railways. These advancements led to increased productivity, urbanization, and the rise of factory-based industries. However, the Industrial Revolution also brought many challenges, such as poor working conditions, environmental degradation, and social inequality. Overall, it was a transformative period that laid the foundation for modern industrial societies.

Word Count: 150

Summary
The Industrial Revolution was a period of major industrializa

In [50]:
# Example 4
with open('./sample_texts/sample_text_4.txt', 'r') as fp:
    text4 = Summary(fp.read())

print(f'''
============================================================
Original Text
============================================================
{text4.text}
============================================================
Word Count: {text4.len_orig}

============================================================
Summary
============================================================
{text4.summary}
============================================================
Word Count: {text4.summ_len}''') 


Original Text
The human brain is the most complex organ in the human body. It is responsible for everything we do, from thinking and feeling to moving and breathing. The brain is made up of billions of neurons, which are interconnected by trillions of synapses. These neurons communicate with each other using electrical and chemical signals.

The brain is divided into two halves, called the left and right hemispheres. The left hemisphere is responsible for language, logic, and analysis. The right hemisphere is responsible for creativity, intuition, and spatial awareness. However, the two hemispheres work together to perform most tasks.

The brain is constantly changing and adapting. It can learn new things and form new memories. The brain is also able to repair itself to some extent. However, the brain is also susceptible to damage, which can lead to a variety of neurological disorders.
Word Count: 166

Summary
The human brain is the most complex organ in the human body. The brain is d
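
The four example cells repeat the same `print` template. One possible way to avoid the repetition is a small formatting helper; `format_report` and `RULE` are hypothetical names, and the demo values are placeholders rather than real notebook output:

```python
RULE = "=" * 60

def format_report(original, n_orig, summary, n_summ):
    """Build the banner-style report used in the example cells."""
    return "\n".join([
        RULE, "Original Text", RULE, original, RULE, f"Word Count: {n_orig}",
        "",
        RULE, "Summary", RULE, summary, RULE, f"Word Count: {n_summ}",
    ])

# placeholder values; with the class above one would pass
# text1.text, text1.len_orig, text1.summary and text1.summ_len
report = format_report("A short text. With two sentences.", 8, "A short text.", 4)
print(report)
```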