# **Automated Text Summarization Using Sumy Library**

**Install Libraries**

In [1]:
!pip install sumy


Collecting sumy
  Downloading sumy-0.11.0-py2.py3-none-any.whl.metadata (7.5 kB)
Collecting docopt<0.7,>=0.6.1 (from sumy)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting breadability>=0.1.20 (from sumy)
  Downloading breadability-0.1.20.tar.gz (32 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting pycountry>=18.2.23 (from sumy)
  Downloading pycountry-24.6.1-py3-none-any.whl.metadata (12 kB)
Downloading sumy-0.11.0-py2.py3-none-any.whl (97 kB)
Downloading pycountry-24.6.1-py3-none-any.whl (6.3 MB)

**Import Libraries**

In [2]:
import nltk

from sumy.nlp.stemmers import Stemmer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.edmundson import EdmundsonSummarizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.utils import get_stop_words


In [7]:
nltk.download('punkt')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\amamo\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

**Build Tokenizer**

In [14]:
tokenizer = Tokenizer("en")

sentences = tokenizer.to_sentences("""Hello, this is Analytics Vidhya! We offer a wide 
range of articles, tutorials, and resources on various topics in AI and Data Science. 
Our mission is to provide quality education and knowledge sharing to help you excel 
in your career and academic pursuits. Whether you're a beginner looking to learn 
the basics of coding or an experienced developer seeking advanced concepts, 
Analytics Vidhya has something for everyone. """)

for sentence in sentences:
    print(tokenizer.to_words(sentence))


('Hello', 'this', 'is', 'Analytics', 'Vidhya')
('We', 'offer', 'a', 'wide', 'range', 'of', 'articles', 'tutorials', 'and', 'resources', 'on', 'various', 'topics', 'in', 'AI', 'and', 'Data', 'Science')
('Our', 'mission', 'is', 'to', 'provide', 'quality', 'education', 'and', 'knowledge', 'sharing', 'to', 'help', 'you', 'excel', 'in', 'your', 'career', 'and', 'academic', 'pursuits')
('Whether', 'you', 'a', 'beginner', 'looking', 'to', 'learn', 'the', 'basics', 'of', 'coding', 'or', 'an', 'experienced', 'developer', 'seeking', 'advanced', 'concepts', 'Analytics', 'Vidhya', 'has', 'something', 'for', 'everyone')


**Build Stemmer**

In [15]:
stemmer = Stemmer("en")
stem = stemmer("Blogging")
print(stem)


blog


**Luhn Summarizer**

The Luhn Summarizer is one of the summarization algorithms provided by the Sumy library. It is based on frequency analysis: the importance of a sentence is determined by how many significant words it contains. The algorithm identifies the words most relevant to the topic by filtering out common stop words and keeping the ones that occur frequently, then ranks sentences by how densely those words appear. This makes the Luhn Summarizer effective for extracting key sentences from a document.

In [18]:
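def summarize_paragraph(paragraph, sentences_count=2):
    # Parse the raw text into a Sumy document using the English tokenizer.
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    # Stem words so inflected forms are counted together, and skip stop words.
    summarizer = LuhnSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    # Keep the `sentences_count` highest-ranked sentences as the summary.
    summary = summarizer(parser.document, sentences_count)
    return summary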

if __name__ == "__main__":
    paragraph = """Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast
                   to the natural intelligence displayed by humans and animals. Leading AI textbooks define
                   the field as the study of "intelligent agents": any device that perceives its environment
                   and takes actions that maximize its chance of successfully achieving its goals. Colloquially,
                   the term "artificial intelligence" is often used to describe machines (or computers) that mimic
                   "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving"."""

    sentences_count = 2
    summary = summarize_paragraph(paragraph, sentences_count)

    for sentence in summary:
        print(sentence)


Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals.
Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
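
To make the frequency idea concrete, here is a minimal pure-Python sketch of Luhn-style scoring. It is not Sumy's implementation: the stop-word list, the `min_freq` threshold, the naive sentence splitter, and the function names are all assumptions made for illustration, and each sentence is scored as a whole instead of using Luhn's clusters of nearby significant words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (not Sumy's list).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "by", "with"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase the sentence and keep alphabetic tokens only.
    return re.findall(r"[a-z']+", sentence.lower())


def frequency_summary(text, sentences_count=1, min_freq=2):
    sentences = split_sentences(text)
    counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )
    # A word is "significant" if it is not a stop word and occurs often enough.
    significant = {w for w, c in counts.items() if c >= min_freq}

    def score(sentence):
        tokens = tokenize(sentence)
        hits = sum(1 for t in tokens if t in significant)
        # Simplified Luhn-style score: significant hits squared over length.
        return hits * hits / len(tokens) if tokens else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    # Return the top sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:sentences_count])]


sample = (
    "Cats chase mice. Cats and mice live in the barn with cats. "
    "The weather is nice today."
)
for sentence in frequency_summary(sample, sentences_count=1):
    print(sentence)
```

Dividing by sentence length keeps long sentences from winning just because they contain more words, which is why the short first sentence outranks the longer second one here.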


**Edmundson Summarizer**

The Edmundson Summarizer is another algorithm provided by the Sumy library. Unlike summarizers that rely primarily on statistical, frequency-based methods, it allows a more tailored approach through three user-supplied word lists: bonus words, stigma words, and null words. Bonus words raise the score of sentences that contain them, stigma words lower it, and null words are treated as irrelevant, so you can steer which sentences end up in the summary.

In [19]:
def summarize_paragraph(paragraph, sentences_count=2, bonus_words=None, stigma_words=None, null_words=None):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = EdmundsonSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    if bonus_words:
        summarizer.bonus_words = bonus_words
    if stigma_words:
        summarizer.stigma_words = stigma_words
    if null_words:
        summarizer.null_words = null_words

    summary = summarizer(parser.document, sentences_count)
    return summary

if __name__ == "__main__":
    paragraph = """Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast
                   to the natural intelligence displayed by humans and animals. Leading AI textbooks define
                   the field as the study of "intelligent agents": any device that perceives its environment
                   and takes actions that maximize its chance of successfully achieving its goals. Colloquially,
                   the term "artificial intelligence" is often used to describe machines (or computers) that mimic
                   "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving"."""

    sentences_count = 2
    bonus_words = ["intelligence", "AI"]
    stigma_words = ["contrast"]
    null_words = ["the", "of", "and", "to", "in"]

    summary = summarize_paragraph(paragraph, sentences_count, bonus_words, stigma_words, null_words)

    for sentence in summary:
        print(sentence)


Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals.
Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
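
The effect of the three word lists can be sketched with a simplified cue-word score. This is an illustration only, not Sumy's `EdmundsonSummarizer` (which combines several scoring methods): `cue_score`, `cue_summary`, the equal weighting of bonus and stigma hits, and the sample sentences are all assumptions.

```python
import re


def cue_score(sentence, bonus_words, stigma_words, null_words):
    # All three word collections are expected to be lowercase sets.
    # Null words are dropped first: they carry no signal either way.
    tokens = [
        t for t in re.findall(r"[a-z']+", sentence.lower())
        if t not in null_words
    ]
    bonus_hits = sum(1 for t in tokens if t in bonus_words)
    stigma_hits = sum(1 for t in tokens if t in stigma_words)
    # Bonus words push the score up, stigma words pull it down.
    return bonus_hits - stigma_hits


def cue_summary(sentences, sentences_count, bonus_words, stigma_words, null_words):
    bonus = {w.lower() for w in bonus_words}
    stigma = {w.lower() for w in stigma_words}
    null = {w.lower() for w in null_words}
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cue_score(sentences[i], bonus, stigma, null),
        reverse=True,
    )
    # Return the top sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:sentences_count])]


docs = [
    "AI shows intelligence in machines.",
    "In contrast, animals show natural intelligence.",
    "The weather is nice.",
]
top = cue_summary(
    docs,
    sentences_count=1,
    bonus_words=["intelligence", "AI"],
    stigma_words=["contrast"],
    null_words=["the", "in", "is"],
)
print(top)
```

In this sketch the second sentence contains one bonus word and one stigma word, so the two cancel out and the first sentence is selected.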


**LSA Summarizer**

The LSA (Latent Semantic Analysis) summarizer is often the strongest of the three because it works by identifying patterns and relationships between the terms and sentences of a text, rather than relying solely on frequency analysis. By modeling the underlying topics of the input text, it can generate more contextually accurate summaries.

In [20]:
def summarize_paragraph(paragraph, sentences_count=2):
    parser = PlaintextParser.from_string(paragraph, Tokenizer("english"))

    summarizer = LsaSummarizer(Stemmer("english"))
    summarizer.stop_words = get_stop_words("english")

    summary = summarizer(parser.document, sentences_count)
    return summary

if __name__ == "__main__":
    paragraph = """Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast
                   to the natural intelligence displayed by humans and animals. Leading AI textbooks define
                   the field as the study of "intelligent agents": any device that perceives its environment
                   and takes actions that maximize its chance of successfully achieving its goals. Colloquially,
                   the term "artificial intelligence" is often used to describe machines (or computers) that mimic
                   "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving"."""

    sentences_count = 2
    summary = summarize_paragraph(paragraph, sentences_count)

    for sentence in summary:
        print(sentence)


Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".
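
The idea behind LSA can be sketched without any dependencies. The example below builds a term-by-sentence count matrix and uses power iteration to approximate the sentence weights of its strongest latent topic. This is a rough sketch under stated assumptions, not Sumy's `LsaSummarizer`: a full LSA implementation would compute a complete singular value decomposition and combine several topics, and the stop-word list, function names, and sample sentences here are hypothetical.

```python
import math
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "i", "had", "for", "from"}


def term_sentence_matrix(sentences):
    tokenized = [
        [t for t in re.findall(r"[a-z']+", s.lower()) if t not in STOP_WORDS]
        for s in sentences
    ]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    # Rows are terms, columns are sentences, cells are raw counts.
    return [[tokens.count(term) for tokens in tokenized] for term in vocabulary]


def dominant_topic_weights(matrix, n_sentences, iterations=100):
    # Gram matrix A^T A: how strongly each pair of sentences shares terms.
    gram = [
        [sum(row[i] * row[j] for row in matrix) for j in range(n_sentences)]
        for i in range(n_sentences)
    ]
    weights = [1.0] * n_sentences
    # Power iteration approximates the leading eigenvector of A^T A,
    # i.e. the leading right singular vector of the term-sentence matrix.
    for _ in range(iterations):
        updated = [
            sum(gram[i][j] * weights[j] for j in range(n_sentences))
            for i in range(n_sentences)
        ]
        norm = math.sqrt(sum(x * x for x in updated))
        if norm == 0:
            break
        weights = [x / norm for x in updated]
    return weights


def lsa_like_summary(sentences, sentences_count=1):
    matrix = term_sentence_matrix(sentences)
    weights = dominant_topic_weights(matrix, len(sentences))
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    # Return the top sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:sentences_count])]


docs = [
    "Neural networks learn patterns from data.",
    "Deep neural networks learn complex patterns from large data.",
    "I had pasta for lunch.",
]
print(lsa_like_summary(docs, sentences_count=1))
```

The off-topic third sentence shares no terms with the other two, so its weight in the dominant topic shrinks toward zero with each iteration, while the two related sentences reinforce each other.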
