# Colab's New Code Editor

Colab is moving to a new code editor which includes:
* Richer completions
* Additional keybinding options
* Improved accessibility

In [1]:
# Implementation from https://dev.to/davidisrawi/build-a-quick-summarizer-with-python-and-nltk
# Setup
!pip install -q wordcloud
import wordcloud

import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger') 

import pandas as pd
import matplotlib.pyplot as plt
import io
import unicodedata
import numpy as np
import re
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize, sent_tokenize

text_str = '''
In order to solve a problem, it is important to follow a systematic approach. In the following we discuss different steps that can follow to solve a problem systematically. A well-defined problem is the one that does not contain ambiguities. All the conditions are clearly specified and it has a clear goal. It is easy to understand and solve.
Given a problem statement, first we need to see whether the problem is defined well or not. If the problem is not defined well then we can use one of the following strategies to define the problem.
Gain Background Knowledge: We try to know the situation and circumstances in which the problem is happening. In this way, we can identify the given state. It also helps to know what a good solution will look like. How we shall be able to measure the solution.
Use Guesses: We try to get the unknown information through appropriate guesses. The gases may be basis upon our past experiences.
Draw a Picture:  if the problem is not well-defined, we can draw a picture and fill the undefined information. Figure 1-1 shows pictorial representation of a problem.
It is important to understand the problem before jumping into the solution of the problem. For example, a riddle or a puzzle can be answered only after clear understanding. A clear understanding of a problem makes it easy easier to solve and help to save money, time and resources. Understanding of problem includes identification of the five Ws (what, who,  when, where, and why). Problem analysis is the process to figure out these 5 Ws from a problem statement. Problem analysis helps to understand a given problem.  Problem analysis helps to understand a given problem. These are the basic elements which lead towards the solution of a given problem.For example, consider the following problem statement.Suppose your class teacher assigns you a task to prepare a list of students in your school whose name start with letter A. The list is required in order to prepare an alphabetical directory of all school students and there is only one week to complete the task.We can analyze this problem by identifying 5 Ws in the problem statement as given below.What List of students names starting with letter A.Who Students.Why to prepare the director of students, When within a week, Where School .Figure 1-2 shows the metaphorical representation of problem where the red light presents a problem, the yellow light represents its analysis and the green light presents the solution. It shows that problem analysis makes us closer to a solution.
After analyzing a problem, we formulate a plan that may lead us towards the solution of a problem. This phase includes finding the right strategy for problem solving. Some of the strategies are Divide and Conquer,This strategy divides a complex problem into smaller problems. Guess, Check and Improve, The designer guesses a solution to a problem and then checks the correctness of the solution. If the solution is not according to the expectations, then he/she refines the solution. The refinement is an iterative process. Act it Out, In this strategy the designer defines the list of to-do tasks. Afterwards reforms he / she performs the task.Prototype (Draw), This technique draws a pictorial representation of the solution. It is not the final solution. However, it may help a designer to understand the important components of the solution.The selection of a strategy depends upon the problem. It is quite important that one strategy may be more suitable to implement a solution then the other one. Very specifically, the selection of the strategy depends upon the nature of a problem.
The word Candid refers to something spontaneous and unplanned. For example, if you are asked to find number of students in your school who can play cricket. You can estimate by finding cricket players in your class and then multiplying it by the total number of classes in your school. Your answer in this way is the Candid solution. To find exact number of cricket players, you have to opt some other way, like visiting each class or getting data from teachers. One can think of a candid solution any time. A candid solution can help to safe time. In Figure 1-4, there are different ways shown to reach a certain place (which can be reached either by going across the wall or by going sideways) and the one you think can work, is the candid solution. It is not necessary that candid solution is the actual solution of a problem.
Sometime we find more than one solutions of a problem and select the best one amongst them. For Example, assume that names of all the students in your school are available on our website and you are to search a particular name. You can solve this search problem by either of the following methods: 
Look at each number name on the website one by one until the name is found or the list is over. 
Take printouts and search the required name. 
Copy names, put them in Excel sheet and sort there in alphabetical order. Searching in a Sorted list is comparatively easy. 
Just press Ctrl + F, when the list is available in a web browser. You can type the name to search automatically.
There can be other solutions as well. Now we can identify a solution that has less number of steps or that seems more effective based on some criteria

'''


def _create_frequency_table(text_string) -> dict:
    """
    Build a word-frequency table for the text.
    Each token is first reduced to its root form with the Porter stemmer;
    stems that appear in NLTK's English stop-word list are then skipped.
    :rtype: dict
    """
    stopWords = set(stopwords.words("english"))
    words = word_tokenize(text_string)
    ps = PorterStemmer()

    freqTable = dict()
    for word in words:
        word = ps.stem(word)
        if word in stopWords:
            continue
        freqTable[word] = freqTable.get(word, 0) + 1

    return freqTable


def _score_sentences(sentences, freqTable) -> dict:
    """
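    Score each sentence by the frequency-table entries it contains.
    Basic algorithm: add up the frequency of every table entry found in the
    lower-cased sentence, then divide by the number of entries that matched.
    Matching is by substring, so a short stem can also match inside a longer word.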
    :rtype: dict
    """

    sentenceValue = dict()

    for sentence in sentences:
        # Only the first 10 characters of a sentence are used as the dictionary
        # key, to keep keys short. Sentences that start with the same 10
        # characters therefore share a single entry.
        key = sentence[:10]
        matched_entries = 0
        for wordValue in freqTable:
            if wordValue in sentence.lower():
                matched_entries += 1
                sentenceValue[key] = sentenceValue.get(key, 0) + freqTable[wordValue]

        # Summing alone would favour long sentences over short ones, so divide
        # the score by the number of frequency-table entries that matched.
        # The check on matched_entries avoids a division by zero.
        if key in sentenceValue and matched_entries:
            sentenceValue[key] = sentenceValue[key] / matched_entries

    return sentenceValue


def _find_average_score(sentenceValue) -> float:
    """
    Find the average score from the sentence value dictionary
    :rtype: float
    """
    # Average value of a sentence from original text
    average = sum(sentenceValue.values()) / len(sentenceValue)

    return average


def _generate_summary(sentences, sentenceValue, threshold):
    """
    Join every sentence whose score is at least the threshold
    :rtype: str
    """
    summary = ''

    for sentence in sentences:
        if sentence[:10] in sentenceValue and sentenceValue[sentence[:10]] >= threshold:
            summary += " " + sentence

    return summary


def run_summarization(text):
    # 1 Create the word frequency table
    freq_table = _create_frequency_table(text)

    # 2 Tokenize the sentences
    # NLTK already provides a sentence tokenizer, so calling sent_tokenize()
    # is enough to get the list of sentences.
    sentences = sent_tokenize(text)

    # 3 Important Algorithm: score the sentences
    sentence_scores = _score_sentences(sentences, freq_table)

    # 4 Find the threshold
    threshold = _find_average_score(sentence_scores)

    # 5 Important Algorithm: Generate the summary
    summary = _generate_summary(sentences, sentence_scores, 1.3 * threshold)

    return summary


if __name__ == '__main__':
    result = run_summarization(text_str)
    print(result)

[nltk_data] Downloading package stopwords to C:\Users\Farah
[nltk_data]     Maheen\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Farah
[nltk_data]     Maheen\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to C:\Users\Farah
[nltk_data]     Maheen\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Farah Maheen\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


 It is easy to understand and solve. It is important to understand the problem before jumping into the solution of the problem. Problem analysis is the process to figure out these 5 Ws from a problem statement. Problem analysis helps to understand a given problem. Problem analysis helps to understand a given problem. Guess, Check and Improve, The designer guesses a solution to a problem and then checks the correctness of the solution. It is not the final solution. It is not necessary that candid solution is the actual solution of a problem. There can be other solutions as well.
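
The summary above repeats "Problem analysis helps to understand a given problem." because the input text contains that sentence twice. The two copies, like any sentences that begin with the same ten characters, share one `sentence[:10]` key in the scorer. The sketch below is one possible variant, not the original author's method: it keys scores by sentence position and matches whole tokens instead of substrings. To stay self-contained it uses a regular expression instead of NLTK's tokenizers; `STOP_WORDS`, `tokenize`, `score_sentences`, and `summarize` are hypothetical names, and the tiny stop-word list is only an illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the notebook uses NLTK's English list).
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "in", "we", "can"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences):
    """Return one score per sentence: the mean corpus frequency of its non-stop words."""
    # Frequency of every non-stop word across all sentences.
    freq = Counter(
        tok for s in sentences for tok in tokenize(s) if tok not in STOP_WORDS
    )
    scores = []
    for sentence in sentences:
        content = [tok for tok in tokenize(sentence) if tok not in STOP_WORDS]
        # Average instead of sum, so long sentences are not favoured.
        scores.append(sum(freq[tok] for tok in content) / len(content) if content else 0.0)
    return scores


def summarize(sentences, factor=1.3):
    """Keep the sentences that score at least `factor` times the average score."""
    if not sentences:
        return []
    scores = score_sentences(sentences)
    threshold = factor * sum(scores) / len(scores)
    return [s for s, score in zip(sentences, scores) if score >= threshold]
```

Because scores live in a list aligned with `sentences`, two sentences that start with the same words no longer overwrite each other's score.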


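The `1.3 * threshold` cut-off used in `run_summarization` gives no direct control over how many sentences end up in the summary. As a hedged alternative, the sketch below keeps a fixed number of top-scoring sentences and returns them in their original order. The helper name `top_k_in_order` and its parameters are hypothetical; it expects a list of scores aligned with the list of sentences, not the dictionary built by `_score_sentences`.

```python
import heapq


def top_k_in_order(sentences, scores, k):
    """Return the k highest-scoring sentences, in their original order."""
    # Pick the positions of the k best scores, then restore document order.
    best = heapq.nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(best)]
```

For example, `top_k_in_order(sentences, scores, 3)` would give a three-sentence summary regardless of how the scores are distributed.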