# Summarizing using Gensim

In [20]:
import requests 
from bs4 import BeautifulSoup
from nltk import sent_tokenize

from gensim.summarization import summarize
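
Note that the `gensim.summarization` module was removed in Gensim 4.0, so the import above only works with an older release. A minimal sketch of a check you could run first; the helper name `summarization_available` is hypothetical and not part of Gensim:

```python
import importlib.util


def summarization_available():
    """Return True if gensim.summarization can be imported in this environment."""
    try:
        return importlib.util.find_spec("gensim.summarization") is not None
    except ImportError:
        # Raised when the parent package (gensim itself) is missing or broken.
        return False


if not summarization_available():
    print("gensim.summarization not found; install gensim<4.0 to run this notebook")
```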

In [2]:
r = requests.get('https://click.palletsprojects.com/en/7.x/why/')
r

<Response [200]>

In [3]:
r.encoding = 'utf-8'

In [4]:
# name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(r.text, 'html.parser')

In [17]:
# find the element with id='why-click' and flatten its text onto one line
why_click = soup.find(id='why-click').text.replace('\n',' ')

In [18]:
print(why_click)

 Why Click?¶ There are so many libraries out there for writing command line utilities; why does Click exist? This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library. Have a look to see if something else resonates with you. Click actually implements its own parsing of arguments and does not use optparse or argpars
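
The scraped text still carries the `¶` permalink marker from the page heading and runs of double spaces left behind by the newline replacement. A small cleanup sketch using only the standard library; `clean_text` is a hypothetical helper that the original notebook does not use, and `sample` is a shortened stand-in for `why_click`:

```python
import re


def clean_text(text):
    """Drop the pilcrow heading anchor and collapse runs of whitespace."""
    text = text.replace("¶", " ")
    return re.sub(r"\s+", " ", text).strip()


sample = " Why Click?¶ There are so many libraries;  why does Click exist? "
print(clean_text(sample))
```

Applying something like this before `sent_tokenize` or `summarize` keeps the marker out of the first sentence.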

In [19]:
import pandas as pd
pd.set_option('display.max_colwidth',500)

def show_sentences(text):
    return pd.DataFrame({'Sentence': sent_tokenize(text)})

show_sentences(why_click)

,Sentence
0,Why Click?¶ There are so many libraries out there for writing command line utilities; why does Click exist?
1,This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes: is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting t...
2,There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.
3,Have a look to see if something else resonates with you.
4,Click actually implements its own parsing of arguments and does not use optparse or argparse following the optparse parsing behavior.
5,The reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.
6,Click is designed to be fun and customizable but not overly flexible.
7,"For instance, the customizability of help pages is constrained."
8,This constraint is intentional because Click promises multiple Click instances will continue to function as intended when strung together.
9,Too much customizability would break this promise.


In [21]:
summary = summarize(why_click)

In [22]:
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.\nClick is not based on argparse because it has some behaviors that mak
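
`summarize` is extractive: it ranks the sentences of the input with a variation of TextRank and returns the highest-ranked ones in their original order, which is why every line of the summary appears verbatim in the source. The following is a simplified sketch of that idea and not Gensim's implementation. It assumes a naive regex sentence splitter and a word-overlap similarity, and the names `split_sentences`, `similarity`, and `textrank_summary` as well as the sample text are made up for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared words, normalised by the log of each sentence's vocabulary size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n sentences by a PageRank-style score, in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Weighted graph: one node per sentence, edge weight = similarity.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    # Pick the best-scoring sentence indices, then restore document order.
    best = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n])
    return [sentences[i] for i in best]


text = (
    "Click is a package for building command line tools. "
    "Click supports nesting of command line tools. "
    "The weather was pleasant yesterday. "
    "Command line tools built with Click load values from environment variables."
)
print(textrank_summary(text, top_n=2))
```

Sentences that share vocabulary with many others reinforce each other's scores, while an unrelated sentence stays at the baseline score and drops out.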

## Tuning the parameters

In [25]:
# split=True returns the summary as a list of sentences instead of one string
summary = summarize(why_click, split=True)
summary

['This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.',
 'The reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.',
 'Click is not based on argparse because it has some behaviors t
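
Judging by the outputs in this notebook, the default return value is the selected sentences joined with newline characters, while `split=True` hands back the same sentences as a list. A small sketch of moving between the two forms, using shortened stand-in sentences rather than real `summarize` output:

```python
# Stand-in for the list returned by summarize(..., split=True).
summary_sentences = [
    "There are many alternatives to Click.",
    "Click is designed to be fun and customizable but not overly flexible.",
]

# Joining with newlines gives the single-string form.
summary_text = "\n".join(summary_sentences)

# Splitting on newlines recovers the list.
assert summary_text.split("\n") == summary_sentences

# The list form is convenient for numbering or filtering sentences.
for number, sentence in enumerate(summary_sentences, start=1):
    print(f"{number}. {sentence}")
```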

In [26]:
# ratio: proportion of the original sentences to keep in the summary (default 0.2)

summary = summarize(why_click, ratio=0.1)
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.\nClick is not based on argparse because it has some behaviors that mak

In [27]:
# ratio=0.5 keeps roughly half of the original sentences
summary = summarize(why_click, ratio=0.5)
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.\nClick actually implements its own parsing of arguments and does not use optparse or argparse following the optparse parsing behavior.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some 
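
`ratio` works on sentences, not on characters or words: the Gensim 3.x documentation describes it as the proportion of the original sentences chosen for the summary. A rough sketch of the arithmetic, assuming the count is rounded down; `sentences_kept` is a hypothetical helper and the sentence count of 40 is an arbitrary example. Gensim splits sentences with its own tokenizer, so its count can differ from the `sent_tokenize` table shown earlier.

```python
def sentences_kept(n_sentences, ratio=0.2):
    """Approximate number of sentences a given ratio retains (rounded down)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the interval (0, 1]")
    return int(n_sentences * ratio)


for ratio in (0.1, 0.2, 0.5):
    print(ratio, sentences_kept(40, ratio))
```

For short inputs, small ratios can select very few sentences, so it is worth checking the result before relying on it.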

In [29]:
# word_count: approximate number of words the summary should contain
# (if both ratio and word_count are given, ratio is ignored)

summary = summarize(why_click, word_count=200)
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some deficiencies when it comes to POSIX compliant argument handling.\nClick is not based on argparse because it has some behaviors that mak

In [32]:
# a larger word budget lets more sentences into the summary
summary = summarize(why_click, word_count=600)
summary

'This question is easy to answer: because there is not a single command line utility for Python out there which ticks the following boxes:  is lazily composable without restrictions supports implementation of Unix/POSIX command line conventions supports loading values from environment variables out of the box support for prompting of custom values is fully nestable and composable works the same in Python 2 and 3 supports file handling out of the box comes with useful common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)  There are many alternatives to Click; the obvious ones are optparse and argparse from the standard library.\nClick actually implements its own parsing of arguments and does not use optparse or argparse following the optparse parsing behavior.\nThe reason it’s not based on argparse is that argparse does not allow proper nesting of commands by design and has some 
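
`word_count` limits the summary by length rather than by share of sentences, which is why the 600-word run includes a sentence that the 200-word run leaves out. The sketch below shows one simple way such a budget could be applied to sentences that are already ranked: keep adding sentences while the total stays within the budget. This is an illustrative assumption and not Gensim's exact stopping rule; `take_within_budget` is a hypothetical name, and the `ranked` list is an arbitrary ordering of sentences from the page.

```python
def take_within_budget(ranked_sentences, word_count):
    """Greedily keep ranked sentences until the next one would exceed word_count."""
    selected, total = [], 0
    for sentence in ranked_sentences:
        n_words = len(sentence.split())
        if total + n_words > word_count:
            break
        selected.append(sentence)
        total += n_words
    return selected


ranked = [
    "Click is designed to be fun and customizable but not overly flexible.",
    "Too much customizability would break this promise.",
    "Have a look to see if something else resonates with you.",
]
print(take_within_budget(ranked, word_count=20))
```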