- Text Cleaning
- Sentence Tokenization
- Word Tokenization
- Word-Frequency Table
- Summarization

In [6]:
text = '''The Trinamool Congress has written to Facebook CEO Mark Zuckerberg raising the issue of alleged bias of the social media giant towards the BJP, and claimed that there is enough evidence in public domain to substantiate this charge.

Party MP Derek O’Brien, who has written the letter to Mr. Zuckerberg also makes a reference to an earlier meeting between the two, where some of these concerns were raised.

Sources in the party said that Mr. O’Brien met Zuckerberg in October 2015 in Delhi.

“We, the All India Trinamool Congress (AITC), India’s second-largest opposition party, have had serious concerns about Facebook’s role during the 2014 and 2019 general elections in India,” Mr. O’Brien wrote in the letter accessed by PTI.


“With the elections in the Indian state of West Bengal just months away, your company’s recent blocking of Facebook pages and accounts in Bengal also points to the link between Facebook and the BJP. There is enough material now in the public domain, including internal memos of senior Facebook management, to substantiate the bias,” he wrote.'''

In [53]:
text = '''The scene is total chaos: a woman and all her purse's contents in middair as she trips over a child's toy, a man hastily trying to gather his spilled laundry, a screaming child weaving through the crowd. Somewhere, in the midst of it all, is the person you've been looking for: wearing a red and white striped shirt, black rimmed glasses and a lopsided cap. There he is! There's Waldo.

Many of us have fond memories of Waldo. But while he looms large in our imagination, our childhood searches for Waldo typically stayed pretty small – Waldo is a tiny person in the middle of lots of other tiny things.

And that's what this post is about: wee things. Specifically, the wee things that we see as part of graphics, maps, visualizations (wee things in space) as well as the wee things we experience as part of interactions, navigation, and usability (wee things in time). This means everything from sequences of small graphics that help us make comparisons, to tiny locator maps that help orient us within a larger graphic, to navigation icons that give hints about how we should make our way around a page.

Waldo, and the eternal search for him, can actually tell us quite a lot about design. In many ways, Waldo is a great example of what NOT to do when using wee things in your own work. So with Waldo as our anti-hero, let's take a look at how people read and interpret small visual forms, why tiny details can be hugely useful, and what principles we can apply to make all these little images and moments work for us as designers.

Wee Things In Space
Probably the most immediate definition of wee things are things that are physically small: little things on a page. We see these all the time in news graphics, and we're probably familiar with some of their forms: small multiples, sparklines, icons, etc. I'll go into more details about all of these.

These visual forms work because they serve as extensions of our mind – they are cognitive tools that complement our own mental abilities. They do this by recording information for us to make use of later, lending a hand to our (pretty terrible) working memories, helping us search and discover and recognize. We'll take a look at one task in particular they are great at: letting us make comparisons.

Make Comparisons

Tiny sequences of graphics, also known as small multiples, are great ways to help our brains compare. They are so successful because we don’t have to rely on working memory – every bit of information is in front of us at the same time. This means that we can easily see changes, patterns or differences.

Here are a bunch of examples of small multiples in the wild – maps and planets, first lady hair styles and telegraph signals, food trucks, fashion color trends and dressing appropriately for different climates, the distribution of deaths in the 1870’s and last but not least, Bill Murray’s hats.'''

In [54]:
text

"The scene is total chaos: a woman and all her purse's contents in middair as she trips over a child's toy, a man hastily trying to gather his spilled laundry, a screaming child weaving through the crowd. Somewhere, in the midst of it all, is the person you've been looking for: wearing a red and white striped shirt, black rimmed glasses and a lopsided cap. There he is! There's Waldo.\n\nMany of us have fond memories of Waldo. But while he looms large in our imagination, our childhood searches for Waldo typically stayed pretty small – Waldo is a tiny person in the middle of lots of other tiny things.\n\nAnd that's what this post is about: wee things. Specifically, the wee things that we see as part of graphics, maps, visualizations (wee things in space) as well as the wee things we experience as part of interactions, navigation, and usability (wee things in time). This means everything from sequences of small graphics that help us make comparisons, to tiny locator maps that help orient 

In [55]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [56]:
stopwords = list(STOP_WORDS)

In [57]:
print(stopwords)

['then', 'not', 'neither', 'becomes', 'whatever', 'whenever', 'must', 'already', 'both', 'thence', "'d", 'almost', 'anything', "'re", 'thereafter', '‘ve', 'am', 'thereby', 'somehow', 'doing', 'anyway', 'before', 'none', 'are', 'get', 'become', 'had', 'former', 'just', 'which', 'somewhere', 'meanwhile', 'same', 'namely', 'mostly', 'show', 'now', 'as', 'sometimes', 'thru', 'nine', '‘d', 'anyhow', 'done', 'others', 'myself', 'she', 'keep', 'except', 'after', 'hereupon', 'sixty', 'nowhere', 'off', 'ourselves', "'ve", 'move', 'once', 'while', 'here', 'front', "n't", 'hundred', 'was', 'there', '’re', 'whereas', 'what', 'upon', 'behind', 'indeed', 'via', 'unless', 'too', 'beside', 'each', 'say', 'until', 'side', 'whole', 'well', 'really', 'together', 'under', 'enough', '‘ll', 'everything', 'its', 'five', 'a', 'herself', 'put', 'hereby', 'everywhere', 'one', '‘re', 'forty', 'but', 'another', 'beforehand', 'above', 'rather', 'since', 'seemed', 'twelve', 'i', 'four', 'such', 'someone', 'every', 

In [58]:
nlp = spacy.load('en_core_web_sm')

In [59]:
doc = nlp(text)

In [60]:
tokens = [token.text for token in doc]
print(tokens)

['The', 'scene', 'is', 'total', 'chaos', ':', 'a', 'woman', 'and', 'all', 'her', 'purse', "'s", 'contents', 'in', 'middair', 'as', 'she', 'trips', 'over', 'a', 'child', "'s", 'toy', ',', 'a', 'man', 'hastily', 'trying', 'to', 'gather', 'his', 'spilled', 'laundry', ',', 'a', 'screaming', 'child', 'weaving', 'through', 'the', 'crowd', '.', 'Somewhere', ',', 'in', 'the', 'midst', 'of', 'it', 'all', ',', 'is', 'the', 'person', 'you', "'ve", 'been', 'looking', 'for', ':', 'wearing', 'a', 'red', 'and', 'white', 'striped', 'shirt', ',', 'black', 'rimmed', 'glasses', 'and', 'a', 'lopsided', 'cap', '.', 'There', 'he', 'is', '!', 'There', "'s", 'Waldo', '.', '\n\n', 'Many', 'of', 'us', 'have', 'fond', 'memories', 'of', 'Waldo', '.', 'But', 'while', 'he', 'looms', 'large', 'in', 'our', 'imagination', ',', 'our', 'childhood', 'searches', 'for', 'Waldo', 'typically', 'stayed', 'pretty', 'small', '–', 'Waldo', 'is', 'a', 'tiny', 'person', 'in', 'the', 'middle', 'of', 'lots', 'of', 'other', 'tiny', '

In [61]:
# Append newlines so that whitespace-only tokens such as '\n\n' are filtered out too.
punctuation = punctuation + '\n\n\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n\n\n'
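
Because `punctuation` is a plain string, the `not in punctuation` test used in the next cell is a substring check rather than a whole-token check. The sketch below illustrates what that implies. It also shows why the en dash `–` ends up in the frequency table: it is not part of `string.punctuation`. The name `skip_tokens` is hypothetical and only shows one possible set-based alternative.

```python
from string import punctuation

# Same extension as in the cell above.
extended = punctuation + '\n\n\n'

# Membership on a string is a substring test.
print('\n\n' in extended)  # True: '\n\n' is a substring of '\n\n\n'
print(',' in extended)     # True: ASCII comma
print('!"' in extended)    # True: two adjacent characters also match
print('–' in extended)     # False: the en dash is not ASCII punctuation

# Hypothetical alternative: a set matches whole tokens only,
# and the en dash can be listed explicitly.
skip_tokens = set(punctuation) | {'\n', '\n\n', '–'}
print('!"' in skip_tokens)  # False
print('–' in skip_tokens)   # True
```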

In [62]:
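word_frequencies = {}
for word in doc:
    # Skip stop words and punctuation/newline tokens (compared in lowercase).
    if word.text.lower() not in stopwords:
        if word.text.lower() not in punctuation:
            # Keys keep the token's original case, e.g. 'Waldo', not 'waldo'.
            if word.text not in word_frequencies.keys():
                word_frequencies[word.text] = 1
            else:
                word_frequencies[word.text] += 1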

In [63]:
print(word_frequencies)

{'scene': 1, 'total': 1, 'chaos': 1, 'woman': 1, 'purse': 1, 'contents': 1, 'middair': 1, 'trips': 1, 'child': 2, 'toy': 1, 'man': 1, 'hastily': 1, 'trying': 1, 'gather': 1, 'spilled': 1, 'laundry': 1, 'screaming': 1, 'weaving': 1, 'crowd': 1, 'midst': 1, 'person': 2, 'looking': 1, 'wearing': 1, 'red': 1, 'white': 1, 'striped': 1, 'shirt': 1, 'black': 1, 'rimmed': 1, 'glasses': 1, 'lopsided': 1, 'cap': 1, 'Waldo': 7, 'fond': 1, 'memories': 2, 'looms': 1, 'large': 1, 'imagination': 1, 'childhood': 1, 'searches': 1, 'typically': 1, 'stayed': 1, 'pretty': 2, 'small': 7, '–': 4, 'tiny': 4, 'middle': 1, 'lots': 1, 'things': 10, 'post': 1, 'wee': 7, 'Specifically': 1, 'graphics': 4, 'maps': 3, 'visualizations': 1, 'space': 1, 'experience': 1, 'interactions': 1, 'navigation': 2, 'usability': 1, 'time': 3, 'means': 2, 'sequences': 2, 'help': 3, 'comparisons': 2, 'locator': 1, 'orient': 1, 'larger': 1, 'graphic': 1, 'icons': 2, 'hints': 1, 'way': 1, 'page': 2, 'eternal': 1, 'search': 2, 'actual
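
The table above stores keys in their original case (`'Waldo'`, `'Specifically'`), while the scoring cell further down looks words up in lowercase. A minimal sketch of a case-consistent count follows, using only the standard library and plain token strings instead of spaCy tokens. The helper `count_words` and the toy `tokens` and `stop` values are assumptions for illustration, not part of the notebook.

```python
from collections import Counter
from string import punctuation

def count_words(tokens, stopwords, skip=frozenset(punctuation)):
    """Count lowercased tokens that are neither stop words nor skipped symbols."""
    counts = Counter()
    for tok in tokens:
        key = tok.lower()
        # Drop stop words, single punctuation marks and whitespace-only tokens.
        if key in stopwords or key in skip or not key.strip():
            continue
        counts[key] += 1
    return counts

# Toy input standing in for [token.text for token in doc].
tokens = ['There', "'s", 'Waldo', '.', '\n\n', 'Many', 'of', 'us',
          'have', 'fond', 'memories', 'of', 'Waldo', '.']
stop = {'there', "'s", 'many', 'of', 'us', 'have'}

freq = count_words(tokens, stop)
print(freq)
```

Lowercasing the key once, at counting time, means any later lookup by `word.text.lower()` will find every counted word.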

In [64]:
max_frequency = max(word_frequencies.values())

In [65]:
max_frequency

10

In [66]:
# Divide every count by the highest count so the weights fall in (0, 1].
for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word]/max_frequency

In [67]:
print(word_frequencies)

{'scene': 0.1, 'total': 0.1, 'chaos': 0.1, 'woman': 0.1, 'purse': 0.1, 'contents': 0.1, 'middair': 0.1, 'trips': 0.1, 'child': 0.2, 'toy': 0.1, 'man': 0.1, 'hastily': 0.1, 'trying': 0.1, 'gather': 0.1, 'spilled': 0.1, 'laundry': 0.1, 'screaming': 0.1, 'weaving': 0.1, 'crowd': 0.1, 'midst': 0.1, 'person': 0.2, 'looking': 0.1, 'wearing': 0.1, 'red': 0.1, 'white': 0.1, 'striped': 0.1, 'shirt': 0.1, 'black': 0.1, 'rimmed': 0.1, 'glasses': 0.1, 'lopsided': 0.1, 'cap': 0.1, 'Waldo': 0.7, 'fond': 0.1, 'memories': 0.2, 'looms': 0.1, 'large': 0.1, 'imagination': 0.1, 'childhood': 0.1, 'searches': 0.1, 'typically': 0.1, 'stayed': 0.1, 'pretty': 0.2, 'small': 0.7, '–': 0.4, 'tiny': 0.4, 'middle': 0.1, 'lots': 0.1, 'things': 1.0, 'post': 0.1, 'wee': 0.7, 'Specifically': 0.1, 'graphics': 0.4, 'maps': 0.3, 'visualizations': 0.1, 'space': 0.1, 'experience': 0.1, 'interactions': 0.1, 'navigation': 0.2, 'usability': 0.1, 'time': 0.3, 'means': 0.2, 'sequences': 0.2, 'help': 0.3, 'comparisons': 0.2, 'loc
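
The normalization step can also be written as a function that returns a new dictionary instead of updating the existing one in place. This is only a sketch. The helper name `normalize` is hypothetical, and the sample counts are a handful of values copied from the table above.

```python
def normalize(freqs):
    """Scale counts by the largest count so the top word gets weight 1.0."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {word: count / top for word, count in freqs.items()}

counts = {'things': 10, 'Waldo': 7, 'tiny': 4, 'scene': 1}
weights = normalize(counts)
print(weights)
```

Returning a new dictionary keeps the raw counts available, which helps if the cell is re-run: the in-place version divides the already normalized values again.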

In [68]:
sentence_tokens = list(doc.sents)
print(sentence_tokens)

[The scene is total chaos: a woman and all her purse's contents in middair as she trips over a child's toy, a man hastily trying to gather his spilled laundry, a screaming child weaving through the crowd., Somewhere, in the midst of it all, is the person you've been looking for: wearing a red and white striped shirt, black rimmed glasses and a lopsided cap., There he is!, There's Waldo.

, Many of us have fond memories of Waldo., But while he looms large in our imagination, our childhood searches for Waldo typically stayed pretty small – Waldo is a tiny person in the middle of lots of other tiny things.

, And that's what this post is about: wee things., Specifically, the wee things that we see as part of graphics, maps, visualizations (wee things in space) as well as the wee things we experience as part of interactions, navigation, and usability (wee things in time)., This means everything from sequences of small graphics that help us make comparisons, to tiny locator maps that help o

In [69]:
sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        # The lookup is lowercased, so case-preserved keys such as 'Waldo' never match here.
        if word.text.lower() in word_frequencies.keys():
            if sent not in sentence_scores.keys():
                sentence_scores[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_scores[sent] += word_frequencies[word.text.lower()]

In [70]:
sentence_scores

{The scene is total chaos: a woman and all her purse's contents in middair as she trips over a child's toy, a man hastily trying to gather his spilled laundry, a screaming child weaving through the crowd.: 2.200000000000001,
 Somewhere, in the midst of it all, is the person you've been looking for: wearing a red and white striped shirt, black rimmed glasses and a lopsided cap.: 1.4000000000000001,
 Many of us have fond memories of Waldo.: 0.30000000000000004,
 But while he looms large in our imagination, our childhood searches for Waldo typically stayed pretty small – Waldo is a tiny person in the middle of lots of other tiny things.
 : 4.2,
 And that's what this post is about: wee things.: 1.7999999999999998,
 Specifically, the wee things that we see as part of graphics, maps, visualizations (wee things in space) as well as the wee things we experience as part of interactions, navigation, and usability (wee things in time).: 8.5,
 This means everything from sequences of small graphics
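
In the output above, "Many of us have fond memories of Waldo." scores about 0.3. That is `fond` (0.1) plus `memories` (0.2): `Waldo` (0.7) contributes nothing, because the table stores the key `'Waldo'` while the lookup uses `'waldo'`. The sketch below shows the same scoring idea with lowercase keys on both sides. It works on plain token lists, and `score_sentences` is a hypothetical helper rather than the notebook's code.

```python
def score_sentences(sentences, weights):
    """Sum the weight of every known word in each sentence.

    `sentences` is a list of token lists; the result maps the sentence
    index to its score and omits sentences with no weighted words.
    """
    scores = {}
    for i, sent in enumerate(sentences):
        total = sum(weights.get(tok.lower(), 0.0) for tok in sent)
        if total > 0:
            scores[i] = total
    return scores

# Weights keyed in lowercase, matching the lowercased lookup.
weights = {'waldo': 0.7, 'fond': 0.1, 'memories': 0.2}
sentences = [
    ['There', 'he', 'is', '!'],
    ['Many', 'of', 'us', 'have', 'fond', 'memories', 'of', 'Waldo', '.'],
]

scores = score_sentences(sentences, weights)
print(scores)
```

With consistent casing, the second sentence scores roughly 1.0 instead of 0.3, so sentences that mention frequent capitalized words are no longer undervalued.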

In [71]:
from heapq import nlargest

In [72]:
# Keep roughly 30% of the sentences in the summary.
select_length = int(len(sentence_tokens)*0.3)

In [73]:
select_length

7

In [74]:
# Pick the highest-scoring sentences; the result is ordered by score, not by position in the text.
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)

In [75]:
summary

[Specifically, the wee things that we see as part of graphics, maps, visualizations (wee things in space) as well as the wee things we experience as part of interactions, navigation, and usability (wee things in time).,
 Probably the most immediate definition of wee things are things that are physically small: little things on a page.,
 But while he looms large in our imagination, our childhood searches for Waldo typically stayed pretty small – Waldo is a tiny person in the middle of lots of other tiny things.
 ,
 This means everything from sequences of small graphics that help us make comparisons, to tiny locator maps that help orient us within a larger graphic, to navigation icons that give hints about how we should make our way around a page.
 ,
 Here are a bunch of examples of small multiples in the wild – maps and planets, first lady hair styles and telegraph signals, food trucks, fashion color trends and dressing appropriately for different climates, the distribution of deaths in
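
`nlargest` returns the selected sentences from highest to lowest score, so the summary above does not follow the order of the original text. If document order matters, one option is to score by sentence index and sort the selected indices afterwards. The sketch below assumes a dictionary that maps sentence index to score. `pick_top` is a hypothetical helper, and it takes the ratio over the scored sentences rather than over all sentences as the notebook does.

```python
from heapq import nlargest

def pick_top(scores, ratio=0.3):
    """Return the indices of the best-scoring sentences in document order."""
    # Keep at least one sentence, even for very short inputs.
    k = max(1, int(len(scores) * ratio))
    top = nlargest(k, scores, key=scores.get)
    return sorted(top)

# Toy scores keyed by sentence position.
scores = {0: 2.2, 1: 1.4, 2: 0.3, 3: 4.2, 4: 1.8, 5: 8.5, 6: 3.0}
selected = pick_top(scores)
print(selected)  # [3, 5]
```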