# Week 5 Lecture - Dictionaries


* What is a Dictionary
* key/value pairs
* indexing, but no slicing
* Loops?
* Making a word counter

## What is a Dictionary?

Similar to a Python list, a Python dictionary is a data structure that behaves as a *container* or *collection* for other data values.
It is like a list because it holds values and each value has an index.
It is not like a list because the indices are not implicit, they can be more than numbers, and they do not represent positions (since Python 3.7 a dictionary remembers insertion order, but you still cannot look items up by position).
In a dictionary, data is stored in *key*/*value* pairs. The *key* is the index and the *value* is the actual data.
When you create an *item* in a dictionary (another term for a key/value pair), you store the data in the dictionary.
To get the data back out, we use the key to *look up* the value in the dictionary. In this way the dictionary behaves like an English-language dictionary: the key is like the word and the value is like the definition.

In [None]:
# create a new dictionary
english2spanish = dict()
# another way to do the same thing using the literal syntax
english2spanish = {}

In [None]:
# add some values by setting the key value pairs using the assignment operator
english2spanish["one"] = "uno"
print(english2spanish) # look at the contents of the entire dictionary

Look at the syntax of what Python spit out: curly braces with two string values separated by a colon. The curly braces mean we are looking at a dictionary (as opposed to a list, which uses square brackets `[]`) and the colon always separates the key from the value within an *item*. Let's add another item to our `english2spanish` dictionary.

In [None]:
# add another item to the dictionary
english2spanish["two"] = "dos"
print(english2spanish)

Now our `english2spanish` dictionary has two items! Each item is separated by a comma (like the way items are separated in lists). Remember, with dictionaries you look at the two things on either side of the colon (`:`) as a single item. The thing to the left of the colon is the key and the thing to the right of the colon is the value. So the first item in this dictionary has the key "one" and the value "uno".

We use the familiar *indexing syntax* (square brackets) to get a value based on its key. This is like grabbing an item from a list at a given index.

In [None]:
# lookup the value associated with the key
english2spanish["one"]

In [None]:
# you can't look up by value; this raises a KeyError
english2spanish["uno"]
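Looking up a key that is not in the dictionary raises a `KeyError`. Here is a minimal sketch of common ways to guard against that: the `in` operator, the `get()` method, and catching the error. The dictionary is rebuilt so the example runs on its own, and the fallback string `"unknown"` is just an illustrative choice.

```python
# rebuild a small dictionary so this cell runs on its own
english2spanish = {"one": "uno", "two": "dos"}

# the `in` operator checks keys, not values
print("one" in english2spanish)
print("uno" in english2spanish)

# get() returns a default instead of raising a KeyError
missing = english2spanish.get("uno", "unknown")
print(missing)

# catching the error explicitly also works
try:
    english2spanish["uno"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```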

Dictionary keys must be hashable, which in practice means immutable data types (ints and strings are most common, but floats work too).

In [None]:
floaty_keys = {4.5:"four point five", 2.1:"Two point one"}
floaty_keys[4.5]
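To see where the limit on keys lies, here is a small sketch (the names `coordinates` and `bad_keys` are made up for illustration). A tuple works as a key because it is immutable, while a list raises a `TypeError` because it is mutable.

```python
# tuples are immutable, so they can be dictionary keys
coordinates = {(0, 0): "origin", (1, 2): "a point"}
print(coordinates[(1, 2)])

# lists are mutable, so Python refuses to use them as keys
try:
    bad_keys = {[1, 2]: "a list"}
    list_key_worked = True
except TypeError as error:
    list_key_worked = False
    print("TypeError:", error)
```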

Dictionary values can be any Python data type or data structure, even more dictionaries!

In [None]:
# make a "database" of users
user_info = {1:{"name":"Bobby McGee", "occupation":"heartbreaker", "age":None},
             2:{"name":"Kris Kristofferson", "occupation":"singer-songwriter", "age":84},
             3:{"name":"Ray Price", "occupation":"crooner", "age":87}}

In [None]:
# get all the information about user id 1
user_info[1]

In [None]:
# get the value for the key "name" for user 1
user_info[1]["name"]

Dictionaries are often used to store hierarchical data. Because you can nest a dictionary (or list) inside of a dictionary, you can create arbitrarily complex structures of data.
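Nested dictionaries can be walked with a `for` loop. The sketch below uses a smaller copy of the user "database" so it runs on its own. The `items()` method hands back each key together with its value, and the `"instruments"` field is a hypothetical addition used only to show that nested values can be extended.

```python
# a smaller copy of the "database" so this cell runs on its own
user_info = {2: {"name": "Kris Kristofferson", "occupation": "singer-songwriter", "age": 84},
             3: {"name": "Ray Price", "occupation": "crooner", "age": 87}}

# items() yields each key together with its value
names = []
for user_id, record in user_info.items():
    print(user_id, record["name"], record["occupation"])
    names.append(record["name"])

# nested values can be changed or extended with chained indexing
user_info[3]["age"] = 88
user_info[3]["instruments"] = ["guitar"]  # hypothetical extra field
print(user_info[3])
```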

## Putting it all together - Stylometrics


Don't be afraid of the $5 word; it is just a technique for analyzing texts (usually to [determine authorship](https://www.latimes.com/science/sciencenow/la-sci-sn-shakespeare-play-linguistic-analysis-20150410-story.html)). Computational stylometrics does a lot of fancy statistics, but much of it is based on *counting words*.

In [69]:
#swoon
romeo = """
But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,
That thou her maid art far more fair than she:
Be not her maid, since she is envious;
Her vestal livery is but sick and green
And none but fools do wear it; cast it off.
It is my lady, O, it is my love!
O, that she knew she were!
She speaks yet she says nothing: what of that?
Her eye discourses; I will answer it.
I am too bold, 'tis not to me she speaks:
Two of the fairest stars in all the heaven,
Having some business, do entreat her eyes
To twinkle in their spheres till they return.
What if her eyes were there, they in her head?
The brightness of her cheek would shame those stars,
As daylight doth a lamp; her eyes in heaven
Would through the airy region stream so bright
That birds would sing and think it were not night.
See, how she leans her cheek upon her hand!
O, that I were a glove upon that hand,
That I might touch that cheek!
"""

### Computational Thinking

If we want to count the words in the text above, we need to do the following things.

1. Normalize the text by removing punctuation and converting to lowercase.
2. Split the string of text into a list of words
3. Loop over the list and count each instance of a word
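As a preview, the three steps above can be sketched as one short function. This is only one possible arrangement. The function name `count_words` and the sample sentence are illustrative, and it relies on `string.punctuation` and `str.maketrans()` from the standard library.

```python
from string import punctuation

def count_words(text):
    """Return a dictionary mapping each word in text to its count."""
    # step 1: normalize by stripping punctuation and lowercasing
    table = str.maketrans("", "", punctuation)
    normalized = text.translate(table).lower()
    # step 2: split the string into a list of words
    words = normalized.split()
    # step 3: loop over the list and count each word
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    return counts

sample = "It is the east, and Juliet is the sun."
print(count_words(sample))
```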

In [70]:
# convert everything to lowercase
romeo.lower()

"\nbut, soft! what light through yonder window breaks?\nit is the east, and juliet is the sun.\narise, fair sun, and kill the envious moon,\nwho is already sick and pale with grief,\nthat thou her maid art far more fair than she:\nbe not her maid, since she is envious;\nher vestal livery is but sick and green\nand none but fools do wear it; cast it off.\nit is my lady, o, it is my love!\no, that she knew she were!\nshe speaks yet she says nothing: what of that?\nher eye discourses; i will answer it.\ni am too bold, 'tis not to me she speaks:\ntwo of the fairest stars in all the heaven,\nhaving some business, do entreat her eyes\nto twinkle in their spheres till they return.\nwhat if her eyes were there, they in her head?\nthe brightness of her cheek would shame those stars,\nas daylight doth a lamp; her eyes in heaven\nwould through the airy region stream so bright\nthat birds would sing and think it were not night.\nsee, how she leans her cheek upon her hand!\no, that i were a glove u

Now we have everything in lowercase, but we need to remove the punctuation. Now, we could use the `replace()` string method and manually identify and remove each punctuation mark, but that would make for some ugly code.

In [71]:
# ugly approach to removing punctuation
romeo.replace(".","").replace(",","").replace("!","").replace("'","") #and so on

'\nBut soft what light through yonder window breaks?\nIt is the east and Juliet is the sun\nArise fair sun and kill the envious moon\nWho is already sick and pale with grief\nThat thou her maid art far more fair than she:\nBe not her maid since she is envious;\nHer vestal livery is but sick and green\nAnd none but fools do wear it; cast it off\nIt is my lady O it is my love\nO that she knew she were\nShe speaks yet she says nothing: what of that?\nHer eye discourses; I will answer it\nI am too bold tis not to me she speaks:\nTwo of the fairest stars in all the heaven\nHaving some business do entreat her eyes\nTo twinkle in their spheres till they return\nWhat if her eyes were there they in her head?\nThe brightness of her cheek would shame those stars\nAs daylight doth a lamp; her eyes in heaven\nWould through the airy region stream so bright\nThat birds would sing and think it were not night\nSee how she leans her cheek upon her hand\nO that I were a glove upon that hand\nThat I might

Wouldn't it be nice if we could do this all in one shot? Fortunately we can, with the string method `translate()`, but it is a bit complicated. Here is what its documentation says:


```
Replace each character in the string using the given translation table.

table
    Translation table, which must be a mapping of Unicode ordinals to
    Unicode ordinals, strings, or None.
```
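To make that description concrete, here is a toy sketch of a hand-built translation table (the sample string is made up). Each key is a Unicode ordinal, and each value is a replacement string, another ordinal, or `None` to delete the character.

```python
# build a tiny translation table by hand: ordinal -> replacement
table = {
    ord("a"): "4",     # replace "a" with the string "4"
    ord("e"): 51,      # an ordinal works too: 51 is the ordinal of "3"
    ord("!"): None,    # None deletes the character
}
result = "leet speak!".translate(table)
print(result)
```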

In [72]:
# remember the ord() function
print("Period:", ord("."))
print("Comma:", ord(","))
print("Exclamation:", ord("!"))

Period: 46
Comma: 44
Exclamation: 33


In [73]:
translation_table = {46:"",
                     44:"",
                     33:""}
romeo.translate(translation_table)

"\nBut soft what light through yonder window breaks?\nIt is the east and Juliet is the sun\nArise fair sun and kill the envious moon\nWho is already sick and pale with grief\nThat thou her maid art far more fair than she:\nBe not her maid since she is envious;\nHer vestal livery is but sick and green\nAnd none but fools do wear it; cast it off\nIt is my lady O it is my love\nO that she knew she were\nShe speaks yet she says nothing: what of that?\nHer eye discourses; I will answer it\nI am too bold 'tis not to me she speaks:\nTwo of the fairest stars in all the heaven\nHaving some business do entreat her eyes\nTo twinkle in their spheres till they return\nWhat if her eyes were there they in her head?\nThe brightness of her cheek would shame those stars\nAs daylight doth a lamp; her eyes in heaven\nWould through the airy region stream so bright\nThat birds would sing and think it were not night\nSee how she leans her cheek upon her hand\nO that I were a glove upon that hand\nThat I migh

We can use the `maketrans()` string method to automatically translate characters into their ordinal values.

In [74]:
punctuation_dictionary = {
    ".":"",
    "!":"",
    ":":"",
    ",":"",
    "?":"",
    ";":""
}
translation_table = romeo.maketrans(punctuation_dictionary)
translation_table

{46: '', 33: '', 58: '', 44: '', 63: '', 59: ''}

Now we can use this table to remove the punctuation from our string.

In [75]:
romeo.translate(translation_table)

"\nBut soft what light through yonder window breaks\nIt is the east and Juliet is the sun\nArise fair sun and kill the envious moon\nWho is already sick and pale with grief\nThat thou her maid art far more fair than she\nBe not her maid since she is envious\nHer vestal livery is but sick and green\nAnd none but fools do wear it cast it off\nIt is my lady O it is my love\nO that she knew she were\nShe speaks yet she says nothing what of that\nHer eye discourses I will answer it\nI am too bold 'tis not to me she speaks\nTwo of the fairest stars in all the heaven\nHaving some business do entreat her eyes\nTo twinkle in their spheres till they return\nWhat if her eyes were there they in her head\nThe brightness of her cheek would shame those stars\nAs daylight doth a lamp her eyes in heaven\nWould through the airy region stream so bright\nThat birds would sing and think it were not night\nSee how she leans her cheek upon her hand\nO that I were a glove upon that hand\nThat I might touch th

But, that was still a lot of typing and it looks like we missed the apostrophe...ugh. More typing means more bugs...

If Python is *actually* batteries included, then wouldn't this already be a solved problem?

In [76]:
# Get a list of all the punctuation from the standard library
from string import punctuation
print(punctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


In [77]:
# make our translation table programmatically; characters in the third argument are mapped to None (deleted)
translation_table = romeo.maketrans("", "", punctuation)
translation_table

{33: None,
 34: None,
 35: None,
 36: None,
 37: None,
 38: None,
 39: None,
 40: None,
 41: None,
 42: None,
 43: None,
 44: None,
 45: None,
 46: None,
 47: None,
 58: None,
 59: None,
 60: None,
 61: None,
 62: None,
 63: None,
 64: None,
 91: None,
 92: None,
 93: None,
 94: None,
 95: None,
 96: None,
 123: None,
 124: None,
 125: None,
 126: None}

In [78]:
# test out our punctuation remover
romeo.translate(translation_table)


'\nBut soft what light through yonder window breaks\nIt is the east and Juliet is the sun\nArise fair sun and kill the envious moon\nWho is already sick and pale with grief\nThat thou her maid art far more fair than she\nBe not her maid since she is envious\nHer vestal livery is but sick and green\nAnd none but fools do wear it cast it off\nIt is my lady O it is my love\nO that she knew she were\nShe speaks yet she says nothing what of that\nHer eye discourses I will answer it\nI am too bold tis not to me she speaks\nTwo of the fairest stars in all the heaven\nHaving some business do entreat her eyes\nTo twinkle in their spheres till they return\nWhat if her eyes were there they in her head\nThe brightness of her cheek would shame those stars\nAs daylight doth a lamp her eyes in heaven\nWould through the airy region stream so bright\nThat birds would sing and think it were not night\nSee how she leans her cheek upon her hand\nO that I were a glove upon that hand\nThat I might touch tha

Yes! Now we have almost solved the first step. All that is left is to make everything lowercase, and fortunately that is easy.

In [79]:
# normalize the text
romeo_normalized = romeo.translate(translation_table).lower()
romeo_normalized

'\nbut soft what light through yonder window breaks\nit is the east and juliet is the sun\narise fair sun and kill the envious moon\nwho is already sick and pale with grief\nthat thou her maid art far more fair than she\nbe not her maid since she is envious\nher vestal livery is but sick and green\nand none but fools do wear it cast it off\nit is my lady o it is my love\no that she knew she were\nshe speaks yet she says nothing what of that\nher eye discourses i will answer it\ni am too bold tis not to me she speaks\ntwo of the fairest stars in all the heaven\nhaving some business do entreat her eyes\nto twinkle in their spheres till they return\nwhat if her eyes were there they in her head\nthe brightness of her cheek would shame those stars\nas daylight doth a lamp her eyes in heaven\nwould through the airy region stream so bright\nthat birds would sing and think it were not night\nsee how she leans her cheek upon her hand\no that i were a glove upon that hand\nthat i might touch tha

Now we can do computational thinking step 2: split the string of words into a list. This is also an easy task thanks to the string method `split()`, which splits on whitespace by default.

In [80]:
#split text into a list of words
romeo_list = romeo_normalized.split()
romeo_list[0:10] #look at the first 10 words in the list

['but',
 'soft',
 'what',
 'light',
 'through',
 'yonder',
 'window',
 'breaks',
 'it',
 'is']

Ok, now we can do the final step, which is to loop over each word and count them up in a dictionary.

In [81]:
# create a counter
word_counter = {}

# loop over each word
for word in romeo_list:
    # check to see if we have encountered the word
    if word not in word_counter:
        # have not seen this word before, so create a key with value 1
        word_counter[word] = 1
    else: 
        # we have seen this word before, so increment the value by 1
        word_counter[word] += 1

print(word_counter)

{'but': 3, 'soft': 1, 'what': 3, 'light': 1, 'through': 2, 'yonder': 1, 'window': 1, 'breaks': 1, 'it': 7, 'is': 7, 'the': 7, 'east': 1, 'and': 6, 'juliet': 1, 'sun': 2, 'arise': 1, 'fair': 2, 'kill': 1, 'envious': 2, 'moon': 1, 'who': 1, 'already': 1, 'sick': 2, 'pale': 1, 'with': 1, 'grief': 1, 'that': 8, 'thou': 1, 'her': 11, 'maid': 2, 'art': 1, 'far': 1, 'more': 1, 'than': 1, 'she': 8, 'be': 1, 'not': 3, 'since': 1, 'vestal': 1, 'livery': 1, 'green': 1, 'none': 1, 'fools': 1, 'do': 2, 'wear': 1, 'cast': 1, 'off': 1, 'my': 2, 'lady': 1, 'o': 3, 'love': 1, 'knew': 1, 'were': 4, 'speaks': 2, 'yet': 1, 'says': 1, 'nothing': 1, 'of': 3, 'eye': 1, 'discourses': 1, 'i': 4, 'will': 1, 'answer': 1, 'am': 1, 'too': 1, 'bold': 1, 'tis': 1, 'to': 2, 'me': 1, 'two': 1, 'fairest': 1, 'stars': 2, 'in': 4, 'all': 1, 'heaven': 2, 'having': 1, 'some': 1, 'business': 1, 'entreat': 1, 'eyes': 3, 'twinkle': 1, 'their': 1, 'spheres': 1, 'till': 1, 'they': 2, 'return': 1, 'if': 1, 'there': 1, 'head': 1,
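Once the counts are in a dictionary, a natural next question is which words are most frequent. The sketch below is one way to find out, using a short stand-in list (`words`) instead of the full `romeo_list` so it runs on its own. It also shows two shortcuts that are not part of the loop above: `get()` with a default of 0, and `collections.Counter` from the standard library.

```python
from collections import Counter

# a short stand-in for romeo_list so this cell runs on its own
words = "o that i were a glove upon that hand that i might touch that cheek".split()

# the same counting loop, shortened with get()
word_counter = {}
for word in words:
    word_counter[word] = word_counter.get(word, 0) + 1

# sort the items by count, largest first, and keep the top three
top_three = sorted(word_counter.items(), key=lambda item: item[1], reverse=True)[:3]
print(top_three)

# Counter from the standard library does the counting for us
print(Counter(words).most_common(3))
```

The hand-written loop and `Counter` produce the same counts; the loop is worth knowing because it shows exactly how the dictionary gets built.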