# Monday, October 23

## Announcements and Reminders

- Chapter 10 reading: something is broken
- Quiz on Wednesday (dictionaries)
- Celebration of Mind: Wednesday evening
- Exercises for Chapter 10: due Friday


## Activity: Counting Words with Dictionaries

Today we will continue to explore the dictionary type (`dict`) in Python.

### Motivating Question

Which word appears most frequently in Edgar Allan Poe's poem *The Raven*?

#### Strategy

How would you do this by hand?  How would you keep track of your data?

### Dictionary Basics

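A *dictionary* (`dict`) is a Python data type that holds a collection of *key:value* pairs. The keys must be distinct, but the values need not be. Keys must also be hashable: strings, numbers, and tuples work, but lists do not. Since Python 3.7 a dictionary remembers the order in which its keys were inserted, but you look items up by key, not by position.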

Here is an example:

In [25]:
fav_nums = {'Oscar': 42, 'Sheldon': 73}

# Look up a value by its key with square brackets
print(fav_nums['Oscar'])
print(fav_nums)

# Looping over a dictionary gives its keys
for person in fav_nums:
    print(person)

# .values() gives the values
for x in fav_nums.values():
    print(x)

# list() copies the values into an ordinary list
num_list = list(fav_nums.values())
print(num_list)

42
{'Oscar': 42, 'Sheldon': 73}
Oscar
Sheldon
42
73
[42, 73]
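The rule that keys are distinct has a practical consequence: if the same key appears twice, the later value replaces the earlier one. Values, by contrast, may repeat freely. A small sketch with made-up names and numbers:

```python
# A repeated key in a dictionary literal keeps only the last value
scores = {'Ada': 10, 'Ada': 99}
print(scores)          # {'Ada': 99}

# Values may repeat: two different keys can share the same value
shared = {'Ada': 7, 'Grace': 7}
print(shared)          # {'Ada': 7, 'Grace': 7}

# Keys must be hashable (str, int, tuple, ...); a list cannot be a key
try:
    bad = {['a', 'b']: 1}
except TypeError as err:
    print('TypeError:', err)
```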


Some things to try:
* How could we find the favorite number of someone in the dictionary?
* How could we print out each favorite number?
* How could we get a list of the favorite numbers?
* How could we add a new person's favorite number?
* How could we check whether 496 is anyone's favorite number?
* How could we find the largest favorite number of anyone in the dictionary?

Here are some dictionary methods you might try to use: `.get()`, `.items()`, `.keys()`, `.values()`, `.update()`, `.pop()`.
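Here is one possible set of answers to the questions above, written as a sketch on a fresh copy of `fav_nums`. The key `'Nobody'` is a made-up name used only to show what happens when a key is missing.

```python
fav_nums = {'Oscar': 42, 'Sheldon': 73}

# Square brackets raise KeyError for a missing key ...
print(fav_nums['Sheldon'])           # 73
# ... while .get() returns None, or a default you supply
print(fav_nums.get('Nobody'))        # None
print(fav_nums.get('Nobody', 0))     # 0

# .items() gives (key, value) pairs, handy for printing both at once
for person, num in fav_nums.items():
    print(person, num)

# .keys() and .values() give views that list() can copy into lists
people = list(fav_nums.keys())       # ['Oscar', 'Sheldon']
nums = list(fav_nums.values())       # [42, 73]

# Membership tests and max() work directly on the values view
print(496 in fav_nums.values())      # False
print(max(fav_nums.values()))        # 73

# .pop() removes a key and returns its value
fav_nums['Gigi'] = 8
removed = fav_nums.pop('Gigi')
print(removed, fav_nums)             # 8 {'Oscar': 42, 'Sheldon': 73}
```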


In [31]:
fav_nums = {'Oscar': 42, 'Sheldon': 73}

# There are two ways to add entries to a dictionary; the first is .update()
fav_nums.update({'Gigi': 8, 'Nirod': 21})
print(fav_nums.items())

# The second is to assign to a new key directly
fav_nums['Bryce'] = 23
print(fav_nums)

# len() gives the number of key:value pairs
print(len(fav_nums))

# Check whether 496 is anyone's favorite number
print(496 in fav_nums.values())

# Dictionaries have no .sort(), but a list of the values can be sorted
nums = list(fav_nums.values())
nums.sort(reverse=True)
print(nums)

dict_items([('Oscar', 42), ('Sheldon', 73), ('Gigi', 8), ('Nirod', 21)])
{'Oscar': 42, 'Sheldon': 73, 'Gigi': 8, 'Nirod': 21, 'Bryce': 23}
5
False
[73, 42, 23, 21, 8]


In [34]:
# Find who has the largest favorite number (uses nums from the previous cell)
largest = nums[0]
print(largest)

for person in fav_nums: 
    if fav_nums[person] == largest: 
        print(person)

73
Sheldon
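The loop above compares every person's value against the largest one. A shorter alternative, sketched here with the same five entries, passes `fav_nums.get` as the `key` function to `max()`, so `max()` ranks people by their favorite numbers. Note that `max()` returns only one key; if several people shared the largest number, the list comprehension at the end would find all of them.

```python
fav_nums = {'Oscar': 42, 'Sheldon': 73, 'Gigi': 8, 'Nirod': 21, 'Bryce': 23}

# max() over a dict iterates over its keys; key=fav_nums.get ranks each key by its value
top_person = max(fav_nums, key=fav_nums.get)
print(top_person)                    # Sheldon

# To catch ties, collect every person whose number equals the maximum
largest = max(fav_nums.values())
winners = [person for person, num in fav_nums.items() if num == largest]
print(winners)                       # ['Sheldon']
```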


#### Adding new elements to a dictionary

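With lists, we used `.append()` to add a new item at the end of the list. If we wanted to change the item at position 7 to the value 42, say, we could write `mylist[7] = 42`.

Dictionaries have no `.append()`, but they can be updated in a similar way: assign to a key with `mydict[key] = value`, or pass new pairs to `.update()`. If the key is new, it is added; if it already exists, its value is replaced.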

In [35]:
fav_nums.update({"Wes":1})
fav_nums['Edgar'] = 13

print(fav_nums)

{'Oscar': 42, 'Sheldon': 73, 'Gigi': 8, 'Nirod': 21, 'Bryce': 23, 'Wes': 1, 'Edgar': 13}
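The difference from lists shows up when the position or key does not exist yet: assigning to a list index past the end raises an `IndexError`, while assigning to a new dictionary key simply creates it. A short sketch (the variable names are just for illustration):

```python
mylist = [10, 20, 30]
mylist[1] = 42               # replacing an existing position is fine
try:
    mylist[7] = 42           # position 7 does not exist yet
except IndexError as err:
    print('IndexError:', err)

mylist.append(99)            # lists grow at the end with .append()
print(mylist)                # [10, 42, 30, 99]

ages = {'Wes': 1}
ages['Edgar'] = 13           # a new key is created on assignment
ages['Wes'] = 2              # an existing key has its value replaced
print(ages)                  # {'Wes': 2, 'Edgar': 13}
```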


##### Caution!

What happens if you use `.update()` with a key that is already in the dictionary? The old value is silently replaced by the new one. One way to check first is the `.get()` method, which returns the value stored under a key, or `None` if the key isn't present.

(You can also pass a default as a second argument, as in `fav_nums.get(key, 0)`, to get that default instead of `None`.)

In [36]:
fav_nums.get('Edgar')

13
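The sketch below, using a separate made-up dictionary `demo`, shows the overwrite described above, a `.get()` call with a default, and the `in` operator as another way to check for a key before adding it.

```python
demo = {'Edgar': 13}

# .update() replaces the value of an existing key without any warning
demo.update({'Edgar': 100})
print(demo['Edgar'])              # 100

# .get() returns None for a missing key, or the default given as a second argument
print(demo.get('Lenore'))         # None
print(demo.get('Lenore', -1))     # -1

# Guard with `in` so an existing entry is never overwritten
if 'Edgar' not in demo:
    demo['Edgar'] = 13
print(demo)                       # {'Edgar': 100}
```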

### Back to our main question

Let's read in the text of the poem (saved as `raven.txt` in this folder) and print it out to make sure it worked.

In [43]:
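with open('raven.txt', 'r') as f:
    text = f.read()

words = text.split()
print(words)

# A first attempt at counting one word. Exact comparison misses variants such as
# 'Nevermore' or 'nevermore.', since capitals and punctuation are still attached.
counter = 0
for word in words:
    if word == 'nevermore':
        counter += 1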

['The', 'Raven', 'Edgar', 'Allen', 'Poe', 'Once', 'upon', 'a', 'midnight', 'dreary,', 'while', 'I', 'pondered,', 'weak', 'and', 'weary', 'Over', 'many', 'a', 'quaint', 'and', 'curious', 'volume', 'of', 'forgotten', 'lore,', 'While', 'I', 'nodded,', 'nearly', 'napping,', 'suddenly', 'there', 'came', 'a', 'tapping,', 'As', 'of', 'someone', 'gently', 'rapping,', 'rapping', 'on', 'my', 'chamber', 'door.', "'Tis", 'some', "visitor,'", 'I', 'muttered,', "'tapping", 'on', 'my', 'chamber', 'door.', 'Only', 'this,', 'and', 'nothing', "more.'", 'Ah,', 'distinctly,', 'I', 'remember', 'it', 'was', 'in', 'the', 'bleak', 'December', 'And', 'each', 'separate', 'dying', 'ember', 'wrought', 'its', 'ghost', 'upon', 'the', 'floor.', 'Eagerly', 'I', 'wished', 'the', 'morrow,', 'vainly', 'I', 'had', 'sought', 'to', 'borrow', 'From', 'my', 'books', 'surcease', 'of', 'sorrow--sorrow', 'for', 'the', 'lost', 'Lenore;', 'For', 'the', 'rare', 'and', 'radiant', 'maiden', 'whom', 'the', 'angels', 'named', 'Lenore'

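`.split()` has already given us a list of words, but the words still need cleaning before we can count them.

What about punctuation? Words like `dreary,` and `door.` carry punctuation that would make them count as different words from `dreary` and `door`. Luckily, Python's built-in `string` module provides `string.punctuation`, a string containing all the ASCII punctuation characters.

What about capital and lowercase letters? For simplicity's sake, let's make everything lowercase.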

In [47]:
import string

print(string.punctuation)

# Clean the list: lowercase each word, then strip every punctuation character
clean_words = []
for word in words:
    word = word.lower()
    for p in string.punctuation:
        word = word.replace(p, "")
    clean_words.append(word)

print(clean_words)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
['the', 'raven', 'edgar', 'allen', 'poe', 'once', 'upon', 'a', 'midnight', 'dreary', 'while', 'i', 'pondered', 'weak', 'and', 'weary', 'over', 'many', 'a', 'quaint', 'and', 'curious', 'volume', 'of', 'forgotten', 'lore', 'while', 'i', 'nodded', 'nearly', 'napping', 'suddenly', 'there', 'came', 'a', 'tapping', 'as', 'of', 'someone', 'gently', 'rapping', 'rapping', 'on', 'my', 'chamber', 'door', 'tis', 'some', 'visitor', 'i', 'muttered', 'tapping', 'on', 'my', 'chamber', 'door', 'only', 'this', 'and', 'nothing', 'more', 'ah', 'distinctly', 'i', 'remember', 'it', 'was', 'in', 'the', 'bleak', 'december', 'and', 'each', 'separate', 'dying', 'ember', 'wrought', 'its', 'ghost', 'upon', 'the', 'floor', 'eagerly', 'i', 'wished', 'the', 'morrow', 'vainly', 'i', 'had', 'sought', 'to', 'borrow', 'from', 'my', 'books', 'surcease', 'of', 'sorrowsorrow', 'for', 'the', 'lost', 'lenore', 'for', 'the', 'rare', 'and', 'radiant', 'maiden', 'whom', 'the', 'angels', 'named',
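Two things are worth noticing in this output. Deleting the hyphens glued `sorrow--sorrow` into `sorrowsorrow`, and a token made only of punctuation would turn into an empty string `''`. One alternative, sketched below on a short sample line rather than the whole poem, uses `str.maketrans()` and `str.translate()` to turn every punctuation character into a space before splitting, so hyphenated pairs come apart and empty strings never appear.

```python
import string

sample = "From my books surcease of sorrow--sorrow for the lost Lenore--"

# Map every punctuation character to a space (both strings must be the same length)
to_spaces = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

# Lowercase, replace punctuation with spaces, then split on any run of whitespace
clean = sample.lower().translate(to_spaces).split()
print(clean)
# ['from', 'my', 'books', 'surcease', 'of', 'sorrow', 'sorrow', 'for', 'the', 'lost', 'lenore']
```

The trade-off: contractions split as well. `'Tis` still becomes `tis`, but a word like `don't` would become the two tokens `don` and `t`.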

Back to our main question (again...)

In [66]:
# Count one word by hand, now that the words are clean
counter = 0
for word in clean_words:
    if word == 'some':
        counter += 1

print(counter)

# To count every word at once, keep a dictionary mapping each word to its count
word_dict = {}
for word in clean_words:
    # If the word is new, start its count at 0; then add 1 for this occurrence
    if word not in word_dict:
        word_dict[word] = 0
    word_dict[word] += 1

print(word_dict)

5
{'the': 59, 'raven': 11, 'edgar': 1, 'allen': 1, 'poe': 1, 'once': 1, 'upon': 5, 'a': 15, 'midnight': 1, 'dreary': 1, 'while': 2, 'i': 32, 'pondered': 1, 'weak': 1, 'and': 40, 'weary': 1, 'over': 1, 'many': 2, 'quaint': 1, 'curious': 1, 'volume': 1, 'of': 23, 'forgotten': 1, 'lore': 1, 'nodded': 1, 'nearly': 1, 'napping': 2, 'suddenly': 1, 'there': 7, 'came': 3, 'tapping': 5, 'as': 5, 'someone': 1, 'gently': 2, 'rapping': 3, 'on': 10, 'my': 25, 'chamber': 12, 'door': 14, 'tis': 3, 'some': 5, 'visitor': 3, 'muttered': 2, 'only': 4, 'this': 17, 'nothing': 7, 'more': 9, 'ah': 2, 'distinctly': 1, 'remember': 1, 'it': 5, 'was': 6, 'in': 7, 'bleak': 1, 'december': 1, 'each': 2, 'separate': 1, 'dying': 1, 'ember': 1, 'wrought': 1, 'its': 3, 'ghost': 1, 'floor': 4, 'eagerly': 1, 'wished': 1, 'morrow': 2, 'vainly': 1, 'had': 1, 'sought': 1, 'to': 6, 'borrow': 1, 'from': 8, 'books': 1, 'surcease': 1, 'sorrowsorrow': 1, 'for': 3, 'lost': 2, 'lenore': 8, 'rare': 2, 'radiant': 2, 'maiden': 3, 'wh
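Python's standard library can also do this bookkeeping: `collections.Counter` is a `dict` subclass that counts the items it is given, and its `most_common()` method lists them from most to least frequent. This is a swapped-in tool, not the approach built in class. The sketch uses a short made-up word list in place of `clean_words`; applied to the poem, `Counter(clean_words).most_common(5)` would list the five most frequent words.

```python
from collections import Counter

# A short made-up stand-in for clean_words
words_sample = ['quoth', 'the', 'raven', 'nevermore', 'the', 'raven', 'the']

word_counter = Counter(words_sample)     # maps word -> count
print(word_counter['the'])               # 3
print(word_counter['lenore'])            # 0 (missing keys count as zero)
print(word_counter.most_common(2))       # [('the', 3), ('raven', 2)]
```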

### Counting words in the list

Now we should have a list of words from the poem, with no whitespace or punctuation and all in the same case. How can we count how many times each word appears?

A few options:

* For each word, we can use `.count()` to see how many times it appears and just print this.  
* We could create a list of the number of times each word occurs in the same position as the word in the list of words.  
* We could create a separate variable for each word and have it store the number of times the word occurs.
* Or we can do exactly this, but the right way: with a dictionary.
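To see why the dictionary is the better choice, here is a sketch of the first option next to a dictionary version on a tiny made-up list. Calling `.count()` once per distinct word rescans the whole list each time, so the work grows with (number of distinct words) × (length of the list), while the dictionary loop makes a single pass. The line `counts.get(word, 0) + 1` is a compact equivalent of the if-statement used in this notebook's counting loops.

```python
sample_words = ['we', 'can', 'dance', 'we', 'can', 'we']

# Option 1: call .count() once per distinct word (each call rescans the whole list)
for word in sorted(set(sample_words)):
    print(word, sample_words.count(word))

# Dictionary option: a single pass, updating one count per word
counts = {}
for word in sample_words:
    counts[word] = counts.get(word, 0) + 1
print(counts)    # {'we': 3, 'can': 2, 'dance': 1}
```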

Let's start with a smaller example: a single string of song lyrics.

In [57]:
import string

lyrics = "We can dance if we want to We can leave your friends behind 'Cause your friends don't dance And if they don't dance Well, they're no friends of mine"
for mark in string.punctuation:
    lyrics = lyrics.replace(mark, '')

word_list = lyrics.lower().split()
print(word_list)


['we', 'can', 'dance', 'if', 'we', 'want', 'to', 'we', 'can', 'leave', 'your', 'friends', 'behind', 'cause', 'your', 'friends', 'dont', 'dance', 'and', 'if', 'they', 'dont', 'dance', 'well', 'theyre', 'no', 'friends', 'of', 'mine']


Now we will create a dictionary and start adding words to it.

In [60]:
# start with an empty dictionary
word_count = {}

# Loop over the words in word_list, adding them to the dictionary
for word in word_list:
    # If the word is new, start its count at 0; then add 1 for this occurrence
    if word not in word_count: 
        word_count[word] = 0
    word_count[word] += 1

print(word_count)

{'we': 3, 'can': 2, 'dance': 3, 'if': 2, 'want': 1, 'to': 1, 'leave': 1, 'your': 2, 'friends': 3, 'behind': 1, 'cause': 1, 'dont': 2, 'and': 1, 'they': 1, 'well': 1, 'theyre': 1, 'no': 1, 'of': 1, 'mine': 1}


#### Sorting the dictionary

Dictionaries have no `.sort()` method! But you can sort lists, and you can create a list out of a dictionary. The tricky bit is sorting that list not by the words themselves, but by the value each word (key) maps to in the dictionary.

The idea is this: define a function that, given a key, returns the value to sort by (here, the word's count), and let the sort compare those returned values instead of the keys themselves.
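One way to put this idea into code is the built-in `sorted()` with its `key` parameter: the key function is called on each item, and items are ordered by what it returns. This sketch assumes a few of the lyric counts from above, copied into a fresh dictionary; the `lambda` is just one way to write the key function. Because Python's sort is stable, words with equal counts keep their original order.

```python
word_count = {'we': 3, 'can': 2, 'dance': 3, 'if': 2, 'want': 1}

# key= receives each (word, count) pair; sort by the count, largest first
by_frequency = sorted(word_count.items(), key=lambda pair: pair[1], reverse=True)
print(by_frequency)
# [('we', 3), ('dance', 3), ('can', 2), ('if', 2), ('want', 1)]

# Sorting just the keys, ranked by their values, works the same way
top_words = sorted(word_count, key=word_count.get, reverse=True)
print(top_words[:2])   # ['we', 'dance']
```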

In [67]:
# A list of (word, count) pairs from the poem's dictionary
results = list(word_dict.items())

# Swap each pair to (count, word) so that sorting compares the counts first
switched = []
for word, count in results:
    switched.append((count, word))
switched.sort(reverse=True)   # largest counts first

results.sort()                # sorts alphabetically by word
print(results)

print(switched)