### This is the Notebook for Lecture 10

In this lecture, we will learn about the dictionary data structure.

### In-Class Coding Opportunity

Write a function file_try_except(file_name) that performs the following tasks:
<ol>
    <li>Opens the file to read, prints the contents, and then closes the file</li>
    <li>Using try/except (IOError), print a message if the file does not exist</li>
</ol>

In [1]:
def file_try_except(file_name):
    
    # In Class Code
    try:
        read_file = open(file_name)
        
        for line in read_file:
            print( line.strip() )
        
        read_file.close()
    
    except IOError as e:
        print('Unable to open file: ', e)

In [2]:
file_try_except('bad_file_name.txt')

Unable to open file:  [Errno 2] No such file or directory: 'bad_file_name.txt'
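
The same task can also be sketched with a with statement, which closes the file even if an error occurs while reading. The name file_with_except and the True/False return values are assumptions for illustration, not part of the in-class code.

```python
def file_with_except(file_name):
    # Hypothetical variant of file_try_except that uses a with statement
    try:
        # The with block closes the file automatically when it ends
        with open(file_name) as read_file:
            for line in read_file:
                print(line.strip())
        return True

    except IOError as e:
        print('Unable to open file: ', e)
        return False


file_with_except('bad_file_name.txt')
```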


### Using the dict data structure to improve access!

In [3]:
# First approach to defining a dictionary, using dict()
english_to_french = dict()
type(english_to_french)

dict

In [6]:
english_to_french = dict(
    one = 'un',
    two = 'deux',
    three = 'trois',
    four = 'quatre',
    five = 'cinq'
)

In [7]:
english_to_french['four']

'quatre'

In [8]:
# Second approach to defining a dictionary using {}
english_to_french_deux = {}
type( english_to_french_deux )

dict

In [22]:
english_to_french_deux = {
    'one': 'un',
    'two': 'deux',
    'three': 'trois',
    'four': 'quatre',
    'five': 'cinq'
}

In [23]:
english_to_french_deux['three']

'trois'

In [24]:
# Adding Key/Value pairs to a dictionary
english_to_french_deux['six'] = 'six'
english_to_french_deux['seven'] = 'sept'
english_to_french_deux['nine'] = 'neuf'
english_to_french_deux['eight'] = 'huit'

print(english_to_french_deux)

{'one': 'un', 'two': 'deux', 'three': 'trois', 'four': 'quatre', 'five': 'cinq', 'six': 'six', 'seven': 'sept', 'nine': 'neuf', 'eight': 'huit'}


### Dictionary Exception Examples

In [25]:
# Deliberate error to show a KeyError
english_to_french_deux['twentytween']

KeyError: 'twentytween'

In [26]:
# One approach: Use the in operator, which returns True or False, to check for a KEY
'asf' in english_to_french_deux

False

In [27]:
'asf' not in english_to_french_deux

True

In [28]:
'one' in english_to_french_deux

True

In [29]:
'un' in english_to_french_deux

False
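
The last result is False because in checks a dictionary's keys, not its values. A minimal sketch of the difference, using a small stand-in dictionary (the name french_sample is hypothetical):

```python
# Small stand-in dictionary so the example runs on its own
french_sample = {'one': 'un', 'two': 'deux', 'three': 'trois'}

# in on the dictionary itself looks only at the keys
key_found = 'un' in french_sample

# To search the values, ask for them explicitly with .values()
value_found = 'un' in french_sample.values()

print(key_found, value_found)  # False True
```

Note that searching the values scans them one by one, so it does not get the fast lookup that keys do.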

In [34]:
# Using try/except to print the value stored under a key
def print_dict( dict_print, key ):

    # In-Class Code
    try:
        print( dict_print[key] )
        
        return True
        
    except KeyError:
        print( key + ' is not in the dictionary')

In [35]:
print_dict( english_to_french, 'one' )

un


In [36]:
print_dict( english_to_french, '100' )

100 is not in the dictionary


Additionally, we can use the get method to specify a default value if the key is not present.

In [41]:
english_to_french.get('asdf', None)

In [39]:
english_to_french.get('four', None)

'quatre'

In [40]:
english_to_french.get('quatre', None)
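
The two calls above that miss show no output because get returned None, which is also what get returns when no default is given. A short sketch with a visible default (the fallback string 'unknown' is an arbitrary choice for illustration):

```python
english_to_french = dict(one='un', two='deux', three='trois', four='quatre', five='cinq')

# No default supplied: a missing key gives None
missing = english_to_french.get('asdf')

# Default supplied: a missing key gives the default instead
fallback = english_to_french.get('asdf', 'unknown')

# The default is ignored when the key is present
found = english_to_french.get('four', 'unknown')

print(missing, fallback, found)  # None unknown quatre
```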

### Printing all the keys and values

In [43]:
# Returns a view of all the keys
english_to_french_deux.keys()

dict_keys(['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'nine', 'eight'])

In [46]:
# Returns a view of all the values
english_to_french_deux.values()

dict_values(['un', 'deux', 'trois', 'quatre', 'cinq', 'six', 'sept', 'neuf', 'huit'])
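
Keys and values can also be visited together with the items method, which yields (key, value) pairs. A minimal sketch on a small stand-in dictionary (the name numbers_fr is hypothetical):

```python
# Small stand-in dictionary so the example runs on its own
numbers_fr = {'one': 'un', 'two': 'deux', 'three': 'trois'}

# items() yields one (key, value) tuple per entry
for english, french in numbers_fr.items():
    print(english, '->', french)
```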

### Ordering 

In [47]:
family_name = {}

In [50]:
family_name['Matthew'] = 40
family_name['Margot'] = 35
family_name['James'] = 37
family_name['Alfred'] = 73
family_name['Amy'] = 37
family_name['Aidan'] = 7
family_name['Kathy'] = 70
family_name['Evie'] = 3
family_name['Baby'] = 0
family_name['Eirinn'] = 8

In [51]:
# Note the ordering when I print the keys
family_name.keys()

dict_keys(['Matthew', 'Margot', 'James', 'Alfred', 'Amy', 'Aidan', 'Kathy', 'Evie', 'Baby', 'Eirinn'])

In [52]:
# Note the ordering when I print the values
family_name.values()

dict_values([40, 35, 37, 73, 37, 7, 70, 3, 0, 8])
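
Since Python 3.7, a dict keeps its keys in insertion order, which is why the keys and values above appear in the order they were added. If a different order is needed, sorted can produce one without changing the dictionary. A sketch on a shortened stand-in (the name ages is hypothetical):

```python
# Shortened stand-in for family_name so the example runs on its own
ages = {'Matthew': 40, 'Margot': 35, 'James': 37, 'Evie': 3}

# Sorting a dict sorts its keys alphabetically
by_name = sorted(ages)

# Sorting the (key, value) pairs by the value orders them by age
by_age = sorted(ages.items(), key=lambda pair: pair[1])

print(by_name)  # ['Evie', 'James', 'Margot', 'Matthew']
print(by_age)   # [('Evie', 3), ('Margot', 35), ('James', 37), ('Matthew', 40)]
```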

### Compare access time for list and dict
<p></p>
Finding an element in a dict is O(1) on average, while finding an element in a list is O(n) on average.

In [57]:
# Run Time Examples for dicts vs lists
import random

def pop_list( num_vals ):
    
    # Initialize the list
    the_list = [0]
    
    # Attempt to create num_vals random numbers
    for i in range( 0, num_vals ):
        
        # Generate the random integer
        insert_num = random.randint(0, num_vals)
        
        # Iterate through the list and stop when you either find the value or reach the end of the list
        iterator = 0
        while iterator < len(the_list) and the_list[iterator] != insert_num:
            iterator += 1
            
        # Question: Why is this commented code bad?
        # Answer: (Put your answer in here for class notes)
        # while the_list[iterator] != insert_num and iterator < len(the_list):
            # iterator += 1
        
        # If the iterator is at the end, append the list
        if iterator == len(the_list):
            the_list.append(insert_num)


def pop_dict( num_vals ):
    
    # Initialize the dictionary
    the_dict = dict()
    
    # Attempt to create num_vals random numbers
    for i in range( 0, num_vals ):
        
        # Generate the random number
        insert_num = random.randint(0, num_vals)
        
        # Use not in to check whether the number is already a key in the dict
        if insert_num not in the_dict:
            the_dict[ insert_num ] = 1

In [58]:
# -n 100 tells %timeit to execute the statement 100 times per run
%timeit -n 100 pop_list(1000)

37.8 ms ± 3.4 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [59]:
%timeit -n 100 pop_dict(1000)

797 µs ± 34 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Combining Concepts
<p></p>
Now let's use <b>not in</b> with a list and check the times.

In your notes, describe the **difference in time** between pop_list, pop_dict, and pop_list_mod. Why are the run times different, and what is the difference between **program run time** and **program time complexity**?
<p></p>
    <font color="red">Response Here</font>

In [60]:
def pop_list_mod( num_vals ):
    
    # Initialize the list
    the_list = [0]
    
    # Attempt to create num_vals random numbers
    for i in range( 0, num_vals ):
        
        # Generate the random integer
        insert_num = random.randint(0, num_vals)
        
        # Use not in to check whether the number is already in the list
        if insert_num not in the_list:
            the_list.append(insert_num)

In [61]:
%timeit -n 100 pop_list_mod(1000)

3.78 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
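
One way to explore the question outside of %timeit is to time a single membership test at several sizes with the standard timeit module. The sketch below is an assumption-laden illustration, not the lecture's method: the helper name time_membership, the sizes, and the repeat count are all hypothetical, and the measured times will vary by machine.

```python
import timeit


def time_membership(size, repeats=200):
    """Time a worst-case lookup (value absent) in a list and a dict of the same size."""
    the_list = list(range(size))
    the_dict = dict.fromkeys(the_list, 1)

    # -1 is never present, so the list must be scanned from start to end
    missing = -1

    list_time = timeit.timeit(lambda: missing in the_list, number=repeats)
    dict_time = timeit.timeit(lambda: missing in the_dict, number=repeats)
    return list_time, dict_time


for size in (1_000, 10_000, 100_000):
    list_time, dict_time = time_membership(size)
    print(size, f'list: {list_time:.6f} s', f'dict: {dict_time:.6f} s')
```

Watching how each column changes as the size grows by a factor of ten is a way to connect measured run time to time complexity.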


### In-Class Coding Opportunity

For every word in a file, count how many times it occurs. Use try/except where appropriate.

In [65]:
def word_count(file_name):
    
    # In-Class Code Starts here
    word_dict = {}
    
    try:
        read_file = open(file_name)
        
        for line in read_file:
            
            for word in line.split():
                
                if word in word_dict:
                    word_dict[word] += 1
                    
                else:
                    word_dict[word] = 1
        
        for key in word_dict:
            print( key, word_dict[key] )
        
        read_file.close()
        
    except IOError as e:
        print('Unable to open file: ', e)

In [66]:
word_count('frost.txt')

Two 2
roads 2
diverged 2
in 3
a 3
yellow 1
wood, 2
And 6
sorry 1
I 8
could 2
not 1
travel 1
both 2
be 2
one 3
traveler, 1
long 1
stood 1
looked 1
down 1
as 5
far 1
To 1
where 1
it 2
bent 1
the 8
undergrowth; 1
Then 1
took 2
other, 1
just 1
fair, 1
having 1
perhaps 1
better 1
claim, 1
Because 1
was 1
grassy 1
and 3
wanted 1
wear; 1
Though 1
for 2
that 3
passing 1
there 1
Had 1
worn 1
them 1
really 1
about 1
same, 1
morning 1
equally 1
lay 1
In 1
leaves 1
no 1
step 1
had 1
trodden 1
black. 1
Oh, 1
kept 1
first 1
another 1
day! 1
Yet 1
knowing 1
how 1
way 1
leads 1
on 1
to 1
way, 1
doubted 1
if 1
should 1
ever 1
come 1
back. 1
shall 1
telling 1
this 1
with 1
sigh 1
Somewhere 1
ages 2
hence: 1
I- 1
less 1
traveled 1
by, 1
has 1
made 1
all 1
difference. 1
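
Notice that the output treats 'And' and 'and' as different words and keeps punctuation attached ('wood,'). A possible refinement, sketched here under the assumption that case and surrounding punctuation should be ignored, uses collections.Counter. The function name count_words is hypothetical, and it takes a list of lines rather than a file name so that it runs on its own.

```python
import string
from collections import Counter


def count_words(lines):
    """Count words case-insensitively, ignoring leading and trailing punctuation."""
    counts = Counter()

    for line in lines:
        for word in line.split():
            # Remove punctuation from both ends, then normalize the case
            cleaned = word.strip(string.punctuation).lower()

            # Skip tokens that were punctuation only
            if cleaned:
                counts[cleaned] += 1

    return counts


sample = ['Two roads diverged in a wood, and I-',
          'I took the one less traveled by,']
counts = count_words(sample)

print(counts['i'], counts['wood'])  # 2 1
```

A Counter is a dict subclass, so the lookups and loops shown earlier work on it unchanged.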


### Exercise: emoji translator

Write a function, emoji_translate, that replaces words with emoji icons.

In [None]:
EMOJIS = {
    ':)' : '😀',
    '<3' : '💙',
    'snek': '🐍',
    'pupper': '🐕'
}

def emoji_translate(text):
    # In-Class Code
    pass  # Placeholder so the cell runs until the exercise is completed

In [None]:
emoji_translate('I <3 Notre Dame')

In [None]:
emoji_translate('Harry Potter speaks snek')

In [None]:
emoji_translate('Eirinn the Pupfessor is a good pupper')

In [None]:
from ipywidgets import interact

In [None]:
interact(emoji_translate, text='')
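
One possible approach to the exercise, offered as a sketch rather than the in-class solution: split the text on whitespace and use get to look each word up, falling back to the word itself when it has no emoji. This assumes exact, case-sensitive matches on whole words, so 'Pupfessor' is left alone.

```python
EMOJIS = {
    ':)' : '😀',
    '<3' : '💙',
    'snek': '🐍',
    'pupper': '🐕'
}


def emoji_translate(text):
    # Replace each whitespace-separated word found in EMOJIS;
    # get() returns the word itself when there is no matching key
    return ' '.join(EMOJIS.get(word, word) for word in text.split())


print(emoji_translate('I <3 Notre Dame'))  # I 💙 Notre Dame
```

Because the text is rebuilt with single spaces, runs of whitespace in the input are collapsed.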

### Spell Checking Setup

In [1]:
# Use import requests to obtain public online files
import requests

# Import string to get the string library
import string

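# wget is a Linux command used to download an online file;
# this function mimics it using requests
def wget(url, path):
    
    response = requests.get(url)
    
    # Raise an exception if the download failed (for example, a 404)
    response.raise_for_status()
    
    with open( path, 'wb') as fh:
        fh.write(response.content)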

In [2]:
wget('http://google.com', 'google.txt')

In [3]:
import os

In [4]:
# Review: Check the size of the file we just downloaded
os.path.getsize('google.txt')

14785

In [5]:
# We will now download a publicly available dictionary
wget('https://github.com/dwyl/english-words/raw/master/words.txt', 'words.txt')

In [6]:
os.path.getsize('words.txt')

4862992

In [8]:
# To review, we will print the first 100 entries in the word list
# Remember, the with statement automatically closes the file when the block ends
with open('words.txt') as word_file:
    
    for index, line in enumerate(word_file):
        
        if index == 100:
            break
            
        print(line.strip())

2
1080
&c
10-point
10th
11-point
12-point
16-point
18-point
1st
2,4,5-t
2,4-d
20-point
2D
2nd
30-30
3D
3-D
3M
3rd
48-point
4-D
4GL
4H
4th
5-point
5-T
5th
6-point
6th
7-point
7th
8-point
8th
9-point
9th
a
a'
a-
A&M
A&P
A.
A.A.A.
A.B.
A.B.A.
A.C.
A.D.
A.D.C.
A.F.
A.F.A.M.
A.G.
A.H.
A.I.
A.I.A.
A.I.D.
A.L.
A.L.P.
A.M.
A.M.A.
A.M.D.G.
A.N.
a.p.
a.r.
A.R.C.S.
A.U.
A.U.C.
A.V.
a.w.
A.W.O.L.
A/C
A/F
A/O
A/P
A/V
A1
A-1
A4
A5
AA
AAA
AAAA
AAAAAA
AAAL
AAAS
Aaberg
Aachen
AAE
AAEE
AAF
AAG
aah
aahed
aahing
aahs
AAII
aal
Aalborg
Aalesund
aalii
aaliis


In [9]:
# Now we will load the dictionary into a list
def load_words_list(path):
    
    # Open the file using open(path) as the word_file
    with open(path) as word_file:
    
        # Initialize a list
        words = []
        
        # Iterate through the word file
        for word in word_file:
            
            # Change the word to lower case, strip the whitespace,
            # and append it to the list
            words.append(word.lower().strip())
    
    # Return the list representing the dictionary
    return words

In [10]:
english_words = load_words_list('words.txt')

In [11]:
len(english_words)

466550
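
With this many words, checking whether a word is in the list means scanning the list. Following the list versus dict comparison earlier in the lecture, the words could instead be loaded as dictionary keys. The sketch below is an assumption about where the setup could go next: load_words_dict and is_known_word are hypothetical names, and a tiny temporary file stands in for words.txt so the example runs on its own.

```python
import os
import tempfile


def load_words_dict(path):
    """Hypothetical dict-based counterpart to load_words_list."""
    words = {}
    with open(path) as word_file:
        for word in word_file:
            # The value is unused; only the keys matter for lookup
            words[word.lower().strip()] = True
    return words


def is_known_word(word, words):
    # Key lookup in a dict is O(1) on average
    return word.lower() in words


# Tiny stand-in word file so the sketch does not need words.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Apple\nbanana\ncherry\n')

demo_words = load_words_dict(tmp.name)
os.remove(tmp.name)

print(is_known_word('apple', demo_words), is_known_word('aple', demo_words))  # True False
```

A set would serve the same purpose when no value needs to be stored with each word.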