## Chapter 11 - Dictionaries

Dictionaries are a built-in type in Python. Many efficient algorithms are built on them, because they let you look up a value by its key very quickly.

-- Dictionaries are mappings.
A dictionary is like a list, but more flexible. A list associates each element with a non-negative integer index; a dictionary associates each element with a more general "key," which can be (almost) any type.  

Each association is a key-value pair (aka an item), much like a word and its definition in a language dictionary.  

In math, this kind of association is called a mapping; a dictionary maps a key to a value. A key maps to one and only one value. A value can be mapped to by more than one key.  


In [1]:
#Creating a dictionary
#one way to create a dictionary is using the 
#dict() constructor. Our example will be 
#Spanish to English colors
sp2eng = dict() #this creates an empty dictionary
sp2eng
#output is an empty pair of curly braces, {},
#which denotes an empty dictionary.

{}

In [2]:
#We can add a key-value pair like so:
sp2eng['rojo'] = 'red'
sp2eng

{'rojo': 'red'}

In [11]:
#another way to create a dictionary is 
#directly using the squiggly brackets.
sp2eng = {'rojo': 'red', 'azul': 'blue'}
sp2eng

{'rojo': 'red', 'azul': 'blue'}

In [36]:
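#Dictionaries are not indexed by position!
sp2eng['amarillo'] = 'orange??'
sp2eng['verde'] = 'green'
sp2eng['amarillo'] = 'yellow' #reassigning a key replaces its value
display(sp2eng)
#Since Python 3.7, a dictionary remembers the order
#in which its keys were first inserted (older
#versions made no such promise).
#Even so, dictionaries don't have a first, second, third
#element that you can index with d[0]; they just have
#items that match keys to values.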

{'rojo': 'red', 'azul': 'blue', 'amarillo': 'yellow', 'verde': 'green'}

In [37]:
#You can look up/retrieve values
#from their keys
sp2eng['rojo']

'red'

In [38]:
#But if you look for a key that's not in the 
#dictionary, you get an error
sp2eng['Rojooooooo']

KeyError: 'Rojooooooo'
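If a missing key is a normal situation rather than a bug, you can avoid the error. Here is a small sketch of two common ways, reusing the sp2eng dictionary from above (the variable names missing, fallback, and found are just for this example):

```python
# Same example dictionary as in the cells above.
sp2eng = {'rojo': 'red', 'azul': 'blue', 'amarillo': 'yellow', 'verde': 'green'}

# Option 1: test for the key first with the `in` operator.
if 'Rojooooooo' in sp2eng:
    print(sp2eng['Rojooooooo'])
else:
    print('not found')

# Option 2: dict.get returns a default (None unless you supply one)
# instead of raising KeyError.
missing = sp2eng.get('Rojooooooo')
fallback = sp2eng.get('Rojooooooo', 'unknown')
found = sp2eng.get('rojo', 'unknown')
print(missing, fallback, found)
```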

In [42]:
#Dictionaries provide list-like "view" objects
#that you can loop over or test with `in`.
#You can retrieve the keys, the values,
#and the items (pairs)
display(sp2eng.keys())
display(sp2eng.values())
display(sp2eng.items())
display('rojo' in sp2eng.keys())
display('rojoooo' in sp2eng.keys())

dict_keys(['rojo', 'azul', 'amarillo', 'verde'])

dict_values(['red', 'blue', 'yellow', 'green'])

dict_items([('rojo', 'red'), ('azul', 'blue'), ('amarillo', 'yellow'), ('verde', 'green')])

True

False
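A few related details, sketched on the same example dictionary: `in` works on the dictionary itself (it checks keys), a view can be turned into a real list when you need indexing, and sorted() gives the keys in sorted order. The variable names here are made up for illustration.

```python
sp2eng = {'rojo': 'red', 'azul': 'blue', 'amarillo': 'yellow', 'verde': 'green'}

# Membership can be tested on the dictionary itself; .keys() is optional.
has_rojo = 'rojo' in sp2eng

# list() turns a view into a real list, which supports indexing.
key_list = list(sp2eng.keys())
first_key = key_list[0]

# sorted() returns a new, sorted list of the keys.
sorted_keys = sorted(sp2eng)

# `in` on the dictionary checks keys only; use .values() to search values.
has_red_key = 'red' in sp2eng
has_red_value = 'red' in sp2eng.values()

print(has_rojo, first_key, sorted_keys, has_red_key, has_red_value)
```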

In [6]:
#If the keys aren't kept sorted, how does Python look
#up entries to see if they're there or not?

#Answer: the HASH TABLE / hash function
#Remarkable part: lookups in a hash table
#take about the same amount of time on average
#no matter how many entries are in the data set.
#This is different from a list, where 
#search time grows with the size of the list.
from time import perf_counter as time
s = []
d = {}
ceiling = 10**7
start=time()
for i in range(ceiling):
    s.append(i)
    d[i] = i
end = time()
loop_time = end-start

start = time()
ceiling-1 in s
end = time()
list_time = end-start

start = time()
ceiling-1 in d.keys()
end = time()
dictionary_time = end-start

print('loop time:', loop_time)
print('list time:',list_time)
print('dictionary time:',dictionary_time)

loop time: 2.1616728489999844
list time: 0.164864184999999
dictionary time: 4.164800000694413e-05
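To see the trend rather than a single measurement, here is a rough sketch that repeats the comparison at several sizes using the standard timeit module. The helper membership_times and the chosen sizes are assumptions for this example. Exact timings depend on your machine, so none are shown, but the list time should grow with n while the dictionary time stays roughly flat.

```python
from timeit import timeit

def membership_times(n, repeats=5):
    """Return (list_seconds, dict_seconds) for `repeats` worst-case lookups."""
    items = list(range(n))
    table = dict.fromkeys(items)   # keys 0..n-1, all values None
    target = n - 1                 # last element: worst case for the list scan
    list_seconds = timeit(lambda: target in items, number=repeats)
    dict_seconds = timeit(lambda: target in table, number=repeats)
    return list_seconds, dict_seconds

for n in (10**3, 10**4, 10**5):
    list_s, dict_s = membership_times(n)
    print(n, list_s, dict_s)
```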


# Applications of a Dictionary

Suppose you're given a long string, and you want to compute the frequency of each letter in the string. There are a few approaches:  
1) Create 26 variables to store the counts of each letter, then loop through the string and update the matching variable.  
2) Create a list with 26 elements. You could convert each letter into an index, then increment the corresponding element of the list each time you see that letter in the string.  
3) Make a dictionary with letters as keys and counters as values. Dictionaries make this easy and efficient, and you don't need to know ahead of time which characters appear.

In [13]:
#a histogram is a graphical/numerical collection of 
#counters/frequencies of events.
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d

x = histogram('the quick brown fox jumps over the lazy dog')
display(len(x))
x

27

{'t': 2,
 'h': 2,
 'e': 3,
 ' ': 8,
 'q': 1,
 'u': 2,
 'i': 1,
 'c': 1,
 'k': 1,
 'b': 1,
 'r': 2,
 'o': 4,
 'w': 1,
 'n': 1,
 'f': 1,
 'x': 1,
 'j': 1,
 'm': 1,
 'p': 1,
 's': 1,
 'v': 1,
 'l': 1,
 'a': 1,
 'z': 1,
 'y': 1,
 'd': 1,
 'g': 1}
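The if/else in histogram can be shortened. Here is a sketch of two alternatives: dict.get with a default of 0, and collections.Counter from the standard library. The name histogram_get is made up for this example; it is a variation on the function above, not a replacement for it.

```python
from collections import Counter

def histogram_get(s):
    """Count characters using dict.get with a default of 0."""
    d = {}
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

text = 'the quick brown fox jumps over the lazy dog'
h = histogram_get(text)
counts = Counter(text)      # Counter is a dict subclass built for counting

print(h['o'], counts['o'])
print(h == dict(counts))    # both approaches give the same keys and counts
```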

In [18]:
#using a dictionary in a for loop
#Python implements some very convenient handling
#for dictionary items with a for loop.
def print_hist(h):
    for c in h:
        print(c, h[c])
        
print_hist(histogram('dinosaur'))
#okay, that was nice...
#but maybe we can even do better...

d 1
i 1
n 1
o 1
s 1
a 1
u 1
r 1


In [19]:
def print_hist(h):
    for key, value in h.items():
        print(key, value)
print_hist(histogram('dinosaur'))

d 1
i 1
n 1
o 1
s 1
a 1
u 1
r 1
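Since items() gives (key, value) pairs, you can also sort them before printing. This is a sketch that prints the most frequent characters first; print_hist_sorted is a hypothetical helper, and histogram is repeated here only so the example runs on its own.

```python
def histogram(s):
    d = {}
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

def print_hist_sorted(h):
    """Print the items of h from most frequent to least frequent."""
    # Sort by count (descending), then by key so ties print in a fixed order.
    for key, value in sorted(h.items(), key=lambda kv: (-kv[1], kv[0])):
        print(key, value)

print_hist_sorted(histogram('banana'))
```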


Reverse lookup  
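Given a dictionary d and a key k, it is easy to find the corresponding value v = d[k]. This operation is called a lookup.  
But what if you have v and want to find a key that maps to v? This is a reverse lookup, and since several keys can map to the same value, there may be more than one answer.  
One simple way, which returns the first matching key it finds: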

In [23]:
def reverse_lookup(d,v):
    for k in d:
        if d[k] == v:
            return k
    raise LookupError()
    
h = histogram('aacaabbd')
reverse_lookup(h,2)

'b'

In [24]:
#alternately, 
reverse_lookup(h, 6)

LookupError: 
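If you want every key that maps to a value, not just the first one, here is a possible sketch. Both reverse_lookup_all and invert_dict are names made up for this example. Note that either way you have to scan the whole dictionary, so a reverse lookup is much slower than a forward lookup.

```python
def reverse_lookup_all(d, v):
    """Return a list of every key in d whose value equals v."""
    return [k for k in d if d[k] == v]

def invert_dict(d):
    """Map each value in d to the list of keys that had that value."""
    inverse = {}
    for key, value in d.items():
        # setdefault creates the empty list the first time a value is seen.
        inverse.setdefault(value, []).append(key)
    return inverse

h = {'a': 4, 'c': 1, 'b': 2, 'd': 1}   # same as histogram('aacaabbd')
print(reverse_lookup_all(h, 1))
print(reverse_lookup_all(h, 6))         # no match: empty list, no error
print(invert_dict(h))
```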

# Dictionaries and lists
You can use lists as the values of a dictionary, but not as the keys, because lists in Python are unhashable. For example:

In [25]:
t = [1,2]
d = {}
d['a'] = t #works okay.
d[t] = 'a' #doesn't work

TypeError: unhashable type: 'list'

Then what does it mean to be hashable or unhashable? What's a hash?  
A hash function takes a value and returns an integer, called a hash value. Dictionaries use these hash values to store and look up key-value pairs.  
This works fine if the keys are immutable. If a key were mutable, like a list, changing it after insertion would change its hash value, and the dictionary would then look for it in the wrong place.  
When you add a key-value pair to a Python dictionary, Python hashes the key and uses the hash value to choose a slot in an internal table where the pair is stored.  
This is why lookups are so fast: when you look up a key, Python hashes it and knows almost immediately where to look. If a matching key is stored there, Python returns its value; if not, it knows the key isn't in the dictionary.
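You can call the built-in hash() yourself to see this. A minimal sketch follows; the flag list_hashable is just for this example. The actual hash values of strings vary from run to run, so none are printed here.

```python
# hash() works on immutable built-ins such as strings, ints, and tuples.
# Equal objects always have equal hash values within one run.
print(hash('rojo') == hash('rojo'))

# Lists are mutable, so Python refuses to hash them.
try:
    hash([1, 2])
    list_hashable = True
except TypeError as err:
    list_hashable = False
    print('TypeError:', err)

# A tuple holding the same data is hashable and can serve as a key.
d = {}
d[(1, 2)] = 'a'
print(d[(1, 2)])
```

So if you need a list-like key, converting it to a tuple first is the usual workaround.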

# Other uses for Dictionaries: Memoization

Remember the recursive Fibonacci code we wrote earlier? It took a long time to compute high-numbered terms of the Fibonacci sequence because it recalculated the same terms over and over.  
But by recording already-computed Fibonacci values in a dictionary (a technique called memoization), we can improve the run time substantially!

In [7]:
known = {0:0, 1:1}

def fibonacci(n):
    if n in known:
        return known[n]

    new = fibonacci(n-1) + fibonacci(n-2)
    known[n] = new
    return new

fibonacci(36)

14930352

In [8]:
#Compare with the original code from chapter 6.
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
fibonacci(36) 

14930352
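To see how much work the memo saves, here is a sketch that counts function calls for both versions. It uses functools.lru_cache from the standard library, which keeps a memo for you; the names fib_plain, fib_cached, and calls are made up for this example, and n is kept small so the plain version finishes quickly.

```python
from functools import lru_cache

calls = {'plain': 0, 'cached': 0}   # hypothetical call counters for this demo

def fib_plain(n):
    calls['plain'] += 1
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)   # the standard library stores the results for us
def fib_cached(n):
    calls['cached'] += 1   # only runs when the result is not already stored
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_plain(20), calls['plain'])
print(fib_cached(20), calls['cached'])
```

The cached version runs its body once per distinct n, while the plain version's call count grows roughly as fast as the Fibonacci numbers themselves.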

# Global Variables  
Okay, that's all neat. Now for a quick foray into a question I know some of you have asked before: the scope of variables inside and outside of functions (this will also be relevant inside and outside of objects).

Note that the "known" variable above exists outside of the function. This was necessary so that every call of the function has access to the same dictionary. (It could have been passed back and forth along with each call, but that is more complicated than just making it global.)  
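For comparison, here is a minimal sketch of the "pass it along" alternative. The memo parameter is an assumption of this example, not part of the code above.

```python
def fibonacci(n, memo=None):
    """Memoized Fibonacci that carries its memo along as an argument."""
    if memo is None:
        memo = {0: 0, 1: 1}      # fresh memo for each top-level call
    if n in memo:
        return memo[n]
    # Every recursive call receives the same dictionary object.
    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
    return memo[n]

print(fibonacci(36))
```

This avoids the global, at the cost of an extra parameter and a memo that is rebuilt on every top-level call.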

There are a few subtleties in global variable usage.  


In [10]:
#One use is to make a boolean flag for verbose
#outputs.

verbose = True
def ex1():
    if verbose:
        print('Running ex1')
ex1()

Running ex1


In [11]:
#But if you try to reassign a global, 
#you might be surprised.
new_bool = False
def ex2():
    new_bool = True #need to make a change here
ex2()
new_bool

False

In [14]:
#You can see that global new_bool didn't change.  
#In the function, it creates a new local variable
#called new_bool and sets -that- to True.
#So how can we reassign the global?
new_bool = False
def ex2():
    global new_bool #This line makes it work!
    new_bool = True #need to make a change here
ex2()
new_bool

#NB: We got around it with "known" because we 
#aren't reassigning, we are using one of its
#built in abilities, making a key-value pair
#i.e. known[n] = new
#So, you can add/remove/replace elements of a global
#list or dictionary without this step,
#But if you want to reassign the variable,
#you have to declare it global like this.

True
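Here is a sketch that puts the two cases side by side: modifying a global dictionary needs no declaration, but reassigning a global name does. The function names and the count variable are made up for this example. Note what happens in bump_wrong: because the function assigns to count, Python treats count as local everywhere in that function, so reading it first raises an UnboundLocalError.

```python
known = {0: 0, 1: 1}
count = 0

def add_entry():
    known[2] = 1          # modifies the global dict: no declaration needed

def bump_wrong():
    count = count + 1     # the assignment makes `count` local, so this fails

def bump_right():
    global count          # now the assignment targets the global
    count = count + 1

add_entry()
try:
    bump_wrong()
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__
bump_right()
print(known, count, error_name)
```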

Global variables are a little clunky, and every language handles them differently. But in Python, if you stick to the approach above, you should be okay.

## Exercises!

In [17]:
#To get you started:
import time
def make_word_dict():
    """Reads lines from a file and builds a dictionary"""
    t = {}
    with open('words.txt') as fin: #the file is closed automatically
        for line in fin:
            word = line.strip()
            ## ENTER DICTIONARY ENTRY LINE HERE
    return t

d = make_word_dict()
t1 = time.time()
'aardvark' in d
t2 = time.time()
'zebra' in d
t3 = time.time()
print(t3 - t2, t2 - t1)

3.1948089599609375e-05 5.1021575927734375e-05
