## What are dictionaries and why should we care about them?




Dictionaries are for **associating data** and **quick lookup**.

Motivating example: I am making an index for a book, because I want to know which concepts show up on which pages, to make it easier to jump back to the right spots.

In [4]:
# how to know which chapters talk about strings? or debugging?
book = [
  "Chapter 1: talks about strings and how they have the property of immutability also some basic debugging",
  "Chapter 2: continues talking about advanced methods for strings and also introduces the concept of functions",
  "Chapter 3: discusses iteration and lists and also debugging",
]

In [9]:
# without dictionaries
# which chapters mention strings or debugging?
concepts = ['strings', 'debugging']
index = []

for chapter in book:
    # split into elements based on the colon
    elements = chapter.split(":")
    # first element is the chapter
    chapter = elements[0]
    # second element is the text
    text = elements[1]
    
    #parse the text
    words = text.split()
    for keyconcept in concepts:
        if keyconcept in words:
            index.append([keyconcept, chapter])
index

[['strings', 'Chapter 1'],
 ['debugging', 'Chapter 1'],
 ['strings', 'Chapter 2'],
 ['debugging', 'Chapter 3']]

In [14]:
# with dictionaries
# which chapters mention strings or debugging?
concepts = ['strings', 'debugging']
# make a dictionary to hold the index
index = {}

for chapter in book:
    # split into elements based on the colon
    elements = chapter.split(":")
    # first element is the chapter
    chapter = elements[0]
    # second element is the text
    text = elements[1]

    # parse the text
    words = text.split()
    for keyconcept in concepts:
        if keyconcept in words:
            # list version was: index.append([keyconcept, chapter])
            # get the current list of chapters associated with this concept
            chs = index.get(keyconcept, [])
            chs.append(chapter)
            index.update({keyconcept: chs})
index

{'strings': ['Chapter 1', 'Chapter 2'],
 'debugging': ['Chapter 1', 'Chapter 3']}

In [10]:
# without dictionaries (here index is the list of [concept, chapter] pairs built earlier)
query = "strings"
for item in index:
    if query == item[0]:
        print(item[1])
    # here you have to remember the order of each inner list: position 0 is the concept, position 1 is the chapter
    

Chapter 1
Chapter 2


In [15]:
# given this data structure, how do we find all the chapters that mention strings?
# the .get() method grabs them in one step
query = "strings"
index.get(query)

['Chapter 1', 'Chapter 2']

Another common use case: attributes of data entries. For instance, attributes of a course, like credit hours, pre-reqs, instructor, location, hours, and so on.

In [16]:
# without dictionaries
courses = [
    ["INST126", 3, "no", "Chan", "hybrid", "MWF"],
    ["INST256", 4, "yes", "Kanishka", "in-person", "TR"]
]

# look up INST126 and check whether it has prereqs
for course in courses:
    if course[0] == "INST126":
        print(course[2])

no


Rather than trying to remember which position we happened to have decided to use to store a particular attribute (if we used lists), we can use **semantically meaningful indices for values**, i.e., keys!

In [18]:
# with dictionaries
courses = {
    "INST126": {
      "credit hours": 3, "prereqs": "no", "instructor": "Chan", "location": "hybrid", "hours": "MWF"
      },
    "INST256": {
      "credit hours": 4, "prereqs": "yes", "instructor": "Kanishka", "location": "in-person", "hours": "TR"
      },
}

# look up INST126 and check whether it has prereqs
#get the course, then get what information about the course
courses.get("INST126").get("prereqs")

'no'

It's a lot easier to remember keys (if we name them useful things) compared to just indices. And Python can help us remember too!

If you're interested, there are also formal technical reasons to prefer dictionaries over lists if you care about speed/efficiency and your computational task is **checking** if an item exists in a collection *and* you're dealing with very large scale data: https://www.jessicayung.com/python-lists-vs-dictionaries-the-space-time-tradeoff/. 
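
If you'd like to see the difference for yourself, here is a rough sketch using the standard library's `timeit` module. The collection size and the names `as_list`, `as_dict`, and `missing` are made up for illustration, and the exact timings will vary from machine to machine.

```python
import timeit

n = 100_000
as_list = list(range(n))               # a big list of numbers
as_dict = {i: True for i in range(n)}  # the same numbers as dictionary keys (the values are unused)
missing = -1                           # not in either collection, so the list has to check every item

# time 100 membership checks against each collection
list_time = timeit.timeit(lambda: missing in as_list, number=100)
dict_time = timeit.timeit(lambda: missing in as_dict, number=100)

print(f"list: {list_time:.4f} seconds, dict: {dict_time:.4f} seconds")
```

On most machines the dictionary check should come out much faster, because the list is scanned item by item while the dictionary jumps to the key using its hash.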

Later we will learn the `pandas` library (and its `DataFrame` data structure, which is sort of a hybrid of lists and dictionaries): you can do really fast lookup, but also sort stuff!

## Anatomy of a dictionary





Dictionaries are not so different from... our dictionaries in real life. :) They basically *map* a bunch of **keys** (e.g., a word) to corresponding **values** (e.g., a definition). Another example is the index at the back of a printed (!!!) book, which maps key terms to the pages where each term shows up, or tags on websites, which map tags to the webpages that include those tags.

Let's look at a simple example that maps letters to an example word that starts with that letter:

In [19]:
# entries must be separated by commas: the missing comma after 'apple' causes a syntax error
d = {
   'a': 'apple' # an entry that maps the value apple to the letter a
   'b': 'ball', # another entry that maps the value ball to the letter b
   'c': 'crayon' 
}

SyntaxError: invalid syntax (1899481055.py, line 3)

In [None]:
#each key can map to only one value
d = {
   'a': 'apple', # an entry that maps the value apple to the letter a
   'b': 'ball', # another entry that maps the value ball to the letter b
   'c': 'crayon' 
}
d = {'a': 'apple', 'b': 'ball', 'c': 'crayon'} # you can also write it out like this, but i find it harder to read

another = {
    'a': 1,
    'b': 2,
    'c': 3
}

grades = {
    'A': [93, 100],
    'B': [87, 93]
}

In [None]:
grades.get("A")

[93, 100]

The key parts of a dictionary are:
1. The `{ }` curly braces tell you and Python that it's a dictionary (similar to `""` for strings, or `[]` for lists)
2. Each entry maps a **value** on the right of a `:` (which works a bit like `=` in an assignment) to a **key** on the left. For example, our first entry maps the value "apple" to the key "a".
3. We include `,`, similar to lists, to separate entries in the dictionary.

### Properties of a dictionary

Similar to lists, dictionaries have **length**. 

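Different from lists, dictionaries **are not indexed by position**. (Since Python 3.7 a dictionary does remember the order in which entries were inserted, but that order is not something you look things up by.) So you can't grab things by position the way you would with a list. You grab things by... key!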
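
A small sketch of what "by key, not by position" means in practice. The variable names `result` and `by_key` are just for illustration.

```python
d = {'a': 'apple', 'b': 'ball', 'c': 'crayon'}

# d[0] does not mean "the first entry": it means "the value stored under the key 0"
try:
    d[0]
    result = "found something"
except KeyError as err:
    result = f"KeyError: {err}"
print(result)

# looking up by key works as expected
by_key = d['a']
print(by_key)
```

Since there is no key `0` in this dictionary, the first lookup fails with a `KeyError`, while the lookup by the key `'a'` succeeds.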

Also, all keys in a dictionary have to be **unique**. This is handy for keeping track of unique items. Values do *not* have to be unique, though: different keys can point to the same value, but one key cannot point to several separate values (a repeated key simply overwrites the earlier entry). If you're interested, the related `set` data structure also holds only unique items.

Dictionaries are also **mutable**: you can modify them directly (in contrast to strings, where you never modify them directly, but only ever create a new modified version of the string).
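
Here is a minimal sketch of that contrast. The names `upper` and `alias` are made up for this example.

```python
# strings: methods give you back a NEW string, the original is untouched
s = "apple"
upper = s.upper()
print(s, upper)

# dictionaries: changes happen in place
d = {'a': 'apple'}
alias = d            # a second name for the SAME dictionary, not a copy
alias['b'] = 'ball'  # modify it through the second name
print(d)             # the change is visible through the first name too
```

This is worth keeping in mind when you pass a dictionary into a function: if the function modifies it, the caller sees the modification.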

In [None]:
# a dictionary is accessed by key, not by position, which is different from a list
d = {
   'a': 'apple', # an entry that maps the value apple to the letter a
   'b': 'ball', # another entry that maps the value ball to the letter b
   'c': 'crayon' 
}
len(d)

3

In [None]:
d = {
   'a': 'apple', # an entry that maps the value apple to the letter a
   'b': 'ball', # another entry that maps the value ball to the letter b
   'c': 'crayon',
   'a': 'animal' 
}
print(d)
d = {
   'a': ['apple', "animal"], # to keep both values, map the key a to a list
   'b': 'ball', # another entry that maps the value ball to the letter b
   'c': 'crayon',
   'd': [1, 3, "denizen"]
}
print(d)

{'a': 'animal', 'b': 'ball', 'c': 'crayon'}
{'a': ['apple', 'animal'], 'b': 'ball', 'c': 'crayon', 'd': [1, 3, 'denizen']}


In [None]:
dempty = {}
dempty

{}

### What kinds of data can we put in a dictionary?

Basically anything goes for **values**. You can even nest a dictionary inside another dictionary, by mapping a dictionary value to some key.

In [None]:
students = {
    'joel': {
        'major': 'info sci',
        'year': 'senior',
        'interests': ['programming', 'football', 'dancing']
    },
}
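
To read from a nested dictionary, you look things up one level at a time. A short sketch follows; the student `'sam'` is hypothetical and deliberately missing, to show how chained `.get()` calls with defaults can avoid an error.

```python
students = {
    'joel': {
        'major': 'info sci',
        'year': 'senior',
        'interests': ['programming', 'football', 'dancing']
    },
}

# first get joel's dictionary, then get a value from inside it
major = students['joel']['major']
# the value can itself be a list, which we then index by position
first_interest = students['joel']['interests'][0]
# 'sam' is not in the dictionary: default to an empty dict, then to 'unknown'
missing_major = students.get('sam', {}).get('major', 'unknown')

print(major)
print(first_interest)
print(missing_major)
```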

But **keys** need to be *hashable*. More info here: https://stackoverflow.com/questions/14535730/what-does-hashable-mean-in-python

In [None]:
d = {[1, 2, 3]: 'apple'}
#keys have to be hashable, values can be anything

TypeError: unhashable type: 'list'

I mention this because a common error when first working with dictionaries is to try to use an unhashable data structure as a key. The basic rule of thumb for now is: strings and numbers are fine as keys; lists and dictionaries are not.
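
If you want to see the error without crashing your notebook, here is a small sketch that catches it. It also shows one extra case beyond the rule of thumb, which you may not have met yet: a tuple of numbers is hashable, so it can be a key. The names `outcome` and `ok` are just for illustration.

```python
# a list as a key fails
try:
    bad = {[1, 2, 3]: 'apple'}
    outcome = "worked"
except TypeError as err:
    outcome = f"TypeError: {err}"
print(outcome)

# a tuple holding the same numbers is hashable, so this is allowed
ok = {(1, 2, 3): 'apple'}
print(ok[(1, 2, 3)])
```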

## Working with dictionaries: basics


### How to create a dictionary

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon', 
}

You can also start with an empty dictionary, and then add stuff later, programmatically or with other functions.

In [None]:
emptyd = {} # dictionary with nothing in it
emptyd = dict() # same thing
print(emptyd)
len(emptyd)

{}


0

### Looking up entries in a dictionary

This is called the "old style" or "indexing" pattern. It looks a little bit like indexing into a list.

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
d['c'] # put a key inside square brackets associated with a dictionary

'crayon'

This is the newer pattern that I prefer for clarity.

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
d.get('c') # use the get function to get the value for the key that we give it

'crayon'

It also has the advantage of not breaking your program if you try to access a key that doesn't exist.

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
d['d'] # will crash the program with key error

KeyError: 'd'

`.get()` lets you specify a default value that should come back if the key doesn't exist. This is very useful for writing clean and understandable dictionary patterns, such as indexing, which we'll dig into next week.

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
v = d.get('d') # will return None as a default
print(v)
v = d.get('d', "key not found") # but you can also specify a default value. this is very handy for initializing counts/indices in dictionaries
print(v)

None
key not found


### Adding/updating entries to a dictionary

Classic style, using indexing and assignment

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
print(d)
# add a new entry for e
d['e'] = 'egg' # map the value egg to the key e
print(d)
# update the entry for b
d['b'] = 'bread' # map the value bread to the key b (which happens to already exist, so we update it)
print(d)
# update the entries for a and c
d['a'] = 'ashes'
d['c'] = 'charming' 
print(d)

{'a': 'apple', 'b': 'ball', 'c': 'crayon'}
{'a': 'apple', 'b': 'ball', 'c': 'crayon', 'e': 'egg'}
{'a': 'apple', 'b': 'bread', 'c': 'crayon', 'e': 'egg'}
{'a': 'ashes', 'b': 'bread', 'c': 'charming', 'e': 'egg'}


Newer style, using `.update()`

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
print(d)
# add a new entry for e
d.update({'e': 'egg'}) # map the value egg to the key e
print(d)
# update the entry for b
d.update({'b': 'bread'}) # map the value bread to the key b (which happens to already exist, so we update it)
print(d)
# update the entries for a and c
d.update({'a': 'ashes', 'c': 'charming'})
print(d)

{'a': 'apple', 'b': 'ball', 'c': 'crayon'}
{'a': 'apple', 'b': 'ball', 'c': 'crayon', 'e': 'egg'}
{'a': 'apple', 'b': 'bread', 'c': 'crayon', 'e': 'egg'}
{'a': 'ashes', 'b': 'bread', 'c': 'charming', 'e': 'egg'}


`.update()` has the advantage of being able to add multiple key-value pairs at once.

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
print(d)
d.update({'a': 'ashes', 'b':'bread', 'c':'charming', 'e': 'egg'})
print(d)

{'a': 'apple', 'b': 'ball', 'c': 'crayon'}
{'a': 'ashes', 'b': 'bread', 'c': 'charming', 'e': 'egg'}


You can use whichever pattern is comfortable for you, but I prefer `.get()` and `.update()` for now because they're more readable and robust.

### List keys and values

- `.keys()`
- `.values()`
- `.items()`

Each of these is iterable.

In [None]:
# .keys() gives you all of the keys
# .values() gives you all of the values
# .items() gives you all of the key-value pairs

In [None]:
d

{'a': 'ashes', 'b': 'bread', 'c': 'charming', 'e': 'egg'}

In [None]:
# list all the keys in the dictionary
d.keys()

dict_keys(['a', 'b', 'c', 'e'])

In [None]:
# list all the values in the dictionary
d.values()

dict_values(['ashes', 'bread', 'charming', 'egg'])

In [None]:
# list all the key-value pairs in the dictionary
d.items()

dict_items([('a', 'ashes'), ('b', 'bread'), ('c', 'charming'), ('e', 'egg')])

In [None]:
# this means you can iterate through the keys/values
for key in d.keys():
  print(key)

a
b
c
e


In [None]:
# iterate through the items
for key, value in d.items():
  print(f'{key} is associated with the value {value}')

a is associated with the value ashes
b is associated with the value bread
c is associated with the value charming
e is associated with the value egg


### Check if a key is in a dictionary

In [None]:
d = {
   'a': 'apple',
   'b': 'ball',
   'c': 'crayon' 
}
print('a' in d)

True


The `in` operator only checks the keys of a dictionary, not the values.

You can also do this with `.get()`!

In [None]:
# retrieve the value for a; if it's not found, return "Not found"
d.get("a", "Not found")

'apple'
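
To make the keys-only behavior of `in` concrete, here is a small sketch. It also shows one situation where `in` and `.get()` can disagree: a key that exists but maps to `None`. The dictionary `d2` is made up for this example.

```python
d = {'a': 'apple', 'b': 'ball', 'c': 'crayon'}

key_check = 'apple' in d             # 'apple' is a value, not a key
value_check = 'apple' in d.values()  # search the values explicitly
print(key_check, value_check)

# a key that exists but holds None looks "missing" if you only use .get()
d2 = {'x': None}
print('x' in d2)    # the key is there
print(d2.get('x'))  # but .get() returns None, same as for a missing key
```

So if you need to know whether a key exists at all, `in` is the safer check.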

### Reverse look up keys from values: YOU CAN'T! Not really...

Dictionaries are very powerful transformations of your data that make it REALLY easy to do a specific kind of operation, but lock you out of doing other things. So design the structure of the dictionary carefully. For example, if you make an index, and find that you actually care a lot about grabbing the top N words, you probably want to map counts (as keys) to words (as values), not words to counts.

In [None]:
s = "she sells sea shells by the sea shore in the sea and the shells and the sea sea sea"

d = {} # define a dictionary to hold the index
# go through word by word
for word in s.split():
  # get the current count for the word, default to 0 if we haven't seen it
  current_count = d.get(word, 0)
  # update the count
  new_count = current_count + 1
  # update the dictionary with the word and count
  d.update({word: new_count})

d

{'and': 2,
 'by': 1,
 'in': 1,
 'sea': 6,
 'sells': 1,
 'she': 1,
 'shells': 2,
 'shore': 1,
 'the': 4}

Could try to invert it, but....

In [None]:
def reverse_dictionary(d):
  return {v: k for k, v in d.items()}

You actually lose information, because remember: keys are unique! No duplicates! Each count keeps only the last word that had it, so we lose one of our "2" entries and four of our five "1" entries.

In [None]:
d_invert = reverse_dictionary(d)
d_invert

{1: 'in', 2: 'and', 4: 'the', 6: 'sea'}

In [None]:
d_invert = {}
for word, count in d.items():
  words = d_invert.get(count, []) # get the current list of words associated with this count
  words.append(word)
  d_invert.update({count: words})
d_invert

{1: ['she', 'sells', 'by', 'shore', 'in'],
 2: ['shells', 'and'],
 4: ['the'],
 6: ['sea']}
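
If all you want is the top N words, you may not need to restructure the dictionary at all. One possible approach, sketched here, is to sort the key-value pairs by count using the built-in `sorted()`. The helper `count_of` and the variable names are made up for this example, and the counts are copied from the output above.

```python
counts = {'she': 1, 'sells': 1, 'sea': 6, 'shells': 2, 'by': 1,
          'the': 4, 'shore': 1, 'in': 1, 'and': 2}

def count_of(pair):
    # each pair is a (word, count) tuple from .items(); sort by the count
    return pair[1]

# sort the (word, count) pairs from highest count to lowest
ranked = sorted(counts.items(), key=count_of, reverse=True)
top_3 = ranked[:3]

for word, count in top_3:
    print(f'{word}: {count}')
```

Note that this gives you a list of pairs, not a dictionary, so you are back to grabbing things by position.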

## Advanced

Next week we will dig into some design patterns and advanced usage of dictionaries, and briefly introduce files. For now, consider this worked example of our problem from the beginning.

In [None]:
# how to know which chapters talk about strings? or debugging?
book = [
  "Chapter 1: talks about strings and how they have the property of immutability also some basic debugging",
  "Chapter 2: continues talking about advanced methods for strings and also introduces the concept of functions",
  "Chapter 3: discusses iteration and lists and also debugging",
]

In [21]:
# with dictionaries
index = {} # dictionary to hold index
keywords = ["strings", "debugging", "immutability", "functions"]
for chapter in book:
    
    # parse the chapter entry
    elements = chapter.split(":") # split into chapter and contents
    # the chapter label goes into the values of the index
    ch_num = elements[0]
    # the contents hold the words we search for keywords
    contents = elements[1]
    words = contents.split(" ")
    # go through every word
    for word in words:
        if word in keywords: # if it's a keyword
            # list version was: index.append([word, ch_num])
            keyword = word
            # get current chapters associated with this word; default to empty list
            chapters = index.get(keyword, [])
            # add the current chapter number to the list
            chapters.append(ch_num)
            # map the keyword (key) to the updated list of chapters (value)
            index.update({keyword: chapters})
index

{'strings': ['Chapter 1', 'Chapter 2'],
 'immutability': ['Chapter 1'],
 'debugging': ['Chapter 1', 'Chapter 3'],
 'functions': ['Chapter 2']}
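
One limitation of the worked example is that it only matches words exactly: "Strings," with a capital letter or a trailing comma would be missed. Below is one possible sketch of the same pattern wrapped in a function that strips punctuation and lowercases each word first. The function name `build_index` and the `sample` text are made up for illustration; they are not part of the original example.

```python
import string

def build_index(book, keywords):
    """Map each keyword to the list of chapters whose text mentions it."""
    index = {}
    for entry in book:
        # split only on the first colon, in case the text has colons too
        ch_num, contents = entry.split(":", 1)
        for raw in contents.split():
            # drop surrounding punctuation and ignore capitalization
            word = raw.strip(string.punctuation).lower()
            if word in keywords:
                chapters = index.get(word, [])
                # avoid listing the same chapter twice for one keyword
                if ch_num not in chapters:
                    chapters.append(ch_num)
                index.update({word: chapters})
    return index

sample = [
    "Chapter 1: Strings, strings, and more debugging.",
    "Chapter 2: functions (and strings)",
]
result = build_index(sample, ["strings", "debugging", "functions"])
print(result)
```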