# Dictionaries

Dictionaries are like lists in that they hold collections of data.  However, as we discussed with the for and while loops, they have fundamentally different purposes and so they act in very different ways.  Don't try to understand dictionaries like you understand lists.  Yes, they each hold things, but that's where the similarities stop.

Think of a dictionary as a small database.  It has two pieces:  unique IDs, and associated records.  We call these keys and values.  

## Think of an apartment building

Your apartment has a unique address.  First off, the street address tells you which unique building you want to find.  There may be several identical buildings, but each has a unique address.

Inside that apartment building there are unit IDs.  These may be letters (A, B, etc.), labels (Garden, Penthouse, etc.), or numbers (6, 303, etc.).  Many apartment buildings share the same numbering systems, and this is fine because each apartment belongs to its own building.  So the combination of street address and unit number can uniquely identify a unit within a city.

Inside these units you can hold whatever you want.  Usually household goods, food, pets, collectables, human bodies, etc.  The idea is that in order to access a unit, you must go into a building and enter a specific ID.  The IDs within a building must be unique.

You also have to access things in order.  So you can't just directly go into an apartment number.  Even if that apartment door is outside, you still have to find the building.  Imagine saying, "Hi, I live in New York City and I live in apartment 3.  The party will be at 8pm."  Will anyone be at your party?  Not so much.

In this analogy, the building is our dictionary.  The street address is our variable name, and the unit number is our key.  You use the key with the variable name to access the contents of that unit.

That's pretty much it.

In [3]:
mybuilding = {1: ['cat', 'adult human', 'adult human',
                  'human child', 'rosy boa', "Dumeril's boa"],
              2: ['adult human'],
              3: ['adult human', 'adult human', 'human child', 'dog?']}

If I'm using this to represent my apartment, there are 3 units:  `1`, `2`, and `3`.  Each of these unit labels is actually my key.  See the : in there after each key?  That is separating the key and the value, and the order does matter.  It is always `key: value`.  Meanwhile, each of my values is a list that contains strings of the occupants.

As we start covering syntax, I suggest that you write it down or make a small cheat sheet for yourself.  You'll likely want to look it up each time you use it until you learn it by heart.

## Accessing values with a key

Here's how we can look up the contents:  `dictvariable[key]`

In [4]:
print(mybuilding[1])

['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"]


In [5]:
print(mybuilding[2])

['adult human']


In [6]:
print(mybuilding[3])

['adult human', 'adult human', 'human child', 'dog?']


And if I try to look up a key that isn't in there?  I get a key error.

In [7]:
print(mybuilding[4])

KeyError: 4
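
If a missing key shouldn't stop the program, one option is the dictionary's `.get()` method, which hands back `None` (or a default you supply) instead of raising a `KeyError`.  Here's a minimal sketch, using a small made-up dictionary named `building` rather than the one above.

```python
# a small stand-in dictionary, made up for this illustration
building = {1: ['cat', 'adult human'], 2: ['adult human']}

# .get() returns the value when the key exists
print(building.get(1))

# for a missing key, .get() returns None instead of raising KeyError
print(building.get(4))

# a second argument supplies your own default
print(building.get(4, []))
```

Note that `.get()` only reads.  It does not add the missing key to the dictionary.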

Something to note is that values can be of any data type, and keys can be of any immutable type (strings, numbers, and tuples all work, but a list cannot be a key).  You can have a combination of them in the same dictionary.  You just need to match how that data type is actually typed in.  Don't forget the quotes if you have a string!

In [8]:
anotherbuilding = {'Garden': ['Adult human'],
                    2: ['Adult human', 'Adult human', 'cat']}
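
To make the point about typing the key exactly as it was stored, here is a short sketch with a hypothetical dictionary called `units`.  The integer `2` and the string `'2'` are two different keys.

```python
# made-up dictionary: one integer key and one string key that look alike
units = {2: 'second floor', '2': 'a unit labeled with the text 2'}

print(units[2])     # looks up the integer key
print(units['2'])   # looks up the string key
print(len(units))   # two separate entries
```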

## Adding a new key/value pair

Say that I learn more about my neighbors and I want to add more things in.  I can add garages as well.

The syntax for adding a key/value looks like an assignment statement extending our lookup syntax.

`dictvariable[new_key] = new_value`

So you place the new key that you want to add in the `[]` and whatever the value is after the `=`.

In [9]:
mybuilding['garage 1'] = ['car', 'innumerable crap']

In [11]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'innumerable crap']}


Take a closer look at that syntax.  It has the `[]` with the new key on the left side of the `=` statement and only 1 `=`.  So I don't need to completely reassign my dictionary to change it.

## Changing the value with a given key

Warning!  This syntax is also shared for the reassignment.  If you reuse a key that is in the dictionary, it will overwrite that value without warning.

In [16]:
mybuilding['garage 1'] = ['car', 'bike', 'bike', 'innumerable crap']

In [17]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'bike', 'bike', 'innumerable crap']}


Alternatively, since the value for each key is in fact a mutable object, we can directly reference that object using our access syntax and alter it.

Say that the person in apartment 2 has a baby and I want to add that to their record.  Since the data type of that value is a list, I can call `.append()` on that lookup expression.

In [14]:
mybuilding[2].append('human child')

In [18]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human', 'human child'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'bike', 'bike', 'innumerable crap']}


## Checking to see if a key is in there

You'll often be looping over some data and building up a dictionary in the process.  Oftentimes you'll want to handle things differently if that key is not already in the dictionary.  For example, let's say I want to add some new people to a unit not already in my dictionary.

In [19]:
mybuilding[5].append('adult human')

KeyError: 5

We can check this with the `in` keyword that we've seen elsewhere.  This will create a boolean expression that will return `True` if the key does exist in that dictionary, and `False` if it doesn't.

In [20]:
5 in mybuilding

False

I can't append something to a key that doesn't already exist, but if I use my assignment statement I'll destroy the data I already have in there.  Here's a fragment showing how we might handle such a situation.

In [21]:
new_member = 'adult human'
key = 5

if key in mybuilding:
    mybuilding[key].append(new_member)
else:
    mybuilding[key] = [new_member]

In [23]:
print(mybuilding)

{1: ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], 2: ['adult human', 'human child'], 3: ['adult human', 'adult human', 'human child', 'dog?'], 'garage 1': ['car', 'bike', 'bike', 'innumerable crap'], 5: ['adult human']}


This is an incredibly common pattern.  You'll also notice that I can reference my key as a variable.  Hint hint, this means you can loop over a set of keys and access the values in turn.
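
Because the pattern comes up so often, Python has a couple of shortcuts for it.  The sketch below shows two of them, `dict.setdefault()` and `collections.defaultdict`.  It is one way to write this, not the only one, and `building`, `grouped`, and `arrivals` are made-up names for the illustration.

```python
from collections import defaultdict

# made-up (unit, occupant) pairs for illustration
arrivals = [(5, 'adult human'), (2, 'human child'), (5, 'cat')]

# option 1: setdefault() returns the existing list,
# or stores a new empty list under that key and returns it
building = {2: ['adult human']}
for unit, occupant in arrivals:
    building.setdefault(unit, []).append(occupant)
print(building)

# option 2: a defaultdict creates the empty list the first time a key is used
grouped = defaultdict(list)
for unit, occupant in arrivals:
    grouped[unit].append(occupant)
print(dict(grouped))
```

The explicit `if key in ...` version is still worth knowing, since it makes it obvious what happens in each case.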

## Getting the keys and values out separately

There are several helpful dictionary methods to use.

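Warning!  Don't lean on dictionary order.  Python 3.7 and later keep keys in the order they were inserted, but older versions made no promise at all, and insertion order is not the same as sorted order.  The following methods give you views that you can turn into lists with `list()`.  Lists have order, but that order should not be depended on.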

`mydict.keys()` will give you the keys and `mydict.values()` will give you the values.

In [24]:
print(list(mybuilding.keys()))

[1, 2, 3, 'garage 1', 5]


In [25]:
print(list(mybuilding.values()))

[['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"], ['adult human', 'human child'], ['adult human', 'adult human', 'human child', 'dog?'], ['car', 'bike', 'bike', 'innumerable crap'], ['adult human']]


I know they may look like they are in order, but that is just the order the keys were added in, and you shouldn't depend on it.
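
If you need the keys in a particular order, ask for it explicitly.  Here's a small sketch with a made-up dictionary called `rent`.  Keep in mind that `sorted()` only works when the keys can be compared with each other, so a mix of integers and strings like the keys in `mybuilding` would raise a `TypeError`.

```python
# made-up dictionary whose keys were added out of numeric order
rent = {3: 1200, 1: 950, 2: 1100}

print(list(rent.keys()))   # the order the keys were added (Python 3.7+)
print(sorted(rent))        # sorted() over a dictionary sorts its keys

# loop over the keys in sorted order and look up each value
for unit in sorted(rent):
    print(unit, rent[unit])
```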

## Getting the keys and values out together

Here's how you can get the pairs of data out such that the pair relationship can be reliably maintained.

In [26]:
print(list(mybuilding.items()))

[(1, ['cat', 'adult human', 'adult human', 'human child', 'rosy boa', "Dumeril's boa"]), (2, ['adult human', 'human child']), (3, ['adult human', 'adult human', 'human child', 'dog?']), ('garage 1', ['car', 'bike', 'bike', 'innumerable crap']), (5, ['adult human'])]


These are tuple pairs, and are valuable when looping over them.

## Looping over dictionaries

You may want to loop over the entire thing at once.  Using `.items()` for this is great for when you need both the key and value items out at the same time, and you want to keep your code very tidy.

In [27]:
for key, value in mybuilding.items():
    print("Unit", key, "has", ", ".join(value))

Unit 1 has cat, adult human, adult human, human child, rosy boa, Dumeril's boa
Unit 2 has adult human, human child
Unit 3 has adult human, adult human, human child, dog?
Unit garage 1 has car, bike, bike, innumerable crap
Unit 5 has adult human


However, sometimes you might want to mess with the keys and only look up a few things.  (massive homework hint) In this case, you can extract out the list of keys, do what you need with it, and then loop through those remaining (LITERALLY DYING OF HINT HERE) keys.

Say that I only want the integer value keys.

In [31]:
allkeys = mybuilding.keys()

justints = []

for key in allkeys:
    if type(key) == int:
        justints.append(key)
    else:
        print("rejected!", key)
        
print("Remaining keys are:", justints)

for key in justints:
    print(key, "has:", ", ".join(mybuilding[key]))

rejected! garage 1
Remaining keys are: [1, 2, 3, 5]
1 has: cat, adult human, adult human, human child, rosy boa, Dumeril's boa
2 has: adult human, human child
3 has: adult human, adult human, human child, dog?
5 has: adult human
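
The same filter can be written more compactly with a list comprehension.  This is only a sketch on a made-up dictionary called `building`, and it uses `isinstance(key, int)` in place of `type(key) == int`, which gives the same answer for the keys shown here.

```python
# made-up dictionary with a mix of integer and string keys
building = {1: ['cat'], 2: ['adult human'], 'garage 1': ['car']}

# looping over a dictionary directly gives you its keys
justints = [key for key in building if isinstance(key, int)]
print(justints)

# use the filtered keys to look up only the values we care about
for key in justints:
    print(key, "has:", ", ".join(building[key]))
```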


## Counting occurrences

There are several things we can do to count occurrences, but I suggest using the `Counter` object first.

Let's revisit last week's example of randomly counting to 100.  Below we've got code that runs the simulation 10,000 times and collects up the number of steps taken for each.

In [33]:
import random

# priming all the values
count_sum = 0
top = 100
step_count = 0

all_step_counts = []

for i in range(10000):
    # keep drawing until this run reaches the target
    while count_sum < top:
        my_num = random.randint(0, 10)

        count_sum += my_num  # count_sum = count_sum + my_num
        step_count += 1  # step_count = step_count + 1
    all_step_counts.append(step_count)
    # reset!
    count_sum = 0
    step_count = 0

print("The average number of steps to take is:", sum(all_step_counts) / len(all_step_counts))

The average number of steps to take is: 20.5735
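
One way to avoid the manual resets is to wrap a single run in a function, so each call starts fresh.  This is a sketch of the same simulation, not a replacement for it.  `steps_to_reach` is a hypothetical name, and your average will differ a little from run to run because the draws are random.

```python
import random

def steps_to_reach(top=100):
    """Count how many random draws (0 through 10) it takes to reach `top`."""
    count_sum = 0
    step_count = 0
    while count_sum < top:
        count_sum += random.randint(0, 10)
        step_count += 1
    return step_count

# run the simulation 10,000 times and collect the step counts
all_step_counts = [steps_to_reach() for i in range(10000)]

print("The average number of steps to take is:",
      sum(all_step_counts) / len(all_step_counts))
```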


Now we have a list of 10,000 data points, but we want to count them up.  We can use `Counter` to do just this.

First we'll need to import it from the `collections` module to bring it into our namespace, then we can use it.

In [64]:
from collections import Counter

print(Counter("hello howdy"))
print(Counter("hello howdy").most_common(2))

Counter({'h': 2, 'l': 2, 'o': 2, 'e': 1, ' ': 1, 'w': 1, 'd': 1, 'y': 1})
[('h', 2), ('l', 2)]


This tool takes a sequence (string, list, etc) and counts how many times each item occurs.  This isn't exactly a dictionary, but it acts a lot like one.  Let's apply it to our results.
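
Here is a short sketch of where a `Counter` behaves like a dictionary and one place where it doesn't.  Looking up a key that was never counted gives `0` rather than a `KeyError`.  The variable name `letters` is made up for this example.

```python
from collections import Counter

letters = Counter("hello howdy")

print(letters['h'])     # look up a count just like a dictionary value
print(letters['z'])     # a missing key gives 0 rather than a KeyError
print('z' in letters)   # and that lookup did not add the key

letters['h'] += 1       # counts can be updated in place
print(letters['h'])
```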

In [36]:
countedsteps = Counter(all_step_counts)
print(countedsteps)

Counter({20: 1429, 19: 1364, 21: 1296, 22: 1091, 18: 1086, 23: 880, 17: 701, 24: 587, 16: 419, 25: 403, 26: 223, 15: 171, 27: 161, 28: 58, 29: 45, 14: 41, 30: 19, 31: 15, 13: 8, 32: 2, 33: 1})


In [39]:
# here are the unique values

counts = list(countedsteps.keys()) 

# now I can sort them

counts.sort()

print(counts)

[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]


In [40]:
# now I can loop in order

for step in counts:
    print(step, ":", countedsteps[step])

13 : 8
14 : 41
15 : 171
16 : 419
17 : 701
18 : 1086
19 : 1364
20 : 1429
21 : 1296
22 : 1091
23 : 880
24 : 587
25 : 403
26 : 223
27 : 161
28 : 58
29 : 45
30 : 19
31 : 15
32 : 2
33 : 1


In [56]:
# now we can do a janky histogram

for step in counts:
    print(step, ":", (countedsteps[step] // 15) * 'X')

13 : 
14 : XX
15 : XXXXXXXXXXX
16 : XXXXXXXXXXXXXXXXXXXXXXXXXXX
17 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
18 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
19 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
21 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
22 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
23 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
24 : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
25 : XXXXXXXXXXXXXXXXXXXXXXXXXX
26 : XXXXXXXXXXXXXX
27 : XXXXXXXXXX
28 : XXX
29 : XXX
30 : X
31 : X
32 : 
33 : 


Likewise, we can filter things on the fly.  Say that we only want to see the values for steps 13 through 20.

In [59]:
for step in counts:
    if step >= 13 and step <= 20:
        print(step, ':', countedsteps[step])

13 : 8
14 : 41
15 : 171
16 : 419
17 : 701
18 : 1086
19 : 1364
20 : 1429
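
Python also lets you chain the two comparisons into one, and the same test works inside a dictionary comprehension if you want to keep the filtered counts.  This sketch uses a small made-up list of step counts, not the simulation results above.

```python
from collections import Counter

# made-up step counts for illustration
steps = Counter([13, 14, 14, 20, 20, 20, 25])

# 13 <= step <= 20 means the same as step >= 13 and step <= 20
for step in sorted(steps):
    if 13 <= step <= 20:
        print(step, ':', steps[step])

# keep only the filtered pairs in a new dictionary
selected = {step: n for step, n in steps.items() if 13 <= step <= 20}
print(selected)
```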


However, if we want to see the most common things, we don't need to do much that's fancy.  The `Counter` object comes with some nice things, such as `.most_common()`.  Called with no argument it gives you every item from most to least common; called with a number, as in `.most_common(10)`, it gives you only that many.  Be careful here, because things get weird when you have repeated counts.  `.most_common(10)` will give you the 10 most common items, not the items in the 10 most common values, so a tie at the cutoff gets split.

In [65]:
countedsteps.most_common()

[(20, 1429),
 (19, 1364),
 (21, 1296),
 (22, 1091),
 (18, 1086),
 (23, 880),
 (17, 701),
 (24, 587),
 (16, 419),
 (25, 403),
 (26, 223),
 (15, 171),
 (27, 161),
 (28, 58),
 (29, 45),
 (14, 41),
 (30, 19),
 (31, 15),
 (13, 8),
 (32, 2),
 (33, 1)]
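
To see the difference between "the n most common items" and "the items with the n most common counts", here is a small sketch using a made-up string.  Items with equal counts come back in the order they were first seen.

```python
from collections import Counter

c = Counter("aabbccd")

# asks for two items, even though three letters tie with a count of 2
print(c.most_common(2))

# no argument: every item, most common first
print(c.most_common())
```

The letter `c` is just as common as `a` and `b`, but it gets cut off because only two items were requested.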

Here's a little program we can run to actually grab the items with the top 6 frequencies.  We're doing this by flipping the dictionary around.  Instead of having a unique value with a count, we want all the unique counts and their values.

In [74]:
s = Counter("hello howdy how are you fine day more letters here hahahaha")
print(s.most_common())

tops = {}

howmany = 6

pos = 0
alls = s.most_common()

while pos < len(alls):
    content, count = alls[pos]
    if count in tops:
        # another item with a frequency we already have
        tops[count].append(content)
    elif len(tops) < howmany:
        # a new frequency, and we still have room for it
        tops[count] = [content]
    else:
        # a new frequency, but we already have enough of them
        break
    pos += 1

print(tops)

[(' ', 10), ('h', 8), ('e', 8), ('a', 6), ('o', 5), ('r', 4), ('l', 3), ('y', 3), ('w', 2), ('d', 2), ('t', 2), ('u', 1), ('f', 1), ('i', 1), ('n', 1), ('m', 1), ('s', 1)]
{10: [' '], 8: ['h', 'e'], 6: ['a'], 5: ['o'], 4: ['r'], 3: ['l', 'y']}
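
The flipping step on its own, without the cutoff, can be sketched as a small function.  `invert_counts` is a hypothetical helper name, not something that comes with `collections`.

```python
from collections import Counter

def invert_counts(counter):
    """Map each count to the list of items that occur that many times."""
    by_count = {}
    for item, count in counter.most_common():
        if count in by_count:
            by_count[count].append(item)
        else:
            by_count[count] = [item]
    return by_count

flipped = invert_counts(Counter("hello howdy"))
print(flipped)
```

Every item ends up in exactly one list, so nothing is lost when several items share a count.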


# More crappy state data

Sometimes you'll need to create your own dictionary of things before you can get into the fun stuff.  We're going to play with the data that we collected to create a dictionary of what we found, so that we have the states as our keys and the cities as the values.

Remember, this is the code that we previously wrote.  We could add the dictionary stuff into this code, but we don't have to.  Instead, we can directly use the list that we create to make the dictionary.

In [2]:
fileio = open('crappystatedata.txt', 'r')

text = fileio.read()
fileio.close()
lines = text.split("\n")

allchunks = []
statechunk = []

foundfirst = False

for line in lines:
    if line.endswith("[edit]") and not foundfirst:
        # this is the first state
        statechunk.append(line)
        foundfirst = True
    elif line.endswith("[edit]"):
        # a new state has begun, so add the completed state
        allchunks.append(statechunk)
        # reset the chunk
        statechunk = [line]
    else:
        statechunk.append(line)

# after the loop, add the final state, which has no following header to trigger it
allchunks.append(statechunk)
        

This will be another accumulator pattern, and `{}` represents an empty dictionary.  Remember that we can't add something to a dictionary that doesn't exist, and we can't define the dictionary in our loop, because it'll be erased each time the loop starts over.  So as we've done elsewhere, we need to declare our dictionary outside of and before our for loop through the data.

Remember the structure of our `allchunks` list.  This is a list of lists, where the first element of each inner list is the state name and the remaining elements are all the cities.

Let's remember our list position lookups.

* "first element of" is `list[0]`
* "everything else (after the first)" is `list[1:]` We're omitting the stop position because there are a variable number of values in each, so when we don't provide a stop position it says "go until the end".
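
Here's a quick sketch of those two lookups on a made-up chunk.  The names `State A`, `City 1`, and so on are placeholders, not real data.  It also shows that slicing past the end of a short list gives an empty list rather than an error.

```python
# a made-up chunk shaped like the ones in our list of lists
chunk = ['State A[edit]', 'City 1', 'City 2', 'City 3']

print(chunk[0])    # the first element: the state name
print(chunk[1:])   # everything after the first: the cities

# a chunk holding only a state name still slices safely
lonely = ['State B[edit]']
print(lonely[1:])
```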

The state will be our key and the cities will be our value, so let's remind ourselves of the assignment syntax.

`dict[state] = cities`

In [3]:
collegecities = {}

for statechunk in allchunks:
    state = statechunk[0]
    cities = statechunk[1:]
    collegecities[state] = cities

In [4]:
collegecities

{'Alabama[edit]': ['Auburn (Auburn University, Edward Via College of Osteopathic Medicine)[7]',
  'Birmingham (University of Alabama at Birmingham, Birmingham School of Law, Cumberland School of Law, Miles Law School)[8]',
  'Dothan (Fortis College, Troy University Dothan Campus, Alabama College of Osteopathic Medicine)',
  'Florence (University of North Alabama)',
  'Homewood (Samford University)',
  'Huntsville (University of Alabama, Huntsville)',
  'Jacksonville (Jacksonville State University)[9]',
  'Livingston (University of West Alabama)[9]',
  'Mobile (University of South Alabama)[8]',
  'Montevallo (University of Montevallo, Faulkner University)[9]',
  'Montgomery (Alabama State University, Huntingdon College, Auburn University at Montgomery, H. Councill Trenholm State Technical College, Faulkner University)',
  'Troy (Troy University)[9][10]',
  'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[11][12]',
  'Tuskegee (Tuskegee University)[13]'],
 'Alaska[edi

This is great!  Except we've got some junk happening in the state name with the `[edit]`.  We can go back in and strip that out to make it more readable.

In [5]:
collegecities = {}

for statechunk in allchunks:
    state = statechunk[0].replace("[edit]", "")
    cities = statechunk[1:]
    collegecities[state] = cities


In [6]:
collegecities

{'Alabama': ['Auburn (Auburn University, Edward Via College of Osteopathic Medicine)[7]',
  'Birmingham (University of Alabama at Birmingham, Birmingham School of Law, Cumberland School of Law, Miles Law School)[8]',
  'Dothan (Fortis College, Troy University Dothan Campus, Alabama College of Osteopathic Medicine)',
  'Florence (University of North Alabama)',
  'Homewood (Samford University)',
  'Huntsville (University of Alabama, Huntsville)',
  'Jacksonville (Jacksonville State University)[9]',
  'Livingston (University of West Alabama)[9]',
  'Mobile (University of South Alabama)[8]',
  'Montevallo (University of Montevallo, Faulkner University)[9]',
  'Montgomery (Alabama State University, Huntingdon College, Auburn University at Montgomery, H. Councill Trenholm State Technical College, Faulkner University)',
  'Troy (Troy University)[9][10]',
  'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[11][12]',
  'Tuskegee (Tuskegee University)[13]'],
 'Alaska': ['Ancho