# Dictionaries

Like lists and tuples, a *dictionary* is a *collection* of data.

Unlike strings, lists, and tuples, however, it is not a *sequence* of data.

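Instead of being accessed by position, a dictionary is accessed by *key*: look up a key, get back its value. Dictionaries are designed to make this lookup fast; on average it takes about the same time whether the dictionary holds ten entries or ten million.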

Dictionaries are also known as *maps*, *associative arrays* or *hash tables*.

- **Dictionary**: the *key* is like the word we use to look up a definition (the *value*).
- **Map**: the *key* maps to a *value*.
- **Associative array**: the *key* is associated with a *value*.
- **Hash table**: *hashing* is the technique by which key lookups are made fast.

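**Crucial point**: a key can be of any Python type, as long as it is *immutable* (more precisely, *hashable*; a tuple qualifies only if everything inside it is immutable too). Why? If a key could change after being stored, it would no longer "open" the right value!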

**Q**: Which types can Python use as keys?

a) strings and integers  
b) lists  
c) tuples  
d) a and c  

Now we might see a reason Python makes strings immutable!
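One way to check the answer is simply to try each kind of key. This sketch attempts to store a string, an integer, a tuple and a list as keys; only the list is rejected, with a `TypeError`.

```python
# Try one key of each type: immutable ones are accepted, a list is not.
candidates = ["Sue", 42, ("Sue", 42), ["Sue", 42]]

accepted = {}
for key in candidates:
    try:
        accepted[key] = "ok"
    except TypeError as err:
        # lists are mutable, so Python refuses to hash them
        print("Rejected", key, "->", err)

print(accepted)
```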

Let's look at an example: a phone directory.

- Bill: 212-111-2233
- Mary: 212-555-6677
- Sue: 212-555-4444

First we try this as a list:

In [None]:
contact_list = ["Bill: 212-111-2233",
                "Mary: 212-555-6677",
                "Sue: 212-555-4444"]
for contact in contact_list:
    if contact.startswith("Sue"):
        print(contact)

Or we could store each name and its number as neighboring list elements:

In [None]:
contact_list = ["Bill", "212-111-2233",
                "Mary", "212-555-6677",
                "Sue", "212-555-4444"]
for name_idx in range(0, len(contact_list), 2):
    if contact_list[name_idx] == "Sue":
        print(contact_list[name_idx + 1])

Look how much easier this is with a dictionary:

In [None]:
contacts = {"Bill": "212-111-2233",
            "Mary": "212-555-6677",
            "Sue": "212-555-4444"}
print(contacts["Bill"])

### Creating dictionaries

Let's make a dictionary of dictionaries for a realistic use: looking up the zip codes for a city. (The zip codes below are made up.)

In [None]:
states = {
    "AK": {"Anchorage": [89783, 89728, 78676, 85463],
           "Juneau": [39783, 39728, 38676],},
    "AL": {"Birmingham": [49783, 49728, 48676],
           "Mobile": [44783, 44728, 44676]},
    "AR": {"Little Rock": [18783, 18728, 18676, 18183],
           "Hot Springs": [99783, 99728, 98676]},
}

state = input("Enter a state: ")
city = input("Enter a city: ")
print("Zip codes for {}, {} are: {}".format(city, state,
                                            states[state][city]))

What if we had a zip code and needed the neighborhood, city and state?

In [None]:
zips = {
    # alternate structure:
    # "11231-2314": ["NY", "Brooklyn", "Sunset Park"]
    "11231-2314": {"state": "NY", 
                   "city": "Brooklyn",
                   "nhood": "Sunset Park",
                   "pop.": 125000},
    "11232-1244": {"state": "NY", 
                   "city": "Brooklyn",
                   "nhood": "Carroll Gardens",
                   "notes": "Good Italian food",},
    "11201-1213": {"state": "NY", 
                   "city": "Brooklyn",
                   "nhood": "Downtown",
                   "notes": "Lots of offices"},
}

zip = input("Please enter a 5+4 zip code: ")
try:
    print("That zip code is for {} in {}, {}".format(zips[zip]["nhood"],
                                                     zips[zip]["city"],
                                                     zips[zip]["state"]))
except KeyError as kerr:
    print("Zip code not found:", kerr)
print("Got here!")

**Q**: `KeyError` is ______.

a) an error message  
b) an exception  
c) a container  
d) a return value  

If we would rather not get a `KeyError`, the `get()` method returns a default value (its second argument) when the key is missing:

In [None]:
place = zips.get("11201-1214", "Zip code not found!")
print(place)
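If the second argument is left out, `get()` still does not raise; it returns `None` for a missing key. A short sketch using a small illustrative dictionary:

```python
sample_zips = {"11201-1213": {"state": "NY", "city": "Brooklyn"}}

print(sample_zips.get("11201-1213"))  # key present: its value is returned
print(sample_zips.get("11201-1214"))  # key missing, no default: None
```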

### Dictionary Indexing and Assignment

We use the index operator `[ ]` to access elements, but we do **not**
use index numbers. The keys are used to access the values.

In [None]:
contacts = {"Bill" : "212-111-2233", 
            "Mary" : "212-555-6677",
            "Sue" : "212-555-4444" }

# Access one contact through its *key*:
contacts['Mary']

Modifying a value using a key:

In [None]:
contacts["Bill"] = "401-846-4318"
print(contacts)

### Add a key-value pair via an assignment

If the key does not exist, it is created and associated
with the specified value; otherwise, the key's existing
value is replaced (see above).

In [None]:
# with a list, lst[idx] = '865-5523' raises IndexError if idx is past the end
contacts['Deanna'] = '865-5523'
print(contacts)
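A side-by-side sketch of that difference, with illustrative names: assigning past the end of a list raises `IndexError`, while assigning to a new dictionary key simply adds the pair.

```python
phone_list = ["212-111-2233", "212-555-6677"]
phone_dict = {"Bill": "212-111-2233", "Mary": "212-555-6677"}

try:
    phone_list[2] = "865-5523"  # index 2 does not exist yet
except IndexError as err:
    print("List:", err)

phone_dict["Deanna"] = "865-5523"  # new key: the pair is added
print("Dict:", phone_dict)
```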

### Dictionary Operations

In [None]:
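nyu_by_id = {}
with open("nyu_by_id.txt", "r") as nyu_file:
    for line in nyu_file:
        # each line holds "netid,name"; strip() drops the trailing newline
        (net_id, name) = line.strip().split(',')
        nyu_by_id[net_id] = name
net_id = input("Enter NetID to look up: ")
print(nyu_by_id[net_id])

print(len(nyu_by_id))  # number of key-value pairs
net_id = ["e", "r"]
# a list is mutable and therefore unhashable, so this raises TypeError
nyu_by_id[net_id] = "Excelsius Regnum"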

### Sidebar: Hashing

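What does Python mean by "unhashable"? What is "hashing"? A *hash function* turns an object into an integer, and a *hash table* uses that integer (reduced to fit the table's size) as the index of the slot where the value is stored. If a key could change after being stored, its hash would change too and the value could no longer be found, which is why Python refuses to hash mutable objects like lists.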

Let's write a *very* simple hash function:

In [None]:
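TABLE_SIZE = 100
hash_table = [None] * TABLE_SIZE  # one empty slot per possible hash value


def hash_it(s):
    """Turn a string into a slot number from 0 to TABLE_SIZE - 1."""
    hash_val = 0
    for i, ch in enumerate(s):
        # weight each character by its position, so a string and its
        # reverse ("Callahan" vs. "nahallaC") usually hash differently;
        # a plain sum of character codes would always make them collide
        hash_val += ord(ch) * 10**i
    # with TABLE_SIZE = 100, the terms for i >= 2 are multiples of 100,
    # so only the first two characters affect the final slot
    return hash_val % TABLE_SIZE


print(hash_it("Mitchell"))
print(hash_it("Callahan"))
print(hash_it("nahallaC"))

hash_table[hash_it("Mitchell")] = "Student"
hash_table[hash_it("Callahan")] = "Professor"
hash_table[hash_it("nahallaC")] = "Anti-Professor"

print(hash_it("Mitchell"), ":", hash_table[hash_it("Mitchell")])
print(hash_it("Callahan"), ":", hash_table[hash_it("Callahan")])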


### Back to dictionary methods

`in` returns `True` if a **key** is in the dictionary, otherwise `False`:

In [None]:
contacts = {"Bill" : "212-111-2233", 
            "Mary" : "212-555-6677",
            "Sue" : "212-555-4444" }

fname = "Bill"
print(fname in contacts)
print("212-555-4444" in contacts)
print("Phone #:", contacts[fname])

In [None]:
result = 'smith' in contacts
print(result)
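Because `in` checks only keys, searching for a phone number needs the `values()` view instead. Note that this search looks through every value one by one, so it does not get the fast key lookup. A sketch with an illustrative dictionary:

```python
phone_book = {"Bill": "212-111-2233", "Sue": "212-555-4444"}

print("212-555-4444" in phone_book)           # False: only keys are checked
print("212-555-4444" in phone_book.values())  # True: searches the values
```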

Another way to protect against `KeyError`:

In [None]:
# an alternative to `zips.get("11201-1214", "Zip not found!")`

zip = "82673-7666"
if zip in zips:
    zips[zip]
else:
    print(zip, "not found!")

Iterating over a dictionary directly gives us its keys:

In [None]:
nyu_schools = {
    "Tandon": ["CSE", "ChemEng", "Physics", "CivilEng"],
    "Stern": ["Management", "Finance"],
    "Courant": ["Mathematics", "CS"],
    "Gallatin": ["Independent Study"],
}
for school in nyu_schools:
    print("school =", school, "; departments =",
          nyu_schools[school])

### Dictionary Methods

`items()` iterates through all the key-value pairs as tuples:

In [None]:
for school, depts in nyu_schools.items():
    print("Key: {}, Value: {}".format(school, depts))

We can also get the keys with `keys()` and turn them into a list when we need one (for example, to index into it):

In [None]:
schools = nyu_schools.keys()
print(type(schools))
schools_list = list(schools)
print("Here are the schools at NYU:", schools_list)

We can also iterate over just the values:

In [None]:
for depts in nyu_schools.values():
    print("Departments: {}".format(depts))

How do we see all of the methods available for a dictionary named `my_dict`?

a) methods(my_dict)  
b) len(my_dict)  
c) dir(my_dict)  
d) repr(my_dict)  

In [None]:
dir(contacts)

Let's try out `fromkeys()`, which builds a new dictionary mapping every item of a sequence to the same starting value:

In [None]:
from random import randint

apps = ['Calendar', 'Mail', 'Safari', 'Slack',
        'Photo', 'Duo', 'App Store', 'Facetime']
app_dict = dict.fromkeys(apps, 0)
print(app_dict)
for app in app_dict:
    app_dict[app] = randint(0, 100)
app_nm = input("What app do you want to know about? ")
print("This week you used {} {} times".format(app_nm, app_dict[app_nm]))

### Python Views

The dictionary methods `keys()`, `values()`, and `items()` all return a Python *view*, which, although not a *sequence* (we can't index into them by position), can be iterated over, as shown above. If one needs to index into them, one can use `list()` to convert the view to a list.

These are called *views* because they do **not** make a copy of the keys, values, or items: they are a live window onto the dictionary, so if the dictionary changes, the view shows the change.

In [None]:
my_dict = {"a": 1, "b": 2}
key_view = my_dict.keys()
print(type(key_view))
item_view = my_dict.items()
print(type(item_view))
item_list = list(item_view)
print(item_list)
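A quick sketch of what "live" means, with illustrative names: a view taken before the dictionary changes reflects the change, while a list made with `list()` is a snapshot.

```python
scores = {"a": 1, "b": 2}
live_keys = scores.keys()
snapshot = list(scores.keys())

scores["c"] = 3    # change the dictionary after taking the view
print(live_keys)   # the view includes the new key
print(snapshot)    # the list does not
```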

### Dictionaries with Different Key Types

**Keys**: different types can be used in the same dictionary, as long as they are immutable.

**Values**: can be any object, even another dictionary.

In [None]:
demo = {2: ['a', 'b', 'c'], (2, 4): 27, 'x': {1: 1.25, 'a': 3}}

print(demo)

print(demo[2])
print(demo[(2, 4)])
print(demo['x'])
print(demo['x']['a'])
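One subtlety when mixing key types: keys that compare equal (and therefore hash equally) count as the *same* key, even if their types differ. In this sketch, `2.0` updates the entry stored under `2`, and the original key object is kept:

```python
keys_demo = {2: "int two"}
keys_demo[2.0] = "float two"  # 2.0 == 2 and hash(2.0) == hash(2)
print(keys_demo)              # {2: 'float two'}
print(len(keys_demo))         # 1
```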

### Other dictionary methods

How can we add one dictionary to another? `+` doesn't work:

In [1]:
dict1 = {"a": 1, "b": 2}
dict2 = {"c": 3, "d": 4}
dict3 = {"c": 4, "d": 4}
dict4 = {"c": 5, "d": 4}
# dict1 += dict2  # TypeError: unsupported operand type(s) for +=: 'dict' and 'dict'

So what can we do? We can use the `dict` method `update`:

In [2]:
dict1.update(dict2)
print(dict1)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}
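Since Python 3.9, dictionaries also support the `|` operator, which builds a new merged dictionary, and `|=`, which merges in place much like `update()`. A sketch with two illustrative dictionaries whose keys do not overlap (on Python 3.8 and earlier both lines raise `TypeError`):

```python
first = {"a": 1, "b": 2}
second = {"c": 3, "d": 4}

merged = first | second  # a new dict; first and second are unchanged
print(merged)

first |= second          # merges second into first, in place
print(first)
```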


In [None]:
dict1.update(dict2)
dict1.update(dict3)
dict1.update(dict4)
print(dict1)
type(dict1)

**Q:** How many keys with the value "c" will be in `dict1` after the above code runs?

a) 1  
b) 2  
c) 3  
d) 4  


Think about this: if "c" were in the dictionary multiple times, what would we get back if we asked for:

`dict1["c"]`

**Q**: Can multiple keys map to the same value?

In [6]:
name = "Jaden"
test_dict = {"a": name, "b": name, "c": name}
for key in test_dict:
    print("{}: {}".format(test_dict[key], id(test_dict[key])))

Jaden: 140501273718448
Jaden: 140501273718448
Jaden: 140501273718448


**Q**: How do we use a tuple as a key?

In [5]:
my_abcs = ("a", "b", "c")
my_dict = {my_abcs: name}
print(my_dict[("a", "b", "c")])
print(my_dict["a"])

Jaden


KeyError: 'a'

**Q**: When would we use a tuple as a key?

In [7]:
GRID_HEIGHT = 10
GRID_WIDTH = 10

hero1 = "Hercules"
hero2 = "Perseus"
hero3 = "Ariadne"
bomb = "Boom!"
# let's make a *map* of who is at what location:
locations = {(3, 4): hero1, (5, 0): hero2, (1, 6): hero3, (9, 9): bomb}
print("Which hero is at (5, 0)?", locations[(5, 0)])

Which hero is at (5, 0)? Perseus


In [9]:
for x in range(0, GRID_WIDTH):
    for y in range(0, GRID_HEIGHT):
        coord = (x, y)
        name = locations.get(coord, "no one")
        if name == "Boom!":
            print("You hit the bomb!")
        else:
            print("At location {} is {}".format(coord, name))

At location (0, 0) is no one
At location (0, 1) is no one
At location (0, 2) is no one
At location (0, 3) is no one
At location (0, 4) is no one
At location (0, 5) is no one
At location (0, 6) is no one
At location (0, 7) is no one
At location (0, 8) is no one
At location (0, 9) is no one
At location (1, 0) is no one
At location (1, 1) is no one
At location (1, 2) is no one
At location (1, 3) is no one
At location (1, 4) is no one
At location (1, 5) is no one
At location (1, 6) is Ariadne
At location (1, 7) is no one
At location (1, 8) is no one
At location (1, 9) is no one
At location (2, 0) is no one
At location (2, 1) is no one
At location (2, 2) is no one
At location (2, 3) is no one
At location (2, 4) is no one
At location (2, 5) is no one
At location (2, 6) is no one
At location (2, 7) is no one
At location (2, 8) is no one
At location (2, 9) is no one
At location (3, 0) is no one
At location (3, 1) is no one
At location (3, 2) is no one
At location (3, 3) is no one
...

**Q:** What happens using `is` with numbers?

In [None]:
x = 7
y = 7
print("x is y?", x is y)
a = 1000
b = 1000
print("a is b?", a is b)
print("a == b?", a == b)

### A Concordance Program

A *concordance* is a list of words from a text along with how many times the words appear in that text.

Note: what is `string.punctuation`?

In [None]:
import string
string.punctuation

In [None]:
import os
import string

def get_filename():
    filename = input("Input the filename: ")
    while not os.path.exists(filename):
        print("That file does not exist.")
        filename = input ("Input the filename: ")
    return filename


def chunk_to_word(chunk):
    """Clean up chunks to turn them into words."""
    chunk = chunk.lower()
    chunk = chunk.strip()
    word = chunk.strip(string.punctuation)
    return word


def add_to_word_count(word, concordance):
    """If word is in concordance, up its count.
    Otherwise, add the word and set its count to 1."""
    if word in concordance:  # we already saw this word
        concordance[word] += 1
        return 0
    else:  # word we haven't seen before
        concordance[word] = 1
        return 1  # 1 because this is a new word


def build_concordance(name):
    concordance = {}
    uniq_words = 0
    total_words = 0

    with open(name, 'r') as file:  # closes the file when done
        for line in file:
            text_chunks = line.split()  # split line on whitespace
            # we will get chunks like 'final!', 'come,' or 'Final';
            # `chunk_to_word()` lowercases them and strips punctuation.
            for chunk in text_chunks:
                word = chunk_to_word(chunk)
                if not word:  # the chunk was pure punctuation, like '--'
                    continue
                uniq_words += add_to_word_count(word, concordance)
                total_words += 1

    print("{} unique words out of {} words total.".format(uniq_words,
                                                          total_words))
    return concordance


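def get_max_word(word_list):
    """Return the length of the longest word (0 for an empty list)."""
    max_len = 0  # named to avoid shadowing the built-in max()
    for word in word_list:
        if len(word) > max_len:
            max_len = len(word)
    return max_len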


def sort_words(words):
    """
    We're going to use `keys()` to get a list and then sort it.
    Return the sorted list.
    """
    word_list = list(words.keys())
    word_list.sort()
    return word_list


def print_pairs(concordance, key_list, length):
    format_str = "{:" + str(length) + "s}: {}"
    for key in key_list:
        print(format_str.format(key, "*"*concordance[key]))

    
def main():
    filename = get_filename()
    concordance = build_concordance(filename)
    words = sort_words(concordance)
    print_pairs(concordance, words, get_max_word(words) + 1)

main()
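The standard library's `collections.Counter` is a dictionary subclass built for exactly this kind of counting. As a sketch, here is the counting step rewritten with it, working on a string instead of a file; `count_words` and the sample sentence are illustrative, and the cleanup mirrors `chunk_to_word()` above.

```python
import string
from collections import Counter


def count_words(text):
    """Count cleaned-up words in text, skipping chunks that are all punctuation."""
    words = (chunk.strip(string.punctuation).lower() for chunk in text.split())
    return Counter(word for word in words if word)


counts = count_words("Come, come! The final word is final.")
for word, n in sorted(counts.items()):
    print(f"{word:6s}: {'*' * n}")
```

A `Counter` returns 0 for a missing word instead of raising `KeyError`, which is why no "have we seen this word?" check is needed.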

### An OS database

Let's say we want to find out if there is a difference in OS usage between men and women. This program explores that question:

In [None]:
BROWSER_COL = 11


def open_file():
    file_ok = False
    while not file_ok:
        filename = input("Input the filename: ")
        try:
            file_handle = open(filename, "r")
        except FileNotFoundError:
            print("That file does not exist.")
        else:
            file_ok = True
    return file_handle


def build_dictionary(file):
    """Map each row's first column to a list of its remaining columns."""
    gender_data = {}
    next(file, None)  # skip the header line (does nothing if the file is empty)
    for line in file:
        # note: a plain split breaks quoted fields that contain commas
        columns = line.strip().split(",")
        gender_data[columns[0]] = columns[1:]
    return gender_data


def get_os(browser):
    """Classify a browser string by the operating system it mentions."""
    if "Windows" in browser:
        return "Windows"
    if "Macintosh" in browser:
        return "Macintosh"
    if "Linux" in browser:
        return "Linux"
    return "Other"


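def os_list_by_gender(dictionary):
    """Count operating systems per gender: {gender: {os: count}}."""
    data_dict = {}
    for value in dictionary.values():
        gender = value[0].strip()
        this_os = get_os(value[BROWSER_COL].strip())
        # setdefault() creates the inner dict the first time a gender appears,
        # so a value other than 'female' or 'male' no longer raises KeyError
        os_counts = data_dict.setdefault(gender, {})
        os_counts[this_os] = os_counts.get(this_os, 0) + 1
    return data_dict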


def print_report(data):
    for gender in data:
        print(gender)
        for op_sys in data[gender]:
            print("{:>12s}: {}".format(op_sys, data[gender][op_sys]))


def main():
    with open_file() as file:  # closes the file when done
        dictionary = build_dictionary(file)
    os_list = os_list_by_gender(dictionary)
    print_report(os_list)


main()
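`build_dictionary()` splits each line on commas. If a field such as a browser string is quoted and itself contains a comma, that split cuts it in two and shifts every later column, so a fixed index like `BROWSER_COL` would point at the wrong field. Whether `names.csv` contains such fields is not shown here; the sketch below uses a made-up two-line CSV to show how the standard `csv` module handles quoting.

```python
import csv
import io

# A made-up CSV: the browser field contains a comma, so it is quoted.
raw = ('id,gender,browser\n'
       '1,female,"Mozilla/5.0 (X11; Linux x86_64) (KHTML, like Gecko)"\n')

naive = raw.splitlines()[1].split(",")
print(len(naive))        # 4: the quoted field was cut in two

reader = csv.reader(io.StringIO(raw))
header = next(reader)    # ['id', 'gender', 'browser']
row = next(reader)
print(len(row), row[2])  # 3 fields; the browser string is intact
```

With a real file, `csv.reader(file_handle)` can replace the manual `strip()` and `split(",")` in `build_dictionary()`; the csv documentation recommends opening the file with `newline=""`.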

In [None]:
!cat names.csv