# Agenda, day 3: Dictionaries and files

1. Dictionaries
    - What are dictionaries?
    - How can we define them and retrieve from them?
    - Paradigm 1: Read-only dictionaries
    - Paradigm 2: Updating dictionaries
    - Paradigm 3: Start with an empty dict, and modify it from there
    - How do dicts work behind the scenes?
    - Iterating over a dict with a `for` loop
2. Files
    - What does it mean to work with files?
    - Reading from (text) files -- good and bad techniques
    - Working with text from files
    - Writing to text files
    - Using the `with` statement

# Loops

The basic idea behind loops is that we have a task that we want to do repeatedly. Instead of typing the command many times, we type the command one time, inside of a loop, and the computer repeats things for us.

There are two types of loops in Python, which reflect two different types of repetition we might want to do:

1. `for` loops -- which go through a sequence (string, list, or tuple) one element at a time, letting us do something with each element.
2. `while` loops -- which repeat the body of the loop until a condition (a la `if`) returns a `False` value.

`for` loops are great for:
- Doing the same thing with each element of a sequence
- Going through a range of numbers -- using `range`
- Going through a list of filenames in a directory
- Going through a list of records that you have retrieved from a database
- Testing all of the IP addresses on your network, to make sure they're connected

`while` loops are great when:
- You know what you want to do, but don't know how many times you'll have to do it -- though you can identify the condition under which it should end
- You want to get input from the user repeatedly, and don't know when they'll stop
- You want to get commands from the user, and don't know how many commands they'll give you
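
As a quick reminder of the difference, here is a minimal sketch of both kinds of loop. The names `temperatures`, `doubled`, and `halvings` are made up for this illustration.

```python
# for loop: we have a sequence, and do something with each element
temperatures = [18, 21, 25]       # hypothetical data
doubled = []
for one_temp in temperatures:
    doubled.append(one_temp * 2)
print(doubled)

# while loop: we don't know in advance how many times we'll repeat,
# only the condition under which we keep going
n = 100
halvings = 0
while n > 1:
    n = n // 2
    halvings += 1
print(halvings)
```

The `for` loop runs exactly once per element; the `while` loop runs for as long as `n > 1` is `True`.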

# Lists as accumulators

We can use a list to accumulate information over the life of a program. We do this by defining an empty list. Whenever we want to put a new value onto the list, we just run `list.append`.


In [1]:
evens = []
odds = []

number = 10    

# if number is even, then we'll add it to evens
if number % 2 == 0:
    evens.append(number)
# otherwise, we'll add it to odds
else:
    odds.append(number)

print(evens)
print(odds)

[10]
[]


In [2]:
number = 13 

# if number is even, then we'll add it to evens
if number % 2 == 0:
    evens.append(number)
# otherwise, we'll add it to odds
else:
    odds.append(number)

print(evens)
print(odds)

[10]
[13]


In [3]:
# wouldn't it be better to just iterate over a list of numbers?
evens = []
odds = []

all_numbers = [10, 15, 20, 35, 17, 22]

for number in all_numbers:
    # if number is even, then we'll add it to evens
    if number % 2 == 0:
        evens.append(number)
    # otherwise, we'll add it to odds
    else:
        odds.append(number)

print(evens)
print(odds)

[10, 20, 22]
[15, 35, 17]


In [4]:
# what if I got inputs as a string?

evens = []
odds = []

s = input('Enter numbers: ').strip()

all_numbers = s.split()    # split always returns a list of strings

for number in all_numbers:
    number = int(number)   # get an integer based on the current number, which is a string

    # if number is even, then we'll add it to evens
    if number % 2 == 0:
        evens.append(number)
    # otherwise, we'll add it to odds
    else:
        odds.append(number)

print(evens)
print(odds)

Enter numbers:  10 15 20 35 22 17


[10, 20, 22]
[15, 35, 17]


In [5]:
# vowels, digits, and others

vowels = []
digits = []
others = []

s = input('Enter string: ').strip()

for one_character in s:
    if one_character.isdigit():
        digits.append(one_character)

    elif one_character in 'aeiou':
        vowels.append(one_character)

    else:
        others.append(one_character)

print(vowels)
print(digits)
print(others)

Enter string:  hello! 123


['e', 'o']
['1', '2', '3']
['h', 'l', 'l', '!', ' ']


In [6]:
# str.strip is a method that returns a new string, without any whitespace (space, \n, \t, \r, \v)
# on the outside of the string

s = '     a    b    c    '

s.strip()  # this returns a new string -- it doesn't modify s!

'a    b    c'

# Dictionaries

Dictionaries (aka "dicts" in the Python world) are the most powerful, most useful data structure in the language. They are not unique to Python! However, in other languages, we call them other things:

- hash tables
- hashes
- name-value pairs
- key-value pairs
- hashmaps
- maps
- associative arrays

The basic idea of a dictionary is that we have names and values, or keys and values.  

When you think about a Python list, you think about the elements, the values of the list. The index that you use to retrieve those values is an annoyance, isn't set by you, and has nothing to do with the problem you're trying to solve.

You can think of dictionaries (kind of) as lists in which you determine both the index and the value.  The index is known as the "key." Via the key, you can retrieve the value.

This means that you no longer need to just use 0, 1, 2, 3, etc. to retrieve values from your dict. Rather, you need to know what keys are in there.

Keys:
- Can be any immutable type (i.e., basically numbers and strings)
- Must be unique -- if you assign to a key that already exists, the new value replaces the old one, and the old value is lost.
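
A small sketch of both rules. Tuples haven't been used as keys in these notes, but a tuple is immutable, so (as long as its elements are immutable, too) it can serve as a key. The coordinates and city names below are invented for the example.

```python
# a tuple is immutable, so it can be a key
locations = {(32, 34): 'Tel Aviv', (51, 0): 'London'}
print(locations[(32, 34)])

# if the same key appears twice, the later value replaces the earlier one
d = {'a': 10, 'b': 20, 'a': 999}
print(d)
print(len(d))   # only two pairs survive, not three
```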

In [7]:
# let's create a dict!
# - use {} to define it
# - we define it with key-value pairs
# - each key is whatever you wish to set it to (assuming that it's immutable)
# - each value can be literally anything at all in Python
# - we separate the key and value with :
# - we separate pairs with commas

d = {'a':10, 'b':20, 'c':30}

In [8]:
len(d)  # how many key-value pairs are there in d?

3

In [9]:
# how can I retrieve from the dict?
d['a']

10

In [10]:
d['b']

20

In [11]:
d['c']

30

In [12]:
d['hello']

KeyError: 'hello'

In [13]:
# we can avoid key errors if we first check whether the key is in the dict
# we use "in" for that

if 'hello' in d:      # "in" only looks at the keys -- not the values!
    print(d['hello'])
else:
    print(f'No such key "hello" in here')

No such key "hello" in here


# Exercise: Restaurant

0. Define `total` to be 0
1. Define a dict, called `menu`, whose keys are item names on a menu, and whose values are the prices of those items.
2. Have the user ask, again and again, for something on the menu.
    - If we got an empty string, break out of the loop
    - If we got a string that *is* a key, then retrieve the value, print the price, and add to the total.
    - If we got a request that is *not* a key, then scold the user
3. After the user enters an empty string, we can exit and print the total.

Example:

    Order: apple
    apple is 3, total is 3
    Order: banana
    banana is 2, total is 5
    Order: elephant
    we're out of elephant today!
    Order: [ENTER]
    Total is 5

In [14]:
total = 0

menu = { 'sandwich':10,  'tea':5,  'apple':3, 'cake':7    }

In [15]:
len(menu)

4

In [17]:
menu['sandwich']   # get the value in the dict "menu" for the key "sandwich"

10

In [18]:
order = 'sandwich'

menu[order]

10

In [19]:
order = 'ostrich'

menu[order]   # what is the price of ostrich?

KeyError: 'ostrich'

In [20]:
if order in menu:    # is the thing the person wants also a key in "menu"?
    print(menu[order])
else:
    print(f'{order} is not on the menu')

ostrich is not on the menu


In [23]:
total = 0
menu = { 'sandwich':10,  'tea':5,  'apple':3, 'cake':7    }

while True:
    order = input('Order: ').strip()

    if order == '':
        break
    
    if order in menu:    # is the thing the person wants also a key in "menu"?
        price = menu[order]
        total += price
        print(f'{order} costs {price}; total is now {total}')
    else:
        print(f'{order} is not on the menu')

print(f'Total is {total}')

Order:  sandwich


sandwich costs 10; total is now 10


Order:  tea


tea costs 5; total is now 15


Order:  cake


cake costs 7; total is now 22


Order:  banana


banana is not on the menu


Order:  


Total is 22


# Dictionaries are mutable!

- If you want to add key-value pairs to a dict, you can.
- If you want to modify the value associated with a key, you can.

In [24]:
d = {'a':10, 'b':20, 'c':30}

d['a'] = 2345    # replace the existing value associated with 'a' with 2345
d

{'a': 2345, 'b': 20, 'c': 30}

In [25]:
# when we assign to a key that already exists, the key remains, and the value is changed/updated

In [26]:
# how can I add new key-value pairs to my dict?
# we just assign, same as we did to update values -- make sure to use a key that doesn't already exist in there

d['x'] = 12345
d

{'a': 2345, 'b': 20, 'c': 30, 'x': 12345}

In [27]:
d['y'] = 'goodbye!'
d

{'a': 2345, 'b': 20, 'c': 30, 'x': 12345, 'y': 'goodbye!'}

In [28]:
menu

{'sandwich': 10, 'tea': 5, 'apple': 3, 'cake': 7}

In [29]:
menu['coffee'] = 7
menu

{'sandwich': 10, 'tea': 5, 'apple': 3, 'cake': 7, 'coffee': 7}

In [30]:
menu['coffee'] = 8   # now I have increased the price
menu

{'sandwich': 10, 'tea': 5, 'apple': 3, 'cake': 7, 'coffee': 8}

In [32]:
# what about removing things from a dict?
# we use the "pop" method -- give it a key, and it'll remove the key and its value -- and will return the value

menu.pop('coffee')

8

In [33]:
menu

{'sandwich': 10, 'tea': 5, 'apple': 3, 'cake': 7}

In [34]:
del menu['tea']  # you can remove a key-value pair this way, too ("del" is a statement, so no parentheses are needed)

In [35]:
menu

{'sandwich': 10, 'apple': 3, 'cake': 7}

In [36]:
# the dict.clear method removes all key-value pairs from a dict
menu.clear()
menu

{}

In [37]:
del menu  # now the variable is gone, losing us the entire dict

In [38]:
menu

NameError: name 'menu' is not defined

# Next up:

1. Paradigm 2 -- updating a dict, but not modifying the keys
2. Looping on dicts
3. Paradigm 3 -- starting with an empty dict

# Examples of where to use dicts

1. Month names (keys) to month numbers (values)
2. Month numbers (keys) to month names (values)
3. User IDs and real names
4. User IDs and dicts (yes, a dict can be a value) with first name, last name, etc.

You will, at some point, see opportunities to use dicts all over. 
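
Two of the ideas from the list above might look like this. It's only a sketch: just three months are shown, and the user IDs and names are invented.

```python
# month names (keys) to month numbers (values)
month_numbers = {'Jan': 1, 'Feb': 2, 'Mar': 3}
print(month_numbers['Feb'])

# user IDs (keys) to dicts (values) with first name, last name, etc.
users = {
    101: {'first': 'Ada', 'last': 'Lovelace'},
    102: {'first': 'Alan', 'last': 'Turing'},
}
print(users[102]['first'])   # first retrieve the inner dict, then its 'first' key
```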

# Paradigm 2

In this paradigm for dict use, you create it at the start of the program, and you give each of the values default/starting values - often 0 or the empty list.

Over the course of the program, you add to that default. You won't ever add or remove keys, but you will definitely update the values.

Could you use several variables instead of one dict? Yes! But having them all in a dict gives you semantic power, and also helps you to keep track of them.
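
Here is a sketch of this paradigm with empty lists as the starting values. It reuses the evens-and-odds example from the top of these notes, but with one dict (called `numbers` here, a name chosen just for this example) instead of two separate variables.

```python
# the keys are fixed up front; each value starts as an empty list
numbers = {'evens': [], 'odds': []}

for number in [10, 15, 20, 35, 17, 22]:
    if number % 2 == 0:
        numbers['evens'].append(number)   # update the value; the keys never change
    else:
        numbers['odds'].append(number)

print(numbers)
```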

# Exercise: Digits, vowels, and others (dict edition)

1. Define a dict with three keys -- `digits`, `vowels`, and `others` -- all of which have 0 for the value.
2. Ask the user to enter a string.
3. Go through the string, one character at a time:
    - If it's a digit, increase the count for `digits`
    - If it's a vowel, increase the count for `vowels`
    - If neither is the case, then increase the count for `others`
4. Print the dict.

Example:

    Enter a string: hello! 123
    {'digits':3, 'vowels':2, 'others':5}

In [39]:
counts = {'digits':0, 
          'vowels':0, 
          'others':0}

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character.isdigit():
        counts['digits'] += 1
    elif one_character in 'aeiou':
        counts['vowels'] += 1
    else:
        counts['others'] += 1

print(counts)

Enter a string:  hello! 123


{'digits': 3, 'vowels': 2, 'others': 5}


# Loops and dictionaries

Can we loop over a dict? Let's try!


In [40]:
d = {'a':100, 'b':200, 'c':300}

# if you loop over a dict, you get the keys
for one_item in d:
    print(one_item)

a
b
c


In [41]:
# if I want to loop over a dict and print all of the keys and values,
# one way to do it is as follows:

for one_key in d:
    print(f'{one_key}: {d[one_key]}')

a: 100
b: 200
c: 300


In [42]:
# there are some dict methods you might have discovered...
d.keys()

dict_keys(['a', 'b', 'c'])

In [43]:
# could I instead say:

for one_key in d.keys():   # NEVER EVER EVER EVER EVER EVER EVER DO THIS! PLEASE!
    print(f'{one_key}: {d[one_key]}')    

a: 100
b: 200
c: 300


Running a `for` loop on a dict object gives you the keys.

Running a `for` loop on `d.keys()` first requires that Python find the `keys` method. Then the method needs to run. Then the method needs to return its data structure. Then we can finally iterate over the keys... the same keys we get from the object itself.

Using `d.keys()` for iteration has no added value.
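
To see that both loops visit the same keys in the same order, we can collect them into lists and compare. This is just a check, with list names invented for the example.

```python
d = {'a': 100, 'b': 200, 'c': 300}

keys_from_dict = []
for one_key in d:            # iterate over the dict itself
    keys_from_dict.append(one_key)

keys_from_method = []
for one_key in d.keys():     # same keys, with an extra method call
    keys_from_method.append(one_key)

print(keys_from_dict == keys_from_method)
```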

In [44]:
# there is also a d.values()

d.values()

dict_values([100, 200, 300])

In [45]:
d.items()

dict_items([('a', 100), ('b', 200), ('c', 300)])

In [46]:
# there's a better way to iterate over a dict
# the dict.items method returns what's basically a list of tuples
# each tuple contains two elements, a key and a value

for t in d.items():
    key, value = t   # unpacking
    print(f'{key}: {value}')

a: 100
b: 200
c: 300


In [47]:
# remember from yesterday.. unpacking!

# my favorite way to iterate over a dict!

for key, value in d.items():
    print(f'{key}: {value}')

a: 100
b: 200
c: 300


# Paradigm 3

- Start with an empty dict
- Sometimes, you'll need to add a new key
- Sometimes, you'll need to add/update a value

This is appropriate when we don't know what names (keys) we'll get, and we don't know what values we'll get, but we know what we want to do with them.



In [48]:
# Example: Character counter

counts = {}

s = 'hello! 123'

for one_character in s:
    counts[one_character] += 1    # my ideal!

KeyError: 'h'

In [49]:
# Example: Character counter

counts = {}

s = 'hello! 123'

for one_character in s:
    if one_character in counts:
        counts[one_character] += 1    # add 1 to the value of anything that already exists
    else:
        counts[one_character] = 1     # assign a new key-value pair to the dict

counts

{'h': 1, 'e': 1, 'l': 2, 'o': 1, '!': 1, ' ': 1, '1': 1, '2': 1, '3': 1}

# Exercise: Rainfall

1. Define an empty dictionary, called `rainfall`. Eventually, the keys will be strings (names of cities) and the values will be integers (mm of rain).
2. Ask the user to enter the name of a city.
    - If they give us an empty city name, stop asking
3. If they gave us a city name, then ask (in a separate question) for the number of mm that fell there yesterday. We can assign this to `mm_rain`.
4. Check to see if we previously knew about this city:
    - If it's new, then assign the city name and `mm_rain` to the `rainfall` dict
    - If it's not new, then just update the value, adding `mm_rain`
5. Print out the `rainfall` dict, with one key-value pair on each line.

Example:

    City: Tel Aviv
    Rain: 2
    City: Jerusalem
    Rain: 3
    City: Tel Aviv
    Rain: 1
    City: [ENTER]
    Tel Aviv: 3
    Jerusalem: 3
    

In [54]:
rainfall = {}

while True:    
    city_name = input('City: ').strip()

    if city_name == '':
        break

    mm_rain = input('Rain: ').strip()
    mm_rain = int(mm_rain)   # we are not checking whether mm_rain contains only digits

    if city_name in rainfall:   # if city_name is already a key...
        rainfall[city_name] += mm_rain  # existing cities -- add to their values
    else:
        rainfall[city_name] = mm_rain   # new cities -- set the value

for key, value in rainfall.items():
    print(f'{key}: {value}')
    

City:  a
Rain:  5
City:  b
Rain:  4
City:  a
Rain:  3
City:  


a: 8
b: 4


In [52]:
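# note: the values shown below are strings, so this output (cell 52) predates the int() conversion
# used in cell 54 above; with strings, += would concatenate (e.g., '5' + '3' gives '53') rather than add
print(rainfall)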

{'Tel Aviv': '53', 'Jerusalem': '4'}


# Next up

1. How do dictionaries work?
2. A little more dict practice
3. Files -- reading and writing 

# How do dicts work?

Let's start off by considering (to some degree) how lists work:

- We start with an empty list
- We add an element, and it gets index 0 (automatically)
- We add another element, and it gets index 1 (automatically)
- When we add the nth element, it gets index n-1 automatically

There isn't any way to predict where a value will be stored in a list. This means that if someone asks whether a value is in the list, we'll need to look (potentially) through the whole list to find it.

This means that searching for a value in a list will take longer if the list is longer.  

In CS theory circles, we can say that the time it takes to search a list is O(n) -- meaning, it grows with the length of the list.

Dictionaries are completely different. When we store data in a dict, we are determining the key, and we are determining the value. Here's the thing: The key helps to determine where the key-value pair is stored in memory.

When we want to store key `'a'` and value `5`, Python needs to decide where it will be stored. In the case of a list, it just took the next possible location. In the case of a dictionary, Python runs a function on the key. The result of that function tells us where to store the key-value pair.

This is known as a "hash function," and it returns the same value for the same argument, again and again. But the value for `'a'` and the value for `'b'` seem to be completely disconnected. A good hash function doesn't let you predict what value you'll get for a given input.  

When I want to store

    d['a'] = 1

Python runs something like this (pseudocode, not Python's real internals):

    location = hash('a')
    store_at[location] = ('a', 1)

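This is one reason why dict keys have to be unique: the same key always produces the same hash result, so a second pair with that key would be stored in the same place as the first, replacing it. (Two *different* keys can occasionally end up at the same location, too. That's called a "collision," and Python handles it for us behind the scenes.)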

Let's say I want to find out whether `'a'` is in the dict. I can just run the hash function on `'a'`, get a number and jump to there -- if I see it, then yes, 'a' is in the dict. If not, I can just give up.
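
The store-and-lookup idea can be sketched as a toy hash table. This is *not* how Python actually implements dicts (the real thing is far more sophisticated); it's an illustration under simplifying assumptions. The names `NUM_SLOTS`, `slots`, `store`, and `lookup` are invented, and the code uses functions, which we haven't formally covered yet.

```python
NUM_SLOTS = 8                              # hypothetical, fixed number of storage locations
slots = [[] for _ in range(NUM_SLOTS)]     # each slot holds a list of [key, value] pairs

def store(key, value):
    location = hash(key) % NUM_SLOTS       # the key decides where the pair goes
    for pair in slots[location]:
        if pair[0] == key:                 # same key again: replace the value
            pair[1] = value
            return
    slots[location].append([key, value])   # new key: add the pair to this slot

def lookup(key):
    location = hash(key) % NUM_SLOTS       # jump straight to the right slot
    for pair in slots[location]:
        if pair[0] == key:
            return pair[1]
    raise KeyError(key)                    # not in that slot means not in the table

store('a', 1)
store('b', 2)
store('a', 100)       # reuses the key 'a', so the old value is lost
print(lookup('a'))
print(lookup('b'))
```

Notice that `lookup` never scans the whole table. It looks only in the one slot that the hash function points to.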

Lookup in a dict is thus super super fast -- we call it O(1), "constant time." It doesn't matter (mostly) how big a dict gets, because the time needed to search for a key doesn't grow with the size of the dict.
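
If you want to see the difference for yourself, here is a rough timing sketch using the `timeit` module from the standard library. The size (`n`) and the number of repetitions are arbitrary choices, and the actual times will vary from computer to computer, so no output is shown here.

```python
import timeit

n = 100_000
big_list = list(range(n))
big_dict = {number: True for number in big_list}   # the same numbers, as dict keys

target = n - 1      # the last element: the worst case for searching the list

# run each search 100 times, and get back the total time in seconds
list_time = timeit.timeit(lambda: target in big_list, number=100)
dict_time = timeit.timeit(lambda: target in big_dict, number=100)

print(f'list: {list_time:.6f} seconds')
print(f'dict: {dict_time:.6f} seconds')
```

Try making `n` ten times bigger: the list search should take roughly ten times longer, while the dict search should barely change.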

We can see:
- This is why keys need to be unique
- This is why keys need to be immutable - huh? If we could change a key, then the result from running `hash` on it would change, too. That would mean losing data inside of our dict.
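
We can try both points with the builtin `hash` function. This sketch uses `try`/`except` (not yet covered in these notes) so that the errors are printed rather than stopping the program. The exact wording of the error message depends on your Python version.

```python
# the same key always gives the same hash value (within one run of Python)
print(hash('a') == hash('a'))

# a list is mutable, so Python refuses to hash it...
try:
    hash([1, 2, 3])
except TypeError as e:
    print(f'TypeError: {e}')

# ...which also means that it can't be used as a dict key
d = {}
try:
    d[[1, 2, 3]] = 'some value'
except TypeError as e:
    print(f'TypeError: {e}')

print(d)    # still empty, because the assignment failed
```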

# Exercise: Travel

We are going to keep track of where you (or someone else) have traveled, by city and country:

- We'll build a dict whose keys are country names (strings) and whose values are lists of city names in that country (strings)

1. Define `all_places` to be an empty dict.
2. Ask the user (repeatedly) to enter the name of a city and country they have visited. The city and country should be separated by a comma.  (e.g., London, England; Paris, France; Boston, USA.)
3. Break the city and country apart (using split)
4. If we have seen this country before, then append the city to the list for that country.
5. If we have not seen this country before, then add the country to the dict as a new key, whose value is a list containing just that city.
6. When the user enters an empty string in response to the prompt, print all of the countries, and all of the cities in each one.

In [57]:
all_places = {}

while True:
    print(all_places)
    s = input('Enter city, country: ').strip()

    if s == '':
        break

    # str.split *always* returns a list of strings
    city, country = s.split(',')  # unpacking
    city = city.strip()
    country = country.strip()

    if country in all_places:
        all_places[country].append(city)
    else:
        all_places[country] = [city]

print(all_places)

{}


Enter city, country:  Boston, USA


{'USA': ['Boston']}


Enter city, country:  Chicago, USA


{'USA': ['Boston', 'Chicago']}


Enter city, country:  Taipei, Taiwan


{'USA': ['Boston', 'Chicago'], 'Taiwan': ['Taipei']}


Enter city, country:  Tainan, Taiwan


{'USA': ['Boston', 'Chicago'], 'Taiwan': ['Taipei', 'Tainan']}


Enter city, country:  Salt lake City, USA


{'USA': ['Boston', 'Chicago', 'Salt lake City'], 'Taiwan': ['Taipei', 'Tainan']}


Enter city, country:  Prague, Czechia


{'USA': ['Boston', 'Chicago', 'Salt lake City'], 'Taiwan': ['Taipei', 'Tainan'], 'Czechia': ['Prague']}


Enter city, country:  


{'USA': ['Boston', 'Chicago', 'Salt lake City'], 'Taiwan': ['Taipei', 'Tainan'], 'Czechia': ['Prague']}


In [59]:
for key, value in all_places.items():
    print(key)
    for one_city in value:
        print(f'\t{one_city}')

USA
	Boston
	Chicago
	Salt lake City
Taiwan
	Taipei
	Tainan
Czechia
	Prague


# Files

A file is a bunch of bytes that were written by a program, and can be read by a program, to avoid having to enter that data by hand each time we turn on the computer.

We often think of files along with the applications they're used for -- Excel, PowerPoint, PDF, etc. All of these are structured collections of bytes on disk, structured because the program knows what to expect in them.

Those kinds of files are very complex to work with, and it's good that other people are willing to deal with that. Most of the time, I work with plain-text files, which are far easier to work with. (They're also unstructured, which can be an issue.)

We'll talk about working with text files.

- How can we read from text files on our computer?
- How can we create/write to text files on our computer?

If we want to work with a file, we'll need to use the OS. We'll typically get an "agent" program that can work on our behalf with the OS. 

- We open the file (i.e., we tell the OS we want to read from the file)
- We tell the agent (the file object we got back) that we want to read from the file
- We get the contents of the file
- We close the file, thus returning resources to what they were.
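
The four steps above can be sketched as follows. So that the example works on any computer, it first creates a small temporary file to read from. That part uses `tempfile`, `os.path.join`, and writing with `'w'`, none of which we've covered yet, and the filename `example.txt` is made up.

```python
import os
import tempfile

# create a small text file to read from (a stand-in for a real file)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
f = open(path, 'w')
f.write('first line\nsecond line\n')
f.close()

f = open(path, 'r')      # open the file, getting a file object back
contents = f.read()      # ask the file object for the file's contents
f.close()                # close the file, releasing its resources

print(contents)
```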

In [60]:
# open the file /etc/passwd
# get a file object back

f = open('/etc/passwd', 'r')   # 2nd argument here is optional -- 'r' just means "read" from the file

# I could read the file's contents this way:
f.read() 

'##\n# User Database\n# \n# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n##\nnobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\nroot:*:0:0:System Administrator:/var/root:/bin/sh\ndaemon:*:1:1:System Services:/var/root:/usr/bin/false\n_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico\n_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false\n_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false\n_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false\n_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false\n_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false\n_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false\n_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/fa