# Week 3: Dictionaries and files

In [1]:
print('Hello!')

Hello!


In [2]:
# to create executable Python programs, you can use Pyinstaller
# go to https://pyinstaller.org/ for more details

In [3]:
# we talked a bit at the end about tuples
# tuples are basically immutable lists

t = (10, 20, 30, 40, 50)
type(t)

tuple

In [5]:
# no parentheses needed for tuples!  The comma is the most important thing
t = 10, 20, 30, 40, 50
type(t)

tuple

In [6]:
mylist = [10, 20, 30]
x = mylist

x  # it'll be [10, 20, 30], of course

[10, 20, 30]

In [7]:
x,y,z = mylist

In [8]:
x

10

In [9]:
y

20

In [10]:
z

30

In [11]:
# this is known as "tuple unpacking," because the variables on the left
# are in a tuple

# so long as the data on the right is iterable (i.e., can run in a for loop)
# and so long as the number of elements on the right == number of variables on the left
# then we're OK!

x,y,z = 'abc'

x

'a'

In [12]:
y

'b'

In [13]:
z

'c'
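
What happens when the counts don't match? Here is a small sketch, reusing `mylist` from above. The `try`/`except` is only there so the example can run to the end; `error_message` is a name made up for this illustration.

```python
# Unpacking requires the same number of names on the left
# as there are elements on the right.
mylist = [10, 20, 30]

try:
    x, y = mylist          # two names, but three elements
except ValueError as e:
    error_message = str(e)

print(error_message)       # Python complains about "too many values to unpack"
```

The exact wording of the message varies a bit between Python versions, but the exception type is `ValueError` either way.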

In [14]:
person = ('Reuven', 'Lerner', 46)

# this is a tuple describing a person's information (first, last, shoe_size)

In [15]:
first_name, last_name, shoe_size = person

In [16]:
first_name

'Reuven'

In [17]:
last_name

'Lerner'

In [18]:
shoe_size

46

In [20]:
x = 222   # x is an integer, 222
y = 444   # y is an integer, 444

# (1) in assignment, right side before left side
# (2) on the right, we're creating a new tuple from (y,x)
# (3) on the left, we're using unpacking to grab two values and assign them to x and y
# (4) it's a complete coincidence that x,y is on the left and y,x is on the right
# The effect?  We have swapped the variables' values

x,y = y,x

In [21]:
x

444

In [22]:
y

222

# Dictionaries

Dictionaries (aka "dicts" in Python) are not unique to Python.  They have other names in other languages:

- Associative arrays
- Hash tables
- Hash maps
- Hashes
- Key value pairs
- Name value pairs

And some other names, too...

Dicts give us a combination of benefits: very fast lookup by key, and keys that can be meaningful names rather than just integer positions.

In [24]:
# define a new dictionary using {}  (curly braces)
# a dict always has pairs of data -- key-value pairs

# (1) use curly braces
# (2) the pairs are separated by commas
# (3) the keys and values are separated by colons
# (4) you can use any type (more or less) for keys and values
#   values can be ABSOLUTELY ANYTHING AT ALL -- strings, lists, dicts, etc.

d = {'a':1, 'b':2, 'c':3}

In [25]:
# to retrieve from a dict, we use [] and the key
d['a']

1

In [26]:
d['b']

2

In [27]:
d['c']

3

In [29]:
d['x']  # the key 'x' does not exist, so we get an error

KeyError: 'x'

In [30]:
# what if I want to find out if a key exists in a dict?
# I can use the "in" operator
# "in" *ONLY* looks in the keys!  It never looks at the values!

In [31]:
'a' in d

True

In [32]:
'x' in d

False

In [33]:
1 in d

False
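
If you really do want to search the values, one option is to run `in` on `d.values()`. This is a small sketch; the variable names are invented for the example.

```python
d = {'a': 1, 'b': 2, 'c': 3}

key_found = 'a' in d             # 'a' is a key, so this is True
value_as_key = 1 in d            # 1 is a value, not a key, so this is False
value_found = 1 in d.values()    # searches the values, one at a time

print(key_found, value_as_key, value_found)
```

Keep in mind that searching the values means checking them one by one, so it doesn't get the speed benefit that searching the keys does.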

In [34]:
# Keys must be unique.  If you try to create a dict with a key 
# that repeats, Python keeps only the last value given for that key.

# there is no restriction on the values, and how often they repeat

d = {'a':1, 'b':2, 'a':3, 'b':4}
d

{'a': 3, 'b': 4}

In [35]:
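# the keys of a dict can be any IMMUTABLE type -- strings, ints, floats, even tuples
# (strictly speaking, any *hashable* type, so a tuple only works if everything in it is hashable too)
# but *not* lists or dicts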
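
To see the rule in action, here is a sketch with a tuple as a key and then a list as a key. The `locations` dict and its contents are invented for the example.

```python
locations = {}

# a tuple of ints is fine as a key
locations[(32, 35)] = 'somewhere'

# a list is mutable, so Python refuses to use it as a key
try:
    locations[[32, 35]] = 'oops'
except TypeError as e:
    list_key_error = str(e)

print(locations)
print(list_key_error)      # mentions that a list is "unhashable"
```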

In [36]:
# a list or tuple can contain anything, but its indexes are always going to be
# integers, starting at 0

# a dict's keys can be any ints, or any string, giving us more semantic power
# -- our data structure can feel more human

# Summary of dicts, so far

- We create a dict with `{}`, and key-value pairs inside of the `{}`
- Pairs are separated with commas
- Keys and values are separated by `:`
- Keys must be immutable (strings, ints, floats, tuples)
- Keys are unique -- only once in each dict
- Values can be absolutely anything at all, of any type, and can repeat
- Search for a key in a dict with `in`: `k in d`
- Retrieve from a dict with `[]`
- Every key must have a value, and every value must have a key.

- You can use variables, as well:


In [37]:
d = {'a':1, 'b':2, 'c':3}
k = 'a'
d[k]   # this will retrieve d['a'], because the variable k contains 'a'

1

# Exercise: Restaurant

1. Define a variable, `total`, to be 0.
2. Define a variable, `menu`, to be a dict. The keys in the dict will be the items on the menu, and the values will be the prices.
3. Ask the user, repeatedly, to enter their order.
    - If they enter an empty string, stop looping and print the total
    - If they enter a string, and that item is on the menu, then print the price, the new total, and ask again
    - If they enter a string, and that item is *not* on the menu, then scold them, and ask again
4. Print the final total.

Example:

    Order: sandwich
    sandwich costs 10, total is 10
    Order: tea
    tea costs 5, total is 15
    Order: elephant
    We are fresh out of elephant today!
    Order: [ENTER]
    Total is 15
    
### Hints and reminders
- Use a `while True` loop to loop infinitely
- Check if the user entered an empty string, and then `break` from the loop


In [38]:
total = 0
menu = {'sandwich':10, 'tea':5, 'cake':3, 'apple':2}

while True:
    order = input('Order: ').strip()  # get input, remove surrounding whitespace
    
    if order == '':      # did we get nothing at all? break from the "while" loop
        break
        
    if order in menu:        # is the user's input a key in our "menu" dict?
        price = menu[order]  # retrieve the price of the user's order
        total += price       # add price to total
        print(f'{order} costs {price}; total is now {total}')
    else:
        print(f'Sorry, we are out of {order} today!')
        
print(f'Total is {total}.')

Order: sandwich
sandwich costs 10; total is now 10
Order: tea
tea costs 5; total is now 15
Order: apple
apple costs 2; total is now 17
Order: banana
Sorry, we are out of banana today!
Order: 
Total is 17.


In [39]:
# can we modify dictionaries?
# that is: are they mutable?

# YES WE CAN!

d = {}     # empty dict
d['a'] = 1 # adds the key-value pair 'a':1 to our dict

d

{'a': 1}

In [40]:
d['b'] = 2
d

{'a': 1, 'b': 2}

In [41]:
d['c'] = 3
d

{'a': 1, 'b': 2, 'c': 3}

In [42]:
# there isn't any "append" method for dicts, as we saw for lists
# we just assign a new key-value pair, and it's added to the dict.

# what if we want to revise an existing value?
# same thing -- we just assign!

d

{'a': 1, 'b': 2, 'c': 3}

In [43]:
d['a'] = 999
d

{'a': 999, 'b': 2, 'c': 3}

In [44]:
# keys must be unique in a dict!
# so assigning a new value to an existing key updates the value

In [45]:
# can you remove key-value pairs from a dict? yes!
# Use the "pop" method

d

{'a': 999, 'b': 2, 'c': 3}

In [46]:
d.pop('a')  # removes the key-value pair with 'a' as the key, and returns the value

999

In [48]:
d

{'b': 2, 'c': 3}
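
Two related tools, as a sketch: `pop` accepts an optional second argument, returned if the key isn't there, and the `del` statement removes a pair without returning anything. The names `removed` and `missing` are invented for the example.

```python
d = {'a': 999, 'b': 2, 'c': 3}

removed = d.pop('a')            # removes 'a' and returns its value
missing = d.pop('zzz', None)    # no such key, so we get None back instead of a KeyError
del d['b']                      # removes 'b' without returning anything

print(removed, missing, d)
```

Without the second argument, `d.pop('zzz')` would raise a `KeyError`, just like `d['zzz']`.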

In [49]:
d['b']

2

In [50]:
d['B']  # completely different key!

KeyError: 'B'

# Exercise: Vowels, digits, and others

1. Create a dict, `counts`, in which you have three key-value pairs. The keys will be `vowels`, `digits`, and `others`. The values will all be 0.
2. Ask the user to enter a string.
3. Iterate over the string, one character at a time.  Examine the character:
    - If it's a digit (0-9), then add 1 to the `digits` value
    - If it's a vowel (aeiou), then add 1 to the `vowels` value
    - Otherwise, add 1 to the `others` value
4. At the end, print the resulting dict.

Example:

    Enter a string: hello 123!!!
    {'digits': 3, 'vowels': 2, 'others': 7}
    
### Hints and reminders    
- The `.isdigit` method for strings returns `True` if the string is non-empty and all of its characters are digits.  So `'123'.isdigit()` will return `True`.
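
A quick sketch of what `.isdigit` says about a few strings. The sample strings are invented for the example, and the results are collected in a dict so that we can look them up afterwards.

```python
samples = ['123', '12a', '', '-5', '3.5']
results = {}

for one_string in samples:
    results[one_string] = one_string.isdigit()

print(results)
```

Only `'123'` comes back as `True`. The minus sign and the decimal point are not digits, and the empty string has no digits at all.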


In [51]:
total = 5
print(f'Your total is ${total}')

Your total is $5


In [52]:
x = '5'  # this is a string
 
# how can I get an integer from it?  I call int()
int(x)

5

In [53]:
x = 123  # this is an int

# how can I turn it into a string?
str(x)   # I get a string back, based on x

'123'

In [54]:
# Turns out you can use str() on ANYTHING AT ALL in Python
str(d)

"{'b': 2, 'c': 3}"

In [55]:
counts = {'digits':0, 'vowels':0, 'others':0}

s = input('Enter a string: ').strip()

for one_character in s:  # go through s, one character at a time
    if one_character.isdigit():   # is one_character one of 0, 1, 2...9?
        counts['digits'] += 1
    elif one_character in 'aeiou':
        counts['vowels'] += 1
    else:
        counts['others'] += 1
        
print(counts)        

Enter a string: hello 123!!!
{'digits': 3, 'vowels': 2, 'others': 7}


In [56]:
one_character = 'e'

one_character in 'aeiou'  # search in a string

True

In [57]:
one_character in 'a,e,i,o,u'  # still searching in a string, but a longer string

True

In [None]:
one_character in ['a', 'e', 'i', 'o', 'u']   # searching in a list

In [58]:
d

{'b': 2, 'c': 3}

In [59]:
print(f'{d['b']}')   # this will not work before Python 3.12!

SyntaxError: f-string: unmatched '[' (<ipython-input-59-2eda18716e33>, line 1)

In [60]:
print(f"{d['b']}")

2


# Next up:

1. More methods on dicts
2. Accumulating the unknown
3. Looping over dicts

In [62]:
print(f'{d[\'b\']}')   # escaping the quotes doesn't help (before Python 3.12)

SyntaxError: f-string expression part cannot include a backslash (<ipython-input-62-5d0c2d64ab05>, line 1)

In [64]:
d = {'a':1, 'b':2, 'c':3}

# when should you use this? almost never
d.keys()  # this returns the keys -- not really a list, but sorta kinda like one

dict_keys(['a', 'b', 'c'])

In [66]:
d.values()  # returns the values as a list-like object

dict_values([1, 2, 3])

In [67]:
d = {'c':3, 'b':2, 'a':1}

In [68]:
d.keys()

dict_keys(['c', 'b', 'a'])

In [69]:
d.values()

dict_values([3, 2, 1])

In [70]:
str(d)  # returns a string based on the dict d

"{'c': 3, 'b': 2, 'a': 1}"

In [72]:
# what if I want to create a dict?  I can call dict()
# I call it on a list of lists or a list of tuples

stuff = [('a', 1), ('b', 2), ('c', 3)]
d = dict(stuff)
d

{'a': 1, 'b': 2, 'c': 3}
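
If the keys and values live in two separate lists, one way to get the list of pairs that `dict()` wants is the builtin `zip`. This is a sketch; `names`, `numbers`, and the other variable names are invented for the example.

```python
names = ['a', 'b', 'c']
numbers = [1, 2, 3]

# zip pairs up the first elements, then the second elements, and so on
from_zip = dict(zip(names, numbers))

# a list of two-element lists works, too
from_lists = dict([['x', 10], ['y', 20]])

print(from_zip)
print(from_lists)
```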

Many times, we don't know what the user will enter for keys and/or values, but we know what we want to do with them.

With vowels/digits/others, we knew that the user's inputs would fit into existing categories.  But sometimes, we need to handle more ambiguous data.

In [73]:
# Example: Letter counter

counts = {}   # empty dict

s = input('Enter a string: ').strip()

for one_character in s:
    counts[one_character] += 1  # this won't work!
    
print(counts)    

Enter a string: hello


KeyError: 'h'

In [74]:
# Example: Letter counter

counts = {}   # empty dict

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in counts:
        counts[one_character] += 1     # we've seen one_character before
    else:
        counts[one_character] = 1      # first time encountering one_character
    
print(counts)    

Enter a string: hello
{'h': 1, 'e': 1, 'l': 2, 'o': 1}


In [75]:
d = {'a':1, 'b':2, 'c':3}

while True:
    k = input('Enter a key: ').strip()
    
    if k == '':
        break
        
    elif k in d:
        print(f'd[{k}] is {d[k]}')
    else:
        print(f'{k} is not a key in {d}')

Enter a key: a
d[a] is 1
Enter a key: c
d[c] is 3
Enter a key: x
x is not a key in {'a': 1, 'b': 2, 'c': 3}
Enter a key: 


In [76]:
# the shortcut is the dict.get method

# If you call .get on a dict, passing an argument,
# it tries to use [] on that argument
# if the argument exists as a key, you get the value back
# if it doesn't, you get None back

d.get('a')   # just like d['a']

1

In [77]:
d.get('b')   # just like d['b']

2

In [78]:
d.get('x')    # 'x' is not a key in d... so we get back None

In [79]:
# the get method is a more forgiving way of retrieving values from dicts
# if the key doesn't exist, the program won't blow up!

# get is even better - you can pass a second argument!
# if the key doesn't exist, we get back that argument

d.get('x', 12345)   # ask for d['x'], but if 'x' is not in d as a key, return 12345

12345

In [80]:
# rewrite with dict.get
d = {'a':1, 'b':2, 'c':3}

while True:
    k = input('Enter a key: ').strip()
    
    if k == '':
        break
        
    print(f'd[{k}] is {d.get(k, f"No such key {k}")}')


Enter a key: a
d[a] is 1
Enter a key: x
d[x] is No such key x
Enter a key: 


In [81]:
# Example: Letter counter

counts = {}   # empty dict

s = input('Enter a string: ').strip()

for one_character in s:
    counts[one_character] = counts.get(one_character, 0) + 1
    
print(counts)    

Enter a string: hello
{'h': 1, 'e': 1, 'l': 2, 'o': 1}
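
Counting things is so common that the standard library has a ready-made tool for it, `collections.Counter`. This is a different technique from the `dict.get` approach above, shown here only as a sketch; the string is hard-coded instead of coming from `input`.

```python
from collections import Counter

counts = Counter('hello')      # counts each character in the string

print(counts['l'])             # how many times did we see 'l'?
print(counts['z'])             # a missing key gives 0, not a KeyError
print(dict(counts))            # turn it back into a regular dict
```

A `Counter` behaves like a dict in most ways, so `in`, `[]`, and looping all work the way you would expect.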


In [82]:
# the below code means:

# (1) is 'a' a key in d?
# (2) if so, then return d['a']
# (3) if not, then return 0

d.get('a', 0)    

1

In [83]:
# (1) is 'a' a key in d?
# (2) If so, return d['a']
# (3) if not, return [10, 20, 30]

d.get('a', [10, 20, 30])

1

In [84]:
d

{'a': 1, 'b': 2, 'c': 3}

In [85]:
# 2nd argument to get means: give me this value if the key doesn't exist.

d.get('a', 0)  # give me d['a'] if it exists. If it doesn't, give me 0

1

# Exercise: Rainfall

1. Create an empty dict, `rainfall`.
2. Ask the user, again and again, to enter the name of a city.
3. If the user enters an empty string, then stop asking, and print the contents of `rainfall`
4. If we have the name of a city, then ask a 2nd question: How much rain fell there yesterday?  (Answer will be in mm of rain)
5. Add the new city-rain amount to the dictionary, adding to an existing amount, if it's already there.
6. At the end, print the dict.

Example:

    City: Jerusalem
    Rain: 2
    City: Tel Aviv
    Rain: 3
    City: Jerusalem
    Rain: 4
    City: [ENTER]
    {'Jerusalem':6, 'Tel Aviv':3}

In [86]:
rainfall = {}

while True:
    city_name = input('City: ').strip()
    
    if city_name == '':
        break
        
    mm_rain = input('Rain: ').strip()  # this is a string!
    mm_rain = int(mm_rain)    # get an int, assume user input was valid
    
    if city_name in rainfall:
        rainfall[city_name] += mm_rain
    else:
        rainfall[city_name] = mm_rain
        
print(rainfall)        
    
    

City: a
Rain: 5
City: b
Rain: 4
City: a
Rain: 3
City: 
{'a': 8, 'b': 4}


In [None]:
rainfall = {}

while True:
    city_name = input('City: ').strip()
    
#    if city_name == '':

    if not city_name:   # same thing as saying "if city_name == ''"
        break
        
    mm_rain = input('Rain: ').strip()  # this is a string!
    mm_rain = int(mm_rain)    # get an int, assume user input was valid
    
    rainfall[city_name] = rainfall.get(city_name, 0) + mm_rain
        
print(rainfall)    

In [89]:
rainfall = {}

# new operator in Python 3.8, the "walrus" -- assignment expression
# both assigns *and* returns a value

while city_name := input('City: ').strip():
        
    mm_rain = input('Rain: ').strip()  # this is a string!
    mm_rain = int(mm_rain)    # get an int, assume user input was valid
    
    rainfall[city_name] = rainfall.get(city_name, 0) + mm_rain
        
print(rainfall)    

City: a
Rain: 5
City: b
Rain: 3
City: 
{'a': 5, 'b': 3}
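
All of these versions assume that the user enters a valid integer. Here is a sketch of one way to drop that assumption. The function `add_rain` is hypothetical (it isn't part of the exercise), and it takes its inputs as arguments rather than calling `input`, so that it can be tried without typing anything.

```python
def add_rain(rainfall, city_name, raw_amount):
    """Add raw_amount (a string) to rainfall[city_name].

    Returns True if the amount was a valid integer, False otherwise.
    """
    try:
        mm_rain = int(raw_amount)
    except ValueError:          # int() couldn't make sense of the string
        return False

    rainfall[city_name] = rainfall.get(city_name, 0) + mm_rain
    return True


rainfall = {}
ok_1 = add_rain(rainfall, 'Jerusalem', '2')
ok_2 = add_rain(rainfall, 'Tel Aviv', '3')
ok_3 = add_rain(rainfall, 'Jerusalem', '4')
ok_4 = add_rain(rainfall, 'Haifa', 'lots')    # not a number, so nothing is added

print(rainfall)
```

In the interactive loop, you could call this function and print a complaint whenever it returns `False`.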


In [91]:
# __main__   # pronounced "dunder main"

# Next up
- Looping over dicts
- How do dicts work?
- Start to work with files (if you can download from https://files.lerner.co.il/)


In [88]:
# if I call the "globals()" function, it returns a dict
# with all of the global variables in Python!

# keys are our variable names
# the values are our variable values

globals()

{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '_ih': ['',
  "print('Hello!')",
  '# to create executable Python programs, you can use Pyinstaller\n# go to https://pyinstaller.org/ for more details',
  '# we talked a bit at the end about tuples\n# tuples are basically immutable lists\n\nt = (10, 20, 30, 40, 50)\ntype(t)',
  't = 10, 20, 30, 40, 50\ntype(t)',
  '# no parentheses needed for tuples!  The comma is the most important thing\nt = 10, 20, 30, 40, 50\ntype(t)',
  "mylist = [10, 20, 30]\nx = mylist\n\nx  # it'll be [10, 20, 30], of course",
  'x,y,z = mylist',
  'x',
  'y',
  'z',
  '# this is known as "tuple unpacking," because the variables on the left\n# are in a tuple\n\n# so long as the data on the right is iterable (i.e., can run in a for loop)\n# and so long

In [None]:
print(d)

In [92]:
# we can loop over a string:
for one_item in 'abcd':
    print(one_item)

a
b
c
d


In [93]:
# we can loop over a list:
for one_item in [10, 20, 30]:
    print(one_item)

10
20
30


In [94]:
# what happens if we loop over a dict?
d = {'a':1, 'b':2, 'c':3}

# we get the *keys* from the dict!
for one_item in d:
    print(one_item)

a
b
c


In [95]:
# To print my dict, I can say this:

for one_key in d:
    print(f'{one_key}: {d[one_key]}')

a: 1
b: 2
c: 3


In [97]:
# could I instead do this:

# this will work, but it is slower and has no advantage
for one_key in d.keys():
    print(f'{one_key}: {d[one_key]}')

a: 1
b: 2
c: 3


In [98]:
# there is a method, items(), which returns a list-like object of (key, value) tuples

for t in d.items():
    print(t)

('a', 1)
('b', 2)
('c', 3)


In [100]:
# getting each key-value tuple from d
# printing each key-value pair
for t in d.items():
    print(f'{t[0]}: {t[1]}')

a: 1
b: 2
c: 3


In [101]:
# I use tuple unpacking to turn each iteration on d.items() into key-value variables
for key,value in d.items():
    print(f'{key}: {value}')

a: 1
b: 2
c: 3


In [103]:
# enumerate returns a tuple (index, value) with each iteration
# we can use unpacking to retrieve and separate them

for index, one_letter in enumerate('abc'):
    print(f'{index}: {one_letter}')

0: a
1: b
2: c


In [104]:
s = 'abcde'
s[::-1]  # returns a new string -- s in reverse

'edcba'

In [105]:
d = {'a':1, 'b':2, 'c':3, 'd':2, 'e':5, 'f':2, 'g':6, 'h':2}

for key, value in d.items():
    print(f'{key}: {value}')

a: 1
b: 2
c: 3
d: 2
e: 5
f: 2
g: 6
h: 2


In [106]:
# I'm going to iterate over the dict
# whenever I see the value 2, I will pop that key from the dict

d = {'a':1, 'b':2, 'c':3, 'd':2, 'e':5, 'f':2, 'g':6, 'h':2}
value_to_remove = 2

for key, value in d.items():
    if value == value_to_remove:
        d.pop(key)
        
print(d)        

RuntimeError: dictionary changed size during iteration

In [107]:
# Try again, using a list of keys to remove, and then iterate over that

d = {'a':1, 'b':2, 'c':3, 'd':2, 'e':5, 'f':2, 'g':6, 'h':2}
value_to_remove = 2
keys_to_remove = []

for key, value in d.items():
    if value == value_to_remove:
        keys_to_remove.append(key)
        
for one_key in keys_to_remove:
    d.pop(one_key)
        
print(d)        

{'a': 1, 'c': 3, 'e': 5, 'g': 6}
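
Two other ways to get the same result, as a sketch. The first loops over a copy of the items, made with `list()`, so that the dict itself is free to change. The second uses a dict comprehension (syntax we haven't covered yet) to build a new dict with only the pairs we want to keep. The names `original` and `kept` are invented for the example.

```python
value_to_remove = 2

# Option 1: iterate over a snapshot of the items, and pop from the real dict
d = {'a':1, 'b':2, 'c':3, 'd':2, 'e':5, 'f':2, 'g':6, 'h':2}

for key, value in list(d.items()):
    if value == value_to_remove:
        d.pop(key)

print(d)

# Option 2: leave the original alone, and build a new dict
original = {'a':1, 'b':2, 'c':3, 'd':2, 'e':5, 'f':2, 'g':6, 'h':2}
kept = {key: value
        for key, value in original.items()
        if value != value_to_remove}

print(kept)
```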


In [108]:
d = {'a':0, 'b':1, 'c':0, 'd':1}

# how many keys have 0 values, and how many have 1 values?
zeroes = 0
ones = 0

for key, value in d.items():
    if value == 1:
        ones += 1
    elif value == 0:
        zeroes += 1
    else:
        print(f'Bad value {value}')

In [109]:
zeroes

2

In [110]:
ones

2

# Searching in data structures

If I have a list, and want to know if a value is in the list, I can use `in`. Python needs to go through each element in the list to see if it matches our search target.  Once we find it, Python can stop.

The longer the list, the longer Python potentially needs to search.  This is described as `O(n)` in computer science.  

If I have a dict, and want to know if something is a key in the dict, I can use `in`. Python finds keys in (on average) constant time, which we write as `O(1)`.  It doesn't matter how many key-value pairs you have!
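
You can get a feel for the difference with the standard library's `timeit` module. This is a rough sketch, not a careful benchmark: the size `n` and the number of repetitions are arbitrary choices, and the actual timings will depend on your computer.

```python
import timeit

n = 100_000
numbers_list = list(range(n))
numbers_dict = dict.fromkeys(range(n))   # keys 0 through n-1, all with the value None

target = n - 1    # the last element, so the list search has to go all the way

# run each search 100 times, and measure the total time
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
dict_time = timeit.timeit(lambda: target in numbers_dict, number=100)

print(f'list: {list_time:.6f} seconds')
print(f'dict: {dict_time:.6f} seconds')
```

Try making `n` ten times bigger: the list search should take roughly ten times longer, while the dict search should barely change.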

# Files!

We are going to deal with plain-text files, which often have a `.txt` extension. Can Python handle fancier file types? Absolutely. But you need patience and/or special libraries to work with them, and we don't have those tonight.

To work with a file, we'll need to "open" it:

- We call `open` on a filename
- This asks the operating system to open the file on our behalf.  
- We get a "file-like object" back.  Some languages would call this a "file handle." Some would just call it a file object.
- We use that file object to read data, or to write data.
- When we're done with the file object, we close it.  That releases resources on the computer.
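
The steps above can be sketched as follows. So that the example works on any computer, it first creates a small file in a temporary directory; the filename `example.txt` and its contents are invented for the example. The second half uses `with`, a common idiom that closes the file for us when the indented block ends.

```python
import os
import tempfile

# make up a path for a small text file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# open for writing, write, and close explicitly
f = open(path, 'w')
f.write('first line\nsecond line\n')
f.close()

# open for reading; "with" closes the file automatically at the end of the block
with open(path) as f:
    contents = f.read()

print(contents)
print(f.closed)     # has the file been closed?
```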

In [111]:
# Example: Reading from /etc/passwd, which is a standard Unix file.

# opening a file without specifying read/write, means: read
f = open('/etc/passwd')  # opening a file, getting a file object back
type(f)

_io.TextIOWrapper

In [112]:
f

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

In [114]:
# how can I read from the file?  I could use the read() method

f.read()    # returns a string with whatever remains in the file after any previous read()

''
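
Why an empty string? A file object remembers how far into the file it has read. Once we have reached the end, there is nothing left to return. Here is a sketch of that behavior, using `io.StringIO` as a stand-in for a real file (it is a file-like object that lives in memory); the contents are invented for the example.

```python
import io

f = io.StringIO('abc\ndef\n')   # stand-in for an open text file

first = f.read()     # everything, from the start to the end
second = f.read()    # we're already at the end, so there is nothing left

f.seek(0)            # go back to the start of the file
third = f.read()     # everything, once again

print(repr(first))
print(repr(second))
print(repr(third))
```

With a real file, you can either call `seek(0)` as shown here, or just open the file again.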