# Week 3: Dictionaries and files

1. Data structures recap
2. Dictionaries
    - Creating them
    - Using them
    - Dictionaries vs. other data structures
    - The different paradigms for dict use
3. Files
    - (Just text files)
    - Reading from files
    - Writing to files
    - Exercise files: go to files.lerner.co.il and grab the first zipfile, for basic/intro Python (https://files.lerner.co.il/exercise-files.zip)

# Data structures so far

1. Numbers (`int` and `float`)
    - Counting
    - Calculating
    - Indexes into strings/lists/tuples
    - Used in data science
    - *Neither* 32-bit nor 64-bit integers.  Integers in Python take as much memory as they need to hold the number.  So if you have a 1,000,000-digit integer, that's OK!
2. Strings
    - Text we want to print
    - Input from the user
    - Keys in dictionaries
    - Reading from / writing to files
    - Ordered, with an index starting at 0
    - Immutable -- cannot be changed, once created
3. Lists
    - (Traditionally) collections of the same type -- list of numbers, list of strings
    - List of usernames, or a list of IP addresses, or a list of filenames
    - Often, we'll iterate over a list with a `for` loop
    - These are ordered, with an index starting at 0
    - Lists are mutable -- they can be changed, and extended, once created
4. Tuples
    - (Traditionally) collections of different types
    - Used as records/structs
    - Behind the scenes, Python uses tuples when we call functions (to pass the arguments)
    - Immutable, like strings
    - Can contain anything, like lists
    
Lists and tuples can both contain any combination of any types you want, including other lists and tuples.    
    
Strings, lists, and tuples are all "sequences," with a common family of functionality:
- They all use 0-based indexes
- Retrieve one item with `[i]`
- Retrieve a slice with `[start:end]` or even `[start:end:step]`
- You can iterate over them all with `for` loops
- You can search in them with `in`
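
The shared sequence functionality listed above can be sketched as follows. The variable names and values are just examples for illustration.

```python
# Three sequences: a string, a list, and a tuple
s = 'abcdefgh'
mylist = [10, 20, 30, 40, 50]
t = ('Reuven', 'Lerner', 46)

# Retrieve one item with [i]; the first index is 0
first_items = (s[0], mylist[0], t[0])

# Retrieve a slice with [start:end]; the end index is not included
s_slice = s[2:5]
list_slice = mylist[1:3]

# Add a step with [start:end:step]
every_other = s[0:8:2]

# Search with "in"
has_c = 'c' in s
has_99 = 99 in mylist

# Iterate with a "for" loop; this works the same way on all three types
for one_item in t:
    print(one_item)

print(first_items, s_slice, list_slice, every_other, has_c, has_99)
```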


# What about arrays?

An array is a data structure in many different programming languages.  It is an ordered collection, often with an index starting at 0.  (Sometimes, in some languages, it starts with 1, but that's not important.)

So, are lists arrays?  The answer is: No.  Why not?  Because in the traditional definition, an array is:

1. All of one type.
2. Unchanging in size from when it was first created.

Lists violate both of these rules, so they aren't arrays.
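
Here is a small sketch of a list breaking both rules. The values are made up for the example.

```python
mylist = [10, 20, 30]        # starts with three integers
original_length = len(mylist)

mylist.append('hello')       # rule 1 broken: a string joins the integers
mylist.append(4.5)           # ... and so does a float

new_length = len(mylist)     # rule 2 broken: the list has grown

print(mylist)
print(original_length, new_length)
```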

Do we have arrays in Python?  Do we need arrays in Python?

And the answer is... there is an `array` type in Python's standard library (in the `array` module), but it's rarely used.

In the world of data science, we use NumPy and Pandas, two add-on extension packages.  NumPy defines arrays that are *very* popular, and Pandas builds its data structures on top of them.  But they're not core to the language.

# Dictionary

If you've used other programming languages before, then you might be familiar with "dicts" (as they're known in Python) by another name:

- Hash maps
- Hashes
- Maps
- Associative arrays
- Hash tables
- Key-value stores
- Name-value stores

The ideas behind a dictionary are:

- Data comes in pairs (known as a key and value, or a name and value)
- You can decide what the keys and values are
- Nearly any type of data can be a key
- Any type of data can be a value

It turns out that a very large number of programming problems can be solved with key-value pairs.

In [4]:
# let's define a dict:

# (1) we use {} around the dictionary
# (2) The keys and values are separated by :
# (3) Each key-value pair is separated from other pairs by ,
# (4) All keys have values, all values have keys

d = {'a':10, 'b':20, 'c':30}

# first pair: key is the string 'a', value is the int 10
# second pair: key is the string 'b', value is the int 20
# third pair: key is the string 'c', value is the int 30


In [2]:
# let's ask Python: What is d?
type(d)

dict

In [3]:
# how many key-value pairs are there in d?
len(d)

3

In [6]:
# what happens if I don't put quotes around 'a', 'b', and 'c'

d = {a:10, b:20, c:30}

# first pair: key is the value in the variable a, value is the int 10
# second pair: key is the value in the variable b, value is the int 20
# third pair: key is the value in the variable c, value is the int 30


NameError: name 'a' is not defined

In [7]:
d

{'a': 10, 'b': 20, 'c': 30}

In [8]:
# we can retrieve from a dict with [], just like with strings, lists, and tuples
# but now we don't use the index (starting at 0), but rather the key

d['a']

10

In [9]:
d['b']

20

In [10]:
d['c']

30

In [11]:
# what happens if I retrieve a key that doesn't exist?
d['x']

KeyError: 'x'

In [12]:
# how can I check to see if a key is in a dict?
# I can use "in" to look for that.
# ("in" only looks at the keys, not at the values)

'a' in d

True

In [13]:
'b' in d

True

In [14]:
'c' in d

True

In [15]:
'x' in d

False

In [16]:
k = 'a'

print(d[k])  # retrieve the value associated with the key k, where k is a variable whose value is 'a'

10


In [20]:
k = input('Enter a key: ').strip()

print(d[k])

Enter a key: x


KeyError: 'x'
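
One way to avoid that `KeyError` is to check with `in` before retrieving. This is a minimal sketch: instead of calling `input`, it loops over a list of made-up keys that stand in for what a user might type.

```python
d = {'a': 10, 'b': 20, 'c': 30}

results = []
for k in ['a', 'x']:        # stand-ins for user input
    if k in d:              # only retrieve if the key exists
        results.append(f'{k}: {d[k]}')
    else:
        results.append(f'{k} is not a key in the dict')

for one_result in results:
    print(one_result)
```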

# Dict rules

1. Keys must be *immutable* types, normally meaning integers and strings.  (Tuples can be keys, too, but that's much rarer.)
2. Values can be any type at all, without any exceptions.
3. Keys are unique. A key can exist only once in a dict.  Values can recur as often as we want.
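
The three rules can be sketched like this. The dict names and contents are invented for the example, and the exact wording of the error message may differ between Python versions.

```python
# Rule 1: a mutable type, such as a list, cannot be a key
try:
    bad = {[1, 2]: 'a list as a key'}
    error_message = ''
except TypeError as e:
    error_message = str(e)
print(error_message)

# Rule 2: values can be anything, including lists, tuples, and other dicts
anything = {'numbers': [1, 2, 3], 'pair': (10, 20), 'nested': {'x': 100}}

# Rule 3: keys are unique; if a key appears twice, the last value wins
d = {'a': 10, 'a': 99}
print(len(d), d['a'])

# ... but values can repeat as often as we want
same_values = {'a': 10, 'b': 10, 'c': 10}
```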

In [21]:
# one use for a dict is as a small database

person = {'first':'Reuven', 'last':'Lerner', 'shoesize':46}

In [22]:
person['first']

'Reuven'

In [23]:
person['last']

'Lerner'

In [24]:
person['shoesize']

46

In [25]:
# this is much easier to read (I think!) than a list or tuple with indexes 0, 1, and 2

In [30]:
# tuples are indexed starting at 0, just like strings and lists
t = ('Reuven', 'Lerner', 46)

In [27]:
t[0]

'Reuven'

In [28]:
t[1]

'Lerner'

In [29]:
t[2]

46

In [31]:
# could I create a dict in which the keys are numbers? Sure!

months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May'}

In [32]:
months[5]

'May'

In [33]:
# can we create a dictionary of tuples? Yes!
# tuples are immutable, so they could be the keys and/or the values

In [34]:


cities = {(40.7127837, -74.0059413): 'New York',
          ( 34.0522342, -118.2436849): 'Los Angeles'}

In [35]:
cities

{(40.7127837, -74.0059413): 'New York',
 (34.0522342, -118.2436849): 'Los Angeles'}
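
Retrieving from this dict works the same way as before, except that the key inside `[]` is a whole tuple. This is a short sketch; the variable `location` is just a name chosen for the example.

```python
cities = {(40.7127837, -74.0059413): 'New York',
          (34.0522342, -118.2436849): 'Los Angeles'}

location = (40.7127837, -74.0059413)   # a tuple of (latitude, longitude)
found_city = cities[location]          # the whole tuple is the key
print(found_city)

# "in" works with tuple keys, too
print((0.0, 0.0) in cities)
```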

In [36]:
# dict keys can be any immutable type -- numbers (int, float), strings, or tuples
# dict values can be any type at all, including both immutable and mutable types

In [None]:
# in this dict, months, the keys are integers (1, 2, 3, 4, etc. ) and the values are strings

months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May'}

Dicts defined with `{}` will *always* look like this:

{key1:value1, key2:value2, key3:value3}


In [37]:
# could I do this?

a=1
b=2
c=3

months = {a:'January', b:'February', c:'March'}



In [38]:
months

{1: 'January', 2: 'February', 3: 'March'}

In [39]:
months[1]

'January'

In [40]:
months[a]  # a is a variable with the value 1

'January'

In [41]:
months['a']   # does the "months" dict have a key 'a'?  I don't think so...

KeyError: 'a'

# Paradigms of dictionary use

1. Create a dict at the start of the program, and use it as a small database inside of your program.

(More paradigms coming soon!)

# Exercise: Restaurant

1. Define a dict, `menu`, that contains the items and prices on a restaurant menu.
2. Ask the user, repeatedly, to order something on the menu.
    - If the user enters an empty string, then stop asking, and print the total price
    - If the user enters something on the menu, then print the price and the current total
    - If the user enters something *not* on the menu, then scold them and let them try again
    
Example:

    Order: sandwich
    sandwich is 10, total is 10
    Order: tea
    tea is 7, total is 17
    Order: elephant
    we are out of elephant today!
    Order: [ENTER]
    total is 17

In [42]:
menu = {'sandwich':10, 'tea':7, 'apple':3, 'cake':5}

In [43]:
len(menu)

4

In [44]:
menu['sandwich']

10

In [45]:
menu['tea']

7

In [46]:
'sandwich' in menu   # is the string 'sandwich' a key in menu?

True

In [47]:
menu = {'sandwich':10, 'tea':7, 'apple':3, 'cake':5}
total = 0

while True:   # ask the user, repeatedly

    order = input('Order: ').strip()
    
    if order == '':   # did we get an empty string? break out of the while loop
        break
        
    if order in menu:   # is the user's order a key in our dict?
        price = menu[order]   # get the price
        total += price         # add the price to the total
        print(f'{order} costs {price} -- total is now {total}')
    else:
        print(f'We are out of {order} today')
        
print(f'total is {total}')          

Order: sandwich
sandwich costs 10 -- total is now 10
Order: tea
tea costs 7 -- total is now 17
Order: elephant
We are out of elephant today
Order: apple
apple costs 3 -- total is now 20
Order: 
total is 20


In [49]:
# we can get the keys of a dict with the .keys method
# (pretty rare to use this)

menu.keys()

dict_keys(['sandwich', 'tea', 'apple', 'cake'])

In [50]:
# we can get the values from a dict with the .values method
# (more common)

menu.values()

dict_values([10, 7, 3, 5])

In [51]:
10 in menu.values()

True

In [54]:
# To learn how to sort dictionaries, search YouTube for "How to sort anything",
# a talk that I gave at EuroPython
# (includes advanced techniques!)

In [55]:
d = {'a':10, 'b':20, 'c ':30} # notice the third key!

In [56]:
d['a']

10

In [57]:
d['b']

20

In [58]:
d['c']

KeyError: 'c'
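
The `KeyError` happens because `'c '` (with a trailing space) and `'c'` are different strings, and thus different keys. One possible cleanup, sketched here, is to build a new dict whose keys have been passed through `str.strip`. The name `clean` is just for this example.

```python
d = {'a': 10, 'b': 20, 'c ': 30}   # the third key has a trailing space

print('c' in d)     # the key without the space is not there
print('c ' in d)    # the key with the space is

# build a new dict, stripping whitespace from each key
clean = {}
for one_key in d:
    clean[one_key.strip()] = d[one_key]

print(clean['c'])
```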

In [59]:
# dictionaries are mutable!  (We can change them)

In [61]:
d = {'a':10, 'b':20, 'c':30}

# I can update/change a value by assigning to it
# assigning to an existing key updates the value for that key
d['c'] = 12345

d

{'a': 10, 'b': 20, 'c': 12345}

In [62]:
# can I add new key-value pairs to my dict? Yes!
# there is no "append" method for dicts -- you just assign to a new key

d['x'] = 98765

In [63]:
d

{'a': 10, 'b': 20, 'c': 12345, 'x': 98765}

In [64]:
# updating a value in a dict, and adding a new key-value pair to a dict are both done with assignment

In [65]:
# can we remove key-value pairs from a dict?
# yes, but you should know that it's kind of rare to do this

d.pop('x')   # (a) removes the key 'x' and (b) returns its value

98765

In [66]:
d

{'a': 10, 'b': 20, 'c': 12345}

In [67]:
# I have a dict with three key-value pairs

d = {'a':10, 'b':20, 'c':30}

d

{'a': 10, 'b': 20, 'c': 30}

In [68]:
# I want to remove the key-value pair associated with the key 'c'
# I invoke the "pop" method, passing the key 'c'

# this will:
# (1) remove the key-value pair 'c':30
# (2) return the value 30 which was associated with 'c'

d.pop('c')   

30

In [69]:
d

{'a': 10, 'b': 20}

In [70]:
d.pop('c')  # can we remove the key 'c' once again?

KeyError: 'c'

# Everything is via the key!

In a dict, the key is king!

- Keys must be unique
- Searching in a dict is via the key
- We add key-value pairs via the key (and the value is brought along for the ride)
- We update key-value pairs via the key
- We remove key-value pairs via the key

In [71]:
# values can repeat, even though keys cannot

d = {'a':10, 'b':10, 'c':10, 'd':10}

In [72]:
# if I could remove a key via a value, how would that work?
# which of the keys would be removed if I could say

d.remove_by_value(10)   # this does not exist... but if it did, what would happen?



AttributeError: 'dict' object has no attribute 'remove_by_value'
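
If you really did want to remove by value, you would have to decide for yourself what to do about repeats. Here is a sketch of one possible answer, which removes *every* pair with the matching value. The dict is a variation on the one above, so that one pair survives.

```python
d = {'a': 10, 'b': 10, 'c': 20, 'd': 10}

# first, collect the keys whose value is 10
# (we can't remove pairs while iterating over the dict itself)
keys_to_remove = []
for one_key in d:
    if d[one_key] == 10:
        keys_to_remove.append(one_key)

# then remove each pair, via its key
for one_key in keys_to_remove:
    d.pop(one_key)

print(keys_to_remove)
print(d)
```

Even here, the actual removal is done via the key, with `pop`.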

In [73]:
menu = {'sandwitch': 10, 'Tea': 7, 'apple': 3, 'cake': }

SyntaxError: expression expected after dictionary key and ':' (3560276099.py, line 1)

In [74]:
d = {'a':10, 'b':20, 'c':None, 'd':None, 'e':50}

In [76]:
print(d['c'])

None


# Next up

1. Using dicts to accumulate data in a program
2. Using dicts to track data when we don't know either the keys or the values in advance!
3. Looping over dicts

In [77]:
d = {'a':10, 'b':20, 'c':30}

d['a']

10

In [78]:
d['a'] = 20
d

{'a': 20, 'b': 20, 'c': 30}

In [79]:
# can I do this?
d['a'] += 1 

In [80]:
d

{'a': 21, 'b': 20, 'c': 30}

In [81]:
# If I want, I can create a dict with keys and 0s as values, and then accumulate
# information over time in my program.  At the end of the program, I can print the 
# dict, which will contain all of the counts.

# Paradigms for dict uses

1. Define the dict at the start of the program, and use it as a read-only database.
2. Define the dict at the start of the program with keys and 0s as values.  Over the course of the program, add to those numbers.  The keys will remain the same, but the values will change.



# Exercise: Vowels, digits, and others

1. Define a `counts` dict with three keys: `vowels`, `digits`, and `others`.  All three should have the value 0.
2. Ask the user, repeatedly, to enter a string.
    - If they enter an empty string, stop asking, and print `counts`.
3. Go through each character in the entered string, and check:
    - Is it a vowel? If so, add 1 to `counts['vowels']`
    - Is it a digit? If so, add 1 to `counts['digits']`
    - If neither, add 1 to `counts['others']`.
    
Example:

    Enter string: hello
    Enter string: bye 123
    Enter string: [ENTER]
    {'vowels':3, 'digits':3, 'others':6}

In [82]:
counts = {'vowels':0, 'digits':0, 'others':0}

while True:   # infinite loop
    s = input('Enter a string: ').strip()
    
    if s == '':   # if the user enters an empty string, stop asking
        break
        
    # go through each character in the user's input
    for one_character in s:
        if one_character in 'aeiou':  # if it's a vowel
            counts['vowels'] += 1
        elif one_character.isdigit():  # if it's a digit
            counts['digits'] += 1
        else:
            counts['others'] += 1
            
print(counts)    

Enter a string: hello
Enter a string: bye 123
Enter a string: 
{'vowels': 3, 'digits': 3, 'others': 6}


In [84]:
x = 100  # behind the scenes, this is stored in a dict, as 'x':100

# Namespaces

A "namespace" is a programming term for a bunch of variables that are grouped together, sort of like a last name / surname.  A namespace ensures that there won't be "namespace collisions," when more than one part of a program tries to use the same variable name.

So if you have a variable `x` and I have a variable `x`, we can ensure they won't collide by having separate namespaces.

We've effectively created our own, new, simple namespace here by putting `vowels`, `digits`, and `others` into a dict, rather than creating variables at the top level of our program.
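
As a rough illustration of the idea, here are two dicts acting as two separate namespaces. The names `my_vars` and `your_vars` are invented for this sketch.

```python
my_vars = {'x': 100}         # my "x"
your_vars = {'x': 'hello'}   # your "x", with the same name but in a separate dict

my_vars['x'] += 1            # changing my x doesn't affect yours

print(my_vars['x'])
print(your_vars['x'])
```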

In [85]:
# Another way to do vowels, digits, and others

# this time, I won't count how many vowels, digits, and others I have
# rather, I'll store them in lists!

counts = {'vowels':[], 'digits':[], 'others':[]}

while True:   # infinite loop
    s = input('Enter a string: ').strip()
    
    if s == '':   # if the user enters an empty string, stop asking
        break
        
    # go through each character in the user's input
    for one_character in s:
        if one_character in 'aeiou':  # if it's a vowel
            counts['vowels'].append(one_character)
        elif one_character.isdigit():  # if it's a digit
            counts['digits'].append(one_character)
        else:
            counts['others'].append(one_character)
            
print(counts)    

Enter a string: hello
Enter a string: bye 123
Enter a string: 
{'vowels': ['e', 'o', 'e'], 'digits': ['1', '2', '3'], 'others': ['h', 'l', 'l', 'b', 'y', ' ']}


# Dict paradigms

1. Define the dict, and use it as a read-only database in the program.
2. Define the dict, and update its values but keep the keys as they are.
3. Define an empty dict, and add both keys and values as necessary.

In [86]:
# how often does each character appear?

counts = {}   # empty dict!

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in counts:      # have we already seen this character before?
        counts[one_character] += 1   # add 1 to its count
    else:
        counts[one_character] = 1    # first time seeing it? set it to 1
        
print(counts)        

Enter a string: hello
{'h': 1, 'e': 1, 'l': 2, 'o': 1}


# Exercise: Rainfall

We're going to ask the user to enter the name of a city, and how many mm of rain fell in that city yesterday.  We'll keep track of the rainfall in a dict, called `rainfall`, in which the keys are city names and the values are integers.  Over time, we'll know how much rain fell in each city.

1. Set up an empty dict, `rainfall`.
2. Ask the user, repeatedly, to enter the name of a city.
3. If the city name is empty, then stop asking (i.e., break out of the loop).
4. If we got a city name, then ask the user to enter the number of mm rain that fell.  (Assume this is a legal number.)
5. If we've seen the city before, then add this new amount of rain to the existing amount.
6. If we haven't seen this city before, then add a new key-value pair to our dict, the city and the amount of rain.
7. After exiting from the loop, print the `rainfall` dict.

Example:

    City: Jerusalem
    Rain: 5
    City: Tel Aviv
    Rain: 4
    City: Jerusalem
    Rain: 3
    City: [ENTER]
    {'Jerusalem':8, 'Tel Aviv':4}

In [90]:
rainfall = {}   # empty dict

while True:     # I don't know how many cities/reports will be entered
    
    city_name = input('Enter city name: ').strip()
    
    if city_name == '':   # stop asking if we got an empty city name
        break
        
    mm_rain = input('Enter mm rain: ').strip()  # remember: input always gives us a string!
    mm_rain = int(mm_rain)                      # get an integer based on mm_rain
    
    if city_name in rainfall:
        rainfall[city_name] += mm_rain    # we've seen this city before, so add to what's already there
    else:
        rainfall[city_name] = mm_rain     # first time we see a city, assign
    
print(rainfall)    

Enter city name: Jerusalem
Enter mm rain: 5
Enter city name: Tel Aviv
Enter mm rain: 4
Enter city name: Jerusalem
Enter mm rain: 3
Enter city name: 
{'Jerusalem': 8, 'Tel Aviv': 4}
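
The same accumulating logic can be tried without typing anything in. In this sketch, a list of (city, mm) tuples stands in for the user's input; the list `reports` is an assumption made for the example, using the same numbers as above.

```python
# each report is a tuple of (city name, mm of rain)
reports = [('Jerusalem', 5), ('Tel Aviv', 4), ('Jerusalem', 3)]

rainfall = {}   # empty dict

for city_name, mm_rain in reports:
    if city_name in rainfall:
        rainfall[city_name] += mm_rain   # seen before: add to the existing amount
    else:
        rainfall[city_name] = mm_rain    # first time: add a new key-value pair

print(rainfall)
```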


# Next up

1. Dicts
    - Looping over dicts
    - How do dicts work? (Why are they so beloved?)
2. Files
    - Opening files
    - Reading from files
    - Iterating over files
    - Writing to files 
    
(If you haven't yet downloaded the zipfile with some example files, please do that.)

In [91]:
# Another way to do vowels, digits, and others

# this time, I won't count how many vowels, digits, and others I have
# rather, I'll store them in lists!

# dict keys: strings
# dict values: lists
counts = {'vowels':[], 'digits':[], 'others':[]}

while True:   # infinite loop
    s = input('Enter a string: ').strip()
    
    if s == '':   # if the user enters an empty string, stop asking
        break
        
    # go through each character in the user's input
    for one_character in s:
        if one_character in 'aeiou':  # if it's a vowel
            counts['vowels'].append(one_character)
        elif one_character.isdigit():  # if it's a digit
            counts['digits'].append(one_character)
        else:
            counts['others'].append(one_character)
            
print(counts)    

Enter a string: hello
Enter a string: bye 123!
Enter a string: 
{'vowels': ['e', 'o', 'e'], 'digits': ['1', '2', '3'], 'others': ['h', 'l', 'l', 'b', 'y', ' ', '!']}


In [92]:
# can I count the number of characters in each value?
# answer: yes! These are lists

len(counts['vowels'])

3

In [93]:
len(counts['digits'])

3

In [94]:
len(counts['others'])

7

# Brief overview of sets

Sets are a core data type in Python (that we're not really going to discuss here).  A set is basically the same as a dict's keys, without any values.  Meaning: The elements are guaranteed to be unique, and I can search for them.  You can add elements to a set and remove them from it, as well.

Sets are often used when you want to ensure that you have unique values, or be able to search for things (e.g., usernames or IP addresses) quickly.
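
A very short sketch of those ideas follows. The usernames are made up for the example.

```python
# duplicates are dropped automatically: 'alice' is stored only once
usernames = {'alice', 'bob', 'alice'}
initial_size = len(usernames)

usernames.add('carol')      # add an element
usernames.remove('bob')     # remove an element

print(initial_size)
print('alice' in usernames)   # search with "in", just like with dict keys
print('bob' in usernames)
```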

# Looping over dicts

If I run a `for` loop over a data structure, what I get back depends on the data structure:

- Each iteration over a string gives me one character
- Each iteration over a list or tuple gives me one element

In [95]:
s = 'abcd'
for one_item in s:
    print(one_item)

a
b
c
d


In [96]:
mylist = [10, 20, 30]
for one_item in mylist:
    print(one_item)

10
20
30


In [97]:
mylist = ['abcd', 'ef', 'ghij']

for one_item in mylist:
    print(one_item)

abcd
ef
ghij


In [98]:
# what happens if I iterate over a dict?

d = {'a':10, 'b':20, 'c':30}

for one_item in d:   # I get the keys, and *only* the keys!
    print(one_item)

a
b
c


In [99]:
for one_key in d:
    print(f'{one_key}: {d[one_key]}')

a: 10
b: 20
c: 30


In [100]:
rainfall

{'Jerusalem': 8, 'Tel Aviv': 4}

In [101]:
for one_city in rainfall:
    print(f'{one_city}: {rainfall[one_city]}')

Jerusalem: 8
Tel Aviv: 4


In [102]:
counts

{'vowels': ['e', 'o', 'e'],
 'digits': ['1', '2', '3'],
 'others': ['h', 'l', 'l', 'b', 'y', ' ', '!']}

In [103]:
for one_key in counts:
    print(f'{one_key}')
    for one_item in counts[one_key]:
        print(f'\t{one_item}')

vowels
	e
	o
	e
digits
	1
	2
	3
others
	h
	l
	l
	b
	y
	 
	!


In [104]:
# to learn how to sort dictionaries, see my EuroPython talk from 2021,
# "How to sort anything"

# https://www.youtube.com/watch?v=Z3c2LvEJeu0
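
As a small taste, and not a summary of the talk, here is a sketch of the simplest case: the builtin `sorted` function, given a dict, returns a list of the dict's keys in sorted order. The dict here is defined out of order on purpose.

```python
d = {'b': 20, 'c': 5, 'a': 10}

sorted_keys = sorted(d)             # a new list of the keys, sorted; d itself is unchanged
print(sorted_keys)

# iterate over the keys in sorted order, retrieving each value via its key
for one_key in sorted(d):
    print(f'{one_key}: {d[one_key]}')

sorted_values = sorted(d.values())  # the values can be sorted, too
print(sorted_values)
```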

In [106]:
# another (I think better) way to iterate over a dict
# the "items" method is designed for use in a "for" loop
# it returns a key-value tuple with each iteration
# we can then pull that apart with unpacking

for t in d.items():
    print(t)

('a', 10)
('b', 20)
('c', 30)


In [107]:
for key, value in d.items():   # use unpacking in the for loop to assign (key, value) to 2 variables
    print(f'{key}: {value}')

a: 10
b: 20
c: 30


In [108]:
print(d)

{'a': 10, 'b': 20, 'c': 30}


In [109]:
# you might know that Python lets us multiply an integer by a string

'x' * 3

'xxx'

In [110]:
'x' * 10

'xxxxxxxxxx'

In [112]:
# we can create a simple histogram! 

for key, value in d.items():
    print(f'{key}: {value * "x"}')

a: xxxxxxxxxx
b: xxxxxxxxxxxxxxxxxxxx
c: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


In [114]:
for city, mm_rain in rainfall.items():
    print(f'{city:12}: {mm_rain * "x"}')

Jerusalem   : xxxxxxxx
Tel Aviv    : xxxx


In [115]:
# don't do this!

for one_key in d.keys():   # why do this, when we can just say "for one_key in d"?
    print(one_key)

a
b
c
