# Agenda: Week 3

1. Questions
2. Dictionaries
    - What are they?
    - How do we create them?
    - How do we use them?
    - Different paradigms for dict usage
3. Files
    - What are files?
    - How can we read from a file?
    - Iterating over files
    - Turning files into data structures
    - (A little bit) writing to files, and what's involved there

# The story so far

We've seen a bunch of things so far:

- Python's nouns are values
- We can assign any value to a variable (a name that lets us keep track of it)
- Every value has a type
    - Integers
    - Floats
    - Strings
    - Lists
    - Tuples
- Each data structure is useful for different things
    - Faster/slower at retrieval and storage
    - Can we modify it? (Is it mutable?)
    - What can we store in it?

The default data structure we use for storing (and retrieving) is the *list*:
- Can store anything at all (although it's traditional for all values to be of the same type)
- Items are stored and retrieved via a numeric index, starting at 0
- We can modify a list (i.e., it is *mutable*):
    - We can update/change an existing value by assigning to that index
    - We can add one or more new items to a list, typically at the end, using some methods, including `list.append`
    - We can remove one or more items from a list, typically from the end, using some methods, including `list.pop`
- You can iterate over the elements of a list, one at a time
- What's wrong with lists?
    - Because we can modify them, they always keep some extra space around in memory, so that the computer doesn't need to allocate new memory each time we add something
    - If we want to search for a value in a list, we need to iterate over it, one element at a time (this can be slow, if we have a big list)

(Both lists and tuples are ordered -- the elements remain in whatever order you put them.)

The first problem is solved, to some degree, by *tuples*.
- Tuples are like lists, but they are *immutable*: they cannot be changed
- As a result, they have no extra memory -- they are as compact and efficient as something can be in Python
- As a result, tuples are often used in Python to represent a collection of data
    - A person, who has a name, birthdate, and height -- three different types of values, appropriate for a tuple
    - A database record, which has several fields of different types
- Even if you won't personally use tuples much, because they are so efficient, Python uses them a *lot* behind the scenes.
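
As a rough illustration of the "person" example above, here is a minimal sketch. The name, birthdate, and height are made-up values, not data from the course.

```python
# A hypothetical person record: name (str), birthdate (str), height in meters (float)
person = ('Alice', '1990-04-12', 1.68)

# Each position has its own meaning, so we can retrieve a field by its index...
print(person[0])

# ...or unpack all three fields into variables in one step
name, birthdate, height = person
print(f'{name} was born on {birthdate} and is {height} m tall')

# Tuples are immutable, so uncommenting the next line would raise a TypeError
# person[2] = 1.70
```

Because each position holds a different kind of information, the order of the fields is part of the record's meaning.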

The second problem (searching) is solved, in no small part, by dictionaries.



# Lists vs. tuples

It's true that lists are mutable and tuples are immutable.  You can never change the length or values in a tuple. You can always change the values and length of a list.

HOWEVER, you don't choose between a list and tuple based on whether you plan to change the values. Rather, you choose whether to use a list or tuple based on what you're storing inside of it:

- If all of the elements are of the same type (integer, string, etc.), then use a list.
- If the elements are of different types, then use a tuple.

We don't choose tuples because they're safer, or cannot be changed.

If you have districts (all strings) and subdistricts (all strings), then you would likely use two lists. OR as we will see in the next few hours, you might use a dict in which the districts are the keys and the subdistricts are the values.

# What is a dictionary?

Dictionaries aren't unique to Python. However, other languages call them other things:

- Hashes
- Hashmaps
- Hash tables
- Maps
- Associative arrays
- Key-value stores
- Name-value stores

Think of a list in which the indexes can be *anything* at all. That is, we won't use 0, 1, 2, 3, etc., but we can use integers (whatever we want) or strings (whatever we want).

Suddenly, we aren't asking for the letter at index 5. Rather, we're asking for the number at index `height`. We can determine what the indexes are, although we call it a "key" in the world of dictionaries.

If you think of a dictionary ("dict") as a two-column table in which the first column contains the key (i.e., the index) and the second column contains the value, then you basically have the right mental model.

A few rules for dicts:

- The keys can be any *immutable* type. That means, for the most part, keys will be integers and strings.
- The values can be of any type whatsoever, with no exception.
- The keys must be unique in a given dictionary. You cannot have the same key repeat!
- Every key has a value, and every value has a key.

How do we create a dict? How do we use a dict?

We create it using `{}`, with each key separated from its value with `:`. Pairs are separated with commas.

In [1]:
# this is a very simple dictionary, with three key-value pairs

d = {'a':10, 'b':20, 'c':30}

In [2]:
# what type of value is in d?

type(d)

dict
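
The rules for dicts listed earlier can be tried out directly. The following is a small sketch; the dict names and contents are made up for the example.

```python
# Rule: keys must be unique. If a key repeats in {}, the last value wins.
d = {'a': 10, 'b': 20, 'a': 99}
print(d)         # {'a': 99, 'b': 20}
print(len(d))    # 2

# Rule: keys must be immutable. A tuple works as a key...
places = {(0, 0): 'origin'}
print(places[(0, 0)])   # origin

# ...but a list does not: uncommenting the next line would raise a TypeError
# bad = {[0, 0]: 'origin'}

# Rule: values can be of any type, including lists and tuples
mixed = {'number': 1, 'words': ['x', 'y'], 'pair': (3, 4)}
print(mixed['words'])   # ['x', 'y']
```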

# Retrieving values from a dict

To retrieve a value from a dictionary, we use `[]`, just like with a string, list, or tuple. Inside of the `[]`, we put the key whose value we want to retrieve.

The key must be *precisely* the same as when it was stored. If we used a string, then the capitalization, spelling, and whitespace must all be identical.

If we ask for a key that doesn't exist, we'll get a `KeyError` error.

You always use a key to get a value. You cannot go in the other direction, namely using a value to get a key.

In [3]:
d['a']   # this will return the value associated with the key 'a' in our dictionary

10

In [4]:
d['b']

20

In [5]:
d['hello']   # is there a key 'hello' in d? No, so we'll get an error...

KeyError: 'hello'

# Where do I use dicts?

There is a huge number of problems in programming that involve two parallel sets of information, which we can use as keys and values:

- Month names and month numbers
- Month numbers and month names
- User IDs and user names
- User IDs and complete user records as tuples
- Filename and file contents
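
To give one of these pairings a concrete shape, here is a sketch of "user IDs and complete user records as tuples." The IDs, usernames, and other field values are hypothetical.

```python
# Keys are integer user IDs; values are tuples of (username, email, age)
users = {101: ('alice', 'alice@example.com', 30),
         102: ('bob', 'bob@example.com', 25)}

# Use the key (the user ID) to get the whole record back
record = users[102]
print(record)

# Unpack the tuple to get at the individual fields
username, email, age = record
print(f'User 102 is {username} ({email}), age {age}')
```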


In [6]:
# another example of a dict
# month numbers and names

months = {1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May'}

In [7]:
months[1]   # give me the value associated with the key 1

'Jan'

In [8]:
k = 4

months[k]   # I can use a variable to retrieve a value

'Apr'

In [9]:
d

{'a': 10, 'b': 20, 'c': 30}

In [10]:
k = 'c'

d[k]

30

In [11]:
# what if I want to avoid errors?
# how can I check if a key is in a dict?

# we use "in" to look in the keys (not in the values)

'a' in d  # this means: is 'a' a key in d?

True

In [13]:
if 'a' in d:
    print(d['a'])
else:
    print(f'a is not a key in d')

10


In [14]:
if 'qrst' in d:
    print(d['qrst'])
else:
    print(f'qrst is not a key in d')

qrst is not a key in d


In [15]:
# let's create a dict, months, in which integers are the keys
# and strings are the values

months = {1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May'}

In [16]:
months[1]   # use the key (1) to get the value

'Jan'

In [17]:
months['Jan']  # what if I try to retrieve via the value?

KeyError: 'Jan'

In [18]:
# Moreover, you didn't put quotes around your text:

months[Jan]   # it'll look for a variable named Jan, which doesn't exist

NameError: name 'Jan' is not defined
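
Since a dict only works from key to value, looking things up in the other direction takes a second dict with the roles swapped. Here is a minimal sketch; the name `month_numbers` is made up for the example.

```python
# Keys are month numbers, values are month names
months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May'}

# A second dict going the other way: keys are names, values are numbers
month_numbers = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5}

print(months[3])              # Mar
print(month_numbers['Mar'])   # 3
```

This is why "month names and month numbers" and "month numbers and month names" count as two separate uses of a dict.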

# Exercise: Restaurant

1. Define `total` to be 0. This will be the total bill for eating at the restaurant.
2. Define a dict, `menu`, whose keys are strings (the items on the menu) and whose values are integers (the prices of those items).
3. Ask the user, repeatedly, to order something on the menu. (Here, you'll use a `while True` loop.)
    - If the user enters an empty string, then stop asking (i.e., use `break` to exit the loop)
    - If the user's order is on the menu (i.e., if what they entered is a key in our `menu` dict), then add the price to `total`, and print the item, the price, and the new total
    - If the user's order is *not* on the menu (i.e., if their entered order is *not* a key, which you can check, with `in`), then scold them a bit.
4. At the end, print the total.

Example:

    Order: sandwich
    sandwich is 10, total is now 10
    Order: tea
    tea is 8, total is now 18
    Order: elephant
    We're all out of elephant today!
    Order: [ENTER]
    Total is 18



In [21]:
total = 0

menu = {'sandwich':10, 'tea':8, 'apple':3, 'cake':10}

while True:    # infinite loop
    order = input('Order: ').strip()

    if order == '':   # did the user enter an empty string?
        break

    if order in menu:        # is the user's input string, order, a key in our dict?
        price = menu[order]  # get the price
        total += price
        print(f'{order} costs {price}, total is now {total}')

    else:    # if it wasn't a key in our menu dict
        print(f'Sorry, we are out of {order} today!')

print(f'Total is {total}')        

Order:  sandwich


sandwich costs 10, total is now 10


Order:  tea


tea costs 8, total is now 18


Order:  apple


apple costs 3, total is now 21


Order:  something else


Sorry, we are out of something else today!


Order:  


Total is 21


In [20]:
len(menu)   # how big is this menu?

4

In [22]:
# TT

total = 0
menu = { 'sandwich':10, 'tea': 8, 'coffee': 8}

while True:
    order = input('enter your order:').strip()
    if order in menu:
        total += menu[order]
        print(f'{order} is {menu[order]}, total is now {total}')
    elif order == '':
        break
    else:
        print(f'We\'re out of {order} today!')

print(f'Total is: {total}')

enter your order: sandwich


sandwich is 10, total is now 10


enter your order: 


Total is: 10


In [None]:
# CA

total = 0     # an integer, the total amount that the person has to pay
menu = {'soup': 10, 'tea': 2, 'beer': 5, 'steak': 20}
while True:
    order = input("What you want to order? ")   # a string, the item that the person wants to buy/eat
    if order in menu:      # if the user's request (order) is a key in the dict menu...
        order += total     # total += menu[order]
    elif order == '':
        print(total)
        break
    elif order != menu:    # order (a string) will never be equal to a dict (menu) 
        print("You need to decide!!")

In [None]:
# CA, rewritten 

total = 0     # an integer, the total amount that the person has to pay
menu = {'soup': 10, 'tea': 2, 'beer': 5, 'steak': 20}
while True:
    order = input("What you want to order? ")   # a string, the item that the person wants to buy/eat
    if order in menu:      # if the user's request (order) is a key in the dict menu...
        total += menu[order]
    elif order == '':
        print(total)
        break
    else:  # this means: the person's order is *not* a key in the "menu" dict
        print("You need to decide!!")

In [25]:
# SL

total=0 #Total of bill for eating at the restaurant
menu = {'pickles':10, 'balogna':15, 'drinks':8, 'fries':8, 'salad':15}

while True:
    order = input(f'Enter your order please from the following list {menu.keys()}')

    if order == '':
        break
    
    if order in menu:
        total += menu[order]
    else:
        print(f'We are out of {order} today')

print(f'Your total is {total}')

Enter your order please from the following list dict_keys(['pickles', 'balogna', 'drinks', 'fries', 'salad']) asdfafa


We are out of asdfafa today


Enter your order please from the following list dict_keys(['pickles', 'balogna', 'drinks', 'fries', 'salad']) 


Your total is 0


In [27]:
# CC

total = 0
menu = {"bread":1, "coffe":2, "muffin":3}

while True:
    new_order = input(f"What are you ordering?")

    if new_order == '':
        break
    
    if new_order in menu:
        total += menu[new_order]
    else:
        print(f'Your order, {new_order}, is not in stock')
    print("Anything else?")
print(total)

What are you ordering? asdfafafafsa


0


In [29]:
# CK

total = 0

menu = {'omelet':5,'coffee':2,'cheesecake':6,'tacos':4,'icecream':3}

while True:
    order = input('please place your Order: ')
    if order == '':
        break
    if order in menu:   # if order is a key in menu, then we can use menu[order]
        total += menu[order]
        print(f'adding {menu[order]}, total is {total}')
        continue


    else:  # here, we know that order is *not* a key in menu -- so asking for menu[order] is asking for trouble
        print(f'{menu[order]} is not a valid order')
print(f'your total is: {total}')

please place your Order:  omelet


adding 5, total is 5


please place your Order:  icecream


adding 3, total is 8


please place your Order:  asdfasfaf


KeyError: 'asdfasfaf'

In [None]:
# EP

total = 0

menu = { "burger": 3.50, "fries": 2.50, "drink": 1.75 }

question = input("What would you like to order? ")

while question.lower() != "no": 
    if question in menu: 
        total += menu[question] 
        print(f"Adding {question} to your order.") 
    else: 
        print("Sorry, we don't have that.") 

    question = input("Anything else? (Type 'no' to finish) ")

print(f"Your total is ${total:.2f}")

In [30]:
# EP, rewritten

total = 0

menu = { "burger": 3.50, "fries": 2.50, "drink": 1.75 }

while True:  
    question = input('Enter order: ').strip()

    if question == '':   # this is how we can leave the loop
        break
    
    if question in menu: 
        total += menu[question] 
        print(f"Adding {question} to your order.") 
        
    else: 
        print("Sorry, we don't have that.") 

print(f"Your total is ${total:.2f}")

Enter order:  burger


Adding burger to your order.


Enter order:  asdfasfsafa


Sorry, we don't have that.


Enter order:  


Your total is $3.50


In [31]:
# KM

total = 0
menu = {
    "sandwich": 10,
    "tea": 8,
    "salad": 7,
    "soup": 6,
    "coffee": 5
}

while True:
    order = input("Order: ").strip().lower()
    
    if order == "":
        break
    
    if order in menu:
        total += menu[order]
        print(f"{order} is {menu[order]}, total is now {total}")
    else:
        print("We're all out of that today!")

print(f"Total is {total}")

Order:  sandwich


sandwich is 10, total is now 10


Order:  asdfsafsafa


We're all out of that today!


Order:  


Total is 10


# Next up

1. Modifying/mutating dicts
2. Using dicts to keep track of running totals

# Dictionaries are mutable!

Like lists, dicts are mutable, which means:

- We can change the value associated with a key
- We can add new key-value pairs
- We can remove existing key-value pairs

In [32]:
d = {'a':10, 'b':20, 'c':30}     # 3 key-value pairs

In [33]:
# how can I change the value associated with a key?

d['b'] = 12345   # this assigns the value 12345 instead of whatever is already there.
d

{'a': 10, 'b': 12345, 'c': 30}

In [34]:
# we can similarly use += to update the value based on what's there

d['c'] += 15    # this is the same as saying d['c'] = d['c'] + 15
d

{'a': 10, 'b': 12345, 'c': 45}

In [35]:
# how can I add a new key-value pair?
# with a list, I add a new element with list.append

# in a dict, it's far easier -- I just assign to the key and value
# yes, this looks *precisely* the same as updating a value on an existing key

d['new'] = 9876

In [36]:
d

{'a': 10, 'b': 12345, 'c': 45, 'new': 9876}

In [37]:
d['new'] += 15   

d

{'a': 10, 'b': 12345, 'c': 45, 'new': 9891}

In [38]:
# what if I use += with a brand-new key?

d['other'] += 10  # what will happen here?

KeyError: 'other'

In [39]:
# we can remove an existing key-value pair by invoking dict.pop
# we pass the key to the method, and the key-value pair is removed
# the value is returned

d.pop('a')   # this will return a's value and remove the pair
d

{'b': 12345, 'c': 45, 'new': 9891}
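
The cell above shows the dict after the removal, but not the value that `dict.pop` hands back. Here is a small sketch that captures it; the variable name `removed_value` is made up for the example.

```python
d = {'a': 10, 'b': 20, 'c': 30}

removed_value = d.pop('a')   # removes the pair and returns its value
print(removed_value)         # 10
print(d)                     # {'b': 20, 'c': 30}

# dict.pop raises KeyError if the key isn't there, so check with "in" first
if 'zzz' in d:
    d.pop('zzz')
else:
    print('zzz is not a key in d')
```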

In [40]:
# SL asked, what if we have districts and subdistricts?
# we could have a dict in which the keys are strings and the values are lists of strings

d = {'dist1': ['sd1', 'sd2', 'sd3'],
     'dist2': ['sd4', 'sd5', 'sd6']}

d['dist1']

['sd1', 'sd2', 'sd3']

In [41]:
d['dist2']

['sd4', 'sd5', 'sd6']

In [42]:
# can I add a new subdistrict to dist2?

d['dist2'].append('sd7')

d

{'dist1': ['sd1', 'sd2', 'sd3'], 'dist2': ['sd4', 'sd5', 'sd6', 'sd7']}

In [45]:
# CK

a = 12345
d = {  'a' : a   }   # key is the string 'a', and the value is whatever was in the variable a 

d

{'a': 12345}

In [46]:
d['a'] +=d   # this means: add the dictionary d to the value of d['a']  -- you can't add a dict to an integer

TypeError: unsupported operand type(s) for +=: 'int' and 'dict'

In [47]:
print(d['a'])

12345


# Common dict paradigm: Accumulating

We've now seen that we can define a dict with keys and initial values (e.g., 0) and then add to those values over time. That's a pretty common use of dicts -- we don't, over the course of the program, add/remove keys. But we do update the values associated with them.

In [48]:
# odds and evens

counts = {'odds':0,
          'evens':0}

while True:
    s = input('Enter a number: ').strip()

    if s == '':   # if we got an empty string, stop asking
        break

    if not s.isdigit():     # can we turn this into an integer?
        print(f'{s} is not numeric; try again')
        continue

    n = int(s)   # get an integer based on s

    if n % 2 == 0:  # if dividing n by 2 has no remainder, it's even
        counts['evens'] += 1
    else:
        counts['odds'] += 1

print(counts)
        

Enter a number:  10
Enter a number:  15
Enter a number:  20
Enter a number:  22
Enter a number:  


{'odds': 1, 'evens': 3}


# Exercise: Vowels, digits, and others (dict edition)

1. Define a dict `counts` with three key-value pairs. The keys should be `vowels`, `digits`, and `others`. The values should all be 0.
2. Ask the user to enter a string.
3. Go through the string, one character at a time (with a `for` loop):
    - If it's a vowel, add 1 to the `vowels` value
    - If it's a digit, add 1 to the `digits` value
    - Otherwise, add 1 to `others`
4. Print the dict

In [49]:
counts = {'vowels':0, 'digits':0, 'others':0}

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in 'aeiou':
        counts['vowels'] += 1
    elif one_character.isdigit():
        counts['digits'] += 1
    else:
        counts['others'] += 1

print(counts)        

Enter a string:  hello everyone, 1234


{'vowels': 6, 'digits': 4, 'others': 10}


In [None]:
# SL

# this is a dict ('counts') with three key-value pairs
counts = {'vowels':0,  'digits':0, 'others':0}

string=input('Enter a string').lower()
for one_character in string:
    if one_character.lower () in 'aeiou':
        counts['vowels'] += 1
    elif one_character.isdigit():    # str.isdigit returns True/False
        counts['digits'] += 1
    else:
        counts['others'] += 1

print(counts)        

In [None]:
# NB

count = {'vowels':0, 'digits':0, 'others':0}

text = input("Enter a text: ").strip()

for s in text:
    if s.lower() in "aeiou":
        print(f"Found a vowel: {s}")
        count['vowels'] += 1
    elif s.isdigit():
        print(f"Found a digit: {s}")
        count['digits'] += 1
    else:
        print(f"Found other character: {s}")
        count['others'] += 1

print("*****************************************")
print(count)

In [50]:
# KM

counts = {'vowels':0,
          'digits':0,
          'others':0}


s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in 'aeiou':
        counts['vowels'] += 1
    elif one_character.isdigit():
        counts['digits'] += 1
    else:
        counts['others'] += 1
        
print(counts)

Enter a string:  hello? 123


{'vowels': 2, 'digits': 3, 'others': 5}


In [51]:
counts

{'vowels': 2, 'digits': 3, 'others': 5}

# Iterating over a dict

We've seen that we can use a `for` loop on many different types of values in Python:

- On a string, we get each character
- On a list, we get each element
- On a tuple, we get each element

What will happen if we iterate over a dict with a `for` loop?

In [52]:
# when you iterate over a dict, you get the *keys*!

for one_thing in counts:
    print(one_thing)

vowels
digits
others


In [53]:
# we can print all of the key-value pairs as follows:

for one_key in counts:
    print(f'{one_key}: {counts[one_key]}')

vowels: 2
digits: 3
others: 5


In [54]:
# I don't like to iterate over dicts this way, though
# because I need to mention the dict's name both in the "for" line and inside of the loop body
# isn't there a way for me to get both the key and the value with each iteration?

# yes: I can use the dict.items method, which returns a 2-element tuple (key, value) with each iteration

# in other words:

for key, value in counts.items():   # tuple unpacking in the for loop!
    print(f'{key}: {value}')

vowels: 2
digits: 3
others: 5


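You might see people using the `dict.keys()` method, either in a `for` loop or with `in` to search. It's unnecessary: iterating over a dict already gives you its keys, and `in` already searches the keys.

**DO NOT USE `dict.keys()`!!**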
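
As a quick check that `dict.keys()` adds nothing in these two cases, here is a small sketch using the same values as `counts` above. The list names are made up for the example.

```python
counts = {'vowels': 2, 'digits': 3, 'others': 5}

# Searching: "in" on the dict itself already looks at the keys
print('vowels' in counts)          # True
print('vowels' in counts.keys())   # True -- same answer, more typing

# Iterating: a for loop on the dict itself already gives the keys
keys_from_dict = []
for one_key in counts:
    keys_from_dict.append(one_key)

keys_from_method = []
for one_key in counts.keys():
    keys_from_method.append(one_key)

print(keys_from_dict == keys_from_method)   # True
```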

# Demo: Vowels, digits, and others (dict edition -- keeping the characters)

1. Define a dict `counts` with three key-value pairs. The keys should be `vowels`, `digits`, and `others`. The values should all be `[]`, the empty list.
2. Ask the user to enter a string.
3. Go through the string, one character at a time (with a `for` loop):
    - If it's a vowel, append it to `counts['vowels']`
    - If it's a digit, append it to `counts['digits']`
    - Otherwise, append it to `counts['others']`
4. Print the dict

In [55]:
counts = {'vowels':[],
          'digits':[],
          'others':[]}

text = input('Enter text: ').strip()

for one_character in text:
    if one_character in 'aeiou':
        counts['vowels'].append(one_character)  # appends one_character to counts['vowels']
    elif one_character.isdigit():
        counts['digits'].append(one_character)  # appends one_character to counts['digits']
    else:
        counts['others'].append(one_character)  # appends one_character to counts['others']

for key, value in counts.items():
    print(f'{key}: {value}')       
        

Enter text:  hello? 123


vowels: ['e', 'o']
digits: ['1', '2', '3']
others: ['h', 'l', 'l', '?', ' ']


In [56]:
x = 10

In [57]:
s = 'abcdEFGH'

s.lower()

'abcdefgh'

# Jupyter

- If you want to install it on your own computer, I made some videos for O'Reilly about how to install Python and Jupyter, on their own and via VSCode
- If you don't want to install it on your computer, there are a few options, but Google Colab is probably the easiest to work with -- free and a bit old, but stable.

# Next up

1. The third paradigm for using dicts: Start with nothing!
2. How do dicts work?
3. Files -- reading from them and a little writing to them

# Dict paradigm 3: Start with nothing

Another common way to use dicts is to start with an empty dict ( `{}` ) and then either add a new key-value pair (the first time you encounter a key) or update the value (if the key is already there).

Here, you have to check whether a key is already in the dict:
- If so, then just update the value
- If not, then add a new key-value pair, with the value being an initial one -- 0, 1, `[]`, or the like

In [59]:
# Example: Count characters

counts = {}  # empty dict!

text = input('Enter text: ').strip()  # running strip means that we remove leading/trailing whitespace

for one_character in text:
    # if I have already seen this character, then add 1 to the count for it
    if one_character in counts:
        counts[one_character] += 1     # add 1 to the existing value

    # if this is a new character, then add it to the dict as the key, with 1 as the value
    else:
        counts[one_character] = 1      # set an initial value of 1

for key, value in counts.items():
    print(f'{key}: {value}')   

Enter text:  hello out there


h: 2
e: 3
l: 2
o: 2
 : 2
u: 1
t: 2
r: 1


In [60]:
counts

{'h': 2, 'e': 3, 'l': 2, 'o': 2, ' ': 2, 'u': 1, 't': 2, 'r': 1}

# Exercise: Rainfall

The goal of this exercise is to:
- Let the user enter a city name
    - If they enter an empty string, stop asking
- If we got a city name, ask for the amount of rain (in mm) that fell in that city
- We'll use a dict to keep track of how much rain fell in each city
    - The dict's keys will be strings (names of cities)
    - The dict's values will be integers (mm rain)
- If the user gives us the same city twice, the second time should add to the first

1. Define an empty dict, `rainfall`
2. Ask the user repeatedly to enter a city name
    - If we got an empty string, exit from the loop
3. Ask the user to enter the mm of rain for that city
4. If we have seen this city before (i.e., its name is a key in the `rainfall` dict already), add to its existing rainfall
5. If this is a new city we haven't seen before (i.e., its name is not a key in `rainfall`), then add a new key-value pair -- the city name (key) and the mm rain (value)
6. At the end, iterate over `rainfall` and print every city and its total rainfall.

Example:

    City: Boston
    Rain: 5
    City: New York
    Rain: 4
    City: Boston
    Rain: 3
    City: [ENTER]
    Boston: 8
    New York: 4

In [64]:
rainfall = {}    # empty dict

while True:
    city_name = input('City: ').strip()

    if city_name == '':
        break

    mm_rain = input('Rain: ').strip()
    mm_rain = int(mm_rain)   # yes, it would be nice to check our values here -- we'll ignore it for now

    if city_name in rainfall:
        rainfall[city_name] += mm_rain   # we've seen this city before? add mm_rain to it
    else:
        rainfall[city_name] = mm_rain    # first time with this city's name? set up the new key-value pair

for key, value in rainfall.items():
    print(f'{key}: {value}')

City:  a
Rain:  5
City:  b
Rain:  4
City:  a
Rain:  3
City:  


a: 8
b: 4


In [66]:
# CK

rainfall = {}

while True:
    user_city = input("Please enter the city name: ")
    if user_city == '':
        break
    user_rainfall = int(input("Please enter the rainfall (in mm): "))
    
    if user_city not in rainfall:
        rainfall[user_city] = user_rainfall   # first time with a city, add the key-value pair
    elif user_city in rainfall:
        rainfall[user_city] = rainfall[user_city] + user_rainfall  # add the new value to the existing one
    else:   # this branch can never run: a key is either in the dict or it isn't
        print("Please enter a valid city name.")
print(rainfall)

Please enter the city name:  a
Please enter the rainfall (in mm):  5
Please enter the city name:  b
Please enter the rainfall (in mm):  4
Please enter the city name:  a
Please enter the rainfall (in mm):  3
Please enter the city name:  


{'a': 8, 'b': 4}


# Searching in a dict is faster than a list. How?

If you want to search a list for an element, you might need to go through the whole thing. 

Searching for a key in a dict is *far* faster. How is that? 

Also: Why do dict keys need to be immutable?

- When we append a value to a list, it goes to the end. Its location in the list is a function of when we appended it. Searching means running a `for` loop, looking for our value.
- When we add a new key-value pair to a dict, it is stored in a location based on the key! A special function, called `hash`, takes the key as an argument and returns an integer value. That value tells Python where to store the key-value pair in memory. When we want to retrieve a key-value pair, we give the key, which is then run through `hash`, telling Python where to look for the key-value pair.
- This is also why keys must be immutable: if a key could change after being stored, `hash` would return a different number for it, and Python would look for the pair in the wrong place.
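
To make the idea concrete, here is a deliberately simplified model of storing pairs by hash. It is only a sketch: Python's real dicts are more sophisticated, and the names `NUM_BUCKETS`, `buckets`, `store`, and `retrieve` are made up for this example.

```python
NUM_BUCKETS = 8                                   # a small, fixed number of storage slots
buckets = [[] for _ in range(NUM_BUCKETS)]        # each slot holds a list of [key, value] pairs

def store(key, value):
    index = hash(key) % NUM_BUCKETS               # the key decides where the pair goes
    for pair in buckets[index]:
        if pair[0] == key:                        # key already stored? update its value
            pair[1] = value
            return
    buckets[index].append([key, value])           # new key: add the pair to that slot

def retrieve(key):
    index = hash(key) % NUM_BUCKETS               # same key, same hash, same slot
    for stored_key, stored_value in buckets[index]:
        if stored_key == key:
            return stored_value
    raise KeyError(key)                           # nothing stored under this key

store('a', 10)
store('b', 20)
store('a', 99)          # same key again, so the value is replaced

print(retrieve('a'))    # 99
print(retrieve('b'))    # 20
```

Notice that `retrieve` only ever looks in one slot, no matter how many pairs are stored. That is where the speed comes from, compared with going through a list one element at a time.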

In [68]:
# NB

rainfall = {}

while True:
    city = input("Enter a City name: ").strip()
    rain = input("Enter a amount of rainfall in mm: ").strip()
    rain = int(rain)
    
    if city == '':
        break
    elif city in rainfall:
        rainfall[city] += rain
    else:
        rainfall[city] = rain

print(rainfall)

Enter a City name:  a
Enter a amount of rainfall in mm:  5
Enter a City name:  b
Enter a amount of rainfall in mm:  4
Enter a City name:  a
Enter a amount of rainfall in mm:  3
Enter a City name:  
Enter a amount of rainfall in mm:  100


{'a': 8, 'b': 4}


In [80]:
# KM

rainfall = {}

while True:

    s = input('Enter a city name: ').strip()

    if s == '':
        break

    r = input('Enter the rainfall for the city in mm: ').strip()
    r = int(r)

    if s in rainfall:
        rainfall[s] += r
    else:
        rainfall[s] = r

for k, v in rainfall.items():
    print(f'{k}: {v}') 

Enter a city name:  a
Enter the rainfall for the city in mm:  5
Enter a city name:  b
Enter the rainfall for the city in mm:  4
Enter a city name:  a
Enter the rainfall for the city in mm:  3
Enter a city name:  


a: 8
b: 4


In [81]:
# hash of 'a'

d = {'a':10, 'b':20, 'c':30}

print(hash('a'))   # for strings, this number normally differs from one Python session to the next

# asking "'a' in d" runs hash('a') to work out where the key 'a' would be stored in d

-8845774274989323072

# Files

If you are a normal computer user, then you think about files as containing text, numbers, charts, graphics, etc. 

If you're a programmer, though, then you think of a file as a way to store data structures such that we can turn a computer off and later load them into memory, or move them from one computer to another.

Our data structures -- lists, strings, dicts, etc. -- don't survive a computer's power outage or reboot. We take one or more data structures, store them on a disk in a file, and then load them back later.

There are many different kinds of files out there. We're going to work with plain text files, because they're so easy to work with. File formats from Microsoft or Adobe (PDF) are binary formats, which are far more difficult to deal with.

How do we get access to a file? We use the `open` builtin function. It takes a string argument, the name of the file we want to open:

- If the string contains no `/` characters, then it refers to a file in the current directory.
- If the string *starts with* a `/` character, then this is an "absolute path," and refers unambiguously to a file on the computer.
- If the string *contains* a `/`, but doesn't start with it, then it refers to a "relative path," and is relative to where we're running the program.

If I then say

    f = open(filename)

I now have a file object stored in `f`. This gives me access to the file's contents.    

In [82]:
f = open('/etc/passwd')   # absolute path, pointing to a famous file on every Unix/MacOS/Linux system.

In [83]:
# what is f?

f

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

In [84]:
f.read()   # this returns a string from where the most recent read ended until the end of the file

'##\n# User Database\n# \n# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n##\nnobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\nroot:*:0:0:System Administrator:/var/root:/bin/sh\ndaemon:*:1:1:System Services:/var/root:/usr/bin/false\n_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico\n_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false\n_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false\n_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false\n_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false\n_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false\n_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false\n_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/fa

The problem is: By invoking `read` on our file object, we ask Python to load the entire file into memory as one string, which can accidentally crash our computer. What if the file is 2 TB in size? 

We need a better strategy for reading from a file.

The solution is basically to read a little bit of the file into memory, one step at a time. The current line will be in a variable, but the entire file will never all be in memory.


In [87]:
f = open('/etc/passwd')

# if you iterate over a file object in Python, you get one line with each iteration
# each line is a string ending with '\n'
# when there are no more lines, the loop simply ends

for one_line in f:
    print(one_line.strip())

##
# User Database
#
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33:

In [88]:
# Even better (I would say)

for one_line in open('/etc/passwd'):
    print(one_line.strip())

##
# User Database
#
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33:

In [93]:
# how can I print the username for each user on this system?
# every user gets one line in /etc/passwd
# each line is broken into fields, separated by :
# the username is in the first slot of each line

for one_line in open('/etc/passwd'):   # go through /etc/passwd one line at a time
    if one_line[0] == '#':
        continue

    fields = one_line.split(':')       # split each record (each line) into a list of strings, based on : as the delimiter
    print(fields[0])                   # display the first field

nobody
root
daemon
_uucp
_taskgated
_networkd
_installassistant
_lp
_postfix
_scsd
_ces
_appstore
_mcxalr
_appleevents
_geod
_devdocs
_sandbox
_mdnsresponder
_ard
_www
_eppc
_cvs
_svn
_mysql
_sshd
_qtss
_cyrus
_mailman
_appserver
_clamav
_amavisd
_jabber
_appowner
_windowserver
_spotlight
_tokend
_securityagent
_calendar
_teamsserver
_update_sharing
_installer
_atsserver
_ftp
_unknown
_softwareupdate
_coreaudiod
_screensaver
_locationd
_trustevaluationagent
_timezone
_lda
_cvmsroot
_usbmuxd
_dovecot
_dpaudio
_postgres
_krbtgt
_kadmin_admin
_kadmin_changepw
_devicemgr
_webauthserver
_netbios
_warmd
_dovenull
_netstatistics
_avbdeviced
_krb_krbtgt
_krb_kadmin
_krb_changepw
_krb_kerberos
_krb_anonymous
_assetcache
_coremediaiod
_launchservicesd
_iconservices
_distnote
_nsurlsessiond
_displaypolicyd
_astris
_krbfast
_gamecontrollerd
_mbsetupuser
_ondemand
_xserverdocs
_wwwproxy
_mobileasset
_findmydevice
_datadetectors
_captiveagent
_ctkd
_applepay
_hidd
_cmiodalassistants
_analyticsd
_fps

# Next up

- Practice reading from files
- A little writing to files, too

# To read from a file in the "best" Python way:

- Use `open` to open the file, which lets us read from it by default
- We iterate over that file object, giving us one line from the file with each iteration
- We treat that line as a string (since it's a string), and we can do any string thing we want with it

In [96]:
# let's count the vowels in a file

filename = '/etc/passwd'
counts = 0

for one_line in open(filename):       # get each line from the file
    for one_character in one_line:    # get each character from the line
        if one_character in 'aeiou':  # is the current character a vowel? 
            counts += 1               # add 1 to the count

print(counts)

2117


In [97]:
# let's count how many times *each* vowel appears in the file

filename = '/etc/passwd'
counts = {'a':0, 'e':0, 'i':0, 'o':0, 'u':0}

for one_line in open(filename):       # get each line from the file
    for one_character in one_line:    # get each character from the line
        if one_character in counts:   # is the current character a vowel, aka a key in the "counts" dict?
            counts[one_character] += 1               # add 1 to the count for this vowel

print(counts)

{'a': 546, 'e': 701, 'i': 387, 'o': 277, 'u': 206}


# Closing files

We're opening files, which means that we're asking the operating system to, on our behalf, create an "agent" that allows us to work with the file. 

When, then, is the file closed? When is this "agent" returned to the operating system?

1. We can invoke the `.close` method on a file. That is a nice thing to do!
2. We can also wait until the program ends. At that point, the memory is returned, all objects are destroyed, and our file is closed.

From my perspective, option 2 is totally fine if you're writing a small program that just reads from a file.

If you're writing a program that works with many files, then you might well need to close them. Same if you're writing to a file.
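Here's a minimal sketch of option 1, using a throwaway file (the name `sample.txt` is made up for this example). Every file object has a `closed` attribute that tells us whether the file is still open:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
f = open(path, 'w')
f.write('hello\n')
f.close()

f = open(path)
before = f.closed     # False -- the file is open
content = f.read()
f.close()             # give the "agent" back to the operating system
after = f.closed      # True -- the file is now closed

print(before)   # False
print(after)    # True

# Trying f.read() at this point would raise
# ValueError: I/O operation on closed file.
```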

In [98]:
# if you try to open a file, and it isn't there, you'll get an error

open('asdfadfasfas')

FileNotFoundError: [Errno 2] No such file or directory: 'asdfadfasfas'

In [99]:
!ls *.txt

linux-etc-passwd.txt  mini-access-log.txt  nums.txt  shoe-data.txt  wcfile.txt


# Exercise: How many requests?

`mini-access-log.txt` is a short logfile from an actual Web server I used to run. Each line starts with the IP address of the computer that made a request to my Web site.

Write a program that iterates over the file, and prints the IP address at the start of each line.

In [102]:
for one_line in open('mini-access-log.txt'):
    print(one_line.split()[0])

67.218.116.165
66.249.71.65
65.55.106.183
65.55.106.183
66.249.71.65
66.249.71.65
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.131
65.55.106.131
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.186
65.55.106.186
66.249.65.12
66.249.65.12
66.249.65.12
74.52.245.146
74.52.245.146
66.249.65.43
66.249.65.43
66.249.65.43
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.25
65.55.207.25
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.94
65.55.207.94
66.249.65.12
65.55.207.71
66.249.65.12
66.249.65.12
66.249.65.12
98.242.170.241
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38


In [105]:
# CC

for one_line in open("mini-access-log.txt"):
    fields = one_line.split()      
    print(fields[0])  

67.218.116.165
66.249.71.65
65.55.106.183
65.55.106.183
66.249.71.65
66.249.71.65
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.131
65.55.106.131
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.106.186
65.55.106.186
66.249.65.12
66.249.65.12
66.249.65.12
74.52.245.146
74.52.245.146
66.249.65.43
66.249.65.43
66.249.65.43
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.25
65.55.207.25
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
66.249.65.12
65.55.207.94
65.55.207.94
66.249.65.12
65.55.207.71
66.249.65.12
66.249.65.12
66.249.65.12
98.242.170.241
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38
66.249.65.38


# Exercise: Dict from a file

1. Define `counts`, an empty dict
2. Iterate over the file, one line at a time, grabbing the IP address from each line
3. If the IP address is already in `counts` as a key, add 1 to its value
4. If the IP address is *not* in `counts` as a key, then add it as a key, and have a value of 1

This will have the effect of counting how often each IP address appears in the file.

Iterate over `counts.items()`, printing each address and how often it appears.

In [107]:
counts = {}   

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0] 

    if ip_address in counts:
        counts[ip_address] += 1    # if we've seen this IP address before, add 1 to its count
    else:
        counts[ip_address] = 1     # otherwise, add a new key-value pair, with a value of 1

for key, value in counts.items():
    print(f'{key}:\t{value}')  # \t == tab

67.218.116.165:	2
66.249.71.65:	3
65.55.106.183:	2
66.249.65.12:	32
65.55.106.131:	2
65.55.106.186:	2
74.52.245.146:	2
66.249.65.43:	3
65.55.207.25:	2
65.55.207.94:	2
65.55.207.71:	1
98.242.170.241:	1
66.249.65.38:	100
65.55.207.126:	2
82.34.9.20:	2
65.55.106.155:	2
65.55.207.77:	2
208.80.193.28:	1
89.248.172.58:	22
67.195.112.35:	16
65.55.207.50:	3
65.55.215.75:	2


In [113]:
# KM

counts = {}

for one_line in open('mini-access-log.txt'):  
     ip = one_line.split()[0]

     if ip in counts:
         counts[ip] += 1
     else:
         counts[ip] = 1

for k, v in counts.items():
     print(f'{k}: {v}') 

67.218.116.165: 2
66.249.71.65: 3
65.55.106.183: 2
66.249.65.12: 32
65.55.106.131: 2
65.55.106.186: 2
74.52.245.146: 2
66.249.65.43: 3
65.55.207.25: 2
65.55.207.94: 2
65.55.207.71: 1
98.242.170.241: 1
66.249.65.38: 100
65.55.207.126: 2
82.34.9.20: 2
65.55.106.155: 2
65.55.207.77: 2
208.80.193.28: 1
89.248.172.58: 22
67.195.112.35: 16
65.55.207.50: 3
65.55.215.75: 2


In [114]:
# NB

counts = {}
for each_line in open('mini-access-log.txt'):
    ip_addr = each_line.split()[0]
    if ip_addr in counts:
        counts[ip_addr] += 1
    else:
        counts[ip_addr] = 1

print(counts)

{'67.218.116.165': 2, '66.249.71.65': 3, '65.55.106.183': 2, '66.249.65.12': 32, '65.55.106.131': 2, '65.55.106.186': 2, '74.52.245.146': 2, '66.249.65.43': 3, '65.55.207.25': 2, '65.55.207.94': 2, '65.55.207.71': 1, '98.242.170.241': 1, '66.249.65.38': 100, '65.55.207.126': 2, '82.34.9.20': 2, '65.55.106.155': 2, '65.55.207.77': 2, '208.80.193.28': 1, '89.248.172.58': 22, '67.195.112.35': 16, '65.55.207.50': 3, '65.55.215.75': 2}
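All of the solutions above use `if`/`else` to decide whether a key is new. Another way to write the same counting logic, assuming you're comfortable with `dict.get`, is to ask for the current count with a default of 0. The sample lines below are made up to stand in for `mini-access-log.txt`, so that the sketch runs on its own:

```python
# Hypothetical lines, standing in for the contents of mini-access-log.txt
sample_lines = [
    '67.218.116.165 - - "GET /one"\n',
    '66.249.71.65 - - "GET /two"\n',
    '67.218.116.165 - - "GET /three"\n',
]

counts = {}

for one_line in sample_lines:
    ip_address = one_line.split()[0]

    # get the current count (or 0 if we haven't seen this address yet), then add 1
    counts[ip_address] = counts.get(ip_address, 0) + 1

print(counts)   # {'67.218.116.165': 2, '66.249.71.65': 1}
```

Whether this is clearer than the `if`/`else` version is a matter of taste; both produce the same dict.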


In [115]:
'a' + 5  # will this work?

TypeError: can only concatenate str (not "int") to str

In [116]:
'a' * 5  # will this work?

'aaaaa'

In [117]:
# rewrite (slightly) my solution

counts = {}   

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0] 

    if ip_address in counts:
        counts[ip_address] += 1    # if we've seen this IP address before, add 1 to its count
    else:
        counts[ip_address] = 1     # otherwise, add a new key-value pair, with a value of 1

for key, value in counts.items():
    print(f'{key}:\t{value * "x"}')  # \t == tab

67.218.116.165:	xx
66.249.71.65:	xxx
65.55.106.183:	xx
66.249.65.12:	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.106.131:	xx
65.55.106.186:	xx
74.52.245.146:	xx
66.249.65.43:	xxx
65.55.207.25:	xx
65.55.207.94:	xx
65.55.207.71:	x
98.242.170.241:	x
66.249.65.38:	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.207.126:	xx
82.34.9.20:	xx
65.55.106.155:	xx
65.55.207.77:	xx
208.80.193.28:	x
89.248.172.58:	xxxxxxxxxxxxxxxxxxxxxx
67.195.112.35:	xxxxxxxxxxxxxxxx
65.55.207.50:	xxx
65.55.215.75:	xx


# Writing to files

If we want to write to a file, we need to `open` it, just like for reading. *BUT* we need to tell Python that we'll want to write to the file, rather than read from it.

By default, invoking `open` opens a file for reading. We can, however, pass a second argument, `'w'`, indicating that we want to write to the file, instead:

    open(filename, 'w')

WARNING:

If you invoke `open` with `'w'` as the second argument, one of two things will happen:

1. You'll get an error, saying that you cannot open the file for writing -- the directory doesn't exist, you don't have permissions, etc.
2. The file *is* opened for writing, and now exists with **zero content in it**. If you open a file for writing, and the file already exists, you remove any and all content in the file.
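To see the second case in action, here's a small sketch using a throwaway file (the name `sample.txt` and its contents are made up for this example). Just opening the file with `'w'` is enough to empty it, even if we never write anything:

```python
import os
import tempfile

# Create a throwaway file with some content in it
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
f = open(path, 'w')
f.write('important data\n')
f.close()

f = open(path, 'w')   # opening for writing again empties the file right away
f.close()             # note that we never wrote anything

f = open(path)
content = f.read()
f.close()

print(repr(content))  # ''
```

If you want to add to the end of an existing file rather than replace its contents, you can open it with `'a'` (append) instead of `'w'`.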

In [118]:
f = open('myfile.txt', 'w')

In [119]:
f.write('hello\n')   # write to the file -- notice we need to add \n at the end, which f.write doesn't do automatically

6

In [120]:
f.write('hello again\n')  # notice the number we get back, that's the number of characters written

12

In [121]:
# what does the file contain? NOTHING AT ALL!
# until we close the file, we cannot be sure what was written to it, and what is still in memory,
# waiting to be written to it

# the computer puts our writing into a "buffer" in memory, and "flushes" the buffer when we tell it to,
# when it gets full, or when the file is closed.

f.close()
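
If you want to "tell it to" flush without closing the file, file objects have a `flush` method. Here's a minimal sketch, again with a made-up throwaway file, that writes, flushes, and then reads the file back through a second file object while the first is still open:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

f = open(path, 'w')
f.write('hello\n')    # this text may still be sitting in the buffer
f.flush()             # ask Python to send the buffer's contents to the file now

reader = open(path)   # a second, independent file object for reading
after_flush = reader.read()
reader.close()

f.close()             # closing also flushes, so we're done with the file

print(repr(after_flush))   # 'hello\n'
```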

# Enter `with`

The `with` keyword in Python isn't only for working with files, but it's *mostly* for working with files. It basically says: When we finish with the (indented) block of code, we'll ask the value mentioned on the `with` line to do its normal end-of-`with` thing.

In the case of a file, the "end-of-`with`" thing is to flush + close itself.

In [122]:
with open('myfile.txt', 'w') as f:    # open the file and assign it to the variable "f"
    f.write('hello 1234\n')
    f.write('hello 56789\n')

    # here, at the end of the with block, we flush + close the file

In [123]:
!cat myfile.txt

hello 1234
hello 56789
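
We can check that `with` really did close the file by looking at the file object's `closed` attribute, inside and outside of the block. This sketch uses a throwaway file (the path is made up for this example) rather than `myfile.txt`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

with open(path, 'w') as f:
    f.write('hello 1234\n')
    inside = f.closed      # still inside the block, so the file is open

outside = f.closed         # the block has ended, so the file was flushed + closed

with open(path) as f:      # reading works with "with", too
    content = f.read()

print(inside)          # False
print(outside)         # True
print(repr(content))   # 'hello 1234\n'
```

Notice that the variable `f` still exists after the block ends; it just refers to a file object that has already been closed.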


In [None]:

ip_count = {}


with open("mini-access-log.txt", "r") as log_file:
    for one_line in log_file:
        fields = one_line.split()  
        if len(fields) > 0: 
            ip = fields[0]
            if ip in ip_count:
                ip_count[ip] += 1  
            else:
                ip_count[ip] = 1 


print("IP Address Counts:")
for ip, count in ip_count.items():
    print(f"{ip}: {count} times")


In [124]:
open('asdfsafas')

FileNotFoundError: [Errno 2] No such file or directory: 'asdfsafas'