# Agenda, week 3:

1. Q&A
2. Dictionaries
    - What they are
    - How to create them
    - How to work with them
    - How to iterate over dicts
    - How dictionaries are implemented behind the scenes
    - Three paradigms for dictionary use
3. Files
    - How to work with files: opening them and reading from them
    - How to take data in a file and turn it into Python data structures
    - How to write to files
    - The `with` construct, and why it's important when writing to files

# Where do we stand?

- Python uses lots of values, and each value has a different type
- We can assign these values to variables, and then reuse them
- The data structures we've used so far have been:
    - Integers and floats (numbers)
    - Strings (for text)
    - Lists and tuples (for sequences of values)

Often, we don't want just one piece of data. Instead, we want multiple pieces of data that are associated with one another.

Dictionaries are a very fast, easy-to-use, and flexible data structure that handles one of the most important and common ways we want to combine data: the name-value pair, also known as the key-value pair.

# What is a dictionary?

What we call a dictionary in Python (or a `dict`) is not new to Python. It exists in many programming languages, but usually under other names:

- Hash tables
- Hashes
- Hash maps
- Maps
- Key-value stores
- Name-value stores
- Associative arrays

The basic idea is that we have *pairs* of data, not just individual values. In a dict, we control not only the values, but also the keys (i.e., the indexes) that we use to retrieve them. In a list, each value's index is just an integer reflecting its position; in a dict, we choose a meaningful key for each value.



In [2]:
# use curly braces to create a dict
# each key-value pair in a dict has key:value syntax
# each pair is separated from other pairs with ,

d = {}   # empty dictionary

In [3]:
d = {'a':10, 'b':20, 'c':30}  # dict with three key-value pairs

In [4]:
# how many key-value pairs are in d?

len(d)    # how many pairs

3

# Rules for dict keys and values

### Keys

- They can be anything at all, so long as they are immutable (more precisely, hashable) -- which in practice usually means numbers, strings, and tuples of those.
- In a given dict, the keys must be unique.
- Every key has a value, and every value has a key.

### Values

- Every value in a dict can be absolutely anything you want -- any type, any repetition, anything at all
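A quick sketch of these rules in action (the dict contents are arbitrary examples): a repeated key in a literal keeps only the last value, values and tuple keys are fine, and a mutable key such as a list is rejected with a `TypeError`:

```python
# a repeated key in a literal: the later pair silently replaces the earlier value
dup = {'a': 10, 'b': 20, 'a': 99}
print(dup)               # {'a': 99, 'b': 20}

# values can be anything; keys can be any immutable (hashable) value, including tuples
mixed = {1: 'one', 'two': [2, 2], (3, 4): {'nested': True}}
print(mixed[(3, 4)])     # {'nested': True}

# a list is mutable, so Python refuses to use it as a key
try:
    bad = {[1, 2]: 'list as key'}
    list_key_rejected = False
except TypeError as e:
    list_key_rejected = True
    print('TypeError:', e)
```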

In [5]:
# how can I retrieve a value via a key?

d['a']   # just like I retrieve from a list/tuple/string, I retrieve from a dict with []

10

In [6]:
d['b']

20

In [7]:
d['c']

30

In [8]:
# if I change the key's string even a little...

d['a ']

KeyError: 'a '

In [9]:
# how can I avoid that? How can I make sure that if I'm asking for a key, it exists?
# I can use the "in" operator, which returns True if a key is in the dict

'a' in d   # is 'a' a key in the dict d?

True

In [10]:
'a ' in d

False

You can get a value from its key. But you can't directly get the key back from its value!

That's partly because of how dicts are designed (they're built for fast key-to-value lookups), and partly because values can repeat, but keys cannot.
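If you really do need to go from a value back to its key(s), the only option is to check every pair. A minimal sketch, using a hypothetical helper `keys_for_value` and a made-up `sample` dict; note that it returns a list, because several keys can share the same value:

```python
def keys_for_value(d, target):
    """Return a list of every key in d whose value equals target (hypothetical helper)."""
    found = []
    for key in d:              # iterating over a dict gives us its keys
        if d[key] == target:   # compare this key's value with the one we want
            found.append(key)
    return found

sample = {'a': 10, 'b': 20, 'c': 30, 'x': 10}
print(keys_for_value(sample, 10))   # ['a', 'x']
print(keys_for_value(sample, 99))   # []
```

This gets slower as the dict grows, since it looks at every pair; dicts are designed for the opposite direction.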


In [11]:
# can I use variables as the keys?
# of course!

k = 'a'

k in d

True

In [12]:
d[k]

10

# Where are dicts useful?

There are many places where we have key-value pairs in Python. Consider:

- A dict whose keys are the names of months (as strings) and whose values are the numbers (1-12)
- A dict whose keys are the numbers of months (1-12) and whose values are the names (as strings)
- A dict whose keys are user ID numbers and whose values are usernames
- A dict whose keys are user ID numbers and whose values are tuples with user info
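The first two ideas can be sketched like this; the variable names `month_numbers` and `month_names` are just illustrative. The second dict is the first one turned inside out, which works here only because both the names and the numbers are unique:

```python
month_numbers = {'January': 1, 'February': 2, 'March': 3, 'April': 4,
                 'May': 5, 'June': 6, 'July': 7, 'August': 8,
                 'September': 9, 'October': 10, 'November': 11, 'December': 12}

# build the reverse mapping: number -> name
month_names = {}
for name in month_numbers:
    month_names[month_numbers[name]] = name

print(month_numbers['March'])   # 3
print(month_names[12])          # December
```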

# Exercise: Restaurant

1. Define a dict, `menu`, in which the keys are strings -- items on a menu at a restaurant. The associated values should be the prices for those menu items.
2. Define `total` to be 0.
3. Ask the user, repeatedly, to order something from the menu.
    - If the user enters an empty string, that means they no longer want to order. Break out of the loop.
    - If the user enters a string that's on our menu (i.e., a key in our dict), add its price to `total`, and print the item, its price, and the new `total`.
    - If the string is *not* a key in `menu`, then scold the user and let them order something else.
4. Print `total`

Example:

    Order: sandwich
    sandwich is 15, total is 15
    Order: tea
    tea is 7, total is 22
    Order: elephant
    Sorry, we're fresh out of elephant today!
    Order: [ENTER]
    Your total is 22

In [13]:
menu = {'sandwich':15, 'tea':7, 'apple':4, 'cake':8}
total = 0

while True:
    order = input('Order: ').strip()

    if order == '':   # an empty string means the user wants to stop ordering
        break

    if order in menu:  
        price = menu[order]    # get the value for the key "order" in menu
        total += price
        print(f'{order} is {price}, total is now {total}')
    else:
        print(f'Sorry, we are fresh out of {order} today!')

print(total)        

Order:  sandwich


sandwich is 15, total is now 15


Order:  tea


tea is 7, total is now 22


Order:  hot dog


Sorry, we are fresh out of hot dog today!


Order:  


22


In [14]:
order = 'sandwich'
print(menu[order])

15


# Are dictionaries mutable?

Yes! Absolutely!

What does it mean for something to be mutable? When it comes to a list, we found:

- We can replace existing values
- We can add new elements to a list
- We can remove existing elements from a list

For a dict to be mutable, we would need to be able to:

- Replace existing values for a given key
- Add new key-value pairs
- Remove existing key-value pairs

We can do all three of these!

In [15]:
d = {'a':10, 'b':20, 'c':30}

# how can I replace an existing value in a dict?
# answer: assign a new value to the key

d['b'] = 999
d

{'a': 10, 'b': 999, 'c': 30}

In [16]:
# I want to increment the value of 'c' by 1

d['c'] += 1   # the same as saying d['c'] = d['c'] + 1
d

{'a': 10, 'b': 999, 'c': 31}

In [17]:
# how can I add a new key-value pair to my dict?
# answer: assign a new value to the key
# it's exactly the same syntax as we had for updating a value!

# if the key already exists, then we replace its value. But if the key is new, then we add a new key-value pair

d

{'a': 10, 'b': 999, 'c': 31}

In [18]:
d['x'] = 888
d

{'a': 10, 'b': 999, 'c': 31, 'x': 888}

In [19]:
d['x'] += 1
d

{'a': 10, 'b': 999, 'c': 31, 'x': 889}

In [20]:
# what about removing key-value pairs?

d.pop('x')  # this means: remove the key-value pair with a key 'x', and return the value associated with it

889

In [21]:
d

{'a': 10, 'b': 999, 'c': 31}
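Two related options, both standard dict features: the `del` statement removes a pair without returning its value, and `pop` accepts an optional second argument, a default it returns instead of raising `KeyError` when the key is missing. A short sketch on a separate dict (named `d2` just so `d` isn't disturbed):

```python
d2 = {'a': 10, 'b': 999, 'c': 31}

del d2['b']                  # remove the 'b' pair; nothing is returned
print(d2)                    # {'a': 10, 'c': 31}

print(d2.pop('zzz', None))   # 'zzz' isn't a key, so we get the default: None
print(d2.pop('a'))           # 10 -- the pair is removed and its value returned
print(d2)                    # {'c': 31}
```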

# Accumulating with dicts

We've seen in previous weeks that we can use lists to accumulate values over time. We can use dicts in similar ways. Very often, we'll start a program with an initial dict whose keys are set, and will not change, but whose values are empty -- 0, an empty list, or the like. In this way, our dict can over time accumulate data, but it can do that in a more sophisticated way than a list.

# Exercise: Vowels, digits, and others (dict edition)

1. Define a dict, `counts`, with three keys -- `vowels`, `digits`, and `others`. Their values should all be 0.
2. Ask the user to enter a string.
3. Go through the string, one character at a time.
    - If the character is a vowel, add 1 to the count for `vowels`
    - If the character is a digit, add 1 to the count for `digits`
    - If the character is neither, add 1 to the count for `others`
4. Print `counts`

In [28]:
counts = {'vowels':0,
          'digits':0,
          'others':0}
vowels = 'aeiou'

text = input('Enter text: ').strip()

for one_character in text.lower():
    if one_character in vowels:     # if one_character is a vowel...
        counts['vowels'] += 1       # ... update the value of counts['vowels'] with 1 + the current value
    elif one_character.isdigit():   # if one_character is a digit...
        counts['digits'] += 1       #  .... update the value of counts['digits'] with 1 + the current value
    else:
        counts['others'] += 1       # in all other cases, add 1 to counts['others']

print(counts)

Enter text:  THIS IS GREAT


{'vowels': 4, 'digits': 0, 'others': 9}


In [27]:
# KM

counts = {'vowels':0, 'digits':0, 'others':0}

word = input('Enter the character: ').strip()

for one_character in word:
    if one_character in 'aeiou':
        counts['vowels'] +=1
    
    elif one_character.isdigit():
        counts['digits'] +=1
    
    else:
        counts['others'] +=1

print(counts)

Enter the character:  hello!! 123


{'vowels': 2, 'digits': 3, 'others': 6}


# Next up

1. Accumulating in another way in our dicts
2. Accumulating the unknown
3. Iterating over our dicts

# Paradigms we've seen so far

1. Set up a dict at the start of your program, and treat it as a read-only database.
2. Set up a dict with keys and initial values. Over the course of the program, update the values, but don't change/add/remove any of the keys.
3. Set up an empty dict, and then accumulate both keys and values.
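A compact sketch of the three paradigms side by side; the data and the names `prices`, `tally`, and `letter_counts` are made up purely for illustration:

```python
# 1. read-only "database": set up once, then only look things up
prices = {'sandwich': 15, 'tea': 7}
price = prices['tea']

# 2. fixed keys, changing values
tally = {'vowels': 0, 'others': 0}
for ch in 'hello':
    if ch in 'aeiou':
        tally['vowels'] += 1
    else:
        tally['others'] += 1

# 3. start empty, and add both keys and values as they appear
letter_counts = {}
for ch in 'hello':
    if ch in letter_counts:
        letter_counts[ch] += 1
    else:
        letter_counts[ch] = 1

print(price, tally, letter_counts)
```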

# Exercise: Vowels, digits, and others (dict edition, part 2)

1. Define a dict, `counts`, with three keys -- `vowels`, `digits`, and `others`. Their values should all be `[]`.
2. Ask the user to enter a string.
3. Go through the string, one character at a time.
    - If the character is a vowel, add `one_character` to the list that is the value for `vowels`
    - If the character is a digit, add `one_character` to the list that is the value for `digits`
    - If the character is neither, add `one_character` to the list that is the value for `others`
4. Print `counts`

In [29]:
vowels = 'aeiou'     # defined here too, so this cell doesn't depend on an earlier one
counts = {'vowels': [],
          'digits': [],
          'others': []}

text = input('Enter text: ').strip()

for one_character in text:
    if one_character in vowels:    
        counts['vowels'].append(one_character)
    elif one_character.isdigit(): 
        counts['digits'].append(one_character)
    else:
        counts['others'].append(one_character)

print(counts)        

Enter text:  hello!! 123


{'vowels': ['e', 'o'], 'digits': ['1', '2', '3'], 'others': ['h', 'l', 'l', '!', '!', ' ']}


In [None]:
# U1

counts = {'digits': [], 'vowels': [], 'others': []}

characters = input('Enter a word: ').strip()

for a in characters:
    if a in 'aeiou':
        counts['vowels'] += a
    elif a.isdigit():          # elif, not if: otherwise vowels would also be added to 'others'
        counts['digits'] += a
    else:
        counts['others'] += a
print(counts)

# Two different ways to add items to a list:

1. `list.append`, which takes whatever you give it, of any type, and adds it to the end of a list
2. `+=`, which takes any iterable value, loops over it, and adds each element to the end of the list (just like the `list.extend` method)



In [30]:
mylist = [10, 20, 30]

mylist.append('hello')
mylist

[10, 20, 30, 'hello']

In [31]:
mylist += 'hello'
mylist

[10, 20, 30, 'hello', 'h', 'e', 'l', 'l', 'o']
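A sketch comparing the two, plus what happens when the right-hand side of `+=` isn't iterable; the list names are arbitrary:

```python
a = [10, 20]
b = [10, 20]

a += 'hi'            # iterates over the string, adding each character
b.extend('hi')       # same effect, spelled as a method call
print(a == b, a)     # True [10, 20, 'h', 'i']

c = [10, 20]
c.append([30, 40])   # append adds its argument as ONE element
print(c)             # [10, 20, [30, 40]]

try:
    a += 5           # an int isn't iterable, so this raises TypeError
except TypeError as e:
    print('TypeError:', e)
```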

# Accumulating the unknown

What if we want a dict that counts, or accumulates, data of various sorts, but we don't know in advance which keys we'll get?


In [33]:
# Example: Let's count how often each character appears in a string
# I want a dict whose keys are the characters and whose values are the counts

# what will this dict be initialized with? 
# we can start with an empty dict. 
# When we encounter a character:
# - if we saw it before, add 1 to the count
# - if it's new, then add both the key and the value (a count of 1)

counts = {}   # empty dict

text = input('Enter text: ').strip()

for one_character in text:
    counts[one_character] += 1    # our ideal world

Enter text:  hello


KeyError: 'h'

In [34]:

counts = {}   # empty dict

text = input('Enter text: ').strip()

for one_character in text:
    if one_character in counts:   # if we've already seen this character (key) before:
        counts[one_character] += 1    # add 1 to it
    else:
        counts[one_character] = 1  # otherwise, add this new key-value pair

Enter text:  hello


In [35]:
counts

{'h': 1, 'e': 1, 'l': 2, 'o': 1}
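The same count, wrapped in a function so it's easy to reuse. This version swaps the `if`/`else` for `dict.get`, which returns a default (here 0) when the key is missing; it's an alternative spelling of the same idea, not something the code above needs. The name `count_characters` is just illustrative:

```python
def count_characters(text):
    """Return a dict mapping each character in text to how often it appears."""
    counts = {}
    for one_character in text:
        # get() returns the current count, or 0 if we haven't seen this character yet
        counts[one_character] = counts.get(one_character, 0) + 1
    return counts

print(count_characters('hello'))   # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
```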

# Exercise: Rainfall

The goal is to ask the user, repeatedly, for city names and then for the amount of rain that fell in that city. (We'll ask in two separate questions.) Over time, we'll accumulate data about rainfall in different cities. We'll do so in a dict whose keys are city names and whose values are integers, the mm rain that fell.

1. Create an empty dict, `rainfall`.
2. Ask the user for the name of the city about which we're reporting.
    - If we get an empty city name, stop asking, break out of the loop
3. If we did get a city name, ask a second question: What was the rainfall in that city?
    - Assume we get an integer here, if it makes the code much cleaner
4. Take the city name and rainfall amount:
    - if we have seen this city before, add the new rainfall to the existing amount
    - If this is a new city, then add the new key-value pair to the dict
5. Print `rainfall` for the user

Example:

    city: a
    rain: 5
    city: b
    rain: 4
    city: a
    rain: 3
    city: [ENTER]
    {'a':8, 'b':4}

In [36]:
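# an aside on quotes: if a string contains a ' character, define it with "
s = "he's nice"

In [37]:
# ... and if it contains a " character, define it with '
s = 'He said, "Hello"'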

In [38]:
rainfall = {}

while True:
    city_name = input('City: ').strip()

    if city_name == '':
        break

    mm_rain = input('Rain: ').strip()
    mm_rain = int(mm_rain)

    if city_name in rainfall:
        rainfall[city_name] += mm_rain
    else:
        rainfall[city_name] = mm_rain

print(rainfall)        

City:  a
Rain:  5
City:  b
Rain:  4
City:  a
Rain:  3
City:  


{'a': 8, 'b': 4}


In [39]:
# what if we want to accumulate rainfall information for each city -- not summing it,
# but keeping a list of every mm_rain value reported?

rainfall = {}

while True:
    city_name = input('City: ').strip()

    if city_name == '':
        break

    mm_rain = input('Rain: ').strip()
    mm_rain = int(mm_rain)

    if city_name in rainfall:
        rainfall[city_name].append(mm_rain)
    else:
        rainfall[city_name] = [mm_rain]   # list containing 1 integer, mm_rain

print(rainfall)        

City:  a
Rain:  5
City:  b
Rain:  4
City:  c
Rain:  3
City:  a
Rain:  2
City:  b
Rain:  1
City:  c
Rain:  5
City:  


{'a': [5, 2], 'b': [4, 1], 'c': [3, 5]}
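Because `input` makes these loops awkward to test, here is a sketch of the list-accumulating version as a function that takes `(city, mm)` pairs instead of prompting. The function name `collect_rainfall` and its argument format are assumptions made for illustration; the sample data mirrors the session above:

```python
def collect_rainfall(reports):
    """Accumulate a list of mm values per city from (city, mm) pairs."""
    rainfall = {}
    for city_name, mm_rain in reports:        # unpack each (city, mm) tuple
        if city_name in rainfall:
            rainfall[city_name].append(mm_rain)
        else:
            rainfall[city_name] = [mm_rain]   # first report: a new one-element list
    return rainfall

reports = [('a', 5), ('b', 4), ('c', 3), ('a', 2), ('b', 1), ('c', 5)]
print(collect_rainfall(reports))   # {'a': [5, 2], 'b': [4, 1], 'c': [3, 5]}
```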


# Next up

1. Iteration on dicts
2. How are dicts implemented?
3. Files!

Download the zipfile from https://files.lerner.co.il/exercise-files.zip 

In [40]:
d = {'a':10, 'b':20, 'c':30}

In [41]:
# the keys in this dict are 'a', 'b', and 'c'

# PRECISELY those keys -- no variations whatsoever are allowed

In [42]:
d['A']

KeyError: 'A'

In [45]:
d = {'Las Vegas': 3}

d['Las  Vegas']

KeyError: 'Las  Vegas'

# Iteration and dicts

If we want, we can iterate over a variety of different Python data structures:

- Strings give us their characters, one at a time
- Lists and tuples give us their elements, one at a time

What about dicts?

In [46]:
d = {'a':10, 'b':20, 'c':30}

In [48]:
# iterating over a dict gives you the keys

for one_item in d:
    print(one_item)

a
b
c


In [50]:
d = {}
d['c'] = 30
d['a'] = 10
d['b'] = 20

for one_item in d:
    print(one_item)

c
a
b


In [51]:
# if I want, I can then iterate over a dict and print the keys and values!

for key in d:
    print(f'{key}: {d[key]}')

c: 30
a: 10
b: 20


In [52]:
# this is (I think) a bit ugly

# meanwhile, there are some other methods we can use
# for example: we can invoke dict.keys(), which returns the keys of a dict

d.keys()

dict_keys(['c', 'a', 'b'])

In [53]:
# can I use "in" on d.keys()?

'a' in d.keys()

True

In [54]:
# can I use a for loop on d.keys()?

for key in d.keys():
    print(f'{key}: {d[key]}')

c: 30
a: 10
b: 20


You should really avoid using `d.keys()` unless you absolutely need to. There's almost never a reason to use it: you can use `in` on the dict itself, and iterate over the dict itself, and you'll be working with the keys either way.

In [55]:
# what about values?
# there is a dict.values method, which returns the values

d.values()

dict_values([30, 10, 20])

In [56]:
10 in d.values()

True

In [57]:
# remember, normally "in" only works on the keys, not the values!

In [58]:
# my favorite way to iterate over a dict is actually a separate method called dict.items
# this returns a (key, value) tuple for each pair in the dict

for one_item in d.items():
    print(one_item)

('c', 30)
('a', 10)
('b', 20)


In [59]:
# let's use tuple unpacking to pull apart one_item

for one_item in d.items():
    key, value = one_item
    print(f'{key}: {value}')

c: 30
a: 10
b: 20


In [60]:
for key, value in d.items():
    print(f'{key}: {value}')

c: 30
a: 10
b: 20


In [61]:
rainfall

{'a': [5, 2], 'b': [4, 1], 'c': [3, 5]}

In [62]:
for key, value in rainfall.items():
    print(f'{key}: {value}')

a: [5, 2]
b: [4, 1]
c: [3, 5]


In [63]:
for key, value in rainfall.items():
    print(f'{key}: {value}, total is {sum(value)}, mean is {sum(value)/len(value)}')

a: [5, 2], total is 7, mean is 3.5
b: [4, 1], total is 5, mean is 2.5
c: [3, 5], total is 8, mean is 4.0


# Ordering in dicts

First and foremost: If you're thinking about the order of key-value pairs in a dict, you are probably thinking about things wrong. Don't think about this! Just retrieve the values based on the keys.

That said: since Python 3.7, a dict is guaranteed to keep its key-value pairs in the order in which they were added. When we set `d = {'a':10, 'b':20, 'c':30}`, we will get the keys in order `a, b, c` and the values in order `10, 20, 30`.

But I earlier did something else:

```python
d = {}
d['c'] = 30
d['a'] = 10
d['b'] = 20
```

That means that when I iterate over the dict, I'll get `c, a, b` and `30, 10, 20`.
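A short sketch of what insertion order means in practice (the dict `d3` is just an example): updating an existing key leaves it where it is, while removing a key and adding it again puts it at the end:

```python
d3 = {}
d3['c'] = 30
d3['a'] = 10
d3['b'] = 20
print(list(d3))     # ['c', 'a', 'b'] -- insertion order

d3['c'] = 99        # updating a value doesn't move the key
print(list(d3))     # ['c', 'a', 'b']

d3.pop('c')         # remove the pair...
d3['c'] = 30        # ...and add it back: now it's last
print(list(d3))     # ['a', 'b', 'c']
```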

# Exercise: Odds and evens

1. Create a dict in which the keys are strings, `'odds'` and `'evens'`, and the values are empty lists.
2. Ask the user to enter a string containing integers separated by spaces.
3. Iterate over each space-separated item in the string:
    - If it's not an int, then scold the user
    - If it is even, then append to the list for `evens`
    - If it's odd, then append to the list for `odds`
4. Iterate over the dict, printing each key and each value, plus the mean of all odd/even values.

In [64]:
counts = {'odds': [],
          'evens': []}

s = input('Enter numbers: ').strip()

for one_number in s.split():
    if not one_number.isdigit():
        print(f'{one_number} is not numeric; ignoring')
        continue

    n = int(one_number)

    if n % 2 == 1:        # if the remainder from dividing by 2 is 1, it's odd!
        counts['odds'].append(n)   # append it to the list at counts['odds']
    else:
        counts['evens'].append(n)

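for key, value in counts.items():
    if value:     # only compute a mean if the list isn't empty (avoids ZeroDivisionError)
        print(f'{key}: {value}, {sum(value) / len(value)}')
    else:
        print(f'{key}: {value}, no numbers')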

Enter numbers:  10 15 11 23 hello 31 36


hello is not numeric; ignoring
odds: [15, 11, 23, 31], 20.0
evens: [10, 36], 23.0


# How do dicts work?

A dictionary is a "hash table": when we store a key-value pair, Python runs the built-in `hash` function on the key, and the result determines where in memory the pair is stored. Retrieving works the same way: hash the key, then look in that one place.

This explains a *lot*!

- Keys need to be unique, because each key must lead to exactly one key-value pair.
- Keys need to be immutable, because if we could modify a key, `hash(key)` would change, and we would look for the pair in the wrong place.
- Even the slightest difference between two keys can keep us from finding a key-value pair, because `hash(s)` and `hash(t)` will almost certainly be wildly different.
- Searching with `in` in a dict, or retrieving with `[]` from a dict, is SUPER DUPER fast: we go straight to the right spot instead of checking every pair, so on average a lookup takes about the same time whether the dict has 10 pairs or 10 million.
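To make the idea concrete, here is a deliberately simplified hash table: a fixed list of "buckets," where `hash(key) % number_of_buckets` picks the bucket, and each bucket holds a short list of pairs. This is a teaching sketch (the class name `ToyDict` and the bucket count are arbitrary), not how CPython actually lays out dicts; the real implementation uses open addressing, resizes itself as it grows, and is far more optimized:

```python
class ToyDict:
    """A tiny hash table with separate chaining (illustrative only)."""

    def __init__(self, number_of_buckets=8):
        # each bucket is a list of [key, value] pairs whose keys landed there
        self.buckets = [[] for _ in range(number_of_buckets)]

    def _bucket_for(self, key):
        # hash() raises TypeError for unhashable keys such as lists
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket_for(key)
        for pair in bucket:
            if pair[0] == key:        # key already present: replace its value
                pair[1] = value
                return
        bucket.append([key, value])   # new key: add a new pair

    def __getitem__(self, key):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def __contains__(self, key):
        return any(existing_key == key for existing_key, _ in self._bucket_for(key))

td = ToyDict()
td['a'] = 10
td['b'] = 20
td['a'] = 99                              # same key -> same bucket -> value replaced
print(td['a'], 'b' in td, 'a ' in td)     # 99 True False
```

Only one bucket is searched per lookup, which is the whole trick; a real dict also keeps buckets short by growing the table.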

# Files

We all use files all of the time.

But as programmers, what is a file? The answer is: a permanent record of data structures we had in memory. When we save a file, we convert data structures to a written format. And when we load a file, we turn that written format back into data structures.

If we want to read from a file, meaning that we want to turn its contents into data structures in Python, we need to tell the OS that we want to read from it:

- We indicate to the OS that we want to read from a file
- It either gives us an error, or returns a "file handle" or "file object"
- That object is our agent, allowing us to work with the file

In order to do this in Python:

- Invoke the `open` function to open a file, handing it a filename (a string)
- That returns a file object
- On that file object, we can run methods that return the contents of the file

# Filenames

A filename is a string telling the computer where to find the file we want to work with.

- If it's a plain string without any `/` characters (or `\` on Windows), then the file should be in the current directory.
- If it doesn't start with `/` but contains one or more `/` characters, then it's a "relative" filename, interpreted starting from the current folder/directory (for example, `data/nums.txt`).
- If it starts with `/`, then it's an "absolute" filename, telling the OS where to look starting from the root of the filesystem (on Windows, absolute filenames start with a drive, such as `C:\`).
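Python's standard `pathlib` module can tell these cases apart. A small sketch using `PurePosixPath`, which applies Unix rules on any operating system without touching the disk; the filenames are just examples:

```python
from pathlib import PurePosixPath

for name in ['nums.txt', 'data/nums.txt', '/etc/passwd']:
    p = PurePosixPath(name)
    # is_absolute() is True only for names starting at the root, '/'
    print(f'{name!r}: absolute={p.is_absolute()}, parts={p.parts}')
```

On Windows, `PureWindowsPath` applies that system's rules instead.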

In [65]:
# if I want to open a file for reading, I can just use "open"
# I like to use the Unix file /etc/passwd, which is on every Unix system (including Mac)

f = open('/etc/passwd')   # could be more explicit, saying open('/etc/passwd', 'r')

In [66]:
f

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

In [67]:
# how can I now read from the file?
# option 1: f.read()
# the good news? This returns the file's contents as a string
# the bad news? This returns the file's contents as a string (even if the file is 20 TB in size!)

s = f.read()
print(s)

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33

In [68]:
# option 2: f.read(n)
# this returns a string of the next n characters from the file
# the good news? We have control!
# the bad news: this has nothing to do with the lines of the file, which are often important
# other bad news: I have to keep reading until I get the empty string, indicating I'm done

f = open('/etc/passwd')
while True:
    s = f.read(100)

    if s == '':  # didn't get anything? break
        break

    print(s, end='')  # don't add a newline after printing


##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33

In [70]:
# option 3: iterate over the file object
# with each iteration, we get a string -- up to and including the next \n character
# we only hold one line in memory at a time, so even huge files are fine (unless a single line is enormous)
# and the lines of the file usually make sense, too.

for one_line in open('/etc/passwd'):
    print(one_line.strip())

##
# User Database
#
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33:

In [73]:
# I want to print all of the usernames in /etc/passwd

for one_line in open('/etc/passwd'):
    if one_line[0] == '#':
        continue
    print(one_line.split(':')[0])

nobody
root
daemon
_uucp
_taskgated
_networkd
_installassistant
_lp
_postfix
_scsd
_ces
_appstore
_mcxalr
_appleevents
_geod
_devdocs
_sandbox
_mdnsresponder
_ard
_www
_eppc
_cvs
_svn
_mysql
_sshd
_qtss
_cyrus
_mailman
_appserver
_clamav
_amavisd
_jabber
_appowner
_windowserver
_spotlight
_tokend
_securityagent
_calendar
_teamsserver
_update_sharing
_installer
_atsserver
_ftp
_unknown
_softwareupdate
_coreaudiod
_screensaver
_locationd
_trustevaluationagent
_timezone
_lda
_cvmsroot
_usbmuxd
_dovecot
_dpaudio
_postgres
_krbtgt
_kadmin_admin
_kadmin_changepw
_devicemgr
_webauthserver
_netbios
_warmd
_dovenull
_netstatistics
_avbdeviced
_krb_krbtgt
_krb_kadmin
_krb_changepw
_krb_kerberos
_krb_anonymous
_assetcache
_coremediaiod
_launchservicesd
_iconservices
_distnote
_nsurlsessiond
_displaypolicyd
_astris
_krbfast
_gamecontrollerd
_mbsetupuser
_ondemand
_xserverdocs
_wwwproxy
_mobileasset
_findmydevice
_datadetectors
_captiveagent
_ctkd
_applepay
_hidd
_cmiodalassistants
_analyticsd
_fps
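This is also where files and dicts meet: each `/etc/passwd` line is a set of `:`-separated fields, so we can turn the file into a dict. A sketch, assuming the standard passwd layout in which the first field is the username and the last is the login shell. It runs on a few sample lines copied from the output above, so it works even without `/etc/passwd`, but you could pass `open('/etc/passwd')` instead; `shells_by_user` is a hypothetical name:

```python
def shells_by_user(lines):
    """Map each username to its login shell, skipping comments and blank lines."""
    shells = {}
    for one_line in lines:
        one_line = one_line.strip()
        if one_line == '' or one_line.startswith('#'):
            continue
        fields = one_line.split(':')
        shells[fields[0]] = fields[-1]   # first field: username; last field: shell
    return shells

sample = ['# User Database\n',
          'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n',
          'root:*:0:0:System Administrator:/var/root:/bin/sh\n',
          'daemon:*:1:1:System Services:/var/root:/usr/bin/false\n']
print(shells_by_user(sample))
```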

# Exercise: IP addresses

1. Create a dict, `counts`. The keys will be strings (IP addresses), and the values will be integers: the number of times that each IP address made a request to our server, according to `mini-access-log.txt`.
2. Iterate over this file, one line at a time. Grab the IP address from the front of each line:
    - If this is the first time we're seeing an IP address, add the address and 1 as a new key-value pair
    - Otherwise, add 1 to the existing value
3. Iterate over the keys and values in `counts`, and print each IP address and the number of times it accessed our server.

In [74]:
# Execute the Unix "head" command on mini-access-log.txt

!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - - [30/J

In [78]:
counts = {}

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]   # grab the IP address

    if ip_address in counts:  # if it's already a key in counts
        counts[ip_address] += 1
    else:
        counts[ip_address] = 1

for key, value in counts.items():
    print(f'{key}\t{value}')

67.218.116.165	2
66.249.71.65	3
65.55.106.183	2
66.249.65.12	32
65.55.106.131	2
65.55.106.186	2
74.52.245.146	2
66.249.65.43	3
65.55.207.25	2
65.55.207.94	2
65.55.207.71	1
98.242.170.241	1
66.249.65.38	100
65.55.207.126	2
82.34.9.20	2
65.55.106.155	2
65.55.207.77	2
208.80.193.28	1
89.248.172.58	22
67.195.112.35	16
65.55.207.50	3
65.55.215.75	2
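A `for` loop works the same way on anything that produces lines, not just on file objects returned by `open`. That makes it easy to try the counting logic on a few lines in memory. This sketch uses the standard `io.StringIO` class as a stand-in file, with the IP addresses from the first lines of `mini-access-log.txt` shown above (the rest of each line is trimmed, since only the first field matters); `count_ips` is a hypothetical name:

```python
import io

def count_ips(lines):
    """Count how many lines start with each IP address."""
    counts = {}
    for one_line in lines:
        fields = one_line.split()
        if not fields:                  # skip blank lines instead of crashing on [0]
            continue
        ip_address = fields[0]          # the IP address is the first field
        if ip_address in counts:
            counts[ip_address] += 1
        else:
            counts[ip_address] = 1
    return counts

fake_file = io.StringIO('67.218.116.165 - - "GET /robots.txt"\n'
                        '66.249.71.65 - - "GET /browse/one_node/1557"\n'
                        '65.55.106.183 - - "GET /robots.txt"\n'
                        '65.55.106.183 - - "GET /browse/one_model/2162"\n'
                        '66.249.71.65 - - "GET /browse/browse_applet_tab/2593"\n'
                        '66.249.71.65 - - "GET /browse/browse_files_tab/2499?tab=true"\n')

ip_counts = count_ips(fake_file)
for key, value in ip_counts.items():
    print(f'{key}\t{value}')
```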


In [79]:
# you might remember

'a' + 3

TypeError: can only concatenate str (not "int") to str

In [80]:
'a' * 3

'aaa'

In [81]:
# slight variation on our program: print a simple bar chart, one x per request

counts = {}

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]   # grab the IP address

    if ip_address in counts:  # if it's already a key in counts
        counts[ip_address] += 1
    else:
        counts[ip_address] = 1

for key, value in counts.items():
    print(f'{key}\t{value * "x"}')

67.218.116.165	xx
66.249.71.65	xxx
65.55.106.183	xx
66.249.65.12	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.106.131	xx
65.55.106.186	xx
74.52.245.146	xx
66.249.65.43	xxx
65.55.207.25	xx
65.55.207.94	xx
65.55.207.71	x
98.242.170.241	x
66.249.65.38	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.207.126	xx
82.34.9.20	xx
65.55.106.155	xx
65.55.207.77	xx
208.80.193.28	x
89.248.172.58	xxxxxxxxxxxxxxxxxxxxxx
67.195.112.35	xxxxxxxxxxxxxxxx
65.55.207.50	xxx
65.55.215.75	xx


# Next up

1. More iterating over files (and what we can do with them)
2. Writing to files
3. The `with` construct, and how it helps us

In [82]:
!cat nums.txt

5
	10     
	20
  	3
		   	20        

 25


# Exercise: Sum numbers

1. Set the variable `total` to be 0
2. Iterate over `nums.txt`
3. If there is a number on that line, add it to `total`
4. Print `total`

In [87]:
total = 0

for one_line in open('nums.txt'):

    # ignore lines that only contain whitespace
    # if one_line.strip() == '':
    #     continue

    # skip any line that, after stripping whitespace, isn't all digits
    if not one_line.strip().isdigit():
        continue

    total += int(one_line.strip())

print(total)    

83


I still can't work with files; I get this error:

    2627 self.user_ns['_exit_code'] = system(self.var_expand(cmd, depth=1))

    File /lib/python3.13/site-packages/IPython/utils/_process_emscripten.py:10, in system(cmd)
          9 def system(cmd):
    ---> 10     raise OSError("Not available")

    OSError: Not available
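That traceback comes from IPython's `_process_emscripten` module, which suggests the notebook is running in a browser-based (WebAssembly) environment such as JupyterLite, where `!` shell commands like `!cat` and `!head` aren't available. Assuming that's the cause, reading the file in plain Python should work instead. A sketch of two hypothetical helpers, `cat` and `head`, that mimic those commands:

```python
def cat(filename):
    """Print a file's contents, like the Unix `cat` command, and return them."""
    contents = open(filename).read()
    print(contents, end='')
    return contents

def head(filename, n=10):
    """Print the first n lines of a file, like the Unix `head` command, and return them."""
    first_lines = []
    for one_line in open(filename):
        if len(first_lines) == n:    # stop once we have n lines
            break
        first_lines.append(one_line)
    print(''.join(first_lines), end='')
    return first_lines

# usage, e.g.: cat('myfile.txt') or head('mini-access-log.txt')
```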

# Writing to files

If we want to write to a file, we need to indicate this when we `open` the file. By default, when we open a file, it's open for *reading*.

If you want to open a file for writing, you need to pass `'w'` as the second argument to open:

    f = open('myfile.txt', 'w')

There are two possible results for invoking `open` with `'w'`:

1. The file is opened for writing, and you can do so via the `f` variable. Also, any previous contents in that same file are now gone; the file contains 0 bytes/characters.
2. We get an error indicating that we couldn't open the file for writing.



In [88]:
f = open('myfile.txt', 'w')

f.write('this is the first line\n')  # f.write does *not* automatically end what we write with a newline!
f.write('and this is the second\n')
f.write('and this is the third!\n')

23

In [89]:
# I'll use the Unix command "cat" to view the file's contents

!cat myfile.txt

What's going on?

When we use `f.write`, we think that we're writing to the file. But really, the text usually isn't written to the disk right away.

Instead, it goes into a "buffer" in memory (Python keeps one, and the OS has its own). Writing to a buffer is very fast, and when the buffer fills up, its contents are "flushed" to the disk all at once.

We can manually flush the buffer with the `f.flush()` method.

Or we can tell the OS that we're done with this file, and we can invoke the `f.close()` method, which flushes before closing.

Or we can wait until our program (or Jupyter) exits, at which point all files are flushed and closed.
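A sketch showing `flush` and `close` at work, using a throwaway file named `buffer_demo.txt` (an arbitrary name). After each step, we read the file back through a second file object to see what has actually reached it. We don't check what's visible *before* flushing, since that depends on buffer sizes:

```python
f = open('buffer_demo.txt', 'w')
f.write('first line\n')
f.flush()                         # push what we've written so far out to the file

after_flush = open('buffer_demo.txt').read()    # read it back with a second file object
print(repr(after_flush))          # 'first line\n'

f.write('second line\n')
f.close()                         # close() flushes anything still in the buffer

after_close = open('buffer_demo.txt').read()
print(repr(after_close))          # 'first line\nsecond line\n'
```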

In [90]:
f.close()

In [91]:
!cat myfile.txt

this is the first line
and this is the second
and this is the third!


# What does this mean?

1. If you are writing to a file, then you should make sure that the file's contents are flushed and/or closed on a regular basis.
2. Another way, though, is to have Python automatically flush and close the file when you're done with it.

In [92]:
# this is the "with" construct

with open('myfile.txt', 'w') as f:    # here, we're assigning the open file to f -- just like before!
    f.write('**this is the first line\n')  # f.write does *not* automatically end what we write with a newline!
    f.write('**and this is the second\n')
    f.write('**and this is the third!\n')    

    # when the with block ends (even if an error occurs inside it), Python automatically flushes + closes our file

In [93]:
!cat myfile.txt

**this is the first line
**and this is the second
**and this is the third!
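One more reason to prefer `with`: the file is closed even if something goes wrong inside the block. A small sketch in which a `ValueError` is raised deliberately, just to simulate a bug; the file name `with_demo.txt` is arbitrary:

```python
try:
    with open('with_demo.txt', 'w') as f:
        f.write('written before the error\n')
        raise ValueError('simulated bug')   # deliberately leave the block early
except ValueError:
    pass   # ignore the simulated bug; we only care about the file afterwards

print(f.closed)                                 # True -- "with" closed it anyway
print(open('with_demo.txt').read(), end='')     # the line we wrote was flushed to disk
```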


# Exercise: Dict to config

1. Define a simple dictionary with 3-5 key-value pairs.
2. Write the dict's key-value pairs to a text file, with one pair per line and the `=` between the two.

If my dict is

    {'a':10, 'b':20, 'c':30, 'd':40, 'e':50}

the file should look like

    a=10
    b=20
    c=30
    d=40
    e=50


In [96]:
d = {'a':10, 'b':20, 'c':30, 'd':40, 'e':50}

with open('config.txt', 'w') as f:     # open the file for writing, and guarantee flush+close
    for key, value in d.items():       # iterate over the dict, one key-value pair at a time
        f.write(f'{key}={value}\n')    # write the current key-value pair to the file

In [97]:
!cat config.txt

a=10
b=20
c=30
d=40
e=50
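Going the other direction, turning a file back into a Python data structure, takes only a few lines. A sketch with a hypothetical `read_config` function; it assumes every non-blank line has the `key=value` form, and converts values to `int` because we know they were integers (a real config reader would have to decide that per key). To stay self-contained, it first writes `config.txt` the same way as above:

```python
def read_config(filename):
    """Read key=value lines into a dict (values assumed to be integers)."""
    config = {}
    with open(filename) as f:
        for one_line in f:
            one_line = one_line.strip()
            if one_line == '':                   # skip blank lines
                continue
            key, value = one_line.split('=', 1)  # split only at the first '='
            config[key] = int(value)             # assumption: every value is an int
    return config

# round trip: write a dict out as above, then read it back
original = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
with open('config.txt', 'w') as f:
    for key, value in original.items():
        f.write(f'{key}={value}\n')

print(read_config('config.txt'))   # {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
```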


# When do we write to a file?

1. Logging
2. Writing dynamic configurations
3. Writing a Git configuration for each new course I offer

# What about `with` and reading from files?

Whenever I demonstrate reading from files in a blog/article/YouTube video, I make sure to use `with` for reading, not just for writing. That's because I will otherwise get hate mail! 

We can use `with` when reading from files. I don't think it's very necessary, but it's not a *bad* thing.

In [99]:
with open('/etc/passwd', 'r') as f:
    for one_line in f:
        print(len(one_line), end=' ')

3 16 3 76 71 18 2 70 18 3 59 50 54 72 62 64 70 61 71 70 70 72 56 66 62 67 52 63 60 69 58 50 50 54 66 67 59 63 64 61 62 61 62 61 55 55 74 53 65 65 55 56 50 56 88 66 61 70 81 65 62 56 75 65 54 64 75 72 85 72 67 53 55 69 77 74 94 85 97 73 84 68 71 70 63 55 82 74 64 66 76 55 78 80 56 63 82 76 63 55 69 61 99 73 55 63 79 100 57 83 62 77 104 55 67 92 89 64 62 51 76 74 75 84 52 80 77 85 62 78 82 73 77 76 75 97 104 69 77 53 

In [101]:
f.closed

True

In [102]:
# .. says: I opened the file for writing like this:
#
#     f = open('myfile.txt', 'w') as f:
#
# but that's a SyntaxError!
# "as" assigns a value to a variable, but you can only use it in some cases,
# such as in `with`. So the following two are mostly identical:

f = open('myfile.txt', 'w')

with open('myfile.txt', 'w') as f:
    pass    # write to f here

# however, after the "with" block, f has already been flushed + closed

# Next time

- Functions
- Writing them
- Invoking them
- Arguments and parameters
- Return values