# Week 3: Dictionaries and files

1. Recap of last week
2. Dictionaries
3. Text files -- reading and writing

# Recap 

1. Data structures!
    - Each data structure allows us to store and retrieve data in a different way
    - Numbers (integers and floats)
2. Sequences 
    - Strings: immutable (cannot be changed), contain characters
        - anything textual -- user input, displaying on the screen, reading from files, reading from the network, sending e-mail
        - `str.strip` -- removes whitespace from the sides
        - `str.isdigit` -- tells us if there are only digits in the string
        - `str.lower` -- returns a new string, based on ours, in lowercase
    - Lists: mutable (can be changed), contain anything
        - collections of data (traditionally of the same type) -- lists of users, lists of filenames, lists of folders, lists of IP addresses
        - `list.append` -- adds an element to the end of the list
        - `list.pop` -- removes and returns the element at the end of a list
    - Tuples: immutable (cannot be changed), contain anything
        - collections of *differently*-typed data -- records or structs
    - All sequences can do:
        - Retrieve with `[i]`, where `i` is a numeric index
        - Get the length with `len(s)`, where `s` is a sequence
        - Retrieve a slice with `[start:stop]`
        - Search with `in`
        - Loop with `for`
3. Loops
    - `for` loops -- iterate over every element of a sequence
    - `for` loops on `range(n)` give us `n` items, from 0 to `n`-1
    - `while` loops, which run so long as the condition is `True`
4. Strings to lists and back       
    - `str.split` -- returns a list of strings, based on the string
    - `str.join` -- returns a string, based on a "glue" string and a list of strings
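
As a quick refresher on the last point, here is a minimal sketch of going from a string to a list and back. The sample string is made up for illustration.

```python
# Hypothetical sample data, chosen only for illustration
line = 'alice bob carol'

names = line.split()        # with no argument, split uses whitespace as the separator
print(names)

joined = ', '.join(names)   # the "glue" string comes first; the list is the argument
print(joined)
```

Note that `str.join` is a method on the glue string, not on the list.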

# Parentheses and quotes

Python uses every type of quote and parenthesis on the keyboard! Each has its own uses:

## Quotation marks
- Use `''` or `""` to start and end strings.  There is no difference between them, but it's common to use one when you have the other on the inside of the string.
- If you really need `'` or `"` inside of a string, just use a backslash beforehand, such as `'He\'s very nice'`.
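
A small sketch of the three options described above. The strings are just examples.

```python
s1 = "He's very nice"        # double quotes outside, so the single quote inside is fine
s2 = 'She said "hello"'      # single quotes outside, so the double quotes inside are fine
s3 = 'He\'s very nice'       # same quote inside and outside, escaped with a backslash

print(s1 == s3)              # both spellings produce the same string
print(s2)
```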

## Parentheses

### `()` -- round parentheses 
- Calling functions 
- Calling classes/types
- Grouping
- Creating tuples

### `[]` -- square brackets
- Creating lists
- Retrieving individual items from strings, lists, tuples
- Retrieving slices from strings, lists, tuples

### `{}` -- curly braces
- Creating dicts
- What should be evaluated in an f-string
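
To see the three kinds of brackets side by side, here is a short sketch. The variable names are arbitrary and exist only for this illustration.

```python
name = 'world'

length = len(name)            # () calls a function
t = (10, 20, 30)              # () creates a tuple
mylist = [10, 20, 30]         # [] creates a list
first = mylist[0]             # [] retrieves one item
part = name[1:3]              # [] retrieves a slice
d = {'a': 1, 'b': 2}          # {} creates a dict
greeting = f'Hello, {name}!'  # {} marks what to evaluate in an f-string

print(first, part, greeting)
```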


# Dictionaries

We've seen with lists and tuples that we can store whatever data we want there, but we need to retrieve it using the appropriate index.  Indexes start with 0 and go up to `len(s)` - 1.  

In dicts, we also have indexes and values. But in the case of dicts, the indexes are called "keys" and *we* decide what they are.  The values can still be anything.

Other languages also have "dicts" -- but they call them other things:
- Hash tables
- Hash maps
- Hashes
- Maps
- Key-value stores
- Name-value stores
- Associative arrays

What's especially nice about dicts is that the keys can be *ANY IMMUTABLE TYPE*, which normally means: numbers and strings.  You aren't restricted to using numbers, and they don't have to be in order.

We can use dicts anywhere we have a "mapping" between one set of values and another:
- Computer name -> IP address
- Username -> password
- User ID -> Username
- Postal code -> state/province

Keys must be unique! They cannot repeat themselves.

In [1]:
# to define a dict, use {}
# each key-value pair has a key and value, separated by :
# the pairs are separated by ,

d = {'a':1, 'b':2, 'c':3}   # creating a dict with three key-value pairs

In [15]:
# how big is this dict?  -- len returns the number of key-value pairs
len(d)

3

In [3]:
# to retrieve a value from the dict, we use its key
d['a']

1

In [4]:
d['b']

2

In [5]:
d['c']

3

In [6]:
# what happens if I try to retrieve a key that doesn't exist?
d['x']

KeyError: 'x'

In [7]:
# I can search for a key using "in" -- this does *NOT* search in the values!
'a' in d

True

In [8]:
'x' in d

False

In [9]:
d

{'a': 1, 'b': 2, 'c': 3}

In [10]:
# don't do this, but Python doesn't care!
d = {'a':1, ' ':2, ' a ':3}

In [11]:
len(d)

3

In [12]:
d['a']

1

In [13]:
d[' ']

2

In [14]:
d[' a ']

3

In [16]:
d = {'a':1, 'b':2, 'c':3}

In [17]:
d = {'a':1, "a":2}  # 'a' and "a" are the same string (only the typing differs), so the second value replaces the first
d

{'a': 2}

# Paradigm 1 for dict use: Mini-database

We can create a small dict at the start of our program, and then query it when the program runs. This could be for month names -> month numbers (or month numbers -> month names).
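
A minimal sketch of the month-name idea, assuming a dict called `month_numbers` and a fixed `lookup` value (both invented here; in a real program the lookup would probably come from `input`):

```python
# defined once, at the top of the program
month_numbers = {'Jan': 1, 'Feb': 2, 'Mar': 3}

lookup = 'Feb'    # hypothetical value; a real program might get this from input()

if lookup in month_numbers:     # "in" checks the keys
    print(f'{lookup} is month number {month_numbers[lookup]}')
else:
    print(f'{lookup} is not a month we know about')
```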

# Exercise: Restaurant 

1. Define a small dict, called `menu`, in which the keys are strings (entries on a menu) and the values are prices (integers).
2. Define `total` to be 0.
3. Ask the user repeatedly what they want to order.
    - If they enter an empty string, stop asking and give the total bill.
    - If they enter a string that's on our menu as an entry, tell them the price and the new total.
    - If they enter a string that's *NOT* on our menu as an entry, scold them appropriately!
4. Print the total at the end.

Hints:
- We can use a `while` loop to ask questions repeatedly, especially `while True`.
- We can check for an empty string if we compare with `''`
- We can break out of a loop with the `break` command
- Check for membership in a dict with `in`


In [18]:
menu = {'sandwich':10, 'tea':5, 'apple':1, 'cake':4}

menu['sandwich']

10

In [19]:
menu['tea']

5

In [20]:
'tea' in menu

True

In [21]:
# assigning the string 'tea' to the variable order
order = 'tea'

In [22]:
# now I can use the variable to search in the dict
# this will only search in the keys, not the values
# in other words: is the value currently in the variable order (i.e., 'tea') a key in the dict menu?
order in menu

True

In [23]:
# let's retrieve the value in menu associated with the key stored in order
menu[order]

5

In [24]:
menu = {'sandwich':10, 'tea':5, 'apple':1, 'cake':4}
total = 0

while True:   # infinite loop!
    order = input('Order: ').strip()    # get the user's order, remove whitespace, assign to order
    
    if order == '':   # did we get an empty string? Stop asking
        break
        
    elif order in menu:         # is the user's order a key in the menu dict?
        price = menu[order]   # get the price of the user's order from menu
        total += price        # add this price to the total
        print(f'{order} costs {price}, total is now {total}')
    else:
        print(f'We are all out of {order} today!')
        
print(f'Total is {total}.')        

Order: sandwich
sandwich costs 10, total is now 10
Order: tea
tea costs 5, total is now 15
Order: cake
cake costs 4, total is now 19
Order: tea
tea costs 5, total is now 24
Order: elephant
We are all out of elephant today!
Order: 
Total is 24.


In [25]:
# fancy version of our program -- with an order history

menu = {'sandwich':10, 'tea':5, 'apple':1, 'cake':4}
total = 0
order_history = []

while True:   # infinite loop!
    order = input('Order: ').strip()    # get the user's order, remove whitespace, assign to order
    
    if order == '':   # did we get an empty string? Stop asking
        break
        
    elif order in menu:         # is the user's order a key in the menu dict?
        price = menu[order]   # get the price of the user's order from menu
        total += price        # add this price to the total
        order_history.append(order)
        print(f'{order} costs {price}, total is now {total}')
    else:
        print(f'We are all out of {order} today!')
        
print(f'Total is {total}.')        
for one_item in order_history:
    print(f'\t{one_item}')

Order: sandwich
sandwich costs 10, total is now 10
Order: tea
tea costs 5, total is now 15
Order: cake
cake costs 4, total is now 19
Order: apple
apple costs 1, total is now 20
Order: sandwich
sandwich costs 10, total is now 30
Order: tea
tea costs 5, total is now 35
Order: 
Total is 35.
	sandwich
	tea
	cake
	apple
	sandwich
	tea


# Dictionaries are mutable!

We can change dictionaries:
- Add a key-value pair
- Update/change a value
- Remove a key-value pair

In [26]:
d = {}        # empty dict
d['a'] = 10   # add a key-value pair via assignment -- there is no "append" method!
print(d)

d['a'] = 20   # if the key already exists, we update the existing value for the key
print(d)

{'a': 10}
{'a': 20}


In [27]:
d['b'] = 30
d['c'] = 40
d['a'] = 10

print(d)

{'a': 10, 'b': 30, 'c': 40}


In [28]:
# to remove a key-value pair, use "pop"
# we have to provide the key that we'll be removing
d.pop('b')  # removes 'b':its value, and returns its value

30

In [29]:
d

{'a': 10, 'c': 40}

In [30]:
# key-value pairs in modern Python (3.7 and later) are kept in insertion order, i.e., the order in which the keys were added
d = {}
d['x'] = 10
d['v'] = 20
d['q'] = 30
d['w'] = 40
d['y'] = 50

d

{'x': 10, 'v': 20, 'q': 30, 'w': 40, 'y': 50}

In [35]:
# the keys must be immutable
mylist = [10, 20, 30]

# let's try to add a new key-value pair to our dict,
# with a key of mylist and value of 10
d[mylist] = 10   # this doesn't work, because lists are MUTABLE and cannot be dict keys

TypeError: unhashable type: 'list'

In [32]:
# more examples of dictionaries

# usernames as keys, ID numbers as values
users = {'reuven':12345, 'admin':999, 'someone':456}
users['reuven']

12345

In [33]:
# month numbers as keys, month names as values
months = {1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May'}
months[1]

'Jan'

In [34]:
months[4]

'Apr'

# Keys and values

Keys in a dict must be immutable (strictly speaking, *hashable*). They thus can be:
- `True` and `False` (boring)
- Integers or floats
- Strings
- Tuples, so long as the tuples only contain immutable values
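
As a sketch of the tuple case, here is a dict whose keys are (x, y) coordinates. The dict `points` and its contents are invented for illustration, and `try`/`except` is used only to keep the example from stopping at the error.

```python
# Hypothetical example: (x, y) coordinates as keys, labels as values
points = {(0, 0): 'origin', (10, 20): 'treasure'}

print(points[(10, 20)])     # retrieve via a tuple key
print((0, 0) in points)     # "in" works with tuple keys, too

# A tuple that contains a list cannot be used as a key
try:
    points[([1, 2], 3)] = 'oops'
except TypeError as e:
    print(f'TypeError: {e}')
```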

Values in a dict can be **ABSOLUTELY ANYTHING AT ALL**:
- integers
- strings
- lists
- tuples
- dicts
- functions
- modules
- classes
- YOU NAME IT!

Dicts are one-way streets -- you can get the value via the key, but *NOT* the other way around.
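
If you really do need to go from a value back to its key(s), you have to search for it yourself. The following is only a sketch of one way to do that, using a `for` loop over the dict; the variable names `wanted_id` and `found` are made up here.

```python
users = {'reuven': 12345, 'admin': 999, 'someone': 456}

wanted_id = 999
found = []                            # more than one key might have this value

for one_name in users:                # looping over a dict gives us its keys
    if users[one_name] == wanted_id:
        found.append(one_name)

print(found)
```

This checks every pair, so it loses the speed advantage that dicts normally give us.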

In [36]:
# tuple with immutable values:
# these can be used as dict keys 
# Consider: coordinates on an x,y axis!

t = (10, 20, 30)      # numbers are immutable
t = ('a', 'b', 'c')   # strings are immutable

In [37]:
# tuple with mutable values:

t = ([10, 20, 30], [40, 50, 60])  # lists are mutable
t = ({'a':1, 'b':2}, {'c':3, 'd':4})  # dicts are mutable

# Next up

- Accumulating known things
- Accumulating unknown things



In [38]:
# I want to keep track of a, b, and c

# start off my dict with these three keys, all values are 0
d = {'a':0, 'b':0, 'c':0}

d['a'] += 1   # add 1 to the current value of d['a']
d['b'] += 3   # add 3 to the current value of d['b']

d


{'a': 1, 'b': 3, 'c': 0}

In [41]:
# let's keep track of how often each high temperature occurs in Modi'in
d = {8:0, 9:0, 10:0, 11:0, 12:0, 13:0, 14:0, 15:0, 16:0, 17:0, 18:0}

d[14] += 1
d[14] += 1
d[14] += 1
d[10] += 1
d[13] += 1
d[10] += 1
d[12] += 1
d[8] += 1
d[11] += 1

d

{8: 1, 9: 0, 10: 2, 11: 1, 12: 1, 13: 1, 14: 3, 15: 0, 16: 0, 17: 0, 18: 0}

In [42]:
# what if I initialized d to be empty:

# let's keep track of how often each high temperature occurs in Modi'in
d = {}

d[14] += 1   # this means: d[14] = d[14] + 1
d[14] += 1
d[14] += 1
d[10] += 1
d[13] += 1
d[10] += 1
d[12] += 1
d[8] += 1
d[11] += 1

d

KeyError: 14

# Exercise: Vowels, digits, and others

1. Define a dict, `counts`, with three keys: `vowels`, `digits`, and `others`.  The value for each key should be 0.
2. Ask the user to enter a string.
3. Go through each character in the string:
    - If the character is a vowel (a, e, i, o, u) then add one to `vowels`
    - If the character is a digit, then add one to `digits`
    - If the character is neither, then add one to `others`
4. Print the resulting dictionary

Hints:
- You can check for membership in a string with `in`
- You can check if a string contains only digits 0-9 with `str.isdigit`


In [48]:
counts = {'vowels':0, 'digits':0, 'others':0}

s = input('Enter a string: ').strip()

for one_character in s:
    if one_character in 'aeiou':
        counts['vowels'] += 1
    elif one_character.isdigit():
        counts['digits'] += 1
    else:
        counts['others'] += 1
    
print(counts)    

Enter a string: abe123!?
{'vowels': 2, 'digits': 3, 'others': 3}


In [51]:
# I want to count the characters in a string
# meaning: ask the user to enter a string
# how often does each character show up?

counts = {}   # empty dict!

s = input('Enter a string: ').strip()

for one_character in s:

    # one_character -- the current character, in the loop, from s
    # if we ask "one_character in counts" -- we're checking if the character is a key in the counts dict
    # but here, we're saying "one_character not in counts", which means, True if it's *NOT* a key there
    # so: if the current character is *not* a key in the dict, then add it to the dict with a value of 1

    if one_character not in counts:  
        counts[one_character] = 1    # add the key-value pair the first time we see this key

    # if the current character *is* already a key in counts, 
    # then just add 1 to the value associated with that key

    else:
        counts[one_character] += 1   # add 1 to the value the subsequent times we see the key
    
print(counts)    

Enter a string: hello
{'h': 1, 'e': 1, 'l': 2, 'o': 1}


# Three paradigms for working with dicts

1. We define the dict at the start of the program, and read from it as a database (e.g., menu)
2. We define the dict with keys and starting values (often 0), and add to those values over time (e.g., digits, vowels, and others)
3. We define an empty dict, and based on input, we add keys and values (e.g., character counter)

# Exercise: Rainfall

We're going to use a dict to keep track of total rainfall in cities. Which cities? Whichever ones the user decides to enter.  How much rain? Whatever the user enters. 

The dict, `rainfall`, will have city names (strings) as keys, and mm rain (integers) as values.

1. Define an empty dict, `rainfall`.
2. Ask the user, again and again, to enter a city name.
3. If they give us an empty string, stop asking.
4. Ask the user a 2nd question: How much rain fell, in mm?
5. If this is the first time encountering this city, then assign to our dict, with the city name as the key and `mm_rain` as the value.
6. If we have seen this city before, then add the value in `mm_rain` to the existing value.
7. Print the dict.

Example:

    City: Jerusalem
    Rain: 5
    City: Tel Aviv
    Rain: 4
    City: Jerusalem
    Rain: 3
    City: [ENTER]
    {'Jerusalem':8, 'Tel Aviv':4}

In [53]:
rainfall = {}

while True:
    city_name = input('City: ').strip()
    
    if city_name == '':  # got an empty city name? stop!
        break
        
    mm_rain = input('Rain: ').strip()  
    mm_rain = int(mm_rain)   # convert to an int, assuming users gave us good input
    
    # have we seen this city before?
    if city_name in rainfall:   # yes, we have seen it before!
        rainfall[city_name] += mm_rain
    else:                       # no, this is the first time seeing this city
        rainfall[city_name] = mm_rain
        
print(rainfall)        

City: a
Rain: 5
City: b
Rain: 4
City: a
Rain: 3
City: 
{'a': 8, 'b': 4}


In [54]:
# empty dict
d = {}

city_name = 'a'
mm_rain = 5

d[city_name] = mm_rain    # this adds the key-value pair 'a' : 5 to the dict
d

{'a': 5}

# Next up:

1. Looping over dicts
2. How dicts work (and why we use them)
3. Working with files

If you are on your own computer, and you can download text files into the same directory in which you're running Jupyter, then please download and unpack the zipfile mentioned in the resources.  It contains some text files we'll be using.

In [55]:
d = {'a':10, 'b':20, 'c':30}

# will this work? And if it does, what do I get with each iteration?
for one_item in d:
    print(one_item)  # looping over a dict gives me the keys

a
b
c


In [56]:
# one classic way to get all keys and values in a "for" loop
d = {'a':10, 'b':20, 'c':30}

for one_key in d:
    print(f'{one_key}: {d[one_key]}') 

a: 10
b: 20
c: 30


In [57]:
# dicts have a few methods to retrieve their parts

d.keys()   # returns all keys

dict_keys(['a', 'b', 'c'])

In [58]:
# there's no need to iterate over d.keys()!  Iterating over d gives the same keys, and is more idiomatic

In [59]:
d.values()   # returns all values

dict_values([10, 20, 30])

In [60]:
# I can search in the values, if I want
# (relatively rare, but it does happen)

30 in d.values()

True

In [63]:
# I can print all the values... but I can't get the keys from the values

for one_value in d.values():
    print(one_value)

10
20
30


In [64]:
months.values()

dict_values(['Jan', 'Feb', 'Mar', 'Apr', 'May'])

In [65]:
for month_name in months.values():
    print(month_name)

Jan
Feb
Mar
Apr
May


In [66]:
# my favorite way to iterate over dicts is with the "items" method

for one_item in d.items():
    print(one_item)  # we get a 2-element tuple, with the key and value for each pair

('a', 10)
('b', 20)
('c', 30)


In [68]:
# this will work -- t[0] is the key and t[1] is the value
for t in d.items():
    print(f'{t[0]}: {t[1]}')

a: 10
b: 20
c: 30


In [69]:
# we can use unpacking
# unpacking means: we have a sequence on the right, and multiple variables on the left

mylist = [10, 20, 30]
x,y,z = mylist    # three variables = three elements in mylist -- so x=10, y=20, z=30

In [70]:
x

10

In [71]:
y

20

In [72]:
z

30

In [73]:
# I can use unpacking in my for loop!
for key, value in d.items():
    print(f'{key}: {value}')

a: 10
b: 20
c: 30


In [74]:
for month_number, month_name in months.items():
    print(f'{month_number}: {month_name}')

1: Jan
2: Feb
3: Mar
4: Apr
5: May


# How do dicts work?

Why do we need (want) dictionaries? Can't we just use lists?  What's the big advantage?

There are several:
- Keys are guaranteed to be unique -- I don't need to worry about duplicates
- My programs become more readable -- I'd prefer to use keys with meaning than numbers for indexes
- Searching for a key in a dict is *FAR, FAR* faster than in a list.  This means:
    - Checking if a key is in a dict is faster
    - Retrieving the value based on a key is faster
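
To get a feel for the speed difference, here is a rough sketch that times the same search in a list and in a dict, using `timeit` from the standard library. The size `n` and the names `biglist` and `bigdict` are arbitrary choices for this illustration, and the actual numbers will vary from one computer to another.

```python
import timeit

n = 100_000                      # arbitrary size, chosen for illustration
biglist = list(range(n))

bigdict = {}
for i in range(n):
    bigdict[i] = True            # the values don't matter here, only the keys

target = n - 1                   # worst case for the list: the very last element

# run each search 100 times, and report the total time in seconds
list_time = timeit.timeit(lambda: target in biglist, number=100)
dict_time = timeit.timeit(lambda: target in bigdict, number=100)

print(f'list: {list_time:.5f} seconds')
print(f'dict: {dict_time:.5f} seconds')
```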

In [75]:
d = {'a':10, 'a':20, 'a':30, 'a':40, 'a':5}

In [76]:
d

{'a': 5}

In [77]:
d['a'] = 20

In [78]:
d['a'] = 30

In [79]:
mylist = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

70 in mylist  # how does Python search? With a for loop!

# how long might that search take? We might need to search every element in mylist
# the longer mylist is, the longer the search might take

# in CS theory, we call this O(n) -- meaning, the time it takes is proportional to the length of the list
# or: longer lists take longer to search

True

In [80]:
d = {'a':10, 'b':20, 'c':30}

'b' in d  # what's really going on here?

# Python runs a "hash function" on the key -- hash('b')
# this number tells us where in memory the key 'b' should be stored
# if it's there, then we can get the value
# if it's not there, then we can report it's not in the dict

# no matter how many key-value pairs you have in the dict, searching
# takes (on average) about the same amount of time!

# That's known in CS theory as O(1) -- constant time

True

In [81]:
# why can't we use mutable data as dict keys?
# answer: because the hash function uses the key to know where the value is in memory
# if we change the data, then the location is suddenly wrong!
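
The built-in `hash` function lets us see this for ourselves. What follows is only a sketch; the variable `list_is_hashable` is invented for the example, and `try`/`except` is there just so the error doesn't stop the code.

```python
# the same key always gives the same hash within one run of Python
print(hash('b') == hash('b'))

# tuples of immutable values can be hashed, so they can be dict keys
print(hash((10, 20)) == hash((10, 20)))

# lists are mutable, so Python refuses to hash them at all
list_is_hashable = True
try:
    hash([10, 20])
except TypeError as e:
    list_is_hashable = False
    print(f'TypeError: {e}')
```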



# Files!

Files allow us to store data when a program (or computer) isn't running, so that we don't have to enter all of that data each time.

When we talk about "files" in real life, we think about Word files, Excel files, PowerPoint files, PDF files, and HTML files.  A more common type of file is the "text file," containing no real formatting, just text.

We're going to talk about working with such (plain-text) files.

If our program is going to talk to a file on the filesystem (disk), we'll need to go through the operating system, which is the mediator in such things. We'll need an agent, or an object, to act on our behalf.

In many languages, we call such an object a "file handle." In Python, we call it a "file object," or (to be more precise) a "file-like object."  (There are many things that aren't files, but implement the functionality of a file, so that's why they're called file-like objects.)

So to read from a text file, we'll need to:
- Tell the OS we want to read from the file
- Get a file-like object back from Python, which was in touch with the OS
- Tell the file-like object we want to read from the file
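
To make the "file-like object" idea mentioned above a bit more concrete, here is a small sketch using `io.StringIO` from the standard library. It isn't a file on disk at all, but it can be read like one. The text and the name `fake_file` are made up for illustration.

```python
import io

# io.StringIO keeps its text in memory, but behaves like an open text file
fake_file = io.StringIO('first line\nsecond line\n')

lines = []
for one_line in fake_file:            # iterating gives one line at a time, as with a real file
    lines.append(one_line.strip())    # strip removes the trailing newline

print(lines)
```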

In [87]:
# I'm going to use the /etc/passwd file from my Mac
# /etc/passwd is a traditional Unix file with usernames and (no longer) passwords.

f = open('/etc/passwd')  # (1) ask the OS to open a *read* connection to the file (2) return the file object

In [88]:
# what does the file object tell us about itself?
f

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

In [89]:
# can I read from the file? YES!  I can run the read() method, which returns
# a string containing the entire contents of the file

# when you run f.read(), it starts from where the last read ended
# so if you call it twice in a row, the second call returns an empty string

s = f.read()   # this works, but it's usually a bad idea -- the whole file is loaded into memory at once

In [90]:
len(s)

7630

In [91]:
# a much better, more idiomatic way to read from a file in Python
# is to use ... a for loop
# if you iterate over a file, you'll get the next line with each iteration
# (Each line includes the trailing \n character)

# since each line read from the file includes the \n at the end
# and print adds \n when it displays something,
# printing /etc/passwd ends up being double spaced.

for one_line in open('/etc/passwd'):
    print(one_line)

##

# User Database

# 

# Note that this file is consulted directly only when the system is running

# in single-user mode.  At other times this information is provided by

# Open Directory.

#

# See the opendirectoryd(8) man page for additional information about

# Open Directory.

##

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false

root:*:0:0:System Administrator:/var/root:/bin/sh

daemon:*:1:1:System Services:/var/root:/usr/bin/false

_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico

_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false

_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false

_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false

_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false

_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false

_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/fal

In [92]:
# remove the newline from the end of each line, and print 
for one_line in open('/etc/passwd'):
    print(one_line.strip())

##
# User Database
#
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33:

In [93]:
# what if I just want to print the usernames on the system?

for one_line in open('/etc/passwd'):
    fields = one_line.split(':')  # use : on each line to break things into a list of strings
    print(fields[0])    # print the first element in the list, aka the username

##

# User Database

# 

# Note that this file is consulted directly only when the system is running

# in single-user mode.  At other times this information is provided by

# Open Directory.

#

# See the opendirectoryd(8) man page for additional information about

# Open Directory.

##

nobody
root
daemon
_uucp
_taskgated
_networkd
_installassistant
_lp
_postfix
_scsd
_ces
_appstore
_mcxalr
_appleevents
_geod
_devdocs
_sandbox
_mdnsresponder
_ard
_www
_eppc
_cvs
_svn
_mysql
_sshd
_qtss
_cyrus
_mailman
_appserver
_clamav
_amavisd
_jabber
_appowner
_windowserver
_spotlight
_tokend
_securityagent
_calendar
_teamsserver
_update_sharing
_installer
_atsserver
_ftp
_unknown
_softwareupdate
_coreaudiod
_screensaver
_locationd
_trustevaluationagent
_timezone
_lda
_cvmsroot
_usbmuxd
_dovecot
_dpaudio
_postgres
_krbtgt
_kadmin_admin
_kadmin_changepw
_devicemgr
_webauthserver
_netbios
_warmd
_dovenull
_netstatistics
_avbdeviced
_krb_krbtgt
_krb_kadmin
_krb_changepw
_krb_kerberos
_krb_anonymous
_

In [96]:
# very common idiom for:
# (1) opening a file
# (2) going through each line
# (3) ignoring the lines that we don't care about
# (4) breaking each line into parts
# (5) printing the part that's of interest to us

for one_line in open('/etc/passwd'):
    if not one_line.startswith('#'): # startswith asks: does the string start with this argument?
        fields = one_line.split(':')  # use : on each line to break things into a list of strings
        print(fields[0])    # print the first element in the list, aka the username

nobody
root
daemon
_uucp
_taskgated
_networkd
_installassistant
_lp
_postfix
_scsd
_ces
_appstore
_mcxalr
_appleevents
_geod
_devdocs
_sandbox
_mdnsresponder
_ard
_www
_eppc
_cvs
_svn
_mysql
_sshd
_qtss
_cyrus
_mailman
_appserver
_clamav
_amavisd
_jabber
_appowner
_windowserver
_spotlight
_tokend
_securityagent
_calendar
_teamsserver
_update_sharing
_installer
_atsserver
_ftp
_unknown
_softwareupdate
_coreaudiod
_screensaver
_locationd
_trustevaluationagent
_timezone
_lda
_cvmsroot
_usbmuxd
_dovecot
_dpaudio
_postgres
_krbtgt
_kadmin_admin
_kadmin_changepw
_devicemgr
_webauthserver
_netbios
_warmd
_dovenull
_netstatistics
_avbdeviced
_krb_krbtgt
_krb_kadmin
_krb_changepw
_krb_kerberos
_krb_anonymous
_assetcache
_coremediaiod
_launchservicesd
_iconservices
_distnote
_nsurlsessiond
_displaypolicyd
_astris
_krbfast
_gamecontrollerd
_mbsetupuser
_ondemand
_xserverdocs
_wwwproxy
_mobileasset
_findmydevice
_datadetectors
_captiveagent
_ctkd
_applepay
_hidd
_cmiodalassistants
_analyticsd
_fps

# Exercise: Summing numbers

In the zipfile I provided, there's a file called `nums.txt`. Each line of that file contains an optional integer, as well as some whitespace (meaning: `\n`, `\t`, and spaces).

1. Define `total` to be 0
2. Go through the file `nums.txt`, one line at a time
3. Check: Can we turn the current line, after removing whitespace, into an integer?
4. If so, then do so, and add the int to `total`
5. Print `total`.


In [97]:
# On Unix, we can use "cat" to look at a file
# In Jupyter, we can use ! at the start of a line to execute programs
!cat nums.txt

5
	10     
	20
  	3
		   	20        

 25


In [103]:
total = 0
for one_line in open('nums.txt'):
    if one_line.strip() != '':         # after removing whitespace, do we have a non-empty string?
        total += int(one_line.strip())   # if so, add the integer we got from that line to total
print(total)        

83


In [104]:
# slight change, probably easier to understand
# (note: isdigit skips lines that aren't purely digits, including negative numbers such as '-5')

total = 0
for one_line in open('nums.txt'):
    if one_line.strip().isdigit():       # after removing whitespace, do we have digits only?
        total += int(one_line.strip())   # if so, add the integer we got from that line to total
print(total)        

83


# Next up

1. Reading from files, counting in dicts
2. Writing to files



In [105]:
f = open('/etc/passwd')

f.close()   # good practice when reading, and essential when writing

In [106]:
# here, our file is closed automatically
# because the only reference to it is in the "for" loop
# when the loop exits/ends, the reference to the file object disappears,
# and Python (at least CPython, the standard implementation) closes the file for us

for one_line in open('/etc/passwd'):
    print(len(one_line))

3
16
3
76
71
18
2
70
18
3
59
50
54
72
62
64
70
61
71
70
70
72
56
66
62
67
52
63
60
69
58
50
50
54
66
67
59
63
64
61
62
61
62
61
55
55
74
53
65
65
55
56
50
56
88
66
61
70
81
65
62
56
75
65
54
64
75
72
85
72
67
53
55
69
77
74
94
85
97
73
84
68
71
70
63
55
82
74
64
66
76
55
78
80
56
63
82
76
63
55
69
61
99
73
55
63
79
100
57
83
62
77
104
55
67
65
89
60
51
53


In [1]:
# you can install friendly using pip on your computer (not inside of Jupyter!)
# pip install friendly

# after doing that, in Jupyter, you can say:
from friendly.jupyter import Friendly

friendly_traceback 0.5.11; friendly 0.5.5.
Type 'Friendly' for basic help.


In [2]:
name = 'Reuven'
print(namee)

In [4]:
# in Windows, be careful!  You need to double backslashes *or* use raw strings
open(r'c:\Users\reuven\whatever')

In [6]:
# only print the lines containing 'x'

for one_line in open('/etc/passwd'):
    if 'x' in one_line:
        print(one_line.strip())

_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_mcxalr:*:54:54:MCX AppLaunch:/var/empty:/usr/bin/false
_sandbox:*:60:60:Seatbelt:/var/empty:/usr/bin/false
_usbmuxd:*:213:213:iPhone OS Device Helper:/var/db/lockdown:/usr/bin/false
_xserverdocs:*:251:251:macOS Server Documents Service:/var/empty:/usr/bin/false
_wwwproxy:*:252:252:WWW Proxy:/var/empty:/usr/bin/false
_nearbyd:*:268:268:Proximity and Ranging Daemon:/var/db/nearbyd:/usr/bin/false
_reportmemoryexception:*:269:269:ReportMemoryException:/var/db/reportmemoryexception:/usr/bin/false


In [8]:
look_for = input('Search for: ').strip()

for one_line in open('/etc/passwd'):
    if look_for in one_line:
        print(one_line.strip())

Search for: q
_mysql:*:74:74:MySQL Server:/var/empty:/usr/bin/false
_qtss:*:76:76:QuickTime Streaming Server:/var/empty:/usr/bin/false


In [11]:
# let's find usernames in linux-etc-passwd.txt (which was in my zipfile)

for one_line in open('linux-etc-passwd.txt'):
    if not one_line.startswith('#') and not one_line.startswith('\n'):
        fields = one_line.split(':')   # break the line apart
        print(fields[0])               # print the username, the first field

root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
syslog
messagebus
landscape
jci
sshd
user
reuven
postfix
colord
postgres
dovecot
dovenull
postgrey
debian-spamd
memcache
genadi
shira
atara
shikma
amotz
mysql
clamav
amavis
opendkim
gitlab-redis
gitlab-psql
git
opendmarc
dkim-milter-python
deploy
redis


In [12]:
# another way: pass a tuple of strings to startswith, and it returns True if the string starts with any of them

for one_line in open('linux-etc-passwd.txt'):
    if not one_line.startswith(('#', '\n')):
        fields = one_line.split(':')   # break the line apart
        print(fields[0])               # print the username, the first field

root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
syslog
messagebus
landscape
jci
sshd
user
reuven
postfix
colord
postgres
dovecot
dovenull
postgrey
debian-spamd
memcache
genadi
shira
atara
shikma
amotz
mysql
clamav
amavis
opendkim
gitlab-redis
gitlab-psql
git
opendmarc
dkim-milter-python
deploy
redis


In [13]:
# let's create a simple report!
# we'll use a file and a dictionary

# the file: mini-access-log.txt (an old Apache HTTP server logfile)

In [14]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - -

In [17]:
# how many times did each IP address make a request to my system?

counts = {}   # create an empty dict -- IP addresses will be keys, integers (counts) will be values

for one_line in open('mini-access-log.txt'):
    fields = one_line.split() # return a list of strings, based on one_line, using whitespace separators
    ip_address = fields[0]
    
    if ip_address in counts:   # have we seen this IP address before?
        counts[ip_address] += 1
    else:
        counts[ip_address] = 1 # first time seeing this IP? Set the count to 1
        
for key, value in counts.items():
    print(f'{key}: {value}')

67.218.116.165: 2
66.249.71.65: 3
65.55.106.183: 2
66.249.65.12: 32
65.55.106.131: 2
65.55.106.186: 2
74.52.245.146: 2
66.249.65.43: 3
65.55.207.25: 2
65.55.207.94: 2
65.55.207.71: 1
98.242.170.241: 1
66.249.65.38: 100
65.55.207.126: 2
82.34.9.20: 2
65.55.106.155: 2
65.55.207.77: 2
208.80.193.28: 1
89.248.172.58: 22
67.195.112.35: 16
65.55.207.50: 3
65.55.215.75: 2
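
The `if`/`else` above can also be written with `dict.get`, which returns a default value when the key is missing. This is a sketch of an alternative, not a replacement for the cell above. The `sample_lines` list is an assumption: three shortened lines in the same format as `mini-access-log.txt`, so that the code runs without the file.

```python
# a few shortened lines in the logfile's format (assumed sample data)
sample_lines = [
    '67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99',
    '66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208',
    '66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305',
]

counts = {}

for one_line in sample_lines:
    ip_address = one_line.split()[0]
    # get returns the current count, or 0 if we haven't seen this IP yet
    counts[ip_address] = counts.get(ip_address, 0) + 1

for key, value in counts.items():
    print(f'{key}: {value}')
```

Both versions build the same dict; `get` just folds the "have we seen this key?" check into a single line.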


# Writing to files

To write to a file, we'll need to:

- Open it in *write* mode
- Use the `write` method
- Close the file -- to ensure data is really written to disk

In [18]:
# manual method

# WARNING! When you open a file for writing, if the file already existed, then 
# it is immediately removed/destroyed/erased, and replaced with a new, zero-length file

f = open('myfile.txt', 'w')    # open for writing == we cannot read from it
f.write('abcd\n')              # unlike print, write doesn't automatically add \n at the end
f.write('efghi\n')             
f.close()                      # flushes the buffer + closes the file, so everything we wrote is in the file

In [19]:
!cat myfile.txt

abcd
efghi
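
The warning about `'w'` mode can be checked directly. The sketch below works in a temporary directory (via the standard library's `tempfile` module, which is my choice for this example) so that it cannot erase any real file. It writes to a file, reopens it with `'w'` without writing anything, and then checks the file's size.

```python
import os
import tempfile

# work in a temporary directory, so we don't clobber any real files
with tempfile.TemporaryDirectory() as dirname:
    filename = os.path.join(dirname, 'myfile.txt')   # path used only for this demo

    f = open(filename, 'w')
    f.write('first version\n')
    f.close()

    f = open(filename, 'w')     # opening with 'w' again empties the file right away
    f.close()                   # we wrote nothing this time

    size_after_reopen = os.path.getsize(filename)    # size in bytes

print(size_after_reopen)
```

The size is 0: the earlier contents were lost the moment the file was reopened for writing.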


In [20]:
# better method, using "with"
# known as a "context manager" 

with open('myfile.txt', 'w') as f:
    f.write('abcd\n')             
    f.write('efghi\n')             
    # at the end of a "with" block, the file is automatically flushed+closed

In [21]:
# Use Python to look at the file!
for one_line in open('myfile.txt'):
    print(one_line.strip())

abcd
efghi
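
One way to confirm that `with` really closes the file is to check the file object's `closed` attribute inside and after the block. This is a small sketch; the temporary directory is an assumption of the example, used only so that nothing real is overwritten.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    filename = os.path.join(dirname, 'myfile.txt')   # path used only for this demo

    with open(filename, 'w') as f:
        f.write('abcd\n')
        closed_inside = f.closed    # still open while we're inside the block

    closed_after = f.closed         # the with block has ended, so the file is closed

print(closed_inside, closed_after)
```

Inside the block `f.closed` is `False`; after it, `True`, without our ever calling `f.close()`.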


In [22]:
# to append to a file (i.e., write to its end, rather than overwrite everything), use the 'a' mode:

with open('myfile.txt', 'a') as f:
    f.write('*** abcd\n')             
    f.write('*** efghi\n')   
    
!cat myfile.txt    


abcd
efghi
*** abcd
*** efghi


In [23]:
# what if I want to change text in a file?

# (1) what do I want to change from and to?
# (2) Read the input file
# (3) Write to the output file

filename = 'myfile.txt'

change_from = 'g'
change_to = '!'

with open(filename) as infile:                       # "with" closes the input file for us, too
    with open('mynewfile.txt', 'w') as outfile:
        for one_line in infile:
            outfile.write(one_line.replace(change_from, change_to))


In [24]:
!cat mynewfile.txt

abcd
ef!hi
*** abcd
*** ef!hi


# Exercise: Dict to file

In this exercise, we're going to take the contents of a dictionary and write them, one pair per line, to a file. The key and value will be separated by an `=` sign on each line of the file.

1. Define a simple dict.
2. Open a file for writing.
3. Go through each key-value pair in the dict, and write it to the file, in the style of `key=value`.


In [27]:
d = {'a':10, 'b':20, 'c':'hello'}

# I want to write this dict to a file

with open('mydict.txt', 'w') as f:   # defining variable f, assigned to our writable file object
    for key, value in d.items():     # we iterate over the dict's keys and values
        f.write(f'{key}={value}\n')   # write key=value to our file

# end of with block == flushing + closing the file
    

In [28]:
!cat mydict.txt

a=10
b=20
c=hello
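
As a follow-up, here is a sketch of the reverse trip: reading `key=value` lines back into a dict. It is not part of the exercise. It writes the same dict to a file in a temporary directory (an assumption of this example) and then loads it again.

```python
import os
import tempfile

d = {'a':10, 'b':20, 'c':'hello'}

with tempfile.TemporaryDirectory() as dirname:
    filename = os.path.join(dirname, 'mydict.txt')   # path used only for this demo

    with open(filename, 'w') as f:
        for key, value in d.items():
            f.write(f'{key}={value}\n')

    loaded = {}
    with open(filename) as f:
        for one_line in f:
            # strip removes the trailing newline; split breaks the line at the = sign
            key, value = one_line.strip().split('=')
            loaded[key] = value

print(loaded)
```

Everything read from a text file is a string, so `loaded['a']` is `'10'` rather than the integer `10`. This simple version also assumes that neither keys nor values contain an `=` sign.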


# Next week: Functions!

- What are functions?
- Writing functions
- Calling functions
- Function parameters
