# Agenda: Dicts and files

1. Q&A
2. Recap of simple data structures
3. Dictionaries ("dicts")
    - Defining dicts
    - Retrieving from them
    - Searching in them
4. Dictionaries are mutable
    - How do we update a value?
    - How do we add a new key-value pair?
    - How do we remove a key-value pair?
5. Accumulating
6. Accumulating the unknown
7. Looping over dicts
8. How do dicts work?
9. Files
    - What are files?
    - Reading from files (the good way, and the bad way)
10. Writing to files
    - The `with` statement, and why it's important

# Recap of data structures

We've talked about several data structures so far, for working with our information:

- `True`/`False`
- Integers and floats
- Sequences (i.e., iterable containers for other data)
    - Strings
    - Lists
    - Tuples
 
In all sequences, the values are indexed, starting at 0. 

If we know the index for a particular value, we can retrieve it from that string/list/tuple.

But if we don't know the index? Then we can search! There is a `.index` method on all three of these data structures. Think about it, though -- that means running a `for` loop in order to find our data.

The good news is that sequences (especially lists) are really convenient and easy to work with! But (a) searching in them is slow and (b) the index doesn't really have any inherent meaning. It's just when we added something to the list.
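
To make the recap concrete, here is a minimal sketch of retrieving by index and searching with `.index`. The list `colors` and the variable names are made up for this illustration.

```python
# A made-up list, just for illustration
colors = ['red', 'green', 'blue']

# If we know the index, retrieval is direct
second = colors[1]

# If we don't, .index searches from the start, one element at a time
where_is_blue = colors.index('blue')

# .index raises ValueError for a missing value, so we check with "in" first
if 'purple' in colors:
    where_is_purple = colors.index('purple')
else:
    where_is_purple = None

print(second, where_is_blue, where_is_purple)
```

Notice that the index `2` tells us nothing about `'blue'` itself, only where it happens to sit in the list.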



# Dictionaries ("dicts")

These are, by far, the most powerful data structures in Python. They also exist in other languages. You might have heard of them as:

- Hash tables
- Hash maps
- Hashes
- Associative arrays
- Key-value stores
- Name-value stores

The basic idea, though, is that instead of storing a single item (as we do with a string, list, tuple), we're going to store a *pair* of
items, the key (which is what we call the index in a dict) and the value (which is... the value).

Here's a description of dict syntax; after that, we'll actually try it:

- To create a dictionary, we'll use `{}`
- Inside of the `{}`, we'll have zero or more key-value pairs
- Each key and value are separated by a `:`
- Each pair is separated from other pairs with `,`

What can be a key? What can be a value? Some basic rules:
- Every key in a dict has a value, and every value has a key.
- The key can be any hashable type in Python, which in practice means immutable values -- we normally use integers and strings
- The value can be absolutely, positively any Python value -- int, string, float, list, dict, etc.
- Keys must be unique within a dict. There can be no key duplication.
- Values can repeat themselves, though.
- You can get a value via the key, but you cannot get a key via the value (at least, not easily).
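
The rules above can be sketched in a few lines. The dict `prices` and its contents are hypothetical, chosen only to show repeated values, different key types, a rejected key, and a search by value.

```python
# Hypothetical data, only to illustrate the rules above
prices = {'apple': 5, 'pear': 5, 42: 'an integer key', (1, 2): 'a tuple key'}

# Values may repeat; keys may not
same_price = prices['apple'] == prices['pear']

# A list is mutable, so Python refuses to use it as a key
try:
    bad = {[1, 2]: 'a list key'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# Going from a value back to its keys means checking every pair
keys_costing_5 = []
for one_key in prices:
    if prices[one_key] == 5:
        keys_costing_5.append(one_key)

print(same_price, list_key_allowed, keys_costing_5)
```

The last loop is why getting a key from a value is "not easy": it has to visit every pair, and it may find more than one key.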

In [1]:
d = {'a':10, 'b':20, 'c':30}   # this is a dict!

In [2]:
len(d)    # how many pairs are in this dict?

3

In [3]:
# how can I retrieve from a dict? Use [] with the key you want, just like with a list or tuple

d['a']  

10

In [4]:
key = 'a'
d[key]

10

In [5]:
# what if I request a key that doesn't exist?
d['z']

KeyError: 'z'

In [6]:
# I can search in the dict, to find out if it contains a key, with "in"
# VERY VERY IMPORTANT: 'in' only searches the keys, not the values
# and it must be an *exact* match

'a' in d

True

In [7]:
d['a']

10

In [8]:
'z' in d

False

In [9]:
'A' in d

False

In [10]:
d

{'a': 10, 'b': 20, 'c': 30}

# Exercise: Restaurant 

1. Define a dictionary in which the keys are strings (entrees on a menu) and the values are integers (prices of those items). You can have as many or as few items as you want; 3-5 is a good number. Assign this dict to `menu`.
2. Set `total` to be 0.
3. Ask the user, again and again, what they want to order:
    - If their order is an empty string, stop asking and print the total
    - If their order is a key in the dict, print its price, and add it to the total
    - If their order is *not* a key in the dict, then scold them
  
Example:

    Order: sandwich
    sandwich is 10, total is 10
    Order: apple
    apple is 5, total is 15
    Order: elephant
    Sorry, we're fresh out of elephant today!
    Order: [ENTER]
    Your total is 15

Hints/ideas:
- Define the dict
- Use `while` to get repeated input from the user
- Use `if` to check if the user gave you an empty string
- Use `in` to check if the user's input is a key in the dict

In [11]:
menu = {'sandwich':10, 'apple':5, 'tea':7, 'cake':12}

total = 0

while True:
    order = input('Order: ').strip()

    if order == '':   # did we get an empty response from the user? exit the loop!
        break

    if order in menu:  # is the user's input a key in the dict?
        price = menu[order] 
        total += price
        print(f'{order} costs {price}, total is now {total}')
    else:
        print(f'We are out of {order} today!')

print(f'Total is {total}')

Order:  sandwich


sandwich costs 10, total is now 10


Order:  apple


apple costs 5, total is now 15


Order:  dinosaur


We are out of dinosaur today!


Order:  


Total is 15


In [14]:
menu = {'sandwich':10, 'apple':5, 'tea':7, 'cake':12}
order = 'apple     '

if order in menu:
    price = menu[order]
    print(price)
else:
    print(f'{order} is not there')

apple      is not there


# Python tutor link for this exercise

https://pythontutor.com/render.html#code=menu%20%3D%20%7B'sandwich'%3A10,%20'apple'%3A5,%20'tea'%3A7,%20'cake'%3A12%7D%0A%0Atotal%20%3D%200%0A%0Awhile%20True%3A%0A%20%20%20%20order%20%3D%20input%28'Order%3A%20'%29.strip%28%29%0A%0A%20%20%20%20if%20order%20%3D%3D%20''%3A%20%20%20%23%20did%20we%20get%20an%20empty%20response%20from%20the%20user%3F%20exit%20the%20loop!%0A%20%20%20%20%20%20%20%20break%0A%0A%20%20%20%20if%20order%20in%20menu%3A%20%20%23%20is%20the%20user's%20input%20a%20key%20in%20the%20dict%3F%0A%20%20%20%20%20%20%20%20price%20%3D%20menu%5Border%5D%20%0A%20%20%20%20%20%20%20%20total%20%2B%3D%20price%0A%20%20%20%20%20%20%20%20print%28f'%7Border%7D%20costs%20%7Bprice%7D,%20total%20is%20now%20%7Btotal%7D'%29%0A%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20print%28f'We%20are%20out%20of%20%7Border%7D%20today!'%29%0A%0Aprint%28f'Total%20is%20%7Btotal%7D'%29&cumulative=false&curInstr=23&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%22sandwich%22,%22tea%22,%22asdfasfdsafa%22,%22%22%5D&textReferences=false

In [16]:
# SM

menu = {'sandwich':40,'coke':40,'chips':20}
total=0

while True:
    order=input('Can i have your order:')
   
    if order=='':
        break

    if order in menu:
        total += menu[order]
    else:
        print(f'please order something from menu')

print(f'your total is {total}')

Can i have your order: sandwich
Can i have your order: 


your total is 40


In this program, we used a dict as a read-only database. We set it up at the top of the program, and then never modified it, but we just retrieved from it in order to do our work.

You can imagine many cases of such dicts in programs:

- Month names to month numbers
- Month numbers to month names
- Employee ID numbers to employee database records
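
As a small sketch of the first two cases in that list, here are two read-only lookup tables. Only three months are included, to keep the example short, and the variable names are my own.

```python
# Read-only lookup table: month number -> month name (first three only, for brevity)
month_names = {1: 'January', 2: 'February', 3: 'March'}

# The reverse table: month name -> month number
month_numbers = {'January': 1, 'February': 2, 'March': 3}

print(month_names[2])
print(month_numbers['March'])
print(13 in month_names)    # "in" checks the keys, so this tells us whether 13 is a known month
```

Neither dict is modified after it is defined; the program only retrieves from them.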

# Modifying dicts

If I want to update a value that is already in a dict, I can just assign to that key, and the value gets updated.

In [17]:
d = {'a':10, 'b':20, 'c':30}

d['a'] = 12345
d

{'a': 12345, 'b': 20, 'c': 30}

In [18]:
# I can even add 1 to the existing value, assuming it exists
d['a'] += 1   # this means: d['a'] = d['a'] + 1

d

{'a': 12346, 'b': 20, 'c': 30}

In [19]:
# what about adding a new key-value pair?
# what method do we use in dicts to add a new pair?
# answer: we don't! We just use assignment, precisely the same as we did above for updating the value

# if you assign to a dict key that does exist already, you update the value
# if you assign to a dict key that does *not* exist already, you add the new key-value pair

In [20]:
d = {}   # empty dict
d['a'] = 10
d['b'] = 20
d['c'] = 30

d

{'a': 10, 'b': 20, 'c': 30}

In [21]:
d['a'] += 5
d

{'a': 15, 'b': 20, 'c': 30}

In [22]:
d['z'] += 10   # what will happen here?

KeyError: 'z'

In [23]:
d = {'a':10, 'b':10, 'c':10}

In [24]:
d['a'] 

10

In [25]:
d['b']

10

In [26]:
d['c']

10

In [27]:
# if I try to add the same key twice, I end up updating the value
d['a'] = 999
d

{'a': 999, 'b': 10, 'c': 10}

In [28]:
d = {'a':10, 'a':20, 'a':30}

In [29]:
d

{'a': 30}

In [31]:
d['a'] = 5
d

{'a': 5}

In [32]:
# what about removing key-value pairs?
d.pop('a')  # this means: remove the pair whose key is 'a', and return the value

5

In [33]:
d

{}
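
For completeness, here is a short sketch of the ways to remove a pair. Besides `pop`, it uses the `del` statement and the optional second argument to `pop`, neither of which appears in the cells above, so treat them as additions on my part.

```python
d = {'a': 10, 'b': 20, 'c': 30}

removed = d.pop('a')          # removes the pair and returns its value, 10
del d['b']                    # removes the pair, without returning anything

# Popping a missing key raises KeyError, unless we pass a default to return instead
fallback = d.pop('z', None)

print(removed, fallback, d)
```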

In [34]:
person = {'first':'Reuven', 'last':'Lerner', 'email':'reuven@lerner.co.il', 'shoesize':46}

In [35]:
person['first']

'Reuven'

In [36]:
person['shoesize']

46

In [37]:
person['first'] = 'NewName'

In [38]:
person

{'first': 'NewName',
 'last': 'Lerner',
 'email': 'reuven@lerner.co.il',
 'shoesize': 46}

In [39]:
person['first'] = 15
person

{'first': 15, 'last': 'Lerner', 'email': 'reuven@lerner.co.il', 'shoesize': 46}

# Next up

- Accumulating in dicts
- Accumulating the unknown 

# Paradigm 2 for dictionaries

- Meaning: We create a dict with keys and empty/0 values 
- As the program progresses, we update the values but never add/remove keys

In [40]:
counts = {'odds':0, 'evens':0}

numbers = [10, 11, 15, 18]

# I want to know how many even numbers there are in the list, and how many odd numbers there are in the list

for one_number in numbers:
    if one_number % 2 == 0:   # if, dividing by 2, we get a remainder of 0, it's even
        counts['evens'] += 1   
    else:
        counts['odds'] += 1

In [41]:
counts

{'odds': 2, 'evens': 2}

# Python tutor link to odds/evens

https://pythontutor.com/render.html#code=counts%20%3D%20%7B'odds'%3A0,%20'evens'%3A0%7D%0A%0Anumbers%20%3D%20%5B10,%2011,%2015,%2018%5D%0A%0A%23%20I%20want%20to%20know%20how%20many%20even%20numbers%20there%20are%20in%20the%20list,%20and%20how%20many%20odd%20numbers%20there%20are%20in%20the%20list%0A%0Afor%20one_number%20in%20numbers%3A%0A%20%20%20%20if%20one_number%20%25%202%20%3D%3D%200%3A%20%20%20%23%20if,%20dividing%20by%202,%20we%20get%20a%20remainder%20of%200,%20it's%20even%0A%20%20%20%20%20%20%20%20counts%5B'evens'%5D%20%2B%3D%201%20%20%20%0A%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20counts%5B'odds'%5D%20%2B%3D%201&cumulative=false&curInstr=15&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false

# Exercise: Vowels, digits, and others (dict edition)

1. Define a dict whose keys are `vowels`, `digits`, and `others`, all with values of 0.
2. Ask the user to enter a string.
3. Go through the string, one character at a time:
    - If the character is a vowel (aeiou), add 1 to the vowels count
    - If the character is a digit (0-9), add 1 to the digits count
    - Otherwise, add 1 to the others count
4. Print the dict

Example:

    Enter text: hello!! 123
    {'vowels':2, 'digits':3, 'others':6}

In [42]:
counts = {'vowels':0, 'digits':0, 'others':0}

s = input('Enter text: ').strip()

for one_character in s:   # iterate over the string, one character at a time
    if one_character in 'aeiou':
        counts['vowels'] += 1    # update the vowel count
    elif one_character.isdigit():    # if the character is a digit
        counts['digits'] += 1     # add 1 to the digit count
    else:
        counts['others'] += 1     # add 1 to the others count

print(counts)

Enter text:  hello!! 123


{'vowels': 2, 'digits': 3, 'others': 6}


In [44]:
# PP

d = {'vowel':0, 'digit':0, 'other':0}
s = input ('enter a string ').strip()
for one_char in s :
    if one_char in 'aeiou':     # use in rather than == 
        d['vowel']+=1
    elif one_char.isdigit():    # check on one_char, not s
        d['digit']+=1
    else:
        d['other']+=1
print(d)        

enter a string  hello!! 123


{'vowel': 2, 'digit': 3, 'other': 6}


In [45]:
d = {'a':10, 999:'q'}

d

{'a': 10, 999: 'q'}

In [46]:
d['a']  # I want to retrieve the value associated with the key 'a' (a string), so I use quotes

10

In [47]:
d[999]  # I want to retrieve the value associated with the key 999 (an integer), so no quotes

'q'

In [48]:
k = 'a'  
d[k]   # I want to retrieve the value associated with whatever value is in the variable k, so no quotes

10

In [49]:
d['k']   # this will look for the literal key 'k', not whatever is in the variable k

KeyError: 'k'

Does it mean anything that I say ["a"]  rather than ['a']?

Answer: No. They are 100% the same, as far as Python is concerned, and I'm a bad human who is inconsistent.

# Third dict paradigm: Accumulating the unknown 

In the previous exercise, we defined the keys for our dict at the start of the program, and we didn't add or remove keys.

In this paradigm, we start with an *empty dict*! We don't know what the keys will be. But we want to count them, or do something with them.

In other words: We don't know what our keys or values will be, but we know what to do with them when we get them.



In [51]:
# example: count characters
# I'll ask the user to enter a string
# I'll use a dict to count how often each character appears in the string

counts = {}   # empty dict!

s = input('Enter text: ').strip()  # remove whitespace at the start/end

for one_character in s:
    if one_character in counts:   # have we seen this character before?
        counts[one_character] += 1   # add 1 to the existing count, which we know we have 
    else:
        counts[one_character] = 1    # add the key-value pair, with a value of 1, for the first time

print(counts)

Enter text:  hello out there


{'h': 2, 'e': 3, 'l': 2, 'o': 2, ' ': 2, 'u': 1, 't': 2, 'r': 1}


# Three paradigms for dict use in your programs

1. We define the dict at the start of the program. We never modify it -- all keys and values remain the same. It is a read-only database to which we can refer for lots of useful information. Example: Restaurant, month numbers + names.
2. We define the dict with keys that will not change, and values that will. We update the values as we go through the program. This way, at the end of the program, we have a report in our dict of how often certain things happened. Example: Odds+evens, vowels/digits/others.
3. We define an empty dict. We don't know what the keys will be, and we don't know what the values will be. But we do know what we want to do with them. Example: Character counter -- we don't know what characters we'll encounter. We could, in theory, create an absolutely massive dict with all characters known to people as the keys, and 0 for all values. But that's wasteful -- so we'll just wait to get the input, and modify our dict on the fly.
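
The third paradigm always has the same shape: "if the key is there, update it; otherwise, add it." One possible shortcut, shown here only as a sketch, is the dict method `get`, which returns a default when the key is missing. The `if`/`else` version works just as well.

```python
counts = {}   # empty dict -- we don't know the keys in advance

for one_character in 'hello':
    # .get returns the current count, or 0 if we haven't seen this character yet
    counts[one_character] = counts.get(one_character, 0) + 1

print(counts)
```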

# Exercise: Rainfall

1. Define an empty dict, called `rainfall`. The keys in this dict will be cities. The values in the dict will be integers, mm of rain that fell in that city.
2. Ask the user, repeatedly, to enter the name of a city.
    - If the user enters an empty string, stop asking
3. If the user entered a city name, ask them how much rain fell there most recently.
4. If the city is new to the dict (i.e., isn't already a key in `rainfall`), then add the new key-value pair
5. If the city already exists in the dict, then add the new amount to the existing one.
6. When the user stops entering cities, print the entire dict

Example:

    City: a
    Rain: 5
    City: b
    Rain: 4
    City: a
    Rain: 3
    City: [ENTER]
    {'a': 8, 'b': 4}

In [None]:
rainfall = {}

while True:   # I don't know how many entries the user will give me
    city_name = input('Enter city: ').strip()

    if city_name == '':   # no input? break
        break

    mm_rain = input('Rain: ').strip()
    mm_rain = int(mm_rain)

    if city_name in rainfall:    # if we've already seen this city...
        rainfall[city_name] += mm_rain
    else:
        rainfall[city_name] = mm_rain

print(rainfall)   # step 6: show the accumulated totals

# Python tutor link

https://pythontutor.com/render.html#code=rainfall%20%3D%20%7B%7D%0A%0Awhile%20True%3A%20%20%20%23%20I%20don't%20know%20how%20many%20entries%20the%20user%20will%20give%20me%0A%20%20%20%20city_name%20%3D%20input%28'Enter%20city%3A%20'%29.strip%28%29%0A%0A%20%20%20%20if%20city_name%20%3D%3D%20''%3A%20%20%20%23%20no%20input%3F%20break%0A%20%20%20%20%20%20%20%20break%0A%0A%20%20%20%20mm_rain%20%3D%20input%28'Rain%3A%20'%29.strip%28%29%0A%20%20%20%20mm_rain%20%3D%20int%28mm_rain%29%0A%0A%20%20%20%20if%20city_name%20in%20rainfall%3A%20%20%20%20%23%20if%20we've%20already%20seen%20this%20city...%0A%20%20%20%20%20%20%20%20rainfall%5Bcity_name%5D%20%2B%3D%20mm_rain%0A%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20rainfall%5Bcity_name%5D%20%3D%20mm_rain&cumulative=false&curInstr=23&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%22a%22,%225%22,%22b%22,%224%22,%22a%22,%223%22,%22%22%5D&textReferences=false

In [52]:
s = '\n'
len(s)

1

In [53]:
s = ''
len(s)

0

# Next up

- Looping over dicts
- How do dicts work?
- Then: We'll start with files

Download and unzip this file, containing data files we'll use later today: https://files.lerner.co.il/exercise-files.zip

# `for` loops

We've seen that we can use a `for` loop to iterate over a number of data structures:

- Iterate over a string: We get each character, one at a time
- Iterate over a list: We get each element, one at a time
- Iterate over a tuple: We get each element, one at a time

Can we iterate over a dict? If so, what do we get?



In [54]:
d = {'a':10, 'b':20, 'c':30}

In [56]:
# iterating over a dict gives you the keys, *NOT* the values
for one_item in d:
    print(one_item)

a
b
c


In [57]:
list(d)   # this will return a list of the keys

['a', 'b', 'c']

In [58]:
# if I want to print all of the key-value pairs in a dict, how can I?

for one_key in d:
    print(f'{one_key}: {d[one_key]}')

a: 10
b: 20
c: 30


In [59]:
# what if I want to iterate over the values?
# There is a "values" method on dicts

d.values()

dict_values([10, 20, 30])

In [60]:
for one_value in d.values():
    print(one_value)

10
20
30


In [61]:
20 in d.values()  

True

If there is a `values` method for dictionaries, maybe there's also a `keys` method?

There is. Could I use it to iterate over the keys?

In [63]:
# You should almost NEVER need the "keys" method for iteration
# it adds an extra method call and is less idiomatic than just using the dict itself

for one_key in d.keys():
    print(f'{one_key}: {d[one_key]}')

a: 10
b: 20
c: 30


# Getting keys and values

Wouldn't it be nice if we could get both keys and values together?

The answer: Yes! And Python provides us with an `items` method.

In [64]:
d = {'a':10, 'b':20, 'c':30}

In [66]:
# when you invoke d.items, you get each key-value pair as a two-element tuple

for one_item in d.items():
    print(one_item)

('a', 10)
('b', 20)
('c', 30)


In [67]:
# I can iterate over a sequence of 2-element tuples
# in a for loop by unpacking each tuple into two loop variables

for key, value in d.items():
    print(f'{key}: {value}')

a: 10
b: 20
c: 30


# Exercise: Word lengths

1. Define an empty dict.
2. Ask the user to enter words, one at a time. If the user enters an empty string, stop asking.
3. Assign to the dict (a) the word the user entered as a key and (b) the length of that word as a value
4. When the user has finished entering their words (by pressing enter), iterate over the dictionary, one key-value pair at a time, and print them all.

Example:

    Enter word: hello
    Enter word: goodbye
    Enter word: stop
    Enter word: already
    Enter word: [ENTER]
    hello: 5
    goodbye: 7
    stop: 4
    already: 7

In [68]:
words = {}   # empty dict

while True:
    s = input('Enter word: ').strip()

    if s == '':
        break
    
    words[s] = len(s)

print(words)

Enter word:  hello
Enter word:  goodbye
Enter word:  stop
Enter word:  already
Enter word:  


{'hello': 5, 'goodbye': 7, 'stop': 4, 'already': 7}


In [71]:
for key, value in words.items():
    print(f'{key}: {value}')

hello: 5
goodbye: 7
stop: 4
already: 7


In [None]:
# AC

empty={}
words= input("Enter a word: ").strip()
for word in words:   # note: iterating over a string gives one *character* at a time, not one word
    empty[word]=len(word)
print(empty)

In [None]:
# PP

d={}

while True:
    word = input ('enter word :').strip()
    if word =='':
        break
    d[word] = len(word)   # assign, don't add: a repeated word should keep its real length
print(d)

# How dicts work

It's easy to think about how lists work (at a superficial level): We have elements, and as we add a new element to the end (with `append`), the index goes up. The first item has index 0, the second 1, the third 2, etc.

How can we search for a value in a list? We basically need to use a `for` loop.  If I want to know whether a value is in a list, then I have to go through it, one element at a time, until I either find the value or reach the end. In CS theory, we call this `O(n)`, meaning that the time it takes to find something in a list is proportional to the length of the list. 

A dictionary works completely differently: Where a key-value pair is stored in memory is determined by the key: Python runs a function, known as a "hash function," on the key. The number we get back tells Python where to store the key-value pair. When we then ask `'a' in d`, Python runs the hash function on `'a'`, gets a number back, and looks there in memory. It instantly knows whether the key is there or not. 

This means that no matter how many items you have in a dict, search and retrieval take (on average) the same amount of time. We call this `O(1)`, constant time. 
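
To make the idea less abstract, here is a toy model of a hash table. It is emphatically *not* how Python's real dicts are implemented; the names `NUM_BUCKETS`, `buckets`, `bucket_for`, `store`, and `lookup` are all invented for this sketch. The only real Python feature it relies on is the builtin `hash` function.

```python
# A toy model of a hash table -- not how Python's dicts are really implemented
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]   # each bucket holds (key, value) pairs

def bucket_for(key):
    # hash() is Python's builtin hash function; % keeps the result in range
    return hash(key) % NUM_BUCKETS

def store(key, value):
    bucket = buckets[bucket_for(key)]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:          # key already present: update its value
            bucket[i] = (key, value)
            return
    bucket.append((key, value))          # new key: add the pair

def lookup(key):
    # Only one bucket is examined, no matter how many pairs are stored overall
    for existing_key, value in buckets[bucket_for(key)]:
        if existing_key == key:
            return value
    raise KeyError(key)

store('a', 10)
store('b', 20)
store('a', 99)     # same key again, so the value is replaced
print(lookup('a'), lookup('b'))
```

The key decides where the pair lives, so a lookup goes straight to one bucket instead of walking through everything, which is the contrast with searching a list.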

In [72]:
# if you have a list of 2-element tuples, you can turn it into a dict with dict()

mylist = [('a', 10), ('b',20), ('c', 30)]

dict(mylist)

{'a': 10, 'b': 20, 'c': 30}

In [73]:
dict([10, 20, 30])

TypeError: cannot convert dictionary update sequence element #0 to a sequence


# Files

We use files every day as computer users. But what is a file, really? It's just a way to take a bunch of data structures from the computer's memory and store them such that when our computer is restarted, or has problems, we can recreate what we had and where.

There are lots of kinds of files out there: Excel files, Word files, PPT, PDF, GIF, JPEG, etc.

We are going to use plain ol' text files. Text files, which have no formatting, are used all over:

- Logging
- Configuration


# Working with files

If I want to read from a text file in my Python program, I cannot just talk to the disk directly. I have to ask the operating system to do so on my behalf. The OS will then give me a data structure that I can use to communicate with the file.

The data structure that the OS gives us to act as a go-between is sometimes known as a "file handle," but you can think of it as an agent or intermediary. It's often called a "file object," but nowadays in Python, we typically call them file-like objects, because there are many that aren't files, but which do implement the API.

To get a file object, I'll need to use the `open` builtin function. This takes a string as an argument, the name of a file. The filename can:

- Start with a `/`, meaning: Starts at the root directory
- Start with letters, and have a `/` inside, meaning: The file is under a subdirectory in the current directory
- Only letters, no `/`: In the current directory

The "Current directory" is wherever you're running Python.

When I `open` a file, I have to tell Python if I want to read from it or write to it. By default, the file is opened for reading only. You can also specify many other possibilities.
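
Here is a small sketch of `open` with and without an explicit mode. So that it can run anywhere, it first creates its own file in a temporary directory; the file name `example.txt` is hypothetical, and the `tempfile` and `os` modules are used only for that setup.

```python
import os
import tempfile

# Create a small file in a temporary directory, so the example is self-contained
directory = tempfile.mkdtemp()
filename = os.path.join(directory, 'example.txt')   # hypothetical file name

f = open(filename, 'w')      # 'w' as the second argument means: open for writing
f.write('first line\nsecond line\n')
f.close()

f = open(filename)           # no mode given, so the file is opened for reading
default_mode = f.mode        # the file object remembers how it was opened
contents = f.read()
f.close()

print(default_mode)
print(contents)
```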

In [78]:
f = open('/etc/passwd')   # I'm opening a classic file that's on all Unix/Linux systems -- with all usernames and info

In [79]:
# what is f?
type(f)

_io.TextIOWrapper

In [80]:
# how can I read the file's contents?
# I can use the "read" method
# that returns the entire contents of the file, from the current location (the start, if we just opened)
# through the end of the file.

f.read()

'##\n# User Database\n# \n# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n##\nnobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\nroot:*:0:0:System Administrator:/var/root:/bin/sh\ndaemon:*:1:1:System Services:/var/root:/usr/bin/false\n_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico\n_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false\n_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false\n_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false\n_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false\n_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false\n_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false\n_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/fa

In [82]:
# a much, *much* better way to read from the file
# is by iterating over the file!

# iterating over a file gives you, one iteration at a time, the lines of the file
# each line ends with \n

for one_line in open('/etc/passwd'):
    print(one_line, end='')

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33

# I want to see all of the usernames in my system

The username in `/etc/passwd` is always the first field of every record.  To see all of the usernames, I need to go through each line, break it up, grab the first element, and then print it on the screen.

In [84]:
for one_line in open('/etc/passwd'):
    if one_line.startswith('#'):
        continue    # go to the next iteration
    
    fields = one_line.split(':')
    print(fields[0])

nobody
root
daemon
_uucp
_taskgated
_networkd
_installassistant
_lp
_postfix
_scsd
_ces
_appstore
_mcxalr
_appleevents
_geod
_devdocs
_sandbox
_mdnsresponder
_ard
_www
_eppc
_cvs
_svn
_mysql
_sshd
_qtss
_cyrus
_mailman
_appserver
_clamav
_amavisd
_jabber
_appowner
_windowserver
_spotlight
_tokend
_securityagent
_calendar
_teamsserver
_update_sharing
_installer
_atsserver
_ftp
_unknown
_softwareupdate
_coreaudiod
_screensaver
_locationd
_trustevaluationagent
_timezone
_lda
_cvmsroot
_usbmuxd
_dovecot
_dpaudio
_postgres
_krbtgt
_kadmin_admin
_kadmin_changepw
_devicemgr
_webauthserver
_netbios
_warmd
_dovenull
_netstatistics
_avbdeviced
_krb_krbtgt
_krb_kadmin
_krb_changepw
_krb_kerberos
_krb_anonymous
_assetcache
_coremediaiod
_launchservicesd
_iconservices
_distnote
_nsurlsessiond
_displaypolicyd
_astris
_krbfast
_gamecontrollerd
_mbsetupuser
_ondemand
_xserverdocs
_wwwproxy
_mobileasset
_findmydevice
_datadetectors
_captiveagent
_ctkd
_applepay
_hidd
_cmiodalassistants
_analyticsd
_fps

In [85]:
!ls *.txt

linux-etc-passwd.txt  mini-access-log.txt  nums.txt  shoe-data.txt  wcfile.txt


# Exercise: Summing numbers

1. Define `total` to be 0.
2. Go through `nums.txt`, one line at a time:
    - Can you turn the current line into an integer? If so, then do so, and add to `total`.
    - If not, then go on to the next line.


In [86]:
%pwd

'/Users/reuven/Courses/Current/OReilly-2024-spring-python'

In [90]:
total = 0

for one_line in open('nums.txt'):   # go through the file, one line at a time
    if one_line.strip().isdigit():  # remove whitespace from the sides, and check -- can we turn this into an int?
        n = int(one_line)           # get an int from it
        total += n

print(total)

83


# Next up

- More practice reading from files
- Practice (a little) writing to files

In [91]:
!cat nums.txt

5
	10     
	20
  	3
		   	20        

 25


In [92]:
open('nums.txt').read()

'5\n\t10     \n\t20\n  \t3\n\t\t   \t20        \n\n 25\n'

# Executing commands

Normally, code in Jupyter is in Python (or perhaps another language). But you can also start a line with `!` and issue a command to your operating system. If you're using Windows, then they will be Windows CMD commands. On Unix, they're different.

I often use the following Unix commands in Jupyter:
- `!ls`
- `!cat FILENAME` -- show the contents of FILENAME
- `!head FILENAME` -- show the first 10 lines of FILENAME

# Exercise: IP addresses

We have a file, `mini-access-log.txt`, which contains about 200 lines from an Apache Web server from many years ago. Each line in the file contains information about one request made to the Web server.

1. Create an empty dict, `counts`.
2. Go through the file, one line at a time.
3. On each line, grab the IP address from the start of the line.
4. Either add 1 to the existing count for that IP address, or add a new key-value pair -- the key is the IP address (string) and the value is 1.

In other words:
- The first time we see an IP address, we add it as a key  and 1 as the value
- Subsequent times, we add 1 to the existing value

5. When you're done, iterate over the dict and print its key-value pairs.

In [93]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - - [30/J

In [99]:
counts = {}

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]

    if ip_address in counts:      # if we've seen this address before
        counts[ip_address] += 1   # add 1 to its count
    else:
        counts[ip_address] = 1

for key, value in counts.items():
    print(f'{key}:\t{value}')

67.218.116.165:	2
66.249.71.65:	3
65.55.106.183:	2
66.249.65.12:	32
65.55.106.131:	2
65.55.106.186:	2
74.52.245.146:	2
66.249.65.43:	3
65.55.207.25:	2
65.55.207.94:	2
65.55.207.71:	1
98.242.170.241:	1
66.249.65.38:	100
65.55.207.126:	2
82.34.9.20:	2
65.55.106.155:	2
65.55.207.77:	2
208.80.193.28:	1
89.248.172.58:	22
67.195.112.35:	16
65.55.207.50:	3
65.55.215.75:	2


In [100]:
'x' + 5

TypeError: can only concatenate str (not "int") to str

In [101]:
'x' * 5

'xxxxx'

In [102]:
counts = {}

for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]

    if ip_address in counts:      # if we've seen this address before
        counts[ip_address] += 1   # add 1 to its count
    else:
        counts[ip_address] = 1

for key, value in counts.items():
    print(f'{key}:\t{value * "x"}')   # double quotes inside, so this also runs before Python 3.12

67.218.116.165:	xx
66.249.71.65:	xxx
65.55.106.183:	xx
66.249.65.12:	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.106.131:	xx
65.55.106.186:	xx
74.52.245.146:	xx
66.249.65.43:	xxx
65.55.207.25:	xx
65.55.207.94:	xx
65.55.207.71:	x
98.242.170.241:	x
66.249.65.38:	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.207.126:	xx
82.34.9.20:	xx
65.55.106.155:	xx
65.55.207.77:	xx
208.80.193.28:	x
89.248.172.58:	xxxxxxxxxxxxxxxxxxxxxx
67.195.112.35:	xxxxxxxxxxxxxxxx
65.55.207.50:	xxx
65.55.215.75:	xx


In [104]:
# AC

counts={}
for line in open('mini-access-log.txt'):
    for ip in line.split()[0]:   # careful: this loops over each *character* of the IP address
        if ip in counts:
            counts[ip]+=1
        else:
            counts[ip]=1
print(counts)
    

{'6': 472, '7': 65, '.': 618, '2': 263, '1': 141, '8': 178, '5': 277, '4': 175, '9': 182, '0': 28, '3': 126}
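
Why did the cell above count digits and dots, rather than IP addresses? Because a `for` loop over a string gives us one character at a time. Here's a minimal sketch of the difference, using one line copied from the log file shown earlier (the variable names are just for illustration):

```python
# One line in the same format as mini-access-log.txt
line = '65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"'

first_field = line.split()[0]      # the whole IP address, as one string
print(first_field)                 # 65.55.106.183

# Looping over that string gives us its characters, one at a time
chars = []
for one_char in first_field:
    chars.append(one_char)

print(chars[:4])                   # ['6', '5', '.', '5']
```

So if we want to count addresses, we use `line.split()[0]` directly as the key, without a second loop.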


# What about writing to files?

Writing to files is a bit trickier:

1. You can't easily both read from and write to a file at the same time. You have to choose. By default, when we open a file, it's for reading.
2. To open a file for writing, pass `'w'` as the second argument to `open`.
    - If you open a file for writing, it will immediately exist and contain 0 bytes.
    - If you open a file for writing, and it already existed, its previous contents are erased immediately.
3. You can write to a file by invoking the `write` method on a file object.
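
Here's a small sketch of those rules in action. It works in a throwaway directory (via the standard `tempfile` module) so that it can't clobber any real files; the filename `demo.txt` and the variable names are assumptions made just for this example.

```python
import os
import shutil
import tempfile

dirname = tempfile.mkdtemp()                      # throwaway directory
filename = os.path.join(dirname, 'demo.txt')      # hypothetical filename

f = open(filename, 'w')            # opening for writing creates the file...
f.close()
size_when_new = os.path.getsize(filename)         # ...and it contains 0 bytes

f = open(filename, 'w')
f.write('some text\n')
f.close()
size_after_write = os.path.getsize(filename)      # now it contains something

f = open(filename, 'w')            # opening with 'w' again erases the old contents
f.close()
size_after_reopen = os.path.getsize(filename)     # back to 0 bytes

print(size_when_new, size_after_write, size_after_reopen)

shutil.rmtree(dirname)             # clean up the throwaway directory
```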


In [105]:
f = open('myfile.txt', 'w')

f.write('hello\n')
f.write('goodbye\n')

8

In [106]:
!cat myfile.txt

# What's going on?

If every call to `write` in our program really wrote data to the disk, our program would be incredibly slow. We don't want it to wait for the disk, which is thousands of times slower than memory.

When we call `write`, we're told that the data was written, but it wasn't. Instead, it went into a "buffer," an area of memory set aside for this purpose. Normally, the data gets written to disk only when the buffer fills up.

We can force the buffer to be written with the `flush` method on the file object.

We can also close the file, which automatically flushes it.
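
We can watch the buffer at work by checking the file's size on disk before and after flushing. This is a sketch, not a guarantee: it assumes a regular file opened with Python's default buffering, and the filename `buffered.txt` is made up for the example.

```python
import os
import shutil
import tempfile

dirname = tempfile.mkdtemp()                         # throwaway directory
filename = os.path.join(dirname, 'buffered.txt')     # hypothetical filename

f = open(filename, 'w')
f.write('hello\n')                 # goes into the buffer, not (yet) into the file
size_before_flush = os.path.getsize(filename)

f.flush()                          # push the buffer's contents out to the file
size_after_flush = os.path.getsize(filename)

f.write('goodbye\n')
f.close()                          # closing flushes whatever is still in the buffer
size_after_close = os.path.getsize(filename)

print(size_before_flush, size_after_flush, size_after_close)

shutil.rmtree(dirname)             # clean up the throwaway directory
```

With such a small write, the first size should be 0, and the numbers grow only after `flush` and `close`.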

In [107]:
f.close()

In [108]:
!cat myfile.txt

hello
goodbye


# A better way -- `with`

If we use the `with` construct to write to our file, it automatically flushes + closes the file at the end. It's a great idea to use `with` whenever you are writing to a file.

This is known as using a "context manager." The file objects implement the "context manager protocol," so that we can use file objects in this way.
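
One way to convince ourselves that `with` really closes the file is to check the file object's `closed` attribute inside and after the block. A minimal sketch, again in a throwaway directory and with a made-up filename:

```python
import os
import shutil
import tempfile

dirname = tempfile.mkdtemp()                          # throwaway directory
filename = os.path.join(dirname, 'with_demo.txt')     # hypothetical filename

with open(filename, 'w') as f:
    f.write('1234\n')
    closed_inside = f.closed       # False: the file is still open inside the block

closed_after = f.closed            # True: leaving the block flushed + closed it

with open(filename) as f:          # the data is already in the file
    contents = f.read()

print(closed_inside, closed_after)
print(contents)

shutil.rmtree(dirname)             # clean up the throwaway directory
```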

In [109]:
with open('myfile2.txt', 'w') as f:  
    f.write('1234\n')
    f.write('5678\n')
    # implicit flush + close

In [110]:
!cat myfile2.txt

1234
5678


In [111]:
with open('/etc/passwd') as f:
    for one_line in f:
        if one_line.startswith('#'):
            continue
        username = one_line.split(':')[0]
        print(username)

nobody
root
daemon
_uucp
_taskgated
_networkd
_installassistant
_lp
_postfix
_scsd
_ces
_appstore
_mcxalr
_appleevents
_geod
_devdocs
_sandbox
_mdnsresponder
_ard
_www
_eppc
_cvs
_svn
_mysql
_sshd
_qtss
_cyrus
_mailman
_appserver
_clamav
_amavisd
_jabber
_appowner
_windowserver
_spotlight
_tokend
_securityagent
_calendar
_teamsserver
_update_sharing
_installer
_atsserver
_ftp
_unknown
_softwareupdate
_coreaudiod
_screensaver
_locationd
_trustevaluationagent
_timezone
_lda
_cvmsroot
_usbmuxd
_dovecot
_dpaudio
_postgres
_krbtgt
_kadmin_admin
_kadmin_changepw
_devicemgr
_webauthserver
_netbios
_warmd
_dovenull
_netstatistics
_avbdeviced
_krb_krbtgt
_krb_kadmin
_krb_changepw
_krb_kerberos
_krb_anonymous
_assetcache
_coremediaiod
_launchservicesd
_iconservices
_distnote
_nsurlsessiond
_displaypolicyd
_astris
_krbfast
_gamecontrollerd
_mbsetupuser
_ondemand
_xserverdocs
_wwwproxy
_mobileasset
_findmydevice
_datadetectors
_captiveagent
_ctkd
_applepay
_hidd
_cmiodalassistants
_analyticsd
_fps

In [None]:
with open('/etc/passwd') as input_data:
    for one_line in input_data:
        if one_line.startswith('#'):
            continue
        username = one_line.split(':')[0]
        print(username)

# Exercise: Dict to config

1. Define a (small) dictionary.
2. Iterate over the dict, one key-value pair at a time.
3. Write each key-value pair to a file on one line, with a `:` between the key and the value.
4. Print the contents of the file.

In [112]:
d = {'a':10, 'b':20, 'c':30}

with open('config.txt', 'w') as f:    # open the file for writing
    for key, value in d.items():      # get each pair from the dict
        f.write(f'{key}:{value}\n')    

In [113]:
!cat config.txt

a:10
b:20
c:30
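
We can also go the other way, and read such a file back into a dict. The following is only a sketch: it assumes that no key contains a `:` and that every value is an integer, and it writes its own copy of the file in a throwaway directory so that it runs on its own.

```python
import os
import shutil
import tempfile

dirname = tempfile.mkdtemp()                        # throwaway directory
filename = os.path.join(dirname, 'config.txt')

d = {'a':10, 'b':20, 'c':30}

with open(filename, 'w') as f:                      # same as the solution above
    for key, value in d.items():
        f.write(f'{key}:{value}\n')

restored = {}
with open(filename) as f:                           # default mode is reading
    for one_line in f:
        key, value = one_line.strip().split(':')    # remove the newline, then split
        restored[key] = int(value)                  # we read strings, so convert

print(restored)                                     # {'a': 10, 'b': 20, 'c': 30}

shutil.rmtree(dirname)                              # clean up the throwaway directory
```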


In [None]:
small = {'a': 1, 'b': 2, 'c': 3}       # a small dict (example values)
with open('small.txt', 'w') as f:      # open the file once, for writing
    for key, value in small.items():
        f.write(f'{key}:{value}\n')
with open('small.txt') as f:
    print(f.read())

# To use my text files with Jupyter:

1. Find out where Jupyter is running with the special `%pwd` command.
2. Download the zipfile from https://files.lerner.co.il (intro Python file) and put it in that directory.
3. Unzip it.
4. You should now be able to read from those files in Jupyter.
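
To check that step 4 worked, something like the following might help. It's a rough sketch: `os.getcwd()` reports the current working directory (the same thing `%pwd` shows), and `mini-access-log.txt` is the file we read from earlier; add other filenames to the list as needed.

```python
import os

print(os.getcwd())                    # where Python (and Jupyter) is looking for files

wanted = ['mini-access-log.txt']      # files we expect to find there
missing = []

for one_filename in wanted:
    if not os.path.exists(one_filename):
        missing.append(one_filename)

print(missing)                        # an empty list means everything was found
```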