# Agenda

1. `*args` 
2. `**kwargs`
3. Keyword-only and positional-only arguments
4. Nested functions and closures
5. Functions as nouns
6. Comprehensions (list, set, dict, nested)
7. `lambda` and sorting and key functions

In [1]:
sum([10, 20, 30])

60

In [2]:
def mysum(numbers):
    total = 0
    
    for one_number in numbers:
        total += one_number
        
    return total

In [3]:
mysum([10, 20, 30])

60

In [4]:
mysum(10, 20, 30)

TypeError: mysum() takes 1 positional argument but 3 were given

In [5]:
def mysum(a=0, b=0, c=0, d=0, e=0):
    return a + b + c + d + e

In [6]:
mysum(10, 20, 30)

60

In [7]:
mysum(10, 20, 30, 40, 50)

150

In [8]:
mysum(10, 20, 30, 40, 50, 60)

TypeError: mysum() takes from 0 to 5 positional arguments but 6 were given

In [10]:
# When I use *args:
# - args (or whatever variable name we use) is a tuple
# - the contents of args will be all of the positional arguments that no other parameter took

def mysum(*numbers):   # "splat args"  == "*args"
    print(f'{numbers=}')
    total = 0
    
    for one_number in numbers:
        total += one_number
        
    return total

In [11]:
mysum(10, 20, 30)

numbers=(10, 20, 30)


60

In [12]:
def myfunc(a, b, *args):
    return f'{a=}, {b=}, {args=}'

In [13]:
myfunc()

TypeError: myfunc() missing 2 required positional arguments: 'a' and 'b'

In [14]:
myfunc(10, 20)

'a=10, b=20, args=()'

In [15]:
myfunc(10, 20, 30)

'a=10, b=20, args=(30,)'

In [16]:
myfunc(10, 20, 30, 40)

'a=10, b=20, args=(30, 40)'

In [18]:
# b can still take a positional argument, *but* it also has a default value, 5
def myfunc(a, b=5, *args):
    return f'{a=}, {b=}, {args=}'

In [19]:
myfunc(10, 20, 30, 40, 50)

'a=10, b=20, args=(30, 40, 50)'

In [20]:
# how can I give a value to a, values to args, and skip over b?
# answer: you can't. Positional arguments fill a, then b, and only the leftovers go to args.

In [21]:
myfunc(10)

'a=10, b=5, args=()'

In [22]:
myfunc(10, args=(10, 20, 30))

TypeError: myfunc() got an unexpected keyword argument 'args'

# Order of parameters

- Mandatory positional (no defaults)
- Optional positional (with defaults)
- `*args` (a tuple, containing all positional arguments that nothing else grabbed)
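One way to check these rules without calling the function is the standard library's `inspect.signature(...).bind(...)`, which performs the same argument-to-parameter matching that a real call does. A minimal sketch, reusing `myfunc` from the cells above; `apply_defaults()` fills in any parameter that didn't receive an argument:

```python
import inspect

def myfunc(a, b=5, *args):
    return f'{a=}, {b=}, {args=}'

sig = inspect.signature(myfunc)

# bind() maps arguments to parameters exactly as a call would,
# but without running the function body
bound = sig.bind(10, 20, 30, 40)
print(dict(bound.arguments))   # {'a': 10, 'b': 20, 'args': (30, 40)}

# With a single argument, b falls back to its default and args is empty
short = sig.bind(10)
short.apply_defaults()         # adds b=5 and an empty tuple for *args
print(dict(short.arguments))   # {'a': 10, 'b': 5, 'args': ()}
```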

In [26]:
def mysum(numbers):
    print(f'{numbers=}')
    total = 0
    
    for one_number in numbers:
        total += one_number
        
    return total

In [27]:
mysum.__code__.co_argcount

1

In [28]:
mysum.__code__.co_varnames

('numbers', 'total', 'one_number')

In [29]:
def mysum(*numbers):
    print(f'{numbers=}')
    total = 0
    
    for one_number in numbers:
        total += one_number
        
    return total

In [30]:
mysum.__code__.co_argcount

0

In [31]:
mysum.__code__.co_varnames

('numbers', 'total', 'one_number')

In [32]:
mysum.__code__.co_flags

71

In [33]:
bin(mysum.__code__.co_flags)

'0b1000111'

In [34]:
import dis
dis.show_code(mysum)

Name:              mysum
Filename:          <ipython-input-29-794b1ad59a55>
Argument count:    0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  3
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, VARARGS, NOFREE
Constants:
   0: None
   1: 'numbers='
   2: 0
Names:
   0: print
Variable names:
   0: numbers
   1: total
   2: one_number


In [35]:
def mysum(numbers):
    print(f'{numbers=}')
    total = 0
    
    for one_number in numbers:
        total += one_number
        
    return total

In [36]:
dis.show_code(mysum)

Name:              mysum
Filename:          <ipython-input-35-b60cbc9749ad>
Argument count:    1
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  3
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 'numbers='
   2: 0
Names:
   0: print
Variable names:
   0: numbers
   1: total
   2: one_number


In [37]:
bin(mysum.__code__.co_flags)

'0b1000011'
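The only difference between the two `co_flags` values above is 4 (`71 - 67`), the bit that `dis.show_code` labels `VARARGS`. The `inspect` module exposes that bit as `inspect.CO_VARARGS`, so a small sketch can test for it directly. The function names here are hypothetical:

```python
import inspect

def mysum_list(numbers):
    return sum(numbers)

def mysum_star(*numbers):
    return sum(numbers)

def has_varargs(func):
    # CO_VARARGS is the flag dis.show_code reports as "VARARGS"
    return bool(func.__code__.co_flags & inspect.CO_VARARGS)

print(has_varargs(mysum_list))          # False
print(has_varargs(mysum_star))          # True
print(mysum_star.__code__.co_argcount)  # 0: *numbers isn't counted as a regular argument
```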

In [38]:
def mysum(*numbers):
    print(f'{numbers=}')
    total = 0
    
    for one_number in numbers:
        total += one_number
        
    return total

In [39]:
mysum(10, 20, 30)

numbers=(10, 20, 30)


60

In [40]:
nums = [10, 20, 30, 40, 50]

mysum(nums)

numbers=([10, 20, 30, 40, 50],)


TypeError: unsupported operand type(s) for +=: 'int' and 'list'

In [41]:
nums = [10, 20, 30, 40, 50]

mysum(*nums)   # in a function call, a * before an iterable unpacks it into separate positional arguments

numbers=(10, 20, 30, 40, 50)


150

In [42]:
def add(a, b):
    return a + b

In [43]:
t = (10, 5)

In [44]:
add(t)

TypeError: add() missing 1 required positional argument: 'b'

In [45]:
add(*t)

15
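The `*` in a call works on any iterable, not just lists and tuples, and (since Python 3.5) a single call may contain several `*` unpackings. A quick sketch, redefining the two functions from above so it runs on its own:

```python
def mysum(*numbers):
    total = 0
    for one_number in numbers:
        total += one_number
    return total

def add(a, b):
    return a + b

print(mysum(*range(5)))              # 10, i.e. 0+1+2+3+4
print(mysum(*[10, 20], *(30, 40)))   # 100: two unpackings in one call
print(add(*'xy'))                    # 'xy': a='x' and b='y'
```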

# Exercise: all_lines

1. Define a function, `all_lines`, that takes one mandatory positional argument, `outfilename`.  This will be the name of a file into which you will write the output.
2. The function can then take any number of additional arguments, each of which will be the name of an input file. 
3. Write all of the lines from the input files into the output file -- first all of the lines from the 1st argument, then from the 2nd argument, etc., until all file contents have been written into `outfilename`.

In [47]:
for i in range(5):
    with open(f'file{i}.txt', 'w') as outfile:
        for index, one_word in enumerate('abc def ghi jkl mno'.split()):
            outfile.write(f'{i} {index} {one_word}\n')

In [48]:
!ls *.txt

file0.txt  file1.txt  file2.txt  file3.txt  file4.txt


In [49]:
!cat file0.txt

0 0 abc
0 1 def
0 2 ghi
0 3 jkl
0 4 mno


In [50]:
!cat file1.txt

1 0 abc
1 1 def
1 2 ghi
1 3 jkl
1 4 mno


In [52]:
def all_lines(outfilename, *args):
    with open(outfilename, 'w') as outfile:
        for one_filename in args:
            print(f'Now reading from {one_filename}')
            for one_line in open(one_filename):
                outfile.write(one_line)

In [53]:
all_lines('myoutput.txt', 'file0.txt', 'file1.txt', 'file2.txt', 'file3.txt', 'file4.txt')

Now reading from file0.txt
Now reading from file1.txt
Now reading from file2.txt
Now reading from file3.txt
Now reading from file4.txt


In [54]:
!cat myoutput.txt

0 0 abc
0 1 def
0 2 ghi
0 3 jkl
0 4 mno
1 0 abc
1 1 def
1 2 ghi
1 3 jkl
1 4 mno
2 0 abc
2 1 def
2 2 ghi
2 3 jkl
2 4 mno
3 0 abc
3 1 def
3 2 ghi
3 3 jkl
3 4 mno
4 0 abc
4 1 def
4 2 ghi
4 3 jkl
4 4 mno


In [55]:
import os
os.listdir('.')

['file2.txt',
 'file3.txt',
 'file1.txt',
 'file0.txt',
 'file4.txt',
 'Cisco - 2021-feb-22-advanced.ipynb',
 '.DS_Store',
 'mytypecheck.py',
 'cisco-2021-feb-22.zip',
 'mytypecheck.py~',
 '.mypy_cache',
 'Cisco — 2021 Feb 23.ipynb',
 '.ipynb_checkpoints',
 'myoutput.txt',
 '.git']

In [56]:
import glob
glob.glob('file*.txt')

['file2.txt', 'file3.txt', 'file1.txt', 'file0.txt', 'file4.txt']

In [59]:
all_lines('myoutput.txt', *glob.glob('file*.txt'))

Now reading from file2.txt
Now reading from file3.txt
Now reading from file1.txt
Now reading from file0.txt
Now reading from file4.txt


In [None]:
def all_lines(outfilename, *args):
    with open(outfilename, 'w') as outfile:
        # outfile.__enter__()   -- for files, this just returns the file object itself
        for one_filename in args:
            print(f'Now reading from {one_filename}')
            for one_line in open(one_filename):  # in CPython, this input file is closed once nothing refers to it, soon after the for loop exits
                outfile.write(one_line)
        # outfile.__exit__()  -- for files, this flushes + closes the file
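A possible variation of `all_lines`, assuming you'd rather not rely on CPython's reference counting to close the input files: a nested `with` closes each one deterministically, and `sorted()` puts the `glob.glob` results in a predictable order (as the earlier run shows, `glob.glob` returns them in arbitrary order). The demo runs in a temporary directory with illustrative file names:

```python
import glob
import os
import tempfile

def all_lines(outfilename, *infilenames):
    """Write every line of each input file, in order, into outfilename."""
    with open(outfilename, 'w') as outfile:
        for one_filename in infilenames:
            # the nested with closes each input file as soon as we're done with it
            with open(one_filename) as infile:
                outfile.writelines(infile)   # iterating a file yields its lines

# Usage demo in a throwaway directory (file names are illustrative)
with tempfile.TemporaryDirectory() as tmpdir:
    for i in range(3):
        with open(os.path.join(tmpdir, f'file{i}.txt'), 'w') as f:
            f.write(f'line from file{i}\n')

    # glob.escape protects the directory name; sorted() fixes the input order
    pattern = os.path.join(glob.escape(tmpdir), 'file*.txt')
    inputs = sorted(glob.glob(pattern))
    outname = os.path.join(tmpdir, 'myoutput.txt')
    all_lines(outname, *inputs)

    with open(outname) as f:
        combined = f.read()

print(combined)
```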

In [60]:
def add(a, b):
    return a + b

add(a=10, b=5)

15

In [61]:
add(a=10, b=5, c=12345)

TypeError: add() got an unexpected keyword argument 'c'

In [62]:
# **kwargs is a dict, containing all of the keyword arguments
# that no other parameter got

In [63]:
def myfunc(a, b, **kwargs):
    return f'{a=}, {b=}, {kwargs=}'

In [65]:
myfunc(10, 20)

'a=10, b=20, kwargs={}'

In [66]:
myfunc(10, 20, 30)

TypeError: myfunc() takes 2 positional arguments but 3 were given

In [67]:
myfunc(10, 20, x=100, y=200, z=300)

"a=10, b=20, kwargs={'x': 100, 'y': 200, 'z': 300}"

In [68]:
def myfunc(a, b=2, **kwargs):
    return f'{a=}, {b=}, {kwargs=}'

In [69]:
myfunc(3, x=100, y=200)

"a=3, b=2, kwargs={'x': 100, 'y': 200}"

In [70]:
myfunc(a=3, b=4, x=100, y=200)

"a=3, b=4, kwargs={'x': 100, 'y': 200}"

In [72]:
dis.show_code(myfunc)

Name:              myfunc
Filename:          <ipython-input-68-e51580d2f1a0>
Argument count:    2
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  3
Stack size:        6
Flags:             OPTIMIZED, NEWLOCALS, VARKEYWORDS, NOFREE
Constants:
   0: None
   1: 'a='
   2: ', b='
   3: ', kwargs='
Variable names:
   0: a
   1: b
   2: kwargs


In [73]:
def myfunc(a, *args, **kwargs):
    return f'{a=}, {args=}, {kwargs=}'

In [74]:
myfunc(10, 20, 30, 40, 50)

'a=10, args=(20, 30, 40, 50), kwargs={}'

In [75]:
myfunc(10, 20, 30, 40, 50, x=100, y=200, z=300)

"a=10, args=(20, 30, 40, 50), kwargs={'x': 100, 'y': 200, 'z': 300}"

# Why do we need `**kwargs`?

1. We have a function that can take lots of different parameters. Rather than define the function with many parameters (and defaults), we can just use `kwargs` and search through the keys and values in the dict for what we want.
2. We have a function that knows what it wants to do with keys and values, but doesn't know which keys or values it'll get. It accepts whatever keys and values come its way, and then formats, prints, or uses them in a standard way.
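A sketch of both reasons in one hypothetical function, `describe`: it pulls out an optional `sep` setting (reason 1) and formats whatever other keyword arguments arrive (reason 2). Keyword arguments keep the order in which they were passed:

```python
def describe(title, **kwargs):
    # Reason 1: look up an optional setting instead of declaring a parameter for it
    sep = kwargs.pop('sep', ', ')

    # Reason 2: format whatever key-value pairs remain, in call order
    parts = []
    for key, value in kwargs.items():
        parts.append(f'{key}={value}')
    return f'{title}: {sep.join(parts)}'

print(describe('server', host='localhost', port=8080))
# server: host=localhost, port=8080
print(describe('server', host='localhost', port=8080, sep=' | '))
# server: host=localhost | port=8080
```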

In [76]:
def myfunc():
    f = open('/etc/passwd')
    
myfunc()    

In [77]:
mylist = [10, 20, 30]
mylist.append(mylist)

In [78]:
mylist

[10, 20, 30, [...]]

In [79]:
len(mylist)

4

In [80]:
mylist[-1]

[10, 20, 30, [...]]

In [81]:
mylist is mylist[-1]

True

# Exercise: write_config

1. Write a function that takes one mandatory argument, `outfilename`, and any number of keyword arguments.
2. The keyword arguments should be written to the file, one pair per line, in the format of `key=value`.

In [82]:
def write_config(outfilename, **kwargs):
    with open(outfilename, 'w') as outfile:
        for key, value in kwargs.items():
            outfile.write(f'{key}={value}\n')

In [83]:
write_config('myconfig.txt', a=1, b=2, c=3, d=[10, 20, 30])

In [84]:
!cat myconfig.txt

a=1
b=2
c=3
d=[10, 20, 30]


In [85]:
d = {'a':1, 'b':2, 'c':3, 'd':[100, 200, 300]}

In [87]:
write_config('myconfig2.txt', **d)

In [88]:
!cat myconfig2.txt

a=1
b=2
c=3
d=[100, 200, 300]


In [89]:
def myfunc(a, b=5, *args):
    return f'{a=}, {b=}, {args=}'

myfunc(10, 20, 30, 40, 50)

'a=10, b=20, args=(30, 40, 50)'

In [90]:
myfunc(10)

'a=10, b=5, args=()'

In [91]:
# now b is a keyword-only argument
def myfunc(a, *args, b=5):
    return f'{a=}, {b=}, {args=}'

myfunc(10, 20, 30, 40, 50)

'a=10, b=5, args=(20, 30, 40, 50)'

In [92]:
myfunc(10, 20, 30, 40, 50, b=999)

'a=10, b=999, args=(20, 30, 40, 50)'

In [93]:
# b is still a keyword-only argument, and it's now mandatory!
def myfunc(a, *args, b):
    return f'{a=}, {b=}, {args=}'

myfunc(10, 20, 30, 40, 50)

TypeError: myfunc() missing 1 required keyword-only argument: 'b'

In [95]:
# b is keyword-only, even though this function has no *args

def myfunc(a, *, b):
    return f'{a=}, {b=}'

myfunc(10)

TypeError: myfunc() missing 1 required keyword-only argument: 'b'

In [96]:
myfunc(10, b=30)

'a=10, b=30'

In [97]:
myfunc(10, 20, 30, b=40)

TypeError: myfunc() takes 1 positional argument but 3 positional arguments (and 1 keyword-only argument) were given

In [98]:
myfunc(a=2, b=4)

'a=2, b=4'

# Order of parameters

- Positional-only parameters (before the `/`)
- Mandatory positional-or-keyword parameters (no defaults)
- Optional positional-or-keyword parameters (with defaults)
- `*args` (a tuple, containing all positional arguments that nothing else grabbed)
- `*` by itself, if there isn't a `*args` parameter, separating the positional parameters from the keyword-only ones
- Keyword-only parameters, either mandatory (no default) or optional (with a default); unlike positional parameters, these two kinds may be mixed in any order
- `**kwargs` (a dict, containing all keyword arguments that nothing else grabbed)
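A sketch with one parameter of every kind, in the order listed above (the `/` syntax requires Python 3.8+). The names `show` and `connect` are hypothetical. The second function shows that, after a bare `*`, keyword-only parameters with and without defaults may appear in either order:

```python
def show(a, /, b, c=3, *args, d, e=5, **kwargs):
    return f'{a=} {b=} {c=} {args=} {d=} {e=} {kwargs=}'

print(show(1, 2, d=4))
# a=1 b=2 c=3 args=() d=4 e=5 kwargs={}
print(show(1, b=2, c=30, d=4, z=26))
# a=1 b=2 c=30 args=() d=4 e=5 kwargs={'z': 26}
print(show(1, 2, 30, 40, 50, d=4, e=6))
# a=1 b=2 c=30 args=(40, 50) d=4 e=6 kwargs={}

try:
    show(b=2, d=4)        # a is positional-only, so it can't be skipped
except TypeError as err:
    print(err)

# A keyword-only parameter without a default may follow one with a default
def connect(*, timeout=10, host):
    return f'{host=} {timeout=}'

print(connect(host='example.com'))   # host='example.com' timeout=10
```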

In [99]:
len('abcd')

4

In [100]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [101]:
len(obj='abcd')

TypeError: len() takes no keyword arguments

In [102]:
def hello():
    return 'Hello!'

In [103]:
hello.__code__.co_code

b'd\x01S\x00'

In [104]:
dis.dis(hello)

  2           0 LOAD_CONST               1 ('Hello!')
              2 RETURN_VALUE


In [105]:
hello.__code__.co_consts

(None, 'Hello!')

In [106]:
def hello(name):
    return name

In [107]:
dis.dis(hello)

  2           0 LOAD_FAST                0 (name)
              2 RETURN_VALUE


In [108]:
x = 100

def hello():
    return x

In [109]:
dis.dis(hello)

  4           0 LOAD_GLOBAL              0 (x)
              2 RETURN_VALUE


In [110]:
def hello():
    global name
    name = 'hello'
    return name

In [111]:
dis.dis(hello)

  3           0 LOAD_CONST               1 ('hello')
              2 STORE_GLOBAL             0 (name)

  4           4 LOAD_GLOBAL              0 (name)
              6 RETURN_VALUE
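The disassembly above shows `LOAD_FAST` for locals and `LOAD_GLOBAL` for globals, but opcode names change between Python versions (this output comes from a pre-3.11 interpreter). A sketch that checks the same distinction through code-object attributes that are more stable, `co_varnames` and `co_names`; the function names are hypothetical:

```python
import dis

x = 100

def read_global():
    return x          # never assigned here, so it's looked up as a global

def read_local(name):
    return name       # parameters are local variables

# Local variables are listed in co_varnames; global lookups go through co_names
print(read_global.__code__.co_names)     # ('x',)
print(read_local.__code__.co_varnames)   # ('name',)

# Collect the LOAD_* instructions; newer Pythons may show LOAD_FAST_* variants,
# so callers should compare prefixes rather than exact names
def load_ops(func):
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith('LOAD_')]

print(load_ops(read_global))
print(load_ops(read_local))
```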


In [112]:
def myfunc():
    print('Hello')

In [113]:
dis.dis(myfunc)

  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE


In [114]:
def myfunc():
    print('hello')
    print('hello')
    print('hello')

In [115]:
dis.dis(myfunc)

  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('hello')
              4 CALL_FUNCTION            1
              6 POP_TOP

  3           8 LOAD_GLOBAL              0 (print)
             10 LOAD_CONST               1 ('hello')
             12 CALL_FUNCTION            1
             14 POP_TOP

  4          16 LOAD_GLOBAL              0 (print)
             18 LOAD_CONST               1 ('hello')
             20 CALL_FUNCTION            1
             22 POP_TOP
             24 LOAD_CONST               0 (None)
             26 RETURN_VALUE


In [116]:
myfunc.__code__.co_consts

(None, 'hello')

In [117]:
myfunc.__code__.co_code

b't\x00d\x01\x83\x01\x01\x00t\x00d\x01\x83\x01\x01\x00t\x00d\x01\x83\x01\x01\x00d\x00S\x00'

In [118]:
len(myfunc.__code__.co_code)

28

In [119]:
def hello(name):
    return f'Hello, {name}!'

In [120]:
dis.dis(hello)

  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               2 ('!')
              8 BUILD_STRING             3
             10 RETURN_VALUE


In [122]:
len(hello.__code__.co_code)

12

In [123]:
hello.__code__.co_code

b'd\x01|\x00\x9b\x00d\x02\x9d\x03S\x00'

# Remember:

1. When we use `def`, we're creating a function object and assigning it to a variable.
2. When we assign to a variable inside of a function, the variable is local.
3. We can return any Python data structure from a function.

In [124]:
def outer():
    def inner():
        return 'I am from inner!'
    return inner

In [125]:
f = outer()

In [126]:
type(f)

function

In [127]:
f()

'I am from inner!'

In [128]:
f2 = outer()

In [129]:
f2()

'I am from inner!'

In [138]:
# closure -- a function that retains access to its enclosing function's local variables

def outer(x):
    def inner(y):
        return f'Here, {x=} and {y=}'
    return inner

In [135]:
func1 = outer(10)
func2 = outer(20)

In [136]:
func1(5)

'Here, x=10 and y=5'

In [137]:
func2(6)

'Here, x=20 and y=6'

In [140]:
func1.__code__.co_freevars  # what variables come from the enclosing function?

('x',)

In [142]:
outer.__code__.co_cellvars  # what variables will be used by my inner functions?

('x',)

In [143]:
def outer(x):
    counter = 0  # local to outer, but available to inner
    
    def inner(y):
        return f'[{counter=}] Here, {x=} and {y=}'
    return inner

In [144]:
func1 = outer(10)

In [145]:
func1(5)

'[counter=0] Here, x=10 and y=5'

In [146]:
func1(6)

'[counter=0] Here, x=10 and y=6'

In [147]:
def outer(x):
    counter = 0  # local to outer, but available to inner
    
    def inner(y):
        counter += 1   # assigning makes counter local to inner, so it's read before it has a value
        return f'[{counter=}] Here, {x=} and {y=}'
    return inner

In [148]:
func1 = outer(10)

In [149]:
func1(5)

UnboundLocalError: local variable 'counter' referenced before assignment

In [150]:
def outer(x):
    counter = 0  # local to outer, but available to inner
    
    def inner(y):
        nonlocal counter  # any assignment to counter goes to the outer scope
        counter += 1   
        return f'[{counter=}] Here, {x=} and {y=}'
    return inner

In [151]:
func1 = outer(10)

In [152]:
func1(5)

'[counter=1] Here, x=10 and y=5'

In [153]:
func1(6)

'[counter=2] Here, x=10 and y=6'

In [154]:
func1(7)

'[counter=3] Here, x=10 and y=7'

# Exercise: password generator generator

1. Write a function, `make_password_generator`, which takes one argument, a string.
2. It should return a function (`make_password`) that takes an integer as an argument.
3. When the inner function is called, it should return a string of the stated length, with each character taken from the outer function's argument.

It'll help to know that `random.choice(data)` returns one random element from the sequence `data`.

In [156]:
import random

def make_password_generator(s):
    def make_password(n):
        output = ''
        for i in range(n):
            output += random.choice(s)
        return output
    return make_password

make_alpha_password = make_password_generator('abcde')
pw1 = make_alpha_password(5)
pw2 = make_alpha_password(10)



In [157]:
pw1

'beade'

In [158]:
pw2

'cdccacadad'

In [159]:
make_symbol_password = make_password_generator('!@#$%^&*()_+')
pw3 = make_symbol_password(5)
pw4 = make_symbol_password(10)


In [160]:
pw3

'$#^_&'

In [161]:
pw4

'@)@_+$!@&&'

In [None]:
import random

def make_password_generator(s):
    def make_password(n):
        return ''.join([random.choice(s)
                        for i in range(n)])
    return make_password

make_alpha_password = make_password_generator('abcde')
pw1 = make_alpha_password(5)
pw2 = make_alpha_password(10)
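Two possible variations, both sketches. `random.choices(s, k=n)` draws all `n` characters in one call. And since the `random` module's documentation warns that it isn't meant for security purposes, the hypothetical `make_secure_password_generator` uses `secrets.choice`, which draws from the operating system's cryptographically secure source:

```python
import random
import secrets

def make_password_generator(s):
    def make_password(n):
        # random.choices picks n elements (with replacement) in a single call
        return ''.join(random.choices(s, k=n))
    return make_password

def make_secure_password_generator(s):
    def make_password(n):
        # secrets.choice is intended for passwords, tokens, and similar secrets
        return ''.join([secrets.choice(s) for i in range(n)])
    return make_password

make_alpha_password = make_password_generator('abcde')
make_symbol_password = make_secure_password_generator('!@#$%^&*()_+')
print(make_alpha_password(10))
print(make_symbol_password(10))
```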



In [163]:
def a():
    return "I'm in A!"

def b():
    return "I'm in B!"

while s := input('Enter a choice: ').strip():
    if s == 'a':
        print(a())
    elif s == 'b':
        print(b())
    else:
        print(f'{s} is not a valid option')

Enter a choice: a
I'm in A!
Enter a choice: b
I'm in B!
Enter a choice: c
c is not a valid option
Enter a choice: 


In [164]:
def a():
    return "I'm in A!"

def b():
    return "I'm in B!"

# dispatch table
ops = {'a':a,
       'b':b}

while s := input('Enter a choice: ').strip():
    if s in ops:
        print(ops[s]())
    else:
        print(f'{s} is not a valid option')

Enter a choice: a
I'm in A!
Enter a choice: b
I'm in B!
Enter a choice: c
c is not a valid option
Enter a choice: 
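Because `ops` is an ordinary dict, the lookup can also go through `dict.get`, which returns `None` for a missing key instead of raising `KeyError`. A sketch that moves the dispatch into a hypothetical `dispatch` function, so it can be tried without `input()`:

```python
def a():
    return "I'm in A!"

def b():
    return "I'm in B!"

# dispatch table: the values are the functions themselves, not their results
ops = {'a': a,
       'b': b}

def dispatch(choice):
    func = ops.get(choice)      # None if choice isn't a key
    if func is None:
        return f'{choice} is not a valid option'
    return func()               # () calls the function we looked up

print(dispatch('a'))   # I'm in A!
print(dispatch('c'))   # c is not a valid option
```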


In [165]:
def hello(name):
    return f'Hello, {name}!'

In [167]:
globals()['hello']

<function __main__.hello(name)>

In [169]:
x = 10

'yes' if x == 10 else 'no'

'yes'

# Exercise: Calculator

1. Ask the user repeatedly to enter a math expression in prefix notation, such as `+ 2 3`.
2. You should expect only two numbers, and the operators can be `+`, `-`, `/`, and `*`.
3. If the user enters one of the expected operators, then invoke the appropriate function and print the result.
4. If the user enters an unexpected operator, print an error message. (And let them try again.)
5. If the user enters an empty string, then stop asking.
6. Use a dispatch table (i.e., a dict with functions) to implement your calculator's functionality.

In [171]:
def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def div(a, b):
    return a / b

def mul(a, b):
    return a * b

ops = {'+':add,
      '-':sub,
      '/':div,
      '*':mul}

while s := input('Enter math expression: ').strip():
    op, *numbers = s.split()
    
    int_numbers = []
    for one_number in numbers:
        int_numbers.append(int(one_number))
    
    if op in ops:
        print(ops[op](*int_numbers))
    else:
        print(f'No such operator {op}')

Enter math expression: + 2 2
4
Enter math expression: - 10 3
7
Enter math expression: 


In [173]:
def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def div(a, b):
    return a / b

def mul(a, b):
    return a * b

ops = {'+':add,
      '-':sub,
      '/':div,
      '*':mul}

while s := input('Enter math expression: ').strip():
    op, *numbers = s.split()
    
    if op in ops:
        print(ops[op](*[int(one_number)
                        for one_number in numbers]))
    else:
        print(f'No such operator {op}')

Enter math expression: / 10 5
2.0
Enter math expression: * 5 3
15
Enter math expression: 


In [174]:
import operator

ops = {'+':operator.add,
      '-':operator.sub,
      '/':operator.truediv,
      '*':operator.mul}

while s := input('Enter math expression: ').strip():
    op, *numbers = s.split()
    
    if op in ops:
        print(ops[op](*[int(one_number)
                        for one_number in numbers]))
    else:
        print(f'No such operator {op}')

Enter math expression: * 10 5
50
Enter math expression: + 2 3
5
Enter math expression: 
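A sketch of a more defensive version, pulling the evaluation into a hypothetical `calculate` function so it can be tested without `input()`. It assumes a non-empty expression (which the `while` loop guarantees), and reports a wrong number of operands, non-integer input, and division by zero instead of crashing:

```python
import operator

ops = {'+': operator.add,
       '-': operator.sub,
       '/': operator.truediv,
       '*': operator.mul}

def calculate(expression):
    """Evaluate a prefix expression such as '+ 2 3'; return the result or an error message."""
    op, *numbers = expression.split()   # assumes expression isn't empty

    if op not in ops:
        return f'No such operator {op}'
    if len(numbers) != 2:
        return f'Expected 2 numbers, got {len(numbers)}'

    try:
        return ops[op](*[int(one_number) for one_number in numbers])
    except ValueError:                  # int() couldn't parse an operand
        return f'Not an integer in {numbers}'
    except ZeroDivisionError:
        return 'Cannot divide by zero'

print(calculate('+ 2 3'))    # 5
print(calculate('/ 1 0'))    # Cannot divide by zero
```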


# Functional programming

1. Treat all data as immutable.
2. Avoid assignment as much as possible.
3. Treat functions as nouns, not just verbs.

# Topics in functional programming

1. Comprehensions
   - List comprehensions
   - Dict comprehensions
   - Set comprehensions
   - Nested comprehensions
2. Sorting and key functions
3. `lambda`
4. `map`, `filter`, and `reduce`

In [175]:
numbers = list(range(10))
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [176]:
# I want a list of these numbers, squared

# traditionally, I would write:

output = []

for one_number in numbers:
    output.append(one_number ** 2)

In [177]:
output

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [178]:
# the better way is: list comprehensions

[one_number ** 2            # ANY VALID expression -- SELECT
 for one_number in numbers] # ANY VALID iteration  -- FROM

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [179]:
[print(one_number ** 2)     # print displays each value, but returns None...
 for one_number in numbers] # ...so the resulting list is all Nones

0
1
4
9
16
25
36
49
64
81


[None, None, None, None, None, None, None, None, None, None]

In [182]:
s = '10 20 30 40 50'

# I want to sum these numbers
sum([int(one_number)
     for one_number in s.split()])

150

In [183]:
[int(one_number)
     for one_number in s.split()]

[10, 20, 30, 40, 50]

# Exercises: Comprehensions

1. Ask the user to enter a string. Using a comprehension, find out how many non-whitespace characters the string contains.
2. Ask the user to enter a sentence. Use a comprehension to return the sentence, but with each word capitalized. The result should be the same as running `str.title` on the string. But don't use `str.title`! You may use `str.capitalize`, if you want.

In [184]:
s = input('Enter a sentence: ').strip()

Enter a sentence: this is a very interesting sentence


In [185]:
len(s)

35

In [187]:
len(s.replace(' ', ''))

30

In [190]:
sum([len(one_word)
 for one_word in s.split()])

30
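`s.replace(' ', '')` only removes spaces, so tabs and newlines would still be counted. A comprehension with `str.isspace()` handles every kind of whitespace. A sketch with a made-up sentence that contains a tab and a newline:

```python
s = 'this is\ta very  interesting\nsentence'

# Count only the characters that aren't whitespace (spaces, tabs, newlines, ...)
count = sum([1 for one_char in s if not one_char.isspace()])
print(count)   # 30
```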

In [191]:
s

'this is a very interesting sentence'

In [192]:
s.title()

'This Is A Very Interesting Sentence'

In [193]:
s.capitalize()

'This is a very interesting sentence'

In [196]:
' '.join([one_word.capitalize()
          for one_word in s.split()])

'This Is A Very Interesting Sentence'

In [199]:
mylist = 'abcd efgh ij'.split()
mylist

['abcd', 'efgh', 'ij']

In [200]:
'*'.join(mylist)    # GLUE.join(ITERABLE)

'abcd*efgh*ij'

In [201]:
s.split()

['this', 'is', 'a', 'very', 'interesting', 'sentence']

In [203]:
'***'.join(s.split())

'this***is***a***very***interesting***sentence'

In [204]:
'*'.join([one_word
          for one_word in s.split()])

'this*is*a*very*interesting*sentence'

In [205]:
'*'.join([one_word.upper()
          for one_word in s.split()])

'THIS*IS*A*VERY*INTERESTING*SENTENCE'

In [206]:
'*'.join([one_word[0]
          for one_word in s.split()])

't*i*a*v*i*s'

In [207]:
'*'.join([one_word.capitalize()
          for one_word in s.split()])

'This*Is*A*Very*Interesting*Sentence'

In [208]:
' '.join([one_word.capitalize()
          for one_word in s.split()])

'This Is A Very Interesting Sentence'

In [210]:
[one_line.split(':')
 for one_line in open('/etc/passwd')]

[['##\n'],
 ['# User Database\n'],
 ['# \n'],
 ['# Note that this file is consulted directly only when the system is running\n'],
 ['# in single-user mode.  At other times this information is provided by\n'],
 ['# Open Directory.\n'],
 ['#\n'],
 ['# See the opendirectoryd(8) man page for additional information about\n'],
 ['# Open Directory.\n'],
 ['##\n'],
 ['nobody',
  '*',
  '-2',
  '-2',
  'Unprivileged User',
  '/var/empty',
  '/usr/bin/false\n'],
 ['root', '*', '0', '0', 'System Administrator', '/var/root', '/bin/sh\n'],
 ['daemon', '*', '1', '1', 'System Services', '/var/root', '/usr/bin/false\n'],
 ['_uucp',
  '*',
  '4',
  '4',
  'Unix to Unix Copy Protocol',
  '/var/spool/uucp',
  '/usr/sbin/uucico\n'],
 ['_taskgated',
  '*',
  '13',
  '13',
  'Task Gate Daemon',
  '/var/empty',
  '/usr/bin/false\n'],
 ['_networkd',
  '*',
  '24',
  '24',
  'Network Services',
  '/var/networkd',
  '/usr/bin/false\n'],
 ['_installassistant',
  '*',
  '25',
  '25',
  'Install Assistant',
  '/va

In [211]:
# get all usernames in /etc/passwd
[one_line.split(':')[0]
 for one_line in open('/etc/passwd')]

['##\n',
 '# User Database\n',
 '# \n',
 '# Note that this file is consulted directly only when the system is running\n',
 '# in single-user mode.  At other times this information is provided by\n',
 '# Open Directory.\n',
 '#\n',
 '# See the opendirectoryd(8) man page for additional information about\n',
 '# Open Directory.\n',
 '##\n',
 'nobody',
 'root',
 'daemon',
 '_uucp',
 '_taskgated',
 '_networkd',
 '_installassistant',
 '_lp',
 '_postfix',
 '_scsd',
 '_ces',
 '_appstore',
 '_mcxalr',
 '_appleevents',
 '_geod',
 '_devdocs',
 '_sandbox',
 '_mdnsresponder',
 '_ard',
 '_www',
 '_eppc',
 '_cvs',
 '_svn',
 '_mysql',
 '_sshd',
 '_qtss',
 '_cyrus',
 '_mailman',
 '_appserver',
 '_clamav',
 '_amavisd',
 '_jabber',
 '_appowner',
 '_windowserver',
 '_spotlight',
 '_tokend',
 '_securityagent',
 '_calendar',
 '_teamsserver',
 '_update_sharing',
 '_installer',
 '_atsserver',
 '_ftp',
 '_unknown',
 '_softwareupdate',
 '_coreaudiod',
 '_screensaver',
 '_locationd',
 '_trustevaluationagent',
 '

In [212]:
# get all usernames in /etc/passwd
[one_line.split(':')[0]                # expression
 for one_line in open('/etc/passwd')   # iteration
 if not one_line.startswith('#')]      # condition

['nobody',
 'root',
 'daemon',
 '_uucp',
 '_taskgated',
 '_networkd',
 '_installassistant',
 '_lp',
 '_postfix',
 '_scsd',
 '_ces',
 '_appstore',
 '_mcxalr',
 '_appleevents',
 '_geod',
 '_devdocs',
 '_sandbox',
 '_mdnsresponder',
 '_ard',
 '_www',
 '_eppc',
 '_cvs',
 '_svn',
 '_mysql',
 '_sshd',
 '_qtss',
 '_cyrus',
 '_mailman',
 '_appserver',
 '_clamav',
 '_amavisd',
 '_jabber',
 '_appowner',
 '_windowserver',
 '_spotlight',
 '_tokend',
 '_securityagent',
 '_calendar',
 '_teamsserver',
 '_update_sharing',
 '_installer',
 '_atsserver',
 '_ftp',
 '_unknown',
 '_softwareupdate',
 '_coreaudiod',
 '_screensaver',
 '_locationd',
 '_trustevaluationagent',
 '_timezone',
 '_lda',
 '_cvmsroot',
 '_usbmuxd',
 '_dovecot',
 '_dpaudio',
 '_postgres',
 '_krbtgt',
 '_kadmin_admin',
 '_kadmin_changepw',
 '_devicemgr',
 '_webauthserver',
 '_netbios',
 '_warmd',
 '_dovenull',
 '_netstatistics',
 '_avbdeviced',
 '_krb_krbtgt',
 '_krb_kadmin',
 '_krb_changepw',
 '_krb_kerberos',
 '_krb_anonymous',
 '_asse
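Opening a file inside a comprehension leaves closing it to the garbage collector, and a blank line would slip past the `startswith('#')` test (yielding `'\n'` as a "username"). A sketch of a hypothetical `get_usernames` function that takes any iterable of lines, so it can be tried on a few lines copied from the output above, plus one blank line added for illustration:

```python
def get_usernames(lines):
    """Return the first colon-separated field of every non-blank, non-comment line."""
    return [one_line.split(':')[0]
            for one_line in lines
            if one_line.strip() and not one_line.startswith('#')]

sample = ['##\n',
          '# User Database\n',
          '\n',
          'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n',
          'root:*:0:0:System Administrator:/var/root:/bin/sh\n']

print(get_usernames(sample))   # ['nobody', 'root']

# On a real system, a with block closes the file for you:
# with open('/etc/passwd') as f:
#     print(get_usernames(f))
```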

In [213]:
numbers = [10, 20, 25, 35, 40, 100, 150, 155, 175]

[one_number ** 2
 for one_number in numbers]

[100, 400, 625, 1225, 1600, 10000, 22500, 24025, 30625]

In [215]:
# I only want (a) odd numbers (b) bigger than 100

[one_number ** 2
 for one_number in numbers
 if one_number % 2 and one_number > 100]

[24025, 30625]

In [216]:
# I only want (a) odd numbers (b) bigger than 100

[one_number ** 2
 for one_number in numbers
 if one_number % 2 
 if one_number > 100]

[24025, 30625]

In [217]:
!cat nums.txt

5
	10     
	20
  	3
		   	20        

 25


# Exercise: Sum the numbers

Use a list comprehension to read through `nums.txt` and add the integers together. (The answer is 83.)

In [219]:
[int(one_line)
for one_line in open('nums.txt')]

ValueError: invalid literal for int() with base 10: '\n'

In [220]:
int('5')

5

In [221]:
int('    5       ')

5

In [222]:
int('')

ValueError: invalid literal for int() with base 10: ''

In [223]:
int()

0

In [224]:
[int(one_line)
for one_line in open('nums.txt')
if one_line.strip()]

[5, 10, 20, 3, 20, 25]

In [225]:
[int(one_line)
for one_line in open('nums.txt')
if one_line.strip().isdigit()]

[5, 10, 20, 3, 20, 25]

In [226]:
sum([int(one_line)
for one_line in open('nums.txt')
if one_line.strip()])

83
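Since `int()` ignores surrounding whitespace, only the blank line needs filtering, and `one_line.strip()` is enough for that. The `isdigit()` variant has a catch: it returns `False` for negative numbers such as `'-5'`, so they would be silently skipped. A sketch using `io.StringIO` as a stand-in for `nums.txt`, with contents modeled on the `cat` output above:

```python
import io

# Stand-in for nums.txt: the same numbers, with stray tabs, spaces, and a blank line
nums_file = io.StringIO('5\n\t10     \n\t20\n  \t3\n\t\t   \t20        \n\n 25\n')

total = sum([int(one_line)
             for one_line in nums_file
             if one_line.strip()])   # skip lines that are empty after stripping
print(total)   # 83

print('-5'.isdigit())   # False: an isdigit() filter would drop negative numbers
```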

In [227]:
!head shoe-data.txt

Adidas	orange	43
Nike	black	41
Adidas	black	39
New Balance	pink	41
Nike	white	44
New Balance	orange	38
Nike	pink	44
Adidas	pink	44
New Balance	orange	39
New Balance	black	43


# Exercise: Shoe data

1. Use a list comprehension to create a list of dicts from `shoe-data.txt`.
2. Each of the lines has three columns, separated by tabs (`'\t'`).
3. Each dict (and there will be 100 of them in the end) will have three key-value pairs, with the keys being `brand`, `color`, `size`.

Suggestion: Use a function in your expression, rather than trying to do it all inline.

In [230]:
def line_to_dict(one_line):
    brand, color, size = one_line.strip().split('\t')
    return {'brand':brand,
           'color':color,
           'size':size}

[line_to_dict(one_line)
for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [231]:
z = zip('abcd', [10, 20, 30, 40])

In [232]:
list(z)

[('a', 10), ('b', 20), ('c', 30), ('d', 40)]

In [251]:
z = zip('abcd', [10, 20, 30, 40])

In [252]:
list(z)

[('a', 10), ('b', 20), ('c', 30), ('d', 40)]

In [254]:
dict(list(z))   # z was already used up by list(z) in the previous cell, so there's nothing left

{}
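The empty dict appears because `zip` returns an iterator, and an iterator can be consumed only once. A sketch of the problem and the usual fix, turning the pairs into a list once and reusing that list:

```python
z = zip('abcd', [10, 20, 30, 40])

first = list(z)    # consumes the iterator
second = list(z)   # nothing left to produce
print(first)       # [('a', 10), ('b', 20), ('c', 30), ('d', 40)]
print(second)      # []

# Fix: materialize the pairs once; a list can be iterated as often as needed
pairs = list(zip('abcd', [10, 20, 30, 40]))
print(dict(pairs))   # {'a': 10, 'b': 20, 'c': 30, 'd': 40}
print(dict(pairs))   # the same dict again
```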

In [255]:
dict

dict

In [257]:
dict(a=1, b=2)

{'a': 1, 'b': 2}

In [258]:
dict([('a', 1), ('b', 2)])

{'a': 1, 'b': 2}

In [259]:
dict(zip('abc', [10, 20, 30]))

{'a': 10, 'b': 20, 'c': 30}

In [260]:
def line_to_dict(one_line):
    # pair each field name with the matching tab-separated value
    return dict(zip(['brand', 'color', 'size'],
                    one_line.strip().split('\t')))

[line_to_dict(one_line)
 for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '
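One thing to watch with `dict(zip(...))`: `zip` silently stops at the shorter input, so a line with a missing field would quietly produce a dict with fewer keys. Below is a minimal sketch that checks the field count first and closes the file with `with`; the names `FIELD_NAMES` and `read_shoes`, and the choice to raise `ValueError` on a malformed line, are our own assumptions rather than part of the original exercise:

```python
FIELD_NAMES = ['brand', 'color', 'size']   # assumed column order of shoe-data.txt

def line_to_dict(one_line):
    """Turn one tab-separated line into a dict keyed by FIELD_NAMES."""
    values = one_line.strip().split('\t')
    # zip() would silently drop unmatched items, so check the length first
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f'Expected {len(FIELD_NAMES)} fields, '
                         f'got {len(values)}: {one_line!r}')
    return dict(zip(FIELD_NAMES, values))

def read_shoes(filename):
    # "with" closes the file once the comprehension has read every line
    with open(filename) as f:
        return [line_to_dict(one_line) for one_line in f]

print(line_to_dict('Adidas\torange\t43\n'))
# {'brand': 'Adidas', 'color': 'orange', 'size': '43'}
```

Calling `read_shoes('shoe-data.txt')` should then give the same list as the cell above, assuming every line in the file has exactly three fields.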

# Next up:

1. Dict comprehensions
2. Set comprehensions
3. Nested comprehensions
4. Sorting and key functions
5. `lambda`

In [262]:
# (1) username and (2) user ID -- index 0 and index 2

[[one_line.split(':')[0], one_line.split(':')[2]]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')]

[['nobody', '-2'],
 ['root', '0'],
 ['daemon', '1'],
 ['_uucp', '4'],
 ['_taskgated', '13'],
 ['_networkd', '24'],
 ['_installassistant', '25'],
 ['_lp', '26'],
 ['_postfix', '27'],
 ['_scsd', '31'],
 ['_ces', '32'],
 ['_appstore', '33'],
 ['_mcxalr', '54'],
 ['_appleevents', '55'],
 ['_geod', '56'],
 ['_devdocs', '59'],
 ['_sandbox', '60'],
 ['_mdnsresponder', '65'],
 ['_ard', '67'],
 ['_www', '70'],
 ['_eppc', '71'],
 ['_cvs', '72'],
 ['_svn', '73'],
 ['_mysql', '74'],
 ['_sshd', '75'],
 ['_qtss', '76'],
 ['_cyrus', '77'],
 ['_mailman', '78'],
 ['_appserver', '79'],
 ['_clamav', '82'],
 ['_amavisd', '83'],
 ['_jabber', '84'],
 ['_appowner', '87'],
 ['_windowserver', '88'],
 ['_spotlight', '89'],
 ['_tokend', '91'],
 ['_securityagent', '92'],
 ['_calendar', '93'],
 ['_teamsserver', '94'],
 ['_update_sharing', '95'],
 ['_installer', '96'],
 ['_atsserver', '97'],
 ['_ftp', '98'],
 ['_unknown', '99'],
 ['_softwareupdate', '200'],
 ['_coreaudiod', '202'],
 ['_screensaver', '203'],
 ['_loc

In [264]:
# (1) username and (2) user ID -- index 0 and index 2

dict([one_line.split(':')[:3:2]   # [start:stop:step] -> indices 0 and 2
      for one_line in open('/etc/passwd')
      if not one_line.startswith('#')])

{'nobody': '-2',
 'root': '0',
 'daemon': '1',
 '_uucp': '4',
 '_taskgated': '13',
 '_networkd': '24',
 '_installassistant': '25',
 '_lp': '26',
 '_postfix': '27',
 '_scsd': '31',
 '_ces': '32',
 '_appstore': '33',
 '_mcxalr': '54',
 '_appleevents': '55',
 '_geod': '56',
 '_devdocs': '59',
 '_sandbox': '60',
 '_mdnsresponder': '65',
 '_ard': '67',
 '_www': '70',
 '_eppc': '71',
 '_cvs': '72',
 '_svn': '73',
 '_mysql': '74',
 '_sshd': '75',
 '_qtss': '76',
 '_cyrus': '77',
 '_mailman': '78',
 '_appserver': '79',
 '_clamav': '82',
 '_amavisd': '83',
 '_jabber': '84',
 '_appowner': '87',
 '_windowserver': '88',
 '_spotlight': '89',
 '_tokend': '91',
 '_securityagent': '92',
 '_calendar': '93',
 '_teamsserver': '94',
 '_update_sharing': '95',
 '_installer': '96',
 '_atsserver': '97',
 '_ftp': '98',
 '_unknown': '99',
 '_softwareupdate': '200',
 '_coreaudiod': '202',
 '_screensaver': '203',
 '_locationd': '205',
 '_trustevaluationagent': '208',
 '_timezone': '210',
 '_lda': '211',
 '_cvmsro
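The slice `[:3:2]` starts at index 0, stops *before* index 3, and takes every second element, so it keeps exactly indices 0 and 2. A quick check on two lines copied from the `/etc/passwd` listing in this notebook (no file needed; `sample_lines` is just an illustrative name):

```python
sample_lines = [
    'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n',
    'root:*:0:0:System Administrator:/var/root:/bin/sh\n',
]

fields = sample_lines[0].split(':')
print(fields[:3])     # ['nobody', '*', '-2']  -> indices 0, 1, 2
print(fields[:3:2])   # ['nobody', '-2']       -> indices 0 and 2

# dict() accepts any iterable of two-element sequences, lists included
print(dict(one_line.split(':')[:3:2] for one_line in sample_lines))
# {'nobody': '-2', 'root': '0'}
```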

In [265]:
# dict comprehension -- this creates *ONE* dictionary

{one_line.split(':')[0]: one_line.split(':')[2]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')}

{'nobody': '-2',
 'root': '0',
 'daemon': '1',
 '_uucp': '4',
 '_taskgated': '13',
 '_networkd': '24',
 '_installassistant': '25',
 '_lp': '26',
 '_postfix': '27',
 '_scsd': '31',
 '_ces': '32',
 '_appstore': '33',
 '_mcxalr': '54',
 '_appleevents': '55',
 '_geod': '56',
 '_devdocs': '59',
 '_sandbox': '60',
 '_mdnsresponder': '65',
 '_ard': '67',
 '_www': '70',
 '_eppc': '71',
 '_cvs': '72',
 '_svn': '73',
 '_mysql': '74',
 '_sshd': '75',
 '_qtss': '76',
 '_cyrus': '77',
 '_mailman': '78',
 '_appserver': '79',
 '_clamav': '82',
 '_amavisd': '83',
 '_jabber': '84',
 '_appowner': '87',
 '_windowserver': '88',
 '_spotlight': '89',
 '_tokend': '91',
 '_securityagent': '92',
 '_calendar': '93',
 '_teamsserver': '94',
 '_update_sharing': '95',
 '_installer': '96',
 '_atsserver': '97',
 '_ftp': '98',
 '_unknown': '99',
 '_softwareupdate': '200',
 '_coreaudiod': '202',
 '_screensaver': '203',
 '_locationd': '205',
 '_trustevaluationagent': '208',
 '_timezone': '210',
 '_lda': '211',
 '_cvmsro

In [266]:
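# := (the "walrus" operator, Python 3.8+) assigns fields inside the condition,
# so each line is split only once
{fields[0]: fields[2]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#') and (fields := one_line.split(':'))}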

{'nobody': '-2',
 'root': '0',
 'daemon': '1',
 '_uucp': '4',
 '_taskgated': '13',
 '_networkd': '24',
 '_installassistant': '25',
 '_lp': '26',
 '_postfix': '27',
 '_scsd': '31',
 '_ces': '32',
 '_appstore': '33',
 '_mcxalr': '54',
 '_appleevents': '55',
 '_geod': '56',
 '_devdocs': '59',
 '_sandbox': '60',
 '_mdnsresponder': '65',
 '_ard': '67',
 '_www': '70',
 '_eppc': '71',
 '_cvs': '72',
 '_svn': '73',
 '_mysql': '74',
 '_sshd': '75',
 '_qtss': '76',
 '_cyrus': '77',
 '_mailman': '78',
 '_appserver': '79',
 '_clamav': '82',
 '_amavisd': '83',
 '_jabber': '84',
 '_appowner': '87',
 '_windowserver': '88',
 '_spotlight': '89',
 '_tokend': '91',
 '_securityagent': '92',
 '_calendar': '93',
 '_teamsserver': '94',
 '_update_sharing': '95',
 '_installer': '96',
 '_atsserver': '97',
 '_ftp': '98',
 '_unknown': '99',
 '_softwareupdate': '200',
 '_coreaudiod': '202',
 '_screensaver': '203',
 '_locationd': '205',
 '_trustevaluationagent': '208',
 '_timezone': '210',
 '_lda': '211',
 '_cvmsro
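One caveat with the comprehensions above: a blank line, or any line with fewer than three `:`-separated fields, would make `fields[2]` raise `IndexError`. A minimal sketch, assuming we simply want to skip such lines; the function name `uid_by_username` and the sample lines (taken from the `/etc/passwd` listing, plus a blank line) are for illustration only:

```python
def uid_by_username(lines):
    """Map username -> user ID (as a string), skipping comments and short lines."""
    return {fields[0]: fields[2]
            for one_line in lines
            if not one_line.startswith('#')
            # := binds fields in the condition; len() > 2 skips blank/short lines
            and len(fields := one_line.split(':')) > 2}

sample_lines = [
    '##\n',
    '# User Database\n',
    'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n',
    '\n',
    'root:*:0:0:System Administrator:/var/root:/bin/sh\n',
]

print(uid_by_username(sample_lines))   # {'nobody': '-2', 'root': '0'}
```

With the real file, `with open('/etc/passwd') as f: uid_by_username(f)` should work the same way.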

In [267]:
{one_line.split(':')[0]: one_line.split(':')[1:]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')}

{'nobody': ['*',
  '-2',
  '-2',
  'Unprivileged User',
  '/var/empty',
  '/usr/bin/false\n'],
 'root': ['*', '0', '0', 'System Administrator', '/var/root', '/bin/sh\n'],
 'daemon': ['*', '1', '1', 'System Services', '/var/root', '/usr/bin/false\n'],
 '_uucp': ['*',
  '4',
  '4',
  'Unix to Unix Copy Protocol',
  '/var/spool/uucp',
  '/usr/sbin/uucico\n'],
 '_taskgated': ['*',
  '13',
  '13',
  'Task Gate Daemon',
  '/var/empty',
  '/usr/bin/false\n'],
 '_networkd': ['*',
  '24',
  '24',
  'Network Services',
  '/var/networkd',
  '/usr/bin/false\n'],
 '_installassistant': ['*',
  '25',
  '25',
  'Install Assistant',
  '/var/empty',
  '/usr/bin/false\n'],
 '_lp': ['*',
  '26',
  '26',
  'Printing Services',
  '/var/spool/cups',
  '/usr/bin/false\n'],
 '_postfix': ['*',
  '27',
  '27',
  'Postfix Mail Server',
  '/var/spool/postfix',
  '/usr/bin/false\n'],
 '_scsd': ['*',
  '31',
  '31',
  'Service Configuration Service',
  '/var/empty',
  '/usr/bin/false\n'],
 '_ces': ['*',
  '32',
  '3

In [268]:
!ls *.txt

file0.txt  file3.txt		 mini-access-log.txt  myoutput.txt
file1.txt  file4.txt		 myconfig.txt	      nums.txt
file2.txt  linux-etc-passwd.txt  myconfig2.txt	      shoe-data.txt


In [269]:
!cat myconfig.txt

a=1
b=2
c=3
d=[10, 20, 30]


# Exercise: dict comprehension

Use a dict comprehension to read from a config file, and turn it into name-value pairs in a dict. Note that whatever values you read will be strings, regardless of what they look like or originally were.

In [272]:
{one_line.strip().split('=')[0]  : one_line.strip().split('=')[1]
 for one_line in open('myconfig.txt')}

{'a': '1', 'b': '2', 'c': '3', 'd': '[10, 20, 30]'}
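The cell above splits each line twice and assumes exactly one `=` per line. A minimal sketch, assuming the same `name=value` format as `myconfig.txt`, that uses `str.partition` so each line is split once and any `=` inside the value is kept. The function name `read_config`, the extra `url=` sample line, and skipping lines without `=` are our own choices:

```python
def read_config(lines):
    """Return a dict of name -> value (both strings) from name=value lines."""
    # partition() splits at the first '=' only and returns (name, '=', value)
    pairs = (one_line.strip().partition('=') for one_line in lines)
    return {name: value
            for name, sep, value in pairs
            if sep}   # sep is '' when a line has no '=' (e.g. a blank line)

sample_lines = ['a=1\n', 'b=2\n', 'c=3\n', 'd=[10, 20, 30]\n', '\n',
                'url=http://example.com/?x=1\n']

print(read_config(sample_lines))
# {'a': '1', 'b': '2', 'c': '3', 'd': '[10, 20, 30]', 'url': 'http://example.com/?x=1'}
```

Against the real file: `with open('myconfig.txt') as f: read_config(f)`. As in the exercise, every value comes back as a string.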

In [273]:
# this is dangerous! eval() executes whatever expression the file contains

{one_line.strip().split('=')[0]  : eval(one_line.strip().split('=')[1])
 for one_line in open('myconfig.txt')}

{'a': 1, 'b': 2, 'c': 3, 'd': [10, 20, 30]}

In [274]:
# eval overlaps 75% with evil!
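If you do want Python-style values back, a safer option than `eval` is `ast.literal_eval` from the standard library. It accepts only literals (numbers, strings, lists, tuples, dicts, sets, booleans, `None`) and raises an exception for anything else. A sketch; falling back to the raw string when a value isn't a literal is our own assumption, and the `e=` line is a made-up hostile input:

```python
import ast

def parse_value(text):
    """Convert text to a Python literal if possible, else return it unchanged."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a literal (e.g. a bare word or a function call): keep the string
        return text

lines = ['a=1\n', 'b=2\n', 'c=3\n', 'd=[10, 20, 30]\n', "e=__import__('os')\n"]

config = {name: parse_value(value)
          for name, _, value in (one_line.strip().partition('=')
                                 for one_line in lines)}
print(config)
```

The `e` entry stays a harmless string instead of importing a module.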

# Set comprehension

- Returns a new set
- Just like a list comprehension, but uses `{}`
- Don't put a `:` between two expressions, or it becomes a dict comprehension

In [275]:
s = '10 20 30 40 50 10 20 30 40 50'

# list comprehension
sum([int(one_number)
     for one_number in s.split()])

300

In [276]:
s = '10 20 30 40 50 10 20 30 40 50'

# set comprehension -- duplicates are dropped, so each number is summed only once
sum({int(one_number)
     for one_number in s.split()})

150

In [278]:
!head -20 /etc/passwd

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false


In [283]:
# What are the different shells in /etc/passwd?

{one_line.strip().split(':')[-1]
 for one_line in open('/etc/passwd')
 if not one_line.startswith('#')}

{'/bin/bash', '/bin/sh', '/usr/bin/false', '/usr/sbin/uucico'}

# Exercises with comprehensions

1. Create a set of the different IP addresses in `mini-access-log.txt`.
2. Find the 3 most common IP addresses in `mini-access-log.txt`.


In [284]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - -

In [287]:
{one_line.split()[0]
 for one_line in open('mini-access-log.txt')}

{'208.80.193.28',
 '65.55.106.131',
 '65.55.106.155',
 '65.55.106.183',
 '65.55.106.186',
 '65.55.207.126',
 '65.55.207.25',
 '65.55.207.50',
 '65.55.207.71',
 '65.55.207.77',
 '65.55.207.94',
 '65.55.215.75',
 '66.249.65.12',
 '66.249.65.38',
 '66.249.65.43',
 '66.249.71.65',
 '67.195.112.35',
 '67.218.116.165',
 '74.52.245.146',
 '82.34.9.20',
 '89.248.172.58',
 '98.242.170.241'}

In [289]:
from collections import Counter

c = Counter([one_line.split()[0]
         for one_line in open('mini-access-log.txt')])
c

Counter({'67.218.116.165': 2,
         '66.249.71.65': 3,
         '65.55.106.183': 2,
         '66.249.65.12': 32,
         '65.55.106.131': 2,
         '65.55.106.186': 2,
         '74.52.245.146': 2,
         '66.249.65.43': 3,
         '65.55.207.25': 2,
         '65.55.207.94': 2,
         '65.55.207.71': 1,
         '98.242.170.241': 1,
         '66.249.65.38': 100,
         '65.55.207.126': 2,
         '82.34.9.20': 2,
         '65.55.106.155': 2,
         '65.55.207.77': 2,
         '208.80.193.28': 1,
         '89.248.172.58': 22,
         '67.195.112.35': 16,
         '65.55.207.50': 3,
         '65.55.215.75': 2})
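To see roughly what `Counter` and `most_common` are doing for us, here is a plain-dict sketch. The function name `top_ips` and the shortened sample log lines are hypothetical (the IPs are borrowed from the output above), and `sorted` with a `key` function stands in for `most_common`:

```python
def top_ips(lines, n=3):
    """Return the n most frequent first-field IP addresses as (ip, count) pairs."""
    counts = {}
    for one_line in lines:
        ip = one_line.split()[0]               # the IP address is the first field
        counts[ip] = counts.get(ip, 0) + 1
    # Sort (ip, count) pairs by count, highest first, and keep the first n
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]

sample_lines = [
    '66.249.65.38 - - "GET /a HTTP/1.1"',
    '66.249.65.12 - - "GET /b HTTP/1.1"',
    '66.249.65.38 - - "GET /c HTTP/1.1"',
    '89.248.172.58 - - "GET /d HTTP/1.1"',
    '66.249.65.38 - - "GET /e HTTP/1.1"',
    '66.249.65.12 - - "GET /f HTTP/1.1"',
]

print(top_ips(sample_lines))
# [('66.249.65.38', 3), ('66.249.65.12', 2), ('89.248.172.58', 1)]
```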

In [291]:
for key, value in c.items():
    print(f'{key:20}: {value}')

67.218.116.165      : 2
66.249.71.65        : 3
65.55.106.183       : 2
66.249.65.12        : 32
65.55.106.131       : 2
65.55.106.186       : 2
74.52.245.146       : 2
66.249.65.43        : 3
65.55.207.25        : 2
65.55.207.94        : 2
65.55.207.71        : 1
98.242.170.241      : 1
66.249.65.38        : 100
65.55.207.126       : 2
82.34.9.20          : 2
65.55.106.155       : 2
65.55.207.77        : 2
208.80.193.28       : 1
89.248.172.58       : 22
67.195.112.35       : 16
65.55.207.50        : 3
65.55.215.75        : 2


In [294]:
# one "x" per two hits; // rounds down, so an IP with 1 hit gets no bar
for key, value in c.items():
    print(f'{key:20}: {"x" * (value // 2)}')

67.218.116.165      : x
66.249.71.65        : x
65.55.106.183       : x
66.249.65.12        : xxxxxxxxxxxxxxxx
65.55.106.131       : x
65.55.106.186       : x
74.52.245.146       : x
66.249.65.43        : x
65.55.207.25        : x
65.55.207.94        : x
65.55.207.71        : 
98.242.170.241      : 
66.249.65.38        : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
65.55.207.126       : x
82.34.9.20          : x
65.55.106.155       : x
65.55.207.77        : x
208.80.193.28       : 
89.248.172.58       : xxxxxxxxxxx
67.195.112.35       : xxxxxxxx
65.55.207.50        : x
65.55.215.75        : x


In [296]:
c.most_common(3)

[('66.249.65.38', 100), ('66.249.65.12', 32), ('89.248.172.58', 22)]

In [297]:
mylist = [[10, 20, 30], [40, 45, 50, 55, 60, 65], 
          [70, 75, 80, 85, 90], [100, 110, 115, 120]]

mylist

[[10, 20, 30],
 [40, 45, 50, 55, 60, 65],
 [70, 75, 80, 85, 90],
 [100, 110, 115, 120]]

In [298]:
# each element of mylist is a sublist, not a number, so this returns the sublists unchanged
[one_number
 for one_number in mylist]

[[10, 20, 30],
 [40, 45, 50, 55, 60, 65],
 [70, 75, 80, 85, 90],
 [100, 110, 115, 120]]

In [299]:
[one_number
 for one_sublist in mylist
 for one_number in one_sublist]

[10, 20, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 115, 120]
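A nested comprehension reads its `for` clauses in the same order as ordinary nested loops: outer loop first, inner loop second. This sketch writes the previous cell out longhand to show that the two produce the same list:

```python
mylist = [[10, 20, 30], [40, 45, 50, 55, 60, 65],
          [70, 75, 80, 85, 90], [100, 110, 115, 120]]

# Same order as the comprehension's "for" clauses: outer loop first
flat = []
for one_sublist in mylist:
    for one_number in one_sublist:
        flat.append(one_number)

print(flat == [one_number
               for one_sublist in mylist
               for one_number in one_sublist])   # True
```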

In [300]:
# I find this to be unreadable!
[one_number for one_sublist in mylist for one_number in one_sublist]

[10, 20, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 115, 120]

In [301]:
[(x,y)
 for x in range(10)
 for y in range(10)]

[(0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (0, 5),
 (0, 6),
 (0, 7),
 (0, 8),
 (0, 9),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (1, 4),
 (1, 5),
 (1, 6),
 (1, 7),
 (1, 8),
 (1, 9),
 (2, 0),
 (2, 1),
 (2, 2),
 (2, 3),
 (2, 4),
 (2, 5),
 (2, 6),
 (2, 7),
 (2, 8),
 (2, 9),
 (3, 0),
 (3, 1),
 (3, 2),
 (3, 3),
 (3, 4),
 (3, 5),
 (3, 6),
 (3, 7),
 (3, 8),
 (3, 9),
 (4, 0),
 (4, 1),
 (4, 2),
 (4, 3),
 (4, 4),
 (4, 5),
 (4, 6),
 (4, 7),
 (4, 8),
 (4, 9),
 (5, 0),
 (5, 1),
 (5, 2),
 (5, 3),
 (5, 4),
 (5, 5),
 (5, 6),
 (5, 7),
 (5, 8),
 (5, 9),
 (6, 0),
 (6, 1),
 (6, 2),
 (6, 3),
 (6, 4),
 (6, 5),
 (6, 6),
 (6, 7),
 (6, 8),
 (6, 9),
 (7, 0),
 (7, 1),
 (7, 2),
 (7, 3),
 (7, 4),
 (7, 5),
 (7, 6),
 (7, 7),
 (7, 8),
 (7, 9),
 (8, 0),
 (8, 1),
 (8, 2),
 (8, 3),
 (8, 4),
 (8, 5),
 (8, 6),
 (8, 7),
 (8, 8),
 (8, 9),
 (9, 0),
 (9, 1),
 (9, 2),
 (9, 3),
 (9, 4),
 (9, 5),
 (9, 6),
 (9, 7),
 (9, 8),
 (9, 9)]
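The standard library's `itertools.product` generates the same combinations as this two-`for` comprehension, in the same order (the last input varies fastest). A short sketch comparing the two:

```python
from itertools import product

pairs_comprehension = [(x, y)
                       for x in range(10)
                       for y in range(10)]

# product() yields the same tuples in the same order
pairs_product = list(product(range(10), range(10)))

print(pairs_comprehension == pairs_product)   # True
print(len(pairs_product))                     # 100
```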

In [303]:
[one_number
 for one_sublist in mylist
 if len(one_sublist) > 3
 for one_number in one_sublist]

[40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 115, 120]

In [304]:
# odd numbers from those sublists of mylist that have more than 3 elements
[one_number
 for one_sublist in mylist
 if len(one_sublist) > 3
 for one_number in one_sublist
 if one_number % 2]

[45, 55, 65, 75, 85, 115]
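Each `if` in a nested comprehension filters at the level of the `for` just before it. Written out as loops, the previous cell becomes:

```python
mylist = [[10, 20, 30], [40, 45, 50, 55, 60, 65],
          [70, 75, 80, 85, 90], [100, 110, 115, 120]]

result = []
for one_sublist in mylist:
    if len(one_sublist) > 3:             # first "if": keep only the longer sublists
        for one_number in one_sublist:
            if one_number % 2:           # second "if": nonzero remainder means odd
                result.append(one_number)

print(result)   # [45, 55, 65, 75, 85, 115]
```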

In [305]:
!head movies.dat

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller


In [306]:
!wc movies.dat

  3883  15672 171308 movies.dat


# Exercise: Movie categories

Use a list comprehension (and associated tools) to find the 5 most common movie categories in `movies.dat`. Note that most movies have more than one category; just count the movie once for each of its categories.
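One possible approach, sketched under the assumption that every line has the `id::title::cat1|cat2|...` layout shown by `head` above: a nested list comprehension yields one item per category per movie, and `Counter.most_common` ranks them. The function name `top_categories` is hypothetical:

```python
from collections import Counter

def top_categories(lines, n=5):
    """Count every category of every movie and return the n most common."""
    return Counter([one_category
                    for one_line in lines
                    # the categories are the last '::' field, separated by '|'
                    for one_category in one_line.strip().split('::')[-1].split('|')]
                   ).most_common(n)

# Usage against the real file (opened as latin-1, as in the cell below):
# with open('movies.dat', encoding='latin-1') as f:
#     print(top_categories(f))
```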

In [308]:
[one_line
 for one_line in open('movies.dat', encoding='latin-1')]

["1::Toy Story (1995)::Animation|Children's|Comedy\n",
 "2::Jumanji (1995)::Adventure|Children's|Fantasy\n",
 '3::Grumpier Old Men (1995)::Comedy|Romance\n',
 '4::Waiting to Exhale (1995)::Comedy|Drama\n',
 '5::Father of the Bride Part II (1995)::Comedy\n',
 '6::Heat (1995)::Action|Crime|Thriller\n',
 '7::Sabrina (1995)::Comedy|Romance\n',
 "8::Tom and Huck (1995)::Adventure|Children's\n",
 '9::Sudden Death (1995)::Action\n',
 '10::GoldenEye (1995)::Action|Adventure|Thriller\n',
 '11::American President, The (1995)::Comedy|Drama|Romance\n',
 '12::Dracula: Dead and Loving It (1995)::Comedy|Horror\n',
 "13::Balto (1995)::Animation|Children's\n",
 '14::Nixon (1995)::Drama\n',
 '15::Cutthroat Island (1995)::Action|Adventure|Romance\n',
 '16::Casino (1995)::Drama|Thriller\n',
 '17::Sense and Sensibility (1995)::Drama|Romance\n',
 '18::Four Rooms (1995)::Thriller\n',
 '19::Ace Ventura: When Nature Calls (1995)::Comedy\n',
 '20::Money Train (1995)::Action\n',
 '21::Get Shorty (1995)::Action|C