# Agenda

- Scoping (LEGB rule)
    - local vs. global variables
    - builtins
- Inner functions + closures
- storing functions
- comprehensions
    - list, set, dict
- passing functions as arguments
- `lambda`

# Scoping

What variable exists when? What value is available when?

Python's scoping rules are *very* straightforward. You just have to follow them to understand what's happening with variables. **BUT** the rules are very different from other languages.

In [1]:
x = 100

print(f'x = {x}')  # Python asks: Is x global?  Yes, we get 100

x = 100


# Python has four scopes

- `L` Local -- we start here when we're inside of a function body
- `E` Enclosing
- `G` Global -- we start here when we're *outside* of a function body
- `B` Builtin
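
The lookup order can be sketched in one short program. The names below (`label`, `outer`, `inner`, `inner_no_local`, `uses_global`, `uses_builtin`) are made up for illustration and are not part of the session:

```python
label = 'global'              # G: defined at module level

def outer():
    label = 'enclosing'       # local to outer, which makes it "enclosing" for the inner functions

    def inner():
        label = 'local'       # L: local to inner
        return label

    def inner_no_local():
        return label          # E: not local here, so Python finds it in the enclosing function

    return inner(), inner_no_local()

def uses_global():
    return label              # not local, no enclosing function, so Python finds the global

def uses_builtin():
    return len('abc')         # B: len is neither local nor global, so Python finds it in builtins

results = (outer(), uses_global(), uses_builtin())
print(results)
```

Each function stops at the first scope in which the name is found, which is why the same name `label` can yield three different values.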

In [2]:
globals()   # returns a dict of all global variables -- variable names are keys, variable values are values

{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '_ih': ['',
  "x = 100\n\nprint(f'x = {x}')",
  'globals()   # returns a dict of all global variables -- variable names are keys, variable values are values'],
 '_oh': {},
 '_dh': ['/Users/reuven/Courses/Current/Cisco-2021-11Nov-advanced'],
 'In': ['',
  "x = 100\n\nprint(f'x = {x}')",
  'globals()   # returns a dict of all global variables -- variable names are keys, variable values are values'],
 'Out': {},
 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x112a0b2b0>>,
 'exit': <IPython.core.autocall.ZMQExitAutocall at 0x112a0be20>,
 'quit': <IPython.core.autocall.ZMQExitAutocall at 0x112a0be20>,
 '_': '',
 '__': '',
 '___': '',
 '_i': "x = 100\n\nprint(f'x = {

In [3]:
# is x a global? We can find out!

'x' in globals()

True

In [4]:
globals()['x']

100

In [5]:
# Local, (Enclosing), Global, Builtin

x = 100

def myfunc():
    print(f'In myfunc, x = {x}') # Is x local? No.  Is x global? Yes, 100

print(f'Before, x = {x}')  # Python asks: Is x global? Yes, 100
myfunc()
print(f'After, x = {x}')   # Python asks: Is x global? Yes, 100

Before, x = 100
In myfunc, x = 100
After, x = 100


In [6]:
# to see a function's local variables, check __code__.co_varnames

myfunc.__code__.co_varnames

()

In [7]:
# Local, (Enclosing), Global, Builtin

x = 100

def myfunc():
    x = 200
    print(f'In myfunc, x = {x}') # is x local? Yes, 200

print(f'Before, x = {x}')  # is x global? Yes, 100
myfunc()
print(f'After, x = {x}')   # is x global? Yes, 100

Before, x = 100
In myfunc, x = 200
After, x = 100


In [8]:
# Local, (Enclosing), Global, Builtin

x = 100

def myfunc():
    print(f'In myfunc, x = {x}') # is x local? Yes. Value is ........ boom!
    x = 200  # if you assign to a variable anywhere in a function, that variable is local for the whole function body, NO MATTER WHERE YOU ASSIGN

print(f'Before, x = {x}')  # is x global? yes -- 100
myfunc()
print(f'After, x = {x}')   

Before, x = 100


UnboundLocalError: local variable 'x' referenced before assignment

In [9]:
myfunc.__code__.co_varnames

('x',)

In [11]:
# Local, (Enclosing), Global, Builtin

x = 100

def myfunc():
    x = x + 1
    print(f'In myfunc, x = {x}') 

print(f'Before, x = {x}')
myfunc()
print(f'After, x = {x}')   

Before, x = 100


UnboundLocalError: local variable 'x' referenced before assignment
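
One way around this error that does not need `global` is to pass the value in and return the result. This is only a sketch; `broken` and `increment` are hypothetical names, and the exact wording of the error message varies between Python versions:

```python
x = 100

def broken():
    x = x + 1          # the assignment makes x local, so the read on the right-hand side fails
    return x

def increment(value):
    return value + 1   # works only with its parameter; no global is touched

try:
    broken()
except UnboundLocalError as e:
    error_name = type(e).__name__
    print(f'{error_name}: {e}')

x = increment(x)       # the global is rebound at module level, outside of any function
print(x)
```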

In [16]:
# Local, (Enclosing), Global, Builtin

x = 100

def myfunc():
    global x   # when compiling the function, *DON'T* mark x as local!
    x = 200    # assign to the global variable x! (if there is no global x, this creates one)
    print(f'In myfunc, x = {x}')  # is x local? No. Is x global? yes -- 200

print(f'Before, x = {x}')  
myfunc()
print(f'After, x = {x}')   # is x global? yes, 200

Before, x = 100
In myfunc, x = 200
After, x = 200


In [15]:
myfunc.__code__.co_varnames

()

In [17]:
for i in range(5):
    n = i**2

In [18]:
# does n still exist?
n

16

In [19]:
# does i still exist?
i

4
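
A `for` loop does not create a new scope, which is why `n` and `i` are still around. A comprehension, by contrast, keeps its iteration variable to itself. A small sketch, using the made-up names `loop_var`, `squared`, and `comp_var`:

```python
for loop_var in range(5):
    squared = loop_var ** 2

# Both names survive the loop, holding their final values
after_loop = (loop_var, squared)

squares = [comp_var ** 2 for comp_var in range(5)]

# comp_var existed only inside the comprehension
comp_var_leaked = 'comp_var' in globals()
print(after_loop, squares, comp_var_leaked)
```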

In [20]:
# Local, (Enclosing), Global, Builtin

y = [10, 20, 30]

def myfunc():
    y[0] = '!'  # is y local? no. is y global? yes.  then it runs y.__setitem__(0, '!')
    print(f'In myfunc, {y=}')  # set y[0] to '!', which is true for y, the global variable

print(f'Before, {y=}')    # is y global? Yes, [10, 20, 30]
myfunc()
print(f'After, {y=}')     # is y global? Yes, ['!', 20, 30]

Before, y=[10, 20, 30]
In myfunc, y=['!', 20, 30]
After, y=['!', 20, 30]


In [21]:
myfunc.__code__.co_varnames

()

# Does Python have keywords?

Yes. `def`, `for`, `while`, `class`, `and`, `or`, `in`.... all of these are keywords. You cannot assign to them. You cannot change them.  They're part of the language.

But what about some other words, like `str`, `len`, `sum`?  Are those keywords? **NO**.  Can I assign to them?  Unfortunately, YES.  Those words exist in a final, default namespace known as "builtins," available through the `builtins` module (and, in the global namespace, as `__builtins__`).

In [22]:
sum([10, 20, 30])  # is sum global? No. Is it in builtins? Yes.

60

In [24]:
sum = 5  # defined a global variable "sum"

In [25]:
sum([10, 20, 30])  # is sum global? Yes!  Its value is 5.

TypeError: 'int' object is not callable

In [26]:
# how can I get out of this?
del(sum)   # looks scary, but I'm really only deleting the name in the global namespace

In [27]:
sum([10, 20, 30])

60
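
The difference between keywords and builtins can also be checked from code. The sketch below uses the standard `keyword` and `builtins` modules; `shadowed_total` is a hypothetical function that shadows `sum` locally instead of globally:

```python
import builtins
import keyword

def shadowed_total(numbers):
    sum = 5                          # a local name that shadows the builtin
    return builtins.sum(numbers)     # the real function is still reachable via the module

is_def_keyword = keyword.iskeyword('def')    # def is part of the language
is_sum_keyword = keyword.iskeyword('sum')    # sum is just a name in builtins
total = shadowed_total([10, 20, 30])
print(is_def_keyword, is_sum_keyword, total)
```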

In [28]:
x = 5

# Summary of what we know about functions

1. Functions are objects, just like everything else in Python.
2. We can create a function inside of another function.  
3. We can return a function from another function -- because we can return any object from a function.
4. When we use `def`, we (a) define a new function object and (b) assign it to a variable.

The result of this combination of rules leads us to... inner functions!

In [29]:
def outer():
    def inner():  # "inner" is a local variable inside of "outer"
        print('Hello from inner!')
    return inner

outer.__code__.co_varnames

('inner',)

In [30]:
f = outer()

In [31]:
type(f)

function

In [32]:
f

<function __main__.outer.<locals>.inner()>

In [33]:
# how do I run a function? ()
f()

Hello from inner!


In [37]:
# Local, Enclosing, Global, Builtin
# Enclosing means: Local variables from the enclosing function are still available to our inner function

def outer(x):
    def inner(y): 
        return f'Hello from inner, {x=} and {y=}!'
    return inner

f = outer(10)

In [38]:
outer.__code__.co_varnames

('x', 'inner')

In [39]:
f(20)

'Hello from inner, x=10 and y=20!'

In [41]:
# Local, Enclosing, Global, Builtin
# Enclosing means: Local variables from the enclosing function are still available to our inner function

def outer(x):   # inner becomes a closure -- a function that is returned by another function, and has access to the outer function's variables
    def inner(y): 
        return f'Hello from inner, {x=} and {y=}!'
    return inner

# Make two separate instances of "inner", each with its own separate enclosing scope with x= something else
f1 = outer(10)
f2 = outer(15)

In [42]:
f1(3)

'Hello from inner, x=10 and y=3!'

In [43]:
f2(3)

'Hello from inner, x=15 and y=3!'

# Exercise: Password maker maker

1. Write a function, `make_password_maker`, which takes a string argument.  That string contains all of the characters from which we might want to make a password.
2. `make_password_maker`, when invoked with a string, returns a function.  The returned function will take a single integer argument.
3. When called, the returned (inner) function returns a string with a password randomly selected from the characters in our string (i.e., outer function's parameter).

```python
make_alpha_password = make_password_maker('abcdefghij')
print(make_alpha_password(5))  # get a 5-character password, randomly taken from a-j
print(make_alpha_password(20)) # get a 20-character password, randomly taken from a-j

make_symbol_password = make_password_maker('!@#$%^&*()')
print(make_symbol_password(5))  # get a 5-character password
print(make_symbol_password(20)) # get a 20-character password
```

To get a random character from a string (or any Python sequence), use `random.choice(s)`, which returns one element.

In [44]:
import random

def make_password_maker(s):
    def make_password(n):
        output = ''
        for i in range(n):
            output += random.choice(s)
        return output
    return make_password

In [45]:
make_alpha_password = make_password_maker('abcdefghij')
print(make_alpha_password(5))  # get a 5-character password, randomly taken from a-j
print(make_alpha_password(20)) # get a 20-character password, randomly taken from a-j

make_symbol_password = make_password_maker('!@#$%^&*()')
print(make_symbol_password(5))  # get a 5-character password
print(make_symbol_password(20)) # get a 20-character password


ejjeb
igiiicijegaadifjhgbe
)@#%#
^^&(%@*@^$*#@*(*)*!(
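
A shorter variant of the same closure, assuming `random.choices` (available since Python 3.6), which draws several elements at once:

```python
import random

def make_password_maker(s):
    def make_password(n):
        # random.choices returns a list of n elements, drawn with replacement
        return ''.join(random.choices(s, k=n))
    return make_password

make_alpha_password = make_password_maker('abcdefghij')
password = make_alpha_password(8)
print(password)
```

The `random` module is fine for an exercise, but it is not designed for security. For real passwords, the standard library's `secrets` module is the safer choice.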


In [51]:
def greet():
    counter = 0
    def inner(name):
        nonlocal counter  # don't make counter local. Rather, access (and update) the local variable in the enclosing scope
        counter += 1
        return f'{counter} Hello, {name}!'
    return inner

hello = greet()

print(hello('a'))
print(hello('b'))

1 Hello, a!
2 Hello, b!


In [52]:
hello('c')

'3 Hello, c!'

In [53]:
hello('d')

'4 Hello, d!'

In [54]:
def greet(s):
    counter = 0
    def inner(name):
        nonlocal counter  
        counter += 1
        return f'{counter} {s}, {name}!'
    return inner

In [55]:
hello = greet('hello')
goodbye = greet('goodbye')

In [56]:
hello('out there')

'1 hello, out there!'

In [57]:
hello('again')


'2 hello, again!'

In [58]:
hello('I am probably annoying you with all of these hellos!')

'3 hello, I am probably annoying you with all of these hellos!!'

In [59]:
goodbye('whoever')

'1 goodbye, whoever!'

In [60]:
def hi(name):
    return f'Hi, {name}!'

In [62]:
# each time I run a function, I get a new frame / stack frame with local variables
# normally, the end of a function's run is also the end of the frame -- the local storage goes away
# but because our inner function refers to the outer function's local variables, those variables are kept alive after the outer function returns

In [63]:
hello('a')

'4 hello, a!'

In [64]:
goodbye('b')

'2 goodbye, b!'

In [66]:
hello.__code__.co_freevars  # these are the two variables we're going to use from the enclosing function

('counter', 's')

In [67]:
greet.__code__.co_cellvars  # these are the variables that our inner function will refer to 

('counter', 's')
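
The captured values themselves can be inspected through the function's `__closure__` attribute, a tuple of cell objects in the same order as `co_freevars`. A sketch that repeats the `greet` definition so it runs on its own (`captured` is a made-up name):

```python
def greet(s):
    counter = 0
    def inner(name):
        nonlocal counter
        counter += 1
        return f'{counter} {s}, {name}!'
    return inner

hello = greet('hello')
hello('a')
hello('b')

# Pair each free variable's name with the current contents of its cell
captured = {name: cell.cell_contents
            for name, cell in zip(hello.__code__.co_freevars, hello.__closure__)}
print(captured)
```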

In [68]:
def a():
    return 'Hello from A!'

def b():
    return 'Hello from B!'

while True:
    s = input('Enter a choice: ').strip()
    
    if not s:
        break
        
    if s == 'a':
        print(a())
    elif s == 'b':
        print(b())
    else:
        print(f'No such choice {s}')

Enter a choice:  a


Hello from A!


Enter a choice:  b


Hello from B!


Enter a choice:  c


No such choice c


Enter a choice:  


In [69]:
def a():
    return 'Hello from A!'

def b():
    return 'Hello from B!'

# dispatch table -- choose a function from a dict

# store the functions as values in a dict
funcs = {'a':a,  # keys are strings, values are funcs
         'b':b} 

while True:
    s = input('Enter a choice: ').strip()
    
    if not s:
        break
        
    if s in funcs:  # is s a key in our dict?
        print(funcs[s]())  # retrieve the function funcs[s], then execute it with ()
    else:
        print(f'No such choice {s}')

Enter a choice:  a


Hello from A!


Enter a choice:  b


Hello from B!


Enter a choice:  c


No such choice c


Enter a choice:  


# Exercise: Calculator

1. Define two functions, `add` and `sub`, which can add and subtract two numbers.
2. Put them in a dictionary as values.
3. Ask the user, repeatedly, to enter a math expression with either `+` or `-`.
4. Choose the appropriate function, and print the result.
5. If the user tries to enter an illegal expression or use an operator we don't support, scold them a bit.

Example:

    Enter expression: 2 + 3
    2 + 3 = 5
    Enter expression: 10 - 8
    10 - 8 = 2
    Enter expression: [ENTER]
    

In [70]:
def add(first, second):
    return first + second

def sub(first, second):
    return first - second

funcs = {'+':add,
         '-':sub}

while s := input('Enter expression: ').strip():
    n1, op, n2 = s.split()
    
    n1 = int(n1)
    n2 = int(n2)
    
    if op in funcs:
        result = funcs[op](n1, n2)
        
    else:
        result = f'No such operator {op}'
        
    print(f'{n1} {op} {n2} = {result}')
    

Enter expression:  2 + 3


2 + 3 = 5


Enter expression:  10 - 8


10 - 8 = 2


Enter expression:  10 * 5


10 * 5 = No such operator *


Enter expression:  


In [71]:
import operator  # module containing functions that do Python's operator things

funcs = {'+':operator.add,
         '-':operator.sub,
        '*':operator.mul,
        '/':operator.truediv}

while s := input('Enter expression: ').strip():
    n1, op, n2 = s.split()
    
    n1 = int(n1)
    n2 = int(n2)
    
    if op in funcs:
        result = funcs[op](n1, n2)
        
    else:
        result = f'No such operator {op}'
        
    print(f'{n1} {op} {n2} = {result}')
    

Enter expression:  2 + 500


2 + 500 = 502


Enter expression:  2 * 2345


2 * 2345 = 4690


Enter expression:  
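
The loops above stop with an exception if the user enters something other than three fields or a non-integer, and they print an odd line for an unknown operator. One possible way to handle those cases is sketched below. The function `calculate` and its messages are hypothetical, and the input loop is replaced by a fixed list so the example runs without a user:

```python
import operator

funcs = {'+': operator.add,
         '-': operator.sub,
         '*': operator.mul,
         '/': operator.truediv}

def calculate(expression):
    """Return a result line, or a complaint, for one 'number operator number' string."""
    fields = expression.split()
    if len(fields) != 3:
        return f'Expected "number operator number", got {expression!r}'

    n1, op, n2 = fields
    if op not in funcs:
        return f'No such operator {op}'

    try:
        result = funcs[op](int(n1), int(n2))
    except ValueError:            # int() could not parse one of the fields
        return f'Not integers: {n1!r}, {n2!r}'
    except ZeroDivisionError:     # raised by operator.truediv
        return 'Cannot divide by zero'

    return f'{n1} {op} {n2} = {result}'

for expression in ['2 + 3', '10 * 5', '10 % 3', 'two + 3', '1 / 0', '2 +']:
    print(calculate(expression))
```

Checking the operator before converting the numbers keeps each complaint specific to what went wrong.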


# Functional programming

1. Comprehensions
    - List comprehensions
    - Set comprehensions
    - Dict comprehensions
    - Nested comprehensions
2. Functions as arguments to other functions
    - Sorting
    - What kinds of functions can we use?
    - Writing functions that take other functions
3. Other functional techniques
    - `lambda`
    - `map` and `filter`
    - `reduce`
    

In [73]:
numbers = range(10)

# I want a list with each element from "numbers" squared (i.e., to the 2nd power)

output = []

for one_number in numbers:
    output.append(one_number ** 2)
    
output      # unfortunately, this works!

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [75]:
# list comprehension -- achieves the same goals

numbers = range(10)

# this is a list comprehension! -- square brackets mean a list
# the output list contains the result of running our expression on each element of numbers
output = [one_number ** 2 for one_number in numbers]

output

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# When should I use a comprehension?

- You have an input sequence (or any other iterable, such as a file)
- You want to get a list back, based on that input
- You can express the difference between the input and output as a single Python expression

In [77]:
[one_number ** 2             # expression -- SELECT
 for one_number in numbers]  # iteration  -- FROM

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [78]:
# never use print inside of a comprehension
# the point of a comprehension is the return value from our expression

[print(one_number)
  for one_number in numbers]

0
1
2
3
4
5
6
7
8
9


[None, None, None, None, None, None, None, None, None, None]

In [79]:
mylist = ['a', 'bc', 'def', 'g']

'*'.join(mylist)    # glue.join(string_sequence)

'a*bc*def*g'

In [80]:
mylist = [10, 20, 30]

'*'.join(mylist)

TypeError: sequence item 0: expected str instance, int found

In [81]:
# I have a list of ints
# I want a list of strings
# I can convert int->string with "str"

[str(one_item)
 for one_item in mylist]

['10', '20', '30']

In [83]:
# create a new list with the comprehension,
# and pass that new list (of strings) to '*'.join

'*'.join([str(one_item)
          for one_item in mylist])          

'10*20*30'

In [85]:
words = 'this is a bunch of words'


In [86]:
# the method str.capitalize returns a string with all lowercase, except for the first character,
# which is capitalized
words.capitalize()

'This is a bunch of words'

In [87]:
# the method str.title returns a string with each word's first character capitalized,
# and the rest lowercase
words.title()

'This Is A Bunch Of Words'

In [90]:
# I'm going to pretend that str.title doesn't exist.
# can I use str.capitalize instead?

# Input is a list of strings (words.split())
# Output is a single string
# Convert from a string to a capitalized string with str.capitalize... then use join to combine them

' '.join([one_word.capitalize()
  for one_word in words.split()])

'This Is A Bunch Of Words'

# Exercises with comprehensions

1. Ask the user to enter some integers, separated by spaces. Add them together, and print the result.
2. Ask the user to enter a sentence. Count the number of non-whitespace characters in the sentence. (Don't use `str.replace`)

In [91]:
s = input('Enter some integers: ').strip()

sum(s.split())  # sum starts with 0, and you cannot add a string to an int

Enter some integers:  10 20 30


TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [95]:
# I have a list of strings (s.split())
# I want a list of integers (to feed to sum)
# I can move from string -> int by running int()

s = input('Enter some integers: ').strip()

sum([int(one_item)
 for one_item in s.split()])

Enter some integers:  10 15 20 35


80

In [96]:
# enter a sentence, and count the non-whitespace characters

s = input('Enter a sentence: ').strip()

print(len(s))

Enter a sentence:  this is a test


14


In [98]:
# I have a list of strings
# I want a list of integers, so that I can run sum() on it
# I can run len() on each item

s = input('Enter a sentence: ').strip()

sum([len(one_word)
 for one_word in s.split()])

Enter a sentence:  this is a test


11
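
Another way to get the same count is to iterate over the characters rather than the words, filtering with `str.isspace`. A sketch with a fixed sentence in place of `input`:

```python
sentence = 'this is a test'

# Keep one element per non-whitespace character, then count the elements
count = len([one_char
             for one_char in sentence
             if not one_char.isspace()])
print(count)
```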

In [100]:
# this returns a list of strings, with each string being a line in the file
[one_line
  for one_line in open('/etc/passwd')]

['##\n',
 '# User Database\n',
 '# \n',
 '# Note that this file is consulted directly only when the system is running\n',
 '# in single-user mode.  At other times this information is provided by\n',
 '# Open Directory.\n',
 '#\n',
 '# See the opendirectoryd(8) man page for additional information about\n',
 '# Open Directory.\n',
 '##\n',
 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n',
 'root:*:0:0:System Administrator:/var/root:/bin/sh\n',
 'daemon:*:1:1:System Services:/var/root:/usr/bin/false\n',
 '_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico\n',
 '_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false\n',
 '_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false\n',
 '_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false\n',
 '_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false\n',
 '_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false\n',
 '_scsd:*:31:31:Service Configuration Servi

In [103]:
# list of strings -- the usernames in /etc/passwd

# when I use "if" in a list comprehension, the output might be shorter than the input

[one_line.split(':')[0]                 # expression -- SELECT
  for one_line in open('/etc/passwd')   # iteration  -- FROM
  if not one_line.startswith('#')]      # condition  -- WHERE

['nobody',
 'root',
 'daemon',
 '_uucp',
 '_taskgated',
 '_networkd',
 '_installassistant',
 '_lp',
 '_postfix',
 '_scsd',
 '_ces',
 '_appstore',
 '_mcxalr',
 '_appleevents',
 '_geod',
 '_devdocs',
 '_sandbox',
 '_mdnsresponder',
 '_ard',
 '_www',
 '_eppc',
 '_cvs',
 '_svn',
 '_mysql',
 '_sshd',
 '_qtss',
 '_cyrus',
 '_mailman',
 '_appserver',
 '_clamav',
 '_amavisd',
 '_jabber',
 '_appowner',
 '_windowserver',
 '_spotlight',
 '_tokend',
 '_securityagent',
 '_calendar',
 '_teamsserver',
 '_update_sharing',
 '_installer',
 '_atsserver',
 '_ftp',
 '_unknown',
 '_softwareupdate',
 '_coreaudiod',
 '_screensaver',
 '_locationd',
 '_trustevaluationagent',
 '_timezone',
 '_lda',
 '_cvmsroot',
 '_usbmuxd',
 '_dovecot',
 '_dpaudio',
 '_postgres',
 '_krbtgt',
 '_kadmin_admin',
 '_kadmin_changepw',
 '_devicemgr',
 '_webauthserver',
 '_netbios',
 '_warmd',
 '_dovenull',
 '_netstatistics',
 '_avbdeviced',
 '_krb_krbtgt',
 '_krb_kadmin',
 '_krb_changepw',
 '_krb_kerberos',
 '_krb_anonymous',
 '_asse

In [104]:
!cat nums.txt

5
	10     
	20
  	3
		   	20        

 25


# Exercise: Sum the numbers

Use a list comprehension to sum the numbers in `nums.txt`.
- You should read the file line by line (not all at once)
- Only one line lacks any integer
- No line contains more than one integer

In [105]:
[one_line
 for one_line in open('nums.txt')]

['5\n',
 '\t10     \n',
 '\t20\n',
 '  \t3\n',
 '\t\t   \t20        \n',
 '\n',
 ' 25\n']

In [106]:
[int(one_line)
 for one_line in open('nums.txt')]

ValueError: invalid literal for int() with base 10: '\n'

In [107]:
int('5')

5

In [108]:
int('    5    ')

5

In [109]:
int('\n\n5\n\n')

5

In [110]:
int()

0

In [111]:
int(' ')

ValueError: invalid literal for int() with base 10: ' '

In [112]:
[int(one_line)
 for one_line in open('nums.txt')
 if one_line.strip()]  # only calculate the int if we have something left after removing whitespace

[5, 10, 20, 3, 20, 25]

In [113]:
[int(one_line)
 for one_line in open('nums.txt')
 if one_line.strip().isdigit()]  # only calculate the int if we have digits left after removing whitespace

[5, 10, 20, 3, 20, 25]

In [114]:
sum([int(one_line)
 for one_line in open('nums.txt')
 if one_line.strip().isdigit()])

83
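
The cells above call `open` inside the comprehension and never close the file explicitly. In a longer program, a `with` block makes the closing explicit. The sketch below writes a temporary stand-in for `nums.txt` (with the contents shown earlier) so that it can run anywhere; `filename` and `total` are made-up names:

```python
import os
import tempfile

# Hypothetical stand-in for nums.txt, so the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('5\n\t10     \n\t20\n  \t3\n\t\t   \t20        \n\n 25\n')
    filename = tmp.name

with open(filename) as infile:          # the file is closed when this block ends
    total = sum([int(one_line)
                 for one_line in infile
                 if one_line.strip().isdigit()])

os.remove(filename)                     # clean up the temporary file
print(total)
```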

In [115]:
!head shoe-data.txt

Adidas	orange	43
Nike	black	41
Adidas	black	39
New Balance	pink	41
Nike	white	44
New Balance	orange	38
Nike	pink	44
Adidas	pink	44
New Balance	orange	39
New Balance	black	43


# Exercise: Shoes to dicts

1. I want to create a list of dicts, based on the file `shoe-data.txt`.
2. In the file, we have 100 lines, each of which contains three columns -- a brand, a color, and a size.  The columns are separated by tab characters (`'\t'`).
3. Use a list comprehension to turn each line into a dict with three keys -- `brand`, `color`, and `size`.  You can keep the size as a string, if you want; it won't make a difference.
4. I suggest writing a function that returns a dict, which you'll then call for each line in the file.

Note: We don't want one big dictionary. We want a list of 100 dicts, each of which has the same three keys.

In [120]:
def line_to_dict(one_line):
    fields = one_line.strip().split('\t')
    
    return {'brand':fields[0],
            'color':fields[1],
            'size':fields[2]}

[line_to_dict(one_line)
  for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [121]:
def line_to_dict(one_line):
    brand, color, size = one_line.strip().split('\t')
    
    return {'brand':brand,
            'color':color,
            'size':size}

[line_to_dict(one_line)
  for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '

In [122]:
# I can create a dict from a list of (key, value) tuples

dict([('a', 1), ('b', 2), ('c', 3)])

{'a': 1, 'b': 2, 'c': 3}

In [123]:
# one way to create a list of tuples is with "zip"

s = 'abc'
mylist = [1,2,3]

# zip produces tuples, one at a time (list() below collects them into a list)
# tuple 0 has index 0 for each input
# tuple 1 has index 1 for each input 
# ...
# tuple n has index n for each input

list(zip(s, mylist))

[('a', 1), ('b', 2), ('c', 3)]
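
Two details of `zip` are worth seeing in code: it stops at the end of its shortest input, and the object it returns can be consumed only once. A sketch with made-up names (`letters`, `numbers`, `pairs`):

```python
letters = 'abcd'
numbers = [1, 2, 3]

pairs = zip(letters, numbers)      # a zip object; no tuples have been produced yet
pairs_list = list(pairs)           # 'd' has no partner, so it is dropped
leftover = list(pairs)             # a second pass finds nothing left
print(pairs_list, leftover)
```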

In [124]:
def line_to_dict(one_line):
    return dict(zip(['brand', 'color', 'size'],
                    one_line.strip().split('\t')))
    
[line_to_dict(one_line)
  for one_line in open('shoe-data.txt')]

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 {'brand': '
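
For comparison, the standard library's `csv.DictReader` can produce the same list of dicts without a helper function. This is an alternative sketch, not the exercise's intended solution; the two-line `io.StringIO` object is a hypothetical stand-in for the start of `shoe-data.txt`:

```python
import csv
import io

# Hypothetical in-memory stand-in for the first two lines of shoe-data.txt
data = io.StringIO('Adidas\torange\t43\nNike\tblack\t41\n')

reader = csv.DictReader(data,
                        fieldnames=['brand', 'color', 'size'],  # the file has no header row
                        delimiter='\t')

shoes = [dict(one_row) for one_row in reader]
print(shoes)
```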

In [125]:
s = {10, 20, 30, 10, 20, 30, 20, 30, 10, 40, 20, 30}
s

{10, 20, 30, 40}

In [126]:
mylist = [10, 20, 30, 10, 20, 30, 40, 20, 30, 40]

# how can I create a set from mylist?

set(mylist)

{10, 20, 30, 40}

In [128]:
{*mylist}  # this creates a set based on the elements of mylist

{10, 20, 30, 40}

In [129]:
{mylist}

TypeError: unhashable type: 'list'

In [133]:
# sum the unique numbers that the user gave us

s = input('Enter integers, separated by spaces: ').strip()

sum(set([int(one_number)
 for one_number in s.split()]))

Enter integers, separated by spaces:  10 20 30 10 20 30


60

In [134]:
# we can create a *set comprehension*
# just like a list comprehension, except:
# (1) we use {}
# (2) all elements must be hashable
# (3) we return a set

s = input('Enter integers, separated by spaces: ').strip()

sum({int(one_number)
 for one_number in s.split()})

Enter integers, separated by spaces:  10 20 30 10 20 30


60

# Exercise: Unix shells

1. In the zipfile I gave you is the file `linux-etc-passwd.txt`. I want to get a set, showing the different shells that Unix users have on my system.
2. The shell is the final field of each record.


In [135]:
!head linux-etc-passwd.txt

# This is a comment
# You should ignore me
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin


In [138]:
[one_line 
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith('#') and not one_line.startswith('\n')]

['root:x:0:0:root:/root:/bin/bash\n',
 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n',
 'bin:x:2:2:bin:/bin:/usr/sbin/nologin\n',
 'sys:x:3:3:sys:/dev:/usr/sbin/nologin\n',
 'sync:x:4:65534:sync:/bin:/bin/sync\n',
 'games:x:5:60:games:/usr/games:/usr/sbin/nologin\n',
 'man:x:6:12:man:/var/cache/man:/usr/sbin/nologin\n',
 'lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\n',
 'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin\n',
 'news:x:9:9:news:/var/spool/news:/usr/sbin/nologin\n',
 'uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\n',
 'proxy:x:13:13:proxy:/bin:/usr/sbin/nologin\n',
 'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n',
 'backup:x:34:34:backup:/var/backups:/usr/sbin/nologin\n',
 'list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\n',
 'irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\n',
 'gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\n',
 'nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n',
 'syslog:x:101:

In [140]:
[one_line 
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith('#')   # two "if" lines are the same as saying "and" for the two conditions
 if not one_line.startswith('\n')]

['root:x:0:0:root:/root:/bin/bash\n',
 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n',
 'bin:x:2:2:bin:/bin:/usr/sbin/nologin\n',
 'sys:x:3:3:sys:/dev:/usr/sbin/nologin\n',
 'sync:x:4:65534:sync:/bin:/bin/sync\n',
 'games:x:5:60:games:/usr/games:/usr/sbin/nologin\n',
 'man:x:6:12:man:/var/cache/man:/usr/sbin/nologin\n',
 'lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\n',
 'mail:x:8:8:mail:/var/mail:/usr/sbin/nologin\n',
 'news:x:9:9:news:/var/spool/news:/usr/sbin/nologin\n',
 'uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\n',
 'proxy:x:13:13:proxy:/bin:/usr/sbin/nologin\n',
 'www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n',
 'backup:x:34:34:backup:/var/backups:/usr/sbin/nologin\n',
 'list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\n',
 'irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\n',
 'gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\n',
 'nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n',
 'syslog:x:101:

In [144]:
# all of the different shells in my etc-passwd file

{one_line.strip().split(':')[-1]
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith(('#', '\n'))}

{'/bin/bash',
 '/bin/false',
 '/bin/nologin',
 '/bin/sh',
 '/bin/sync',
 '/usr/sbin/nologin'}

In [147]:
# instead: how many times is each shell used?
# use Counter + a list comprehension (not a set!)

from collections import Counter

Counter([one_line.strip().split(':')[-1]
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith(('#', '\n'))])

Counter({'/bin/bash': 12,
         '/usr/sbin/nologin': 17,
         '/bin/sync': 1,
         '/bin/false': 15,
         '/bin/sh': 3,
         '/bin/nologin': 1})
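
Once the counts are in a `Counter`, its `most_common` method returns them ordered from largest to smallest. The sketch below copies the counts from the output above rather than reading the file:

```python
from collections import Counter

# Counts copied from the output above
shells = Counter({'/bin/bash': 12,
                  '/usr/sbin/nologin': 17,
                  '/bin/sync': 1,
                  '/bin/false': 15,
                  '/bin/sh': 3,
                  '/bin/nologin': 1})

top_two = shells.most_common(2)     # list of (element, count) tuples, largest count first
print(top_two)
```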

In [149]:
# dict comprehensions: Each dict comprehension creates *one* dictionary
# (1) Uses {}
# (2) Has : between key expression + value expression
# (3) the key must be hashable
# (4) output is one dict

words = 'this is a bunch of words'

{ one_word  :  len(one_word)
  for one_word in words.split() }

{'this': 4, 'is': 2, 'a': 1, 'bunch': 5, 'of': 2, 'words': 5}

In [153]:
# let's create a dict of our passwd file, with keys being usernames and values being user IDs (index 2)

{one_line.split(':')[0]  : one_line.split(':')[2]
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith(('#', '\n'))}

{'root': '0',
 'daemon': '1',
 'bin': '2',
 'sys': '3',
 'sync': '4',
 'games': '5',
 'man': '6',
 'lp': '7',
 'mail': '8',
 'news': '9',
 'uucp': '10',
 'proxy': '13',
 'www-data': '33',
 'backup': '34',
 'list': '38',
 'irc': '39',
 'gnats': '41',
 'nobody': '65534',
 'syslog': '101',
 'messagebus': '102',
 'landscape': '103',
 'jci': '955',
 'sshd': '104',
 'user': '1000',
 'reuven': '1001',
 'postfix': '105',
 'colord': '106',
 'postgres': '107',
 'dovecot': '108',
 'dovenull': '109',
 'postgrey': '110',
 'debian-spamd': '111',
 'memcache': '113',
 'genadi': '1002',
 'shira': '1003',
 'atara': '1004',
 'shikma': '1005',
 'amotz': '1006',
 'mysql': '114',
 'clamav': '115',
 'amavis': '116',
 'opendkim': '117',
 'gitlab-redis': '999',
 'gitlab-psql': '998',
 'git': '1007',
 'opendmarc': '118',
 'dkim-milter-python': '119',
 'deploy': '1008',
 'redis': '112'}

In [154]:
!ls *.txt

linux-etc-passwd.txt  myconfig.txt  outfile.txt
mini-access-log.txt   nums.txt	    shoe-data.txt


In [155]:
# how can I flip a dict? (keys -> values, and values -> keys)

d = {'a':1, 'b':2, 'c':3}

{one_value : one_key
  for one_key, one_value in d.items()}

{1: 'a', 2: 'b', 3: 'c'}

In [156]:
# what if values repeat? Each value can appear only once as a key, so the last key seen wins
d = {'a':1, 'b':2, 'c':3, 'd':3, 'e':2}

{one_value : one_key
  for one_key, one_value in d.items()}

{1: 'a', 2: 'e', 3: 'd'}
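
In the flipped dict above, `'b'` and `'c'` are gone: `2` and `3` each appeared twice as values, and a dict key can only exist once. If every original key needs to survive, one option (a sketch, not the only approach) is to collect the keys for each value in a list, here using `defaultdict` from the standard library.

```python
from collections import defaultdict

d = {'a': 1, 'b': 2, 'c': 3, 'd': 3, 'e': 2}

# Each original value maps to a list of all the keys that had it
flipped = defaultdict(list)
for one_key, one_value in d.items():
    flipped[one_value].append(one_key)

flipped = dict(flipped)   # back to a plain dict, for display
print(flipped)
```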

# Exercise: Count vowels dict

1. Write a function, `count_vowels`, which takes a string and returns the number of vowels (a, e, i, o, and u) in the string.
2. Ask the user to enter a sentence.
3. Use a dict comprehension and your function to return a dict in which the keys are the unique words, and the values are the vowel counts for each word.

In [157]:
def count_vowels(s):
    total = 0
    for one_letter in s:
        if one_letter in 'aeiou':
            total += 1
    return total

sentence = input('Enter a sentence: ').strip()

{  one_word   : count_vowels(one_word)
 for one_word in sentence.split()                            
}

Enter a sentence:  this is a fantastic and extremely ridiculous test of my functionality


{'this': 1,
 'is': 1,
 'a': 1,
 'fantastic': 3,
 'and': 1,
 'extremely': 3,
 'ridiculous': 5,
 'test': 1,
 'of': 1,
 'my': 0,
 'functionality': 5}

In [159]:
# same function, as a comprehension -- and case-insensitive, thanks to .lower()
def count_vowels(s):
    return len([1
                for one_letter in s
                if one_letter.lower() in 'aeiou'])

sentence = input('Enter a sentence: ').strip()

{  one_word   : count_vowels(one_word)
 for one_word in sentence.split()                            
}

Enter a sentence:  this is another attempt


{'this': 1, 'is': 1, 'another': 3, 'attempt': 2}
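
A possible variation on the same function: since `True` counts as 1 and `False` as 0, `sum` over a generator expression does the tally without building a list of 1s. The sentence is hard-coded here (instead of using `input`) only so that the sketch is repeatable.

```python
def count_vowels(s):
    # Each membership test yields True (1) or False (0); sum() adds them up
    return sum(one_letter.lower() in 'aeiou'
               for one_letter in s)

sentence = 'this is another attempt'   # fixed input, standing in for input()

vowel_counts = {one_word: count_vowels(one_word)
                for one_word in sentence.split()}
print(vowel_counts)
```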

# Next up

1. Nested comprehensions
2. Passing functions as arguments
3. `lambda` and friends
4. Modules 

In [161]:
# dict comprehension -- gets us usernames (index 0) and user IDs (index 2) from linux-etc-passwd.txt

{one_line.split(':')[0]  : one_line.split(':')[2]
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith(('#', '\n'))}

{'root': '0',
 'daemon': '1',
 'bin': '2',
 'sys': '3',
 'sync': '4',
 'games': '5',
 'man': '6',
 'lp': '7',
 'mail': '8',
 'news': '9',
 'uucp': '10',
 'proxy': '13',
 'www-data': '33',
 'backup': '34',
 'list': '38',
 'irc': '39',
 'gnats': '41',
 'nobody': '65534',
 'syslog': '101',
 'messagebus': '102',
 'landscape': '103',
 'jci': '955',
 'sshd': '104',
 'user': '1000',
 'reuven': '1001',
 'postfix': '105',
 'colord': '106',
 'postgres': '107',
 'dovecot': '108',
 'dovenull': '109',
 'postgrey': '110',
 'debian-spamd': '111',
 'memcache': '113',
 'genadi': '1002',
 'shira': '1003',
 'atara': '1004',
 'shikma': '1005',
 'amotz': '1006',
 'mysql': '114',
 'clamav': '115',
 'amavis': '116',
 'opendkim': '117',
 'gitlab-redis': '999',
 'gitlab-psql': '998',
 'git': '1007',
 'opendmarc': '118',
 'dkim-milter-python': '119',
 'deploy': '1008',
 'redis': '112'}

In [167]:
{fields[0]  : fields[2]
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith(('#', '\n'))
 if (fields := one_line.split(':'))}  # := assigns the split list to fields and also returns it; a non-empty list is truthy, so this "if" always passes

{'root': '0',
 'daemon': '1',
 'bin': '2',
 'sys': '3',
 'sync': '4',
 'games': '5',
 'man': '6',
 'lp': '7',
 'mail': '8',
 'news': '9',
 'uucp': '10',
 'proxy': '13',
 'www-data': '33',
 'backup': '34',
 'list': '38',
 'irc': '39',
 'gnats': '41',
 'nobody': '65534',
 'syslog': '101',
 'messagebus': '102',
 'landscape': '103',
 'jci': '955',
 'sshd': '104',
 'user': '1000',
 'reuven': '1001',
 'postfix': '105',
 'colord': '106',
 'postgres': '107',
 'dovecot': '108',
 'dovenull': '109',
 'postgrey': '110',
 'debian-spamd': '111',
 'memcache': '113',
 'genadi': '1002',
 'shira': '1003',
 'atara': '1004',
 'shikma': '1005',
 'amotz': '1006',
 'mysql': '114',
 'clamav': '115',
 'amavis': '116',
 'opendkim': '117',
 'gitlab-redis': '999',
 'gitlab-psql': '998',
 'git': '1007',
 'opendmarc': '118',
 'dkim-milter-python': '119',
 'deploy': '1008',
 'redis': '112'}

In [170]:
{fields[0]  : fields[2]
 for one_line in open('linux-etc-passwd.txt')
 if not one_line.startswith(('#', '\n'))
 if (username, passwd, id_number, *rest := one_line.split(':'))}  # fails: := can only assign to a single name, so it cannot unpack into several variables

SyntaxError: invalid syntax (1707263041.py, line 4)
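
The assignment expression (`:=`) only accepts a single name, which is why the unpacking attempt above is a syntax error. One workaround, sketched here with made-up lines (`sample_lines` is a hypothetical stand-in for the file), is an extra `for` clause over a one-element list: a `for` target *can* unpack, so the names become available to the key and value expressions.

```python
# Hypothetical sample lines standing in for linux-etc-passwd.txt
sample_lines = [
    '# a comment line\n',
    'root:x:0:0:root:/root:/bin/bash\n',
    '\n',
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n',
]

# The inner "for" runs exactly once per line, unpacking the split fields
uids = {username: id_number
        for one_line in sample_lines
        if not one_line.startswith(('#', '\n'))
        for username, passwd, id_number, *rest in [one_line.split(':')]}

print(uids)
```

Whether this reads better than `fields[0]` and `fields[2]` is a matter of taste; the named variables help most when several fields are used.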

In [171]:
mylist = [[10, 20, 25, 30], 
          [40, 50, 60, 70, 80],
          [90, 95, 100],
          [105, 110, 115, 120, 125, 130]]

mylist

[[10, 20, 25, 30],
 [40, 50, 60, 70, 80],
 [90, 95, 100],
 [105, 110, 115, 120, 125, 130]]

In [172]:
# how can I get a flattened list (all elements at the top-level list -- no sublists) back from mylist?

output = []

for one_sublist in mylist:
    for one_element in one_sublist:
        output.append(one_element)
        
output

[10, 20, 25, 30, 40, 50, 60, 70, 80, 90, 95, 100, 105, 110, 115, 120, 125, 130]

In [173]:
[one_sublist
 for one_sublist in mylist]

[[10, 20, 25, 30],
 [40, 50, 60, 70, 80],
 [90, 95, 100],
 [105, 110, 115, 120, 125, 130]]

In [177]:
# nested comprehension -- great for nested, iterable data structures when I want output
# for each inner element, not each outer element

[one_element
 for one_sublist in mylist
 for one_element in one_sublist]

[10, 20, 25, 30, 40, 50, 60, 70, 80, 90, 95, 100, 105, 110, 115, 120, 125, 130]

In [176]:
# this does the same thing, but is (in my opinion) unreadable
[one_element for one_sublist in mylist for one_element in one_sublist]

[10, 20, 25, 30, 40, 50, 60, 70, 80, 90, 95, 100, 105, 110, 115, 120, 125, 130]

In [178]:
[one_element
 for one_sublist in mylist
 if len(one_sublist) > 3
 for one_element in one_sublist]

[10, 20, 25, 30, 40, 50, 60, 70, 80, 105, 110, 115, 120, 125, 130]

In [179]:
# get me odd elements from the sublists of mylist, from sublists with at least 4 elements

[one_element
 for one_sublist in mylist
 if len(one_sublist) > 3
 for one_element in one_sublist
 if one_element % 2  ]  # meaning: odd items

[25, 105, 115, 125]
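
The clauses of a nested comprehension run in the order they are written, top to bottom, just like nested `for` and `if` statements. A sketch of the equivalent loops, to make that order explicit:

```python
mylist = [[10, 20, 25, 30],
          [40, 50, 60, 70, 80],
          [90, 95, 100],
          [105, 110, 115, 120, 125, 130]]

# Same clauses, same order, written as ordinary nested statements
output = []
for one_sublist in mylist:
    if len(one_sublist) > 3:
        for one_element in one_sublist:
            if one_element % 2:
                output.append(one_element)

from_comprehension = [one_element
                      for one_sublist in mylist
                      if len(one_sublist) > 3
                      for one_element in one_sublist
                      if one_element % 2]

print(output == from_comprehension)
```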

In [180]:
!head movies.dat

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller


# Exercise: Movie genres

Use a comprehension to read from `movies.dat`, and find the 5 most common genres across all movies.

Note: If a movie has multiple genres, then each of the genres should be counted separately.

In [185]:
from collections import Counter

c = Counter([one_field
 for one_line in open('movies.dat')
 for one_field in one_line.split('::')[-1].strip().split('|')])

c

Counter({'Animation': 105,
         "Children's": 251,
         'Comedy': 1200,
         'Adventure': 283,
         'Fantasy': 68,
         'Romance': 471,
         'Drama': 1603,
         'Action': 503,
         'Crime': 211,
         'Thriller': 492,
         'Horror': 343,
         'Sci-Fi': 276,
         'Documentary': 127,
         'War': 143,
         'Musical': 114,
         'Mystery': 106,
         'Film-Noir': 44,
         'Western': 68})

In [186]:
c.most_common()

[('Drama', 1603),
 ('Comedy', 1200),
 ('Action', 503),
 ('Thriller', 492),
 ('Romance', 471),
 ('Horror', 343),
 ('Adventure', 283),
 ('Sci-Fi', 276),
 ("Children's", 251),
 ('Crime', 211),
 ('War', 143),
 ('Documentary', 127),
 ('Musical', 114),
 ('Mystery', 106),
 ('Animation', 105),
 ('Fantasy', 68),
 ('Western', 68),
 ('Film-Noir', 44)]

In [187]:
c.most_common(5)

[('Drama', 1603),
 ('Comedy', 1200),
 ('Action', 503),
 ('Thriller', 492),
 ('Romance', 471)]

In [195]:
c = Counter([one_field
 for one_line in open('movies.dat')
 for one_field in one_line.split('::')[-1].strip().split('|')])

for key, value in c.items():
    print(f'{key:.<14}{value:.>6}')

Animation........105
Children's.......251
Comedy..........1200
Adventure........283
Fantasy...........68
Romance..........471
Drama...........1603
Action...........503
Crime............211
Thriller.........492
Horror...........343
Sci-Fi...........276
Documentary......127
War..............143
Musical..........114
Mystery..........106
Film-Noir.........44
Western...........68


In [199]:
c = Counter([one_field
 for one_line in open('movies.dat')
 for one_field in one_line.split('::')[-1].strip().split('|')])

for key, value in c.items():
    print(f'{key:.<14}{(value // 20) * "x"}')

Animation.....xxxxx
Children's....xxxxxxxxxxxx
Comedy........xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Adventure.....xxxxxxxxxxxxxx
Fantasy.......xxx
Romance.......xxxxxxxxxxxxxxxxxxxxxxx
Drama.........xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Action........xxxxxxxxxxxxxxxxxxxxxxxxx
Crime.........xxxxxxxxxx
Thriller......xxxxxxxxxxxxxxxxxxxxxxxx
Horror........xxxxxxxxxxxxxxxxx
Sci-Fi........xxxxxxxxxxxxx
Documentary...xxxxxx
War...........xxxxxxx
Musical.......xxxxx
Mystery.......xxxxx
Film-Noir.....xx
Western.......xxx


# Functions as arguments

Functions are objects, and any object can be passed as an argument to another function. So we can design our functions to accept other functions as arguments, letting the caller determine how we proceed.
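
A minimal sketch of the idea: `apply_to_each` is a hypothetical function, invented for this illustration, that takes another function as its first argument. Note that we pass `len` and `str.upper` *without* parentheses: we're handing over the function object, not calling it.

```python
def apply_to_each(func, items):
    """Call func on every item, returning a list of the results."""
    return [func(one_item) for one_item in items]

words = 'This is a bunch of words'.split()

lengths = apply_to_each(len, words)            # the caller chooses: lengths
uppercased = apply_to_each(str.upper, words)   # the caller chooses: uppercase

print(lengths)
print(uppercased)
```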

In [200]:
words = 'This is a bunch of words for my Python course with Cisco'.split()
words

['This',
 'is',
 'a',
 'bunch',
 'of',
 'words',
 'for',
 'my',
 'Python',
 'course',
 'with',
 'Cisco']

In [202]:
# one way to sort is with list.sort
# avoid it when you can -- it sorts in place, changing the list itself (not very functional)

words.sort()  
words

['Cisco',
 'Python',
 'This',
 'a',
 'bunch',
 'course',
 'for',
 'is',
 'my',
 'of',
 'with',
 'words']

In [203]:
# rather, try to use "sorted" -- a builtin function that works with *any* iterable,
# and returns a new list of sorted items (doesn't change the original list)

words = 'This is a bunch of words for my Python course with Cisco'.split()
sorted(words)

['Cisco',
 'Python',
 'This',
 'a',
 'bunch',
 'course',
 'for',
 'is',
 'my',
 'of',
 'with',
 'words']
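
A short sketch of the two claims in the comment above: `sorted` leaves its argument alone, and it accepts any iterable, not just lists. It also shows that `list.sort` returns `None`, which is a common source of bugs.

```python
words = 'This is a bunch of words'.split()

result = sorted(words)
print(words)     # original order is untouched
print(result)    # capitalized words sort before lowercase ones

# sorted works with any iterable, and always returns a list
from_string = sorted('hello')
from_set = sorted({30, 10, 20})
from_dict = sorted({'b': 2, 'a': 1})   # iterating over a dict gives its keys

# list.sort changes the list in place and returns None
sort_return = words.copy().sort()
```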