# Agenda

1. Data structures (Monday)
    - Built-in data structures (behind the scenes)
    - Advanced data structures (`namedtuple`, `Counter`, etc. from the `collections` module)
2. Functions (Monday + Tuesday)
    - Function objects
    - Parameters
    - Mapping arguments to parameters
    - LEGB rule for variable lookup + scoping
    - Inner functions + closures
    - Storing functions as objects
3. Functional programming (Tuesday)
    - Comprehensions
    - Passing functions as arguments to other functions
    - `lambda` and similar functional-programming systems
4. Modules + packages (Tuesday + Wednesday)
    - How modules work
    - Packages vs. modules
    - PyPI
5. Objects (Wednesday)
    - What are objects?
    - Classes, methods, instances
    - Inheritance
    - Attributes -- one of the most important things you can learn in Python!
    - ICPO rule for attribute lookup
    - Magic methods
    - Properties
    - Descriptors    
6. Iterators + generators (Thursday)
    - Iterator protocol
    - Adding iteration to a class
    - Generator functions
    - Generator comprehensions
7. Decorators (Thursday)
    - What are they?
    - Writing decorators 
8. Concurrency (threads + processes) (Thursday)
    - Multithreading in Python
    - Multiprocessing 

In [1]:
import sys
sys.version

'3.10.0 (default, Oct 13 2021, 06:45:00) [Clang 13.0.0 (clang-1300.0.29.3)]'

In [4]:
name = 'Reuven'
print(naem)

NameError: name 'naem' is not defined

# Data structures

In [5]:
x = None

In [7]:
print(x)

None


In [8]:
x

In [9]:
type(None)

NoneType

In [10]:
x = None
y = None
z = None

In [12]:
# is x the same as None?

x == None    # not Pythonic!

True

In [13]:
# None is a singleton.  Every None is the same None.

In [14]:
x = None
y = None

x is y   # this asks: are x and y both referring to precisely the same object?

True

In [15]:
# really, "is" is checking whether both names refer to the very same object
# every object has an id number in Python, so "x is y" is like comparing their ids

id(x) == id(y)    

True

In [16]:
id(x)

4475355152

In [17]:
id(y)

4475355152

In [18]:
# in CPython, the ID of an object is... its location in memory.

In [19]:
hex(id(y))

'0x10ac08010'

In [20]:
x = 100
y = 100

x == y

True

In [21]:
x is y  # are these the same object?

True

In [22]:
x = 10000
y = 10000

x == y

True

In [23]:
x is y

False

In [24]:
# if you're using "==" to check values and "is" to see if objects are the same object, then this doesn't cause any trouble!
# but... many people use "is" because they think it's more aesthetic, or faster, or nicer... 

# Reuven's rule of `is`

Only use it with `None`.

In [25]:
s = 'abcd'
t = 'abcd'

s == t

True

In [26]:
s is t

True

In [27]:
s = 'ab.cd'
t = 'ab.cd'

s == t

True

In [28]:
s is t

False

In [29]:
s = 'abcde' * 100_000
t = 'abcde' * 100_000

s == t

True

In [30]:
s is t

False

# What's going on with strings and `==` vs. `is`?

Python stores variable names as strings. When we use a global variable, Python uses that name string to look up the variable in its internal dictionary of variables and values. (We can see that dict by calling the `globals()` function.)

To speed this process up, and to avoid creating a huge number of strings that are then thrown away, CPython automatically *interns* (caches) string constants that look like legal variable names, meaning they contain only letters, digits, and underscores. The first time it sees such a string, it creates it; subsequent times, it reuses the same string object. That's why `'abcd'` was the same object both times, but `'ab.cd'` (which contains a `.`) wasn't. You can also intern any string explicitly with the `sys.intern` function.
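Here's a small sketch of `sys.intern` in action. It builds the strings at runtime with `str.join`, so that the compiler can't share a single constant between them; the `False` result for `is` before interning is CPython behavior, not a language guarantee.

```python
import sys

# Build the strings at runtime, so the compiler can't share a single constant
a = ''.join(['ab', '.', 'cd'])
b = ''.join(['ab', '.', 'cd'])

print(a == b)   # True: same value
print(a is b)   # False in CPython: two separate string objects

# sys.intern returns the cached (interned) copy of a string, creating it if needed
a = sys.intern(a)
b = sys.intern(b)

print(a is b)   # True: both names now refer to the same object
```

Even so, Reuven's rule of `is` still applies: compare strings with `==`.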

In [31]:
x = None

if x is None:
    print('Yes, it is None!')
else:
    print('No, it is not None!')

Yes, it is None!


In [32]:
x = None

if x:
    print('Yes, it is True-ish!')
else:
    print('No, it is False-ish!')

No, it is False-ish!


In [33]:
None == False

False

# What is `True` in Python?

Every value, in a boolean context (such as `if` or `while`), is considered `True` in Python... except for:

- `None`
- `False`
- 0 (of any numeric type)
- anything empty (e.g., `''`, `[]`, `()`, `{}`, `set()`)
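One quick way to check how `if` and `while` will see a value is to pass it to `bool`. The lists below are just illustrative samples, not an exhaustive catalog:

```python
# Values that count as False in a boolean context
falsy = [None, False, 0, 0.0, 0j, '', [], (), {}, set()]

# A few values that count as True, including some that look "empty-ish"
truthy = [1, -1, 0.1, ' ', [0], (None,), {'a': None}]

for value in falsy + truthy:
    print(f'{value!r:>12} -> {bool(value)}')   # bool() shows what "if" would see
```

Notice that `' '` (a space), `[0]`, and `(None,)` are all `True`: they aren't empty, even if their contents are False-ish.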

In [34]:
while True:
    name = input('Enter your name: ').strip()
    
    if not name:   # if we got an empty string
        break
        
    print(f'Hello, {name}!')

Enter your name:  world


Hello, world!


Enter your name:  Reuven


Hello, Reuven!


Enter your name:  


In [35]:
# Assignment in Python is *not* an expression
# Meaning: Assignment doesn't return any value from it

while name = input('Enter your name: ').strip():
    
    print(f'Hello, {name}!')

SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='? (3166419494.py, line 1)

In [36]:
# As of Python 3.8, we have the "assignment expression" operator, :=
# also known as "the walrus"

# input returns a string
# str.strip returns a string
# that string is assigned to name
# because of :=, the assignment returns a string (whatever was assigned to "name")
# then "while" looks to its right, and sees a string, and turns it into a boolean: False for empty strings, True for all others

while name := input('Enter your name: ').strip():
    
    print(f'Hello, {name}!')

Enter your name:  world


Hello, world!


Enter your name:  asdfsaf


Hello, asdfsaf!


Enter your name:  Reuven


Hello, Reuven!


Enter your name:  


In [38]:
# please don't do this
if x := 5:
    pass

In [39]:
x

5

# Numbers!

Python has three built-in numeric types:

- Integers
- Floats
- Complex

In [40]:
x = 100  # this is an integer
type(x)

int

In [41]:
# what is the largest int we can get in Python?
# or: How many bits are in Python integers?

# Answer: There is no fixed limit. Integers grow as needed, limited only by available memory

In [43]:
import sys

x = 0
sys.getsizeof(x)   # 24 bytes for an integer! Zero!

24

In [44]:
x = 1
sys.getsizeof(x)

28

In [45]:
x = 1000
sys.getsizeof(x)

28

In [46]:
x = x ** 1000 
sys.getsizeof(x)

1356

In [47]:
x = x ** 1000
sys.getsizeof(x)

1328796

In [48]:
s = '12345'

int(s)   # creates a new instance of int, based on s

12345

In [49]:
# no no no  s is actually a hex number

int(s, 16)  # interpret s as hex

74565

In [50]:
int(s, 8) # interpret as octal

5349

In [51]:
# floats 

type(1)

int

In [52]:
type(1.0)

float

In [53]:
0.1 + 0.2

0.30000000000000004

# Dealing with floats 

1. Accept that floats are imprecise: they're stored in binary, so values like 0.1 can't be represented exactly. Nothing to do about it.
2. Use the `round` function: `round(x, 2)` returns `x` rounded to 2 digits after the decimal point.
3. Use integers, which don't have this issue, and count in cents/pence.  So you'd use 100 instead of 1.00, and avoid the problem.
4. Use decimal (base-10) arithmetic, in the spirit of BCD (binary-coded decimal). In other words: store decimal numbers, and do decimal math, using the `Decimal` class in the `decimal` module.
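Options 2 and 3 might look something like this. The cents version is a sketch that assumes every amount has at most two digits after the decimal point; `total_cents` and `display` are just illustrative names.

```python
total = 0.1 + 0.2
print(total)              # 0.30000000000000004
print(round(total, 2))    # 0.3 (still a float, just rounded)

# Keep money as integer cents, and convert only for display
total_cents = 10 + 20
display = f'{total_cents // 100}.{total_cents % 100:02d}'
print(display)            # 0.30
```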

In [54]:
from decimal import Decimal

x = Decimal('0.1')
y = Decimal('0.2')

x + y

Decimal('0.3')

In [55]:
float(x+y)

0.3

In [56]:
# if you use decimal.Decimal, create your objects using strings, not floats!

x = Decimal(0.1)
y = Decimal(0.2)

x + y

Decimal('0.3000000000000000166533453694')

In [57]:
x

Decimal('0.1000000000000000055511151231257827021181583404541015625')

In [58]:
y

Decimal('0.200000000000000011102230246251565404236316680908203125')

In [59]:
# Complex 

x = 10+3j
y = 5-8j

In [61]:
x+y

(15-5j)

In [62]:
x*y

(74-65j)

# Strings

- Strings are immutable!
- Strings use Unicode!

In [63]:
s = 'abcdefghij'
s[0]

'a'

In [64]:
s[0] = '!'  # this won't work -- strings are immutable!

TypeError: 'str' object does not support item assignment

In [65]:
# but wait ... what about

x = 'abcdefghij'
x += 'klmnop'  # same as x = x + 'klmnop': a *new* string is created, and x is rebound to it

x

'abcdefghijklmnop'
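To see that `+=` didn't modify the original string, we can keep a second name pointing at it before the assignment:

```python
x = 'abcdefghij'
original = x           # a second name for the very same string object

x += 'klmnop'          # builds a new string, then rebinds the name x

print(x)               # abcdefghijklmnop
print(original)        # abcdefghij: the original string was never modified
print(x is original)   # False: x now refers to a different object
```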

In [66]:
s = 'שלום'
len(s)

4

In [67]:
print(s[0])

ש


In [68]:
# how can I work with bytes, if I'm stuck using characters?
# meaning: Python strings contain Unicode characters (code points), not bytes

# we have a second string type in Python: bytes

s.encode()  # give me a byte string based on the characters in s (UTF-8 by default)

b'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'

In [69]:
s = 'hello'
s.encode()

b'hello'

In [70]:
b = s.encode()

In [71]:
b[0]

104

In [72]:
b.decode()   # return a string, based on the bytes

'hello'

In [73]:
s.encode()   # return a byte string, based on the characters

b'hello'

In [74]:
b'hello'

b'hello'

In [75]:
b'שלום'

SyntaxError: bytes can only contain ASCII literal characters (2953791869.py, line 1)
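A round trip between `str` and `bytes` shows the difference between counting characters and counting bytes. Passing `'utf-8'` explicitly is optional here, since it's the default:

```python
s = 'שלום'
b = s.encode('utf-8')      # str -> bytes

print(len(s))              # 4 characters
print(len(b))              # 8 bytes: each of these Hebrew letters takes 2 bytes in UTF-8
print(b.decode('utf-8') == s)   # True: decoding gets us back the original string
```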

In [76]:
# split and join

# split is a string method that returns a list of strings

s = 'abcd|ef|ghi|kjl'
s.split('|')  

['abcd', 'ef', 'ghi', 'kjl']

In [77]:
s = 'this is a bunch of words'
s.split(' ')

['this', 'is', 'a', 'bunch', 'of', 'words']

In [78]:
# Python split our string on every single space character...

s = 'this   is a    bunch of  words'
s.split(' ') 

['this', '', '', 'is', 'a', '', '', '', 'bunch', 'of', '', 'words']

In [79]:
# we avoid this by passing *NO* argument to str.split

s.split()  # split on any number of whitespace characters in a row (space, \t, \n, \r, \v)

['this', 'is', 'a', 'bunch', 'of', 'words']

In [80]:
words = s.split()
words

['this', 'is', 'a', 'bunch', 'of', 'words']

In [81]:
# how can I put them back together?
# str.join

'*'.join(words)   

'this*is*a*bunch*of*words'

In [82]:
' '.join(words)

'this is a bunch of words'

# Exercise: Pig Latin sentence

1. Pig Latin is a "secret" language used by children in the English-speaking world (especially the US). The rules are:
    - If a word starts with a vowel (a, e, i, o, or u) then we add `way` to it
    - If a word starts with anything else, then we move the first letter to the end, and add `ay`.
2. For this exercise, ask the user to enter a sentence (all lowercase, no punctuation)
3. Print the entire sentence, translated into Pig Latin, word by word, on one line.

Examples:

    Enter a sentence: this is a test
    histay isway away esttay
    
    Enter a sentence: this papaya is delicious
    histay apayapay isway eliciousday
    
    

In [83]:
s = 'abc\ndef\n'
print(s)

abc
def



In [84]:
s = input('Enter a string:')

Enter a string: abc


In [86]:
output = []
s = input('Enter a sentence: ').strip().lower()

for one_word in s.split():
    if one_word[0] in 'aeiou':
        output.append(one_word + 'way')
    else:
        output.append(one_word[1:] + one_word[0] + 'ay')
        
print(' '.join(output))

Enter a sentence:  this is a test


histay isway away esttay


# Sequences

String, list, and tuple are all *sequences*.  They can all:

- Get one element, counting from the left, with a positive index (`s[0]`)
- Get one element, counting from the *right*, with a negative index (`s[-1]`)
- Get a slice `s[start:finish:step]`
- Iterate with a `for` loop
- Search with `in`
- Count how often something appears with the `.count` method
- Search with the `.index` method
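Since all three types are sequences, the exact same code works on each of them. A small sketch:

```python
sequences = ['abcde', ['a', 'b', 'c', 'd', 'e'], ('a', 'b', 'c', 'd', 'e')]

for seq in sequences:
    print(type(seq).__name__,
          seq[0],          # first element, counting from the left
          seq[-1],         # last element, counting from the right
          seq[1:4:2],      # slice: start=1, finish=4 (not included), step=2
          'c' in seq,      # search with "in"
          seq.count('a'),  # how often 'a' appears
          seq.index('d'))  # index of the first 'd'
```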


# Lists vs. tuples

Lists are mutable, and tuples are immutable. But that's *not* how Python wants us to think about them! 

We should think about them this way:
- Lists are for sequences of the same type
- Tuples are for sequences of *different* types

For example: a list of strings, a list of lists, or a list of functions.  But a tuple to represent a person (name + age), or a tuple to represent coordinates.

Tuples are supposed to be for structs / records.

In [88]:
# Python allocates memory to a list, assuming that it'll need some more in the future for new elements.
# Each time we add enough elements to use up that spare memory, it then allocates a bunch more.

mylist = []

for i in range(30):
    print(f'{i=}, {len(mylist)=}, {sys.getsizeof(mylist)=}')
    mylist.append(i)

i=0, len(mylist)=0, sys.getsizeof(mylist)=56
i=1, len(mylist)=1, sys.getsizeof(mylist)=88
i=2, len(mylist)=2, sys.getsizeof(mylist)=88
i=3, len(mylist)=3, sys.getsizeof(mylist)=88
i=4, len(mylist)=4, sys.getsizeof(mylist)=88
i=5, len(mylist)=5, sys.getsizeof(mylist)=120
i=6, len(mylist)=6, sys.getsizeof(mylist)=120
i=7, len(mylist)=7, sys.getsizeof(mylist)=120
i=8, len(mylist)=8, sys.getsizeof(mylist)=120
i=9, len(mylist)=9, sys.getsizeof(mylist)=184
i=10, len(mylist)=10, sys.getsizeof(mylist)=184
i=11, len(mylist)=11, sys.getsizeof(mylist)=184
i=12, len(mylist)=12, sys.getsizeof(mylist)=184
i=13, len(mylist)=13, sys.getsizeof(mylist)=184
i=14, len(mylist)=14, sys.getsizeof(mylist)=184
i=15, len(mylist)=15, sys.getsizeof(mylist)=184
i=16, len(mylist)=16, sys.getsizeof(mylist)=184
i=17, len(mylist)=17, sys.getsizeof(mylist)=248
i=18, len(mylist)=18, sys.getsizeof(mylist)=248
i=19, len(mylist)=19, sys.getsizeof(mylist)=248
i=20, len(mylist)=20, sys.getsizeof(mylist)=248
i=21, len(mylist)

In [89]:
sys.getsizeof(mylist)

312

In [90]:
mylist[0] = 'abcde'
sys.getsizeof(mylist)

312

In [91]:
mylist[0] = 'abcde' * 100_000_000
sys.getsizeof(mylist)  # getsizeof checks the size of the list, *not* the elements of the list

312

In [92]:
t = (10, 20, 30)
type(t)

tuple

In [93]:
t = (10, 20)
type(t)

tuple

In [94]:
t = (10)   # no comma, no tuple
type(t)

int

In [95]:
t = ()  # no comma, but it's still a tuple... because it's empty
type(t)

tuple

In [96]:
2 + 3 * 4   # first multiplication, then addition

14

In [97]:
(2 + 3) * 4  # now we have parentheses, which change priority

20

In [98]:
t = (10,)  # now it's a tuple!
type(t)

tuple

In [99]:
(2 + 3,) * 4  # will this work?

(5, 5, 5, 5)

In [101]:
t = 10, 20, 30, 40, 50 # also a tuple -- commas are more important than ()
t

(10, 20, 30, 40, 50)

In [102]:
t[0] = '!'  

TypeError: 'tuple' object does not support item assignment

In [109]:
t = ([10, 20, 30], 
     [100, 200, 300])

In [110]:
# I can change the list that the tuple refers to 
t[0].append('!')

In [111]:
t

([10, 20, 30, '!'], [100, 200, 300])

In [112]:
t[0] += [40, 50, 60]   # += calls list.__iadd__ (in-place addition): first the list is modified, then Python assigns the result back to t[0], which fails

TypeError: 'tuple' object does not support item assignment

In [113]:
t

([10, 20, 30, '!', 40, 50, 60], [100, 200, 300])
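Here's the same surprise in one runnable piece, along with a way to avoid the failed assignment: call the list's own method, which never tries to assign back into the tuple.

```python
t = ([10, 20, 30], [100, 200, 300])

try:
    t[0] += [40, 50, 60]    # list.__iadd__ succeeds, then assigning to t[0] fails
except TypeError as e:
    print('Error:', e)

print(t)   # ([10, 20, 30, 40, 50, 60], [100, 200, 300]): the list changed anyway

# list.extend modifies the list in place, with no assignment to the tuple
t[0].extend([70, 80])
print(t)
```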

# Tuples, good and bad

Good:
- Very efficient
- Immutable (but see above: mutable objects inside a tuple, such as lists, can still change)

Bad:
- I don't like using numeric indexes



In [114]:
person = ('Reuven', 'Lerner', 46)

person[0]  # get first name

'Reuven'

In [115]:
person[1] # get last name

'Lerner'

In [116]:
person[2] # get shoe size

46

In [117]:
# I want the best of both worlds: efficient like a tuple, but with names like a dict or object

from collections import namedtuple

In [123]:
# when we use "namedtuple", we're creating a new class
# the class will inherit from tuple, and thus be immutable
# each instance will have attributes that let us retrieve the elements by name

Person = namedtuple('Person', ['first', 'last', 'shoesize'])

In [124]:
# every class in Python has a __name__ attribute, a string indicating its name
# with namedtuple, we set that name by passing it as a string, the first argument in the call

Person.__name__

'Person'

In [125]:
# create a new instance of Person

p = Person('Reuven', 'Lerner', 46)

In [126]:
p[0]

'Reuven'

In [127]:
p[1]

'Lerner'

In [128]:
p[2]

46

In [130]:
p.first

'Reuven'

In [131]:
p.last

'Lerner'

In [132]:
p.shoesize

46

In [133]:
print(p)

Person(first='Reuven', last='Lerner', shoesize=46)


In [134]:
p.first = 'asfsafas'

AttributeError: can't set attribute

In [136]:
# If I want to create a new Person based on the existing Person, I can use _replace

p._replace(first='whatever')

Person(first='whatever', last='Lerner', shoesize=46)

In [137]:
sys.getsizeof(t)

56

In [138]:
from typing import NamedTuple

class Person(NamedTuple):
    first: str
    last: str
    shoesize: int
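The `typing.NamedTuple` version gives us the same kind of class as `namedtuple`, so access by name and by index both work. One thing worth knowing: the type annotations are documentation only, and aren't checked at runtime.

```python
from typing import NamedTuple

class Person(NamedTuple):
    first: str
    last: str
    shoesize: int

p = Person('Reuven', 'Lerner', 46)
print(p)                      # Person(first='Reuven', last='Lerner', shoesize=46)
print(p.last, p[1])           # Lerner Lerner: by name, or by index
print(isinstance(p, tuple))   # True: still a tuple underneath

odd = Person('Reuven', 'Lerner', 'forty-six')   # no error: annotations aren't enforced
print(odd.shoesize)
```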

# Exercise: Bookshop

1. Create a `Book` class, using `namedtuple`. Each instance of `Book` will have three fields: `title`, `author`, `price`.
2. Create a list, `inventory`, with three or four books that you've created.
3. Create a variable, `total`, with the user's total purchase amount (currently 0).
4. Ask the user, repeatedly, to enter the name of a book they want to buy.
    - If they enter an empty string, stop asking and print `total`.
    - If they enter the name of a book in our inventory, print all of the details, and the current (updated) total.
    - If they enter the name of a book *not* in our inventory, scold them a bit and then try again.
5. Print the total.

Example:

    Enter a book: title1
    title1 by author1 costs 50, total is now 50
    Enter a book: title2
    title2 by author2 costs 75, total is 125
    Enter a book: whatever
    We don't carry whatever!
    Enter a book: [ENTER]
    Total is 125

In [141]:
from collections import namedtuple

Book = namedtuple('Book', ['title', 'author', 'price'])

book1 = Book('title1', 'author1', 50)
book2 = Book('title2', 'author1', 60)
book3 = Book('title3', 'author2', 75)
book4 = Book('title4', 'author3', 150)

inventory = [book1, book2, book3, book4]

total = 0

while True:
    look_for = input('Enter title: ').strip()
    
    if not look_for:   # empty string? break
        break
        
    for one_book in inventory:
        if one_book.title == look_for:
            price = one_book.price
            total += price
            print(f'Found {look_for} by {one_book.author}, price is {price}; new total is {total}')
            break
            
    else:  # this fires if we got to the natural end of the loop -- NOT exiting via break
        print(f'We do not have {look_for}')
            
print(f'{total=}')

Enter title:  title1


Found title1 by author1, price is 50; new total is 50


Enter title:  asdfasdfa


We do not have asdfasdfa


Enter title:  


total=50


# Next up:

- Dictionaries 
- Sets
- Variants on dicts

Return at 11 Paris Time

# Names for dictionaries

- Hash table
- Hash
- Hash map
- Map
- Key-value store
- Name-value store
- Associative array

In [142]:
d = {'a':1, 'b':2, 'c':3}
len(d)

3

In [143]:
d['a']

1

In [144]:
d['b']

2

In [145]:
d['x']

KeyError: 'x'

In [146]:
d

{'a': 1, 'b': 2, 'c': 3}

In [147]:
d['x'] = 100   # this will either add to the dict (if 'x' isn't yet a key) or will replace the existing value


In [148]:
d

{'a': 1, 'b': 2, 'c': 3, 'x': 100}

In [149]:
# remove something from a dict

d.pop('x')  # removes the pair, and returns the value

100

In [150]:
d.pop('x')

KeyError: 'x'

In [151]:
# I want to retrieve from a dict and *not* get an error if the key doesn't exist
# I could use if-else... or I can use dict.get

d.get('a')  # this means: if 'a' is a key in d, return d['a'] otherwise, return None

1

In [152]:
d.get('q')  

In [153]:
# Alternative: Pass a second argument to dict.get, and that'll be the default
d.get('q', 'No such key')

'No such key'

In [154]:
# I want to assign to a dict, and only add a new key-value pair.  If the key already exists, then
# I want to ignore my assignment.

d

{'a': 1, 'b': 2, 'c': 3}

In [156]:
d.setdefault('x', 100)   # it'll return 100, since I successfully added 'x', and 100 is its value

100

In [157]:
d.setdefault('x', 2)     # returns 100: 'x' is already a key, so the 2 is ignored and we get the existing value back

100
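Because `setdefault` always returns the value now stored under the key, a common idiom is to call a method on that return value directly. In this sketch, `words` and `by_letter` are made-up example names; we group words by their first letter:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = {}

for word in words:
    # setdefault returns the list already stored under word[0],
    # or first stores a new empty list there and returns that
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)   # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```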

In [158]:
d = {'a':1, 'b':2, 'c':3}
other = {'b':20, 'c':30, 'd':40}

d.update(other)  # assign each key-value pair from other onto d -- d will change *and* any conflicts are in favor of "other"

In [159]:
d

{'a': 1, 'b': 20, 'c': 30, 'd': 40}

In [160]:
# as of Python 3.9, we don't have to change d!  We can use the | operator to do this and get a new dict back

d = {'a':1, 'b':2, 'c':3}
other = {'b':20, 'c':30, 'd':40}

d | other

{'a': 1, 'b': 20, 'c': 30, 'd': 40}

In [161]:
d

{'a': 1, 'b': 2, 'c': 3}

In [162]:
d |= other   # basically, same as d.update(other)

In [163]:
d

{'a': 1, 'b': 20, 'c': 30, 'd': 40}

In [165]:
# creating dicts

# I want to create a new dict with keys a, b, c, and d.  All values should be 0.

dict.fromkeys('abcd')  # by default, all values are None

{'a': None, 'b': None, 'c': None, 'd': None}

In [166]:
dict.fromkeys('abcd', 0)

{'a': 0, 'b': 0, 'c': 0, 'd': 0}

In [168]:
# beware mutable values!

d = dict.fromkeys('abcd', [])
d

{'a': [], 'b': [], 'c': [], 'd': []}

In [169]:
d['a'].append('hello')

In [170]:
d['a']

['hello']

In [171]:
d['b']

['hello']

In [172]:
d['c']

['hello']

In [173]:
d['d']

['hello']
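All four keys share a single list, because `dict.fromkeys` evaluates `[]` just once. A dict comprehension, by contrast, evaluates `[]` once per key, so each key gets its own list:

```python
shared = dict.fromkeys('abcd', [])
print(shared['a'] is shared['d'])   # True: one list object, four keys

separate = {key: [] for key in 'abcd'}
separate['a'].append('hello')
print(separate)   # {'a': ['hello'], 'b': [], 'c': [], 'd': []}
```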

In [174]:
mylist = [10, 20, 30]

d[mylist] = 5

TypeError: unhashable type: 'list'

# How are dicts implemented?

## Part 1: Before Python 3.6

In the olden days, dictionaries were simple hash tables.  Meaning: The location of a key-value pair was determined by the value returned by `hash(key)`, modulo the size of the table.
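To make the old approach concrete, here's a toy hash table in pure Python. It's a teaching sketch, not CPython's actual code: `ToyHashTable` is a made-up class, and it assumes 8 fixed slots, simple linear probing on collisions (CPython probes differently), no resizing, and no deletion.

```python
class ToyHashTable:
    """Toy, pre-3.6-style hash table: one array of (key, value) slots."""

    def __init__(self, size=8):
        self.slots = [None] * size   # each slot holds a (key, value) pair, or None

    def _find_slot(self, key):
        index = hash(key) % len(self.slots)   # where the key "wants" to live
        for _ in range(len(self.slots)):
            pair = self.slots[index]
            if pair is None or pair[0] == key:
                return index                  # an empty slot, or the key itself
            index = (index + 1) % len(self.slots)   # collision: try the next slot
        raise RuntimeError('table is full')

    def __setitem__(self, key, value):
        self.slots[self._find_slot(key)] = (key, value)

    def __getitem__(self, key):
        pair = self.slots[self._find_slot(key)]
        if pair is None:
            raise KeyError(key)
        return pair[1]


table = ToyHashTable()
table['a'] = 1
table['b'] = 2
table['c'] = 3

print(table['b'])    # 2
print(table.slots)   # positions depend on hash(), so they vary from run to run
```

Printing `table.slots` shows why the order of an old-style dict was effectively arbitrary: it's whatever order the hash values happen to produce.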

In [176]:
d['a'] = 1

hash('a') % 8    # gives us a result from 0-7 (string hashes are randomized per process, so your numbers may differ)

3

In [177]:
d['b'] = 2
d['c'] = 3

In [178]:
hash('b') % 8

7

In [179]:
hash('c') % 8

7

# Good and bad with this implementation

Good
- Simple to understand 
- Simple to implement
- Even collisions are fairly straightforward
- Searching by key is very fast -- O(1), or constant time, on average
- When our dict reaches 2/3 fullness, it's resized into a larger table -- again, easy to deal with

Bad
- We're allocating far more space than we need (to ensure lots of empty slots)
- The order of items in a dict is effectively arbitrary, since it depends on the hash values
- Mutable data cannot be used as a key -- because if it changes, then the value of `hash(key)` will change, and our values will get lost
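In practice, this means that immutable built-ins such as tuples can be keys, but lists can't. If you have a list, you can convert it to a tuple first. The names `locations`, `point`, and `coords` here are just for illustration:

```python
locations = {}

point = (10, 20)                 # tuples are immutable, so they're hashable
locations[point] = 'home'

coords = [10, 20]                # lists are mutable, so they're not hashable
try:
    locations[coords] = 'work'
except TypeError as e:
    print(e)                     # unhashable type: 'list'

print(locations[tuple(coords)])  # home: convert the list to a tuple to look it up
```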

# Dictionary implementation, starting in 3.6

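Now we use *two* data structures:
- A compact table of entries that grows over time, with one row (hash, key, value) appended for each new key-value pair in the dict.
- A sparse array (at the C level) of indexes into that table. `hash(key)` picks a slot in this array, and the slot holds the row number of that key's entry.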
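The two-structure layout can be sketched in pure Python. `ToyCompactDict` is a made-up teaching class, not CPython's implementation: it assumes 8 fixed index slots, linear probing, no resizing, and no deletion, and unlike CPython it doesn't store each key's hash in the entries table.

```python
class ToyCompactDict:
    """Toy 3.6+-style dict: a sparse index array plus a dense entries table."""

    def __init__(self, size=8):
        self.indices = [None] * size   # sparse: slot -> row number in self.entries
        self.entries = []              # dense: (key, value) rows, in insertion order

    def _find_slot(self, key):
        slot = hash(key) % len(self.indices)
        for _ in range(len(self.indices)):
            row = self.indices[slot]
            if row is None or self.entries[row][0] == key:
                return slot
            slot = (slot + 1) % len(self.indices)   # collision: try the next slot
        raise RuntimeError('index array is full')

    def __setitem__(self, key, value):
        slot = self._find_slot(key)
        if self.indices[slot] is None:            # new key: append a row
            self.indices[slot] = len(self.entries)
            self.entries.append((key, value))
        else:                                     # existing key: overwrite its row
            self.entries[self.indices[slot]] = (key, value)

    def __getitem__(self, key):
        row = self.indices[self._find_slot(key)]
        if row is None:
            raise KeyError(key)
        return self.entries[row][1]

    def keys(self):
        return [key for key, value in self.entries]   # insertion order, for free


cd = ToyCompactDict()
cd['a'] = 1
cd['b'] = 2
cd['c'] = 3
cd['a'] = 100        # updating a key keeps its original position

print(cd.keys())     # ['a', 'b', 'c']
print(cd['a'])       # 100
print(cd.indices)    # mostly None; which slots are used depends on hash()
```

The design choice to notice: only the small index array needs empty slots, while the entries table is packed tight and naturally remembers insertion order.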

In [180]:
d = {}
d['a'] = 1
hash('a') % 8

3

In [None]:
'a' in d   # what does Python do?  hash('a') % 8 --> array slot 3 --> table row 0 --> compare the keys --> yes, it's there

In [181]:
d['b'] = 2
d['c'] = 3

In [182]:
hash('b') % 8

7

In [183]:
hash('c') % 8

7

# Advantages of the new dict implementation

1. The items are kept in insertion order (guaranteed by the language as of Python 3.7).
2. Dicts use less memory than before: the Python 3.6 release notes put it at 20% to 25% less than in 3.5.

# Looping over dicts

You can loop over them in a few ways:

1. `for` loop on the dict itself
2. `for` loop on `d.keys()`  -- never do this -- just iterate over the dict itself
3. `for` loop on `d.values()`
4. `for` loop on `d.items()`  -- I prefer this
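Here's a quick comparison of what each approach hands you on each iteration. Wrapping them in `list()` uses the same iteration, and makes the differences easy to see:

```python
d = {'a': 1, 'b': 2, 'c': 3}

# 1. Iterating over the dict itself yields its keys
for key in d:
    print(key, d[key])

# 3. d.values() yields only the values
total = 0
for value in d.values():
    total += value
print(total)              # 6

print(list(d))            # ['a', 'b', 'c']
print(list(d.values()))   # [1, 2, 3]
print(list(d.items()))    # [('a', 1), ('b', 2), ('c', 3)]
```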

In [184]:
d

{'a': 1, 'b': 2, 'c': 3}

In [185]:
for one_thing in d.items():
    print(one_thing)

('a', 1)
('b', 2)
('c', 3)


In [186]:
# take advantage of tuple unpacking, and grab each iteration of d.items() into key and value variables

for key, value in d.items():
    print(f'{key}: {value}')

a: 1
b: 2
c: 3


# Sets

Sets are basically like a dict's keys. You can check whether a value is in a set, but there are no values associated with the elements.  Also, sets still use the old-style (pre-3.6) hash-table implementation, so their order is arbitrary; don't rely on it.

In [187]:
s = {10, 20, 30, 40}    # no :, so it's a set, not a dict

10 in s

True

In [188]:
30 in s

True

In [189]:
s.add(50)   # add a new value
s

{10, 20, 30, 40, 50}

In [190]:
s.add(40)  # add an existing value... it's ignored

In [191]:
s

{10, 20, 30, 40, 50}

In [192]:
# add numbers from the user's input

numbers = set()   # cannot use {}, because that's an empty dict

while s := input('Enter a number: ').strip():
    numbers.add(int(s))
    
print(numbers)    


Enter a number:  10
Enter a number:  20
Enter a number:  30
Enter a number:  10
Enter a number:  20
Enter a number:  30
Enter a number:  20
Enter a number:  30
Enter a number:  20
Enter a number:  30
Enter a number:  30
Enter a number:  40
Enter a number:  


{40, 10, 20, 30}


In [193]:
sum(numbers)

100

In [194]:
s = {10, 20, 30, 40}
s2 = {30, 40, 50, 60}

s | s2  # union

{10, 20, 30, 40, 50, 60}

In [195]:
s & s2   # intersection - what's common to them?

{30, 40}

In [196]:
s ^ s2   # symmetric difference (xor) -- what's in one, but not both?

{10, 20, 50, 60}
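Each of these operators also has a method equivalent, and there's a `-` operator for plain difference. One practical difference: the methods accept any iterable, while the operators need sets on both sides.

```python
s = {10, 20, 30, 40}
s2 = {30, 40, 50, 60}

print(s - s2)                                  # difference: in s, but not in s2
print(s.union(s2) == s | s2)                   # True
print(s.intersection(s2) == s & s2)            # True
print(s.symmetric_difference(s2) == s ^ s2)    # True

print(s.union([70, 80]))                       # methods accept any iterable, e.g., a list
```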

# Exercise: `dictdiff`

1. Write a function, `dictdiff`, that gets two different dictionaries as arguments.
2. The return value from `dictdiff` will be a dict itself, representing the difference between the two arguments.
    - If a key exists in both input dicts, and has the same value in both input dicts, ignore it in the output.
    - If a key exists in both input dicts, but has different values, then the output dict should have that key, and the value will be a two-element list of the first value and the second value.
    - If a key exists in one but not the other, then we'll still get a key-value pair in the output dict -- the key will be that key, and the value will be a two-element list with `None` as the value for where it was missing.


```python
d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
d3 = {'a':1, 'b':3, 'd':4}

dictdiff(d1, d2)   # returns {'c':[3,4]}
dictdiff(d1, d3)   # returns {'b':[2,3], 'c':[3,None], 'd':[None, 4]}
```

In [202]:
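def dictdiff(first, second):
    output = {}
    
    # the union of the two key views gives us every key that's in either dict
    for one_key in first.keys() | second.keys():
        v1 = first.get(one_key)    # None if one_key isn't in first
        v2 = second.get(one_key)   # None if one_key isn't in second
        
        # caveat: a key whose value is literally None looks the same as a missing key
        if v1 != v2:
            output[one_key] = [v1, v2]
        
    return output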

In [204]:
d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
d3 = {'a':1, 'b':3, 'd':4}

print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print(dictdiff(d1, d3))


{}
{'c': [3, 4]}
{'c': [3, None], 'd': [None, 4], 'b': [2, 3]}


In [197]:
d.keys()

dict_keys(['a', 'b', 'c'])

In [200]:
set(list(d.keys()) + list(d.keys()))

{'a', 'b', 'c'}

In [201]:
d.keys() | d.keys()

{'a', 'b', 'c'}

In [205]:
d = {'a':1, 'b':2, 'c':3}
d.keys()

dict_keys(['a', 'b', 'c'])

In [206]:
s = {'a', 'b', 'c'}
s

{'a', 'b', 'c'}

In [207]:
s = set('abcdefg')
s

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}

In [208]:
s = set([10, 20, 30, 40, 50])
s

{10, 20, 30, 40, 50}

# `defaultdict`

`defaultdict` is a subclass of `dict` that gives you a default value when you retrieve a key that doesn't exist: it calls a function you provide (with no arguments), stores the result under that key, and returns it.
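One detail that often surprises people: only square-bracket retrieval triggers the default. `.get` and `in` behave just as they do on a regular dict. (`counts` is just an example name.)

```python
from collections import defaultdict

counts = defaultdict(int)

print(counts.get('x'))   # None: .get does NOT call int()
print('x' in counts)     # False: neither does "in"
print(counts['x'])       # 0: [] on a missing key calls int() and stores the result
print('x' in counts)     # True: the lookup added the key
```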

In [209]:
# My naive use of defaultdict:

from collections import defaultdict

d = defaultdict(0)

d['a'] += 5
d['b'] += 10
d['b'] += 3
d['a'] += 8
d['c'] += 9


TypeError: first argument must be callable or None

In [210]:
callable(0)

False

In [211]:
callable(int)

True

In [212]:
# My real use of defaultdict:

from collections import defaultdict

d = defaultdict(int)  # if we retrieve a key that doesn't exist, our function (int) will run, and its output will be that pair's value

d['a'] += 5

In [213]:
d

defaultdict(int, {'a': 5})

In [214]:
d['b'] += 10

In [215]:
d

defaultdict(int, {'a': 5, 'b': 10})

In [216]:
d['b'] += 3
d['a'] += 8
d['c'] += 9

In [217]:

d

defaultdict(int, {'a': 13, 'b': 13, 'c': 9})

In [218]:
d = defaultdict(dict)

d['a']['b'] = 10
d['a']['c'] = 20
d['x']['y'] = 100
d['x']['y'] += 3

d

defaultdict(dict, {'a': {'b': 10, 'c': 20}, 'x': {'y': 103}})

In [219]:
import time

time.time()

1636369710.704443

In [220]:
d = defaultdict(time.time)

print(d['a'])  # the value for each key is when I first requested that key from the dict
print(d['b'])

1636369730.476041
1636369730.476095


In [221]:
d

defaultdict(<function time.time>,
            {'a': 1636369730.476041, 'b': 1636369730.476095})

In [222]:
# the function passed to defaultdict gets *ZERO* arguments.

In [223]:
def return_5():
    return 5

d = defaultdict(return_5)   # no parentheses after return_5!

d['a'] += 2
d

defaultdict(<function __main__.return_5()>, {'a': 7})

# Exercise: Travel

1. We're going to ask the user, repeatedly, to enter a string containing 'city, country' that they have visited.
2. The goal is to create a dict in which the keys are country names and the values are lists of cities.
3. Ask the user repeatedly to enter that data. If the data isn't valid, then scold them and try again.
4. If they enter an empty string, then stop asking, and print all countries and cities.

Example:

    Where have you traveled: Boston, USA
    Where have you traveled: Chicago, USA
    Where have you traveled: Shanghai, China
    Where have you traveled: Beijing, China
    Where have you traveled: [ENTER]
    
    USA
        Boston
        Chicago
    China
        Shanghai
        Beijing

In [224]:
from collections import defaultdict

all_places = defaultdict(list)

while True:
    s = input('Where have you traveled: ').strip()
    
    if not s:
        break
        
    if s.count(',') != 1:
        print('Enter city,country; try again')
        continue
        
    city, country = s.split(',')
    all_places[country.strip()].append(city.strip())
    
for country, all_cities in all_places.items():
    print(country)
    for one_city in all_cities:
        print(f'\t{one_city}')

Where have you traveled:  Chicago, USA
Where have you traveled:  Beijing, China
Where have you traveled:  Boston, USA
Where have you traveled:  Shanghai, China
Where have you traveled:  asdfasdfasfas


Enter city,country; try again


Where have you traveled:  a,b,c,d,e


Enter city,country; try again


Where have you traveled:  


USA
	Chicago
	Boston
China
	Beijing
	Shanghai


In [225]:
all_places

defaultdict(list,
            {'USA': ['Chicago', 'Boston'], 'China': ['Beijing', 'Shanghai']})

# Next up

1. Some more dict variants
2. Functions
    - Function objects 
    - Byte compilation
    - Scoping
    - Nested functions

In [226]:
# Resume at 13:30 Paris Time

In [227]:
from collections import Counter

# you can use Counter as a version of defaultdict(int)

c = Counter()
c['a'] += 5
c['b'] += 10
c['c'] += 2

c

Counter({'a': 5, 'b': 10, 'c': 2})

In [228]:
for key, value in c.items():
    print(f'{key}: {value}')

a: 5
b: 10
c: 2


In [229]:
# the real way to use Counter is to create it, passing an argument that's iterable

c = Counter('abcababbbccccd')
c

Counter({'a': 3, 'b': 5, 'c': 5, 'd': 1})

In [230]:
c.most_common()  # returns a list of tuples, ordered by most common to least common

[('b', 5), ('c', 5), ('a', 3), ('d', 1)]

In [231]:
c.most_common(2)

[('b', 5), ('c', 5)]

In [232]:
c | c

Counter({'a': 3, 'b': 5, 'c': 5, 'd': 1})

In [234]:
c2 = Counter('cdefgcdcd')

c + c2 

Counter({'a': 3, 'b': 5, 'c': 8, 'd': 4, 'e': 1, 'f': 1, 'g': 1})

In [235]:
c

Counter({'a': 3, 'b': 5, 'c': 5, 'd': 1})

In [236]:
c2

Counter({'c': 3, 'd': 3, 'e': 1, 'f': 1, 'g': 1})

In [238]:
sum([c, c2], Counter())

Counter({'a': 3, 'b': 5, 'c': 8, 'd': 4, 'e': 1, 'f': 1, 'g': 1})

In [239]:
sum([10, 20, 30])

60

In [241]:
sum([10, 20, 30], 5)

65

In [243]:
sum([c, c2], Counter())

Counter({'a': 3, 'b': 5, 'c': 8, 'd': 4, 'e': 1, 'f': 1, 'g': 1})

In [244]:
from collections import OrderedDict

In [245]:
od = OrderedDict(a=1, b=2, c=3)

In [246]:
od

OrderedDict([('a', 1), ('b', 2), ('c', 3)])

In [247]:
# do I need OrderedDict any more?

In [248]:
od.keys()

odict_keys(['a', 'b', 'c'])

In [249]:
od.pop('b')


2

In [250]:
od

OrderedDict([('a', 1), ('c', 3)])

In [251]:
od['b'] = 2
od

OrderedDict([('a', 1), ('c', 3), ('b', 2)])

In [252]:
d = {'a':1, 'b':2, 'c':3}

d == od

True

In [253]:
od2 = OrderedDict(a=1, b=2, c=3)


In [254]:
od

OrderedDict([('a', 1), ('c', 3), ('b', 2)])

In [255]:
od2

OrderedDict([('a', 1), ('b', 2), ('c', 3)])

In [256]:
od == od2

False

In [257]:
od == d

True

In [258]:
od.move_to_end('a')
od

OrderedDict([('c', 3), ('b', 2), ('a', 1)])

# Functions

In [259]:
s = 'abcd'
x = len(s)

x

4

In [260]:
type(x)

int

In [261]:
s = 'abcd'
x = s.upper()

x

'ABCD'

In [262]:
s = 'AbCd'
s.upper()

'ABCD'

In [263]:
s.lower()

'abcd'

In [264]:
s.swapcase()

'aBcD'

In [265]:
x = s.upper  # no parentheses: x now refers to the method s.upper, without calling it

x

<function str.upper()>

In [267]:
x()

'ABCD'

In [268]:
d = {'a':1, 'b':2, 'c':3}

for key, value in d.items():    # the result of calling d.items() is indeed iterable -- a view that yields (key, value) tuples
    print(f'{key}: {value}')

a: 1
b: 2
c: 3


In [269]:
d = {'a':1, 'b':2, 'c':3}

for key, value in d.items:    # the object d.items is *NOT* iterable
    print(f'{key}: {value}')

TypeError: 'builtin_function_or_method' object is not iterable

In [270]:
id(x)

4520243536

In [271]:
type(x)

builtin_function_or_method

In [273]:
id(str.upper)

4476537088

In [274]:
# functions have attributes, just like all other objects!
# you can think of attributes as a per-object namespace, which we access using a dot (.)

In [275]:
def hello():
    print(f'Hello!')

In [276]:
hello()

Hello!


In [277]:
hello('world')

TypeError: hello() takes 0 positional arguments but 1 was given

In [278]:
hello.__code__.co_argcount  # this is how many positional parameters our function defines

0

In [279]:
def hello():
    print(f'Hello!')

In [280]:
x = hello()

Hello!


In [281]:
print(x)

None
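Every Python function returns a value. If a function ends without a `return` statement, or uses a bare `return`, Python returns `None` for you, which is why `x` is `None` above. A small sketch:

```python
def no_return():
    pass              # falls off the end of the function

def bare_return():
    return            # return with no value

def explicit_none():
    return None       # spelled out explicitly

for func in (no_return, bare_return, explicit_none):
    print(func.__name__, func() is None)   # True for all three
```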


In [282]:
import dis  # the disassembler module in Python

In [283]:
dis.dis(hello)

  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE


In [284]:
def myfunc():
    pass

In [285]:
dis.dis(myfunc)

  2           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE


In [286]:
myfunc.__code__.co_consts

(None,)

In [287]:
hello.__code__.co_consts

(None, 'Hello!')

In [288]:
def hello():
    return f'Hello!'

In [289]:
dis.dis(hello)

  2           0 LOAD_CONST               1 ('Hello!')
              2 RETURN_VALUE


In [290]:
def hello(name):
    return f'Hello, {name}!'

In [291]:
hello('world')

'Hello, world!'

In [292]:
hello('Reuven')

'Hello, Reuven!'

In [293]:
dis.dis(hello)

  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               2 ('!')
              8 BUILD_STRING             3
             10 RETURN_VALUE


In [294]:
hello()    # no arguments

TypeError: hello() missing 1 required positional argument: 'name'

In [295]:
hello.__code__.co_argcount

1

In [296]:
hello.__code__.co_varnames

('name',)
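Note that `co_varnames` lists all local variable names, with the parameters first, while `co_argcount` counts only the parameters. A sketch with a hypothetical `greet` function that also has an ordinary local variable:

```python
def greet(name, punctuation):
    message = f'Hello, {name}{punctuation}'   # a local variable, not a parameter
    return message

code = greet.__code__
print(code.co_argcount)    # 2: name and punctuation
print(code.co_varnames)    # ('name', 'punctuation', 'message')
print(code.co_nlocals)     # 3: parameters count as local variables too
```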

# Two kinds of arguments in Python

Python needs to know how to map our arguments onto parameters. There are two ways to do that:

- Positional arguments: Python takes arguments, in order, and assigns them to the parameters, in the same order.
- Keyword arguments: These arguments look like `name=value`. Python assigns to the parameters according to their names, not according to their positions.
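The standard library's `inspect` module can show this mapping without calling the function. `Signature.bind` maps arguments onto parameters the same way a call would, and raises `TypeError` if they don't fit. A sketch using a two-parameter `add` function:

```python
import inspect

def add(first, second):
    return first + second

sig = inspect.signature(add)

# Positional, keyword, and mixed calls all produce the same mapping
for bound in (sig.bind(10, 3),
              sig.bind(first=10, second=3),
              sig.bind(10, second=3)):
    print(dict(bound.arguments))     # {'first': 10, 'second': 3}

# Keyword arguments can appear in any order
print(dict(sig.bind(second=3, first=10).arguments))   # {'first': 10, 'second': 3}
```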



In [297]:
def add(first, second):
    return first + second

In [298]:
add(10, 3)   # positional arguments


# args     10     3
# params first  second

13

In [299]:
add(first=10, second=3)     # keyword arguments

# Python looks:
# params   first second
#           10    3

13

In [300]:
# different order!

add(second=10, first=3)     # keyword arguments

# Python looks:
# params   first second
#            3     10

13

In [301]:
# one positional and one keyword?

add(10, second=3)

13

In [302]:
# one keyword, then one positional?

add(first=10, 3)  # this won't work -- all positional arguments must come before all keyword arguments.

SyntaxError: positional argument follows keyword argument (1306189306.py, line 3)
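The rule above is enforced when the code is compiled, before anything runs. A related mistake, passing the same parameter both positionally and by keyword, is only caught when the call actually happens, as a `TypeError`. A minimal sketch:

```python
def add(first, second):
    return first + second

# Keyword-before-positional is rejected by the compiler itself
try:
    compile('add(first=10, 3)', '<example>', 'eval')
    syntax_msg = None
except SyntaxError as e:
    syntax_msg = e.msg
print('SyntaxError:', syntax_msg)

# Giving 'first' twice (positionally and by keyword) fails at call time
try:
    add(10, first=3)
    type_msg = None
except TypeError as e:
    type_msg = str(e)
print('TypeError:', type_msg)    # add() got multiple values for argument 'first'
```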

In [303]:
hello('world')

'Hello, world!'

In [304]:
hello(5)

'Hello, 5!'

In [305]:
hello([10, 20, 30])

'Hello, [10, 20, 30]!'

In [306]:
hello(hello)

'Hello, <function hello at 0x10d25bb50>!'

In [307]:
# type hints aka type annotations might help?

def hello(name:str):
    return f'Hello, {name}!'

In [308]:
hello.__annotations__

{'name': str}

In [309]:
hello(5)

'Hello, 5!'

In [310]:
hello(hello)

'Hello, <function hello at 0x10d259f30>!'

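Python is a *dynamic* language. Objects have types, but variables don't, which means that any variable can, at any time, refer to a value of any type. Type annotations such as `name:str` don't change this: Python stores them in `__annotations__`, but doesn't check them when the function is called, which is why `hello(5)` still works. This is a feature, not a bug! We can write more flexible, expressive software this way.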
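If you do want a runtime type check, you have to write it yourself. A sketch contrasting an annotated `hello` with a hypothetical `strict_hello` that checks its argument explicitly with `isinstance`:

```python
def hello(name: str) -> str:
    return f'Hello, {name}!'

print(hello.__annotations__)   # {'name': <class 'str'>, 'return': <class 'str'>}
print(hello(5))                # Hello, 5!  (the annotation is not enforced)

def strict_hello(name: str) -> str:
    # Hypothetical variant that enforces the type itself
    if not isinstance(name, str):
        raise TypeError(f'name must be a str, not {type(name).__name__}')
    return f'Hello, {name}!'

try:
    strict_hello(5)
    error_message = None
except TypeError as e:
    error_message = str(e)
print(error_message)           # name must be a str, not int
```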



# Exercise: `firstlast`

1. Write a function, `firstlast`, that takes any sequence (string, list, tuple) of any length.
2. If the sequence is empty, return the sequence.
3. Otherwise, return a 2-element sequence of the same type you were passed. The first element in the output will be the first element from the input sequence, and the second element in the output will be the final element in the input sequence.

```python
firstlast('abcde')          # 'ae'
firstlast([10, 20, 30])     # [10, 30]
firstlast((100, 200, 300))  # (100, 300)
```

In [311]:
def firstlast(s):
    if not s:
        return s
    
    return s[0] + s[-1]   # bug: for lists/tuples of numbers, + adds the two elements (10 + 30 == 40)

print(firstlast('abcde'))          # 'ae'
print(firstlast([10, 20, 30]))     # [10, 30]
print(firstlast((100, 200, 300)))  # (100, 300)


ae
40
400


In [312]:
def firstlast(s):
    if not s:
        return s
    
    return type(s)([s[0], s[-1]])   # works for lists and tuples, but str(['a', 'e']) is the string "['a', 'e']"

print(firstlast('abcde'))          # 'ae'
print(firstlast([10, 20, 30]))     # [10, 30]
print(firstlast((100, 200, 300)))  # (100, 300)


['a', 'e']
[10, 30]
(100, 300)


In [314]:
def firstlast(s):
    if not s:
        return s
    
    # slices always return the same type as we got!
    return s[:1] + s[-1:]

print(firstlast('abcde'))          # 'ae'
print(firstlast([10, 20, 30]))     # [10, 30]
print(firstlast((100, 200, 300)))  # (100, 300)


ae
[10, 30]
(100, 300)
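Slicing never raises `IndexError`, and slicing an empty sequence just returns an empty sequence of the same type. That makes the `if not s` check arguably optional. One possible simplification, offered as a variant rather than the definitive solution:

```python
def firstlast(s):
    # s[:1] and s[-1:] are each sequences of the same type as s,
    # and both are empty when s is empty, so no special case is needed
    return s[:1] + s[-1:]

print(firstlast('abcde'))          # ae
print(firstlast([10, 20, 30]))     # [10, 30]
print(firstlast((100, 200, 300)))  # (100, 300)
print(repr(firstlast('')))         # ''
print(firstlast([7]))              # [7, 7]: a single element is both first and last
```

One small difference: for an empty input, the original version returns the very same object, while this one returns a new (equal) empty sequence.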


In [315]:
s = 'abcd'
s.upper()

'ABCD'

In [316]:
# that was the same as saying
str.upper(s)

'ABCD'
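The same equivalence holds for other methods: `obj.method(arg)` behaves essentially like `type(obj).method(obj, arg)`, with the object passed as the first argument. A bound method also remembers its object in `__self__`. A quick check:

```python
s = 'abcd'
bound = s.upper                      # a bound method: it remembers s
print(bound.__self__ is s)           # True
print(bound() == str.upper(s))       # True

mylist = [10, 20]
mylist.append(30)                    # the usual way
list.append(mylist, 40)              # the same call, via the class
print(mylist)                        # [10, 20, 30, 40]
```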

In [317]:
# mapping arguments to parameters

# Parameter types

1. Regular parameters (mandatory, positional or keyword)

In [319]:
# optional parameters -- they have a default value

def add(first, second=10):
    return first + second

In [320]:
add(3, 5)   # 2 arguments (3, 5) mapped to two parameters (first, second)

8

In [321]:
add.__code__.co_argcount

2

In [322]:
add.__code__.co_varnames

('first', 'second')

In [325]:
add(3)   # 1 argument (3,) mapped to two parameters (first, second)

# before the function body even runs, Python takes the default value from add.__defaults__ and assigns it to second

13

In [324]:
add.__defaults__   # this is where Python stores the defaults for our function

(10,)
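Because defaults are stored in `__defaults__` when the `def` statement runs, a default expression is evaluated once, at definition time, not on each call. A sketch with a hypothetical global variable `start`:

```python
start = 10

def add(first, second=start):   # the current value of start is captured here, once
    return first + second

start = 100                     # rebinding start later doesn't affect the default

print(add.__defaults__)         # (10,)
print(add(3))                   # 13
```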

# Parameter types

1. Regular parameters (mandatory, positional or keyword)
2. Optional parameters (have a default, positional or keyword)
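One rule connects these two kinds: in a function definition, a parameter without a default can't come after one that has a default, because Python couldn't tell which positional arguments should fill which parameters. The check happens at compile time, as this small sketch using `compile` shows:

```python
good = 'def add(first, second=10):\n    return first + second'
bad = 'def add(first=10, second):\n    return first + second'

good_code = compile(good, '<good>', 'exec')   # compiles fine

try:
    compile(bad, '<bad>', 'exec')
    bad_compiled = True
except SyntaxError as e:
    bad_compiled = False
    print('SyntaxError:', e.msg)   # exact wording varies between Python versions
```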

In [None]:
def add_one(x

mylist = [10, 20, 30]

