# Python review of concepts

This review points out useful aspects of Python that you may have glossed over. It assumes you already know Python fairly well.

## Python as a language

### Why Python? 

- Huge community - especially in data science and ML 
- Easy to learn 
- Batteries included 
- Extensive 3rd party libraries 
- Widely used in both industry and academia 
- Most important “glue” language bridging multiple communities

In [1]:
import __hello__

### Versions 

- Only use Python 3 (current release version is 3.8, container is 3.7) 
- Do not use Python 2

In [2]:
import sys

In [3]:
sys.version

### Multi-paradigm 

#### Procedural

In [4]:
x = []
for i in range(5):
    x.append(i*i)
x

[0, 1, 4, 9, 16]

#### Functional

In [5]:
list(map(lambda x: x*x, range(5)))

[0, 1, 4, 9, 16]

#### Object-oriented 

In [6]:
class Robot:
    def __init__(self, name, function):
        self.name = name
        self.function = function
        
    def greet(self):
        return f"I am {self.name}, a {self.function} robot!"

In [7]:
fido = Robot('roomba', 'vacuum cleaner')

In [8]:
fido.name

'roomba'

In [9]:
fido.function

'vacuum cleaner'

In [10]:
fido.greet()

'I am roomba, a vacuum cleaner robot!'

### Dynamic typing 

#### Complexity of a + b 

In [11]:
1 + 2.3

3.3

In [12]:
type(1), type(2.3)

(int, float)

In [13]:
'hello' + ' world'

'hello world'

In [14]:
[1,2,3] + [4,5,6]

[1, 2, 3, 4, 5, 6]

In [15]:
import numpy as np

np.arange(3) + 10

array([10, 11, 12])
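The cells above all use the same `+`, yet each does something different, because Python resolves the operation from the operand types at run time. A minimal sketch of that dispatch follows; the `Vec2` class is hypothetical and not part of the original notebook.

```python
class Vec2:
    """Hypothetical 2-D vector used only to illustrate operator dispatch."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for `self + other`.
        if isinstance(other, Vec2):
            return Vec2(self.x + other.x, self.y + other.y)
        if isinstance(other, (int, float)):
            return Vec2(self.x + other, self.y + other)
        # Signal that this type is unsupported; Python then tries other.__radd__.
        return NotImplemented

    def __radd__(self, other):
        # Called for `other + self` when other's __add__ returns NotImplemented.
        return self.__add__(other)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"


a = Vec2(1, 2) + Vec2(3, 4)
b = 10 + Vec2(1, 2)   # int.__add__ gives up, so Vec2.__radd__ is used
print(a, b)           # Vec2(4, 6) Vec2(11, 12)
```

Returning `NotImplemented` (rather than raising) lets Python fall back to the other operand and finally raise `TypeError` if neither side can handle the operation.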

### Several Python implementations! 

- CPython 
- PyPy 
- IronPython 
- Jython

### Global interpreter lock (GIL) 

- Only applies to CPython
- Threads vs processes 
- Avoid threads in general 
- Performance not predictable

In [16]:
from concurrent.futures import ThreadPoolExecutor

In [17]:
def f(n):
    x = np.random.uniform(0,1,n)
    y = np.random.uniform(0,1,n)
    count = 0
    for i in range(n):
        if x[i]**2 + y[i]**2 < 1:
            count += 1
    return count*4/n

In [18]:
n = 100000
niter = 4

In [19]:
%%time

[f(n) for i in range(niter)]

CPU times: user 521 ms, sys: 7.74 ms, total: 529 ms
Wall time: 510 ms


[3.14496, 3.14336, 3.141, 3.13708]

In [20]:
%%time

with ThreadPoolExecutor(4) as pool:
    xs = list(pool.map(f, [n]*niter))
xs

CPU times: user 525 ms, sys: 4.53 ms, total: 530 ms
Wall time: 513 ms


[3.13488, 3.13636, 3.13952, 3.13908]
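The two timings above are nearly identical because `f` is a CPU-bound pure-Python loop, and the GIL lets only one thread execute Python bytecode at a time. Threads can still help when tasks spend most of their time waiting. A minimal sketch, assuming `time.sleep` as a stand-in for blocking I/O (the `fake_io` helper is hypothetical):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


delays = [0.1] * 4

# Serial: the waits add up.
start = time.perf_counter()
serial = [fake_io(d) for d in delays]
t_serial = time.perf_counter() - start

# Threaded: the waits can overlap.
start = time.perf_counter()
with ThreadPoolExecutor(4) as pool:
    threaded = list(pool.map(fake_io, delays))
t_threaded = time.perf_counter() - start

print(f"serial: {t_serial:.2f}s, threaded: {t_threaded:.2f}s")
```

On most machines the threaded run should take roughly as long as a single wait, but exact timings vary from run to run.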

## Coding in Python

In [21]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Coding conventions 

- PEP 8 
- Avoid magic numbers 
- Avoid copy and paste 
- Extract common functionality into functions

[Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)

### Data types 

- Integers  
    - Arbitrary precision 
    - Integer division operator 
    - Base conversion 
    - Check if integer 

In [22]:
import math

In [23]:
n = math.factorial(100)

In [24]:
n

93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

In [25]:
f'{n:,}'

'93,326,215,443,944,152,681,699,238,856,266,700,490,715,968,264,381,621,468,592,963,895,217,599,993,229,915,608,941,463,976,156,518,286,253,697,920,827,223,758,251,185,210,916,864,000,000,000,000,000,000,000,000'

In [26]:
h = math.sqrt(3**2 + 4**2)

In [27]:
h

5.0

In [28]:
h.is_integer()

True
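The integer division operator and base conversion listed above are not exercised by the cells, so here is a short sketch using only built-ins:

```python
# Integer (floor) division and remainder
print(7 // 2)         # 3
print(-7 // 2)        # -4, floor division rounds toward negative infinity
print(divmod(7, 2))   # (3, 1), quotient and remainder together

# Converting an integer to a string in another base
print(bin(10), oct(10), hex(255))   # 0b1010 0o12 0xff
print(f"{255:b} {255:x}")           # 11111111 ff (no prefix)

# Parsing a string in a given base back to an integer
print(int('ff', 16), int('1010', 2))   # 255 10
```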

- Floats 
    - Checking for equality 
    - Catastrophic cancellation 
- Complex

In [29]:
x = np.arange(9).reshape(3,3)
x = x / x.sum(axis=0)
λ = np.linalg.eigvals(x)

In [30]:
λ[0]

1.0

In [31]:
λ[0] == 1

True

In [32]:
math.isclose(λ[0], 1)

True

In [33]:
def var(xs):
    """Returns variance of sample data."""
    
    n = 0
    s = 0
    ss = 0

    for x in xs:
        n += 1
        s += x
        ss += x*x

    # ss and (s*s)/n are huge and nearly equal, so their difference loses precision
    v = (ss - (s*s)/n)/(n-1)
    return v

In [34]:
xs = np.random.normal(1e9, 1, int(1e6))

In [35]:
var(xs)

28454.18679018679

In [36]:
np.var(xs)

1.0016197834349219
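The large error from `var` is catastrophic cancellation: it subtracts two huge, nearly equal numbers. One common remedy is Welford's one-pass update, sketched here with the standard library only. The name `var_welford` is illustrative, and `statistics.variance` serves as the reference value.

```python
import random
import statistics


def var_welford(xs):
    """Sample variance via Welford's one-pass update (illustrative sketch)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


random.seed(0)
data = [random.gauss(1e9, 1) for _ in range(10_000)]

print(var_welford(data))          # close to 1
print(statistics.variance(data))  # reference computed by the standard library
```

The update only ever works with deviations from the running mean, which are small, so it avoids the difference of two large sums.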

- Boolean 
    - What evaluates as False? 

In [37]:
stuff = [[], [1], {},'', 'hello', 0, 1, 1==1, 1==2]
for s in stuff:
    if s:
        print(f'{s} evaluates as True')
    else:
        print(f'{s} evaluates as False')

[] evaluates as False
[1] evaluates as True
{} evaluates as False
 evaluates as False
hello evaluates as True
0 evaluates as False
1 evaluates as True
True evaluates as True
False evaluates as False


- String 
    - Unicode by default 
    - b, r, f strings

In [38]:
u'\u732b'

'猫'
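The `b` and `r` prefixes from the list above can be illustrated briefly (f-strings are covered next). This is a small sketch; the Windows-style path is only an example value.

```python
# Raw strings keep backslashes literally, handy for regexes and Windows paths
raw = r'C:\new\table'
print(raw)                     # C:\new\table
print(len('\n'), len(r'\n'))   # 1 2

# str is Unicode text; bytes is raw data. Encode and decode to convert.
data = '猫'.encode('utf-8')
print(data)                    # b'\xe7\x8c\xab'
print(len('猫'), len(data))     # 1 3
print(data.decode('utf-8'))    # 猫
```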

#### String formatting

- Learn to use the f-string.

In [39]:
import string

In [40]:
char = 'e'
pos = string.ascii_lowercase.index(char) + 1
f"The letter {char} has position {pos} in the alphabet"

'The letter e has position 5 in the alphabet'

In [41]:
n = int(1e9)
f"{n:,}"

'1,000,000,000'

In [42]:
x = math.pi

In [43]:
f"{x:8.2f}"

'    3.14'

In [44]:
import datetime
now = datetime.datetime.now()
now

datetime.datetime(2021, 8, 25, 21, 10, 17, 113586)

In [45]:
f"{now:%Y-%m-%d %H:%M}"

'2021-08-25 21:10'

### Data structures 

- Immutable - string, tuple 
- Mutable - list, set, dictionary 
- Collections module 
- heapq 

In [46]:
import collections

[x for x in dir(collections) if not x.startswith('_')]

['ChainMap',
 'Counter',
 'OrderedDict',
 'UserDict',
 'UserList',
 'UserString',
 'abc',
 'defaultdict',
 'deque',
 'namedtuple']
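A short tour of a few of these containers, plus `heapq`, may help. This is only a sketch on made-up data; the variable names are illustrative.

```python
import heapq
from collections import Counter, defaultdict, deque

words = 'the quick brown fox jumps over the lazy dog the end'.split()

# Counter: tally hashable items
counts = Counter(words)
print(counts.most_common(1))   # [('the', 3)]

# defaultdict: missing keys get a default value (here an empty list)
by_len = defaultdict(list)
for w in words:
    by_len[len(w)].append(w)
print(by_len[5])               # ['quick', 'brown', 'jumps']

# deque: fast appends and pops at both ends; maxlen keeps a sliding window
window = deque(maxlen=3)
for i in range(5):
    window.append(i)
print(list(window))            # [2, 3, 4]

# heapq: a priority queue stored in a plain list
heap = [5, 1, 4, 2, 3]
heapq.heapify(heap)
smallest = heapq.heappop(heap)
print(smallest)                # 1
```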

### Functions 

- \*args, \*\*kwargs 
- Care with mutable default values 
- First class objects 
- Anonymous functions 
- Decorators

In [47]:
def f(*args, **kwargs):
    print(f"args = {args}") # in Python 3.8, you can just write f'{args = }'
    print(f"kwargs = {kwargs}")

In [48]:
f(1,2,3,a=4,b=5,c=6)

args = (1, 2, 3)
kwargs = {'a': 4, 'b': 5, 'c': 6}


In [49]:
def g(a, xs=[]):
    xs.append(a)
    return xs

In [50]:
g(1)

[1]

In [51]:
g(2)

[1, 2]
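The second call returns `[1, 2]` because the default list is created once, when `g` is defined, and then shared across calls. The usual idiom is a `None` sentinel; `g_fixed` below is an illustrative name.

```python
def g_fixed(a, xs=None):
    """Like g, but builds a fresh list whenever the caller does not pass one."""
    if xs is None:
        xs = []
    xs.append(a)
    return xs


print(g_fixed(1))        # [1]
print(g_fixed(2))        # [2]
print(g_fixed(3, [0]))   # [0, 3]
```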

In [52]:
h = lambda x, y, z: x**2 + y**2 + z**2

In [53]:
h(1,2,3)

14

In [54]:
from functools import lru_cache

In [55]:
def fib(n):
    print(n, end=', ')
    if n <= 1:
        return n
    else:
        return fib(n-2) + fib(n-1)

In [56]:
fib(10)

10, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 9, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 

55

In [57]:
@lru_cache(maxsize=100)
def fib_cache(n):
    print(n, end=', ')
    if n <= 1:
        return n
    else:
        return fib_cache(n-2) + fib_cache(n-1)

In [58]:
fib_cache(10)

10, 8, 6, 4, 2, 0, 1, 3, 5, 7, 9, 

55
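`lru_cache` is a decorator: a function that takes a function and returns a replacement for it. A minimal sketch of writing one yourself, using a hypothetical `count_calls` decorator that records how often the wrapped function runs:

```python
import functools


def count_calls(func):
    """Hypothetical decorator that counts calls to func."""
    @functools.wraps(func)   # preserve func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def slow_fib(n):
    # Recursive calls go through the decorated name, so they are counted too.
    return n if n <= 1 else slow_fib(n - 2) + slow_fib(n - 1)


result = slow_fib(10)
print(result, slow_fib.calls)   # 55 177
```

The `@count_calls` line is shorthand for `slow_fib = count_calls(slow_fib)`, which works because functions are first-class objects.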

### Classes 

- Key idea is encapsulation into objects  
- Everything in Python is an object 
- Attributes and methods 
- What is self? 
- Special methods - double underscore methods 
- Avoid complex inheritance schemes - prefer composition 
- Learn “design patterns” if interested in OOP

In [59]:
(3.0).is_integer()

True

In [60]:
'hello world'.title()

'Hello World'

In [61]:
class Student:
    def __init__(self, first, last):
        self.first = first
        self.last = last
        
    @property
    def name(self):
        return f'{self.first} {self.last}'    

In [62]:
s = Student('Santa', 'Claus')

In [63]:
s.name

'Santa Claus'
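Special (double underscore) methods are how a class plugs into built-in syntax such as `len(x)`, `x[i]`, and `in`. The sketch below also shows composition: the hypothetical `Roster` class holds a list rather than inheriting from `list`.

```python
class Roster:
    """Hypothetical container that *has* a list instead of subclassing list."""

    def __init__(self, names=()):
        self._names = list(names)

    def add(self, name):
        self._names.append(name)

    def __len__(self):             # len(roster)
        return len(self._names)

    def __getitem__(self, i):      # roster[i], and iteration as a fallback
        return self._names[i]

    def __contains__(self, name):  # name in roster
        return name in self._names

    def __repr__(self):            # what the REPL and print() show
        return f"Roster({self._names!r})"


roster = Roster(['Ann', 'Bob'])
roster.add('Charles')
print(len(roster), roster[0], 'Bob' in roster)   # 3 Ann True
print(roster)                                    # Roster(['Ann', 'Bob', 'Charles'])
print([name.upper() for name in roster])         # ['ANN', 'BOB', 'CHARLES']
```

Because `Roster` exposes only the operations it chooses to, callers cannot accidentally rely on every `list` method.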

### Enums

Use enums for readability when you have a discrete set of constants.

In [64]:
from enum import Enum

In [65]:
class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7

In [66]:
for day in Day:
    print(day)

Day.MON
Day.TUE
Day.WED
Day.THU
Day.FRI
Day.SAT
Day.SUN
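Beyond iteration, enum members can be looked up by value or by name, and they compare by identity rather than by their underlying value. A short sketch that repeats the `Day` definition so it runs on its own:

```python
from enum import Enum


class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7


today = Day(3)                    # look up by value
print(today.name, today.value)    # WED 3
print(Day['SAT'] is Day.SAT)      # True, look up by name
print(Day.MON == 1)               # False, plain Enum members are not ints

weekend = {Day.SAT, Day.SUN}      # members are hashable
print(today in weekend)           # False
```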


### NamedTuple

In [67]:
from collections import namedtuple

In [68]:
Student = namedtuple('Student', ['name', 'email', 'age', 'gpa', 'species'])

In [69]:
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', 23, 3.4, 'Human')

In [70]:
abe.species

'Human'

In [71]:
abe[1:4]

('abe.lincoln@gmail.com', 23, 3.4)
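A namedtuple is still a tuple, so it is immutable and supports unpacking. A small sketch with a hypothetical `Point` type shows what that means in practice:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# Fields cannot be reassigned
try:
    p.x = 10
except AttributeError as err:
    print('cannot assign:', err)

# _replace returns a modified copy; the original is untouched
q = p._replace(x=10)
print(p, q)          # Point(x=1, y=2) Point(x=10, y=2)

# Tuple behaviour still works
x, y = q
print(x + y)         # 12
print(dict(q._asdict()))
```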

### Data Classes

Simplifies creation and use of classes for data records. 

Note: a namedtuple serves a similar function but is immutable.

In [72]:
from dataclasses import dataclass

In [73]:
@dataclass
class Student:
    name: str
    email: str
    age: int
    gpa: float
    species: str = 'Human'

In [74]:
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', age=23, gpa=3.4)

In [75]:
abe

Student(name='Abraham Lincoln', email='abe.lincoln@gmail.com', age=23, gpa=3.4, species='Human')

In [76]:
abe.email

'abe.lincoln@gmail.com'

In [77]:
abe.species

'Human'

**Note**

The type annotations are informative only. Python does *not* enforce them.

In [78]:
Student(*'abcde')

Student(name='a', email='b', age='c', gpa='d', species='e')
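If you do want checks, one option is to validate in `__post_init__`, which a dataclass calls right after the generated `__init__`. The sketch below uses a hypothetical `CheckedStudent` class and also sets `frozen=True` to make instances read-only; the specific checks are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)   # frozen=True makes instances immutable
class CheckedStudent:
    name: str
    age: int
    gpa: float

    def __post_init__(self):
        # Runs after the generated __init__; annotations alone check nothing.
        if not isinstance(self.age, int):
            raise TypeError(f"age must be int, got {type(self.age).__name__}")
        if not isinstance(self.gpa, (int, float)):
            raise TypeError(f"gpa must be a number, got {type(self.gpa).__name__}")


ok = CheckedStudent('Ann', 20, 3.5)
print(ok)   # CheckedStudent(name='Ann', age=20, gpa=3.5)

try:
    CheckedStudent(*'abc')
except TypeError as err:
    print('rejected:', err)   # rejected: age must be int, got str
```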

### Imports, modules and namespaces 

- A namespace is basically just a dictionary 
- LEGB 
- Avoid polluting the global namespace

In [79]:
import builtins

[x for x in dir(builtins) if x[0].islower()][:8]

['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray']

In [80]:
x1 = 23

def f1(x2):
    print(locals())
    # Seen from inside g: x1 is global (G), x2 is enclosing (E), x3 is local (L)
    def g(x3):
        print(locals())
        return x3 + x2 + x1 
    return g

In [81]:
x = 23

def f2(x):
    print(locals())
    def g(x):
        print(locals())
        return x 
    return g

In [82]:
g1 = f1(3)
g1(2)

{'x2': 3}
{'x3': 2, 'x2': 3}


28

In [83]:
g2 = f2(3)
g2(2)

{'x': 3}
{'x': 2}


2
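Reading a name from an enclosing scope works automatically, as above, but rebinding it requires `nonlocal` (or `global` for module-level names). A minimal sketch with hypothetical counter functions:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count   # rebind the enclosing (E) variable, not a new local
        count += 1
        return count

    return increment


def make_broken_counter():
    count = 0

    def increment():
        count += 1       # assignment makes count local, so it is read before set
        return count

    return increment


counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3

try:
    make_broken_counter()()
except UnboundLocalError as err:
    print('UnboundLocalError:', err)
```

Each call to `make_counter` creates a separate `count`, so independent counters do not interfere with each other.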

### Loops 

- Prefer vectorization unless using numba 
- Difference between continue and break 
- Avoid infinite loops 
- Comprehensions and generator expressions

In [84]:
import string

In [85]:
{char: ord(char) for char in string.ascii_lowercase}

{'a': 97,
 'b': 98,
 'c': 99,
 'd': 100,
 'e': 101,
 'f': 102,
 'g': 103,
 'h': 104,
 'i': 105,
 'j': 106,
 'k': 107,
 'l': 108,
 'm': 109,
 'n': 110,
 'o': 111,
 'p': 112,
 'q': 113,
 'r': 114,
 's': 115,
 't': 116,
 'u': 117,
 'v': 118,
 'w': 119,
 'x': 120,
 'y': 121,
 'z': 122}
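The dictionary comprehension above covers one bullet; the difference between `continue` and `break`, and between a list comprehension and a generator expression, can be sketched as follows:

```python
odds = []
for i in range(10):
    if i % 2 == 0:
        continue   # skip the rest of this iteration, move on to the next i
    if i > 6:
        break      # leave the loop entirely
    odds.append(i)
print(odds)        # [1, 3, 5]

# A list comprehension builds the whole list in memory
squares_list = [i * i for i in range(5)]

# A generator expression produces values lazily and can be consumed only once
squares_gen = (i * i for i in range(5))
first_total = sum(squares_gen)
second_total = sum(squares_gen)   # already exhausted
print(squares_list, first_total, second_total)   # [0, 1, 4, 9, 16] 30 0
```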

### Iterators and generators 

- The iterator protocol
    - `__iter__` and `__next__`
    - iter()
    - next()
- What happens in a for loop
- Generators with `yield` and `yield from`

In [86]:
class Iterator:
    """A silly class that implements the Iterator protocol and Strategy pattern.
    
    start = start of range to apply func to
    stop = end of range to apply func to
    """
    def __init__(self, start, stop, func):
        self.start = start
        self.stop = stop
        self.func = func
        
    def __iter__(self):
        self.n = self.start
        return self
    
    def __next__(self):
        if self.n >= self.stop:
            raise StopIteration
        else:
            x = self.func(self.n)
            self.n += 1
            return x

In [87]:
sq = Iterator(0, 5, lambda x: x*x)

In [88]:
list(sq)

[0, 1, 4, 9, 16]
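To see what happens in a `for` loop, it can help to write the loop out by hand with `iter()` and `next()`. This is a rough equivalent for illustration, not the interpreter's literal implementation:

```python
xs = ['a', 'b', 'c']

# Roughly what `for x in xs: seen.append(x)` does
seen = []
iterator = iter(xs)            # calls xs.__iter__()
while True:
    try:
        x = next(iterator)     # calls iterator.__next__()
    except StopIteration:      # raised when the iterator is exhausted
        break
    seen.append(x)

print(seen)                        # ['a', 'b', 'c']
print(iter(iterator) is iterator)  # True, an iterator's __iter__ returns itself
print(next(iterator, 'done'))      # done, the default avoids StopIteration
```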

### Generators

Generators look like functions, but they are lazy: values are produced one at a time, only when requested.

In [89]:
def cycle1(xs, n):
    """Cycles through values in xs n times."""
    
    for i in range(n):
        for x in xs:
            yield x

In [90]:
list(cycle1([1,2,3], 4))

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [91]:
for x in cycle1(['ann', 'bob', 'stop', 'charles'], 1000):
    if x == 'stop':
        break
    else:
        print(x)

ann
bob


In [92]:
def cycle2(xs, n):
    """Cycles through values in xs n times."""
    
    for i in range(n):
        yield from xs

In [93]:
list(cycle2([1,2,3], 4))

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

Because they are lazy, generators can be used for infinite streams.

In [94]:
def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

In [95]:
for n in fib():
    if n > 100:
        break
    print(n, end=', ')

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 

You can even slice infinite generators. More when we cover functional programming.

In [96]:
import itertools as it

In [97]:
list(it.islice(fib(), 5, 10))

[8, 13, 21, 34, 55]