***
<img src=https://www.python.org/static/img/python-logo.png align=right>
    
# Mastering Python 
Marcel Haas, June 2022

<br>
<br>
<br>
<br>
<br>
<br>



# Introductions

<img src=figures/Me_slide.PNG align=left>


Are you familiar with:
- 1: Variables, lists, dictionaries, tuples
- 2: Numpy, pandas, scikit-learn
- 4: Classes, functions, objects
- 8: Closures, multiple inheritance

# Table of contents

- *Introductions*

- The Python data model

- Measuring and improving performance

- Iterators, generators and decorators

- Object behavior and special methods

- Class inheritance to customize functionality



<font size="3">
<br>
<br>
<br>
<i>Disclaimer</i>: I do not always stick to conventions (like importing only at the top of the notebook). This is deliberate: it keeps things clear and makes sure each small bit of code runs without running the whole notebook first. </font>

<img src=https://cdn.onlinewebfonts.com/svg/img_472974.png align=right width=300 style=display:inline>

# What is *not* discussed

*(but might be interesting when talking about performance)*

- Visualization
- Multi-threading and multi-processing
- Debugging
- Best practices (testing, documentation, version control, packaging, environment management etc.)

# Resources

These were resources that did it *for me*. Just ingest a lot, if you like that sort of thing.

<img src=https://images-na.ssl-images-amazon.com/images/I/41R%2BfNX-akL._SX379_BO1,204,203,200_.jpg align=right width=200 style=display:inline>

- Fluent Python by Luciano Ramalho
- Python Data Science Handbook by Jake VanderPlas (whole book as notebooks on [github](https://github.com/jakevdp/PythonDataScienceHandbook))
- PyCon/PyData/SciPy YouTube videos (really!)
- Dave Beazley's "Advanced Python Mastery" course.



<img src=http://4.bp.blogspot.com/-IRcqITPir5Q/T7KNf7yD4TI/AAAAAAAAAMk/caAJlEnY0IE/s320/Capture3.PNG align=right width=400 style=display:inline>

# The Python Data Model



In [None]:
print("Hello World!")

It is clear that `print()` is a function that can take strings as an argument.

How about the string itself?

In [None]:
print(type("Hello World!"))
'  '.join(dir("Hello world!"))   # dir lists all attributes and methods of an object

That string is an *object* (instance of a *class*) in itself.


Objects, or class instances, usually come with *attributes* and *methods* defined on them.

<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/CPT-OOP-objects_and_classes_-_attmeth.svg/1280px-CPT-OOP-objects_and_classes_-_attmeth.svg.png align=center width=400>


- *Attributes* of an object: metadata that describe the object (= class instance)
- *Methods* of an object: functions that are defined in the class body of an object and act on the object itself

## Everything in Python is an object
Therefore: "The Python data model" == "The Python object model"

# The very basis of working with objects: assignment

Assignment does two things:
1. It creates an object (in this case with integer value 2)
2. It assigns variable `a` to that object.

*Note: the object is **not** assigned to variable `a`, but vice versa!*

In [None]:
a = 2

In [None]:
print(a)
print(type(a))

In [None]:
a = [2, 4, 6, 8]

The assignment does several things:
1. It creates the integer objects 2, 4, 6 and 8
2. It creates a list of these
3. It assigns variable `a` to that list

Is anyone surprised by this output?

In [None]:
a = [2, 4, 6, 8]
b = a
b[2] = 100
a

Remember: the assignment creates objects, a container and then assigns variable `a`!

In [None]:
a = [2, 4, 6, 8]
b = a

The second assignment assigns a second variable (named `b`) to the object referenced by `a`.

No objects are copied: `a` and `b` are two labels on one and the same list.

When you assign a new value to `b[2]`, you replace what is stored in slot 2 of *that* list,

so the change is visible through the other label as well (`a[2]`).

In [None]:
b[2] = 100
a

What most people **want** to happen is that the objects are copied, with **new pointers** pointing at **new objects**.

This can be done, but needs to be made explicit:

In [None]:
b = a.copy()
b[1] = print
a[0] = 'New value!'
print(a)
print(b)
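`a.copy()` makes a *shallow* copy: a new outer list whose slots still refer to the same inner objects. With nested lists that difference shows up. A small sketch using the standard-library `copy` module (the variable names are illustrative):

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()          # new outer list, but the same inner lists
deep = copy.deepcopy(nested)     # new outer list AND new inner lists

nested[0].append(99)             # mutate an inner list in place

print(shallow[0])                # [1, 2, 99]: shares its inner list with `nested`
print(deep[0])                   # [1, 2]: fully independent
print(shallow[0] is nested[0], deep[0] is nested[0])   # True False
```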

You might be tricked into thinking that Python (or pandas, or numpy, or ...) will do that for you...



In [None]:
import pandas as pd
df_a = pd.DataFrame({'x':[0, 1, 2], 'y':[110, 121, 130]})
df_b = df_a
df_b.loc[0, 'x'] = 1022   # .loc avoids chained assignment; both names refer to the same DataFrame
df_a
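To get an independent DataFrame, ask for one explicitly with `DataFrame.copy()`, which copies the data by default. A minimal sketch; `df_alias` and `df_copy` are illustrative names:

```python
import pandas as pd

df_a = pd.DataFrame({'x': [0, 1, 2], 'y': [110, 121, 130]})
df_alias = df_a                 # a second name for the same DataFrame
df_copy = df_a.copy()           # an independent DataFrame (deep=True by default)

df_copy.loc[0, 'x'] = 1022      # modifies only the copy
print(df_alias is df_a)         # True
print(df_a.loc[0, 'x'])         # 0: the original is untouched
```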

This all becomes easier when thinking of variables as sticky notes, rather than boxes.

<img src=figures/labels_vs_boxes.png>

In [None]:
charles = {'name':'C.L.D', 'born':1832}
lewis = charles
alex = {'name':'C.L.D', 'born':1832}

In [None]:
alex == charles

In [None]:
alex is charles

In [None]:
lewis is charles

# Quiz time!

In [None]:
a = 2
b = a
b = 6
print(a)

In [None]:
a = 2
b = a
b = print
print(b)

In [None]:
import numpy as np
a = np.array([2, 4, 6])
b = a
b[1] = 1006
print(a)

In [None]:
import numpy as np
a = np.array([2, 4, 6])
b = a
b = 106
print(a)

In [None]:
class ExampleObject():
    def __init__(self):
        self.name = "Guido van Rossum"

a = ExampleObject()
b = a
b.name = "Python Creator"
a.name

# Mutable and immutable types

Some objects, like tuples (but also floats, ints, strings!), are immutable.

**What does this immutability mean?**

The Online Python Tutor can be found at http://pythontutor.com

It lets you execute code line by line and see the effects:

<img src=figures/OPT_assign_tuple_1.png>

<img src=figures/OPT_assign_tuple_2.png>

<img src=figures/OPT_assign_tuple_3.png>

<img src=figures/OPT_assign_tuple_4.png>

In [None]:
b = (1, 's', [10, 20])
b[2].extend([30, 40])
print(b)

In [None]:
b[2] += [50, 60]
b[2] = b[2] + [50, 60]

In [None]:
print(b)

A bit more about nested lists, accessing and assigning.

<img src=figures/OPT_nestedlist_function.png>

<img src=figures/OPT_nestedlist_newassignment.png>

`c` was not referring to a list, but to an object!

`c` is re-assigned to a new object!
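The built-in `id()` returns an object's identity (in CPython, its memory address). That makes the difference between *rebinding a name* and *mutating an object* visible. A small sketch:

```python
n = 2
before = id(n)
n += 1                          # ints are immutable: += binds `n` to a different object
same_int_object = id(n) == before
print(same_int_object)          # False

numbers = [2, 4]
before = id(numbers)
numbers += [6]                  # lists are mutable: += extends the same list in place
same_list_object = id(numbers) == before
print(same_list_object)         # True
print(numbers)                  # [2, 4, 6]
```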

# Dis does this too

In [None]:
import dis
x = [1, 2, 3]
dis.dis('for _ in x: pass')

But in slightly more obscure language, I'm afraid...

# Exercise time!

These exercises do not necessarily connect to the material we just discussed (that would be too easy). They do result in cute, Pythonic code examples. The answers can be loaded by uncommenting the lines in the code cell below. Enjoy!

1. Create two objects and assign them to variables `a` and `b`. Then swap their names: the new `a` should give you what `b` gave you at first, and vice versa.

2. Create a list with 4 or more elements. Assign, in one line, the first and the last to variables `first` and `last` and all other ones to `other`.

3. Transpose this "matrix", without the use of packages like math, numpy and scipy:
```
x = [[31, 17],
     [40, 51],
     [13, 12]]
```
4. [BONUS]: Create a dictionary and a function to decrypt a [Caesar ciphered](https://en.wikipedia.org/wiki/Caesar_cipher) text given a shift in the alphabet. What is the likely shift of this word: `zvsbapvu`?

In [None]:
import os
#to_include = os.path.join('solutions', 'data_constructs.py')
#%load $to_include

# Measuring and improving performance

<img src=https://freesvg.org/img/Performance--cyberang3l.png align=right width=600  style=display:inline>

- IPython magic

- Timing and profiling code

- Vectorizing code for speed




#  IPython magic!

<img src=https://res.cloudinary.com/practicaldev/image/fetch/s--D-aJRe8W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/d7oeukl09h4vemh75dl0.png align=right width=600>

Lines acting on the next line or the whole cell.

Can be, and are, custom made.

Work only in IPython and (therefore) Jupyter.

In [None]:
% # Magic line!

In [None]:
%%  # Cell magic!

In [None]:
%time test = [x + y for x, y in np.random.random(size=(100,2))]

In [None]:
%%timeit                                             # You want only one run? Remove 'it'!
import time
time.sleep(2)
print("Sleeping is awesome.")

In [None]:
# If you only want to know about the run-time of some lines...
import time
x = 0
y = 0
%time x = [time.sleep(i/1000) for i in range(12,144,3)]
print("Let's not time this")
%time print("Let's time this, though!")
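Magics only work in IPython and Jupyter. In a plain Python script, the standard-library `timeit` module does the same job. A minimal sketch (the two statements being timed are just examples):

```python
import timeit

setup = "values = list(range(1000))"

# Run each statement `number` times, repeat that 5 times, and keep the best run
loop_times = timeit.repeat("[v + 5 for v in values]", setup=setup, repeat=5, number=1000)
sum_times = timeit.repeat("sum(values)", setup=setup, repeat=5, number=1000)

print(f"list comprehension: {min(loop_times) / 1000 * 1e6:.1f} µs per loop")
print(f"sum():              {min(sum_times) / 1000 * 1e6:.1f} µs per loop")
```

Taking the minimum of the repeats is a common choice, because the slower runs mostly measure interference from other processes.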

<img src=https://miro.medium.com/max/500/0*krNxs9ggOE1ToVgt.png align=right width=300>

# Magic assignment

Bring out your wand, run the cell below and pick one that you don't know yet.

Read about it online and prepare to explain something about it.

Put the one you adore most in the chat!

In [None]:
%lsmagic

# SnakeViz for profiling

<br>
<br>
<br>
<br>


(It's Python, after all....)

In [None]:
%load_ext snakeviz

In [None]:
%%snakeviz -t
# -t puts it in a new tab (without, it won't work in Rise.js)
import numpy as np
a = np.random.random(size=10000)
b = [2*aa for aa in a]
c = 2 * a
d = []
for i in range(len(a)):
    d.append(2*a[i])

### Snakeviz examples

From the [documentation page](https://jiffyclub.github.io/snakeviz/):

<img src="figures/snakeviz.png" align="left"/><img src="figures/snakeviz_sunburst.png" align="left"/>
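SnakeViz is a viewer for profiles recorded with the standard-library `cProfile` module. Outside a notebook you can profile directly with `cProfile` and read the numbers with `pstats`. A minimal sketch (the two functions are made up for the example):

```python
import cProfile
import io
import pstats


def double_with_loop(values):
    """Deliberately slow: doubles each element in a Python-level loop."""
    result = []
    for v in values:
        result.append(2 * v)
    return result


def double_with_comprehension(values):
    """The same computation as a list comprehension."""
    return [2 * v for v in values]


values = list(range(100_000))

profiler = cProfile.Profile()
profiler.enable()                      # start collecting timing data
double_with_loop(values)
double_with_comprehension(values)
profiler.disable()                     # stop collecting

# Write the 5 most expensive entries, sorted by cumulative time, to a string
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

To look at the same profile in SnakeViz, save it with `profiler.dump_stats('example.prof')` and run `snakeviz example.prof` from a terminal.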

# Vectorizing code and why numpy is your friend

In [None]:
x = [2, 4, 6]
print(2 * x)

Which is the behavior that exactly nobody ever needs.

### Hence, numpy

<img src=figures/numpy.png>

Numpy is the basis for basically all numerical computing in Python.

<font size="3">
<br>
<br>
<br>
(Image from David Beazley - EuroScipy 2012 Bruxelles)
</font>

### A short history of numpy

<br>
<br>

<img src=figures/numpy_history.png>

<font size="3">
<br>
<br>
<br>
(Image from Travis Oliphant - Scipy 2012 Tokyo)
</font>

### Learn to use numpy, it pays off

Pandas Series are basically just numpy arrays with a named index.

Pandas DataFrames are just collections of Series with a common index.

Matplotlib converts everything to arrays before doing calculations and plotting.

...


In [None]:
import numpy as np
my_first_array = np.array([42, 100])
print(my_first_array)
my_first_array

Arrays are constructed from core Python collections.

<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/775px-NumPy_logo.svg.png align=right width=300 style=display:inline>

# 4 Strategies to speed up your code

1. Use numpy's **ufuncs**
2. Use numpy's **aggregations**
3. Use numpy's **broadcasting**
4. Use numpy's **slicing, masking and fancy indexing**

(Material adapted from a [talk by Jake VanderPlas](https://www.youtube.com/watch?v=EEUXKG97YRw) @SciPy2015)

### Array operations, or vector operations, are often what you want.

In [None]:
my_list = [42, 100, 1.5]
added_list = [a + 5 for a in my_list]
print(added_list)

In [None]:
import numpy as np
my_first_array = np.array(my_list)
print(my_first_array + 5)

### Where did the loop go?

In [None]:
print(my_first_array * 2)
print(my_first_array + np.array([2, 111, 0.01]))

# Is this actually better?

In [None]:
a = list(range(100000))
%timeit [val + 5 for val in a]
a = np.array(a)
%timeit a + 5

<img src=figures/Python_speed.jpg align=right width=600 style=display:inline>

# The reason:

## What makes Python *fast*
(for development)
## is what makes Python *slow*
(for execution)

# So what are these *ufuncs* you're speaking of?

- Arithmetic: +, -, \*, /, ** etc.
- Bitwise: &, |, ~, >> etc.
- Comparisons: <, >, ==, != etc.
- Trigonometric, exponential etc.: np.sin(), np.cos(), np.exp(), np.log10() etc.
- And a whole bunch of `scipy.special` functions.
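The operators in the list above are shorthand for ufunc calls: for arrays, `a + b` ends up in `np.add(a, b)`. Ufuncs are objects of type `np.ufunc` and come with a few extra methods, such as `reduce` and `accumulate`. A short sketch:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# The + operator on arrays dispatches to the ufunc np.add
added = np.add(a, b)
print(np.array_equal(added, a + b))    # True

# np.add is an instance of np.ufunc, and ufuncs have extra methods
print(isinstance(np.add, np.ufunc))    # True
print(np.add.reduce(a))                # 6.0, the same as a.sum()
print(np.multiply.accumulate(a))       # [1. 2. 6.], a running product
```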

In [None]:
# Quiz intermezzo!
print(11 & 9)

# Strategy number 2: numpy aggregations

In [None]:
from random import random
c = [random() for i in range(100000)]
%timeit min(c)
c = np.random.random(size=100000)
%timeit c.min()
%timeit min(c)

*Methods* like `.min()`, `.max()`, `.sum()`, `.mean()`, `.std()`, etc. Use them!
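These aggregations also take an `axis` argument, which tells numpy which dimension to collapse. A short sketch on a small 2D array:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # [[0, 1, 2],
                                 #  [3, 4, 5]]

print(M.sum())          # 15: aggregate over all elements
print(M.sum(axis=0))    # [3 5 7]: collapse the rows, one result per column
print(M.sum(axis=1))    # [ 3 12]: collapse the columns, one result per row
print(M.max(axis=1))    # [2 5]
```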

<img src=https://miro.medium.com/max/481/1*cxfqR8NAj8HGal8CVOZ7hg.png align=right width=400 style=display:inline>

# Pandas strategies for speeding up


I am assuming you know about
1. loc .iloc etc.
2. Groupby operations and aggregations on them
3. Multi-Index
4. Boolean selection ("filters")
5. Merging, pivoting, melting and other data prep functionality

# You thought I was going to tell you?
### ...a.k.a. Exercise time!

Pick one or more from the following list. 

Google it. Understand it.

Make a cute code example for one and put in the chat which one you picked.

I might just let you explain it to all of us!

- `df.query()`
- `df.lookup()` (deprecated in pandas 1.2 and removed in pandas 2.0)
- `df.infer_objects()`
- `df.memory_usage()`
- `df.sample()`

# Back to numpy, and strategy #3: Broadcasting
### Arrays can have any number of dimensions

In [None]:
small_array = np.array([])
print(small_array)
print(f'\nShape of the array: {small_array.shape}')
small_array

In [None]:
multi_D = np.array( [ [ [2, 3], [12, 13] ], [ [44, 55], [100, 200]], [ [1000, 2000], [1005, 2007]] ] )
print(multi_D)
print(f'\nShape of the array: {multi_D.shape}')

### Array operations can come in several dimensions

In [None]:
multi_D = np.zeros(shape=(3, 2, 2))
one_D = np.ones(2)

print(multi_D)
print("-------------------")
print(one_D)
print("-------------------")
print(multi_D + one_D)

### Until it goes wrong...

In [None]:
print(np.array([0, 0]) + np.array([2, 111, 345]))

In [None]:
multi_D = np.zeros(shape=(3, 2, 2))
one_D = np.ones(3)
print(multi_D.T + one_D)   

In [None]:
print(multi_D.T.shape)
print(np.ones(3).reshape(1, 1, 3).shape)


Note the mention of the word **broadcast** here...

# Broadcasting all them dimenzionz....
(Strategy number 3)

<div align=center>
<br>
<br>
<br>
<i>"<b>Broadcasting</b> is a set of rules by which <b>ufuncs</b> operate on arrays with <b>different sizes or dimensions</b>"</i>

### Let's see if we can deduce these rules

In [None]:
import numpy as np
multi_D = np.zeros(shape=(3, 2, 2))
one_D = np.ones(3)
print(multi_D.T + one_D)
print(multi_D + one_D)  ## Error from before!

In [None]:
print(np.arange(3).reshape(3, 1) + np.arange(3))

In [None]:
1+np.arange(3)

Who can tell me the rules?


# The rules!


1. If array shapes differ in number of dimensions, **left**-pad the smaller shape with **ones**
2. If any dimension does not match, broadcast the dimension with size=1
3. If neither non-matching dimension is one, throw an error

<img src=https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png align=center width=800  style=display:inline>
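The three rules are mechanical enough to write down as a function. `broadcast_shape` below is a hypothetical helper written for illustration, not part of numpy; numpy (version 1.20 and later) offers the same computation as `np.broadcast_shapes`, which the sketch uses as a cross-check:

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shapes (illustrative helper)."""
    # Rule 1: left-pad the shorter shape with ones
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        if da == db:
            result.append(da)
        elif da == 1 or db == 1:
            # Rule 2: a dimension of size 1 is stretched to match the other one
            result.append(db if da == 1 else da)
        else:
            # Rule 3: the sizes differ and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)


print(broadcast_shape((3, 1), (3,)))         # (3, 3)
print(broadcast_shape((2, 2, 3), (3,)))      # (2, 2, 3), like multi_D.T + one_D
print(np.broadcast_shapes((3, 1), (3,)))     # (3, 3): numpy agrees
```

Calling `broadcast_shape((3, 2, 2), (3,))` raises a `ValueError`, just like `multi_D + one_D` did earlier.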


In [None]:
print(np.shape(np.arange(3)))
print(np.shape(5))

### So does this one go wrong?

In [None]:
multi_D = np.zeros(shape=(300, 200, 200))
one_D = np.ones(200)

print(multi_D + one_D)   

### Note: no memory overhead and no loops!
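Arithmetic broadcasting happens inside numpy's compiled loops, so you never see the stretched array. `np.broadcast_to` makes the same virtual stretching visible as a read-only view: the stretched dimensions get a stride of 0, so no data is copied. A short sketch:

```python
import numpy as np

one_D = np.ones(200)
stretched = np.broadcast_to(one_D, (300, 200, 200))   # read-only view, no copy

print(stretched.shape)                     # (300, 200, 200)
print(stretched.strides)                   # (0, 0, 8): stepping along axes 0 and 1 does not move in memory
print(np.shares_memory(stretched, one_D))  # True: the same 200 floats underneath
```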

# Slicing, masking and fancy indexing

The last strategy for numpy!

In [None]:
# We already know some indexing, right?
marcels_list = [1, 1, 2, 3, 5, 8, 13]
print(marcels_list[0:1])
print(marcels_list[2:4])

In [None]:
# And for numpy, these still work!
marcels_array = np.array(marcels_list)
print(marcels_array[0])
print(marcels_array[2:4])

# But.... there's more!

Like, masking, for example:

In [None]:
odd_mask = np.array([True, True, False, True, True, False, True])
print(marcels_array[odd_mask])

Not too much magic in that, but...

... let's combine this with the ufuncs:

In [None]:
new_mask = (marcels_array < 4) | (marcels_array > 8)
print(marcels_array[new_mask])

# Fancy indexing

You can give a list of indices as well!

Let's say you want the first and third and fifth element of the array:

In [None]:
indices = [0, 2, 4]
print(marcels_array[indices])

In [None]:
print(marcels_list[indices])
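This ties back to the data model: basic slicing returns a *view* on the same data, while fancy indexing (and masking) returns a *new* array. A short sketch:

```python
import numpy as np

arr = np.array([1, 1, 2, 3, 5, 8, 13])

sliced = arr[2:4]        # basic slicing: a view on the same data
fancy = arr[[2, 3]]      # fancy indexing: a new array (a copy)

sliced[0] = -1           # writes through to `arr`
fancy[1] = -99           # does not touch `arr`
print(arr)               # [ 1  1 -1  3  5  8 13]
print(np.shares_memory(arr, sliced), np.shares_memory(arr, fancy))   # True False
```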

# Let's get it all combined!

In [None]:
M = np.arange(6).reshape(3, 2)
N = np.arange(6).reshape(2, 3)
print(M)
print(N)

In [None]:
ind = [0,1]
print(M[ind])
print(N[ind])

In [None]:
M[[0,1]]

# Who will explain to me what happens here?

Please use as many terms from the last half hour as you can remember!

In [None]:
M = np.arange(6).reshape(2, 3)

print(M)

In [None]:
M.shape

In [None]:
filtertje = ((M != 2) & (M != 5))
M[filtertje]

In [None]:
filtertje

In [None]:
M[M.sum(axis=1) > 4, 1:]

# Beyond the scope of today

- Just in time compilation: `jit`, `numba`, ... (they come with IPython magic!)
- Multi-D named arrays: `xarray`, ...
- Moving to the GPU: `cupy`, `jax`, `pytorch`, `tensorflow`, ...
- Multiprocessing: `multiprocessing`, `mpi4py`, ...

<img src=https://svgsilh.com/svg/1970468.svg align=right width=200 style=display:inline>

# Exercise time!

## Performance is important

Exercises are to be found in the following cells... 

There is a short exercise about another theme around performance: **Memory** usage.

**The second exercise** combines several of the numpy aspects discussed. Make sure you take the time to try that one (before the next session starts)!

# Exercise: memory usage
Execution time isn't the only ingredient to "performance". Let's have a look at memory usage.

The `sys` module gives access to interpreter internals. Its function `sys.getsizeof()` returns the memory used by an object in bytes (the object itself only, not the objects it refers to).

Investigate the memory usage of ever-growing lists and dicts (start with 1 element and let them grow to, say, 100). What do you observe?

In [None]:
import os
#to_include = os.path.join('solutions', 'memory_listdict.py')
#%load $to_include

# Exercise: Fix my code and show numpy muscle!

Below is a piece of code. It has issues (quite serious issues). Make it sweet. 

*The goal*: Randomly draw 1000 points in three dimensions. Calculate the index of the **nearest neighbor** for every point.

Compare the performance of both options. Where was the biggest win? Post your speedup factor in the chat at the start of the next session!

*Yes, you can do this with `sklearn`'s nearest neighbor stuff, but please don't.*


In [None]:
import numpy as np
npoints = 1000
ndim = 3
X = np.random.random((npoints, ndim)) # 1000 points in 3 dimensions

def find_nearest_neighbor(points):
    npoints, ndim = points.shape          # Number of points and dimensions
    nearest_neighbor = []                 # A list for the nearest neighbors
    for i in range(npoints):              # For all points
        nearest = -1                      # Initialize a tracker for the nearest neighbor
        mindist = np.inf                  # Initialize a tracker for its distance
        for j in range(npoints):          # With all other points
            if i == j: continue           # No need to calculate distance to itself
            dist = 0                      # Initialize a distance
            for k in range(ndim):         # in all dimensions
                dist = np.sqrt(dist**2 + (points[i, k] - points[j, k])**2)  # Update distance
            if dist < mindist:            # If it's closer than the previous closest:
                nearest = j               # Write down this as nearest
                mindist = dist            # And take note of its distance
        nearest_neighbor.append(nearest)  # Fill up the list
    return nearest_neighbor

%time ngbs = find_nearest_neighbor(X)

# Spoiler: simple things like removing sqrt and checking 
# whether you even need to do the next dimension are not
# going to cut it...

In [None]:
# Hint: think about arrays with shapes (1000, 1, 3) and (1, 1000, 3)

In [None]:
%%timeit
import os
#to_include = os.path.join('solutions', 'nearest_neighbor.py')
#%load $to_include

<img src=figures/second_half.png align=right width=200 style=display:inline>

# What will we see in the second half?

1. Iterators, generators, decorators
2. Classes, objects and special methods
3. Class inheritance for practical use

# Generator expressions

What, below, is the difference between `doubles` and `squares`?

In [None]:
numbers = [1, 3, 5, 7, 9, 11]
doubles = [n*2 for n in numbers]
squares = (n**2 for n in numbers)

In [None]:
for x in squares: print(x)   # And squares?

### Generators and lists are not the same thing

In [None]:
numbers = [1, 3, 5, 7, 9, 11]
doubles = [n*2 for n in numbers]
squares = (n**2 for n in numbers)

In [None]:
squares[1]   # And squares?

In [None]:
list(squares)[1]

In [None]:
squares = (n**2 for n in numbers)

In [None]:
sum([])

In [None]:
sum(squares)   # Run it twice!

In [None]:
numbers = [1, 3, 5, 7, 9, 11]
squares = (n**2 for n in numbers)
print(9 in squares)

In [None]:
list(squares)

<img src=https://nvie.com/img/iterable-vs-iterator.png align=right width=700 style=display:inline>

# Iterators go in one direction, 
# no reset button

In [None]:
iterator = iter(numbers)

In [None]:
next(iterator)

# Exercise: funky for-loops using iterators and while

Use a while statement to loop over an iterable and make it do an action. `for` is the forbidden statement. `iter()` is the obligatory function. It may be wise to use a `try/except` block.
Make a function that you can call like
```
numbers = [1, 3, 5, 7, 9, 11]
funky_for_loop(numbers, print)
```
that will print all those numbers.

In [None]:
import os
#to_include = os.path.join('solutions', 'funky_for_loop.py')
#%load $to_include

# All iterating in Python depends on the Iterator Protocol

In [None]:
for n in numbers: print(n)

In [None]:
first, second, *rest = numbers
print(rest)

In [None]:
print(*numbers)

In [None]:
unique_number = set(numbers)

# Iterators are iterables.

In fact, calling `iter()` on an iterator returns the iterator itself:

In [None]:
numbers = [2, 4, 6, 8, 10]
iterator = iter(numbers)
iterator2 = iter(iterator)
iterator is iterator2

In [None]:
def is_iterator(obj):
    return iter(obj) is obj   # True for iterators; False for lists, which are iterable but not iterators

- An iterable is a thing you can iterate over
- An iterator is the thing that does the iteration
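The iterator protocol consists of two special methods: `__iter__` (an iterator returns itself) and `__next__` (return the next value, or raise `StopIteration` when done). A minimal sketch with a hypothetical `Countdown` class:

```python
class Countdown:
    """Hypothetical example: an iterator counting down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__ ...
        return self

    def __next__(self):
        # ... and produces values from __next__ until it raises StopIteration
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(iter(countdown) is countdown)   # True: it is an iterator
print(list(countdown))                # [3, 2, 1]
print(list(countdown))                # []: exhausted, no reset button
```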

# Quiz time!  `True` or `False`?
In the chat: like indicates True, sad emoticon indicates False.

Your thinking time is the time I need to copy-paste.

### Generators are both iterators and iterable

### Lists are iterables

### All iterables are iterators, but not all iterators are iterable

### I'm confused by now

# Reasons to care about iterators
(in a discussion that started about generators)

1. Iterators are lazy (and so should you be!)
2. Iterators allow for infinitely long iterables
3. Iterators can save us memory (and sometimes time)
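The memory argument can be checked with `sys.getsizeof()`: a list stores all its values up front, while a generator only stores the state needed to produce the next one. Exact byte counts vary between Python versions, so this sketch only prints them:

```python
import sys

as_list = [n**2 for n in range(1_000_000)]
as_generator = (n**2 for n in range(1_000_000))

print(sys.getsizeof(as_list))        # megabytes: every value is stored
print(sys.getsizeof(as_generator))   # a few hundred bytes at most: values made on demand
print(sum(as_generator) == sum(as_list))   # True: the same values, produced lazily
```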

So how do we create an iterator?

... with a generator!

In [None]:
squares = (i**2 for i in range(1000))

or...

In [None]:
def square_all(numbers):
    for i in numbers: 
        yield i**2
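Because a generator only computes values when asked, it can even work on an infinite iterable, as long as you only take what you need. A sketch using `itertools.islice`; `naturals` is a hypothetical helper for the example:

```python
from itertools import islice


def square_all(numbers):
    for i in numbers:
        yield i**2


def naturals():
    """Hypothetical helper: yields 1, 2, 3, ... forever."""
    n = 1
    while True:
        yield n
        n += 1


# Chain two lazy generators and take only the first five results
first_squares = list(islice(square_all(naturals()), 5))
print(first_squares)   # [1, 4, 9, 16, 25]
```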

<img src=https://svgsilh.com/svg/1970468.svg align=right width=150 style=display:inline>

### Thinking lazy makes for prettier code

This can be read as: exercise time. Below is an ugly piece of code. Use iterators/iterables/`iter()` to write a **generator function** that works like (first line is optional):

```python
from my_fancy_generators import with_previous

differences = []
for previous, current in with_previous(my_list):
    differences.append(current-previous)
```
Make sure it results in the same data structure as the example below!

In [None]:
my_list = [12, 14, 17, 21, 26]

differences = []
previous = my_list[0]
for current in my_list[1:]:
    differences.append(current-previous)
    previous = current
    
print(differences)

In [None]:
import os
#to_include = os.path.join('solutions', 'with_previous.py')
#%load $to_include

# Functions, generators and decorators

A function is a block of code that is meant to be used more than twice.

The general structure of a function is

```python
def function_name(*args, **kwargs):
    """Docstring of the function.
    Describes in- and output and functionality.    
    """
    
    #body of the function with code
    ...
    
    return "whatever_needs_returned"
```


<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Function_machine2.svg/191px-Function_machine2.svg.png align=right width=400 style=display:inline>

# Functions are first-class objects

So...?

- Functions can be assigned to variables
- Functions can be stored in collections
- Functions can be passed as arguments
- Functions can return functions


# Functions can be assigned to variables

In [None]:
new_name = str
new_name(5)

# Functions can be stored in collections

In [None]:
a = [1, 2, 'string', print]
a[3]("Yes, I'm printing already")

# Functions can be passed as arguments

In [None]:
def execute_function(f, *args, **kwargs):
    
    return f(*args, **kwargs)

execute_function(print, 4)
execute_function(sorted, 'monkey' , **{'reverse':True})

# Functions can return functions

In [None]:
import math as m

def my_new_function(having_fun=False):
    """This function returns a function!
    And it's definitely having some fun....
    """
    return m.sqrt if having_fun else m.sin

python = 3.7
my_new_function(python)(16)



# Decorators: functions of functions that return functions

Which is pretty functional!

So functions can take functions and return functions,
<img src=https://c.pxhere.com/images/d0/bb/1d9d7b8fcf940e1c16e987a8a2b4-1586981.jpg!d align=right width=500 style=display:inline>
<br>
<br>
<br>
So... 
<br>
<br>
<br>

it can also take a function as argument, <br>
..... alter it, <br>
..... return it <br>
..... and assign it to the *same variable* as the original function!

Is that evil or awesome?

(Much of this material comes from [a talk by Reuven Lerner](https://www.youtube.com/watch?v=MjHpMCIvwsY) @PyCon2019)

# Take a function through a function and assign it to the same variable

```python
def this_will_be_a_decorator(a_function):
    ...
    return a_new_function

def my_function():
    pass

my_function = this_will_be_a_decorator(my_function)

```

### This will be done in one short line with an `@`:

```python
def this_will_be_a_decorator(a_function):
    ...
    return a_new_function

@this_will_be_a_decorator
def my_function():
    pass
```

# A simple example

In [None]:
def mydeco(func):
    def wrapper(*args, **kwargs):
        return f'{func(*args, **kwargs)}!!!'
    return wrapper

@mydeco
def add(a, b):
    return a + b

print(add(2, 2))
print(add(3, 3))

Note the *two* occurrences of `def` and `return`.

<img src=figures/ReuvenLerner_1.png>

In [None]:
import time
def logtime(func):
    decorate_time = time.time()
    
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        total_time = time.time() - start_time
        func_age = time.time() - decorate_time
        print(f'{func.__name__} took {total_time} s, {func_age:2.1f}s after decoration')
        return result

    
    return wrapper

In [None]:
@logtime
def add(a, b):
    return a+b

@logtime
def mult(a, b):
    return a*b

In [None]:
print(add(3, 4))

In [None]:
print(mult(4, 5))
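One side effect of decorating: the decorated name now points to `wrapper`, so `add.__name__` and `add.__doc__` describe the wrapper. The standard-library `functools.wraps` copies that metadata over. A simplified variant of `logtime` as a sketch (it swaps in `time.perf_counter()`, which is meant for measuring durations):

```python
import functools
import time


def logtime(func):
    @functools.wraps(func)            # copy __name__, __doc__, ... from func to wrapper
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        print(f'{func.__name__} took {time.perf_counter() - start_time:.6f} s')
        return result
    return wrapper


@logtime
def add(a, b):
    """Add two numbers."""
    return a + b


print(add(3, 4))        # logs the run time, then prints 7
print(add.__name__)     # add (without functools.wraps this would be wrapper)
print(add.__doc__)      # Add two numbers.
```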

# One more complication...

In [None]:
import time
def once_per_minute(func):
    last_invoked = 0

    def wrapper(*args, **kwargs):
        elapsed_time = time.time() - last_invoked
        if elapsed_time < 60:
            raise Exception("Not so quick!")

        last_invoked = time.time()

        return func(*args, **kwargs)

    return wrapper

@once_per_minute
def mul(a, b):
    return a*b

In [None]:
mul(2,3)

In [None]:
import time
def once_per_minute(func):
    last_invoked = 0

    def wrapper(*args, **kwargs):
        nonlocal last_invoked
        elapsed_time = time.time() - last_invoked
        if elapsed_time < 60:
            raise Exception("Not so quick!")

        last_invoked = time.time()

        return func(*args, **kwargs)

    return wrapper

@once_per_minute
def mul(a, b):
    return a*b

In [None]:
mul(2,3)

# Oh no, one *more* complication...
We want our function to be able to run at most once per `n` seconds.

So we need another extra layer.

In [None]:
import time

def once_per_n(n):
    def middle(func):
        last_invoked = 0

        def wrapper(*args, **kwargs):
            nonlocal last_invoked

            elapsed_time = time.time() - last_invoked
            if elapsed_time < n:
                raise Exception(f"Only {elapsed_time:.1f} seconds have passed")

            last_invoked = time.time()

            return func(*args, **kwargs)

        return wrapper
    return middle

@once_per_n(5)
def slow_add(a, b):
    time.sleep(8)
    return a + b

print(slow_add(2, 2))
print(slow_add(3, 3))
print(slow_add(4, 4))

<img src=https://svgsilh.com/svg/1970468.svg align=right width=200>

# Exercise time!

You're free to do either of these exercises. You can try both if you're quick.

1. Write a decorator that can decorate functions and log their function name as well as the time they run in one text file for all decorated functions. This log file should contain the moment the function ran, the function name and the run time. Use it to log functions that take a while, using `time.sleep()`. Create a log file with at least two different functions, that are both called at least twice.
2. Write a decorator that caches the results of a function that might (or might not) take a long time to run. The cached result should be used when available; otherwise the function runs as usual and stores its result in the cache, keyed on the `*args` and `**kwargs`, for later retrieval. Think of a nice function to use this for. Can you access the contents of the cache?

In [None]:
import os
#to_include = os.path.join('solutions', 'logtime.py')
#%load $to_include

In [None]:
import os
#to_include = os.path.join('solutions', 'caching.py')
#%load $to_include

# Classes and Object Oriented Programming

<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/6/62/CPT-OOP-objects_and_classes.svg/1280px-CPT-OOP-objects_and_classes.svg.png align=right width=400>

A **class** is a block of code with functionality and descriptions that "belong together".

An **object** is an instance of a class.

# The class, objects, methods, attributes

<img src=https://upload.wikimedia.org/wikipedia/commons/thumb/9/98/CPT-OOP-objects_and_classes_-_attmeth.svg/1280px-CPT-OOP-objects_and_classes_-_attmeth.svg.png align=right width=400>

Of *one* class, you can have multiple instances: the objects.

An object *can* have methods and/or attributes.

In [None]:
class ExampleObject():
    """docstring"""
    pass



instance = ExampleObject()

### Quiz:

1. How many classes are there?
2. How many objects are there?
3. How many methods have I defined?
4. How many attributes are there?

In [None]:
dir(instance)

# Special methods in classes

Under-used functionality with **\_\_dunder\_\_** methods!


*For instance:*
The `+` sign does something with objects, but how does the interpreter know what to do?

In [None]:
print(3 + 4)
print(['monkeys', 'like'] + ['bananas'])
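The interpreter knows what to do because `+` is translated into a special method call: roughly, `x + y` becomes `type(x).__add__(x, y)`, with a fallback to `y.__radd__(x)` if that returns `NotImplemented`. Built-in functions such as `len()` dispatch the same way. A short sketch:

```python
numbers = [1, 2, 3]

# The + operator calls the left operand's __add__
print((3).__add__(4))                              # 7
print(['monkeys', 'like'].__add__(['bananas']))    # ['monkeys', 'like', 'bananas']

# len() calls __len__
print(len(numbers) == numbers.__len__())           # True
```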

# And what to think of...

In [None]:
print('Pretty print?')
'ugly print'

In [None]:
import pandas as pd
df = pd.DataFrame({'x':[1, 2], 'y':[3, 40]})
print(df)
df
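Roughly: `print()` uses `str()`, which calls `__str__`, while a bare expression at the end of a cell is shown via `repr()`, which calls `__repr__` (or a rich variant such as `_repr_html_`, which pandas uses for the table view). A sketch with a hypothetical `Temperature` class that defines both:

```python
class Temperature:
    """Hypothetical example class with both representations defined."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # Unambiguous, developer-facing: looks like code that rebuilds the object
        return f'Temperature({self.celsius!r})'

    def __str__(self):
        # Readable, user-facing: used by print() and str()
        return f'{self.celsius} °C'


t = Temperature(21.5)
print(t)                    # 21.5 °C
print(repr(t))              # Temperature(21.5)
print(repr('ugly print'))   # 'ugly print': the quotes come from str.__repr__
```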

# \_\_dunder\_\_ methods

In [None]:
import numpy as np

class Complex():
    def __init__(self, real, imaginary, make_polar=False):
        self.r = real
        self.i = imaginary
        if make_polar: self.to_polar()
    
    def to_polar(self):
        self.R = np.sqrt(self.r**2 + self.i**2)
        self.theta = np.arctan2(self.i, self.r)   # arctan2 picks the correct quadrant and handles r == 0
        return self

In [None]:
x = Complex(3.0, -4.5)
print(x.r, x.i)
y = x.to_polar()
print(y.R)
print(x.R)
print(x.__doc__)
dir(x)

The idea: a user does not need to know about special/dunder methods. A developer does.

In [None]:
import pandas as pd
dir(pd.DataFrame)

<img src=http://resumbrae.com/ub/dms423_f05/20/fig2.jpg align=right width=300>

# Exercise time!

Do exercise one, as well as the number corresponding to your name (see below):

1. Create a class for **vectors**. Make sure it can handle n-dimensional vectors by taking an iterable describing the components (e.g. a list of components) and `len()` should of course give you the number of components. 

2. Make sure that addition is defined as expected:
$\vec{v} + \vec{w} = (v_1+w_1, v_2+w_2, \ldots)$

3. Create a pretty printed representation for it, e.g. in the form of a list, but without the ugly square brackets.

4. Make sure that when you multiply two instances of the class, the result is the vector dot-product (or: inner product): $$\vec{v} \cdot \vec{w} = \sum_{i} v_iw_i$$

5. Add a method to calculate the magnitude of the vector (total length). Make sure that the `abs()` built-in function calls this method!

In [None]:
import os
#to_include = os.path.join('solutions', 'vector_class.py')
#%load $to_include

# Class inheritance


A class that inherits from a parent class knows about the parent's attributes and methods:

In [None]:
class Parent():
    def f1(self):
        print("parent")
        
class Child(Parent):
    def f2(self):
        print("child")

In [None]:
obj = Child()
obj.f1()
obj.f2()

In [None]:
class Parent():
    def __init__(self):
        self.name = "Parent"
        self.lastname = "Familyname"
        
    def f1(self):
        print("Method from parent")
        
class Child(Parent):
    def __init__(self):
        Parent.__init__(self)
        self.name = "Child"
        
    def f2(self):
        print("Method from child")
        
    def f1(self):
        print('Overwritten method of parent by child')

In [None]:
obj = Child()
print(obj.name, obj.lastname)
obj.f1()

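When `obj.f1()` is called, Python searches for `f1` along the class's *method resolution order* (MRO): first `Child`, then `Parent`, then `object`. The first match wins, which is why the child's version is used. A self-contained sketch with minimal classes:

```python
class Parent:
    def f1(self):
        return "parent"

    def f3(self):
        return "only defined in parent"


class Child(Parent):
    def f1(self):
        return "child"


# The lookup order Python uses for attributes and methods
print([cls.__name__ for cls in Child.__mro__])  # ['Child', 'Parent', 'object']

obj = Child()
print(obj.f1())                 # child: found on Child first
print(obj.f3())                 # found on Parent, since Child does not define it
print(isinstance(obj, Parent))  # True: a Child is also a Parent
```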
## Example: scikit-learn's MinMaxScaler
It is a data scaler with a very simple definition: it linearly scales each feature (column) of the data to the range from 0 to 1 by applying

$$X_{\rm scaled} = \frac{X - {\rm min}(X)}{{\rm max}(X) - {\rm min}(X)} = \frac{X - {\rm min}(X)}{{\rm range}(X)}$$

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()


In [None]:
print(scaler)
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
print(data)
scaled = scaler.fit_transform(data)
print("Scaled:\n",scaled)

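As a sanity check, the formula above can be applied by hand, column by column, since each column is a separate feature. A dependency-free sketch using the same `data`:

```python
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]

# Work per feature (column): transpose the rows with zip(*data)
columns = list(zip(*data))
mins = [min(col) for col in columns]
ranges = [max(col) - min(col) for col in columns]

# Apply X_scaled = (X - min(X)) / range(X) to every value
scaled_by_hand = [
    [(x - lo) / rng for x, lo, rng in zip(row, mins, ranges)]
    for row in data
]
print(scaled_by_hand)  # [[0.0, 0.0], [0.25, 0.25], [0.5, 0.5], [1.0, 1.0]]
```

Note that this sketch does not guard against a constant column (zero range), which scikit-learn does handle.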
The `scaler` object now has several attributes that did not exist before it was fitted (`.fit_transform()` fits and transforms in one go). By scikit-learn convention, such learned attributes end in an underscore:

In [None]:
print(scaler.data_min_)
print(scaler.data_max_)
print(scaler.data_range_)
print(scaler.n_samples_seen_)

Methods like `.transform()` and `.inverse_transform()` are defined on the class from the start, but they can only be used after the scaler has seen the data to train on; calling them on an unfitted scaler raises a `NotFittedError`.

These methods can be used on new data:

In [None]:
other_scaled_data = scaler.transform([[0.5, 4]])
other_scaled_data

**What if this new data is outside the original range of the data?**

When used on data outside the original range:

In [None]:
other_scaled_data = scaler.transform([[1.5, 0]])
other_scaled_data

The transformer does what it promises to do, but the scaled values are no longer between 0 and 1.
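
With the fitted minima $(-1, 2)$ and ranges $(2, 16)$ of the training data, the two new points work out as follows; the second one falls outside $[0, 1]$ in both features:

$$\left(\frac{0.5 - (-1)}{2},\ \frac{4 - 2}{16}\right) = (0.75,\ 0.125), \qquad \left(\frac{1.5 - (-1)}{2},\ \frac{0 - 2}{16}\right) = (1.25,\ -0.125)$$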

You could easily fix that by clipping at 0 and 1, but there is a more elegant way to do this.

Let's create our own scaler class that acts mostly like `MinMaxScaler()`, but with some features adapted: 

In [None]:
class MyOwnScaler(MinMaxScaler):
    pass

new_scaler = MyOwnScaler()
newly_scaled = new_scaler.fit_transform(data)
print(newly_scaled)

So far, `MyOwnScaler()` behaves exactly like `MinMaxScaler()` because it *inherited* everything from it, and no attributes or methods were overridden or added.

In [None]:
class MyOwnScaler(MinMaxScaler):
    def transform(self, newdata):
        transformed = super().transform(newdata)
        transformed = np.clip(transformed, 0, 1)
        return transformed

new_scaler = MyOwnScaler()
newly_scaled = new_scaler.fit_transform(data)
print(newly_scaled)

In [None]:
print(scaler.transform([[1.5, 0]]))
print(new_scaler.transform([[1.5, 0]]))

By **overriding** the `.transform()` method you can make it do whatever you like.

Thanks to `super()`, it still calls the parent class's `.transform()` under the hood!
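
The same override-and-delegate pattern works for any parent class. A dependency-free sketch with two hypothetical classes, `TenthScaler` and `ClippedTenthScaler`, invented here purely for illustration:

```python
class TenthScaler:
    """Hypothetical parent: divides every value by 10."""

    def transform(self, values):
        return [v / 10 for v in values]


class ClippedTenthScaler(TenthScaler):
    """Overrides transform(), but reuses the parent's logic via super()."""

    def transform(self, values):
        scaled = super().transform(values)              # parent does the scaling
        return [min(max(v, 0.0), 1.0) for v in scaled]  # child adds the clipping


print(TenthScaler().transform([5, 15, -3]))         # [0.5, 1.5, -0.3]
print(ClippedTenthScaler().transform([5, 15, -3]))  # [0.5, 1.0, 0.0]
```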

*Warning, though:* `.inverse_transform()` can no longer recover the original values of clipped data, because clipping throws information away.
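
To see why: inverting the scaling formula gives $X = X_{\rm scaled} \cdot {\rm range}(X) + {\rm min}(X)$. For the first feature of the new point $[1.5, 0]$, the clipped value $1$ maps back to

$$1 \cdot 2 + (-1) = 1 \neq 1.5,$$

whereas the unclipped value gives $1.25 \cdot 2 + (-1) = 1.5$.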
<br>
<br>
<br>
<br>

And what about `.fit_transform()`?

In [None]:
class MyOwnScaler(MinMaxScaler):
    def transform(self, newdata):
        transformed = super().transform(newdata)
        transformed = np.clip(transformed, 0, 1)
        print("I'm using this transform!")
        return transformed

scaler = MyOwnScaler()
scaled = scaler.fit_transform(data)

Apparently, the `.fit_transform()` method calls `.transform()`.

Try figuring *that* out from the code!

(In fact, in this case it isn't hopelessly difficult to find out from the [documentation page](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html).)
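
Another way to find out where an inherited method comes from is to walk the class's MRO and check which class actually defines it. A minimal sketch with a hypothetical helper `defining_class` and toy classes that loosely mimic a mixin layout (the toy classes are assumptions for illustration, not scikit-learn's code):

```python
def defining_class(cls, method_name):
    """Hypothetical helper: return the first class in cls's MRO that defines method_name."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    return None


class ToyTransformerMixin:
    def fit_transform(self, X):
        # Generic recipe: fit first, then delegate to whatever transform() resolves to
        return self.fit(X).transform(X)


class ToyScaler(ToyTransformerMixin):
    def fit(self, X):
        self.max_ = max(X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


print(defining_class(ToyScaler, "fit_transform").__name__)  # ToyTransformerMixin
print(defining_class(ToyScaler, "transform").__name__)      # ToyScaler
print(ToyScaler().fit_transform([1, 2, 4]))                 # [0.25, 0.5, 1.0]
```

The same helper can be pointed at `MinMaxScaler`; which class it reports depends on the installed scikit-learn version.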
<img src=figures/scikit-learn-logo-small.png align=right>



<img src=https://p0.pikist.com/photos/455/154/balloons-clouds-word-clouds-abstract-dialogue-discussion-talk-graphic-background.jpg align=right width=300>

# Exercise time!


You're not making anything, that would take too long. Instead, let's brainstorm on the use of this.

### What would be your use case for creating a class with inheritance?

# Special class decorators

Can we also decorate classes?

... It's Python, of course we can.

In [None]:
import time

def object_birthday(c):
    def wrapper(*args, **kwargs):
        o = c(*args, **kwargs)
        o._created_at = time.time()
        return o
    return wrapper

@object_birthday
class Foo():
    def __init__(self, x, y):
        self.x = x
        self.y = y

f = Foo(10, [10, 20, 30])
print(f)
print(f._created_at)

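One caveat of the decorator above: it replaces the class `Foo` by the function `wrapper`, so `isinstance(f, Foo)` raises a `TypeError` and `Foo.__name__` is `'wrapper'`. A sketch of an alternative that modifies the class in place and returns the class itself (the name `object_birthday_v2` is hypothetical):

```python
import functools
import time


def object_birthday_v2(cls):
    """Class decorator: wraps cls.__init__ so every instance records its creation time."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        self._created_at = time.time()

    cls.__init__ = __init__
    return cls  # the class itself is returned, so it stays a class


@object_birthday_v2
class Foo:
    def __init__(self, x, y):
        self.x = x
        self.y = y


f = Foo(10, [10, 20, 30])
print(isinstance(f, Foo))  # True
print(Foo.__name__)        # Foo
print(f._created_at)       # a Unix timestamp, different on every run
```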
# Special classes: The dataclass

*Disclaimer*: Python 3.7 and up! See [PEP 557](https://www.python.org/dev/peps/pep-0557/) for the rationale and the full definition.

Some default methods (among others `__init__`, `__repr__` and `__eq__`) are implemented for you, behind the scenes. E.g.:

In [None]:
class RegularCard:
    def __init__(self, rank, suit):
        self.suit = suit
        self.rank = rank
        

RegularCard('Q', 'Hearts') == RegularCard('Q', 'Hearts')

In [None]:
from dataclasses import dataclass

@dataclass
class DataClassCard:
    rank: str
    suit: str
        
DataClassCard('Q', 'Hearts') == DataClassCard('Q', 'Hearts')

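Roughly speaking, `@dataclass` writes methods like the ones below for you. This is a hand-written approximation of what is generated for `DataClassCard`; the real generated code differs in details:

```python
class HandWrittenCard:
    """Approximately what @dataclass generates for DataClassCard (sketch)."""

    def __init__(self, rank: str, suit: str):
        self.rank = rank
        self.suit = suit

    def __repr__(self):
        return f"{type(self).__name__}(rank={self.rank!r}, suit={self.suit!r})"

    def __eq__(self, other):
        # Compare field by field, but only with instances of the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.rank, self.suit) == (other.rank, other.suit)


print(HandWrittenCard('Q', 'Hearts') == HandWrittenCard('Q', 'Hearts'))  # True
print(HandWrittenCard('Q', 'Hearts'))  # HandWrittenCard(rank='Q', suit='Hearts')
```

Defining `__eq__` makes Python set `__hash__` to `None`, so these objects are unhashable, which matches the default behavior of a (non-frozen) dataclass.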
# Type hints are always optional

In 
```python
@dataclass
class DataClassCard:
    rank: str
    suit: str
```
there are type hints for both attributes, but they are not enforced:

In [None]:
DataClassCard('Q', 12)

(but you can of course do the type checking yourself!)

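One way to do that is the `__post_init__` hook, which the generated `__init__` calls after assigning the fields. A minimal sketch (the class name `CheckedCard` is hypothetical, and the check only works for plain classes like `str`, not for generic hints such as `List[int]`):

```python
from dataclasses import dataclass, fields


@dataclass
class CheckedCard:
    rank: str
    suit: str

    def __post_init__(self):
        # Compare every field's value against its annotated type
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} should be {f.type.__name__}, got {type(value).__name__}")


CheckedCard('Q', 'Hearts')  # fine
try:
    CheckedCard('Q', 12)
except TypeError as err:
    print(err)  # suit should be str, got int
```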
Type hints are also available for function definitions, by the way:
```python
def func(a: int, b: str) -> str:
    return str(a)+b
```
or even
```python
def add_vecs(a: Vector, b: Vector) -> Vector:
    return a.__add__(b)
```

Equally optional, equally non-binding

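The interpreter stores the hints in the function's `__annotations__` attribute but does not check them at call time; external tools such as static type checkers are what make use of them. A small sketch:

```python
def func(a: int, b: str) -> str:
    return str(a) + b


print(func.__annotations__)     # {'a': <class 'int'>, 'b': <class 'str'>, 'return': <class 'str'>}
print(func(1, 'a'))             # 1a
print(func('not an int', '!'))  # hints are ignored at runtime: not an int!
```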
# Freedom within dataclasses

Dataclasses come with some extra functionality. 

Never used by default.

Always possible for the user to exploit!

In [None]:
from dataclasses import dataclass, field
from typing import List, Any

@dataclass(frozen=True)  # Make it immutable!
class Position:
    name: Any
    lon: float = field(default=0.0, metadata={'unit': 'degrees'})
    lat: float = field(default=0.0, metadata={'unit': 'degrees'})
        


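A sketch of what these options buy you, repeating the `Position` definition so the block runs on its own: `frozen=True` makes assignment raise `dataclasses.FrozenInstanceError`, and the `metadata` passed to `field()` can be read back via `dataclasses.fields()`. The values used below are arbitrary examples:

```python
from dataclasses import FrozenInstanceError, dataclass, field, fields
from typing import Any


@dataclass(frozen=True)  # Make it immutable!
class Position:
    name: Any
    lon: float = field(default=0.0, metadata={'unit': 'degrees'})
    lat: float = field(default=0.0, metadata={'unit': 'degrees'})


pos = Position('example', 4.9, 52.4)
print(pos)  # Position(name='example', lon=4.9, lat=52.4)

try:
    pos.lat = 0.0
except FrozenInstanceError as err:
    print("Cannot modify:", err)

# Metadata is ignored by dataclasses itself, but available to your own code
print({f.name: f.metadata.get('unit') for f in fields(Position)})
# {'name': None, 'lon': 'degrees', 'lat': 'degrees'}
```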
# Other decorators that you might find useful in classes

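1. `@staticmethod`: similar to `@classmethod` in that no instantiated object is necessary, but the method receives neither the instance nor the class as an implicit first argument: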

In [None]:
class Student(object):

    @staticmethod
    def is_full_name(name_str):
        names = name_str.split(' ')
        return len(names) > 1

print(Student.is_full_name('Scott Robinson'))
print(Student.is_full_name('Scott'))

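For comparison, a `@classmethod` receives the class itself as its first argument (`cls` by convention), which makes it handy for alternative constructors. A sketch with a hypothetical `from_full_name` method added to the `Student` class:

```python
class Student:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @staticmethod
    def is_full_name(name_str):
        # No self, no cls: just a function that lives in the class namespace
        return len(name_str.split(' ')) > 1

    @classmethod
    def from_full_name(cls, name_str):
        # cls is Student here (or a subclass, if called on one)
        if not cls.is_full_name(name_str):
            raise ValueError(f"Not a full name: {name_str!r}")
        first, last = name_str.split(' ', 1)
        return cls(first, last)


s = Student.from_full_name('Scott Robinson')
print(s.first_name, s.last_name)  # Scott Robinson
```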
2. `@abstractmethod` (from the `abc` module): part of the Abstract Base Classes machinery, something for your journey in the future.

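As a small preview: a class that inherits from `abc.ABC` and marks a method with `@abstractmethod` cannot be instantiated until a subclass implements that method. The `Shape` and `Square` names are hypothetical:

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Every concrete shape must implement this."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()
except TypeError as err:
    print(err)  # the abstract class cannot be instantiated

print(Square(3).area())  # 9
```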
# Time for a recap!

- Iterators
- Generators
- Decorators
- Class inheritance

# Thank you!


<br>
<br>
Marcel Haas, marcel@marcelhaas.com 
<br>
<br>
<br>
<br>
<br>
<br>
<br>

--------------------

<div align=right><i>This material is meant to be used exclusively for educational purposes. Original authors of the material are known to the author.