# CME 193 - Lecture 8

Here's what you've seen over the past 7 lectures:
* Python Language Basics
* NumPy - Arrays/Linear Algebra
* SciPy - Sparse Linear Algebra/Optimization
* DataFrames - Reading & Manipulating tabular data
* scikit-learn - Machine Learning Models & use with data
* OR-Tools - More Optimization

You've now seen some tools for scientific computing in Python.  How you add to them and what you do with them is up to you!

![python](https://imgs.xkcd.com/comics/python.png)

(Maybe you've also had a bit of [this](https://xkcd.com/1987/) experience)

## Today

1. We'll revisit object oriented programming in Python
2. We'll talk a bit about image processing

# Object Oriented Programming - II

Recall some of the basic terminology of [object oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming)
* **Classes** are templates for objects (e.g., "the Integers" is a class)
* **Objects** are specific instances of a class (e.g., "2 is an integer")
* **Methods** are functions associated with objects of a class
    * "the square of 2" may be expressed as `2.square()` (returns 4)
    * the "addition of 1 to 2" may be expressed as `2.add(1)` (returns 3)
    * the "name of 2" may be expressed as `2.name()` (returns "two")
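
Python's built-in `int` has no `square` or `name` methods, so the calls above are only illustrative. Here is a minimal sketch of a class that would support them, using a hypothetical `Integer` wrapper (the class name and its small table of names are made up for this example):

```python
class Integer:
    """Hypothetical wrapper around an int, for illustration only."""
    _names = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four"}

    def __init__(self, value):
        self.value = value  # each object stores its own data

    def square(self):
        return Integer(self.value * self.value)

    def add(self, other):
        return Integer(self.value + other)

    def name(self):
        # fall back to the digits when no English name is stored
        return self._names.get(self.value, str(self.value))


two = Integer(2)               # an object: one instance of the class
print(two.square().value)      # 4
print(two.add(1).value)        # 3
print(two.name())              # two
```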

Today we'll use an extended example of univariate functions
$$f:\mathbb{R} \to \mathbb{R}$$
to see how you might use object oriented programming for something like automatic differentiation, classical machine learning, or deep learning.  Yes - you can maybe use a library like [Tensorflow](https://www.tensorflow.org/), [Keras](https://keras.io/), or [PyTorch](https://pytorch.org/), but it's more fun to understand how to do it yourself (and then maybe use someone else's fancy/high quality implementation).

First thing to remember is that everything in Python is an object, even functions.

In [None]:
def f(x):
    return x

isinstance(f, object)

In [None]:
isinstance(isinstance, object)

In [None]:
isinstance(object, object)

Once you create an object, it lives somewhere on your computer:

In [None]:
id(f) # memory address on your computer

In [None]:
x = 1000
id(x)

You can check if two variables are referring to the same address using `is`

In [None]:
z = x
print("equality:     {}".format(z == x))
print("same address: {}".format(z is x))

In [None]:
y = 1000
print("equality:     {}".format(y == x))
print("same address: {}".format(y is x))
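
Whether two separately created integers share an address is an implementation detail of the interpreter, so the result of `y is x` above can differ between environments. Lists are never shared implicitly, so a short sketch with them (variable names chosen only for this example) shows the difference between equality and identity more predictably:

```python
a = [1, 2, 3]
b = a            # b is another name for the same list object
c = list(a)      # c is a new list with equal contents

print(a == b, a is b)   # True True
print(a == c, a is c)   # True False

b.append(4)      # mutating through b is visible through a
print(a)         # [1, 2, 3, 4]
print(c)         # [1, 2, 3]
```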

## Univariate functions

Let's consider functions that send a real number to a real number
$$f:\mathbb{R} \to \mathbb{R}$$
Perhaps these functions have some parameters $\theta$, such as
$$f(x; \theta) = \theta x$$
(a linear function with slope $\theta$), or
$$g(x;\theta) = \theta_1 x + \theta_0$$
(linear function with slope $\theta_1$ and intercept $\theta_0$), or
$$h(x;\theta) = \theta_0 \exp(-\theta_1 x^2)$$
and so on.  The point is that we can parameterize functions that have a similar form, and that there may be different numbers of parameters depending on the function.

What might we want to be able to do with a function?
1. Evaluate it (`y = f(x)`)
2. Print it as a string `f(x) = "3x + 2"`
3. Calculate a gradient
4. add/multiply/exponentiate...

We could think of doing the above with methods like `f.evaluate(x)` and `f.name()`, but we'll use the special methods `__call__` and `__str__` so that we can write `f(x)` and `format(f)`, just as we would with built-in objects.  You can see the different special methods available to overload [here](https://docs.python.org/3/reference/datamodel.html).
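
As a quick illustration of these two special methods, here is a minimal sketch with a hypothetical `Doubler` class (not part of the lecture's class hierarchy):

```python
class Doubler:
    """Hypothetical example class: represents x -> 2*x."""

    def __call__(self, x):
        # invoked by the syntax d(x)
        return 2 * x

    def __str__(self):
        # invoked by str(d), print(d), and "{}".format(d)
        return "2*x"


d = Doubler()
print(d(3))                       # 6
print("d(x) = {}".format(d))      # d(x) = 2*x
```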

We're going to create an abstract function class that all the other classes we create will inherit from.  If you haven't seen object oriented programming before, think of this as a way to promise all our functions will be able to do certain things (or throw an error).  We'll provide default implementations for some methods (these will get filled in later), and have some methods that will need to be implemented differently for each sub-class.

For more on classes and inheritance, see [here](https://thepythonguru.com/python-inheritance-and-polymorphism/).  The idea of giving objects methods with the same name is one form of [polymorphism](https://stackoverflow.com/questions/1031273/what-is-polymorphism-what-is-it-for-and-how-is-it-used) - we'll see how this is actually quite useful and allows you to do things that would be difficult without object-oriented programming.

In [None]:
class AbstractUnivariate:
    def __init__(self):
        raise NotImplementedError
        
    def __call__(self, x):
        raise NotImplementedError
        
    def fmtstr(self, x="x"):
        raise NotImplementedError
        
    def __str__(self):
        return self.fmtstr("x")
        
    def gradient(self):
        raise NotImplementedError
        
    # the rest of these methods will be implemented when we write the appropriate functions
    def __add__(self, other):
        return SumFunction(self, other)
    
    def __mul__(self, other):
        return ProdFunction(self, other)
    
    def __rmul__(self, other):
        return ScaleFunction(other, self)
    
    def __pow__(self, n):
        return ComposeFunction(PowerFunction(1, n), self)

Now, to create a class that inherits from our abstract class, we just use the following syntax:

In [None]:
class ConstantFunction(AbstractUnivariate): # AbstractUnivariate indicates class to use for inheritance
    def __init__(self, c):
        self.c = c

In [None]:
f = ConstantFunction(3)

We can see there's a class hierarchy now:

In [None]:
print(isinstance(f, ConstantFunction))
print(isinstance(f, AbstractUnivariate))
print(isinstance(f, object))

If we haven't implemented the methods we promised we would, we'll get errors

In [None]:
f(1)

Let's go ahead and implement the promised methods

In [None]:
class ConstantFunction(AbstractUnivariate):
    def __init__(self, c):
        self.c = c
    
    def __call__(self, x):
        return self.c
    
    def fmtstr(self, x="x"):
        return "{}".format(self.c)
    
    # __str__(self) uses default from abstract class
    
    def gradient(self):
        return ConstantFunction(0)
    
    # we inherit the other functions from the AbstractUnivariate class

In [None]:
f = ConstantFunction(3)
print(f)
print(f(1))
print(f(2))
print(f.gradient())

What does this object do?  It represents the constant function
$$f: x \mapsto c$$

Let's do something a little less trivial.  Now we'll implement
$$f: x \mapsto ax + b$$

In [None]:
class AffineFunction(AbstractUnivariate):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        
    def __call__(self, x):
        return self.a * x + self.b
    
    def fmtstr(self, x="x"):
        s = "{}".format(x)
        if self.a != 1:
            s = "{}*".format(self.a) + s
        if self.b != 0:
            s = s + " + {}".format(self.b)
        return s
    
    def gradient(self):
        return ConstantFunction(self.a)

In [None]:
f = AffineFunction(1, 1)
print(f)
print(f(2))
print(f.gradient())
print(isinstance(f, AbstractUnivariate))

# Exercise 1

Implement classes for the following univariate function templates:
1. `QuadraticFunction` -- $f: x \mapsto a x^2 + bx + c$
2. `ExponentialFunction` -- $f: x \mapsto a e^{bx}$
3. `PowerFunction` -- $f: x \mapsto ax^n$

Make sure to return derivatives that are also `AbstractUnivariate` sub-classes.  Which class can I use to represent $f: x \mapsto x^{-1}$?

In [None]:
# your code here
from math import exp # exp is needed for ExponentialFunction

# More functions

We can do more than just encode standard functions - we can scale, add, multiply, and compose functions.

Scaling a function:
$$g(x) = a \cdot f(x)$$

In [None]:
class ScaleFunction(AbstractUnivariate):
    def __init__(self, a, f):
        self.a = a
        if isinstance(f, AbstractUnivariate):
            self.f = f
        else:
            raise AssertionError("must input an AbstractUnivariate function")
        
    def __call__(self, x):
        return self.a * self.f(x)
    
    def fmtstr(self, x="x"):
        if self.a == 1:
            return self.f.fmtstr(x)
        else:
            return "{}*({})".format(self.a, self.f.fmtstr(x))
    
    def gradient(self):
        return ScaleFunction(self.a, self.f.gradient())

In [None]:
f = ExponentialFunction(1, 2)
print(f)
g = ScaleFunction(2, f)
print(g)
print(g.gradient())
print(g(1))

Sum and product of two functions:
$$ h(x) = f(x) + g(x)$$
$$ h(x) = f(x) \cdot g(x)$$

In [None]:
class SumFunction(AbstractUnivariate):
    def __init__(self, f, g):
        if isinstance(f, AbstractUnivariate) and isinstance(g, AbstractUnivariate):
            self.f = f
            self.g = g
        else:
            raise AssertionError("must input AbstractUnivariate functions")
        
    def __call__(self, x):
        return self.f(x) + self.g(x)
    
    def fmtstr(self, x="x"):
        return "{} + {}".format(self.f.fmtstr(x), self.g.fmtstr(x))
    
    def gradient(self):
        return SumFunction(self.f.gradient(), self.g.gradient())

In [None]:
f = ExponentialFunction(1, 2)
g = AffineFunction(2, 1)
h = SumFunction(f, g)
print(h.fmtstr(x="y"))
print(h(-1))

In [None]:
print(h.gradient())

In [None]:
class ProdFunction(AbstractUnivariate):
    def __init__(self, f, g):
        if isinstance(f, AbstractUnivariate) and isinstance(g, AbstractUnivariate):
            self.f = f
            self.g = g
        else:
            raise AssertionError("must input AbstractUnivariate functions")
        
    def __call__(self, x):
        return self.f(x) * self.g(x)
    
    def fmtstr(self, x="x"):
        return "({}) * ({})".format(self.f.fmtstr(x=x), self.g.fmtstr(x=x))
    
    # product rule (f*g)' = f'*g + f*g'
    def gradient(self):
        return SumFunction(ProdFunction(self.f.gradient(),self.g), ProdFunction(self.f, self.g.gradient()))

In [None]:
f = ExponentialFunction(1, 2)
g = AffineFunction(2, 1)
h = ProdFunction(f, g)
print(h)
print(h(-1))

In [None]:
print(h.gradient())
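
As a check by hand, assume `ExponentialFunction(1, 2)` represents $f(x) = e^{2x}$ (the $a e^{bx}$ template from Exercise 1, with arguments in the order `a, b`) and `AffineFunction(2, 1)` represents $g(x) = 2x + 1$. The product rule then gives
$$h'(x) = f'(x)\,g(x) + f(x)\,g'(x) = 2e^{2x}(2x + 1) + e^{2x} \cdot 2 = 4(x + 1)\,e^{2x}$$
The object returned by `h.gradient()` corresponds to the unsimplified middle expression, since nothing in `ProdFunction` or `SumFunction` reduces terms.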

Compose Functions:
$$h(x) = (g \circ f)(x) = g(f(x))$$

In [None]:
class ComposeFunction(AbstractUnivariate):
    def __init__(self, g, f):
        if isinstance(f, AbstractUnivariate) and isinstance(g, AbstractUnivariate):
            self.f = f
            self.g = g
        else:
            raise AssertionError("must input AbstractUnivariate functions")
        
    def __call__(self, x):
        return self.g(self.f(x))
    
    def fmtstr(self, x="x"):
        return self.g.fmtstr(x="({})".format(self.f.fmtstr(x)))
    
    # chain rule : g(f(x))' = g'(f(x))*f'(x)
    def gradient(self):
        return ProdFunction(ComposeFunction(self.g.gradient(), self.f), self.f.gradient())

In [None]:
f = PowerFunction(1,2)
print(f.fmtstr("x"))
g = ComposeFunction(f,f)
print(g)
h = ComposeFunction(g, f)
print(h)
print(h(2)) # 2^(2*2*2) = 2^8 = 256

In [None]:
f = PowerFunction(1,2)
g = ExponentialFunction(0.5, -1)
h = ComposeFunction(g, f)
print(h)
print(h.gradient())
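
Working the same example by hand, assume `PowerFunction(1, 2)` represents $f(x) = x^2$ and `ExponentialFunction(0.5, -1)` represents $g(x) = 0.5\,e^{-x}$ (the templates from Exercise 1). Then $h(x) = g(f(x)) = 0.5\,e^{-x^2}$, and the chain rule gives
$$h'(x) = g'(f(x))\,f'(x) = -0.5\,e^{-x^2} \cdot 2x = -x\,e^{-x^2}$$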

## Operator overloading makes everything better

Recall how when we wrote the AbstractUnivariate class, we included some default methods
```python
class AbstractUnivariate:
    # ...
    
    # the rest of these methods will be implemented when we write the appropriate functions
    def __add__(self, other):
        return SumFunction(self, other)
    
    def __mul__(self, other):
        return ProdFunction(self, other)
    
    def __rmul__(self, other):
        return ScaleFunction(other, self)
    
    def __pow__(self, n):
        return ComposeFunction(PowerFunction(1, n), self)
```

If you think it is clunky to keep writing `SumFunction` or `ProdFunction` everywhere, you're not alone.  Again, you can use the special methods above to [overload operators](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types)

In [None]:
f = ExponentialFunction(1, 2)
g = AffineFunction(2, 1)
print("f = {}".format(f))
print("g = {}".format(g))
print("f + g = {}".format(f+g))
print("f * g = {}".format(f*g))
print("f^2 = {}".format(f**2))
print("2*g = {}".format(2*g))

In [None]:
f = ExponentialFunction(1, 2)
g = AffineFunction(2, 1)
h = f*g
print(h.gradient())

## What's going on?

Because we thought ahead to define addition, multiplication, scaling, and powers in our `AbstractUnivariate` class, every sub-class will implement those methods by default **without needing to write any extra code**.

If we hadn't done this, we would have had to copy and paste the same thing into every class definition to get the same behavior, **but we don't need to**.  In fact, if we write a new basic univariate function class, e.g. `LogFunction`, we get addition, multiplication, etc., for free!
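
The same mechanism can be seen in a stripped-down, self-contained sketch. The names `Base`, `Sum`, `Log`, and `Square` are hypothetical stand-ins, not the lecture's classes:

```python
import math


class Base:
    """Hypothetical stand-in for AbstractUnivariate with one operator."""

    def __add__(self, other):
        return Sum(self, other)


class Sum(Base):
    def __init__(self, f, g):
        self.f, self.g = f, g

    def __call__(self, x):
        return self.f(x) + self.g(x)


class Log(Base):          # no __add__ written here
    def __call__(self, x):
        return math.log(x)


class Square(Base):       # nor here
    def __call__(self, x):
        return x * x


h = Log() + Square()      # resolved by Base.__add__
print(type(h).__name__)   # Sum
print(h(1.0))             # 1.0
print((h + Log())(1.0))   # 1.0, since Sum inherits __add__ as well
```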

## Symbolic Functions

Just for fun, let's create an `AbstractUnivariate` sub-class, which just holds a placeholder symbolic function

In [None]:
class SymbolicFunction(AbstractUnivariate):
    def __init__(self, name):
        if isinstance(name, str):
            self.name=name
        else:
            raise AssertionError("name must be string")
        
    def __call__(self, x):
        return "{}({})".format(self.name, x)
    
    def fmtstr(self, x="x"):
        return self.name + "({})".format(x)
    
    # the derivative of a symbolic function f is the symbolic function f'
    def gradient(self):
        return SymbolicFunction(self.name + "'")
    

In [None]:
f = SymbolicFunction("f")
print(f)
print(f.gradient())
g = SymbolicFunction("g")
print(g  + f)

Now we can remind ourselves of the product rule and the chain rule (which we encoded in the `ProdFunction` and `ComposeFunction` classes)

In [None]:
f = SymbolicFunction("f")
g = SymbolicFunction("g")
print((f*g).gradient())
h = ComposeFunction(g, f)
print(h.gradient())

And we can derive the quotient rule

In [None]:
f = SymbolicFunction("f")
g = SymbolicFunction("g")
h = f * g**-1
print(h)
print(h.gradient())
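
For comparison, here is the derivation the code carries out, written by hand. The expression `g**-1` is the composition of $u \mapsto u^{-1}$ with $g$, so the chain rule gives
$$\left(g^{-1}\right)' = -g^{-2}\,g'$$
and the product rule applied to $h = f \cdot g^{-1}$ gives
$$h' = f'\,g^{-1} + f \cdot \left(-g^{-2}\,g'\right) = \frac{f'g - f g'}{g^2}$$
The printed gradient corresponds to the unsimplified middle form, because the classes do not reduce expressions.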

You can also add symbolic functions to non-symbolic ones:

In [None]:
f = SymbolicFunction("f")
g = AffineFunction(1, 2)
h = f + g
print(h)
print(h.gradient())

## Summary

You're now on your way to having your own automatic differentiation library!  Or your own symbolic computation library!  You can probably see lots of ways to extend and improve what you've seen here:
* Support Multivariate Functions
* Add more "basic functions" such as trig functions, etc.
* Reduce expressions when you are able to
* ...

Yes, there are many libraries that do this very thing.  Keywords are "autodifferentiation", "symbolic math".  This sort of thing is used extensively in deep learning libraries, as well as optimization libraries.

### How was Object Oriented Programming Useful?

**Class Inheritance** allowed you to get functions like addition and multiplication for free once you defined the class everything inherited from

**Polymorphism** enabled you to use any combination of `AbstractUnivariate` functions and still evaluate them, calculate derivatives, and format equations. Everyone played by the same rules.

**Encapsulation** let you interact with functions without worrying about how they are implemented under the hood.

If you think back to HW1, we implicitly used polymorphism in the power method function (e.g., matrix-vector multiply always uses `dot()` no matter which class we're using)

# Exercise 2

Ignoring our `SymbolicFunction` class, any sub-class of `AbstractUnivariate` is a real function $f:\mathbb{R} \to \mathbb{R}$ that we can evaluate using `f(x)` syntax.  One thing that you may wish to do is find roots of your function: $\{x \mid f(x) = 0\}$.

One very classical algorithm for doing this is called [Newton's Method](https://en.wikipedia.org/wiki/Newton%27s_method), and has the basic pseudocode:
```
initialize x_0
while not converged:
    x_{k+1} = x_k - f(x_k)/f'(x_k)
```

Write a function that implements Newton's method on any `AbstractUnivariate` function

Hint: use the `gradient()` method to get a function for derivatives
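
To see the iteration in action before adapting it to `AbstractUnivariate` objects, here is a minimal sketch that works on plain Python callables. The names `newton_sketch`, `fprime`, and `max_iter` are assumptions made for this example, and the iteration cap is an added safeguard that is not part of the pseudocode above:

```python
def newton_sketch(f, fprime, x0, tol=1e-8, max_iter=50):
    """Newton's method for a plain callable f with derivative fprime."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # converged: f(x) is close enough to 0
            return x
        slope = fprime(x)
        if slope == 0:             # the Newton step is undefined here
            raise ZeroDivisionError("derivative is zero at x = {}".format(x))
        x = x - fx / slope         # x_{k+1} = x_k - f(x_k)/f'(x_k)
    raise RuntimeError("no convergence within {} iterations".format(max_iter))


# root of x^2 - 2 starting from x0 = 1, i.e. an approximation of sqrt(2)
root = newton_sketch(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(abs(root - 2 ** 0.5) < 1e-6)   # True
```

Convergence is tested here on $|f(x)|$; testing the size of the step $|x_{k+1} - x_k|$ is another common choice.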

In [None]:
def find_root(f, x0=0.0, tol=1e-8):
    if isinstance(f, SymbolicFunction):
        raise AssertionError("can't handle symbolic input")
    elif not isinstance(f, AbstractUnivariate):
        raise AssertionError("Input must be AbstractUnivariate")
    x = x0
    # your code here
    
    return x

# Deep Learning

After the first part of this lecture, you now have a pretty good idea of how to get started implementing a deep learning library.  Recall that above we considered functions of the form
$$f(x; \theta): \mathbb{R} \to \mathbb{R}$$

To get to machine learning, you need to handle multivariate input and output
$$f(x; \theta):\mathbb{R}^p \to \mathbb{R}^k$$
You also need to be able to take the gradient of $f$ with respect to the parameters $\theta$ (which we didn't do in our `AbstractUnivariate` class, but is straightforward), and then you can do things like optimize a loss function using your favorite optimization algorithm.
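
As a small illustration of optimizing over parameters, here is a sketch that fits the slope of $f(x; \theta) = \theta x$ by gradient descent on a squared-error loss. The toy data, the learning rate, and the function names are all made up for this example:

```python
# Toy data generated by y = 3*x, so the best slope is theta = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def loss(theta):
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys))

def dloss_dtheta(theta):
    # d/dtheta of (theta*x - y)^2 is 2*(theta*x - y)*x
    return sum(2 * (theta * x - y) * x for x, y in zip(xs, ys))

theta = 0.0
learning_rate = 0.01          # step size chosen by hand for this toy problem
for _ in range(200):
    theta -= learning_rate * dloss_dtheta(theta)

print(round(theta, 6))        # 3.0
```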

In deep learning, we have the exact same setup
$$f(x; \theta):\mathbb{R}^p \to \mathbb{R}^k$$
What makes deep learning a "special case" of machine learning is that the function $f$ is the composition of several/many functions
$$f = f_n \circ f_{n-1} \circ \dots \circ f_1$$
This is what we mean by "layers", and you use chain rule to "backpropagate" gradients with respect to the parameters.
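
The sketch below shows the chain rule applied through a composition of three scalar "layers". To keep it short, the derivative is taken with respect to the input $x$ rather than the parameters, and the three functions are arbitrary picks for illustration:

```python
import math

# Each "layer" is a pair (function, derivative).
layers = [
    (lambda u: 3.0 * u,       lambda u: 3.0),            # f_1
    (lambda u: u ** 2,        lambda u: 2.0 * u),        # f_2
    (lambda u: math.exp(-u),  lambda u: -math.exp(-u)),  # f_3
]

def forward_and_derivative(x):
    """Return f(x) and f'(x) for f = f_3 o f_2 o f_1."""
    inputs = []
    value = x
    for fn, _ in layers:          # forward pass: remember each layer's input
        inputs.append(value)
        value = fn(value)
    grad = 1.0
    for (_, dfn), u in zip(reversed(layers), reversed(inputs)):
        grad *= dfn(u)            # backward pass: multiply local derivatives
    return value, grad

value, grad = forward_and_derivative(0.5)
# closed form: f(x) = exp(-9 x^2), so f'(x) = -18 x exp(-9 x^2)
expected = -18 * 0.5 * math.exp(-9 * 0.25)
print(abs(grad - expected) < 1e-12)   # True
```

Storing each layer's input during the forward pass is what lets the backward pass evaluate every local derivative at the right point.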

**Disclaimer** If you really want to learn to use a deep learning library, you really should go through several tutorials and learn about the different functions that are used (and *why* they are used).  This is beyond the scope of this course, but there are several courses at Stanford that are devoted to this.

## Deep Learning Libraries

Some popular libraries for deep learning are [Tensorflow](https://www.tensorflow.org/), [Keras](https://keras.io/), and [PyTorch](https://pytorch.org/).  Each has its strengths and weaknesses.  All of them do essentially the same thing: you define a function through composition using objects that are in many ways similar to what you just implemented.  Then you choose a loss function and start optimizing the parameters in these functions using something like stochastic gradient descent.

We'll do an example in PyTorch, since it is higher-level than Tensorflow, and perhaps the most "Pythonic" of the libraries.

```bash
conda install pytorch pillow
```

## PyTorch

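What's a tensor?  Conceptually, it is the same as a NumPy array: an n-dimensional array of numbers.  PyTorch tensors can additionally record the operations applied to them, which is what makes automatic gradients possible.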

We'll consider the following network
$$ x \xrightarrow{w_1} h \to ReLU(h) \xrightarrow{w_2} y$$
where $x$ is a 500-dimensional vector, $h$ is a 100-dimensional "hidden layer", and $y$ is a 10-dimensional vector.  $w_1$ and $w_2$ are linear transformations (matrices), and ReLU refers to the function
$$ReLU(x) = \begin{cases}
x & x > 0\\
0 & x \le 0
\end{cases}$$
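
Before turning to PyTorch, here is the same forward pass as a pure-Python sketch with tiny, made-up dimensions (3 inputs, 2 hidden units, 1 output). The helper names `relu` and `matvec` are assumptions for this example. It uses the column-vector convention $h = w_1 x$, whereas the PyTorch code that follows multiplies a batch of row vectors as `x.mm(w1)`:

```python
def relu(v):
    """Apply ReLU elementwise to a list of numbers."""
    return [t if t > 0 else 0.0 for t in v]

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]

# Tiny made-up weights: x has 3 entries, h has 2, y has 1.
w1 = [[1.0, -1.0, 0.0],
      [0.0,  2.0, -3.0]]
w2 = [[0.5, 1.0]]

x = [1.0, 2.0, 1.0]
h = matvec(w1, x)          # [-1.0, 1.0]
y = matvec(w2, relu(h))    # [1.0]
print(h, relu(h), y)
```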

In [None]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# N - batch size
# D_in - x dimension
# H - h dimension
# D_out - y dimension
N, D_in, H, D_out = 64, 500, 100, 10

# Setting requires_grad=False indicates that we do not need to compute gradients w.r.t var
# during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad = False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad = False)

# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

In [None]:
learning_rate = 1e-6
for t in range(10000):
    # Forward pass: compute predicted y using operations on Variables
    y_pred = x.mm(w1).clamp(min=0).mm(w2)  # clamp(min=0) is ReLU

    # Compute the loss: the sum of squared errors over the batch.
    # loss holds a single number; loss.item() returns it as a Python float.
    loss = (y_pred - y).pow(2).sum()

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    loss.backward()

    # Update weights using gradient descent. Working on w1.data and w2.data
    # keeps the update itself out of the autograd graph.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after running the backward pass
    w1.grad.data.zero_()
    w2.grad.data.zero_()
    print("Loss is: {}".format(loss.item()), end='\r')

print()
print("Final loss is {}".format(loss.item()))

## That's still fairly cumbersome

- When building neural networks, arrange the computation into layers, some of which have learnable parameters that are optimized during training.
- Use the `torch.nn` package to define your layers.
- Create custom networks by subclassing `nn.Module`:
    - specify the layers in `__init__`
    - define the forward pass in a `forward(self, x)` method
- The result is really clean code!
    
This is analogous to how we created specific sub-classes of `AbstractUnivariate`, and got a lot for free through class inheritance, polymorphism, abstraction, etc.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

class TwoLayerNet(nn.Module):
    
    def __init__(self, D_in, H, D_out): # this defines the parameters, and stores them
        super(TwoLayerNet, self).__init__() # run nn.Module's own constructor first
        self.layer1 = nn.Linear(D_in, H) # initializes weights
        self.layer2 = nn.Linear(H, D_out)
        
    def forward(self, x): # this defines the composition of functions
        out = F.relu(self.layer1(x)) 
        out = self.layer2(out)
        return out

In [None]:
# N is batch size; D_in is input dimension; H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out) # we create our function f:x \to y

# Construct our loss function and an Optimizer. 
loss_fn = torch.nn.MSELoss(reduction='sum') # sum of squared errors, as in the manual example
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x) # evaluate the f(x)

    # Compute and print loss
    loss = loss_fn(y_pred, y) # evaluate the loss

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
print("Final Loss is {}".format(loss.item()))

## For more examples... 
- check out [Pytorch Docs](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)

# Additional Resources

## Object Oriented Programming

* Beginner's guide to Object Oriented Programming in Python [here](https://stackabuse.com/object-oriented-programming-in-python/)

## Image Processing

In this class, we've worked a lot with tabular data.  Another important type of data to be able to work with is image data.

Some options are
* [scikit-image](https://scikit-image.org/)
* [scipy](http://www.scipy-lectures.org/advanced/image_processing/index.html)
* [Pillow](https://pillow.readthedocs.io)
* [OpenCV](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html)

For many examples, see the [Scikit-image gallery](http://scikit-image.org/docs/stable/auto_examples/).  Other libraries also have examples.