# Overview

In this tutorial we will introduce both Google Colab and the Tensorflow 2 / Keras API, including a demonstration of basic concepts related to statistical modeling and machine learning. Topics covered in this tutorial include:

**Google Colab**

* Jupyter notebooks
* mounting Google drive
* environment setup

**Tensorflow 2 / Keras API**

* Tensorflow graphs
* creating models
* creating optimizers
* creating loss functions
* model fitting

This tutorial is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at https://github.com/peterchang77/dl_tutor/tree/master/cs190.

# Google Colab

The following steps will configure your Google Colab environment for this tutorial.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Jupyter

A Jupyter notebook is composed of blocks of `Markdown` documentation or code referred to as cells. Each cell can be selected individually with a single click. As you progress through this notebook, select a code-containing cell and click the `Run` button on the top toolbar (or alternatively press `shift` + `[Enter]`) to execute that particular line or block of code. The `In [ ]` header to the left of each cell will change to `In [*]` while the code is executing, and then to a number indicating the order of execution once the code completes successfully.

# Tensorflow and Keras

Tensorflow is a free and open-source software library developed by the Google Brain team for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is most popularly used for machine learning applications such as neural networks. In September 2019, the first stable release of the version 2.0 library was made available, with significant changes including:

* formal integration of the high-level Keras API for easy model building
* `eager execution` of code, eliminating the need to manually compile an abstract syntax tree using a `session.run()` call
* improved support for model deployment in production on any platform
* improved support for distributed machine learning paradigms

More information highlighting the key improvements can be found here: https://www.tensorflow.org/guide/effective_tf2

## Import

In this tutorial we will use the following Numpy and Tensorflow library components:

In [None]:
import os, numpy as np
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers

## Machine learning 

Machine learning models are **mapping functions** that learn to predict target output(s) based on provided input(s). Inputs may consist of a number of **features** derived from raw data (traditional machine learning models) or simply the **raw data** itself (neural networks). 

A machine learning model is defined by its **parameters**, numeric variables that are applied via **operations** on the algorithm inputs to yield desired outputs. Note that this definition of learning is broad; in fact all conventional statistical models are systems with *learnable* parameters.

Consider a simple linear regression model:

```
y = m * x + b
```

As per above, this model meets all key specifications of a learnable system:

* input: `x`
* output: `y`
* parameters: `m` and `b`

Indeed, through repeated exposure to data samples, optimal values for `m` and `b` can be learned such that the target output `y` can be reliably predicted from any given input `x`.
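
As a rough illustration of what "learning" `m` and `b` can look like, the sketch below fits the two parameters with plain gradient descent in NumPy. It is not how this tutorial trains its model (that is done with Keras further on); the ground-truth values, the step size `lr`, the iteration count and the noise-free data are all assumptions chosen only for this example.

```python
import numpy as np

# Hypothetical ground-truth values, chosen only for this illustration
true_m, true_b = 2.0, -1.0

# Noise-free training data (an assumption to keep the example simple)
rng = np.random.default_rng(0)
x = rng.random(200)
y = true_m * x + true_b

# Start from arbitrary parameter guesses
m, b = 0.0, 0.0
lr = 0.5  # step size (assumed value)

for _ in range(2000):
    err = (m * x + b) - y          # prediction error for every sample
    grad_m = 2 * np.mean(err * x)  # d(mean squared error) / dm
    grad_b = 2 * np.mean(err)      # d(mean squared error) / db
    m -= lr * grad_m
    b -= lr * grad_b

print(m, b)
```

After enough iterations the learned `m` and `b` should sit very close to the assumed ground-truth values.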

## Tensorflow graphs

To implement a model with Tensorflow, one must reformulate an algorithm into a computational graph: a series of **operations** that define use of **parameters** to map provided input(s) to target output(s). Tensorflow / Keras is an object-oriented framework whereby the entire computational graph is created by passing `Tensor` objects into `layers` objects, yielding new (intermediate) `Tensor` objects which are fed into new `layers` objects, and so on. More explicitly, the overall process proceeds as follows:

1. Define the model input(s) ==> returns instantiated `Tensor` object(s)
2. Define first operation ==> returns instantiated `layers` object
3. Pass the input `Tensor` object into the operation `layers` object ==> returns new (intermediate) Tensor object
4. Define next operation ==> ...
5. Pass intermediate `Tensor` object (from step 3) into new operation `layers` object ==> ...

...and so on.

In the above simple linear regression model, the multiplication (`m`) and addition (`b`) operators are combined into a single operation known as a **linear transformation**. In the context of neural networks, this operation is also synonymous with the term **densely-connected layer**.

See below for an example of defining a simple computational graph to replicate a linear regression model:

In [None]:
# --- Define an input
x = Input(shape=(1,))
print('x : type = {}'.format(type(x)))

# --- Define a linear transform operation
op = layers.Dense(1)
print('op: type = {}'.format(type(op)))

# --- Apply linear transform
y = op(x)
print('y : type = {}'.format(type(y)))

### Dense layers

In the context of neural networks, a **dense** layer refers to a simple matrix multiply operation (or dot product for vectors). Recall that matrices are defined by shape (`rows`, `columns`). Specifically:

```
Let:

X = 1 x N matrix = [a, b, c, d, e...]

M = N x 1 matrix = [[A],
                    [B],
                    [C],
                    [D],
                    [E],
                    ...
                    ]

Then:

X x M = ...

= [[a, b, c, d, e...]] x [[A], = a * A + b * B + c * C + d * D + e * E ...
                          [B],
                          [C],
                          [D],
                          [E],
                          ...
                          ]
```

In [None]:
# --- Example with NumPy
x = np.array([1, 2, 3, 4, 5]).reshape(1, 5)
m = np.array([1, 2, 3, 4, 5]).reshape(5, 1)

# --- Matrix multiply or "dense" operation
np.matmul(x, m)

Additionally, a **dense** operation may be defined by a **bias** term, which is simply a constant that is added to the result of the matrix multiply operation. In other words:

```
Let:

X = 1 x N matrix (per above)
M = N x 1 matrix (per above)
b = constant

Then: 

A "dense" operation = X x M + b
```

In [None]:
# --- Example with NumPy
x = np.array([1, 2, 3, 4, 5]).reshape(1, 5)
m = np.array([1, 2, 3, 4, 5]).reshape(5, 1)
b = 5

# --- Matrix multiply or "dense" operation with bias
np.matmul(x, m) + b

Based on this discussion, it is evident that a simple linear regression model can in fact be defined using a **dense** layer with single-element `m`, `x` and `y` terms.

### Inputs

Let us first redefine the simple linear regression model:

In [None]:
# --- Define linear regression model
x = Input(shape=(1,))
op = layers.Dense(1)
y = op(x)

What are the shapes of the `x` and `y` Tensors?

In [None]:
# --- Print tensor shapes
print(x.shape)
print(y.shape)

What is the meaning of the `None` shape? This generic value indicates to Tensorflow that the first dimension of the `x` and `y` tensors can be **anything** (>= 1). In other words, if I have 10 observations then the shape of the `x` matrix will be `(10, 1)`, which when multiplied by a `(1, 1)` `m` matrix (+ the `b` bias term) will yield a `(10, 1)` output `y` matrix. 
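
The shape arithmetic described above can be checked directly in NumPy. This is only a sketch: the values of `m` and `b` are hypothetical, and the batch sizes 1, 10 and 32 are arbitrary choices standing in for the `None` dimension.

```python
import numpy as np

# Hypothetical parameter values for illustration
m = np.array([[2.0]])   # shape (1, 1)
b = -1.0

# The first dimension (number of observations) can be anything >= 1
for n in (1, 10, 32):
    x = np.ones((n, 1))          # n observations, 1 feature each
    y = np.matmul(x, m) + b      # (n, 1) x (1, 1) + b ==> (n, 1)
    print(x.shape, '->', y.shape)
```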

### Parameters

In the above model it is clear where `x` and `y` are defined, but where are the parameters `m` and `b`? The answer is that in Tensorflow, only input(s), output(s) and intermediate `Tensors` are explicitly defined. By contrast, parameters are maintained (and updated) implicitly by the Tensorflow library upon definition of operation `layers`. In other words, `m` and `b` in our model are created automatically by Tensorflow as soon as the operation (`op`) object receives its input `Tensor` object. Once instantiated, both `m` and `b` are stored in the `op` object and may be retrieved with the `get_weights(...)` method:

In [None]:
# --- See parameters
m, b = op.get_weights()
print(m)
print(b)

By default in Tensorflow, multiplication parameters are set to random values (chosen via the `Glorot` initialization scheme) whereas addition parameters are set to zero.
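
For intuition, the initialization can be sketched in NumPy. The helper `glorot_uniform` below is a hypothetical function written for this example, not a Tensorflow call; it assumes the uniform variant of the Glorot scheme, which draws weights from `[-limit, +limit]` with `limit = sqrt(6 / (fan_in + fan_out))`.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from the Glorot uniform range."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)

# One input feature and one output unit, as in the linear regression model
m_init = glorot_uniform(1, 1, rng)   # random multiplication parameter
b_init = np.zeros(1)                 # addition parameter starts at zero

print(m_init.shape, b_init)
```

For a layer with one input and one output, the limit works out to `sqrt(3)`, so the initial `m` falls somewhere between roughly -1.73 and +1.73.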

## Creating Models

Once the input(s), output(s) and all required operations have been defined, a Tensorflow `Model()` object can be created:

In [None]:
# --- Create model by passing input(s) and output(s)
model = Model(inputs=x, outputs=y)

To pass an arbitrary value (in the form of a NumPy array) into the model, use the `model.predict(...)` function:

In [None]:
# --- Pass an input into the model (shape (1, 1) matches the (None, 1) input)
model.predict(np.array([[1.0]]))

How do we replicate this operation using the `m` and `b` NumPy arrays retrieved from above?

In [None]:
# --- Equivalent NumPy operation
?

For complex models, it may be useful to visualize a summary of all intermediate operations:

In [None]:
# --- Print summary of model architecture
model.summary()

## Compiling a Model

The current model parameters have been initialized to random values. Through exposure to data, the goal is for the model to *learn* optimal parameter values that allow for robust mapping of provided input to target output. To prepare the model for learning, a graph must be **compiled** through definition of, at minimum, the following key training components (each represented by Keras Python objects):

* loss function
* optimizer

### Defining a loss object

A loss function simply represents a formula that the machine can use to provide feedback regarding the quality of its current set of parameters. In other words, given a provided input `x` and a target output `y`, as well as model prediction `y'`, how does one quantify the *goodness* of the estimated output? Choosing a representative loss function is important as this feedback is used by the machine to improve its parameter values.

In machine learning, any loss formulation can be used to estimate goodness of fit as long as the function is **differentiable**. Many pre-built loss functions encapsulated by Python classes are available for use in the `tf.losses.*` module.

For a linear regression model, performance (e.g. fit) is most commonly evaluated by calculating the *squared distance* between the target output `y` and the model prediction `y'`. In other words, if a model predicts `5` when the target output is `2`, then the error is `(5 - 2) ** 2` or `9`. Thus, the parameters `m` and `b` that yield the **least squared error** across all data observations are defined to be optimal.
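
The arithmetic above can be reproduced with a few lines of NumPy. The function `mean_squared_error` here is a hypothetical helper written for this example (it is not the Keras loss object); it simply averages the squared differences over all observations.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Single observation from the text: target 2, prediction 5
single = mean_squared_error([2], [5])        # (5 - 2) ** 2 = 9

# Two observations: squared errors 9 and 1 are averaged
batch = mean_squared_error([2, 0], [5, 1])   # (9 + 1) / 2 = 5

print(single, batch)
```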

In [None]:
# --- Define a MSE loss
loss = losses.MeanSquaredError()

Let us test the loss object with some NumPy arrays:

In [None]:
# --- Example MSE losses
loss(y_true=np.array([1]), y_pred=np.array([5]))

### Defining an optimizer object

An optimizer is a method used by the machine to improve its parameters. By definition, the parameters are updated such that the loss value (calculated by the loss function) decreases. A number of optimization methods have been described and are available through the `tf.optimizers.*` module. One of the most widely used optimizers is Adam, which is a good default choice for most tasks and will be used in this tutorial.

In addition to optimizer technique, a learning rate specifying the *degree of change* per update step is required. For the purposes of this tutorial, we will use a default learning rate of `1e-3`.
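
To make the role of the learning rate more concrete, here is a minimal NumPy sketch of a single Adam update in its commonly published form. It is an illustration only and may differ in detail from the Tensorflow implementation; the function `adam_step`, the toy loss `p ** 2`, and the values of `beta1`, `beta2` and `eps` are assumptions made for this example.

```python
import numpy as np

def adam_step(p, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to parameter p given its gradient."""
    state['t'] += 1
    # Running averages of the gradient and the squared gradient
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages
    m_hat = state['m'] / (1 - beta1 ** state['t'])
    v_hat = state['v'] / (1 - beta2 ** state['t'])
    return p - lr * m_hat / (np.sqrt(v_hat) + eps)

p = 5.0                                # toy parameter
state = {'t': 0, 'm': 0.0, 'v': 0.0}   # optimizer state
grad = 2 * p                           # gradient of the toy loss p ** 2
p_new = adam_step(p, grad, state)

print(p - p_new)
```

On the very first step the normalized gradient has magnitude close to one, so the parameter moves by approximately the learning rate itself, which is why the learning rate is described as the degree of change per update.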

In [None]:
# --- Define an Adam optimizer
optimizer = optimizers.Adam(learning_rate=1e-3)

### Compiling

Once the model `optimizer` and `loss` objects have been defined, simply pass these objects into the `model.compile(...)` method to prepare for training:

In [None]:
# --- Compile model
model.compile(
    optimizer=optimizer,
    loss=loss)

The model is now compiled and ready for training!

# Data

In this tutorial, training data pairs (x, y) will be "simulated". Specifically, (x, y) pairs will be drawn using the following equation:

```
y = m * x + b + random
```

Here `m` and `b` represent some ground-truth value for slope and intercept, while `random` represents some random number in the range `[-0.5, +0.5]` to simulate noise in the data. For the purposes of demonstration, the values `m = 2` and `b = -1` will be used.

The following lambda function can be used to create (x, y) training data pairs: 

In [None]:
# --- Define lambda function for random values [-0.5, +0.5]
lo = -0.5
hi = +0.5
rand = lambda shape : np.random.rand(*shape) * (hi - lo) + lo

# --- Define lambda function for linear transform
m = 2
b = -1
f = lambda x : m * x + b + rand(x.shape)

# --- Generate an example (x, y) pair
xs = np.random.rand(1)
ys = f(xs)
print(xs, ys)

The following lines of code can be used to visualize 100 random samples:

In [None]:
# --- Generate N number examples of data
xs = np.random.rand(100)
ys = f(xs)

# --- Visualize
import pylab
pylab.scatter(xs, ys)

### Python generators

There are a number of different ways to feed training data into a Tensorflow model. Recall above that a placeholder, the `Input()` object, was defined simply to identify the entrypoint(s) for data into the model. For algorithm training, real data matching the specifications of `Input()` above must be provided. The method of data input used throughout this class will be via **Python generators**.

The Python generator pattern is a unique type of *function* that returns a **lazy generator**, a special iterable that does not store its contents in memory. Instead the contents *yielded* by the generator are created dynamically upon each loop. This is a very efficient way to iterate through datasets, especially as the total size of a dataset becomes large.
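
As a minimal, self-contained illustration of the pattern (separate from the training data generator used in this tutorial), the hypothetical `count_up` function below yields its values one at a time instead of building a full list in memory:

```python
def count_up(limit):
    """Yield 0, 1, ..., limit - 1 one value at a time."""
    n = 0
    while n < limit:
        yield n      # execution pauses here until the next value is requested
        n += 1

gen = count_up(3)    # creates the generator; no values are computed yet
first = next(gen)    # runs the function body up to the first yield
rest = list(gen)     # consumes the remaining values

print(first, rest)
```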

The following lines of code wrap up the lambda function above in a Python generator:

In [None]:
def Generator(batch_size=128):
    """
    Method to define a Python generator for training data
    
    """
    # --- Define lambda function for random values [-0.5, +0.5]
    lo = -0.5
    hi = +0.5
    rand = lambda shape : np.random.rand(*shape) * (hi - lo) + lo

    # --- Define lambda function for linear transform
    m = 2
    b = -1
    f = lambda x : m * x + b + rand(x.shape)
    
    while True:
        
        xs = np.random.rand(batch_size, 1) 
        ys = f(xs)
        
        yield xs, ys

Note that the Generator function receives a single argument named `batch_size`; this variable represents the total number of training examples that will be generated during each loop. In addition, notice that the shapes of both `x` and `y` are in fact `(batch_size, 1)`; this is defined to conform to the `Input(...)` variable defined above:

In [None]:
# --- Recall model input shape
?

The following lines of code demonstrate basic Generator functionality:

In [None]:
# --- Instantiating the generator object
gen_train = Generator()

In [None]:
# --- Iterating using for loop
for xs, ys in gen_train:
    print(xs.shape, ys.shape)
    break

In [None]:
# --- Iterating using next(...)
xs, ys = next(gen_train)

In [None]:
# --- Visualizing
pylab.scatter(xs, ys)

# Model Training

Once the model has been compiled and the data prepared (via a generator), training can be invoked using the `model.fit(...)` method. The only parameters that need to be set relate to the number of total training iterations (total number of steps = `steps_per_epoch` x `epochs`). 
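
As a quick check of that arithmetic, the snippet below uses the `steps_per_epoch` and `epochs` values passed to `model.fit(...)` in this section, together with the default `batch_size` of the `Generator` defined earlier:

```python
steps_per_epoch = 500
epochs = 10
batch_size = 128   # default batch size of the Generator defined earlier

# Total number of parameter updates performed during training
total_steps = steps_per_epoch * epochs

# Total number of simulated (x, y) examples seen by the model
total_examples = total_steps * batch_size

print(total_steps, total_examples)   # 5000 640000
```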

In [None]:
model.fit(
    x=gen_train, 
    steps_per_epoch=500, 
    epochs=10)

After the model has converged, what value do we expect the algorithm to have learned for `m`? How do we check this?

## Saving and Loading a Model

After a model has been successfully trained, it can be saved and/or loaded by simply using the `model.save()` and `models.load_model()` methods. 

In [None]:
# --- Serialize a model
fname = './model.hdf5'
model.save(fname)

In [None]:
# --- Load a serialized model
del model
model = models.load_model(fname, compile=False)

# Exercises

The following exercise questions will assist in preparation for this lesson's assignment, to extend our single variable linear model to a multivariate linear regression:

```
y = m0 * x0 + m1 * x1 + m2 * x2 ... + b
```

### Exercise 1

What modifications to the model definition are needed to increase the number of inputs from 1 variable to 5 (e.g. 5-element vector), while keeping the prediction a single output (e.g. `y`)? Does the `shape` argument of `Input(...)` need to change? Does the argument in `Dense(...)` need to change? Use the following cell to experiment:

In [None]:
# --- Define a new 5-element input model
x = Input(shape=(?,))
op = layers.Dense(?)
y = op(x)

#### Hints

Use `x.shape` and `y.shape` to check the input and output shapes. Keep in mind that `x.shape` should be `(None, 5)` for a 5-element input and `y.shape` should be `(None, 1)` for a single-element output.

What shape should the weights in `op.get_weights(...)` represent?

In [None]:
# --- Get weights and check shape
w = op.?
print(?)

### Exercise 2

What modifications to the lambda function are needed to extend the number of inputs from 1 variable to 5? Keep in mind that the following must be true:

* the input `x` is of shape `(None, 5)`
* the output `y` is of shape `(None, 1)` 

Use the following cell to experiment:

In [None]:
# --- Define new 5-element lambda function
m = ?
b = ?
f = lambda x : ?

#### Hints

Start with an arbitrary `(None, 5)` input `x`, e.g. an array with shape `(2, 5)`. Given that `m` must also have 5 elements, what shape must `m` be in order to properly perform an `np.matmul(...)` operation? Does the order of multiplication matter e.g. `np.matmul(x, m)` vs. `np.matmul(m, x)`?

In [None]:
# --- Define arbitrary x and m
x = np.random.rand(2, 5)
m = np.array([0, 1, 2, 3, 4]).reshape(?)

# --- Multiply together
np.matmul(?, ?)

Now add a constant to your matrix multiply:

In [None]:
# --- Multiply together + bias
b = -1
np.matmul(?, ?) + ?

Now add random offsets to each of your output `y` values. What dimension of random numbers do you need to accomplish this? Note that if you add a **constant** (like the `b` bias term) you will simply be adding the same random value to **all** your `y` values, thus simply shifting your linear curve in the y-direction: 

In [None]:
# --- Define random number generator
lo = -0.5
hi = +0.5
rand = lambda shape : np.random.rand(*shape) * (hi - lo) + lo
    
# --- Multiply together + bias + random
np.matmul(?, ?) + ? + rand((?, ?))