<a href="https://colab.research.google.com/github/junjy007/UTS_ML2019_Main/blob/master/NB03_NeuralNets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1 Basic Neural Networks

## 1.1 Prepare Environment

### 1.1.1 Library Import

In [0]:
import numpy as np
import math
import torch
import plotly.express as px
import pandas as pd
import random

### 1.1.2 Visualisation Functions

#### A Simple visualiser

In [0]:
def simple_visualise(mod):
    xx, yy = np.meshgrid(np.arange(-5, 5.01, 0.05), 
                         np.arange(-5, 5.01, 0.05))
    xx = xx.flatten()
    yy = yy.flatten()
    d = pd.DataFrame(data=dict(x=xx, y=yy, 
                               z=[mod.forward((x, y)) 
                                  for x, y in zip(xx, yy)]))
    fig = px.scatter(d, x='x', y='y', color='z')
    fig.show()

## 1.2 Neural Networks Model Definition

### 1.2.1 Definition and Naive Implementation

Let us literally translate the definition of a neural network into a computer implementation:
Neural network: Multiple _layers_ of _perceptron(s)_.
```python
def compute_neural_network(x):
    # 0. prepare the input for the first layer
    layer_input = x
    for layer_idx in [0, 1, 2]:
        # 1. fill output of this layer by executing each
        #    perceptron in this layer
        layer_output = []
        for perceptron in net_layers[layer_idx]:
            layer_output.append(perceptron.compute_output(layer_input))
            # Note all perceptrons in this layer share the same
            # `layer_input`
        #!!----------------------------------    
        # 2. pass the output of THIS layer
        #    to the NEXT layer as the input
        #------------------------------------
        layer_input = layer_output
        # END OF LOOP OVER `layer_idx`
    return layer_output
```
Recall that a perceptron computes the weighted sum of all attributes in an input, followed by some _activation_. See below:
```python
def compute_perceptron(x):
    weighted_sum = sum([xi * wi for xi, wi in zip(x, weights)])
    return activation_function(weighted_sum)
```
Of course, we will need to set up the weights and the activation function. So we will use an object class to represent both the perceptrons and the networks.

NB: If you don't understand the construction `[t for t in list_of_t_values]`, please check out tutorials about Python list comprehensions.
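
As a quick illustration of that construction, the sketch below builds the same list of products first with an explicit loop and then with a list comprehension. The values of `x` and `weights` are made up for this example.

```python
# Hypothetical values, chosen only for illustration
x = [2.0, -1.0, 0.5]        # input attributes
weights = [0.5, 1.0, -2.0]  # one weight per attribute

# Loop version: collect the products one by one
products_loop = []
for xi, wi in zip(x, weights):
    products_loop.append(xi * wi)

# List-comprehension version: the same list in a single expression
products = [xi * wi for xi, wi in zip(x, weights)]

weighted_sum = sum(products)
print(products, weighted_sum)  # [1.0, -1.0, -1.0] -1.0
```

Both versions produce the same list; the comprehension is simply the more compact way to write it.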

In [0]:
def sigmoid_func(a):
    return 1 / (1 + math.exp(-a))



class Perceptron2D:
    """Perceptron model: linearly combine data attributes followed by a non-linear activation
    This is a simplified implementation and deals with data with 2 attributes. 
    You can also refer to the more complete implementation in the note of Week 3.
    """
    def __init__(self, w0=1, w1=0, activation_func=sigmoid_func):
        self.w0 = w0
        self.w1 = w1
        self.act = activation_func
    
    def forward(self, x):
        wsum = x[0] * self.w0 + x[1] * self.w1
        sigmoid_wsum = self.act(wsum) # sigmoid for activation
        return sigmoid_wsum
        

In [0]:
# Let's have a look at how the perceptron worked on 2D data
p = Perceptron2D(0.5, -2.5)
simple_visualise(p)
# Please note the effect of activation by examining the z-value.

__EXERCISE__

In the code cell above, adjust the model parameters and observe the change of the model behaviour.

__EXERCISE__

Use a different activation function, such as
$$
\begin{align}
y(h) = \left\{ \begin{array}{c}
0, \textrm{ if } h \leq 0 \\
h, \textrm{ if } h > 0
\end{array} \right.
\end{align}
$$

And implement your activation like:
```python
def relu_func(h):
    # compute y
    # HINT: consider using `max`
    return y
```
Then use your function to construct a perceptron and check its behaviour.

In [0]:
def my_activation_func(h):
    return max(0, h)
p = Perceptron2D(0.1, -0.5, activation_func=my_activation_func)
simple_visualise(p)

In [0]:
def k2(x):
    return [xi**2 for xi in x]

class Perceptron2DX:
    """Perceptron model wrapped. We transform the input before
    processing them using the perceptron.
    """
    def __init__(self, percep, xtransform=k2):
        self.perceptron = percep
        self.xtransform = xtransform
    
    def forward(self, x):
        return self.perceptron.forward(self.xtransform(x))

In [0]:
pp = Perceptron2DX(p)
simple_visualise(pp)

__EXERCISE__

__1__:
Use a different transform function, such as
$$
\begin{align}
x'_1 = \sin(\omega_1 \cdot x_1) \\
x'_2 = \cos(\omega_2 \cdot x_2)
\end{align}
$$

__2__:
Use your transform function on a _different perceptron core_, e.g.
```python
vanilla_perceptron_2 = Perceptron2D(
        -1, 1, 
        activation_func=my_activation_func)
pp2a = Perceptron2DX(vanilla_perceptron_2, 
                     xtransform=new_transform)
```


In [0]:
def new_transform(x):
    tx0 = math.sin(x[0] * 5)
    tx1 = math.cos(x[1])
    return [tx0, tx1]

pp2 = Perceptron2DX(p, xtransform=new_transform)
simple_visualise(pp2)

In [0]:
ly0_p0 = Perceptron2D(+1, -1, activation_func=my_activation_func)
ly0_p1 = Perceptron2D(-1, -3, activation_func=my_activation_func)
ly1_p = Perceptron2D(-1, 0.5, activation_func=math.tanh)
def new_transform(x):
    tx0 = ly0_p0.forward(x)
    tx1 = ly0_p1.forward(x)
    return [tx0, tx1]

pp3 = Perceptron2DX(ly1_p, xtransform=new_transform)
simple_visualise(pp3)

In [0]:
# Try yourself: can you change the parameters so 
ly0_p0 = Perceptron2D(+1, -1, activation_func=lambda x:x)
ly0_p1 = Perceptron2D(-1, -3, activation_func=lambda x:x)
ly1_p = Perceptron2D(-1, 0.5, activation_func=math.tanh)
def new_transform(x):
    tx0 = ly0_p0.forward(x)
    tx1 = ly0_p1.forward(x)
    return [tx0, tx1]

pp3 = Perceptron2DX(ly1_p, xtransform=new_transform)
simple_visualise(pp3)

### 1.2.2 Multiple Layer Perceptron

Alternatively (to the nested perceptrons above), we can define a `NeuralNet` class to hold all perceptrons in all layers. The advantage is that now we can easily extend the network to have more layers.

#### Naive Implementation

In [0]:
# Define a class, so our network can manage its "perceptrons" easily
class NeuralNet:
    """NeuralNet represents a simple neural network object class.
    As an example, it consists of 2 layers of perceptrons. 
    The first layer has 2 perceptrons and the second one has 1.
    
    The perceptrons deal with data of 2 attributes.
    """
    def __init__(self, perc0_w=(-1, 1), perc1_w=(2, -1), perc2_w=(0.5, 0.5)):
        self.layers = [[Perceptron2D(*perc0_w), Perceptron2D(*perc1_w)], 
                       [Perceptron2D(*perc2_w)]] # *(w0,w1) expand the values in tuple
        
    def forward(self, x):
        """
        Compute the network output layer by layer. "forward" is a conventional
        term for executing the computation of a net.
        """
        # use layer-0 to process x and get what's to feed to layer-1
        layer1_input = [p.forward(x) for p in self.layers[0]]
        # get the final output from layer-1
        final_output = [p.forward(layer1_input) for p in self.layers[1]]
        # note we have only two layers, so I didn't use a loop over the layers
        return final_output[0]
        

In [0]:
net0 = NeuralNet()
simple_visualise(net0)

__EXERCISE__ (optional, similar to one above)

Adjust parameters to show how the network behaviour changes.

#### Computation using Matrix Operations

In [0]:
def matmul(A, B):
    """
    matmul computes A x B for two matrices
    :param A: a collection (e.g. a list) containing the rows of a matrix
        A[i], the i-th row, another collection containing the elements
        A[i][j], the element
    :param B: similar to A
    """
    # figure out the size of A and B and the result
    rows = len(A)
    elements_inner = len(A[0])
    assert elements_inner == len(B), "Rows of B must be the same as Cols of A"
    cols = len(B[0])
    # Initialise C to the appropriate size
    C = [[0.0 for ci in range(cols)] for ri in range(rows)]
    
    # Fill C: 2 outer loops are for each element
    for r in range(rows):
        for c in range(cols):
            # Compute element [r][c]
            for k in range(elements_inner):
                C[r][c] += A[r][k] * B[k][c]
    return C

HINT: read the code and try it while watching the accompanying video.
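
To see why a matrix product is the right tool here, the following sketch computes the weighted sums of a whole layer in one call. `matmul_compact` is a hypothetical helper written for this illustration (a shorter equivalent of the nested-loop idea, not the notebook's `matmul`), and the weights are example values.

```python
def matmul_compact(A, B):
    """Matrix product on nested lists (illustrative helper)."""
    assert len(A[0]) == len(B), "Rows of B must be the same as Cols of A"
    B_cols = list(zip(*B))  # transpose B so that we can iterate over its columns
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols]
            for row in A]

# One sample with 2 attributes, written as a 1 x 2 matrix
X = [[2.5, 2.0]]
# Weights from 2 inputs to 3 perceptrons, a 2 x 3 matrix:
# column j holds the weights of perceptron j
W = [[0.0, 0.5, 1.0],
     [-1.0, -5.0, 2.0]]

# X x W gives the weighted sums of all 3 perceptrons at once
pre_activation = matmul_compact(X, W)
print(pre_activation)  # [[-2.0, -8.75, 6.5]]
```

Adding more rows to `X` processes several samples in the same call, which is why the network below expects its input as a list of samples.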

In [0]:
# Define a class, so our network can manage its "perceptrons" easily
class NeuralNetV2:
    """NeuralNetV2 represents a simple neural network object class.
    This object will manage all the neurons in the network. 
    """
    def __init__(self, neuron_numbers_in_layers=[2, 2, 1],
                 weights=None):
        """
        :param neuron_numbers_in_layers: first/last -- inputs and outputs
        :param weights: a dict, weights["in0out1"] represents the weights
          for computing layer1 from layer0, if layer0 has 3 inputs and layer1
          has 2 outputs, then the weight will be a matrix of 3 x 2, i.e.
          weights["in0out1"][i][j] is the weight for computing element-j in layer1
          by using element-i in layer0.
        """
        # Weights between 
        # Layer1 and 2, Layer 2 and 3, ...
        
        self.weights = dict()
        self.activ_fn = dict()
        self.neuron_nums = neuron_numbers_in_layers
        self.layer_num = len(neuron_numbers_in_layers) - 1
        
        # DEFINE (!not DO!, we don't have the input X yet) the computation between layers
        for l_in in range(self.layer_num):
            l_out = l_in + 1
            pkey = f"in{l_in}out{l_out}"
            try:
                # try to use provided weight
                W = weights[pkey] # NOTE: should copy
            except (TypeError, KeyError):
                # if not provided ...
                n_in = neuron_numbers_in_layers[l_in]
                n_out = neuron_numbers_in_layers[l_out]
                W = [[random.gauss(0, 0.1) for j in range(n_out)] 
                     for i in range(n_in)] # see init above
            self.weights[pkey] = W
            self.activ_fn[pkey] = math.tanh # you may want to try your own

        
    def forward(self, x):
        """
        DO computations:
        
        Compute the network output layer by layer. "forward" is a conventional
        term for executing the computation of a net.
        :param x: a list of list: x[i], sample-i, having a number of input attributes
          if you have only one sample, input it as 
              [[0, 1, 0]], 
              NOT [0, 1, 0]
        """
        layer_input = x
        for l_in in range(self.layer_num):
            # Use layer-in to process x and get what's to feed to layer-out
            # Setup 
            l_out = l_in + 1
            pkey = f"in{l_in}out{l_out}"
            # Compute the weighted sum 
            layer_pre_activation = matmul(layer_input, self.weights[pkey])
            # Perform activation (the construction below is equivalent to nested loops)
            layer_out = list(map(lambda x_:list(map(self.activ_fn[pkey], x_)), 
                            layer_pre_activation))
            layer_input = layer_out # feed the output of this layer to the next layer
        return layer_out
    
    
class VisWrap:
    "Wrap I/O for the new network object for visualisation"
    def __init__(self, nn):
        self.nn = nn
        
    def forward(self, x):
        return self.nn.forward([x])[0][0]

In [0]:
nn2 = NeuralNetV2()
nn2_wrap = VisWrap(nn2)
simple_visualise(nn2_wrap)

In [0]:
# Let's try to adjust the weights. Carefully keep the number of weights 
# consistent with the number of neurons you have set for the layers.
nn2 = NeuralNetV2(
    neuron_numbers_in_layers=[2, 3, 1],
    weights={"in0out1":[[0, 0.5, 1], [-1, -5, 2]],
             "in1out2":[[-1], [+1], [0.5]]})
nn2.activ_fn["in1out2"] = lambda x:max(0, x)
nn2_wrap = VisWrap(nn2)
simple_visualise(nn2_wrap)

## 1.3 Training Neural Nets via Backprop

It is not trivial to come up with a simple rule to adjust all the parameters in the entire neural network structure (recall that when we considered a single perceptron, we did propose an intuitive scheme to improve the fit of the model to the data).

The idea is to take a divide-and-conquer approach. Let's take a careful look at the computation in one perceptron. Throw the machine learning terminology to the wind and focus on the computation steps only.
$$
\begin{align}
a &\leftarrow w_0 \cdot x_0 + w_1 \cdot x_1 \\
h &\leftarrow g(a) \\
Loss &\leftarrow Compare(h, y)
\end{align}
$$

Let us think about the statement "to make the loss smaller" with a bit of care: which specific means could we possibly take to "make" the final loss change? During training, we change the model parameters, including $w_{i,j}$. So we need to know the influence of each model parameter on the final loss. In this section we will examine an example of the so-called "backpropagation" process.
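
To make "the influence of each parameter" concrete, here is a minimal sketch for the single perceptron above. It assumes a sigmoid for $g$ and the squared error for $Compare$ (both are assumptions made for this illustration), applies the chain rule by hand, and checks the result against a small numerical perturbation. The names `loss_fn` and `loss_grad` and the example values are hypothetical.

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

def loss_fn(w0, w1, x, y):
    a = w0 * x[0] + w1 * x[1]   # weighted sum
    h = sigmoid(a)              # activation (assumed: sigmoid)
    return (h - y) ** 2         # Compare(h, y) (assumed: squared error)

def loss_grad(w0, w1, x, y):
    a = w0 * x[0] + w1 * x[1]
    h = sigmoid(a)
    dL_dh = 2 * (h - y)         # d Loss / d h
    dh_da = h * (1 - h)         # d h / d a, derivative of the sigmoid
    dL_da = dL_dh * dh_da       # chain rule
    return dL_da * x[0], dL_da * x[1]   # d a / d w_i = x_i

w0, w1 = 0.5, -2.5
x, y = (2.5, 2.0), 1.0
g0, g1 = loss_grad(w0, w1, x, y)

# Numerical check: nudge each weight a little and watch the loss
eps = 1e-6
n0 = (loss_fn(w0 + eps, w1, x, y) - loss_fn(w0 - eps, w1, x, y)) / (2 * eps)
n1 = (loss_fn(w0, w1 + eps, x, y) - loss_fn(w0, w1 - eps, x, y)) / (2 * eps)
print(f"analytic: ({g0:.6f}, {g1:.6f})  numerical: ({n0:.6f}, {n1:.6f})")
```

Both gradients are negative here, which says that increasing either weight would reduce the loss for this sample.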

### 1.3.1 Adjust Parameters to Modify Model Behaviour

Given training data $\{(x_1, y_1), (x_2, y_2), \dots\}$, we would like the net to predict for each $x_i$ the target value $y_i$. If this is the case, then our mission is complete. Of course, this is generally NOT the case if we start from a random set of model parameters.

For example, if we have one training sample $(x=(2.5, 2), y=1.0)$, let us compare the prediction given by the network above and the target value $y=1.0$: 

In [0]:
# First, check the current output of the net
def sigm(x):
    return 1/(1+math.exp(-x))

nn2 = NeuralNetV2(
    neuron_numbers_in_layers=[2, 3, 1],
    weights={"in0out1":[[0, 0.5, 1], [-1, -5, 2]],
             "in1out2":[[-1], [+1], [0.5]]})
nn2.activ_fn["in0out1"] = sigm
nn2.activ_fn["in1out2"] = sigm

nn2_wrap = VisWrap(nn2)
simple_visualise(nn2_wrap)

Let's check the net's behaviour at one data point:

In [0]:
nn2.forward([(2.5, 2)])

This is smaller than the desired output 1.0. So we want the output to increase at this $x$. Let's learn how to adjust the network using gradients computed through backprop.

### 1.3.2 Manual Backprop Through Layers

Let us implement the backpropagation for a simple three-layer net.

#### Helper Functions

In [0]:
##############################################################
# HELPER FUNCTIONS
# You do NOT need to learn these to USE modern neural networks.
# These functions provide basic array operations in a clear but
# INEFFICIENT manner. You may want to check them if you want to
# UNDERSTAND the technical details of NN.
# --------
# First define sigmoid gradient
import math
def sigm(x): # redefine here for reference
    return 1/(1+math.exp(-x))

def gsigmoid(x):
    return math.exp(-x)/(1+math.exp(-x))**2

def elementwise_apply(f, nested_list):
    return list(map(lambda x_:list(map(f, x_)), nested_list))

def elementwise_times(nested_list1, nested_list2):
    return [[a * b for a, b in zip(r1, r2)] 
            for r1, r2 in zip(nested_list1, nested_list2)]

def mat_tr(nested_list):
    return [c for c in zip(*nested_list)]

def shape(nested_list):
    return len(nested_list), len(nested_list[0])
##############################################################

In [0]:
# UNIT Test: gsigmoid -- Test the others.
test_x = [-3, -1, 0, 1, 3, 5]
test_eps = 1e-4
for x_ in test_x:
    numerical_diff = (sigm(x_ + test_eps) - sigm(x_)) / test_eps
    analytic_diff = gsigmoid(x_)
    print(f"NumDiff: {numerical_diff:.3f} ~ AnaDiff: {analytic_diff:.3f}")

#### Backprop

Below I explicitly write out the forward method followed by a backward computation.
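
As a reading aid, the backward computation in the next cell can be summarised as follows for a single sample. The notation is introduced here for illustration only: $x$ is the input as a row vector, $W^{(01)}$ and $W^{(12)}$ are the two weight matrices, $\sigma$ is the sigmoid, $\odot$ is the elementwise product, and $o$ is the network output. Note that the cell differentiates the output $o$ itself, not a loss.

$$
\begin{align}
a^{(1)} &= x W^{(01)}, \quad h^{(1)} = \sigma(a^{(1)}), \quad a^{(2)} = h^{(1)} W^{(12)}, \quad o = \sigma(a^{(2)}) \\
\frac{\partial o}{\partial a^{(2)}} &= \sigma'(a^{(2)}) \\
\frac{\partial o}{\partial W^{(12)}} &= (h^{(1)})^{T} \, \frac{\partial o}{\partial a^{(2)}} \\
\frac{\partial o}{\partial h^{(1)}} &= \frac{\partial o}{\partial a^{(2)}} \, (W^{(12)})^{T} \\
\frac{\partial o}{\partial a^{(1)}} &= \sigma'(a^{(1)}) \odot \frac{\partial o}{\partial h^{(1)}} \\
\frac{\partial o}{\partial W^{(01)}} &= x^{T} \, \frac{\partial o}{\partial a^{(1)}}
\end{align}
$$

Each line reuses the result of the line before it, which is the sense in which the gradient is propagated "backwards" through the layers.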

In [0]:
def special_forward(nn, x):
    """
    This is a special version for manually implementing and testing the backpropagation algorithm. 
    We only use the weights of the network `nn`.
    We don't use the computation and activation of network `nn`; sigmoid is applied in every layer.
    """
    
    # Copy-and-paste and simplify forward computation here:
    layer1_input = x
    layer1_pre_activation = matmul(layer1_input, nn.weights["in0out1"])
    layer1_out = elementwise_apply(sigm, layer1_pre_activation)
    
    layer2_input = layer1_out # feed the output of this layer to the next layer
    
    layer2_pre_activation = matmul(layer2_input, nn.weights["in1out2"])
    layer2_out = elementwise_apply(sigm, layer2_pre_activation)

    layer2_pre_activation_g = elementwise_apply(gsigmoid, layer2_pre_activation)
    w12_g = matmul(mat_tr(layer2_input), layer2_pre_activation_g)
    layer1_out_g = matmul(layer2_pre_activation_g, mat_tr(nn.weights["in1out2"]))
    layer1_pre_activation_g = elementwise_times(
        elementwise_apply(gsigmoid, layer1_pre_activation),
        layer1_out_g)
    w01_g = matmul(mat_tr(layer1_input), layer1_pre_activation_g)

    return layer2_out, w01_g, w12_g
    

In [0]:
out, w01_g, w12_g = special_forward(nn2, [[2.5, 2]])

#### Numerical verification

Next, let us check element by element how our backprop works. The plan is to adjust each adjustable parameter a bit and check the change in the final output.
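
The check in the next cell uses a one-sided difference, whose error shrinks in proportion to the step size. A common alternative is the central difference, whose error shrinks with the square of the step size. The sketch below is a generic helper under that assumption; `numerical_gradient` and `sum_of_squares` are names invented for this illustration, and the helper is tried on a toy function whose gradient is known exactly.

```python
import copy

def numerical_gradient(f, W, eps=1e-5):
    """Central-difference estimate of d f(W) / d W[r][c] for a nested-list matrix W."""
    W = copy.deepcopy(W)  # work on a copy so the caller's weights stay untouched
    grad = [[0.0 for _ in row] for row in W]
    for r in range(len(W)):
        for c in range(len(W[r])):
            old_value = W[r][c]
            W[r][c] = old_value + eps
            f_plus = f(W)
            W[r][c] = old_value - eps
            f_minus = f(W)
            W[r][c] = old_value  # put the old value back
            grad[r][c] = (f_plus - f_minus) / (2 * eps)
    return grad

# Toy function with a known gradient: d/dW[r][c] of the sum of squares is 2 * W[r][c]
def sum_of_squares(W):
    return sum(w * w for row in W for w in row)

W_test = [[1.0, -2.0], [0.5, 3.0]]
g = numerical_gradient(sum_of_squares, W_test)
print(g)  # approximately [[2.0, -4.0], [1.0, 6.0]]
```

To apply the same idea to the network, `f` would be a function that installs the perturbed matrix as the network's weights and returns the scalar output.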

In [0]:
# test adjusting weights in0out1
wkey = "in0out1"
w_rows, w_cols = shape(nn2.weights[wkey])
numerical_g = [[0 for c in range(w_cols)] for r in range(w_rows)]
test_eps = 1e-4
test_x = [[2.5, 2]]
old_out = nn2.forward(test_x)

for r in range(w_rows):
    for c in range(w_cols):
        old_value = nn2.weights[wkey][r][c] # save the old value to put back after test        
        nn2.weights[wkey][r][c] += test_eps
        new_out = nn2.forward(test_x)
        diff = new_out[0][0] - old_out[0][0] # check the effect of adjusting the corresponding parameter
        numerical_g[r][c] = diff / test_eps
        # put the old value back
        nn2.weights[wkey][r][c] = old_value
        

In [0]:
print("numerical differential")
print(numerical_g)
print("analytical differential")
print(w01_g)

__EXERCISE__ [Optional]

1. Read the code and explain what you have observed.
2. Check the computation for the weights that transform data from layer 1 to layer 2.
3. Integrate the backpropagation function into the network class.



### 1.3.3 Using Computational Framework

Modern frameworks allow us to easily perform all the steps above. The example above can be reformulated as follows.

In [0]:
import torch
import torch.nn as nn

In [0]:
class MyNN(nn.Module):
    def __init__(self, neuron_numbers_in_layers=[2, 3, 1]):
        super(MyNN, self).__init__()
        
        self.layers = nn.ModuleList(
            [nn.Linear(in_features=nin, out_features=nout)
             for nin, nout in zip(neuron_numbers_in_layers[:-1], 
                                  neuron_numbers_in_layers[1:])])
        
    def forward(self, x):
        h = x
        for l in self.layers:
            h = torch.tanh(l(h))
        return h
    
class TorchVisWrap:
    def __init__(self, nn):
        self.nn = nn
    def forward(self, x):
        y = self.nn(torch.Tensor(x).unsqueeze(dim=0))
        return y.item()
        

In [0]:
torch.manual_seed(42)
nn3 = MyNN([2, 6, 1])
nn3_wrap = TorchVisWrap(nn3)
simple_visualise(nn3_wrap)

Let us perform training. Whether you call it changing a neural network's behaviour or searching the hypothesis space of neural networks is up to your viewpoint. For example, we want the net to generate:
    + for (4, -4)
    - for (4, 4)
    + for (-4, 4)
    - for (-4, -4)


In [0]:
from torch.optim import Adam
optim = Adam(nn3.parameters(), lr=0.01) # manager: adjust params according to grads

In [0]:
train_steps = 50
visualise_every_n_steps = 10
trn_X = torch.Tensor([[4, -4], [4, 4], [-4, 4], [-4, -4]])
trn_y = torch.Tensor([[1.0], [-1.0], [1.0], [-1.0]])
for it in range(train_steps):
    loss = ((trn_y - nn3(trn_X))**2).sum()
    optim.zero_grad() # reset gradients (to clear computed gradients from previous steps)
    loss.backward() # In one stroke, all gradients are computed!
    optim.step()  # apply the gradients to the parameters
    if it % visualise_every_n_steps == 0:
        # Check the effect
        simple_visualise(nn3_wrap)

# 2 Deep NN

## 2.0 Prepare Environment

### Libraries and Helpers

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as fn
from torchvision.datasets import MNIST, CIFAR10
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import Subset
from torchvision.transforms import Compose, ToTensor, Normalize, ToPILImage
from torch.optim import Adam, SGD
mnist_dataset_trn = MNIST(root='./data', download=True, train=True,
                          transform=ToTensor())
mnist_dataset_tst = MNIST(root='./data', download=True, train=False,
                          transform=ToTensor())
mnist_train_loader = DataLoader(mnist_dataset_trn, batch_size=4)
mnist_test_loader = DataLoader(mnist_dataset_tst, batch_size=4, shuffle=True)
mnist_examples = mnist_dataset_trn[0]
device = torch.device("cuda:0") if torch.cuda.is_available() \
    else torch.device("cpu")

def evaluate_model(model, tst_dataloader, max_iter=None):
    total_corr = 0
    total_num = 0
    model.eval()
    for i, (x, y) in enumerate(tst_dataloader):
        with torch.no_grad():
            pred_class_p = model(x.to(device))
        corr_num = (torch.argmax(pred_class_p, dim=1).cpu() == y).sum()
        total_corr += corr_num.item()
        total_num += len(y)
        if max_iter is not None:
            if i >= max_iter:
                break
    accu = total_corr / total_num
    return accu

def simple_logger_factory(tst_dataloader=None):
    def logger(model, info):
        print(f"Epoch {info['epoch']}, Iter {info['iter']}, " +
            f"Train Loss {info['loss']}")
    def logger_eval(model, info):
        test_accu = evaluate_model(model, tst_dataloader)
        print(f"Epoch {info['epoch']}, Iter {info['iter']}, " +
            f"Train Loss {info['loss']} Test Accuracy {test_accu}")
    return logger if tst_dataloader is None else logger_eval

simple_mnist_logger = simple_logger_factory(mnist_test_loader)
    
def train_model(model, trn_dataloader, start_iter=0,
                max_iters=None, max_epoches=1, 
                trainer_class=Adam,
                evaluate_every_n_iters=1000,
                save_check_point_every_n_iters=10000,
                evaluate_callback_fn=simple_logger_factory(),
                **kwargs):
    optim = trainer_class(model.parameters(), **kwargs)
    model.train()
    epoches = 0
    iters = start_iter
    running_loss = 0
    while True:
        for i, (x, y) in enumerate(trn_dataloader):
            optim.zero_grad()
            pred_class_p = model(x.to(device))
            loss = fn.nll_loss(pred_class_p, y.to(device))
            loss.backward()
            optim.step()

            iters += 1
            running_loss += loss.item() 
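            if iters % evaluate_every_n_iters == 0:
                evaluate_callback_fn(
                    model, dict(epoch=epoches, iter=iters, 
                                loss=running_loss / evaluate_every_n_iters))
                model.train() # the callback may have switched the model to eval mode
                running_loss = 0
            if max_iters is not None and iters >= max_iters:
                break

        epoches += 1
        # stop on the epoch limit, or when the inner loop hit the iteration limit
        if epoches >= max_epoches or \
                (max_iters is not None and iters >= max_iters):
            break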
        
    return

def count_param_num(model):
    n = 0
    for p in model.parameters():
        n += p.numel()
    return n

## 2.1 Elements of Deep Neural Networks

In [0]:
# Example of classifying hand-written digits.
x, y = mnist_dataset_trn[42]
print(f"An example of digit {y}")
ToPILImage()(x)

### 2.1.1 Shallow Network

In [0]:
# A "shallow" network
class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.lin1 = nn.Linear(in_features=28*28, out_features=64)
        self.lin2 = nn.Linear(in_features=64, out_features=10)
    
    def forward(self, x):
        n = x.shape[0]
        h = fn.relu(self.lin1(x.view(n, -1)))
        y = fn.log_softmax(self.lin2(h), dim=1)
        return y
nn1 = Net1().to(device)
train_model(nn1, mnist_train_loader, evaluate_callback_fn=simple_mnist_logger)

### 2.1.2 Go Deeper

In [0]:
# A naive deep network
class NetD(nn.Module):
    def __init__(self):
        super(NetD, self).__init__()
        self.lin1 = nn.Linear(in_features=28*28, out_features=64)
        self.lin2 = nn.Linear(in_features=64, out_features=64)
        self.lin3 = nn.Linear(in_features=64, out_features=64)
        self.lin4 = nn.Linear(in_features=64, out_features=64)
        self.lin5 = nn.Linear(in_features=64, out_features=10)
    
    def forward(self, x):
        n = x.shape[0]
        h = fn.relu(self.lin1(x.view(n, -1)))
        h = fn.relu(self.lin2(h))
        h = fn.relu(self.lin3(h))
        h = fn.relu(self.lin4(h))
        h = self.lin5(h)
        y = fn.log_softmax(h, dim=1)
        return y
dnn1 = NetD().to(device)
train_model(dnn1, mnist_train_loader, evaluate_callback_fn=simple_mnist_logger)

### 2.1.3 Essentials: Convolutional Net

In [0]:
# A convolutional network
class NetCD(nn.Module):
    def __init__(self):
        super(NetCD, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.lin4 = nn.Linear(in_features=32*3*3, out_features=64)
        self.lin5 = nn.Linear(in_features=64, out_features=10)
        
    
    def forward(self, x):
        n = x.shape[0]
        h = fn.max_pool2d(fn.relu(self.conv1(x)), kernel_size=2)
        h = fn.max_pool2d(fn.relu(self.conv2(h)), kernel_size=2)
        h = fn.relu(self.conv3(h)).view(n, -1)
        h = fn.relu(self.lin4(h))
        h = self.lin5(h)
        y = fn.log_softmax(h, dim=1)
        return y

dnn2 = NetCD().to(device)
train_model(dnn2, mnist_train_loader, evaluate_callback_fn=simple_mnist_logger)

In [0]:
count_param_num(dnn2), count_param_num(dnn1), count_param_num(nn1)
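
The value `in_features=32*3*3` used for `lin4` in `NetCD` follows from how each layer shrinks the 28 x 28 input. The sketch below retraces that arithmetic, assuming the usual output-size rule for unpadded, stride-1 convolutions and the fact that `fn.max_pool2d` uses a stride equal to its kernel size by default. `conv_out` and `pool_out` are hypothetical helpers written for this illustration.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size):
    """Spatial output size of max pooling with stride equal to kernel_size."""
    return (size - kernel_size) // kernel_size + 1

s = 28                           # MNIST images are 28 x 28
s = pool_out(conv_out(s, 3), 2)  # conv1 + pooling: 28 -> 26 -> 13
s = pool_out(conv_out(s, 3), 2)  # conv2 + pooling: 13 -> 11 -> 5
s = conv_out(s, 3)               # conv3:            5 -> 3
flat_features = 32 * s * s       # 32 output channels of conv3
print(s, flat_features)          # 3 288
```

Calling `conv_out` with `padding=1` keeps the size unchanged for a 3 x 3 kernel, which is handy when the output of a layer has to be added back to its input.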

### 2.1.4 Essentials: Direct Link

In [0]:
# A convolutional network with direct links
class NetCDR(nn.Module):
    def __init__(self):
        super(NetCDR, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
        self.lin4 = nn.Linear(in_features=576, out_features=64)
        self.lin5 = nn.Linear(in_features=64, out_features=10)
        
    
    def forward(self, x):
        n = x.shape[0]
        h = fn.max_pool2d(fn.relu(self.conv1(x)), kernel_size=2)
        h = fn.max_pool2d(fn.relu(self.conv2(h)) + h , kernel_size=2)
        h = (fn.relu(self.conv3(h)) + h).view(n, -1)
        h = fn.relu(self.lin4(h))
        h = self.lin5(h)
        y = fn.log_softmax(h, dim=1)
        return y

dnn3 = NetCDR().to(device)
train_model(dnn3, mnist_train_loader, evaluate_callback_fn=simple_mnist_logger)

### 2.1.5 Essentials: Optimiser

In [0]:
nn1a = Net1().to(device)
train_model(nn1a, mnist_train_loader, evaluate_every_n_iters=100, 
            max_iters=2000,
            trainer_class=Adam, lr=0.0001)

nn1b = Net1().to(device)
train_model(nn1b, mnist_train_loader, evaluate_every_n_iters=100, 
            max_iters=2000,
            trainer_class=SGD, lr=0.0001) 

### 2.1.6 Essentials: Mini-batches

In [0]:
nn1a = Net1().to(device)
mnist_train_loader128 = DataLoader(mnist_dataset_trn, batch_size=128)
train_model(nn1a, mnist_train_loader128, evaluate_every_n_iters=10, 
            max_iters=100, trainer_class=Adam, lr=0.0001)

nn1b = Net1().to(device)
mnist_train_loader4 = DataLoader(mnist_dataset_trn, batch_size=4)
train_model(nn1b, mnist_train_loader4, evaluate_every_n_iters=80, 
            max_iters=800, trainer_class=Adam, lr=0.0001)


### 2.1.7 Essentials: Batch Normalisation

In [0]:
# A convolutional network with direct links and batch normalisation
class NetCDRB(nn.Module):
    def __init__(self):
        super(NetCDRB, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
        self.bn12 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
        self.bn23 = nn.BatchNorm2d(16)
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
        self.bn34 = nn.BatchNorm2d(16)
        self.lin4 = nn.Linear(in_features=576, out_features=64)
        self.lin5 = nn.Linear(in_features=64, out_features=10)
        
    
    def forward(self, x):
        n = x.shape[0]
        h = fn.max_pool2d(fn.relu(self.conv1(x)), kernel_size=2)
        h = self.bn12(h)
        h = fn.max_pool2d(fn.relu(self.conv2(h)) + h , kernel_size=2)
        h = self.bn23(h)
        h = self.bn34(fn.relu(self.conv3(h)) + h).view(n, -1)
        h = fn.relu(self.lin4(h))
        h = self.lin5(h)
        y = fn.log_softmax(h, dim=1)
        return y

dnn4 = NetCDRB().to(device)
train_model(dnn4, mnist_train_loader, evaluate_callback_fn=simple_mnist_logger)

### 2.1.8 Essentials: Regularisation by Dropout

In [0]:
# A convolutional network with direct links and dropout
class NetCDRD(nn.Module):
    def __init__(self):
        super(NetCDRD, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
        self.dropout1 = nn.Dropout2d(inplace=True)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
        self.dropout2 = nn.Dropout2d(inplace=True)
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
        self.lin4 = nn.Linear(in_features=576, out_features=64)
        self.lin5 = nn.Linear(in_features=64, out_features=10)
        
    
    def forward(self, x):
        n = x.shape[0]
        h = fn.max_pool2d(fn.relu(self.conv1(x)), kernel_size=2)
        h = self.dropout1(h)
        h = fn.max_pool2d(fn.relu(self.conv2(h)) + h , kernel_size=2)
        h = self.dropout2(h)
        h = (fn.relu(self.conv3(h)) + h).view(n, -1)
        h = fn.relu(self.lin4(h))
        h = self.lin5(h)
        y = fn.log_softmax(h, dim=1)
        return y
dnn5_dp = NetCDRD().to(device)

small_train = Subset(mnist_dataset_trn, torch.arange(600))
train_loader_sm = DataLoader(small_train, batch_size=4)
train_model(dnn5_dp, train_loader_sm, evaluate_callback_fn=simple_mnist_logger,
            evaluate_every_n_iters=150, max_epoches=10)


In [0]:
dnn5_nodp = NetCDR().to(device)
train_model(dnn5_nodp, train_loader_sm, evaluate_callback_fn=simple_mnist_logger,
            evaluate_every_n_iters=150, max_epoches=10)

## 2.2 Automatic Differential Architecture Builder

### 2.2.1 Managed variable and operator

Modern architectures allow the user to describe the computations in a data model (e.g. neural networks) naturally, and automatically configure the necessary steps and data structures for backpropagation. To be concrete, consider the simple “processing step”:
```
Y = X1 + X2
```
The user has actually described the relationship that “the value of a variable Y is the sum of the values of variables X1 and X2”. In more detail:
> `Y`  is the result of an `operation of adding` two numbers

> The `operation` takes its input 1 from `X1`;  takes its input 2 from `X2`

Describing this seemingly simple process in such detail may seem tautological. The advantage is that it now becomes possible to automatically construct the backpropagation chain that computes how changes in the inputs X1 and X2 affect the output Y.
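
For the addition step this chain is particularly simple. Writing $L$ for some final target value that depends on $Y$ (a symbol assumed here for illustration), the chain rule gives:

$$
\begin{align}
\frac{\partial L}{\partial X_1} &= \frac{\partial L}{\partial Y} \cdot \frac{\partial Y}{\partial X_1} = \frac{\partial L}{\partial Y} \cdot 1 \\
\frac{\partial L}{\partial X_2} &= \frac{\partial L}{\partial Y} \cdot \frac{\partial Y}{\partial X_2} = \frac{\partial L}{\partial Y} \cdot 1
\end{align}
$$

So an "add" operation only needs to pass the gradient it receives for its output on to both of its inputs unchanged.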

Let us represent the _variables_ and _operators_ we will use in _trackable_ computations.

__Variables__: 
- _contain_ the following information:
    - `data`: the content value
    - `grad`: its effect on one target value, say, $\frac{\partial L}{\partial x}$
    - `from_op`: "ancestor" of this piece of information, `None` or some operator. E.g. in $y = x_1 + x_2$, $x_1$ and $x_2$ have no ancestor, $y$ is `from_op`erator "+".
    - `id_`: a unique global ID
- _perform_ the following functions:
    - applying operator to `self` (and `another` `variable` object), see `__add__` function below.
    - backpropagating information of `grad` to the `from_op` (if it exists)

__Operators__:
- _contain_:
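    - links to the input variable(s) (the operands), plus an `id_` and a `name`
- _perform_ the following functions:
    - `forward`: computing the output value from the operands' `data`
    - `backward`: transforming the incoming gradient and passing it on to each operand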
    
    
__EXERCISE__: Examine the operation of `Y = X1 + X2`

__REMARK__
You may think one target is too limited, but ...

In [0]:
class NeuData:
    d_count = 0 # A class-wise *global* counter
    
    def __init__(self, data=0, grad=0, from_op=None, name=None):
        # Essential information
        # 1. the "content" of this variable
        self.data = data 
        # 2. how this variable affects the "total objective"
        # of a system. For machine learning problems, the
        # system is determined by a global evaluation, often
        # called a "loss"
        self.grad = grad
        # 3. the operation from which this variable has been computed
        # if this variable is given (no computation is needed)
        # this is None
        self.from_op = from_op 
        
        # Computation graph maintenance (and diagnosis and visualisation)
        self.id_ = f"D{NeuData.d_count}"
        NeuData.d_count += 1 # a global serial number

        # Further information for diagnosis and visualisation
        self.name = self.id_ if name is None else name
        
        # Optionally, report the creation of this variable, e.g.
        print(f"Variable {self.name} has been created" + 
              ("." if self.from_op is None else 
              f" from the result of {self.from_op.name}"))
        
    def __add__(self, another):
        """
        This function builds a "computational structure" performing
        ADDING between two Variables. 
        
        The method __add__ will be called when the following piece
        of code gets executed:
            `variable_1 + variable_2`
        then invocation would be
            `variable_1.__add__(another=variable_2)` # self=variable_1
            
        To have the entire "chain of information processing" trackable, 
        we take the computation and management in our own hands:
            1. create a *managed* operator to do the addition
            2. create a new variable to contain the result -- do NOT
              forget to tell the new variable where it comes from:
              the operation we just created above.
        """
        add_op = NeuOpAdd(self, another) # create an add-op, using self and 
            # another as its inputs.
        return NeuData(add_op.forward(), from_op=add_op) # create a new 
            # variable, maintain the computation structure by 
            # - storing the value 
            #   - returned by the newly created op,
            #   - which is commanded to perform its duty --
            #   - adding input-1 and input-2; 
            # - establishing new var's dependency on the `from_op`
            
    def backward(self, g=1.0):
        """
        This function creates a link in the chain of back propagation.
        When informed that a change of its value will cause a change
        in the final objective `L` by a certain amount `g`, the variable
            1. accumulates `g` to `self.grad`, as one variable can have
              multiple ways of affecting L, e.g. 
              L = X1 + Y; 
              Y = X1 + X2
              Then a change in X1 will affect L in two ways.
            2. propagates the information to `self.from_op`, the operator
              that produced this variable (if there is one).
        """
        # accumulate grad
        self.grad += g
        # report
        print(f"{self.name} update grad: {self.grad-g:.3g} "
              f"-> {self.grad:.3g}")
        # backprop grad
        if self.from_op is not None:
            self.from_op.backward(g)
            
            
class NeuOpAdd:
    op_count = 0 # global operator counter
    
    def __init__(self, oprand1, oprand2, name=None):
        """
        Init of a PLUS operator, allocate an ID, ensure a name and
        link the operator to the oprands.
        """
        self.id_ = f"Op+{NeuOpAdd.op_count}"
        NeuOpAdd.op_count += 1
        self.name = self.id_ if name is None else name
        # establish link to oprands
        self.oprand1 = oprand1
        self.oprand2 = oprand2
        # report its creation
        print(f"Operator {self.name} has been created, getting inputs"
              f" from {self.oprand1.name} and {self.oprand2.name}")
        
    def forward(self):
        """
        The computation. It is trivial for this operator.
        """
        resu = self.oprand1.data + self.oprand2.data
        # report
        print(f"{self.name} forwarding... \n"
              f"\t input-1 {self.oprand1.name}: value {self.oprand1.data:.3g}\n"
              f"\t input-2 {self.oprand2.name}: value {self.oprand2.data:.3g}\n"
              f"\t output: {resu:.3g}")
        return resu      
    
    def backward(self, g):
        """
        As a simple +, both inputs have an equal effect on its output, i.e.
        if the output of THIS + OPERATOR affects the global L by a factor 
        of g, so do input-1 and input-2.
        """
        # backprop for op-1
        op1_g = g
        # report
        print(f"{self.name} transforming grad {g:.3g} to {op1_g:.3g}"
              f" and pass to {self.oprand1.name}")
        self.oprand1.backward(op1_g)
        
        # backprop for op-2
        op2_g = g
        print(f"{self.name} transforming grad {g:.3g} to {op2_g:.3g}"
              f" and pass to {self.oprand2.name}")
        self.oprand2.backward(op2_g)
        
        
              

__EXERCISE__

Run the following code blocks and explain the outputs.

In [0]:
x0 = NeuData(4, name="x0")
x1 = NeuData(3, name="x1")
L = x0 + x1
print("================\nComputation completed. \n"
      f"L name:{L.name}, id:{L.id_}, value:{L.data}")

In [0]:
L.backward()
del x0, x1, L # explicitly remove those variables lest confusion

In [0]:
x0 = NeuData(4, name="x0")
x1 = NeuData(3, name="x1")
x2 = NeuData(10, name="x2")
x3 = x0 + x1
L = x3 + (x2 + x3)
L.backward(1.0)
del x0, x1, x2, x3, L
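
To check an explanation of the last block, one can differentiate it by hand. Since `x3 = x0 + x1` is used twice in `L = x3 + (x2 + x3)`, we have $L = 2x_0 + 2x_1 + x_2$, so the gradients reaching `x0`, `x1` and `x2` should accumulate to 2, 2 and 1. The following standalone sketch confirms this with finite differences; `objective` and `finite_diff` are hypothetical helper names that work on plain numbers and do not use the `NeuData` classes.

```python
def objective(x0, x1, x2):
    """Plain-number version of L = x3 + (x2 + x3), with x3 = x0 + x1."""
    x3 = x0 + x1
    return x3 + (x2 + x3)

def finite_diff(f, args, idx, eps=1e-3):
    """Approximate the partial derivative of f w.r.t. args[idx]."""
    shifted = list(args)
    shifted[idx] += eps
    return (f(*shifted) - f(*args)) / eps

point = (4.0, 3.0, 10.0)
grads = [finite_diff(objective, point, i) for i in range(3)]
print([round(g, 6) for g in grads])
```

The final `grad` values reported by `L.backward(1.0)` for `x0`, `x1` and `x2` should agree with these estimates, up to floating-point error.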

### 2.2.2 Full Version of Managed Variable/Op/Func/NeuralNets

We provide a rudimentary but complete computation framework for constructing NN models that perform automatic backpropagation.

In [0]:
########################################
# Ops
VERBOSE = True
class NeuOp(object):
    """
    Now we define a system of operators, this is the root class
    defining the basic bookkeeping behaviour of operators.
    """
    op_count = 0
    
    def __init__(self, name=None):
        super(NeuOp, self).__init__()
        self.id_ = f"Op{NeuOp.op_count}"
        NeuOp.op_count += 1
        if name is None:
            name = self.id_
        self.name = name
    
    def backward(self, g):
        raise NotImplementedError
        
    def forward(self):
        raise NotImplementedError
        
        
    def make_animation_forward_(self, resu):
        try:
            g_animator.make_animation_forward(self, resu)
        except Exception as e:
            pass
            
    def make_animation_backward_(self, msgs):
        try:
            g_animator.make_animation_backward(self, msgs)
        except Exception as e:
            pass
        
class UnaryNeuOp(NeuOp):
    """
    Operators that have only one operand (input).
    """
    def __init__(self, oprand, name=None):
        super(UnaryNeuOp, self).__init__(name)
        self.oprand = oprand
        try:
            g_animator.make_animation_create(self)
        except:
            pass
        
    def backward(self, g):
        self.make_animation_backward_([
            dict(back_node_id=self.oprand.id_, msg=g)])
        self.oprand.backward(g)
                
class NeuOpNeg(UnaryNeuOp):
    """
    Negation, the "-" operator.
    """
    def __init__(self, oprand, name="-"):
        super(NeuOpNeg, self).__init__(oprand, name)
    
    def backward(self, g):
        bg = -g
        super(NeuOpNeg, self).backward(bg)

        
    def forward(self):
        resu = -self.oprand.data
        self.make_animation_forward_(resu)
        return resu
        
class BinaryNeuOp(NeuOp):
    """
    Binary Operators
    """
    def __init__(self, oprand1, oprand2, name=None):
        super(BinaryNeuOp, self).__init__(name)
        self.oprand1 = oprand1
        self.oprand2 = oprand2
        try:
            g_animator.make_animation_create(self)
        except:
            pass
    
class NeuOpAdd(BinaryNeuOp):
    """
    Our old friend "+"
    """
    def __init__(self, oprand1, oprand2, name="+"):
        super(NeuOpAdd, self).__init__(oprand1, oprand2, name)
    
    def backward(self, g):
        self.make_animation_backward_([
            dict(back_node_id=self.oprand1.id_, msg=g),
            dict(back_node_id=self.oprand2.id_, msg=g)])
        
        self.oprand1.backward(g)
        self.oprand2.backward(g)
        
    def forward(self):
        resu = self.oprand1.data + self.oprand2.data
        self.make_animation_forward_(resu)
        return resu
    
class NeuOpMul(BinaryNeuOp):
    """* operator, note the backprop"""
    def __init__(self, oprand1, oprand2, name="*"):
        super(NeuOpMul, self).__init__(oprand1, oprand2, name)
    
    def backward(self, g):
        # output = in1 * in2, so dL/d_output * in2 = dL/d_in1, similarly for d_in2
        self.make_animation_backward_([
            dict(back_node_id=self.oprand1.id_, msg=g * self.oprand2.data),
            dict(back_node_id=self.oprand2.id_, msg=g * self.oprand1.data)])
        
        self.oprand1.backward(g * self.oprand2.data)
        self.oprand2.backward(g * self.oprand1.data)
        
    def forward(self):
        resu = self.oprand1.data * self.oprand2.data
        self.make_animation_forward_(resu)
        return resu

########################################
# Data
class NeuData:
    d_count = 0
    
    def __init__(self, data=0, grad=None, from_op=None, name=None):
        # essential information
        self.data = data
        self.grad = grad
        self.from_op = from_op
        
        # computation graph maintenance and visualisation
        self.id_ = f"D{NeuData.d_count}"
        NeuData.d_count += 1
        if name is None:
            name = self.id_
        self.name = name
                
        try:
            g_animator.make_animation_create(self)
        except:
            pass
        
    def __neg__(self):
        neg_op = NeuOpNeg(self)
        return NeuData(neg_op.forward(), from_op=neg_op)
        
    def __add__(self, another):
        add_op = NeuOpAdd(self, another)
        return NeuData(add_op.forward(), from_op=add_op)
    
    def __sub__(self, another):
        add_op = NeuOpAdd(self, -another)
        return NeuData(add_op.forward(), from_op=add_op)
    
    def __mul__(self, another):
        mul_op = NeuOpMul(self, another)
        return NeuData(mul_op.forward(), from_op=mul_op)
    
    def backward(self, g=1.0):
        if self.grad is None:
            self.grad = g
        else:
            self.grad += g
            
        if VERBOSE:
            print(f"{self.name} grad: {self.grad:.3f}")
        if self.from_op is not None:
            self.from_op.backward(g)
            
    def __setattr__(self, name, value):
        
        if name == "grad" and value is not None:
            try:
                g_animator.make_animation_backward(
                    self, 
                    [dict(node_id=self.id_, msg=value)])
            except Exception as e:
                pass
        
        super(NeuData, self).__setattr__(name, value)
            
        
    def __str__(self):
        return f"{self.data}"
    
    def __repr__(self):
        return f"{self.name}:{self.data:.3f}" if self.grad is None \
            else f"{self.name}:{self.data:.3f}:{self.grad:.3f}"
    
########################################
# Elementwise Functions
import math
class NeuFunction(object):
    """
    Functions are customised unary operators.
    The main hurdle to understanding them is how backprop is handled:
    we create a unary op and override that object's backward function.
    """
    def __init__(self, name=None):
        super(NeuFunction, self).__init__()
        self.name = name
        
    def forward(self, x):
        import types
        op = UnaryNeuOp(x, name=self.name)
        # op will be our placeholder in the differentiable computational graph.
        # we will never use op's forward computation (not defined anyway)
        # but we will take advantage of the *automatic* invocation of op's
        # `backward` method, which will be called when the output data pass
        # gradient messages through. 
        

        # We hijack the op's function and let it point to the `backward` function
        # of THIS object. NOTE, not THIS CLASS, `self` here would represent an instance
        # of a SUBCLASS, which will implement appropriate gradient transform.
        # See the Sigmoid function example below.
        op_original_back = op.backward
        def _back(op_instance, g):
            g_back = self.backward(op_instance, g) # self is subclass instance
            op_original_back(g_back)
        op.backward = types.MethodType(_back, op) # bind `_back` to this op instance as a method.
            # see https://stackoverflow.com/questions/10374527/dynamically-assigning-function-implementation-in-python
            # Amber's answer.
        return op
        
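    def backward(self, op, g):
        # Subclasses receive the placeholder op and the incoming gradient `g`,
        # and return the gradient to pass on to `op.oprand`.
        raise NotImplementedError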
        
    def __call__(self, x):
        """
        This will make the object "callable", i.e.
        f(x) === f.forward(x)
        """
        return self.forward(x)

def _sigmoid(x):
    return 1 / (1 + math.exp(-x))

class FuncSigmoid(NeuFunction):
    def __init__(self, name=None):
        super(FuncSigmoid, self).__init__(name)
    
    def forward(self, x):
        op = super(FuncSigmoid, self).forward(x)
        resu = _sigmoid(x.data)
        op.make_animation_forward_(resu)
        out = NeuData(resu, from_op=op)
        return out
    
    def backward(self, op, g):
        x = op.oprand
        y = _sigmoid(x.data)
        g = g * y * (1 - y)
        if VERBOSE:
            print(f"{self.name} pass-grad: {g:.3f}")
        return g

########################################
# Neural network components

def flatten_data(dlist):
    """Yield items from any nested iterable; see Reference."""
    from collections.abc import Iterable
    for x in dlist:
        if isinstance(x, Iterable):
            for sub_x in flatten_data(x):
                yield sub_x
        else:
            assert isinstance(x, NeuData), "Must be neu data"
            yield x
            
class NeuParameterisedModule(object):
    module_count = 0
    def __init__(self, name=None):
        if name is None:
            name = f"nn{NeuParameterisedModule.module_count}"
        self.name = name
        NeuParameterisedModule.module_count += 1
        self.parameters_ = []
        
    def __setattr__(self, name, value):
        """
        We hack the setting attribute to do bookkeeping of learnable parameters:
        """
        if name == "parameters_": # parameters are directly set
            print("set param")
            value = list(flatten_data(value))
            
        if isinstance(value, NeuParameterisedModule): # member sub-modules added
            print("add sub module")
            self.parameters_ += value.parameters_
            
        super(NeuParameterisedModule, self).__setattr__(name, value)
    
        
    def forward(self, x):
        raise NotImplementedError
        
    def __call__(self, x):
        return self.forward(x)

import random    
class Linear(NeuParameterisedModule):
    """
    Linear layer
    """
    def __init__(self, in_features, out_features, name=None):
        super(Linear, self).__init__(name)
            
        self.weights = \
            [[NeuData(random.gauss(0, 1), name=f"{self.name}:w{_o,_i}") 
              for _i in range(in_features)]
             for _o in range(out_features)]
        self.bias = \
            [NeuData(0, name=f"{self.name}:b{_o}") 
             for _o in range(out_features)]
        
        self.parameters_ = self.weights + self.bias
        
    def forward(self, x):
        from functools import reduce
        out = []
        for w, b in zip(self.weights, self.bias):
            weighted = [wj * xj for wj, xj in zip(w, x)]
            out.append(reduce(lambda a1, a2:a1 + a2, weighted + [b,]))
            
        return out
    



In [0]:
# Unit test of Functions
f = FuncSigmoid()
x1 = NeuData(0.2, name="x1")
w1 = NeuData(3, name="w1")
h = x1 * (w1 * w1)
y = f(h) + f(w1)
y.backward()
print(f"Auto-back w1.grad: {w1.grad}")

# Verify numerically
f = lambda x: 1 / (1 + math.exp(-x))
x1 = 0.2
w1 = 3
h = x1 * (w1 * w1)
y = f(h) + f(w1)

t_eps = 1e-3
w1 += t_eps
h = x1 * (w1 * w1)
y_new = f(h) + f(w1)
print(f"Numerical w1.grad: {(y_new - y) / t_eps}")
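
The numerical estimate can also be compared with the derivative worked out by hand. With $h = x_1 w_1^2$ and $y = \sigma(h) + \sigma(w_1)$, the chain rule gives $\frac{\partial y}{\partial w_1} = \sigma'(h) \cdot 2 x_1 w_1 + \sigma'(w_1)$, where $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. In the managed graph this total is accumulated from three messages arriving at `w1`: two through `w1 * w1` and one through `f(w1)`. Below is a standalone sketch on plain floats; `sigmoid`, `d_sigmoid` and `y_of` are helper names assumed for this example only.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def d_sigmoid(z):
    """Derivative of the sigmoid, s * (1 - s)."""
    s = sigmoid(z)
    return s * (1 - s)

def y_of(w1, x1=0.2):
    """y = sigmoid(x1 * w1^2) + sigmoid(w1), as in the unit test."""
    return sigmoid(x1 * w1 * w1) + sigmoid(w1)

x1, w1 = 0.2, 3.0
h = x1 * w1 * w1
# Chain rule: dy/dw1 = sigma'(h) * dh/dw1 + sigma'(w1), with dh/dw1 = 2 * x1 * w1
analytic = d_sigmoid(h) * 2 * x1 * w1 + d_sigmoid(w1)

# Central difference, usually more accurate than the one-sided version
eps = 1e-5
central = (y_of(w1 + eps) - y_of(w1 - eps)) / (2 * eps)
print(f"analytic: {analytic:.6f}, central difference: {central:.6f}")
```

The one-sided difference used in the cell above carries an error of order `t_eps`, so a small mismatch with the automatic gradient in the trailing digits is expected.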


In [0]:
# Test using our module to create a neural net
class MyNN(NeuParameterisedModule):
    def __init__(self, name="nn"):
        super(MyNN, self).__init__(name)
        self.lin1 = Linear(in_features=2, out_features=3, name=f"{self.name}:l1")
        self.lin2 = Linear(in_features=3, out_features=1, name=f"{self.name}:l2")
        
    def forward(self, x):
        f = FuncSigmoid()
        pre_h = self.lin1(x)
        h = [ f(hi) for hi in pre_h ]
        pre_out = self.lin2(h)
        out = [ f(yi) for yi in pre_out ]
        return out
    
random.seed(42)    
mynn = MyNN()
print(mynn.parameters_)
y = mynn([NeuData(1), NeuData(2)])
print(y[0])
y[0].backward()

## 2.3 Deep Architecture Applied

### 2.3.1 Prepare Data

We copy the library, data and environment preparation below for quick reference.

In [0]:
# The example is adapted from the PyTorch documentation:
# see https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
from torch.nn import functional as fn
import matplotlib.pyplot as plt
import numpy as np

# Change the type of the virtual machine in "Runtime/Runtime Type"
device = torch.device("cuda:0") if torch.cuda.is_available() \
    else torch.device("cpu")

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')



# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))

# print labels
print(' '.join(classes[l] for l in labels))
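
`transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` subtracts the mean 0.5 and divides by the standard deviation 0.5 in every channel, so pixel values in $[0, 1]$ (as produced by `ToTensor`) are mapped to $[-1, 1]$; the line `img / 2 + 0.5` in `imshow` undoes this. A minimal sketch of the round trip on plain floats follows. The function names are assumptions made for illustration, not torchvision APIs.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-channel normalisation: (pixel - mean) / std."""
    return (pixel - mean) / std

def unnormalize(value):
    """Inverse used in `imshow`: value / 2 + 0.5."""
    return value / 2 + 0.5

for pixel in (0.0, 0.25, 0.5, 1.0):
    z = normalize(pixel)
    print(pixel, "->", z, "->", unnormalize(z))
```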

### 2.3.2 Define and Train Network.

In [0]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        batch_size = x.shape[0]
        h = fn.max_pool2d(fn.relu(self.conv1(x)), kernel_size=2)
        h = fn.max_pool2d(fn.relu(self.conv2(h)), kernel_size=2)
        h = h.view(batch_size, -1)
        h = fn.relu(self.fc1(h))
        h = fn.relu(self.fc2(h))
        y = self.fc3(h)
        return y

net = Net()
net = net.to(device)
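
The input size `16 * 5 * 5` of `fc1` follows from the spatial size of the feature maps. Assuming 32×32 CIFAR-10 inputs, convolutions with the default stride 1 and no padding, and 2×2 max-pooling whose stride equals its kernel size, the side length evolves as sketched below. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    """Side length after a stride-1, unpadded convolution."""
    return size - kernel + 1

def pool_out(size, kernel):
    """Side length after max-pooling with stride equal to the kernel size."""
    return size // kernel

side = 32                               # CIFAR-10 images are 32 x 32
side = pool_out(conv_out(side, 5), 2)   # conv1 + pool: 32 -> 28 -> 14
side = pool_out(conv_out(side, 5), 2)   # conv2 + pool: 14 -> 10 -> 5
flat_features = 16 * side * side        # 16 output channels from conv2
print(side, flat_features)
```

If the kernel sizes or the input resolution change, the first argument of `fc1` has to be recomputed in the same way.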

In [0]:
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = fn.cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

criterion = nn.CrossEntropyLoss()