# Introduction


**ASSIGNMENT DEADLINE: Week 7, 29 Sep 2020 17:00**

In this assignment, the task is to implement some basic components for training a neural network. You need to:

- implement the [Adamax](https://ruder.io/optimizing-gradient-descent/index.html#adamax) algorithm in nn/optimizers.py
- implement the `linear` class in nn/operators.py, which is used by the Linear layer (in nn/layers.py)
- implement the `leaky_relu` class in nn/operators.py
- implement a simple classifier in models/simple_classifier.py and train it to classify MNIST images of the digits 0 and 1.

**Attention**:
- To run this Jupyter notebook, you need to install the dependent libraries as stated in [README.MD](README.MD). Your implementation should use only Python and numpy; you do not need, and should not use, other libraries (like TensorFlow and PyTorch). The major version of Python should be 3.
- You do not need a GPU for this assignment. CPU is enough.
- Do not run this notebook before you finish implementing the required functions; otherwise you will see errors, because the notebook calls the functions that you are expected to implement.
- Do not change the signature (the name and arguments) of the existing functions in the repository; otherwise your implementation cannot be tested correctly and you will be penalized.
- Do not change the structure of files in the repository (e.g., adding, renaming or deleting any files); otherwise your implementation cannot be tested correctly and you will be penalized.
- You can add functions in the existing files, but you should not change the import statements (e.g., adding a new import statement). For example, if you want to implement a function foo(), you can implement it inside operators.py and call it; but you cannot implement it in another file and import that file in operators.py. Otherwise your implementation cannot be tested correctly and you will be penalized.
- After you implement a function, remember to restart the notebook kernel so that it picks up your new code.

In [1]:
print("hello!")

hello!


## Structure of the repository

The structure of this repository is shown below:

```bash
codes/
    nn/              # components of neural networks
        operators.py    # operators; **You need to edit this file to add missing code**
        optimizers.py   # optimizing methods; **You need to implement the Adamax algorithm**
        layers.py       # layer abstraction
        loss.py         # loss function for optimization
        initializers.py # initializing methods to initialize parameters (like weights, bias)
    models/
        simple_classifier.py  # a simple model with two fully connected linear layers
    utils/              # some additional tools
        check_grads_cnn.py  # helps you check your forward and backward functions
        tools.py        # other useful functions for testing the code
        dataloader.py   # data loading
    main.ipynb          # this notebook which calls the functions in other modules/files
    README.MD           # list of dependent libraries
```

## Functionality of this notebook

This IPython notebook serves to:

- explain the code structure and main APIs
- explain your implementation task and tuning task
- provide code to test your implemented forward and backward functions for different operations
- provide related materials to help you understand the implementation of some operations and optimizers

*You can type `jupyter lab` in the terminal to start this Jupyter notebook when your current working directory is cs5242. JupyterLab is much more convenient than the classic `jupyter notebook` interface.*

# Your tasks

## Base classes

In [nn/optimizers.py](nn/optimizers.py), we define the base optimizer class. We have also implemented SGD, Adagrad and Adam for you. **You only need to implement the Adamax optimizer in [nn/optimizers.py](nn/optimizers.py) following the same style.**

```python
class Optimizer():

    def __init__(self, lr):
        """Initialization

        # Arguments
            lr: float, learning rate
        """
        self.lr = lr

    def update(self, x, x_grad, iteration):
        """Update parameters with gradients"""
        raise NotImplementedError

    def sheduler(self, func, iteration):
        """Learning rate scheduler, to change the learning rate with respect to the iteration

        # Arguments
            func: function, arguments are lr and iteration
            iteration: int, current iteration number in the whole training process (not in that epoch)

        # Returns
            lr: float, the new learning rate
        """
        lr = func(self.lr, iteration)
        return lr
```
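
To illustrate the interface, here is a minimal sketch of a subclass. It assumes, based on the Adamax test cell later in this notebook, that `update` receives dictionaries mapping layer names to numpy arrays and returns a dictionary of updated parameters. `PlainSGD` is a hypothetical name and is not the `SGD` class in the repository.

```python
import numpy as np


class Optimizer():
    """Stand-in for the base class above so that the sketch runs on its own."""

    def __init__(self, lr):
        self.lr = lr

    def update(self, x, x_grad, iteration):
        raise NotImplementedError


class PlainSGD(Optimizer):
    """Hypothetical example: vanilla gradient descent without momentum."""

    def update(self, x, x_grad, iteration):
        # x and x_grad are assumed to be dicts keyed by layer name
        new_x = {}
        for key in x:
            new_x[key] = x[key] - self.lr * x_grad[key]
        return new_x


params = {'layer1': np.array([1.0, 2.0])}
grads = {'layer1': np.array([0.5, -0.5])}
new_params = PlainSGD(lr=0.1).update(params, grads, iteration=0)
print(new_params['layer1'])  # [0.95 2.05]
```

The sketch returns new arrays instead of modifying `params` in place; whether the repository's optimizers do the same is not shown here.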



In [nn/operators.py](nn/operators.py), we define the base operator class and have implemented some operations (like relu) for you. **You only need to implement the remaining operations in [nn/operators.py](nn/operators.py) following the same style.**

```python
class operator(object):
    """
    operator abstraction
    """

    def forward(self, input):
        """Forward operation, return output"""
        raise NotImplementedError

    def backward(self, out_grad, input):
        """Backward operation, return gradient to input"""
        raise NotImplementedError
```
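
As an illustration of the forward/backward convention, the following sketch implements a sigmoid operator. It is a hypothetical example, not one of the assignment tasks and not code from the repository; the only assumption is that `backward` returns the gradient with respect to `input`, as the docstring above states.

```python
import numpy as np


class operator(object):
    """Stand-in for the base class above so that the sketch runs on its own."""

    def forward(self, input):
        raise NotImplementedError

    def backward(self, out_grad, input):
        raise NotImplementedError


class sigmoid(operator):
    """Hypothetical example operator."""

    def forward(self, input):
        return 1.0 / (1.0 + np.exp(-input))

    def backward(self, out_grad, input):
        # chain rule: dL/dx = dL/dy * dy/dx, with dy/dx = y * (1 - y)
        y = self.forward(input)
        return out_grad * y * (1.0 - y)


op = sigmoid()
x = np.array([[0.0, 2.0], [-1.0, 0.5]])
out = op.forward(x)
in_grad = op.backward(np.ones_like(x), x)
print(out.shape, in_grad.shape)  # (2, 2) (2, 2)
```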



## Adamax Optimizer
In the file [nn/optimizers.py](nn/optimizers.py), there are 6 types of optimizers (`SGD`, `Adam`, `RMSprop`, `Adamax`, `Nadam` and `Adagrad`). **You only need to implement the `update` function of `Adamax`**, following https://ruder.io/optimizing-gradient-descent/index.html#adamax.

`Adamax` optimizer is initialized like this:

```python
class Adamax(Optimizer):

    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0, sheduler_func=None):
        """Initialization

        # Arguments
            lr: float, learning rate
            beta_1: float
            beta_2: float
            epsilon: float, precision to avoid numerical error 
            decay: float, the learning rate decay ratio
        """
        super(Adamax, self).__init__(lr)
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.epsilon = epsilon
        self.decay = decay
        if not self.epsilon:
            self.epsilon = 1e-8
        self.momentum = None
        self.accumulators = None
        self.sheduler_func = sheduler_func

```
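
The `sheduler_func` argument accepts a function which, according to the base class docstring, takes the learning rate and the iteration number and returns a new learning rate. A minimal sketch of such a function is given below; `step_decay` and its `drop` and `every` parameters are hypothetical names chosen for illustration.

```python
def step_decay(lr, iteration, drop=0.5, every=1000):
    """Hypothetical schedule: multiply lr by `drop` once every `every` iterations."""
    return lr * drop ** (iteration // every)


print(step_decay(0.001, 0))     # 0.001
print(step_decay(0.001, 2500))  # 0.00025
```

It could then be passed as `Adamax(sheduler_func=step_decay)`. How the optimizer applies the returned value depends on the code in the repository, which is not shown in this notebook.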

### Update function of Adamax optimizer

You need to implement the `update` function of the `Adamax` class in [nn/optimizers.py](nn/optimizers.py).
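
For reference, the update rule described on the linked page can be written as follows, where $g_t$ is the gradient, $\eta$ is `lr`, $m_t$ is the momentum and $u_t$ is the accumulator. The $\epsilon$ term follows the Keras convention and is an assumption about what the test compares against; the linked page omits it. It is also an assumption that the step count is $t = \text{iteration} + 1$, since the test passes `iteration=0` for the first update.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
u_t &= \max(\beta_2\, u_{t-1},\ |g_t|) \\
\theta_{t+1} &= \theta_t - \frac{\eta}{1 - \beta_1^t} \cdot \frac{m_t}{u_t + \epsilon}
\end{aligned}
$$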

You can test your implementation by restarting the Jupyter notebook kernel and running the following:

In [8]:
%load_ext autoreload
%autoreload 2
#### import keras
print("import tensorflow!")
import tensorflow as tf
print("import keras!")
from keras.layers import Dense
import numpy as np
print("import optimizers!")
from keras.optimizers import Adamax as keras_Adamax
print("import Adamax!")
from nn.optimizers import Adamax
print("finish importing Adamax!")
from keras.losses import SparseCategoricalCrossentropy
import warnings
print("import utils!")
warnings.filterwarnings('ignore')
from utils.tools import rel_error
print("finish import utils!")

batch_size = 20
in_features = 10
out_features = 2
x = np.random.uniform(size=(batch_size, in_features))
label = np.random.randint(low = 0, high = out_features, size=batch_size)
weight = {}
grad = {}
adamax = Adamax(lr=0.001, beta_1=0.9, beta_2=0.999,epsilon=1e-07)
layer1 = Dense(out_features, input_shape = (in_features,))

loss_fn = SparseCategoricalCrossentropy()
print("start!")

optimizer = keras_Adamax(lr=0.001, beta_1=0.9, beta_2=0.999,epsilon=1e-07)
with tf.GradientTape() as tape:

    logits = layer1(x)
    loss = loss_fn(label, logits)
    gradients = tape.gradient(loss, layer1.trainable_weights)
    
    weight['layer1'] = layer1.trainable_weights[0].numpy()
    grad['layer1'] = gradients[0].numpy()

    optimizer.apply_gradients(zip(gradients, layer1.trainable_weights))
    update_weight = adamax.update(weight,grad,0)['layer1']
    keras_update_weight = layer1.trainable_weights[0].numpy()

print('Relative error of Adamax Update (<1e-6 will be fine): ', rel_error(keras_update_weight, update_weight))


## Linear layer

`Linear` (in [nn/layers.py](nn/layers.py)) implements the fully connected linear layer. It maintains the weight matrix and bias vector, and calls the forward and backward functions of the `linear` class in [nn/operators.py](nn/operators.py) to do the real operations.


```python
class Linear(Layer):
    def __init__(self, in_features, out_features, name='linear', initializer=Gaussian()):
        """Initialization

        # Arguments
            in_features: int, the number of input features
            out_features: int, the number of required output features
            initializer: Initializer class, to initialize weights
        """
        super(Linear, self).__init__(name=name)
        self.linear = linear()

        self.trainable = True

        self.weights = initializer.initialize((in_features, out_features))
        self.bias = np.zeros(out_features)

        self.w_grad = np.zeros(self.weights.shape)
        self.b_grad = np.zeros(self.bias.shape)

    def forward(self, input):
        output = self.linear.forward(input, self.weights, self.bias)
        return output

    def backward(self, out_grad, input):
        in_grad, self.w_grad, self.b_grad = self.linear.backward(
            out_grad, input, self.weights, self.bias)
        return in_grad

```

### Forward function of linear operator

You need to implement the forward function of the `linear` class in [nn/operators.py](nn/operators.py).

The input consists of N data points, each with in_features channels. The output consists of N data points, each with out_features channels.
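
The shapes involved can be sketched with plain numpy. This is a shape-only illustration under the assumption that the layer computes an affine map with a weight matrix of shape (in_features, out_features), as in the `Linear` constructor above; the variable names are illustrative.

```python
import numpy as np

N, in_features, out_features = 4, 3, 2
x = np.ones((N, in_features))                   # N data points
w = np.full((in_features, out_features), 0.5)   # same shape as Linear.weights
b = np.zeros(out_features)                      # same shape as Linear.bias

out = x @ w + b   # the bias is broadcast across the N rows
print(out.shape)  # (4, 2)
```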

You can test your implementation by restarting the Jupyter notebook kernel and running the following:

In [None]:
import numpy as np
import warnings
warnings.filterwarnings('ignore')

from nn.layers import Linear
from utils.tools import rel_error

from keras import Sequential
from keras.layers import Dense

batch_size = 10
in_features = 3
out_features = 10
input = np.random.uniform(size=(batch_size, in_features))

linear = Linear(in_features, out_features)
out = linear.forward(input)

keras_linear = Sequential()
layer1 = Dense(out_features, input_shape = (in_features,))
keras_linear.add(layer1) 
keras_out = keras_linear.predict(input, batch_size=batch_size) ##specify the shape of weight matrix
keras_linear.layers[0].set_weights([linear.weights, linear.bias])
keras_out = keras_linear.predict(input, batch_size=batch_size)

print('Relative error (<1e-6 will be fine): ', rel_error(out, keras_out))
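
The test cells report a relative error between your output and the Keras output. The implementation of `rel_error` in utils/tools.py is not shown in this notebook; the sketch below is a hypothetical stand-in that uses one common definition, the largest elementwise relative difference.

```python
import numpy as np


def rel_error_sketch(x, y, eps=1e-8):
    """Hypothetical stand-in: largest elementwise relative difference."""
    return np.max(np.abs(x - y) / np.maximum(eps, np.abs(x) + np.abs(y)))


a = np.array([1.0, 2.0, 3.0])
print(rel_error_sketch(a, a))                # 0.0
print(rel_error_sketch(a, a + 1e-9) < 1e-6)  # True
```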

### Backward function of linear operator
 
You need to implement the backward function for the `linear` class in the file [nn/operators.py](nn/operators.py). 

When you are done, restart the Jupyter notebook kernel and run the following to check your backward implementation.

In [None]:
from nn.layers import Linear
import numpy as np
from utils.check_grads_cnn import check_grads_layer

batch = 10
in_features = 3
out_features = 10

input = np.random.uniform(size=(batch, in_features))
out_grad = np.random.uniform(size=(batch, out_features))

linear = Linear(in_features, out_features)

check_grads_layer(linear, input, out_grad)
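
A gradient check of this kind typically compares the analytic gradient from `backward` with a numerical estimate. The internals of `check_grads_layer` are not shown in this notebook; the following is a minimal sketch of the general technique using central differences, with `numerical_grad` as a hypothetical helper name.

```python
import numpy as np


def numerical_grad(f, x, out_grad, h=1e-5):
    """Central-difference estimate of d(sum(f(x) * out_grad)) / dx."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        pos = np.sum(f(x) * out_grad)
        x[idx] = old - h
        neg = np.sum(f(x) * out_grad)
        x[idx] = old  # restore the perturbed entry
        grad[idx] = (pos - neg) / (2 * h)
    return grad


# Example with a function whose gradient is known: f(x) = x ** 2
x = np.random.uniform(size=(3, 4))
out_grad = np.random.uniform(size=(3, 4))
analytic = out_grad * 2 * x
numeric = numerical_grad(lambda v: v ** 2, x, out_grad)
print(np.max(np.abs(analytic - numeric)) < 1e-6)  # True
```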

## Leaky_ReLU

`Leaky_ReLU` (in nn/layers.py) implements leaky ReLU. It calls the forward and backward functions of the `leaky_relu` class in [nn/operators.py](nn/operators.py) to do the real operations.

The initialization, forward and backward functions of the `Leaky_ReLU` layer are shown below:

```python
class Leaky_ReLU(Layer):
    def __init__(self, alpha = 0.01, name='leaky_relu'):
        """Initialization
        """
        super(Leaky_ReLU, self).__init__(name=name)
        # alpha: Float >= 0. Negative slope coefficient. Default to 0.01.
        self.leaky_relu = leaky_relu(alpha)

    def forward(self, input):
        """Forward pass

        # Arguments
            input: numpy array

        # Returns
            output: numpy array
        """
        output = self.leaky_relu.forward(input)
        return output

    def backward(self, out_grad, input):
        """Backward pass

        # Arguments
            out_grad: numpy array, gradient to output
            input: numpy array, same with forward input

        # Returns
            in_grad: numpy array, gradient to input 
        """
        in_grad = self.leaky_relu.backward(out_grad, input)
        return in_grad
```
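
For reference, leaky ReLU with negative slope coefficient $\alpha$ (`alpha`) is commonly defined as below. The value of the derivative at $x = 0$ is a convention; assigning it $\alpha$ here is an assumption, not something the repository specifies.

$$
f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}
\qquad
\frac{\partial f}{\partial x} = \begin{cases} 1, & x > 0 \\ \alpha, & x \le 0 \end{cases}
$$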

### Forward function of leaky-relu operator

You need to implement the forward function for the `leaky_relu` class in the file [nn/operators.py](nn/operators.py).

You can test your implementation by restarting the Jupyter notebook kernel and running the following:

In [None]:
import numpy as np
from nn.layers import Leaky_ReLU
from keras.layers import LeakyReLU
from utils.tools import rel_error

batch_size = 10
in_features = 10
alpha = 0.01
input = np.random.uniform(low=-1.0, high=1.0,size=(batch_size, in_features))
leaky_relu = Leaky_ReLU(alpha)
keras_leaky_relu = LeakyReLU(alpha)
out = leaky_relu.forward(input)
keras_out = keras_leaky_relu(input)

print('Relative error of Leaky ReLU (<1e-6 will be fine): ', rel_error(out, keras_out))

### Backward function of leaky_relu operator

In [nn/operators.py](nn/operators.py), you need to implement the backward function for the `leaky_relu` class. After the implementation, restart the Jupyter notebook kernel and run the following cell to check your implementation.

In [None]:
import numpy as np
from utils.check_grads_cnn import check_grads_layer
from nn.layers import Leaky_ReLU

batch_size = 10
in_features = 10
input = np.random.uniform(size=(batch_size, in_features))
out_grad = np.random.uniform(size=(batch_size, in_features))
leaky_relu = Leaky_ReLU()

check_grads_layer(leaky_relu, input, out_grad)


## Classification using a simple two-layer perceptron

The following code trains a simple classifier defined in models/simple_classifier.py.

Your task is to tune the architecture and the hyper-parameters to improve the performance (validation accuracy) of the model.

You should submit this notebook with the best-performing model.

In [None]:
import numpy as np
from utils.dataloader import *
from models.simple_classifier import simple_classifier
from nn.optimizers import *
        
# hyper-parameters
batch = 32
log_freq = 50 # log-printing frequency
batches_of_epoch = len(X_train) // batch
epochs = 20

data_loader = batch_loader(X_train, y_train, batch, shuffle=False)
model = simple_classifier(n_in=784, n_out1=120, n_out2=2)
opt = Adamax()

metrics = [] # to store intermediate results during optimization

for i in range(epochs):
    print(f"==> epoch {i+1}")
    
    sum_train_acc, sum_train_loss = 0, 0

    for itr in range(batches_of_epoch):
        # get batch of data
        X_train_b, y_train_b = next(data_loader)

        train_acc, train_loss = model.forward(X_train_b, y_train_b)
        grads = model.backward(X_train_b, y_train_b)
        params,grads =  model.get_params()
        new_params = opt.update(params, grads, i * batches_of_epoch + itr)  # iteration counts over the whole training run, not within the epoch
        model.update(new_params)

        sum_train_acc += train_acc
        sum_train_loss += train_loss

        if itr % log_freq == 0:
            print("\t iter %d \t train acc = %.2f%%, train loss = %.4f" %(itr, train_acc, train_loss))

    avg_train_acc, avg_train_loss = sum_train_acc / batches_of_epoch, sum_train_loss / batches_of_epoch

    test_acc, test_loss = model.forward(X_test, y_test)
    print("\t avg train acc = %.2f%%, avg train loss = %.4f | test acc = %.2f%%, test loss = %.4f" 
          % (avg_train_acc, avg_train_loss, test_acc, test_loss))
        
    metrics += [[avg_train_acc, avg_train_loss, test_acc, test_loss]] # to store intermediate results

# Marking Scheme

The marking scheme is shown below (15 marks in total):
-  3 marks for `Adamax` update function
-  2 marks for `linear` forward function, 2 marks for `linear` backward function
-  1 mark for `leaky_relu` forward function, 2 marks for `leaky_relu` backward function
-  3 marks for simple_classifier implementation and model tuning
-  1 mark for code style
-  1 mark for submission format


We will run multiple test cases to check the correctness of your implementation. You may not get the full marks even if you pass the tests in this notebook as we have a few other test cases for each task, which are not included in this notebook.

The submitted main.ipynb should include the running results of all cells. The model tuning part will be evaluated based on the tuning result (i.e., the printed metrics).

As for the submission format, please follow the submission instructions below.

**DO NOT** use external libraries like TensorFlow, Keras and PyTorch in your implementation. **DO NOT** copy code from the internet, e.g. GitHub. We have provided all the materials that you can refer to in this notebook.

# Final submission instructions
Please submit the following:

1) Your code in a folder named `codes`, with the structure of all files in this folder kept the same as what we have provided.

**ASSIGNMENT DEADLINE: Week 7, 29 Sep 17:00, 15% off per day late (17:01 is the start of one day)**

Do not include the `data` folder. Please zip the following folders under a folder named with your student number, e.g. `a0123456g.zip`, and submit the zipped folder to LumiNUS/Files/Assignment Submission. If you unzip the file, the structure should look like this:

```bash
a0123456g/
    codes/
        models/
            ...
        nn/
            ...
        utils/
            ...
        main.ipynb
        README.MD
```