# 1 Feedforward Neural Network

## 1-1
<style>
    .red {
        color: red;
    }
    .blue {
        color: skyblue;
    }
</style>

Design an FNN model and initialize it with the weights and biases provided in the file “<span class="blue">weights.npy</span>”.

Run the <span class="red">backpropagation</span> algorithm and use <span class="red">mini-batch SGD</span> (stochastic gradient descent)
$$
    \mathbf{w}^{(\tau+1)}=\mathbf{w}^{(\tau)}-\eta \nabla J\left(\mathbf{w}^{(\tau)}\right)
$$
to optimize the parameters (<span class="blue">the weights and biases</span>),
where $\eta$ is the learning rate.
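
As a minimal sketch of the update rule above, the step can be applied to each parameter array in turn. The function name `sgd_step` and the list-of-arrays layout for `params` and `grads` are assumptions made for illustration, not part of the assignment.

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """Apply one SGD update, w <- w - lr * grad, to each parameter array in place."""
    for p, g in zip(params, grads):
        p -= lr * g  # in-place update, so the caller's arrays change too
    return params

# Toy check with J(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
sgd_step([w], [w.copy()], lr=0.01)
print(w)
```

In mini-batch SGD the gradient passed in would be the one computed by backpropagation on the current batch only.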

<span class="red">You should implement the FNN training under the following settings:</span>

- number of layers: 3
- number of neurons in each layer (in order): 2048, 512, 5
- activation function for each layer (in order): relu, relu, softmax
- number of training epochs: 30
- learning rate: 0.01
- batch size: 200
- **Important note**: For 1-1(a), <span class="red">DO NOT RESHUFFLE THE DATA.</span> We have already shuffled the data for you.

Reshuffling will make <span class="blue">your result differ from our ground-truth result</span>, and <span class="red">any difference will result in a deduction of points.</span>

Likewise, when splitting the samples into batches, keep the given sample order.
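
One way to split the samples into batches without changing their order is to slice by consecutive index ranges. The helper `iterate_batches` and the dataset size of 1000 are hypothetical and used only for illustration.

```python
def iterate_batches(n_samples, batch_size=200):
    """Yield (start, stop) index pairs that cover the samples in their given order."""
    for start in range(0, n_samples, batch_size):
        # The last batch may be smaller when n_samples is not a multiple of batch_size.
        yield start, min(start + batch_size, n_samples)

# Hypothetical dataset size, for illustration only.
batches = list(iterate_batches(1000, batch_size=200))
print(batches[0], batches[-1], len(batches))
```

Each pair can then be used as `train_x[start:stop]` and `train_y[start:stop]`, which preserves the provided order.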

(a) **Plot** the <span class="blue">learning curves</span> of $J(\mathbf{w})$ and the classification <span class="blue">accuracy</span>, recorded <span class="blue">every 25 iterations</span>, for both the training data and the test data. Also **show** the final loss and accuracy values.
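
The two quantities to record can be sketched as follows. This assumes that $J(\mathbf{w})$ is the mean cross-entropy loss and that labels are integer class ids; neither is stated in the assignment, so adapt the code if the labels turn out to be one-hot. The function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood; `labels` are assumed to be integer class ids."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def accuracy(probs, labels):
    """Fraction of samples whose most probable class matches the label."""
    return np.mean(probs.argmax(axis=1) == labels)

# Two toy samples with 5 classes: the first is classified correctly, the second is not.
logits = np.array([[2.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 3.0]])
labels = np.array([0, 1])
probs = softmax(logits)
loss, acc = cross_entropy(probs, labels), accuracy(probs, labels)
print(loss, acc)
```

Inside the training loop, a check such as `if iteration % 25 == 0:` could append the current training and test values to lists that are plotted afterwards.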

(b) **Repeat 1-1(a)** using <span class="red">zero initialization</span> for the model weights, and **discuss the results.**
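
As a starting point for the discussion, the toy example below computes the weight gradients of a small ReLU network whose weights and biases are all zero. It is only a sketch: the network is much smaller than the one in this problem, the data are random, and it assumes the biases are zeroed together with the weights, which the assignment does not state.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))              # 4 toy samples, 3 features
y = np.eye(2)[[0, 1, 0, 1]]              # one-hot targets for 2 classes

W1, b1 = np.zeros((3, 5)), np.zeros(5)   # hidden layer, all zeros
W2, b2 = np.zeros((5, 2)), np.zeros(2)   # output layer, all zeros

# Forward pass: ReLU hidden layer followed by softmax.
pre = x @ W1 + b1
h = np.maximum(0.0, pre)
logits = h @ W2 + b2
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Backward pass for the mean cross-entropy loss.
d_logits = (probs - y) / len(x)
dW2 = h.T @ d_logits                     # zero because h is zero
d_h = d_logits @ W2.T                    # zero because W2 is zero
dW1 = x.T @ (d_h * (pre > 0))

print(np.abs(dW1).max(), np.abs(dW2).max())  # both are 0.0
```

Whether the same happens in the full model depends on how the biases are initialized, so compare this against your own experimental curves.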

## 1-2

Based on the model in 1-1, please <span class="blue">implement dropout layers</span> and apply them <span class="blue">after the first two hidden layers</span>, i.e., the layers with 2048 and 512 neurons.

The <span class="blue">dropout rate should be set to 0.2</span> for both layers.

Note that the dropout operation <span class="blue">should only be applied in the training phase</span> and should be disabled in the test phase.
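
A minimal sketch of a dropout layer with a training flag is shown below. It uses inverted dropout, where the kept units are rescaled by `1 / (1 - rate)` during training so that nothing needs to change at test time; the assignment does not say which variant to use, so treat the scaling as an assumption. The function name and the `rng` argument are hypothetical.

```python
import numpy as np

def dropout(activations, rate=0.2, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations, None          # test phase: pass activations through unchanged
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - rate
    # Entries are 0 for dropped units and 1 / keep_prob for kept units.
    mask = (rng.random(activations.shape) < keep_prob) / keep_prob
    return activations * mask, mask

h = np.ones((4, 10))
h_train, mask = dropout(h, rate=0.2, training=True, rng=np.random.default_rng(0))
h_test, _ = dropout(h, rate=0.2, training=False)
print(h_train)
```

The returned `mask` would be reused in the backward pass, where the gradient flowing into the layer is multiplied by the same mask.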

(a) **Train** the model using the same settings as in 1-1 and **repeat 1-1(a).**

(b) Based on the experimental results, how do the dropout layers affect the model performance, and why? Please **discuss.**

## 1-3

Based on the model in 1-1, please implement mini-batch SGD (stochastic gradient descent) with reshuffling.

In this problem, reshuffle the training data in every epoch before splitting it into batches. All other settings remain the same.

Please set the random seed to **42** and use the **random** library, which is imported in the example code.
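
Reshuffling with the `random` library can be sketched as drawing a new index permutation and indexing the data with it. The helper `shuffled_indices`, the dataset size of 10, and the choice to seed once before the first epoch are assumptions made for illustration.

```python
import random

def shuffled_indices(n_samples):
    """Return a fresh permutation of range(n_samples) drawn from the `random` module."""
    indices = list(range(n_samples))
    random.shuffle(indices)  # shuffles the list in place
    return indices

random.seed(42)  # assumption: seed once, before the first epoch
for epoch in range(2):
    order = shuffled_indices(10)  # 10 is a hypothetical dataset size
    # train_x[order] and train_y[order] would then be split into batches of 200.
    print(epoch, order)
```

Applying the same `order` to both `train_x` and `train_y` keeps every sample paired with its label.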


(a) **Plot** the <span class="blue">learning curves</span> of $J(\mathbf{w})$ and the classification <span class="blue">accuracy, recorded every 25 iterations.</span> Please **show** the final values of loss and accuracy.


(b) Based on the experimental results, how does the <span class="blue">process of reshuffling images</span> affect the model performance, and why? Please **discuss.**

```python
# Example code for reading the data and the initial weights and biases.
# Note: this is only an example of how to read these files; you may modify it in your own implementation.

import numpy as np
import random

# Load the training and test sets.
train_x, train_y = np.load('train_x.npy'), np.load('train_y.npy')
test_x, test_y = np.load('test_x.npy'), np.load('test_y.npy')

print('shape of data:')
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

# weights.npy stores a pickled dict, hence allow_pickle=True and .item().
checkpoint = np.load('weights.npy', allow_pickle=True).item()
init_weights = checkpoint['w']
init_biases = checkpoint['b']

print('shape of weights:')
for w in init_weights:
    print(w.shape)

print()

print('shape of biases:')
for b in init_biases:
    print(b.shape)
```
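
Once the weights and biases are available as lists, the forward pass with relu, relu, softmax activations can be sketched as below. The sketch assumes each weight matrix is shaped `(inputs, outputs)` so that `x @ w + b` is valid; check this against the shapes printed for `weights.npy`. The function names and the small toy layer sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, weights, biases):
    """Return the activations of every layer; the last entry holds class probabilities."""
    activations = [x]
    last = len(weights) - 1
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ w + b
        activations.append(softmax(z) if i == last else relu(z))
    return activations

# Toy layer sizes for illustration; the assignment uses 2048, 512 and 5 neurons.
rng = np.random.default_rng(0)
sizes = [8, 6, 4, 5]
toy_weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
toy_biases = [np.zeros(n) for n in sizes[1:]]
probs = forward(rng.normal(size=(3, 8)), toy_weights, toy_biases)[-1]
print(probs.shape)
```

Keeping every layer's activations is a deliberate choice here, since backpropagation needs them to compute the gradients.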