# (E6) Autoencoders
In this exercise, you will be given an example of [autoencoders](https://en.wikipedia.org/wiki/Autoencoder). 
You should be able to replicate the results given here if you have completed (E2)-(E5) correctly.

It is best to have a Python IDE (integrated development environment) such as [PyCharm](https://www.jetbrains.com/pycharm/) and [Anaconda](https://www.anaconda.com) installed, because they will make your life easier! If not, you may want to work on the assignment using Google Colab. In either case, you need to 1) fill in the blanks in the .py files; and 2) import the files that you have completed (e.g., layer.py, optim.py, model.py, etc.). Here are some scenarios for how you might go about the assignment: 

#### Without Google Colab: Python IDE + Anaconda 
If you have a Python IDE and Anaconda installed, you can do one of the following:
- Edit the .py files in the IDE. Then simply open the .ipynb file in the IDE as well, where you can edit and run code. 
- Your IDE might not support running .ipynb files. However, since you have installed Anaconda, you can just open this notebook using Jupyter Notebook.

In both of these cases, you can simply import .py files in this .ipynb file:
```python
from model import NeuralNetwork
```
 
#### With Google Colab
- Google Colab has an embedded code editor, so you can simply upload all .py files to Google Colab and edit them there. Once you upload the files, double-click the file that you want to edit. Please **make sure that you download up-to-date files frequently**; otherwise, if the Google Colab runtime restarts, all your files might be gone.
- If you find the approach above cumbersome, you can instead use an online Python editor to complete the .py files (e.g., see [repl.it](https://repl.it/languages/python3)). You can also edit the files with a plain text editor, but an editor without Python syntax support makes mistakes more likely. Once you are done editing, you can either upload the files to Colab or follow the instructions below. 
 
- If you have *git clone*d the assignment repository to a directory in your Google Drive (or you have the files stored in the Drive anyway), you can do the following:
```python
from google.colab import drive
drive.mount('/content/drive/')          # this will direct you to a link where you can get an authorization key
import sys
sys.path.append('/content/drive/My Drive/your-directory-where-the-python-files-exist')
```
Then, you are good to go. When you change a .py file, make sure it is synced to the Drive, then re-run the lines above to get access to the latest version of the file. If the module was already imported in the current session, you may also need to reload it or restart the runtime. Note that you must pass the correct path to *sys.path.append*.
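
Reloading an already-imported module can be sketched with the standard-library `importlib.reload`. This is only an illustration: the module name `demo_layer` and the temporary directory are hypothetical stand-ins for your own files and Drive path.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical stand-in for the directory that holds your .py files.
workdir = pathlib.Path(tempfile.mkdtemp())
module_file = workdir / "demo_layer.py"
module_file.write_text("VERSION = 1\n")

# Make the directory importable, as with sys.path.append above.
sys.path.append(str(workdir))
importlib.invalidate_caches()
import demo_layer

first_version = demo_layer.VERSION

# Simulate editing the file after it has already been imported.
module_file.write_text("VERSION = 22  # edited\n")

# A plain `import demo_layer` would return the cached module object;
# importlib.reload re-executes the source file instead.
importlib.reload(demo_layer)
second_version = demo_layer.VERSION

print(first_version, second_version)
```

Names bound with `from module import name` keep pointing at the old objects after a reload, so those import statements need to be re-run as well.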

Now, let's get started!
## Autoencoder
### Input and Target
An autoencoder learns latent embeddings of its inputs in an unsupervised way: we do not need specific target values associated with the inputs, because the input data themselves act as the targets. 

To see this more concretely, let's look at the code below, which prepares the data for training an autoencoder. 

In [1]:
import numpy as np
def generate_data(num=8):
    """ Generate 'num' number of one-hot encoded integers. """ 
    x_train = np.eye(num)[np.arange(num)]                       # This is a simple way to one-hot encode integers
    
    # Repeat x_train multiple times for training
    x_train = np.repeat(x_train, 100, axis=0)
    
    # The target is x_train itself!
    x_target = x_train.copy()
    return x_train, x_target    

Clearly, *x_target* is the same as *x_train*. So, what we want to do is encode 8-bit one-hot inputs using 3 hidden nodes, whose values are in turn decoded back to the original 8-bit vector by the decoder. Training an autoencoder, therefore, means that we train both the encoder weight and the decoder weight. In our example, since we have 3 hidden nodes in a single layer, the encoder weight has shape *[8, 3]*, whereas the decoder weight has shape *[3, 8]*. 
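
The shapes described above can be checked with a few lines of NumPy. This is a minimal sketch that ignores biases and activations; the random weights and the names `w_enc` and `w_dec` are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

num, hidden = 8, 3

# One-hot inputs: row i encodes integer i. The target equals the input.
x = np.eye(num)
target = x.copy()

# Random weights, used here only to illustrate the shapes.
rng = np.random.default_rng(0)
w_enc = rng.normal(size=(num, hidden))   # encoder weight: [8, 3]
w_dec = rng.normal(size=(hidden, num))   # decoder weight: [3, 8]

code = x @ w_enc                  # [8, 3] latent representation
reconstruction = code @ w_dec     # [8, 8], same shape as the target

print(code.shape, reconstruction.shape, target.shape)
```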

### Training an Autoencoder
Now, let us train an autoencoder with the sigmoid activation function and the cross-entropy loss.
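
As a rough picture of what this setup computes, the forward pass can be sketched in plain NumPy. The helper functions here are local re-implementations written for illustration and are assumptions; they are not the `sigmoid`, `softmax`, or `CrossEntropyLoss` from the assignment files, whose exact behaviour may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, target):
    # Mean over the batch of -sum_k target_k * log(prob_k)
    return -np.mean(np.sum(target * np.log(probs + 1e-12), axis=1))

rng = np.random.default_rng(0)
x = np.eye(8)                                           # one-hot inputs
w1, b1 = rng.normal(scale=0.1, size=(8, 3)), np.zeros((1, 3))
w2, b2 = rng.normal(scale=0.1, size=(3, 8)), np.zeros((1, 8))

hidden = sigmoid(x @ w1 + b1)      # encoder: [8, 3]
logits = hidden @ w2 + b2          # decoder: [8, 8]
probs = softmax(logits)            # each row sums to 1
loss_value = cross_entropy(probs, x)

print(probs.shape, loss_value)
```

With small random weights the predicted distribution is close to uniform, so the loss starts near log(8) ≈ 2.08; training drives it toward zero.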

In [2]:
from model import NeuralNetwork
from layer import FCLayer
from activation import Activation
from utils import *
from loss import CrossEntropyLoss
from optim import SGD, Adam, RMSProp
# Load data
num = 8
np.random.seed(10)
x_train, x_target = generate_data(num=num)

In [3]:
# Define a model and add fully-connected and activation layers.
nn = NeuralNetwork()
nn.add(FCLayer(x_train.shape[1], 3, initialization='xavier', uniform=False))
nn.add(Activation(sigmoid, sigmoid_prime))
nn.add(FCLayer(3, x_train.shape[1], initialization='xavier', uniform=False))
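
For reference, one common form of Xavier (Glorot) initialization with a normal distribution draws weights with standard deviation sqrt(2 / (fan_in + fan_out)). The sketch below assumes that form; how `FCLayer` implements `initialization='xavier'` with `uniform=False` depends on your own layer.py, and `xavier_normal` is a hypothetical helper name.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    # Glorot normal: zero mean, std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(10)
w_enc = xavier_normal(8, 3, rng)   # encoder weight, shape [8, 3]
w_dec = xavier_normal(3, 8, rng)   # decoder weight, shape [3, 8]

xavier_std = np.sqrt(2.0 / (8 + 3))
print(w_enc.shape, w_dec.shape, round(xavier_std, 4))
```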

In [4]:
# Define loss: note that CrossEntropyLoss is using the softmax output internally
loss = CrossEntropyLoss()
nn.set_loss(loss)

In [5]:
# Set up hyperparameters
lr = 0.001
epochs = 2000
freq = epochs // 10
batch_size = 64
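
With these settings and the 800 training rows produced by `generate_data` (8 one-hot vectors repeated 100 times), each epoch is split into mini-batches by slicing a shuffled index array. A minimal sketch of that slicing, independent of the assignment's classes:

```python
import numpy as np

N, batch_size = 800, 64            # 8 one-hot vectors repeated 100 times
rng = np.random.default_rng(10)
perm = rng.permutation(N)          # shuffled row indices for one epoch

batch_sizes = []
for b in range(0, N, batch_size):
    batch_idx = perm[b: b + batch_size]   # the last slice may be shorter
    batch_sizes.append(len(batch_idx))

print(len(batch_sizes), batch_sizes[-1])
```

Because 800 is not a multiple of 64, the final mini-batch holds only the remaining 32 rows.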

In [6]:
# Define optimizer and associate it with the model
optimizer = Adam(nn.parameters(), lr=lr)
nn.set_optimizer(optimizer)

In [21]:
# Training begins
inds = list(range(x_train.shape[0]))
N = x_train.shape[0]

loss_hist = []
for epoch in range(epochs):
    inds = np.random.permutation(inds)
    x_train = x_train[inds]
    x_target = x_target[inds]
    
    loss = 0
    for b in range(0, N, batch_size):
        # get the mini-batch
        x_batch = x_train[b: b+batch_size]
        x_target_batch = x_target[b: b+batch_size]
        #print(x_batch)
        
        # feed forward
        pred = nn.predict(x_batch)
        
        # Error
        loss += nn.loss(pred, x_target_batch) / N
        
        # Back propagation of error
        nn.backward(pred, x_target_batch)
        
        # Update parameters
        nn.optimizer.step()

    # Record loss per epoch
    loss_hist.append(loss)

    if epoch % freq == 0:
        print()
        print("Epoch %d/%d\tloss=%.5f" % (epoch + 1, epochs, loss), end='\t', flush=True)
        
        # Test with the training data
        pred = nn.predict(x_train, mode=False)
        l = nn.loss(pred, x_target)
        print("Test loss: {:.5f}".format(l), end='')

print("\nTraining finished!")
print("Print prediction results:")
x_test = np.eye(num)[np.arange(num)]                        # Test data (one-hot encoded)
np.set_printoptions(2)
for x in x_test:
    print("\tInput: {}\tOutput: {}".format(x, softmax(nn.predict(x[None, :], mode=False))))
    print("Pass activation:", nn.layers[2].input_data)


Epoch 1/2000	loss=0.00008	Test loss: 0.06763
Epoch 201/2000	loss=0.00003	Test loss: 0.02636
Epoch 401/2000	loss=0.00001	Test loss: 0.01029
Epoch 601/2000	loss=0.00001	Test loss: 0.00403
Epoch 801/2000	loss=0.00000	Test loss: 0.00158
Epoch 1001/2000	loss=0.00000	Test loss: 0.00062
Epoch 1201/2000	loss=0.00000	Test loss: 0.00024
Epoch 1401/2000	loss=0.00000	Test loss: 0.00010
Epoch 1601/2000	loss=0.00000	Test loss: 0.00004
Epoch 1801/2000	loss=0.00000	Test loss: 0.00002
Training finished!
Print prediction results:
	Input: [1. 0. 0. 0. 0. 0. 0. 0.]	Output: [[1.00e+00 1.20e-27 3.28e-09 3.28e-21 3.01e-10 1.77e-15 6.22e-21 2.54e-09]]
	Input: [0. 1. 0. 0. 0. 0. 0. 0.]	Output: [[1.08e-29 1.00e+00 3.54e-19 1.59e-09 1.05e-16 6.43e-09 1.59e-09 4.53e-19]]
	Input: [0. 0. 1. 0. 0. 0. 0. 0.]	Output: [[2.02e-09 3.16e-19 1.00e+00 3.50e-30 5.38e-17 6.24e-09 1.77e-09 3.06e-19]]
	Input: [0. 0. 0. 1. 0. 0. 0. 0.]	Output: [[3.60e-21 2.54e-09 7.80e-28 1.00e+00 3.93e-10 1.22e-15 3.76e-21 2.53e-09]]
	Input: [

If you look at the output values of the network, it is clear that we have successfully trained the autoencoder to encode and decode the one-hot encoded integers 0-7!

## (E7) Your Turn: Explain the autoencoder
Given the trained model that can encode the 0-7 integers, explain how the NN model learned to encode/compress the numbers. Rather than just stating your reasoning in words, do explore the model closely to see what it has learned. 

The autoencoder works in two parts: the encoder and the decoder. First, the encoder takes the 8x8 matrix of one-hot inputs and converts it into an 8x3 matrix. The decoder then takes the 8x3 matrix and attempts to reconstruct the original 8x8 matrix as closely as possible. I've printed the final weights and biases of the network below, and the activations for each pass above. 

In [22]:
# Encoder (first layer) weights and bias
print("Weights:",nn.layers[0].weights.value)
print("Bias:",nn.layers[0].bias.value)


Weights: [[-10.68   8.31 -10.12]
 [ 10.31 -12.06   9.9 ]
 [ 10.4    8.04 -10.74]
 [-10.43 -11.29   9.79]
 [-10.4  -10.86  -9.89]
 [ 11.15   9.21  10.29]
 [  9.88 -11.12 -10.11]
 [-11.15   9.    10.04]]
Bias: [[-0.07  1.21 -0.04]]


In [59]:
# Encode x_test into an 8x3 matrix: the hidden-layer pre-activations (before the sigmoid)
print(x_test @ nn.layers[0].weights.value + nn.layers[0].bias.value)

[[-10.74   9.52 -10.16]
 [ 10.24 -10.86   9.87]
 [ 10.33   9.25 -10.78]
 [-10.49 -10.08   9.75]
 [-10.47  -9.66  -9.93]
 [ 11.09  10.41  10.25]
 [  9.81  -9.91 -10.14]
 [-11.22  10.21  10.  ]]
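
Since each test input is one-hot, multiplying it by the weight matrix simply selects one row, so the matrix above is the weight matrix with the bias added to every row. A small check using the printed values (rounded to two decimals, so the last digit can differ slightly from the output above):

```python
import numpy as np

# Encoder weights and bias copied from the printed output above.
W = np.array([[-10.68,   8.31, -10.12],
              [ 10.31, -12.06,   9.90],
              [ 10.40,   8.04, -10.74],
              [-10.43, -11.29,   9.79],
              [-10.40, -10.86,  -9.89],
              [ 11.15,   9.21,  10.29],
              [  9.88, -11.12, -10.11],
              [-11.15,   9.00,  10.04]])
b = np.array([[-0.07, 1.21, -0.04]])

x_test = np.eye(8)
pre_activation = x_test @ W + b

# One-hot input i picks out row i of W, then the bias is added.
rows_match = np.allclose(pre_activation, W + b)
print(rows_match)
print(pre_activation[5])
```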


In [71]:
# Pass the matrix above through the activation layer's forward method to get the (rounded) hidden activations
print(np.round(nn.layers[1].forward(x_test @ nn.layers[0].weights.value + nn.layers[0].bias.value)))

[[0. 1. 0.]
 [1. 0. 1.]
 [1. 1. 0.]
 [0. 0. 1.]
 [0. 0. 0.]
 [1. 1. 1.]
 [1. 0. 0.]
 [0. 1. 1.]]
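
The pre-activations all have magnitude above 9, so the sigmoid saturates and each hidden unit behaves like a single bit. Reading each row of the rounded activations as a 3-bit binary number shows that the eight inputs receive eight distinct codes, which is what lets the decoder tell them apart. A quick check on the values printed above (the bit order chosen here is an arbitrary convention):

```python
import numpy as np

# Rounded hidden activations copied from the output above.
codes = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [1, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0],
                  [1, 1, 1],
                  [1, 0, 0],
                  [0, 1, 1]])

# Interpret each row as a binary number, most significant bit first.
as_int = codes @ np.array([4, 2, 1])
print(as_int.tolist())
print(len(set(as_int.tolist())))

# Saturation: for the smallest pre-activation magnitude shown (9.25),
# the sigmoid is already within about 1e-4 of 1.
gap_to_one = 1.0 - 1.0 / (1.0 + np.exp(-9.25))
```

The network has not learned the usual binary representation of 0-7 (input 0 maps to code 2, for example); it has learned some one-to-one assignment of the 2^3 = 8 available codes to the 8 inputs.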


In [44]:
# Forward pass for each test input (decoder outputs before the softmax)
for x in x_test:
    print(nn.predict(x[None, :], mode=False))

[[ 25.47 -36.52   5.93 -21.7    3.54  -8.5  -21.06   5.68]]
[[-46.78  19.92 -22.57  -0.35 -16.88   1.05  -0.34 -22.32]]
[[ -0.13 -22.71  19.89 -47.93 -17.57   1.    -0.26 -22.74]]
[[-21.19   6.1  -36.53  25.89   4.23  -8.45 -21.14   6.09]]
[[  4.93  -7.8   -7.69   4.75  25.02 -17.72   5.03  -7.86]]
[[-26.25  -8.81  -8.95 -26.8  -38.36  10.27 -26.43  -8.79]]
[[-20.66   6.02   6.27 -21.48   3.91  -8.22  25.83 -36.27]]
[[ -0.65 -22.63 -22.91  -0.57 -17.25   0.77 -47.23  19.63]]


In [74]:
for x in x_test:
    print(np.round(softmax(nn.predict(x[None, :], mode=False))))  # after rounding, the output is the same as the x_test input

[[1. 0. 0. 0. 0. 0. 0. 0.]]
[[0. 1. 0. 0. 0. 0. 0. 0.]]
[[0. 0. 1. 0. 0. 0. 0. 0.]]
[[0. 0. 0. 1. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 1. 0. 0. 0.]]
[[0. 0. 0. 0. 0. 1. 0. 0.]]
[[0. 0. 0. 0. 0. 0. 1. 0.]]
[[0. 0. 0. 0. 0. 0. 0. 1.]]
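
The rounding works because, in every row of the decoder outputs printed earlier, the largest value exceeds the runner-up by a wide margin, so the softmax puts essentially all probability on one class. The sketch below recomputes this from the printed values; the softmax here is a local, numerically stable re-implementation and is an assumption about what the assignment's `softmax` does.

```python
import numpy as np

# Decoder outputs (logits) copied from the forward-pass output above.
logits = np.array([
    [ 25.47, -36.52,   5.93, -21.70,   3.54,  -8.50, -21.06,   5.68],
    [-46.78,  19.92, -22.57,  -0.35, -16.88,   1.05,  -0.34, -22.32],
    [ -0.13, -22.71,  19.89, -47.93, -17.57,   1.00,  -0.26, -22.74],
    [-21.19,   6.10, -36.53,  25.89,   4.23,  -8.45, -21.14,   6.09],
    [  4.93,  -7.80,  -7.69,   4.75,  25.02, -17.72,   5.03,  -7.86],
    [-26.25,  -8.81,  -8.95, -26.80, -38.36,  10.27, -26.43,  -8.79],
    [-20.66,   6.02,   6.27, -21.48,   3.91,  -8.22,  25.83, -36.27],
    [ -0.65, -22.63, -22.91,  -0.57, -17.25,   0.77, -47.23,  19.63],
])

# Numerically stable softmax: subtract the row maximum before exponentiating.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

predicted = probs.argmax(axis=1)

# Margin between the largest and second-largest logit in each row.
top_two = np.sort(logits, axis=1)[:, -2:]
margin = top_two[:, 1] - top_two[:, 0]

print(predicted.tolist())
print(margin.min())
```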
