# Lab 1: Neural Network From Scratch

This exercise consists of two parts:

- Part 1: NN from scratch (in-class exercise)
- Part 2: Recap Sentiment Analysis system in Sklearn (homework)


### Part 1: Neural Network from scratch

In this exercise you will implement the forward step of a feedforward neural network (FFNN) from scratch and compare your solution to Keras. It is important that you understand the basic building blocks: how to encode your instances and labels, how to do the forward pass and apply the weights, how the activation functions are applied, and so on. These mechanisms underlie many of today's deep learning toolkits and the more advanced models.

You are going to implement the forward step manually on a small dataset. 

We assume a multi-class classification task. The labels are $$ y \in \{a,b,c\}$$

The input data consists of 4 features, each encoding a numeric value:

In [1]:
import numpy as np

classes = ["a","b","c"]

In [2]:
## training data
data_train = np.array([
                       [5,3,2,2],
                       [1,3,4,5],
                       [1,2,3,4],
                       [0,3,1,5],
                       [1,3,1,1]
                    ])

labels_train = ["a", "b", "c", "b", "b"]


### Step 1: Forward pass

Implement the forward pass using `numpy` for the feedforward neural network illustrated in the figure.

* How many neurons do hidden layer 1 and hidden layer 2 have? Note: the bias nodes are not shown in the figure; consider them as separate neurons.
* How many neurons does the output layer have? And the input layer?
* Assume there is a `tanh` activation function between the layers. (hint: you can use `np.tanh`)
* Which activation function is on the output layer, given the labels above?
* Hint: use `.shape` to check the dimensions of your inputs

<img src="nn.svg">


Specify the size of the various parts of the feedforward neural network. You can use helper variables such as `input_dim = ?` or `hidden_dim1 = ?` etc.

In [3]:
## helper variables to determine sizes
batch_size,input_dim = data_train.shape # (bs,inpd)
hidden_dim1 = 15
hidden_dim2 = 20
output_dim = len(classes)


Define the shape of the parameters to be learned for this network using numpy arrays. For now, simply initialize them with ones or random numbers: `np.ones((3,4))` defines a matrix of ones of size `3x4`; similarly, [`np.random.randn(3,4)`](https://www.numpy.org/devdocs/reference/generated/numpy.random.randn.html) initializes a matrix of the same size with random samples from the standard normal distribution.

* What are all the parameters of this neural network and what is their shape?


In [11]:
## define all parameters of this NN
np.random.seed(42)
w1 = np.random.randn(input_dim,hidden_dim1)
w2 = np.random.randn(hidden_dim1,hidden_dim2)
w3 = np.random.randn(hidden_dim2,output_dim)
b1 = np.random.randn(1,hidden_dim1)
b2 = np.random.randn(1,hidden_dim2)
b3 = np.random.randn(1,output_dim)

Now that we have defined the shape of all parameters, we are ready to "connect the dots" and build the network. 

It is instructive to break the computation of each layer down into two steps: the scores $a1$ are obtained by the linear function, and the activation function $\sigma$ is then applied to them to obtain the representation $z1$, as in:

$$ a1 = xW_1 + b_1$$
$$ z1 = \sigma(a1)$$
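
As a minimal sketch of these two steps for a single layer, the example below uses a random batch of 5 instances with 4 features and a hidden size of 15. The names `x`, `W1` and `b1` and the random values are local to this illustration; the point is only how the shapes line up and how the bias is broadcast over the batch.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((5, 4))     # batch of 5 instances, 4 features each
W1 = rng.standard_normal((4, 15))   # weights: input_dim x hidden_dim
b1 = rng.standard_normal((1, 15))   # bias: one value per hidden neuron

a1 = x @ W1 + b1    # linear scores; b1 is broadcast across the 5 rows
z1 = np.tanh(a1)    # element-wise activation, values in [-1, 1]

print(a1.shape, z1.shape)  # (5, 15) (5, 15)
```

Each further layer repeats the same pattern, taking the previous representation as its input.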

Specify the entire network up to the output layer $z3$, **up to but excluding** the final application of the softmax, the last activation function, which is provided.

The exact implementation of the softmax might differ from toolkit to toolkit (due to variations in implementation details in order to obtain numerical stability). Therefore, we will use the Keras backend function for the softmax calculation, which accesses the tensorflow `Tensor` object. This makes sure that the manual calculations of the forward pass do not differ from the Keras-based implementation just because of the difference in the softmax calculation.
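
To illustrate what numerical stability refers to here, the following is a minimal NumPy sketch of one common formulation: subtracting the row-wise maximum before exponentiating. It is not the Keras implementation, and `stable_softmax` is a name made up for this example; for the comparison in this lab, keep using the Keras-based softmax.

```python
import numpy as np

def stable_softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    scores = np.asarray(scores, dtype=np.float64)
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Large scores would overflow a naive exp(); the shifted version stays finite.
probs = stable_softmax([[1000.0, 1001.0, 1002.0], [-1.0, 0.0, 1.0]])
print(probs.sum(axis=1))
```

Both rows yield the same probabilities, because softmax is invariant to adding a constant to all scores of an instance.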

In [5]:
## use this softmax
## imports for softmax
from keras import backend as K
from keras  import activations

def keras_softmax(scores):
    ## softmax calculation
    var = K.variable(value=scores)
    act_tf = activations.softmax(var) # returns Tensor
    softmax_scores = K.eval(act_tf) # return numpy array
    return softmax_scores


Using TensorFlow backend.


In [23]:

## implement the forward pass: compute the scores up to (but excluding) the softmax,
## then apply the provided `keras_softmax`
## apply it to the training data `data_train` - use vectorization
def forward_pass(x):
    z1 = np.tanh(np.dot(x,w1)+b1)   # hidden layer 1
    z2 = np.tanh(np.dot(z1,w2)+b2)  # hidden layer 2
    z3 = np.dot(z2,w3)+b3           # output scores (before softmax)
    return keras_softmax(z3)


In [24]:
y_hat_manual = forward_pass(data_train)

In [25]:
## the resulting predictions will be the softmax activations for each output neuron for each training instance
print(y_hat_manual.shape)

(5, 3)


We can check that all predictions sum up to approximately 1 (hint: use `np.sum` with `axis`)



In [17]:
np.sum(y_hat_manual, axis=1)

array([1.        , 1.        , 1.        , 0.99999994, 1.        ],
      dtype=float32)


Congrats! You have made it through the manual construction of the forward pass. Now let's check your implementation by comparing it to a set of pre-determined weights.

### Load pre-trained model weights and test weights on evaluation file to check your implementation

Now we are going to:
* load pretrained weights for all parameters
* apply the weights to the evaluation data `data_eval`
* check that your manual softmax scores match the ones obtained by the pre-trained model `model` that we will load
* convert the output to labels and calculate the accuracy score

In [28]:
import pickle
with open("data/weights.pickle","rb") as f:
    weights = pickle.load(f)

Inspect the weights you just loaded. 

In [22]:
## what do the weights contain?
weights["W1"].shape


(4, 15)

Apply your manual implementation of the forward pass to the evaluation data by using the parameters (weights) you just loaded. This allows you to check if you get the same results back as the model implemented in Keras. 

In [26]:
data_eval = np.array([
                       [1,2,3,5],
                       [1,3,1,0],
                    ])

gold_labels_eval = ["c", "b"]
y_eval = np.array([[0,0,1],[0,1,0]])

In [29]:
from keras.models import load_model

model = load_model('data/model.h5') # load model parameters and model structure

# use the model for predicting on the data_eval

keras_out = model.predict(data_eval)  # softmax scores for each evaluation instance

OSError: Unable to open file (unable to open file: name = 'data/model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Now plug the weights stored in `weights` into your manually defined forward pass above. Compare the result to the predictions of the loaded model.

In [46]:
# load the weights and code up the forward pass manually. Compare to the predictions above.
scores = None

print(scores)
print("softmax:")
print(keras_softmax(scores))

[[-1.64962824  1.01406278 -0.06164092]
 [-0.35048741  0.80949804 -0.15451172]]
softmax:
[[0.04939969 0.70884377 0.24175662]
 [0.18496649 0.5900222  0.22501126]]
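
One possible way to fill in `scores` is sketched below. It assumes that the parameter dictionary uses the keys `W1`, `W2`, `W3`, `b1`, `b2` and `b3`; only `W1` is confirmed by the inspection above, so check the actual keys of `weights` first. The helper name `forward_scores` is made up for this example. Because the pickle file is not available here, the sketch uses stand-in random parameters with the layer sizes 4, 15, 20 and 3, so its numbers will not match the output above.

```python
import numpy as np

def forward_scores(x, params):
    """Return the pre-softmax scores for a batch x using the given parameters."""
    z1 = np.tanh(x @ params["W1"] + params["b1"])
    z2 = np.tanh(z1 @ params["W2"] + params["b2"])
    return z2 @ params["W3"] + params["b3"]

# Stand-in parameters (assumed key names); in the lab you would pass the
# loaded `weights` dictionary instead.
rng = np.random.default_rng(42)
dims = [4, 15, 20, 3]
params = {}
for i, (n_in, n_out) in enumerate(zip(dims[:-1], dims[1:]), start=1):
    params[f"W{i}"] = rng.standard_normal((n_in, n_out))
    params[f"b{i}"] = rng.standard_normal(n_out)

data_eval = np.array([[1, 2, 3, 5], [1, 3, 1, 0]])
scores = forward_scores(data_eval, params)
print(scores.shape)  # (2, 3)
```

Passing the parameters explicitly, rather than reading global variables, makes it easy to run the same forward pass with either the random or the loaded weights.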


If the two softmax outputs match, your implementation is correct. Congrats!

### Convert labels to 1-hot representation and evaluate the classifier

In this section, we are going to convert the softmax output into actual predicted labels. Then we evaluate the predicted labels against the gold labels.

Many deep learning libraries require one-hot representations for labels, where each dimension corresponds to an output neuron.

For example, an instance labeled as 'c' is represented in the one-hot target vector as:


In [571]:
target_vector = [0,0,1] # target vector for class 'c'

classes

['a', 'b', 'c']

Let us convert the predicted softmax scores above to the actual predicted label and compare it to the gold standard labels.

In [572]:
# implement the conversion of softmax scores to actual labels (a list of labels like ["a","a"])



['c', 'b']
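
A minimal sketch of this conversion, assuming the softmax scores are a NumPy array with one row per instance and one column per class, in the order of `classes`. The scores below are made up for illustration and are not the model output.

```python
import numpy as np

classes = ["a", "b", "c"]

# Made-up softmax scores for two instances (illustration only)
softmax_scores = np.array([[0.1, 0.2, 0.7],
                           [0.2, 0.5, 0.3]])

pred_indices = np.argmax(softmax_scores, axis=1)   # index of the largest score per row
predicted = [classes[i] for i in pred_indices]
print(predicted)  # ['c', 'b']
```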


In [573]:
## calculate accuracy
# implement accuracy

predicted: ['c', 'b']
gold: ['c', 'b']
accuracy: 1.0
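
Accuracy is the fraction of instances whose predicted label equals the gold label. A small sketch, with `accuracy` as a helper name chosen for this example:

```python
import numpy as np

def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return float(np.mean(np.array(predicted) == np.array(gold)))

predicted = ["c", "b"]
gold_labels_eval = ["c", "b"]
print("predicted:", predicted)
print("gold:", gold_labels_eval)
print("accuracy:", accuracy(predicted, gold_labels_eval))  # accuracy: 1.0
```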


N.B. Keras has a useful [utility function](https://keras.io/utils/) `to_categorical` to convert categorical data into binary one-hot encodings for training the model. The input to the function has to be numeric, i.e., labels converted to indices (corresponding to the classes, e.g., `2` for `c`).

```
y_train_one_hot = utils.to_categorical(labels_train_num, num_classes=len(classes))
```
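
The same conversion can be sketched in plain NumPy, which also shows how `labels_train_num` could be obtained from the string labels. This is only an illustration of the encoding and not a replacement for the Keras utility; `label_to_index` is a name introduced for this example.

```python
import numpy as np

classes = ["a", "b", "c"]
labels_train = ["a", "b", "c", "b", "b"]

# Map each label to the index of its class, e.g. 'c' -> 2
label_to_index = {label: i for i, label in enumerate(classes)}
labels_train_num = np.array([label_to_index[label] for label in labels_train])

# Row i of the identity matrix is the one-hot vector for class index i
y_train_one_hot = np.eye(len(classes), dtype=np.float32)[labels_train_num]

print(labels_train_num)       # [0 1 2 1 1]
print(y_train_one_hot.shape)  # (5, 3)
```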

### Build the model yourself in Keras

Try to build the model that was used to generate the weights above in Keras. Play with the model, change its hyperparameters and observe what happens.

* Build the network yourself in Keras (Suggestion: use the functional API)
* You will need to convert the training data labels to the one-hot format
* Train the network with `SGD` as defined below for 5 epochs. 
* Modify the hyperparameters of the model (learning rate, number of neurons in the hidden layers, optimizer, ...) and observe the impact on performance.


In [576]:
from keras import optimizers
sgd = optimizers.SGD(lr=0.05)


In [493]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_58 (InputLayer)        (None, 4)                 0         
_________________________________________________________________
dense_172 (Dense)            (None, 15)                75        
_________________________________________________________________
dense_173 (Dense)            (None, 20)                320       
_________________________________________________________________
dense_174 (Dense)            (None, 3)                 63        
Total params: 458
Trainable params: 458
Non-trainable params: 0
_________________________________________________________________
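
The parameter counts in the summary can be checked by hand: each dense layer has one weight per input-output pair plus one bias per output neuron. A short sketch of that arithmetic, with `layer_dims` as a helper name for this example:

```python
layer_dims = [4, 15, 20, 3]  # input, hidden 1, hidden 2, output

param_counts = [
    n_in * n_out + n_out   # weight matrix entries plus one bias per output neuron
    for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:])
]
print(param_counts, sum(param_counts))  # [75, 320, 63] 458
```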


### Part 2: Recap sentiment analysis exercise with sklearn

Solve the exercise.
