# Week 3 - A Feedforward Neural Network


In this exercise, you will implement the forward pass of a feedforward neural network (FNN) from scratch and compare your solution to Keras on a small toy example: predicting the nationality of a given surname.

It is very important that you understand the basic building blocks:
* **the input/output:** how to encode your instances and labels.
* **the model:** what the neural network consists of, how to learn its weights, and how to do a forward pass for prediction.

### Feedforward Neural Networks (FNNs) or MLPs

Feedforward Neural Networks (FNNs) are also called Multilayer Perceptrons (MLPs). They are the most basic type of neural network. They are called feedforward because information flows in one direction only, without any cycles. It goes from the input nodes, through the hidden layers, to the output nodes.

It is essential to understand that a neural network is a non-linear classification model built from function application. Each layer applies a function to the output of the previous layer, so the whole network is a composition of functions.

Summary (by J. Frellsen):
<img src="pics/fnn_jf.png">

In [None]:
## helper functions

## use this softmax
from keras import backend as K
from keras import activations

def keras_softmax(scores):
    ## softmax calculation
    var = K.variable(value=scores)
    act_tf = activations.softmax(var) # returns Tensor
    softmax_scores = K.eval(act_tf) # return numpy array
    return softmax_scores


# Surname Nationality Prediction with an FNN

What you will learn:

* how to encode the data as character-based features and feed this n-hot representation as input to an FNN
* how to define the model (FNN) by reading off its structure from a graphical illustration of the network
* how to compute the forward pass manually using existing (pre-trained) weights, and how to verify your implementation by comparing the computed prediction scores to those of the same model implemented in Keras



##  <font color='blue'>Task 0</font>: Representing the data

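We are dealing with a multi-class classification task. The labels are $y \in \{\text{da}, \text{no}, \text{se}\}$ (Danish, Norwegian, Swedish).

The data comes from the surname statistics published by the national statistics offices of Denmark, Sweden, and Norway (see the links in the code below). Here we consider a very small dataset.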



In [None]:
## training data
import numpy as np

data = [('da','Nielsen'), # Statistik Danmark: https://www.dst.dk/en/Statistik/emner/befolkning-og-valg/navne/navne-i-hele-befolkningen
        ('da','Jensen'),
        ('da','Hansen'),
        ('da','Pedersen'),
        ('da','Andersen'),
        ('se','Andersson'), # Statistics Sweden: https://www.scb.se/en/finding-statistics/statistics-by-subject-area/population/general-statistics/name-statistics/pong/tables-and-graphs/all-registered-persons-in-sweden---last-names-top-100-list/last-names-top-100/
        ('se','Johansson'),
        ('se','Karlsson'),
        ('se','Nilsson'),
        ('se','Eriksson'),
        ('no','Hansen'), # Statistics Norway: https://www.ssb.no/en/befolkning/statistikker/navn/aar/2015-01-27?fane=tabell&sort=nummer&tabell=216083
        ('no','Johansen'),
        ('no','Olsen'),
        ('no','Larsen'),
        ('no','Andersen'),
       ]

* **task**: Convert the data into n-hot format, where each feature indicates whether a single character is present in the surname or not. Similarly, convert the labels into numeric format. For simplicity, you can assume a closed vocabulary: only the characters that appear above, with no handling of unknown characters. Keep the original casing, so that, e.g., `N` and `n` are different features.
  * What is the vocabulary size?
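The encoding is sketched below on a hypothetical two-name example, deliberately not the full training set. Each column corresponds to one character of a closed, case-sensitive vocabulary. A cell is 1 if that character occurs in the name at least once. The names `vocab`, `names` and `labels` are illustrative choices.

```python
import numpy as np

# Hypothetical two-name example (not the full training data above)
names = ["Olsen", "Nilsson"]
labels = ["no", "se"]

# Closed, case-sensitive vocabulary: every character seen in the names
vocab = sorted(set("".join(names)))
char2idx = {c: i for i, c in enumerate(vocab)}
label2idx = {lab: i for i, lab in enumerate(sorted(set(labels)))}

# n-hot matrix: one row per name, one column per vocabulary character
X = np.zeros((len(names), len(vocab)))
for row, name in enumerate(names):
    for ch in name:
        X[row, char2idx[ch]] = 1.0  # presence only: "s" occurring 3 times still gives 1

y = np.array([label2idx[lab] for lab in labels])
print(vocab)  # ['N', 'O', 'e', 'i', 'l', 'n', 'o', 's'] (uppercase sorts first)
print(X)
print(y)
```

Plain dictionaries are used here on purpose. With the `defaultdict(int)` in the cell below, looking up a character that was never added silently returns index 0. That index is already taken by another character, so make sure every character gets its own index before you look it up.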

In [None]:
from collections import defaultdict

## character to index, label to index
char2idx = defaultdict(int)
label2idx = defaultdict(int)

## Your code here
# Hint: fill in this structure data_train = np.zeros((len(data),vocab_size))

##  <font color='blue'>Task 1</font>: Forward pass (from scratch)

You are going to implement the forward pass manually on a small dataset.

Implement the forward pass for the feedforward neural network illustrated in the figure, using only `numpy`.

* How many neurons do hidden layer 1 and hidden layer 2 have? Note: the bias nodes are not shown in the figure; consider them separately from the neurons you count.
* How many neurons does the output layer have? And the input layer? Note: the figure shows only 4 input nodes, but in this example the input size is determined by your encoding above. What is the input layer size?
* Assume there is a `tanh` activation function after each hidden layer (hint: you can use `np.tanh`).
* Which activation function is on the output layer, given the labels above?
* Hint: use `.shape` to check the dimensions of your inputs

<img src="pics/nn.svg">


Specify the size of each layer of the feedforward neural network:

In [None]:
## variables holding the input and output dimensions of each layer
# input_dim = ..
#hidden_dim1 = ..
# etc

Define the shape of the parameters to be learned for this network using numpy arrays. For now, simply initialize them arbitrarily with ones or random numbers. For example, `np.ones((3,4))` creates a `3x4` matrix of ones; note that the shape is passed as a tuple. Similarly, [`np.random.randn(3,4)`](https://www.numpy.org/devdocs/reference/generated/numpy.random.randn.html) creates a matrix of the same size filled with random samples from the standard normal distribution.

* What are all the parameters of this neural network and what is their shape?
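One way to organize the parameters is sketched below, using the row-vector convention $a = xW + b$. Each layer $l$ has a weight matrix $W_l$ of shape `(size of previous layer, size of layer l)` and a bias vector $b_l$ of shape `(size of layer l,)`. The layer sizes are hypothetical placeholders, not values read off the figure; substitute your own answers.

```python
import numpy as np

# Hypothetical layer sizes (placeholders, NOT taken from the figure)
input_dim, hidden_dim1, hidden_dim2, output_dim = 10, 5, 4, 3

np.random.seed(0)
# W_l: (neurons in previous layer, neurons in layer l); b_l: (neurons in layer l,)
W_1 = np.random.randn(input_dim, hidden_dim1)
b_1 = np.zeros(hidden_dim1)
W_2 = np.random.randn(hidden_dim1, hidden_dim2)
b_2 = np.zeros(hidden_dim2)
W_3 = np.random.randn(hidden_dim2, output_dim)
b_3 = np.zeros(output_dim)

params = {"W_1": W_1, "b_1": b_1, "W_2": W_2, "b_2": b_2, "W_3": W_3, "b_3": b_3}
for name, p in params.items():
    print(name, p.shape)

# Total number of trainable parameters (weights plus biases)
n_params = sum(p.size for p in params.values())
print(n_params)
```

With these placeholder sizes the count is $10 \cdot 5 + 5 + 5 \cdot 4 + 4 + 4 \cdot 3 + 3 = 94$. Keras reports the same kind of total for a stack of `Dense` layers.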


In [None]:
## define all parameters of this NN (W_1, ... bias1)


Now that we have defined the shape of all parameters, we are ready to "connect the dots" and build the network. 

It is instructive to break the computation of each layer down into two steps. First, the scores $a_1$ are obtained by applying a linear (affine) function to the input. Then the activation function $\sigma$ is applied to obtain the representation $z_1$:

$$ a_1 = x W_1 + b_1 $$
$$ z_1 = \sigma(a_1) $$

Specify the entire network up to the output scores $a_3$, i.e., **up to but excluding** the final application of the softmax (the last activation function), which is provided.
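The two-step pattern per layer can be sketched as follows. The same function handles a single instance or a whole matrix of instances (one row each), because the bias vector is broadcast over the rows. The sizes and random parameters below are hypothetical and only serve to check shapes.

```python
import numpy as np

def forward_scores(X, W_1, b_1, W_2, b_2, W_3, b_3):
    """Forward pass up to (but excluding) the softmax.

    X has shape (n_instances, input_dim): each row is one instance, so the
    whole batch is processed at once with matrix products (vectorization).
    """
    a1 = X @ W_1 + b_1   # (n, hidden_dim1); b_1 is broadcast over the rows
    z1 = np.tanh(a1)
    a2 = z1 @ W_2 + b_2  # (n, hidden_dim2)
    z2 = np.tanh(a2)
    a3 = z2 @ W_3 + b_3  # (n, output_dim): the final scores fed to the softmax
    return a3

# Hypothetical sizes and random parameters, only to check the shapes
np.random.seed(0)
n, input_dim, h1, h2, out = 15, 10, 5, 4, 3
X = (np.random.rand(n, input_dim) > 0.5).astype(float)  # random n-hot-like rows
params = [np.random.randn(input_dim, h1), np.zeros(h1),
          np.random.randn(h1, h2), np.zeros(h2),
          np.random.randn(h2, out), np.zeros(out)]
final_scores = forward_scores(X, *params)
print(final_scores.shape)  # (15, 3)
```

Running the batch version gives the same scores as running each row separately, which is a quick sanity check for your own implementation.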

The exact implementation of the softmax may differ from toolkit to toolkit, for example in the tricks used to obtain numerical stability. Therefore, we use the Keras backend function `keras_softmax` (defined above) for the softmax calculation; it operates on a TensorFlow `Tensor` object. This ensures that your manual forward pass does not differ from the Keras-based implementation merely because of differences in the softmax calculation.
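The NumPy sketch below illustrates why softmax implementations can differ. It contrasts a naive softmax with the common max-subtraction variant. This is a standard stabilization technique, but we do not claim it is exactly what Keras/TensorFlow does internally.

```python
import numpy as np

def softmax_naive(scores):
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_stable(scores):
    # softmax(s) == softmax(s - c) for any constant c, so subtracting the
    # row-wise maximum leaves the result unchanged but prevents exp() overflow.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
p_stable = softmax_stable(scores)
print(p_stable)  # both rows: approx. [0.090, 0.245, 0.665]

with np.errstate(over="ignore", invalid="ignore"):
    p_naive = softmax_naive(scores)
print(p_naive)  # second row is nan: exp(1000) overflows to inf, and inf/inf is nan
```

Both variants agree on well-behaved scores, and each row sums to 1 (the same `np.sum(..., axis=1)` check used further below).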

In [None]:

## implement the forward pass (up to but excluding the softmax)
## apply it to the training data `data_train` - use vectorization

#final_scores = None
#y_hat_manual = keras_softmax(final_scores)




In [None]:
## the resulting predictions will be the softmax activations for each output neuron for each training instance
print(y_hat_manual.shape)

In [None]:
print(y_hat_manual)

We can check that the predicted probabilities for each instance sum to approximately 1 (hint: use `np.sum` with the `axis` argument).



In [None]:
np.sum(y_hat_manual, axis=1)


Congrats! You have made it through the manual construction of the forward pass. Now let's check your implementation by running it with a set of pre-trained weights and comparing the result to the Keras model.

##  <font color='blue'>Task 2</font>: Where do the weights come from?  Loading existing weights

So far, your model has used randomly initialized weights. In this step, we load pre-trained model weights and run the forward pass with them. This lets you check your implementation against the predictions computed by the toolkit.

Now we are going to:
* load pretrained weights for all parameters
* apply the weights to the evaluation data `data_eval`
* check that your manual softmax scores match the ones obtained by the pre-trained model `model` that we will load
* convert the output to labels and calculate the accuracy score

In [None]:
import pickle
with open("data/weights.pickle","rb") as f:
    weights = pickle.load(f)

Inspect the weights you just loaded. 
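The structure of `data/weights.pickle` is not described here. A common layout, and only an assumption to verify yourself, is the list returned by Keras' `model.get_weights()`. For a stack of `Dense` layers that list alternates a kernel of shape `(inputs, units)` and a bias of shape `(units,)` per layer. The sketch uses a hypothetical stand-in list with made-up sizes.

```python
import numpy as np

def describe_weights(weights):
    """Print the index, role and shape of every array in a list of layer parameters."""
    for i, w in enumerate(weights):
        arr = np.asarray(w)
        kind = "kernel" if arr.ndim == 2 else "bias"
        print(f"{i}: {kind:6s} shape={arr.shape}")

# Hypothetical stand-in for the loaded pickle: kernel/bias pairs for three Dense layers
fake_weights = [np.zeros((10, 5)), np.zeros(5),
                np.zeros((5, 4)), np.zeros(4),
                np.zeros((4, 3)), np.zeros(3)]
describe_weights(fake_weights)

# If the order matches, unpack in forward-pass order: W_1, b_1, W_2, b_2, W_3, b_3
W_1, b_1, W_2, b_2, W_3, b_3 = fake_weights
```

If the loaded object turns out to be a dict or uses a different order, adapt the unpacking accordingly; `type(weights)` and `len(weights)` are good first things to print.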

In [None]:
## inspect the weights to get familiar with the structure of the loaded model



Apply your manual implementation of the forward pass to the evaluation data by using the parameters (weights) you just loaded. This allows you to check if you get the same results back as the model implemented in Keras. 

* **Task**: Convert the following test data into the input format for the neural network above. 

In [None]:
## create data_eval matrix


In [None]:
## This is Keras' built-in forward pass; your manual implementation should reproduce its output

from keras.models import load_model

model = load_model('data/model.h5') # load model parameters and model structure

# use the model for predicting on the data_eval
predictions = model.predict(data_eval)
print(predictions)


* **Task**: Now plug the weights stored in `weights` into your manually defined forward pass above. Compare the result to the predictions of the loaded model above.

In [None]:
# load the weights and code up the forward pass manually. Compare to the predictions above.



If the two softmax outputs match, your implementation is correct. Congrats!
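Exact equality is usually too strict for deciding whether the outputs match. Keras computes in `float32` by default, while NumPy defaults to `float64`, so tiny rounding differences are expected. The sketch below uses a tolerance-based check with `np.allclose`; the helper name `outputs_match` and the tolerance are assumptions you may need to adjust.

```python
import numpy as np

def outputs_match(manual, keras_pred, atol=1e-6):
    """True if both softmax outputs have the same shape and agree up to a tolerance."""
    manual = np.asarray(manual, dtype=np.float64)
    keras_pred = np.asarray(keras_pred, dtype=np.float64)
    return manual.shape == keras_pred.shape and np.allclose(manual, keras_pred, atol=atol)

# Hypothetical example: the same probabilities, once in float64 and once in float32
p64 = np.array([[0.2, 0.3, 0.5],
                [0.7, 0.2, 0.1]])
p32 = p64.astype(np.float32)
print(outputs_match(p64, p32))           # True: only float32 rounding differs
print(outputs_match(p64, p64[:, ::-1]))  # False: the columns are in a different order
```

A mismatch caused by column order usually means your label indices differ from the ones used to train the Keras model.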

* **Task**: Finally, get the labels back. Convert the output from the softmax into predicted labels. What do you get?
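The most probable class for each instance is the column with the highest softmax probability. `np.argmax(..., axis=1)` finds it, and inverting the label mapping turns indices back into labels. The mapping and probabilities below are hypothetical; use the `label2idx` you built in the first task instead.

```python
import numpy as np

# Hypothetical label mapping and softmax outputs (each row sums to 1)
label2idx = {"da": 0, "no": 1, "se": 2}
idx2label = {i: label for label, i in label2idx.items()}  # invert the mapping

y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
gold = ["da", "se", "da"]

pred_idx = np.argmax(y_hat, axis=1)                # most probable class per row
pred_labels = [idx2label[int(i)] for i in pred_idx]
accuracy = np.mean([p == g for p, g in zip(pred_labels, gold)])
print(pred_labels, accuracy)  # ['da', 'se', 'no'] and 2 of 3 correct
```

In the training data above, "Hansen" and "Andersen" occur with both the `da` and the `no` label. Identical n-hot inputs therefore have different gold labels, so no model can classify every training instance correctly.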

In [None]:
# your code here

(optional) In practice, you would train the model on data to estimate its weights. If you like, train the model on the data above with SGD for 5 epochs.
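With Keras, this exercise presumably amounts to compiling a model with `optimizer="sgd"` and `loss="categorical_crossentropy"` and calling `fit(..., epochs=5)`. As a toolkit-free alternative, the NumPy sketch below implements plain per-instance SGD with backpropagation. It uses the same kind of architecture: two `tanh` hidden layers, a softmax output and categorical cross-entropy. The hidden sizes, learning rate, initialization scale and seed are hypothetical choices, not the settings of the pre-trained model.

```python
import numpy as np

# The 15 (label, surname) pairs from the training data above
data = [("da", "Nielsen"), ("da", "Jensen"), ("da", "Hansen"), ("da", "Pedersen"),
        ("da", "Andersen"), ("se", "Andersson"), ("se", "Johansson"), ("se", "Karlsson"),
        ("se", "Nilsson"), ("se", "Eriksson"), ("no", "Hansen"), ("no", "Johansen"),
        ("no", "Olsen"), ("no", "Larsen"), ("no", "Andersen")]

vocab = sorted({c for _, name in data for c in name})
char2idx = {c: i for i, c in enumerate(vocab)}
label2idx = {lab: i for i, lab in enumerate(sorted({lab for lab, _ in data}))}

X = np.zeros((len(data), len(vocab)))      # n-hot inputs
Y = np.zeros((len(data), len(label2idx)))  # one-hot targets
for row, (lab, name) in enumerate(data):
    for c in name:
        X[row, char2idx[c]] = 1.0
    Y[row, label2idx[lab]] = 1.0

np.random.seed(0)
h1, h2 = 8, 8  # hypothetical hidden layer sizes
P = {"W1": 0.5 * np.random.randn(len(vocab), h1), "b1": np.zeros(h1),
     "W2": 0.5 * np.random.randn(h1, h2), "b2": np.zeros(h2),
     "W3": 0.5 * np.random.randn(h2, Y.shape[1]), "b3": np.zeros(Y.shape[1])}

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(P, x):
    z1 = np.tanh(x @ P["W1"] + P["b1"])
    z2 = np.tanh(z1 @ P["W2"] + P["b2"])
    return z1, z2, softmax(z2 @ P["W3"] + P["b3"])

def mean_loss(P, X, Y):
    """Categorical cross-entropy averaged over all instances."""
    p = forward(P, X)[2]
    return -np.mean(np.sum(Y * np.log(p), axis=1))

lr, epochs = 0.1, 5
losses = [mean_loss(P, X, Y)]
for epoch in range(epochs):
    for i in np.random.permutation(len(X)):  # SGD: one instance per update
        x, y = X[i:i + 1], Y[i:i + 1]
        z1, z2, p = forward(P, x)
        d3 = p - y                             # gradient w.r.t. the scores (softmax + cross-entropy)
        d2 = (d3 @ P["W3"].T) * (1 - z2 ** 2)  # backprop through tanh: tanh' = 1 - tanh^2
        d1 = (d2 @ P["W2"].T) * (1 - z1 ** 2)
        # all gradients are computed before any parameter is updated
        grads = [("W3", z2.T @ d3), ("b3", d3[0]), ("W2", z1.T @ d2),
                 ("b2", d2[0]), ("W1", x.T @ d1), ("b1", d1[0])]
        for name, grad in grads:
            P[name] -= lr * grad
    losses.append(mean_loss(P, X, Y))
    print(f"epoch {epoch + 1}: mean loss {losses[-1]:.4f}")
```

The mean training loss should go down over the epochs. It cannot reach zero, though, because identical surnames appear with different labels.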