# Basics of Neural networks and other stuff
```In this exercise you will gain experience working with Keras, a useful tool for designing and training neural networks.```

```~Ittai Haran```

## Stage 1 - predicting engineered functions
```Here you will design a simple fully connected network to predict a few simple functions. You will try different activation functions and different architectures (number of layers, size of layers).```

```The first experiment will be guided:```
- ```Read the first dataframe - function_1.csv.``` (in: https://drive.google.com/open?id=19Y9f2pkUwP7nrgbKCmCGHEomHmA0pwqx)
- ```Plot y against x. Can you guess the function y(x)?```
- ```Split your data into a train segment (70%) and a test segment (30%).```
- ```Write a fully connected neural network with one hidden layer of 3 units. I suggest using the``` [functional API](https://keras.io/getting-started/functional-api-guide/) - ```the guide also contains simple examples of working with Keras.```
- ```Use tanh as an activation function for the hidden layer and a linear activation for the output layer.```
- ```Use model.summary() to look at your model's architecture.```
- ```Use mean squared error as the loss function and SGD (stochastic gradient descent) as the optimizer.```
- ```Try training the network with different batch sizes (don't be afraid to use many epochs - you don't have a lot of data).```
- ```Plot y against x and the network's prediction f(x) against x on the same graph.```
- ```Compute the loss on the test segment.```
- ```Can you use a smaller hidden layer?```
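
To make the architecture in the steps above concrete, here is a minimal NumPy sketch of what such a network computes: one input, a hidden layer of 3 tanh units, and a linear output, scored with mean squared error. It is not the Keras solution; the helper names `forward` and `mse` and the random weights are assumptions made for illustration only.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """1 -> 3 -> 1 network: tanh hidden layer followed by a linear output."""
    hidden = np.tanh(x @ w_hidden + b_hidden)  # shape (n_samples, 3)
    return hidden @ w_out + b_out              # shape (n_samples, 1)

def mse(y_true, y_pred):
    """Mean squared error, the loss used in the guided experiment."""
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative random weights and inputs (assumptions, not trained values).
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(1, 3)), np.zeros(3)
w_out, b_out = rng.normal(size=(3, 1)), np.zeros(1)

x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
y_pred = forward(x, w_hidden, b_hidden, w_out, b_out)

# Count the trainable parameters: weights and biases of both layers.
n_params = w_hidden.size + b_hidden.size + w_out.size + b_out.size
print(y_pred.shape, n_params)
```

The parameter count here is 3 + 3 for the hidden layer and 3 + 1 for the output layer, which is the number you should expect model.summary() to report for the same architecture.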

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from keras.layers import Dense, Input
from keras.models import Model
import matplotlib.pyplot as plt
%matplotlib inline

```For the second experiment use the dataframe function_2.csv. Here you might need to have more layers. You might want to consider different``` [optimizers](https://keras.io/optimizers/), ```rather than SGD (for example, Adam).```

(```function_2.csv can be found in:``` https://drive.google.com/open?id=1Fk32cP-4DSZ6v175UDVz_OChHE-dFLzq)

```For the third experiment use the dataframe function_3.csv. Try different activation functions. How many layers did you need?```

(```function_3.csv can be found in:``` https://drive.google.com/open?id=1-XTfy6Wf0_WhEKJ9AnUGENBZRSNXkwNe)

```As you may have guessed, the function you fitted is``` $y = \sin(x)$. ```Hence one might think it would be easy to fit a network with only one neuron, using the sine function as its activation. Try it: define a neural network with a single input and a single output, with no hidden layers, and use a sine activation to approximate function_3. Next we will see why using sine as an activation function is not a good idea.```

```You can use sin_activation, provided below, as the sine activation: instead of writing activation='tanh' or activation='sigmoid', use activation=sin_activation.```

In [0]:
import keras.backend as K
def sin_activation(vec):
    return K.sin(vec)

```Let's explore the sine activation. To do so we will simplify the setting. We will generate 3 samples, with ```
$x = [0, 0.5, 1]$ ``` and ``` $y=[\sin(0), \sin(0.5), \sin(1)]$. ```We will keep using the simplest neural network, which has one input and one output and no hidden layers. Hence the functions the network can represent are of the form```
<center>$y=\sin(a\cdot x+b)$.</center>

```We can use the simple setting to compute and visualize the loss surface. As you recall, the loss is given by```
<center>$\sum_i{(\sin(a\cdot x_i+b)-y_i)^2}$</center>

```Create a matrix whose (i,j) entry is the loss for ``` $a = \frac{2\pi}{600}\cdot i$ and $b= \frac{2\pi}{600}\cdot j$, ```with i and j running from 0 to 599. Visualize this matrix using plt.imshow.```
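
One possible way to build this matrix without explicit loops is NumPy broadcasting. The sketch below is only an illustration: the function name `loss_surface` is hypothetical, and it assumes the targets stay fixed at the three sine samples when another activation is plugged in.

```python
import numpy as np

# The three samples defined in the text.
x = np.array([0.0, 0.5, 1.0])
y = np.sin(x)

def loss_surface(activation, n=600):
    """Return an n x n matrix whose (i, j) entry is
    sum_k (activation(a_i * x_k + b_j) - y_k)^2
    with a_i = 2*pi/n * i and b_j = 2*pi/n * j."""
    steps = 2 * np.pi / n * np.arange(n)
    a = steps[:, None, None]        # varies along the rows (index i)
    b = steps[None, :, None]        # varies along the columns (index j)
    preds = activation(a * x + b)   # shape (n, n, 3) by broadcasting
    return np.sum((preds - y) ** 2, axis=-1)

sin_loss = loss_surface(np.sin)
tanh_loss = loss_surface(np.tanh)
```

Passing `sin_loss` or `tanh_loss` to plt.imshow then renders the surface, with a along the vertical axis and b along the horizontal axis.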

```Do the same for the beloved activation, tanh. Based on the images, what makes sine a poor activation function and tanh a good one? Answer in ``` $\underline{a\ cell\ below}$.

### Bonus:
```When training neural networks with Keras we can use callback functions: these are functions that are called automatically during training and can serve several purposes. For example, we can use a callback to save our model to a file at the end of each epoch.
Below you will find a custom callback that records the weights a and b of your sine model. Use it like this:```

``` python
wh = WeightsHistory()
model.fit(X_train, Y_train, batch_size = 100, epochs = 10, verbose=2, callbacks=[wh])
```

```After training your model with the sine activation, you can plot the path your model took on the a-b plane. Add to your loss surface visualization a plot showing the a-b values during training.```

```Note: after using plt.imshow you got an image of size``` $600\times 600$. ```You will have to scale your a-b path accordingly to see it on the same graph. Try zooming (by changing the scale of the image, enlarging the image or using any other means) to get a better view of the a-b path. Can you explain why the training did not converge?```

In [0]:
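from keras.callbacks import Callback

class WeightsHistory(Callback):
    """Records the weight a and the bias b of the single-neuron model after every batch."""

    def on_train_begin(self, logs=None):
        self.a_s = []
        self.b_s = []

    def on_batch_end(self, batch, logs=None):
        # get_weights() returns [kernel, bias]; here the kernel has shape (1, 1) and the bias shape (1,)
        weights = self.model.get_weights()
        self.a_s.append(weights[0][0][0])
        self.b_s.append(weights[1][0])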
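
The scaling mentioned in the note can be sketched as follows, assuming the loss image was built on the grid with step 2π/600 described earlier. The helper `to_pixel_coords` and the sample values are hypothetical; in practice the lists would come from the a_s and b_s attributes of the callback.

```python
import numpy as np

def to_pixel_coords(values, n=600):
    """Map parameter values to fractional indices of the n x n loss image,
    inverting value = 2*pi/n * index."""
    return np.asarray(values, dtype=float) * n / (2 * np.pi)

# Hypothetical recorded values, standing in for wh.a_s and wh.b_s.
a_s = [0.10, 0.35, 0.60, 0.80]
b_s = [1.50, 1.10, 0.70, 0.30]

rows = to_pixel_coords(a_s)  # a indexes the rows of the loss matrix
cols = to_pixel_coords(b_s)  # b indexes the columns of the loss matrix
```

Since plt.imshow draws the first matrix index along the vertical axis, the path would be overlaid with plt.plot(cols, rows) after the call to plt.imshow.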

## Stage 2 - predicting digits from MNIST
```Use your knowledge to create a good prediction for the MNIST dataset. Note that this is a classification problem, and you will have to use a different loss: for example, the categorical cross entropy (log loss). Furthermore, you might want to use softmax to generate predictions. You can use activation='softmax' in the last layer of your network.
Good luck!```
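
As a rough illustration of the two ingredients mentioned above, the following NumPy sketch computes a softmax over raw scores and the cross entropy against one-hot targets. The function names and the toy scores are assumptions for illustration; Keras provides its own implementations of both.

```python
import numpy as np

def softmax(scores):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Mean over samples of -log(probability assigned to the true class)."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))

# Toy example with 2 samples and 3 classes (MNIST would have 10 classes).
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
labels = np.array([0, 1])
y_onehot = np.eye(3)[labels]   # one-hot encoding, like pd.get_dummies

probs = softmax(scores)
loss = cross_entropy(y_onehot, probs)
```

The loss shrinks toward zero as the probability assigned to the correct class approaches 1, which is why it suits one-hot targets such as Y_train.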

In [0]:
from keras.datasets import mnist
(X_train_images, Y_train_num), (X_test_images, Y_test_num) = mnist.load_data()
plt.imshow(X_train_images[0])
plt.show()

X_train = X_train_images.reshape(-1,784)
X_test = X_test_images.reshape(-1,784)
Y_train = pd.get_dummies(Y_train_num)
Y_test = pd.get_dummies(Y_test_num)

## Stage 3 - exploring deep neural networks
```In this part we will explore the problems with training deep networks.```

```We will work with the simplest data possible and will try to approximate the identity function, with one feature.
Generate 10,000 samples using np.random.random and have ``` $y=x$. ```Create a neural network with a single hidden layer with 3 neurons in it and sigmoid activation. Approximate the identity function.```

```Add 50 hidden layers to your network, again with 3 neurons in each layer and sigmoid activation. Try approximating the identity function using this network.```

```Why isn't it working? Explain it in ``` $\underline{a\ cell\ below}$. ```In your explanation, refer to the back propagation process and to the formula for the gradients in the first layers of the network. Consider also the possible values of the derivative of the sigmoid function.```
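
As a starting point for thinking about the sigmoid derivative, the short sketch below evaluates it numerically on a grid. It only looks at the activation's derivative and ignores the weight factors that also enter the back propagation product, so treat it as a hint rather than a full explanation; the variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-10.0, 10.0, 2001)
max_derivative = sigmoid_derivative(z).max()   # largest value, reached around z = 0

# Product of 50 such factors, each taken at its largest possible value.
bound_50_layers = max_derivative ** 50
print(max_derivative, bound_50_layers)
```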

```Run the cell below. weight_grads will then hold a list of the gradients computed for every layer. For each layer take the maximum absolute value of the gradients. Plot these maximum values against the layer index, and also plot them on a logarithmic scale. Use the plots to justify your earlier answer in ``` $\underline{a\ cell\ below}$.

In [0]:
import numpy as np
from keras import backend as K

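def get_weight_grad(model, inputs, outputs):
    """Get the gradient of the loss with respect to every trainable weight, for the given inputs and outputs."""
    # Note: this relies on private attributes of older Keras versions and may not run unchanged on newer ones.
    grads = model.optimizer.get_gradients(model.total_loss, model.trainable_weights)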
    symb_inputs = (model._feed_inputs + model._feed_targets + model._feed_sample_weights)
    f = K.function(symb_inputs, grads)
    x, y, sample_weight = model._standardize_user_data(inputs, outputs)
    output_grad = f(x + y + sample_weight)
    return output_grad

weight_grads = get_weight_grad(model, X_train, Y_train)
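
The per-layer reduction can be sketched as below. The list `fake_grads` is a made-up stand-in for weight_grads, and the code assumes the list alternates between a kernel gradient and a bias gradient for each Dense layer, which is how Keras usually orders trainable weights; check this against your own model before relying on it.

```python
import numpy as np

def max_abs_per_array(grads):
    """Largest absolute gradient entry in each array of the list."""
    return np.array([np.max(np.abs(g)) for g in grads])

# Made-up stand-in for weight_grads: 4 layers, each with a (3, 3) kernel
# gradient and a (3,) bias gradient, shrinking by a factor of 10 per layer.
fake_grads = []
for k in range(4):
    fake_grads.append(np.full((3, 3), 10.0 ** -k))        # kernel gradient
    fake_grads.append(np.full((3,), 0.5 * 10.0 ** -k))    # bias gradient

max_abs = max_abs_per_array(fake_grads)
# Combine each (kernel, bias) pair into one value per layer.
per_layer = max_abs.reshape(-1, 2).max(axis=1)
print(per_layer)
```

With the real weight_grads in place of `fake_grads`, plt.plot(per_layer) gives the linear plot, and adding plt.yscale('log') gives the logarithmic one.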