### Part b): Writing your own Neural Network code

The central part of this project is to write your own feed-forward
neural network (FFNN) code implementing the backpropagation
algorithm discussed in the lecture slides from week 41 at <https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html> and week 42 at <https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html>.
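
As a reminder of what the code has to compute, the backpropagation equations can be written compactly as below. This is a sketch in one common convention, which is an assumption here and may differ from the index ordering used in the lecture slides: column vectors, $z^{l} = W^{l}a^{l-1} + b^{l}$, $a^{l} = f_l(z^{l})$ with $a^{0} = x$, and $L$ denoting the output layer.

$$
\delta^{L} = \frac{\partial C}{\partial a^{L}} \odot f_L'(z^{L}),
\qquad
\delta^{l} = \left[(W^{l+1})^{T}\delta^{l+1}\right] \odot f_l'(z^{l}),
$$

$$
\frac{\partial C}{\partial W^{l}} = \delta^{l}\,(a^{l-1})^{T},
\qquad
\frac{\partial C}{\partial b^{l}} = \delta^{l}.
$$

For the mean-squared error over $n$ samples, the contribution of sample $i$ to the first factor is $\partial C/\partial a^{L}_i = \frac{2}{n}\,(a^{L}_i - y_i)$, and the gradients above are summed over the samples in the batch.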

We will focus on a regression problem first, using the one-dimensional Runge function

$$
f(x) = \frac{1}{1+25x^2},
$$

from project 1.
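
A minimal sketch of how the target data could be generated is shown below. The grid of 100 equally spaced points on $[-1, 1]$ and the name `runge` are assumptions made for illustration; use whatever sampling you used in project 1.

```python
import numpy as np

def runge(x):
    """One-dimensional Runge function f(x) = 1 / (1 + 25 x^2)."""
    return 1.0 / (1.0 + 25.0 * x**2)

# Assumed setup: 100 equally spaced points on [-1, 1], no added noise
x = np.linspace(-1.0, 1.0, 100)
y = runge(x)
```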

Use only the mean-squared error as the cost function (no regularization terms) and
write an FFNN code for regression with a flexible number of hidden
layers and nodes, using only the sigmoid function as the activation function for
the hidden layers. Initialize the weights using a normal
distribution. How would you initialize the biases? Which
activation function would you select for the final output layer?
And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation, as it did in project 1?
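
The setup described above could be sketched as follows. This is one possible set of choices, not the required answer: the helper names `init_layers` and `forward`, the small constant bias of 0.01, the identity output activation, and the use of the raw $x$ values as a single-column design matrix are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(n_inputs, layer_sizes, rng):
    """Create (W, b) pairs; weights ~ N(0, 1), biases a small constant (assumed choice)."""
    layers = []
    n_in = n_inputs
    for n_out in layer_sizes:
        W = rng.normal(0.0, 1.0, size=(n_in, n_out))
        b = np.full(n_out, 0.01)
        layers.append((W, b))
        n_in = n_out
    return layers

def forward(X, layers):
    """Sigmoid on the hidden layers, identity on the output layer (regression)."""
    a = X
    for i, (W, b) in enumerate(layers):
        z = a @ W + b
        is_output = (i == len(layers) - 1)
        a = z if is_output else sigmoid(z)
    return a

rng = np.random.default_rng(2024)
# Design matrix with one column holding the inputs: shape (n_samples, 1)
X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
layers = init_layers(n_inputs=1, layer_sizes=[50, 1], rng=rng)
y_pred = forward(X, layers)   # untrained predictions, shape (20, 1)
```

Changing `layer_sizes` to, for example, `[50, 50, 1]` gives two hidden layers without touching the rest of the code, which is the kind of flexibility the task asks for.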

Train your network and compare the results with those from your OLS
regression code from project 1 using the one-dimensional Runge
function. When comparing your neural network code with the OLS
results from project 1, use the same data sets that gave you the best
MSE score, together with the polynomial order from project 1 that gave you the
best result. Compare these results with your neural network with one
and two hidden layers using $50$ and $100$ hidden nodes, respectively.
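
For reference, an OLS baseline of the kind used in project 1 could look like the sketch below. The polynomial degree, the 80/20 split, the seed, and all variable names are placeholders chosen for illustration; substitute the data set and polynomial order that gave your best result in project 1.

```python
import numpy as np

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
y = runge(x)

# Simple 80/20 train/test split via a random permutation of the indices
idx = rng.permutation(x.size)
train_idx, test_idx = idx[:80], idx[80:]

degree = 10  # placeholder; use the best order from project 1
X_train = np.vander(x[train_idx], degree + 1, increasing=True)
X_test = np.vander(x[test_idx], degree + 1, increasing=True)

# Least-squares solution of X beta = y
beta, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)
mse_ols = np.mean((X_test @ beta - y[test_idx]) ** 2)
```

The resulting `mse_ols` is the number to set against the test MSE of the trained network on the same split.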

Comment on your results and give a critical discussion of the results
obtained with the OLS code from project 1 and your own neural network
code. Analyze the learning rates employed to find the
optimal MSE score. Test both stochastic gradient descent
with RMSprop and ADAM and plain gradient descent with different
learning rates.
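
The update rules being compared can be sketched as below, together with a small learning-rate scan. This is a toy illustration on the quadratic cost $C(\theta) = \frac{1}{2}\lVert\theta\rVert^2$, not on the network itself; the function names, the `state` dictionaries, and the hyperparameter defaults (the commonly quoted ones) are assumptions and are unrelated to the `optimizers` module imported in the notebook cells.

```python
import numpy as np

def gd_step(theta, grad, state, lr):
    """Plain gradient descent."""
    return theta - lr * grad

def rmsprop_step(theta, grad, state, lr, rho=0.9, eps=1e-8):
    """RMSprop: scale the step by a running average of squared gradients."""
    state["v"] = rho * state["v"] + (1 - rho) * grad**2
    return theta - lr * grad / (np.sqrt(state["v"]) + eps)

def adam_step(theta, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

def cost(theta):
    return 0.5 * np.sum(theta**2)

steppers = {"gd": gd_step, "rmsprop": rmsprop_step, "adam": adam_step}
theta0 = np.array([1.0, -2.0])
results = {}
for name, step in steppers.items():
    for lr in (1e-3, 1e-2, 1e-1):
        theta = theta0.copy()
        state = {"t": 0, "m": np.zeros_like(theta), "v": np.zeros_like(theta)}
        for _ in range(200):
            grad = theta  # gradient of the toy quadratic cost
            theta = step(theta, grad, state, lr)
        results[(name, lr)] = cost(theta)
```

Tabulating or plotting `results` against the learning rate for each method gives the kind of comparison the task asks for; with the real network, the gradient would come from backpropagation on mini-batches instead of the toy expression.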

You should, as you did in project 1, scale your data.

## Imports

In [1]:
import numpy as np
import random
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# --- Our own code ---
from neural_network import NeuralNetwork
from optimizers import SGD, RMSprop, Adam
from losses import mse, mse_deriv
from activations import sigmoid, sigmoid_deriv, linear, linear_deriv, relu, relu_deriv
from prepare_data import prepare_data

## Prepare data

In [None]:
seed = 6114
n_datapoints = 100
x, y, x_train, x_test, y_train, y_test = prepare_data(n=n_datapoints)

# StandardScaler expects 2D input of shape (n_samples, n_features),
# so turn the 1D arrays into single-column matrices
x_train = np.asarray(x_train).reshape(-1, 1)
x_test = np.asarray(x_test).reshape(-1, 1)

# Fit the scaler on the training data only, then apply it to both sets
scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)
x_test = scaler_x.transform(x_test)

## Building the Neural Network

In [None]:
layer_output_sizes = [5,1]
activation_funcs = [sigmoid, linear]
activation_ders = [sigmoid_deriv, linear_deriv]
cost_func = mse
cost_func_der = mse_deriv

nn = NeuralNetwork(network_input_size=1, 
                   layer_output_sizes=layer_output_sizes, 
                   activation_funcs=activation_funcs,
                   activation_ders=activation_ders,
                   cost_fun=cost_func,
                   cost_der=cost_func_der)


## Training the network

In [None]:
lr = 1e-2
optimizer = Adam(lr=lr)
history = nn.fit(x_train, 
                 y_train,
                 epochs=50,
                 batch_size=10,
                 optimizer=optimizer,
                 X_val=x_test,
                 Y_val=y_test,
                 log_every=10)