# Part b and c
This notebook walks through tasks b) and c) step by step, using both code and text.

In [1]:
%load_ext autoreload
%autoreload 2
#Add own modules to path
import sys
sys.path.append('../..')
sys.path.append('../../src/')

### Loading the data and constructing the input matrix
The same data as in the previous exercises is reused. However, since a neural network is fitted in this case, the design matrix is not created in the same manner.

When a neural network trains on terrain data, it takes the x and y coordinates as input and tries to predict the output z. The input matrix should therefore have shape (n_samples, 2). In other words, either extract the data without creating a design matrix, or simply use the two coordinate columns of the original design matrix, X[:,1:3].
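
As an illustration of the (n_samples, 2) layout, the sketch below builds such an input matrix from a small gridded height array. The array `terrain` and the coordinate ranges are hypothetical stand-ins; how `create_dataset` actually constructs its output is not shown in this notebook.

```python
import numpy as np

# Hypothetical small "terrain" array standing in for the real raster data
terrain = np.arange(12, dtype=float).reshape(3, 4)   # 3 rows (y), 4 columns (x)

n_rows, n_cols = terrain.shape
x = np.linspace(0, 1, n_cols)
y = np.linspace(0, 1, n_rows)
xx, yy = np.meshgrid(x, y)            # both arrays have shape (n_rows, n_cols)

# One row per grid point: the columns are the x and y coordinates
X = np.column_stack((xx.ravel(), yy.ravel()))   # shape (n_samples, 2)
z = terrain.reshape(-1, 1)                      # shape (n_samples, 1)

print(X.shape, z.shape)
```

Each row of `X` is a single (x, y) coordinate pair and the matching row of `z` is the height at that point.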

Note that both the target data and the design matrix are scaled, following the same arguments as in part a. Scaling the target data may be even more beneficial with neural networks, as it can help avoid exploding gradients.
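
A minimal sketch of the target scaling, assuming the targets are stored as a 1-D array (the height values below are randomly generated placeholders). The scaler is fitted on the training targets only, and `inverse_transform` maps scaled values back to the original units.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical heights, generated only for illustration
z_train = rng.normal(loc=500.0, scale=100.0, size=80)
z_test = rng.normal(loc=500.0, scale=100.0, size=20)

# StandardScaler expects 2-D input, so a 1-D target is reshaped to a column
z_scaler = StandardScaler().fit(z_train.reshape(-1, 1))
z_train_scaled = z_scaler.transform(z_train.reshape(-1, 1))
z_test_scaled = z_scaler.transform(z_test.reshape(-1, 1))

# Values in scaled units can be mapped back to the original units
z_test_restored = z_scaler.inverse_transform(z_test_scaled)
```

Fitting the scaler on the training split alone keeps information about the test set out of the preprocessing step.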

In [1]:
from src.data.create_dataset import create_dataset
from sklearn.model_selection import  train_test_split
from src.visualization.visualize import plot_surf_from_X
from sklearn.preprocessing import StandardScaler

X, z = create_dataset('../../data/raw/SRTM_data_Norway_1.tif')
X_train, X_test, z_train, z_test = train_test_split(X,z, test_size=0.2)

X_scl = StandardScaler().fit(X_train)
z_scl = StandardScaler().fit(z_train)

X_train = X_scl.transform(X_train)
X_test = X_scl.transform(X_test)
z_train = z_scl.transform(z_train)
z_test = z_scl.transform(z_test)

plot_surf_from_X(X,z,'All data')
plot_surf_from_X(X_train,z_train,'Train data')
plot_surf_from_X(X_test,z_test,'Test data')


## Regression using Neural Network
In this part a neural network is fitted to the terrain data. In other words, it is used for regression rather than classification, which is perhaps the more familiar use.

#### Neural network in short
A neural network consists of layers of nodes, where each node is fully connected to all nodes in the neighbouring layers. Each connection carries a weight. A node's state is the weighted sum of all activations coming into it, plus a bias that acts as a sort of "intercept". This state is then passed through an activation function, which is what introduces the non-linearities in the model.
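
Written out, and with notation chosen here purely for illustration, the state $h_j^{(l)}$ and activation $a_j^{(l)}$ of node $j$ in layer $l$ can be expressed as

$$
h_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}, \qquad a_j^{(l)} = f\left(h_j^{(l)}\right),
$$

where $w_{ij}^{(l)}$ is the weight of the connection from node $i$ in layer $l-1$, $b_j^{(l)}$ is the bias, $f$ is the activation function, and $a_i^{(0)}$ are the input values.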

These weights and biases are fitted to a data set, meaning they are tuned so that a cost function is minimized when predicting the output. SGD is used in much the same way as in the linear regression case: the gradient of the cost function, scaled by the learning rate, is subtracted from each parameter. For a neural network the gradient must be calculated for every weight and bias, which means far more gradients than in linear regression.

Prediction is performed using the feedforward algorithm, which feeds the input values forward through the connections and activation functions. The parameters are then updated using the backpropagation algorithm, which starts by calculating the error and gradient of the last layer, uses that result to calculate the error and gradient of the previous layer, and so forth.
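
The two algorithms can be sketched in a few lines of NumPy. This is not the `NeuralNetwork` class used in this notebook but a simplified stand-in under stated assumptions: one hidden layer with sigmoid activation, a linear output, a squared loss, full-batch gradient descent instead of mini-batches, and toy data generated on the spot.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical toy data: two input features and one target, like (x, y) -> z
X = rng.uniform(-1, 1, size=(200, 2))
z = X[:, :1] ** 2 + X[:, 1:] ** 2              # shape (200, 1)

# One hidden layer with sigmoid activation and a linear output node
n_hidden = 10
W1 = rng.normal(0, 0.5, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

def feedforward(X):
    """Return the hidden activations and the network output."""
    a1 = sigmoid(X @ W1 + b1)
    out = a1 @ W2 + b2
    return a1, out

lr = 0.1
mse_history = []
for epoch in range(500):
    a1, out = feedforward(X)
    mse_history.append(np.mean((out - z) ** 2))

    # Backpropagation: start with the error of the output layer ...
    delta2 = (out - z) / len(X)          # gradient of 0.5 * MSE w.r.t. the output
    grad_W2 = a1.T @ delta2
    grad_b2 = delta2.sum(axis=0)

    # ... and propagate it back through the sigmoid of the hidden layer
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
    grad_W1 = X.T @ delta1
    grad_b1 = delta1.sum(axis=0)

    # Gradient descent step: subtract the scaled gradient from each parameter
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

print("first MSE:", mse_history[0], "last MSE:", mse_history[-1])
```

The error term of the hidden layer (`delta1`) is computed from the error term of the output layer (`delta2`), which is the "last layer first" ordering described above.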

 As with linear regression, momentum, regularization and different learning rate schemes may be employed.
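
A rough sketch of how these three ingredients can enter a single parameter update. The function names, the L2 form of the penalty and the inverse-scaling schedule are assumptions made for illustration, not necessarily what the `NeuralNetwork` class implements.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, lmb=0.0):
    """One SGD update with momentum and an L2 penalty (both optional)."""
    grad = grad + lmb * w                      # gradient of 0.5 * lmb * ||w||^2
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def inverse_scaling(lr0, t, power=0.5):
    """Learning rate that decays with the step counter t = 0, 1, 2, ..."""
    return lr0 / (t + 1) ** power

# Toy problem: minimise 0.5 * ||w||^2, whose gradient is simply w
w = np.array([1.0, -2.0])
velocity = np.zeros_like(w)
for t in range(300):
    lr = inverse_scaling(lr0=0.1, t=t)
    w, velocity = sgd_momentum_step(w, grad=w, velocity=velocity, lr=lr, lmb=0.01)

print(w)
```

In a network the same step would be applied to every weight matrix and bias vector, each with its own velocity array.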

In [None]:
from src.modelling.nn import NeuralNetwork
from src.modelling.linreg import LinReg
from src.model_evaluation.metrics import MSE_R2
import numpy as np

#Model parameters
batch_size = 64
n_epochs = 1000
lr0 = 0.01                      #Initial learning rate
loss_function = 'squared_loss'
hidden_layers = (50,)           #One hidden layer with 50 nodes
hidden_activation = 'sigmoid'
output_activation = 'linear'
lmb = 0.01                      #Regularization parameter (lambda)

#Fitting neural network
nn = NeuralNetwork(batch_size = batch_size,
                   n_epochs = n_epochs,
                   hidden_layers = hidden_layers, 
                   w_init = 'normal',
                   loss_func = loss_function,
                   val_fraction=0,
                   hidden_activation=hidden_activation,
                   output_activation=output_activation,
                   lmb = lmb,
                   lr0= lr0)

nn.fit(X_train,z_train)
tilde_nn = nn.predict(X_train)
pred_nn = nn.predict(X_test)

#Fitting ols for comparison
ols =LinReg(regularization = None).fit(X_train,z_train)
tilde_ols = ols.predict(X_train)
pred_ols = ols.predict(X_test)

train_mse_nn, train_r2_nn = MSE_R2(z_train,tilde_nn)
test_mse_nn, test_r2_nn = MSE_R2(z_test ,pred_nn)
train_mse_ols, train_r2_ols = MSE_R2(z_train,tilde_ols)
test_mse_ols, test_r2_ols = MSE_R2(z_test ,pred_ols)

#Printing the scores
print('NN train[MSE,R2]:',(train_mse_nn, train_r2_nn), '\nNN test[MSE,R2]:',(test_mse_nn, test_r2_nn))
print('OLS train[MSE,R2]:',(train_mse_ols, train_r2_ols), '\nOLS test[MSE,R2]:',(test_mse_ols, test_r2_ols))

#Concatenating X_train and X_test so that we 
#can plot the whole predicted surface
X_ = np.concatenate((X_train,X_test))

plot_surf_from_X(X_,np.concatenate((tilde_ols,pred_ols)),'OLS')
plot_surf_from_X(X_,np.concatenate((tilde_nn,pred_nn)),'Neural Network')

If the neural network result above is poor, the weight initialization (w_init = 'normal') is the likely cause.

In [None]:
from sklearn.neural_network import MLPRegressor
#Model parameters (the activation functions are reused from the cell above)
batch_size = 64
n_epochs = 1000
lr0 = 0.01
loss_function = 'squared_loss'
hidden_layers = (50,)
lmb = 0.01

#Setting new parameters
nn.set_params(batch_size = batch_size,
              n_epochs = n_epochs,
              hidden_layers = hidden_layers, 
              w_init = 'glorot',
              loss_func = loss_function,
              val_fraction=0,
              hidden_activation=hidden_activation,
              output_activation=output_activation,
              lmb = lmb,
              lr0= lr0)
#Fitting network
nn.fit(X_train,z_train)

#batch_size was passed twice in the original call, which is a SyntaxError
nn_sk = MLPRegressor(hidden_layer_sizes = hidden_layers,
                     activation='logistic',
                     solver = 'sgd',
                     batch_size=batch_size,
                     learning_rate_init=lr0,
                     max_iter = n_epochs,
                     tol = 0,
                     momentum = 0.5,
                     validation_fraction = 0)
nn_sk.fit(X_train,z_train)

#Predicting
tilde_nn = nn.predict(X_train)
pred_nn = nn.predict(X_test)
tilde_sk = nn_sk.predict(X_train)
pred_sk = nn_sk.predict(X_test)
#Calculating MSE and R2
train_mse_nn, train_r2_nn = MSE_R2(z_train,tilde_nn)
test_mse_nn, test_r2_nn = MSE_R2(z_test ,pred_nn)
train_mse_sk, train_r2_sk = MSE_R2(z_train,tilde_sk)
test_mse_sk, test_r2_sk = MSE_R2(z_test ,pred_sk)

#Printing the scores
print('NN train[MSE,R2]:',(train_mse_nn, train_r2_nn), '\nNN test[MSE,R2]:',(test_mse_nn, test_r2_nn))
print('Sklearn train[MSE,R2]:',(train_mse_sk, train_r2_sk), '\nSklearn test[MSE,R2]:',(test_mse_sk, test_r2_sk))
print('OLS train[MSE,R2]:',(train_mse_ols, train_r2_ols), '\nOLS test[MSE,R2]:',(test_mse_ols, test_r2_ols))
#Plotting the predicted surfaces
plot_surf_from_X(X_,np.concatenate((tilde_ols,pred_ols)),'OLS')
plot_surf_from_X(X_,np.concatenate((tilde_nn,pred_nn)),'Neural Network')
plot_surf_from_X(X_,np.concatenate((tilde_sk,pred_sk)),'Sklearn MLPRegressor')

The fit is expected to be much better with Glorot initialization, and probably on par with sklearn's MLPRegressor.
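
To give a feel for why the initialization can matter, the sketch below compares the size of the untrained network output under the two schemes. It assumes that `'normal'` means independent standard-normal weights and that `'glorot'` means the uniform Glorot (Xavier) scheme. Both are assumptions about the `NeuralNetwork` class, and biases are left out for brevity.

```python
import numpy as np

def init_normal(fan_in, fan_out, rng):
    """Assumed meaning of 'normal': independent standard-normal weights."""
    return rng.normal(0.0, 1.0, size=(fan_in, fan_out))

def init_glorot(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_output_rms(init, n_trials=200, n_hidden=50, seed=0):
    """Average RMS of the untrained network output over many random inits."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(500, 2))              # stand-in for standardized inputs
    rms = []
    for _ in range(n_trials):
        W1 = init(2, n_hidden, rng)
        W2 = init(n_hidden, 1, rng)
        out = sigmoid(X @ W1) @ W2
        rms.append(np.sqrt(np.mean(out ** 2)))
    return float(np.mean(rms))

rms_normal = mean_output_rms(init_normal)
rms_glorot = mean_output_rms(init_glorot)
print(f"normal: {rms_normal:.2f}, glorot: {rms_glorot:.2f}")
```

Under these assumptions the standard-normal weights give initial outputs that are on average several times larger than with Glorot. With targets scaled to unit variance, that means larger initial errors and gradients, which may make training with a fixed learning rate less stable.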

### Tuning parameters
#### Gridsearching learning rate and lambda
As with SGD regression, a grid search over the learning rate (lr) and the regularization parameter (lmb) is a common first step.
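
The idea of such a grid search can be sketched independently of the helper used in this notebook. The example below is a generic illustration and not the `grid_search_df` function: it trains a small SGD ridge regression (a hypothetical stand-in model) for every combination of `lr` and `lmb` on toy data and stores the validation MSE in a matrix.

```python
import numpy as np

def fit_ridge_sgd(X, y, lr, lmb, n_epochs=50, batch_size=32, seed=0):
    """Mini-batch SGD for linear regression with an L2 penalty."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            residual = X[batch] @ w - y[batch]
            grad = X[batch].T @ residual / len(batch) + lmb * w
            w -= lr * grad
    return w

# Hypothetical toy data with a known linear relationship plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=300)
X_train, X_val = X[:240], X[240:]
y_train, y_val = y[:240], y[240:]

learning_rates = np.logspace(-3, -1, 3)
lambdas = np.logspace(-4, 0, 5)

# Rows correspond to learning rates, columns to regularization strengths
val_mse = np.zeros((len(learning_rates), len(lambdas)))
for i, lr in enumerate(learning_rates):
    for j, lmb in enumerate(lambdas):
        w = fit_ridge_sgd(X_train, y_train, lr, lmb)
        val_mse[i, j] = np.mean((X_val @ w - y_val) ** 2)

best_i, best_j = np.unravel_index(np.argmin(val_mse), val_mse.shape)
best_lr, best_lmb = learning_rates[best_i], lambdas[best_j]
print("best lr:", best_lr, "best lmb:", best_lmb)
```

Spacing both parameters logarithmically is a common choice, since their useful values can span several orders of magnitude.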

In [None]:
from src.model_evaluation.param_analysis import grid_search_df
