# Assignment 8: Neural Networks

Only use the already imported library `numpy` and the Python standard library. For the evaluation you may also use scikit-learn (`sklearn`) and `matplotlib`. Make sure that the dataset `airfoil_self_noise.csv` is in the same directory as the notebook.

List your team members (name and immatriculation number) and indicate whether you are a B.Sc. Data Science or other group in the following cell:

- Kuang-Yu Li, st169971@stud.uni-stuttgart.de, 3440829 
- Ya-Jen Hsu, st169013@stud.uni-stuttgart.de, 3449448 
- Gabriella Ilena, st169935@stud.uni-stuttgart.de, 3440942

In [32]:
import numpy as np

def load_dataset(path):
    from sklearn.model_selection import train_test_split
    
    data = np.genfromtxt(path)
    X, y = data[:, :5], data[:, 5]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2020)

    return X_train, X_test, y_train, y_test
    

X_train, X_test, y_train, y_test = load_dataset('airfoil_self_noise.csv')

## Task 3: Feedforward Neural Network: Programming

In this task, you will implement a feedforward neural network for regression. The hyperparameters of the model are:
- `input_dim`: The dimension of the input vector.
- `output_dim`: The dimension of the output vector.
- `width`: The dimension of each hidden layer.
- `depth`: The number of hidden layers. For B.Sc. Data Science students, this parameter is constant with a value of 1.
- `learning_rate`: The learning rate for gradient descent.
- `epochs`: The number of epochs/iterations performed during training.

B.Sc. Data Science only have to implement for a single hidden layer, i.e. `depth = 1`. All other students have to implement the network for any `depth >= 1`.

The activation function for each hidden layer is ReLU (g(x) = max(0, x)). The output layer uses the identity as activation, since our objective is regression.

You have to implement the `FeedforwardNeuralNetworkRegressor`.

The `__init__` method initializes the network.
Initialize each weight and bias randomly with a standard Gaussian distribution using the numpy function `numpy.random.normal` with default parameters.

The `fit` method trains the network.
Use backpropagation with gradient descent similar to Task 2.
Use the whole training data set for each training epoch.
Use the mean squared error as loss function.

The `predict` method computes the forward-pass of the network.

Evaluate your classifier on the test data with the mean squared error and compare your results to your linear regression model from assignment 3. Try out different hyper-parameters and compare the results. You may want to normalize your input and output data for better performance.

In [33]:
class FeedforwardNeuralNetworkClassifier(object):
    def __init__(self, input_dim, output_dim, width, depth, learning_rate, epochs):
        # Add your code, such as initialization of weights here.
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.width = width
        self.depth = depth
        self.learning_rate = learning_rate
        self.epochs = epochs
        
        # Initialize the weights and biases for each layer randomly based on the normal distribution
        # General rule for the dimensions:
        # Weights have shape (n, m), where n is the number of output neurons and m is the number of input neurons
        # Biases have shape (n, 1)
        np.random.seed(5) # Seed to get reproducable results
        self.w = list()
        self.b = list()

        self.w.append(np.random.normal(size=(input_dim, width))) # Input layer to 1st hidden layer
        self.b.append(np.random.normal(size=(1, width)))
        
        for i in range(depth-1): # Between hidden layers
          self.w.append(np.random.normal(size=(width, width)))
          self.b.append(np.random.normal(size=(1, width)))
        
        self.w.append(np.random.normal(size=(width, output_dim))) # To the output layer
        self.b.append(np.random.normal(size=(1, output_dim)))
        
        self.params = {'w': self.w, 'b': self.b}

    def relu(self, X):
        return np.maximum(0, X)

    def relu_derivative(self, X):
        X[X<=0] = 0.
        X[X>0] = 1.
        return X

    def fit(self, X, y):
        # Implement your training here

        for i in range(self.epochs):
          # Forward propagation
          # Output = activation_function(X.W + b)
          a = dict()
          
          for j in range(self.depth):
            if j==0:
              a[j] = self.relu(np.dot(X, self.params['w'][j]) + self.params['b'][j])  # Output of 1st hidden layer.
            else:
              a[j] = self.relu(np.dot(a[j-1], self.params['w'][j]) + self.params['b'][j])  # Output of 2nd until the last hidden layer
            
          a[self.depth] = np.dot(a[self.depth-1], self.params['w'][self.depth]) + self.params['b'][self.depth]  # Output of the last layer

          # Back-propagation
          # Calculate loss
          y.shape = (-1, self.output_dim)
          loss = np.mean(np.square(a[self.depth] - y)) # Total MSE
          assert a[self.depth].shape == y.shape, "Vectors need to have the same shape"
          
          m = y.shape[0] # Number of samples
          lambd = 0.7 # Regularization term
          loss_grad = dict()
          param_grads = dict()

          # From the output layer
          loss_grad[self.depth] = a[self.depth] - y
          param_grads[self.depth] = (1.0/m) * np.dot(a[self.depth-1].T, loss_grad[self.depth]) + (lambd/m)*self.params['w'][self.depth]
          self.params['w'][self.depth] = self.params['w'][self.depth] - self.learning_rate*(param_grads[self.depth]) # Update output weights
          self.params['b'][self.depth] = self.params['b'][self.depth] - self.learning_rate*((1.0/m)*loss_grad[self.depth].sum(axis=0)) # Update output bias 

          # Now, starting from the last hidden layer, calculate the loss gradients and the gradients w.r.t parameters, and update params accordingly
          for k in range(self.depth-1, -1, -1):
            loss_grad[k] = np.multiply(self.relu_derivative(a[k]), loss_grad[k+1])
            if k != 0:
              param_grads[k] = (1.0/m)*np.dot(a[k-1].T, loss_grad[k]) + (lambd/m)*self.params['w'][k]  
            else:
              param_grads[k] = (1.0/m)*np.dot(X.T, loss_grad[k]) + (lambd/m)*self.params['w'][k]
            self.params['w'][k] = self.params['w'][k] - self.learning_rate*(param_grads[k]) # Update output weights
            self.params['b'][k] = self.params['b'][k] - self.learning_rate*(loss_grad[k].sum(axis=0)) # Update output bias
          
          print("Epoch:", i)
          print("Train loss:", loss)

    def predict(self, X):
        output = dict()

        # Compute the forward-pass
        for i in range(self.depth):
            if i==0:
              output[i] = self.relu(np.dot(X, self.params['w'][i]) + self.params['b'][i])  # Output of 1st hidden layer.
            else:
              output[i] = self.relu(np.dot(output[i-1], self.params['w'][i]) + self.params['b'][i])  # Output of 2nd until the last hidden layer    
        output[self.depth] = np.dot(output[self.depth-1], self.params['w'][self.depth]) + self.params['b'][self.depth]  # Output of the last layer
        return output[self.depth]

In [36]:
# Implement your training and evaluation here.
import sklearn.metrics as sk_metric
from sklearn.preprocessing import MinMaxScaler

# Data normalization
scaler = MinMaxScaler()
y_train.shape = (-1,1)
X_train_norm = scaler.fit_transform(X_train)
y_train_norm = scaler.fit_transform(y_train)

# Training
print("Start training...")
myFNNC = FeedforwardNeuralNetworkClassifier(input_dim=X_train_norm.shape[1], output_dim=1, width=10, depth=2, learning_rate=0.001, epochs=3)
myFNNC.fit(X_train_norm, y_train_norm)
print("End training...")
y_hat = myFNNC.predict(X_test)
mse = sk_metric.mean_squared_error(y_hat, y_test)
print("Test loss:", mse)

Start training...
Epoch: 0
Train loss: 288.1242977132619
Epoch: 1
Train loss: 13583.819796220845
Epoch: 2
Train loss: 4447514396.799527
End training...
Test loss: 344464132.38457805
