# **Preface**

Before you start working on the tutorial, please ensure you complete the following preparatory steps:

- Install Python version 3.11.7. You can download it from the official Python website: https://www.python.org/downloads/release/python-3117/.

- Obtain the `Venv Setup Tutorial.pdf` and `requirements.txt` files, available on BrightSpace -> Content -> Week 1 -> Extra materials.

- Follow the instructions detailed in the `Venv Setup Tutorial.pdf` to set up your virtual environment.

These steps are crucial for ensuring that your working environment is correctly configured for the tutorial.

# Tutorial 3

In the second lecture you were introduced to the applications of NNs for regression and classification purposes. This tutorial has been split into two sections. The first one focusses on applications of neural networks for regression purposes, the assessment of the goodness-of-fit of the model being developed, and its extrapolating capabilities. In the second part, you will work on a QSPR analysis by using neural networks.

This tutorial covers the following topics:

* 1D regression
* Assessment of the extrapolation capabilities of the regression
* QSPR method using NNs

## Regression with artificial neural networks in PyTorch

An artificial neural network (ANN) is a universal function approximator. Contrary to curve-fitting regression, ANNs have the advantage that no a priori knowledge of the process is needed. Here we show an example of a simple ANN that approximates a known target function. For the demonstration we use the programming language Python since it is the most popular machine learning language and offers a large and active developer community.

Here we use [PyTorch](https://pytorch.org/) to work with neural networks. PyTorch is an open-source machine learning framework with many predefined functions that make working with machine learning much easier. You can learn PyTorch from the [official tutorials](https://pytorch.org/tutorials/). The documentation can be found at [Docs](https://pytorch.org/docs/stable/index.html). A popular alternative to PyTorch is Keras (building on TensorFlow).

## 1.1 - Python preparation

### Import packages

In the next step, we import the previously installed packages into our script.

In [None]:
import numpy as np  # numerical calculations in python
import matplotlib.pyplot as plt  # plotting similar to matlab
import torch  # PyTorch: the general machine learning framework in Python
import torch.optim as optim  # contains optimizers for the backpropagation
import torch.nn as nn  # the artificial neural network module in PyTorch
from tqdm import tqdm  # produces progress bars for for-loops in Python
from sklearn.model_selection import train_test_split  # randomly splits a dataset

### Set the seed

To make our results reproducible, we need to set a so-called "seed". Machine learning includes stochastic processes in the weight/bias initialization and the backpropagation. The random number generation which we will use for the dataset is a stochastic process as well. By setting a seed in the program we make sure that the same random numbers are always chosen. Otherwise, we would get different results every time we run this script, which is inconvenient for teaching purposes.

In [None]:
torch.manual_seed(0)
np.random.seed(0)
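
To see what seeding does in practice, the following minimal sketch draws random numbers from identically and differently seeded generators. It uses local NumPy generators (`np.random.default_rng`, which is not used elsewhere in this tutorial) so that the global seed set above stays untouched; the names `rng_a`, `rng_b` and `rng_c` are illustrative only.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
draw_a = rng_a.uniform(-1, 1, 3)
draw_b = rng_b.uniform(-1, 1, 3)

# A generator with a different seed produces a different sequence.
rng_c = np.random.default_rng(1)
draw_c = rng_c.uniform(-1, 1, 3)

print(np.array_equal(draw_a, draw_b))  # True
print(np.array_equal(draw_a, draw_c))  # False
```

The same principle applies to `torch.manual_seed` and `np.random.seed`: the seed fixes the starting point of the pseudo-random sequence.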

## 1.2 - Prepare datasets

First and foremost, we need a dataset to work on. Here, we simply make up our dataset from a self-defined model.


Now we define the model which we want to approximate. We add the option to add some noise to the output.

In [None]:
# Defining the model we want to approximate
def model(x, noise=False):
    y = np.sin(x)+np.sin(10/3*x) # model function
    if noise:
        y += 0.3*(np.random.uniform(-1, 1, x.size)) # add noise if noise = True
    return y # return output

We now use our model to generate a dataset.

In [None]:
n = 100 # number of data points
xmin = 2 # minimum value
xmax = 6 # maximum value
x = np.linspace(xmin, xmax, n) # generate equally spaced input values
y = model(x, noise=True) # get output from our model with noise

For a better understanding of the data set we plot it with matplotlib.

In [None]:
plt.figure(figsize=(10,4)) # define figure size
plt.plot(x, model(x), color='r', linestyle='--', label='ground truth')
plt.scatter(x, y, alpha=0.5, color='b', label='complete data') # scattered plot
plt.title('Initial dataset') # add plot title
plt.xlabel('x') # add x axis label
plt.ylabel('y') # add y axis label
plt.legend() # add plot legend
plt.show()

We need to split the dataset into training, validation and test sets. We initially divide the data into a training set and a remaining dataset, with train_size=0.8 to ensure that 80% of the data is in the training set. In order to ensure that both the validation and test sets are of equal size, constituting 10% each of the overall data, we specify test_size=0.5 which is equivalent to 50% of the remaining data.
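
The arithmetic of the two-stage split can be checked on toy data. The sketch below is only an illustration: the arrays `x_demo` and `y_demo` and the value `random_state=0` are placeholders chosen here, not the values requested in the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, so the percentages translate directly into counts.
x_demo = np.arange(100)
y_demo = 2 * x_demo

# Stage 1: 80% training data, 20% remaining data.
x_tr, x_rest, y_tr, y_rest = train_test_split(
    x_demo, y_demo, train_size=0.8, random_state=0
)

# Stage 2: split the remaining 20% in half, i.e. 10% + 10% of the total.
x_va, x_te, y_va, y_te = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=0
)

print(len(x_tr), len(x_va), len(x_te))  # 80 10 10
```

Because the second call only sees the remaining 20 samples, `test_size=0.5` yields 10 samples for each of the two smaller sets.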

In [None]:
# Placeholder variables for the training, validation and test sets
x_train, x_val, x_test = None, None, None
y_train, y_val, y_test = None, None, None

###################################################################################################
# TODO: Split the data into training, validation and test sets.                                   #
# Helpful Link:                                                                                   #
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html #
# 1. Split the data into training set and remaining data, set the random_state to 2024.           #
# 2. Split the remaining data into validation and test set.                                       #
###################################################################################################

# Replace the pass statement with your code
pass

###################################################################################################
#                             END OF YOUR CODE                                                    #
###################################################################################################

We also plot the dataset division.

In [None]:
plt.figure(figsize=(10,4)) # define figure size
plt.plot(x, model(x), color='r', linestyle='--', label='ground truth') # line plot
plt.scatter(x_train, y_train, alpha=0.5, label='training data') # scattered plot
plt.scatter(x_val, y_val, alpha=0.5, label='validation data') # scattered plot
plt.scatter(x_test, y_test, alpha=0.5, label='test data') # scattered plot
plt.title('Dataset division') # add plot title
plt.xlabel('x') # add x axis label
plt.ylabel('y') # add y axis label
plt.legend() # add plot legend
plt.show()

So far our datasets are stored in NumPy arrays. However, PyTorch works with tensors instead of arrays, so we need to transform our data. [Here](https://medium.com/@quantumsteinke/whats-the-difference-between-a-matrix-and-a-tensor-4505fbdc576c) you can find a blog post discussing the differences. We need to make two small technical changes as well. NumPy arrays usually use the double/float64 datatype whereas PyTorch uses float/float32. Therefore, we change the datatype *dtype*. In addition, we have to change the shape of the tensor from (#data) to (#data,1) using *unsqueeze*.

In [None]:
xt_train = torch.tensor(x_train, dtype=torch.float).unsqueeze(-1)
xt_val = torch.tensor(x_val, dtype=torch.float).unsqueeze(-1)
xt_test = torch.tensor(x_test, dtype=torch.float).unsqueeze(-1)
yt_train = torch.tensor(y_train, dtype=torch.float).unsqueeze(-1)
yt_val = torch.tensor(y_val, dtype=torch.float).unsqueeze(-1)
yt_test = torch.tensor(y_test, dtype=torch.float).unsqueeze(-1)
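
The effect of the two changes can be inspected on a small example. This is a minimal sketch with a made-up array `arr`; it only illustrates the dtype and shape conversion described above.

```python
import numpy as np
import torch

arr = np.linspace(0.0, 1.0, 5)  # NumPy default: float64, shape (5,)
t = torch.tensor(arr, dtype=torch.float).unsqueeze(-1)

print(arr.dtype, arr.shape)  # float64 (5,)
print(t.dtype, t.shape)      # torch.float32 torch.Size([5, 1])
```

The trailing dimension of size 1 tells the network that each sample has exactly one feature.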

## 1.3 - Set up ANN

Besides the data we also need to prepare our ANN. The *nn.Module* is the standard class for an ANN in PyTorch. The abbreviation *nn* stands for neural network. We build a child class of it where we specify our desired model architecture. PyTorch uses a base model object and adds the layers and activations as other objects in a sequential manner. The first layer must get an input dimension matching the data, whereas the input size of each following layer must match the output size of the previous layer. The output layer then must match the dimension of the target values. Each ANN class needs a *forward* function which defines how a signal propagates through the network.

You might recognize that the definition of the network is based on a *class*. Classes are a fundamental concept in object-oriented programming, and also apply to Python.

In case you want to refresh or get to know about the basics of classes, check out the following link:
https://www.geeksforgeeks.org/python-classes-and-objects/

The following link gives a basic introduction to building a model class in PyTorch:
https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html#define-the-class

In [None]:
# Neural network definition
class NeuralNetwork(nn.Module):
    def __init__(self, n_input, n_hidden, n_output):
        super(NeuralNetwork, self).__init__()
        self.architecture = nn.Sequential(
            # Sequential model definition: add up layers & activation functions
            nn.Linear(in_features=n_input, out_features=n_hidden, bias=True),  # input layer
            nn.Tanh(), # activation function
            nn.Linear(in_features=n_hidden, out_features=n_hidden, bias=True),  # hidden layer
            nn.Tanh(), # activation function
            nn.Linear(in_features=n_hidden, out_features=n_output, bias=True)   # output layer
        )
    def forward(self, input): # feed forward path
        output = self.architecture(input)
        return output
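
To get a feeling for how the layer sizes chain together, the sketch below builds the same kind of stack directly with `nn.Sequential`. The hidden size of 8 and the name `demo_net` are arbitrary choices for illustration, not the settings used later in the tutorial.

```python
import torch
import torch.nn as nn

# 1 input -> 8 hidden -> 8 hidden -> 1 output, with Tanh activations in between.
demo_net = nn.Sequential(
    nn.Linear(1, 8), nn.Tanh(),
    nn.Linear(8, 8), nn.Tanh(),
    nn.Linear(8, 1),
)

batch = torch.zeros(4, 1)  # 4 samples with 1 feature each
out = demo_net(batch)

# Weights and biases: (1*8 + 8) + (8*8 + 8) + (8*1 + 1) = 97
n_params = sum(p.numel() for p in demo_net.parameters())

print(out.shape)  # torch.Size([4, 1])
print(n_params)   # 97
```

Each `out_features` value has to equal the `in_features` value of the next linear layer, otherwise the forward pass raises a shape error.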

## 1.4 - Train ANN

Now the fun begins. We put the dataset and ANN architecture together to "train" our ANN.

We use the training data to train our neural network. This process is nothing other than optimizing the weights and biases in our network. Before starting the training process, we need to define a few things:
- *optimizer*: Here we use *SGD* which stands for [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). With the optimizer, we also need to define the learning rate *lr*. It determines how fast we adapt the weights and biases during the training. If it is too high, the learning becomes unstable and the loss increases. If it is too low, we need too many epochs and may not reach a satisfactory precision.
- *loss function*: This is the objective of our training/optimization. For continuous outputs as in our example, you usually use the mean squared error (MSE). For discrete outputs, another function is needed, like the cross entropy.
- *epochs*: How often do we want to repeat the training with our dataset?

Optimizing these parameters is called hyperparameter tuning.
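
The influence of the learning rate can be illustrated without any neural network. The sketch below applies plain gradient descent to the one-dimensional function $f(w) = w^2$; the function, the helper name `gradient_descent` and the three learning rates are assumptions chosen only for this illustration.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # gradient descent update
    return w

w_slow = gradient_descent(lr=0.001)    # too low: barely moves towards 0
w_good = gradient_descent(lr=0.1)      # reasonable: ends close to the minimum at 0
w_unstable = gradient_descent(lr=1.1)  # too high: |w| grows in every step

print(abs(w_slow), abs(w_good), abs(w_unstable))
```

Each update multiplies `w` by `1 - 2*lr`, so the iteration converges only while that factor has a magnitude below 1. Real loss surfaces are more complicated, but the same trade-off shows up in the loss plots of this tutorial.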

---
**Adjust the following hyperparameters to find the optimal combination:**

In [None]:
hidden_size = 32 # number of neurons in the hidden layer
learning_rate = 0.09 # learning rate for the backpropagation

The following cell prepares everything for the training. Before running it, *reflect on how many input variables and outputs your network should handle.*
* Thus, **create a simple sketch of the network structure.**
* Then, **proceed with the following cell.**  

In [None]:
# Neural network training
net = NeuralNetwork(1, hidden_size, 1) # Create instance of neural network
optimizer = optim.SGD(net.parameters(), lr=learning_rate) # Choose optimizer and learning rate
loss_fun = nn.MSELoss() # Define loss function
epochs = 5000 # Set number of epochs
net

This is the main part of using an ANN: the actual "training". We give an input to the network and see how the output differs from our expected output. The difference is used to calculate the loss. Then we update the weights and biases such that the loss will be smaller in the next epoch.

In the training process we use two datasets: the **training** and the **validation** data. The training data are used to calculate the training loss, which is then used for the backpropagation and the network update. The validation data are used to detect overfitting. We just calculate the loss for the validation data, but do not use it for the backpropagation. If the training and the validation loss diverge, we know that the network updates do not generalize for unseen data.
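
The interplay of training and validation loss can be sketched on a deliberately simple problem. The example below fits a single linear layer to noise-free data from $y = 2x + 1$; all names prefixed with `toy_`, the data and the hyperparameters are assumptions made for this illustration and differ from the network used in the exercise.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy data: training and validation points from the same linear function.
toy_x_tr = torch.linspace(0, 1, 20).unsqueeze(-1)
toy_y_tr = 2 * toy_x_tr + 1
toy_x_va = torch.tensor([[0.25], [0.5], [0.75]])
toy_y_va = 2 * toy_x_va + 1

toy_net = nn.Linear(1, 1)
toy_opt = optim.SGD(toy_net.parameters(), lr=0.1)
toy_loss_fun = nn.MSELoss()
toy_train_loss, toy_val_loss = [], []

for _ in range(500):
    # Training: the loss is used to compute gradients and update the parameters.
    toy_opt.zero_grad()
    loss_tr = toy_loss_fun(toy_net(toy_x_tr), toy_y_tr)
    loss_tr.backward()
    toy_opt.step()
    toy_train_loss.append(loss_tr.item())

    # Validation: the loss is only recorded, no gradients are computed.
    with torch.no_grad():
        loss_va = toy_loss_fun(toy_net(toy_x_va), toy_y_va)
    toy_val_loss.append(loss_va.item())

print(toy_train_loss[0], toy_train_loss[-1])
print(toy_val_loss[0], toy_val_loss[-1])
```

In this toy case both losses fall together because the model can represent the data exactly. With a flexible network and noisy data, a validation loss that stops falling while the training loss keeps decreasing is the typical sign of overfitting.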

In [None]:
train_loss = []
val_loss = []

# train the network
for epoch in tqdm(range(epochs)):

    # Training
    # Placeholder variables for the predictions on the training set
    y_pred = None    

    ####################################################################################
    # TODO: Setup the training process of an ANN.                                      # 
    # Helpful Link:                                                                    #
    # https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#the-training-loop #
    # 1. Clear the gradients for the next training epoch.                              #         
    # 2. Compute the value of the predictions y for the training set.                  #
    # 3. Calculate the loss between the predictions and the true values.               #
    # 4. Backward pass, compute gradients.                                             #
    # 5. Apply gradients to update weights and biases.                                 #
    ####################################################################################
    
    # Replace the pass statement with your code
    pass

    ######################################################################
    #                         END OF YOUR CODE                           #
    ######################################################################

    # Save loss for later evaluation
    train_loss.append(loss.item())

    # Validation 
    # Placeholder variables for the predictions on the validation set
    y_pred_val = None

    ######################################################################
    # TODO: Setup the validation process of an ANN.                      #          
    # 1. Compute the value of the predictions y for the validation set.  #
    # 2. Calculate the loss between the predictions and the true values. #
    # 3. Save the loss for later evaluation.                             #
    ######################################################################
    
    # Replace the pass statement with your code
    pass

    ######################################################################
    #                         END OF YOUR CODE                           #
    ######################################################################

Afterwards, we plot the loss to see the training progress. The loss plot shows if adjustments need to be made to the hyperparameters.

In [None]:
# Visualize the training process
plt.figure(figsize=(10,4))
plt.plot(train_loss, label='training')
plt.plot(val_loss, label='validation', linestyle='--')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.title('Loss plot')
plt.show()

# Printing the final value of the loss during training
print('Final training loss: ')
print(train_loss[-1])

---
Now you can play around with the hyperparameters *hidden_size* and *learning_rate* to get a feeling for how they affect the prediction quality. The following cell summarizes the results:

*hidden_size* $\in[1, 1024]$

*learning_rate* $\in(0, 1]$

Please report the results which are printed below for comparison via this form: https://forms.office.com/e/s0Vjw05MHL. You can view the results in this sheet: https://forms.office.com/Pages/AnalysisPage.aspx?AnalyzerToken=PksOFfUOGgo8OKYPa4lQ65GzcefJ9gSX&id=TVJuCSlpMECM04q0LeCIexFyDp5Q34RIuGykwiiuPeZUNTBMU1BKOTRXTUZTNktOWFI1WTlJVFZOSiQlQCN0PWcu.

In [None]:
# Summary of your NN layout and results
print('Hidden layer size: ', hidden_size)
print('Learning rate: ', learning_rate)
print('Validation MSE: ', val_loss[-1]) # last element from the validation loss

## 1.5 - Evaluate ANN

Now we evaluate the trained ANN by using it to make predictions on our test set.

In [None]:
# Testing the network
with torch.no_grad():
    y_pred_test = net(torch.Tensor(x_test).unsqueeze(-1))

For a first qualitative evaluation we plot both actual test data and the prediction by the ANN on the test set. We can play around with the hyperparameters and see how they affect the prediction quality.

In [None]:
# Visualize the performance of the network
plt.figure(figsize=(10,4))
plt.plot(x, model(x), color='r', linestyle='--', label='ground truth')  # Ground truth from defined model
plt.scatter(x_test, y_test, alpha=0.5, color='b', label='test data ground truth')  # Test points actual values
plt.scatter(x_test, y_pred_test, alpha=0.5, color='r', label='test data prediction')  # Test points predicted values
plt.title('ANN evaluation')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

## 1.6 - Discussion

#### Quantitative assessment

We were a little loose here by relying on a visual assessment of the fit. To systematically improve it we need a quantitative analysis of the errors. Therefore, we evaluate the mean squared error *MSE* of the test predictions.

$MSE = \frac{1}{n}\sum_{i=1}^n{(Y_i-\hat{Y}_i)^2}$

For this we use the feed forward function of the ANN and get the prediction for the different datasets.

In [None]:
# Using the trained Neural Network for computing predictions
with torch.no_grad():
    y_pred_train = net(xt_train)
    y_pred_val = net(xt_val)
    y_pred_test = net(xt_test)

In [None]:
def mse(y_true, y_pred):
    # Placeholder variable for the mean squared error
    mse_loss = None
    
    ####################################################################
    # TODO: Implement the mean squared error function.                 #
    # Do not use any built-in PyTorch functions except torch.mean()    #
    ####################################################################
    
    # Replace the pass statement with your code
    pass
    
    ####################################################################
    #                         END OF YOUR CODE                         #
    ####################################################################

    # The .item() method converts single-element tensors to Python scalars for printing.
    return mse_loss.item()

In [None]:
print('Training MSE: ', mse(yt_train, y_pred_train))
print('Validation MSE: ', mse(yt_val, y_pred_val))
print('Test MSE: ', mse(yt_test, y_pred_test))
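
As a sanity check of the MSE formula, here is a small worked example in plain NumPy. The three values in `y_true_demo` and `y_pred_demo` are made up for illustration.

```python
import numpy as np

y_true_demo = np.array([1.0, 2.0, 3.0])
y_pred_demo = np.array([1.5, 2.0, 2.0])

errors = y_true_demo - y_pred_demo  # [-0.5, 0.0, 1.0]
squared = errors ** 2               # [0.25, 0.0, 1.0]
mse_demo = squared.mean()           # (0.25 + 0.0 + 1.0) / 3

print(mse_demo)  # about 0.4167
```

Squaring makes all errors positive and penalizes large deviations more strongly than small ones.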

Another way to evaluate the model accuracy is to use a parity graph. In a parity graph, the ground truth is compared to the prediction. Ideally, the points should lie on the diagonal.

In [None]:
v_min = np.min(y)
v_max = np.max(y)
plt.plot([v_min, v_max], [v_min, v_max], c="b",label='Ideal Fit')

################################################################################
# TODO: Plot the parity graph.                                                 #
# 1. Add a scatter plot of the ground truth vs. the predictions to the plot.   #
################################################################################

# Replace the pass statement with your code
pass

################################################################################
#                             END OF YOUR CODE                                 #
################################################################################

plt.legend()
plt.xlabel('ground truth')
plt.ylabel('prediction')
plt.title('Parity graph')
plt.show()

#### Interpolation vs. Extrapolation

One important point to keep in mind is that purely data-driven models generally cannot extrapolate reliably beyond the range of their training data. We see this in the following example. We extend the data range by 50% of its original width in both directions.

In [None]:
# Creating the extrapolation dataset +/- 50% original range
x_ext = np.linspace(np.min(x)-abs(x[-1]-x[0])/2, np.max(x)+abs(x[-1]-x[0])/2, 100)
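
The bounds of the extended range can be worked out by hand. The sketch below repeats the calculation step by step for the original range from 2 to 6; the names `x_demo`, `half_range`, `lower` and `upper` are used only here.

```python
import numpy as np

x_demo = np.linspace(2, 6, 100)               # same range as the original dataset
half_range = abs(x_demo[-1] - x_demo[0]) / 2  # (6 - 2) / 2 = 2
lower = np.min(x_demo) - half_range           # 2 - 2 = 0
upper = np.max(x_demo) + half_range           # 6 + 2 = 8

print(lower, upper)  # 0.0 8.0
```

The extrapolation dataset therefore covers the interval from 0 to 8, of which only the part from 2 to 6 was seen during training.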

Again, in order to work with PyTorch we transform our array to a tensor.

In [None]:
xt_ext = torch.tensor(x_ext, dtype=torch.float).unsqueeze(-1)

Now we use the trained model to predict values outside the training range.

In [None]:
# neural network evaluation of the extrapolated range
with torch.no_grad():
    y_pred_ext = net(xt_ext)

We plot the ANN regression. The vertical bars mark the border of the training range.

In [None]:
# Creating plot for comparison of the extrapolation capabilities

plt.figure(figsize=(10,4))
plt.text(np.mean(x)-0.1*abs((np.max(x)-np.min(x))), np.min(y)+0.1*abs((np.max(y)-np.min(y))), 'interpolation', bbox=dict(facecolor = 'w'))
plt.text(np.min(x)-0.8*abs((np.min(x_ext)-np.min(x))), np.min(y)+0.1*abs((np.max(y)-np.min(y))), 'extrapolation', bbox=dict(facecolor = 'w'))
plt.text(np.max(x)+0.3*abs((np.max(x_ext)-np.max(x))), np.min(y)+0.1*abs((np.max(y)-np.min(y))), 'extrapolation', bbox=dict(facecolor = 'w'))
plt.plot(x_ext, y_pred_ext, color='k', label='ANN prediction')
plt.plot(x_ext, model(x_ext), color='r', label='ground truth', linestyle='--')
plt.scatter(x_train, y_train, alpha=0.5, color='b', label='training data')
plt.axvline(np.min(x), linestyle='--')
plt.axvline(np.max(x), linestyle='--')
plt.title('Regression Analysis')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='upper right')
plt.show()

In the plot we clearly see that the neural network is not able to extrapolate.
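
One reason for this behaviour with *Tanh* activations is saturation: far away from the origin, every hidden neuron outputs a value close to +1 or -1, so the network output becomes nearly constant. The sketch below demonstrates this with a tiny network written in NumPy. The weights are hand-picked, hypothetical numbers, not those of the trained network.

```python
import numpy as np

# Hypothetical, hand-picked weights of a tiny 1-3-1 tanh network.
w1 = np.array([1.0, -0.5, 2.0])
b1 = np.array([0.1, 0.3, -0.2])
w2 = np.array([0.7, -1.2, 0.4])
b2 = 0.05

def tiny_net(x_in):
    hidden = np.tanh(w1 * x_in + b1)  # hidden layer with tanh activation
    return hidden @ w2 + b2           # linear output layer

for x_probe in (1.0, 3.0, 100.0, 1000.0):
    print(x_probe, tiny_net(x_probe))
```

Between 1 and 3 the output still changes noticeably, whereas at 100 and 1000 it is practically identical. A periodic ground truth such as the sine-based model used here cannot be followed by such a flat continuation.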

## 1.7 - Conclusion

In the first exercise of the lab we demonstrated how to use an ANN for regression. We introduced the key parameters to train an ANN and experienced how they affect the training process. We also discussed extrapolation as a shortcoming of ANNs. Maybe you also experienced overfitting during your hyperparameter tuning?

We hope you enjoyed it!

The contents needed to solve the first part were covered during the first half of Lecture 2.

## 2 - QSPR employing Neural Networks

In the tutorial overview, you were introduced to "Quantitative Structure-Property Relationships" (QSPR). You might recall that this method is based on defining a set of descriptors that depend on the structure of the molecule, followed by a regression model that predicts a property given a set of descriptors. Previously, we used a linear regression approach to define a model that aimed to predict the boiling point ($T_b$) [in K] of refrigerants. In this exercise, you are asked to generate a model, based on a feedforward ANN, that provides the boiling point of a refrigerant given a set of descriptors ($D$).

Let's start by importing the necessary libraries.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import torch.nn as nn
import pandas as pd
from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import r2_score
import warnings
warnings.filterwarnings("ignore")
# Standardizes features to a mean of 0 and variance of 1.
from sklearn.preprocessing import StandardScaler  
from sklearn.model_selection import train_test_split
from tqdm import tqdm


### 2.1 - Load the dataset

Let's start by loading the dataset containing the descriptors and boiling points for 192 compounds.

**Keep in mind that:** The table contains several refrigerants with different descriptors and their normal boiling point obtained from experiments.

* The descriptors are the following:

| Molecular Descriptor | Descriptor type | Descriptor definition |
| --- | --- | --- |
| R1e+ | GETAWAY descriptors | R maximal autocorrelation of lag 1/weighted by atomic Sanderson electronegativities |
| MATS1m | 2D autocorrelation indices | Moran autocorrelation - lag 1/weighted by atomic masses |
| X1sol | Connectivity indices | Solvation connectivity index chi-1 |
| Me | Constitutional descriptors | Mean atomic Sanderson electronegativity (scaled on Carbon atom) |
| ESpm02d | Edge adjacency indices | Spectral moment 02 from edge adj. matrix weighted by dipole moments |

In [None]:
datafile_path = './data/boiling_point_data.csv'

# Placeholder variables
data = None

###########################################################################################
# TODO: Loading the dataset                                                               #
# 1. Use the pandas library to load the data from the 'data/boiling_point_data.csv' file. #  
###########################################################################################

# Replace the pass statement with your code
pass

###########################################################################################
#                             END OF YOUR CODE                                            #
###########################################################################################

data

### 2.2 - Data processing

**Note:** To ensure reproducibility of our results, set the seed to 2024.

In this part, your code should include:
- $x_{train}$, $y_{train}$, $x_{val}$, $y_{val}$, $x_{test}$, and $y_{test}$: Use the *sklearn.model_selection.train_test_split* function to get 80% training, 10% validation, and 10% test datasets, and set random_state = 20.
- Normalized $x_{train}$, $x_{val}$, and $x_{test}$: You can use the StandardScaler from the *sklearn* library.
- Create the tensors needed for training and testing the model(s) you will develop.
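
The idea behind standardization can be sketched in plain NumPy. The example assumes the common practice of computing mean and standard deviation on the training data only and reusing them for the other sets; as far as the scikit-learn documentation describes it, `StandardScaler` performs the same column-wise computation. All arrays below are made-up toy values.

```python
import numpy as np

# Toy feature matrices with two columns on very different scales.
x_tr_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_te_demo = np.array([[4.0, 40.0]])

# Statistics come from the training data only.
mean_tr = x_tr_demo.mean(axis=0)
std_tr = x_tr_demo.std(axis=0)

# The same shift and scale are applied to every dataset.
x_tr_scaled = (x_tr_demo - mean_tr) / std_tr
x_te_scaled = (x_te_demo - mean_tr) / std_tr

print(x_tr_scaled.mean(axis=0), x_tr_scaled.std(axis=0))
print(x_te_scaled)
```

After scaling, both training columns have mean 0 and standard deviation 1, so no descriptor dominates the others simply because of its units.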

In [None]:
# Random seed for reproducibility
torch.manual_seed(2024)
np.random.seed(2024)

# Extracting data
x_all = data[['descriptor: R1e+','descriptor: MATS1m','descriptor: X1sol','descriptor: Me','descriptor: ESpm02d']].values
y_all = data[['experiment: Tboil /K']].values

#initialize the normalization
st = StandardScaler()

#########################################################################################################################
# TODO: Data splitting and normalization                                                                                #
# Helpful Link:                                                                                                         #
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html                       #
# 1. Split the data into training set (80%), validation set (10%) and test set (10%), set the random state to 20.       #
# 2. Create methods for normalizing the inputs (x) and outputs (y) of a model using the                                 #
# StandardScaler from the scikit-learn library (initialized above as st).                                               #
# 3. Transform the datasets using norm_entries for x_train, x_val and x_test.                                           #
# 4. Create the tensors for the training and test datasets using the torch library.                                     #
# Do not squeeze or unsqueeze the tensors.                                                                              #
#########################################################################################################################

# Replace the pass statement with your code
pass

###########################################################################################
#                                  END OF YOUR CODE                                       #
###########################################################################################
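
The instruction not to squeeze or unsqueeze the tensors follows from how the data were extracted: selecting columns with a list (double brackets) already yields a two-dimensional array. The sketch below shows the difference on a toy DataFrame whose values are made-up placeholders, not taken from the real dataset.

```python
import pandas as pd

# Made-up placeholder values, not taken from the real dataset.
toy = pd.DataFrame({
    'descriptor: Me': [1.0, 1.1, 1.2],
    'experiment: Tboil /K': [250.0, 260.0, 270.0],
})

y_2d = toy[['experiment: Tboil /K']].values  # list of columns -> 2-D array
y_1d = toy['experiment: Tboil /K'].values    # single column   -> 1-D array

print(y_2d.shape)  # (3, 1)
print(y_1d.shape)  # (3,)
```

Since `y_all` above is created with double brackets, it already has the (#data, 1) shape that matches the network output.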

### 2.3 - Model building and training

Now with the tensors being properly defined, we can build and train the FFNN for predicting the boiling point of the molecule. In a similar fashion as in the previous exercises, you can tune the number of nodes per layer, the number of hidden layers, and the learning rate. In order to assess the accuracy of your model, calculate the MSE for the test set.

In this task, your script should return:

* `n_input`, `n_output`, `n_nodes`, `n_hidden_layers`, and `epochs` used for your FF-NN.
* `loss_train` (Array): Containing the MSE of the training set per epoch.
* `loss_test` (Array): Containing the MSE of the testing set per epoch.

In [None]:
#################################################################################
# TODO: Define the architecture of the neural network.                          #
# 1. Define the architecture of the neural network using the nn.Sequential()    #
# method.                                                                       #
# 2. Use the following layers:                                                  #
#            One linear input layer + One Activation function (ReLU)            #
#            4 (linear hidden layer + Activation function (ReLU))               #
#            One linear output layer                                            #
# Feel free to add more hidden layers with activation functions.                #
# 3. Define the forward method to perform the feed forward pass.                #
#################################################################################

# Replace the pass statement with your code
pass

#################################################################################
#                             END OF YOUR CODE                                  #
#################################################################################

In [None]:
n_input = 5 # number of entries
n_output = 1 # number of outputs
n_nodes = 50 # number of neurons in the hidden layer
learning_rate = 5e-4 # learning rate for the backpropagation

# Neural network Definition
net = NeuralNetwork(n_input, n_nodes, n_output) # create instance of neural network
optimizer = optim.Adam(net.parameters(), lr=learning_rate) # sets optimizer and learning rate
loss_fun = nn.MSELoss() # define loss function
epochs = 600 # set number of epochs
net

In [None]:
# Training
loss_train = np.empty(epochs)
loss_test = np.empty(epochs)

#################################################################################
# TODO: Define the training process of the ANN.                                 #
# Use the normalised datasets for training and testing.                         #
# Define the training loop the same way as in section 1.                        #
# For every epoch:                                                              #
# 1. Clear the gradients for the next training epoch.                           #
# 2. Compute the value of the predictions y for the training set.               #
# 3. Calculate the loss between the predictions and the true values.            #
# 4. Backward pass, compute gradients.                                          #
# 5. Apply gradients to update weights and biases.                              #
# 6. Save the training loss for later evaluation.                               #
# 7. Compute the value of the predictions y for the test set.                   #
# 8. Calculate the loss between the predictions and the true values.            #
# 9. Save the test loss for later evaluation.                                   #
#################################################################################

# Replace the pass statement with your code
pass

#################################################################################
#                         END OF YOUR CODE                                      #
#################################################################################

print('Final training loss: ')
print(loss_train[-1])
print('Results from your QSPR model using an ANN with: ')
print('Nodes per hidden layer: {:d}'.format(n_nodes))
print('Learning rate: {:.3e}'.format(learning_rate))
print('Training MSE: {:.3e}'.format(loss_train[-1])) # last element from the training loss
print('Test MSE: {:.3e}'.format(loss_test[-1])) # last element from the test loss


### 2.4 - Visualizing the results

With the arrays containing the loss for both the training and testing phases, we can evaluate how well the network fits the (normalized) boiling point to the (normalized) descriptors. Your final visualization should resemble the trends shown in the provided 'Loss results' graph.

In this step,

* Plot the RMSE values for the training set and test set predictions against the epochs.
* Use a logarithmic scale for the loss axis.
* Label your axes properly and provide a legend to distinguish between the training and testing RMSE.
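
A minimal sketch of such a plot, assuming matplotlib is available in your environment. The two curves below are made-up exponential decays that only stand in for the loss arrays filled during training; the `demo_` names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical loss curves standing in for the arrays filled during training.
demo_epoch_axis = np.arange(1, 501)
demo_train_curve = np.exp(-demo_epoch_axis / 80) + 1e-3
demo_test_curve = np.exp(-demo_epoch_axis / 80) + 3e-3

fig, ax = plt.subplots()
ax.plot(demo_epoch_axis, demo_train_curve, label='Training')
ax.plot(demo_epoch_axis, demo_test_curve, label='Test')
ax.set_yscale('log')          # logarithmic scale for the loss axis
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()
```

In a notebook the figure renders automatically at the end of the cell; in a plain script you would add `plt.show()`. If your arrays store the MSE, taking `np.sqrt` of them gives the RMSE.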

In [None]:
#################################################################################
# TODO: Create the loss plot for the training of your QSPR model                #
# Plot the loss for the training and test set.                                  #
# Use a logarithmic scale for the y-axis.                                       #
#################################################################################

# Replace the pass statement with your code
pass

#################################################################################
#                         END OF YOUR CODE                                      #
#################################################################################

### 2.5 - Prediction on the test dataset
Now that we have trained our model, it's time to proceed with predictions on the test dataset. We will use the trained network for this purpose. Additionally, we will calculate the coefficient of determination $R^2$ and create a parity plot to illustrate the model's final performance.

In this task,

- $y_{predict}$: Use the trained network you obtained before to get the final predictions on the test dataset.
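
As a rough sketch of this step, the snippet below runs a forward pass without gradient tracking and converts the result to a NumPy array. The network and test tensor are hypothetical placeholders (`demo_net`, `X_test_demo`); in your notebook, substitute the trained network and the normalised test descriptors.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical network and normalised test descriptors, for illustration only.
demo_net = nn.Sequential(nn.Linear(3, 8), nn.Tanh(), nn.Linear(8, 1))
X_test_demo = torch.rand(20, 3)

demo_net.eval()                # put layers such as dropout into inference mode
with torch.no_grad():          # gradients are not needed for prediction
    y_predict_demo = demo_net(X_test_demo)

# Convert the prediction tensor to a flat NumPy array for scikit-learn and matplotlib.
y_predict_np = y_predict_demo.numpy().ravel()
print(y_predict_np.shape)
```

If the targets were normalised before training, the predictions are on the normalised scale as well, so invert that transformation before comparing them with actual boiling points.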


In [2]:

##################################################################################
# TODO: Model evaluation                                                         #
# Helpful Tutorial: https://data36.com/polynomial-regression-python-scikit-learn/#
# 1. Utilize the trained net to make predictions on the test dataset             #
# 2. Transform the test dataset into numpy format                                #
##################################################################################

# Replace the pass statement with your code
pass

##################################################################################
#                              END OF YOUR CODE                                  #
##################################################################################

A parity plot is a type of scatter plot commonly used in regression analysis to visually assess the accuracy of a predictive model. It plots the actual values of the target variable against the predicted values obtained from the model. It provides a quick visual assessment of model performance across the range of observed values.

Ideally, if a model predicts perfectly, all points in a parity plot would lie on a diagonal line indicating perfect agreement between actual and predicted values. This line is often referred to as the "line of perfect fit" or "ideal fit" line.

In this task,

- $R^2$: Utilize *r2_score* to calculate the coefficient of determination.
- $RMSE$: Utilize *root_mean_squared_error* to calculate the RMSE.
- Parity plot: Include a scatter plot of actual values against predicted values, and an ideal fit line.
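
The three items above could look roughly like this on made-up data. The arrays and `demo_` names are hypothetical stand-ins for the actual and predicted boiling points. The RMSE is computed here as the square root of `mean_squared_error`, which gives the same value as `root_mean_squared_error` and also works on scikit-learn versions older than 1.4.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical actual and predicted boiling points, for illustration only.
y_actual_demo = rng.uniform(300.0, 500.0, size=50)
y_pred_demo = y_actual_demo + rng.normal(0.0, 5.0, size=50)

r2_demo = r2_score(y_actual_demo, y_pred_demo)
rmse_demo = np.sqrt(mean_squared_error(y_actual_demo, y_pred_demo))
print('R2: {:.3f}'.format(r2_demo))
print('RMSE: {:.3f}'.format(rmse_demo))

fig, ax = plt.subplots()
ax.scatter(y_actual_demo, y_pred_demo, label='Predictions')
# The ideal fit is the diagonal y = x across the range of actual values.
lims = [y_actual_demo.min(), y_actual_demo.max()]
ax.plot(lims, lims, 'k--', label='Ideal fit')
ax.set_xlabel('Actual boiling point')
ax.set_ylabel('Predicted boiling point')
ax.legend()
```

Points close to the dashed diagonal indicate accurate predictions, while a systematic drift above or below it points to bias in part of the value range.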

In [3]:
#################################################################################
# TODO: Basic Plotting and evaluation                                           #
# Helpful Tutorial: https://matplotlib.org/stable/users/explain/quick_start.html#
# 1. Calculate and print the R2 score and RMSE for the test data                #
# 2. Make the parity plot using actual boiling points for the x-axis and        #
#    predicted boiling points for the y-axis.                                   #
# 3. Add a line through the plot to represent the ideal fit.                    #
#################################################################################


# Replace the pass statement with your code
pass

#################################################################################
#                         END OF YOUR CODE                                      #
#################################################################################