## Fermentation PAT course 2022
### Application of PAT to fermentation processes

Technical University of Denmark (DTU)

Department of Chemical and Biochemical Engineering, PROSYS

__Author:__ Pau Cabaneros Lopez

This exercise uses spectral data collected in _real time_ during a lignocellulosic ethanol fermentation. The spectra were acquired using attenuated total reflectance mid-infrared (ATR-MIR) spectroscopy. The reference data were measured using high-performance liquid chromatography (HPLC).

The datasets used in this project consist of:

- Training dataset: containing both the spectra and the corresponding HPLC measurements.

- Fermentation dataset: containing a time series of spectra collected in real time and another time series of off-line HPLC measurements.

For detailed information about the training and fermentation datasets, see  _Transforming data to information: A parallel hybrid model for real-time state estimation in lignocellulosic ethanol fermentation_.



## Step 0. Set up the environment

When we use Google Colab, we need to set up the environment by installing the required packages and importing the necessary libraries.

### 0.1. Installing the dependencies

In [1]:
!pip install dtuprosys

Collecting dtuprosys
  Downloading dtuprosys-0.1.18-py3-none-any.whl (6.5 MB)
Installing collected packages: dtuprosys
Successfully installed dtuprosys-0.1.18


### 0.2. Importing the libraries

In [2]:
from dtuprosys.chemometrics.modelling.cross_validation import cross_validation
from dtuprosys.chemometrics.datasets import load_fermentation_data, load_train_data
from dtuprosys.chemometrics.plotting import plot_spectra, plot_predictions, plot_fermentation
from dtuprosys.chemometrics.preprocessing import RangeCut, Derivative, DriftCorrection
from mbpls.mbpls import MBPLS

## Step 1. Explore the training data

### Exercise 1.1. Load the training data

Training data is directly loaded using the ```load_train_data()``` function. In the cell below, create two variables ```train_spectra``` and ```train_hplc``` to store the training data.

e.g. ```train_spectra, train_hplc = load_train_data()```

In [3]:
# Write your code below:


### Exercise 1.2. Inspect the training data

To inspect the training data, we will first visualize each table. Below I show an example of how to visualize the training spectra:

In [4]:
# Run this cell to visualize the variable train_spectra
train_spectra

NameError: name 'train_spectra' is not defined

In [None]:
# Run this cell to visualize the variable train_hplc.
# Write your code below:


Then, we would like to identify the dimensions of each table. We can do so with the ```.shape``` attribute. Below I show an example of how to inspect the dimensions of the training spectra:

In [None]:
# Run this cell to get the dimensions of the train_spectra table and assign them to the variable dimensions_train_spectra.
dimensions_train_spectra = train_spectra.shape

# Find the dimensions of the train_hplc and assign it to the variable dimensions_train_hplc.
# Write your code below:
dimensions_train_hplc = 

# Do not modify the code below!
print("The train_spectra table has {} rows and {} columns.".format(dimensions_train_spectra[0], dimensions_train_spectra[1]))
print("The train_hplc table has {} rows and {} columns.".format(dimensions_train_hplc[0], dimensions_train_hplc[1]))
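
As a minimal sketch of what ```.shape``` returns, the example below uses a NumPy array as a hypothetical stand-in for a spectra table (the array and its size are assumptions, not the course data). Tabular objects such as pandas DataFrames expose the same attribute.

```python
import numpy as np

# Hypothetical stand-in for a spectra table: 4 samples (rows) x 6 wavenumbers (columns).
toy_spectra = np.zeros((4, 6))

# .shape is a tuple: (number of rows, number of columns).
n_rows, n_columns = toy_spectra.shape
print("The toy table has {} rows and {} columns.".format(n_rows, n_columns))
```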

### Exercise 1.3 Plot the training data

Plotting the spectra is a good way to get a first impression of the data. Fill in the code below to show the plots of the ```train_spectra``` using the function ```plot_spectra()```. Here is an example of how to use this function:

```plot_spectra(train_spectra, title, x_label, y_label, reference=train_hplc)```

In [None]:
# Run this cell to visualize train_spectra.
# Write your code below:
title = ""
x_label = ""
y_label = ""




## Step 2. Preprocess the training data

Preprocessing is fundamental to remove noise and improve the quality of the data. In this step, we will preprocess the training data to prepare it for the model training.

### Exercise 2.1. Range Cut

Cut the spectra to the range of interest. You are free to choose the range, but the region between 900 and 1700 cm⁻¹ is recommended.


In [None]:
# Write your code below:
# Range cut
min_wavenumber =
max_wavenumber =
range_cut = RangeCut(min_wavenumber, max_wavenumber)

# Apply the range cut to train_spectra using the .apply_to() method and assign it to the variable range_cut_spectra.
range_cut_spectra = range_cut.apply_to(train_spectra)

# Use the plot_spectra() function to visualize the range cut.
title = ""
x_label = ""
y_label = ""
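
To illustrate what a range cut does to the data, here is a minimal NumPy sketch that keeps only the columns whose wavenumber lies inside the chosen limits. It is not the implementation of ```RangeCut```; the wavenumber axis and the random spectra are hypothetical.

```python
import numpy as np

# Hypothetical wavenumber axis (cm^-1) and random stand-in spectra; not the course data.
wavenumbers = np.linspace(400.0, 3000.0, 261)
rng = np.random.default_rng(0)
spectra = rng.random((5, wavenumbers.size))

min_wavenumber, max_wavenumber = 900.0, 1700.0

# Boolean mask selecting the columns inside the range of interest (limits included).
in_range = (wavenumbers >= min_wavenumber) & (wavenumbers <= max_wavenumber)
cut_wavenumbers = wavenumbers[in_range]
cut_spectra = spectra[:, in_range]

print("Before:", spectra.shape, "After:", cut_spectra.shape)
```

The number of samples (rows) is unchanged; only the number of spectral variables (columns) shrinks.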


### Exercise 2.2. Drift Correction


Drift correction is a simple way to remove scatter from the spectra. Below, drift correct the ```range_cut_spectra``` and assign it to the variable ```drift_corrected_spectra```.

In [None]:
# Write your code below:
# Drift correction
drift_correction = 

# Apply the drift correction to range_cut_spectra using the .apply_to() method and assign it to the variable drift_corrected_spectra.
drift_corrected_spectra = 

# Use the plot_spectra() function to visualize the drift corrected spectra.
title = ""
x_label = ""
y_label = ""
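
One common form of drift correction subtracts, from each spectrum, the straight line that joins its first and last points. The sketch below shows that idea in NumPy on a synthetic band with a sloping baseline. Whether ```DriftCorrection``` works exactly this way is an assumption; the function name ```two_point_drift_correction``` is hypothetical.

```python
import numpy as np

def two_point_drift_correction(spectra):
    """Subtract, from each row, the straight line joining its first and last points."""
    spectra = np.asarray(spectra, dtype=float)
    n_points = spectra.shape[1]
    position = np.linspace(0.0, 1.0, n_points)   # 0 at the first point, 1 at the last
    first = spectra[:, [0]]
    last = spectra[:, [-1]]
    baseline = first + (last - first) * position  # broadcasts to (n_samples, n_points)
    return spectra - baseline

# Toy example: one Gaussian band plus an offset and a linear drift.
x = np.linspace(900.0, 1700.0, 201)
band = np.exp(-0.5 * ((x - 1300.0) / 30.0) ** 2)
drift = 0.2 + 0.0005 * (x - 900.0)
corrected = two_point_drift_correction((band + drift)[np.newaxis, :])

print("Largest deviation from the drift-free band:", np.abs(corrected[0] - band).max())
```

Because the toy band is essentially zero at both ends of the range, the corrected spectrum matches the drift-free band. With real spectra the end points also carry signal, so the result depends on the chosen range.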


### Exercise 2.3. Derivative

Differentiating spectra is a very common way to remove additive baseline effects: the first derivative removes a constant offset, and the second derivative also removes a linear slope. Below, differentiate the ```drift_corrected_spectra``` and assign the result to the variable ```derivative_spectra```. Please calculate the second derivative of the spectra.

In [None]:
# Write your code below:
# Derivative
derivative_order =
derivative = 

# Apply the derivative to drift_corrected_spectra using the .apply_to() method and assign it to the variable derivative_spectra.
derivative_spectra =

# Use the plot_spectra() function to visualize the derivative spectra.
title = ""
x_label = ""
y_label = ""
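
The following sketch shows why a second derivative is insensitive to a constant offset and a linear slope. It uses plain finite differences (```np.gradient```) on a synthetic band. This is an illustration only: it is not the implementation of ```Derivative```, and in practice smoothed derivatives such as Savitzky-Golay filters are often preferred because finite differences amplify noise.

```python
import numpy as np

# Synthetic band on a uniform wavenumber axis (hypothetical data).
x = np.linspace(900.0, 1700.0, 401)
band = np.exp(-0.5 * ((x - 1300.0) / 30.0) ** 2)

# The same band with an added offset and a linear slope.
with_baseline = band + 0.3 + 0.001 * x

def second_derivative(y, x):
    """Second derivative by applying a finite-difference gradient twice."""
    return np.gradient(np.gradient(y, x), x)

d2_clean = second_derivative(band, x)
d2_baseline = second_derivative(with_baseline, x)

# The offset and slope vanish after two differentiations.
print("Same second derivative:", np.allclose(d2_clean, d2_baseline))
```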


In [None]:
# In this cell, we will define the variable preprocessed_spectra, as the output of the final preprocessing step.
preprocessed_spectra = derivative_spectra

## Step 3. Train the model

### Exercise 3.1. Cross-validation

The next step is to train the PLS model. To do so, we first need to choose the number of components (latent variables) to use in the model. This can be done with a cross-validation procedure using the function ```cross_validation()```. This function outputs a plot showing the RMSE and RMSECV (root mean squared error and root mean squared error of cross-validation) for different numbers of components.

Use the cross validation function to find the optimal number of components. You can run a cross validation using the following code:


```cross_validation(derivative_spectra, train_hplc.glucose)```


In [None]:
# Cross validation
# Write your code below:
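
To make the idea behind RMSECV concrete, here is a self-contained NumPy sketch: a single-response PLS model (NIPALS) is fitted on part of the data, the held-out part is predicted, and the error is pooled over all folds for each number of components. The internals of ```cross_validation()``` are not shown in this notebook, so this is a sketch under assumptions; the helper names (```pls1_fit```, ```pls1_predict```, ```rmsecv```), the 5-fold split and the synthetic data are all hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # residuals, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)             # weight vector
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coefficients = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coefficients

def pls1_predict(model, X):
    x_mean, y_mean, coefficients = model
    return (X - x_mean) @ coefficients + y_mean

def rmsecv(X, y, n_components, n_folds=5):
    """Root mean squared error pooled over the held-out samples of a k-fold split."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    squared_errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = pls1_fit(X[train_mask], y[train_mask], n_components)
        squared_errors.append((pls1_predict(model, X[test_idx]) - y[test_idx]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(squared_errors))))

# Synthetic data driven by three latent factors (an assumption, not the course data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 3)) * np.array([5.0, 2.0, 1.0])
X = scores @ rng.normal(size=(3, 40)) + 0.01 * rng.normal(size=(60, 40))
y = scores @ np.array([0.2, 1.0, 3.0]) + 0.01 * rng.normal(size=60)

errors = {k: rmsecv(X, y, k) for k in range(1, 7)}
for k, err in errors.items():
    print("{} components: RMSECV = {:.3f}".format(k, err))
```

A typical choice is the smallest number of components after which the RMSECV stops decreasing noticeably, since extra components mostly fit noise.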


What is the number of latent variables that you would select for this model? Assign the number to the variable ```nr_latent_variables```.

In [None]:
# Write your code below:
nr_latent_variables =

# Do not modify the code below!
print("The number of latent variables is {}.".format(nr_latent_variables))

### Exercise 3.2. Train the PLS model

Now we will train and evaluate the PLS model. To do so, we will use the ```MBPLS``` class, which allows us to train the PLS model and to predict the glucose concentration. Then, we will use the ```plot_predictions()``` function to visualize the results.

In [None]:
# We will train a MBPLS model with the number of latent variables you have chosen. Then we will use the fit_predict() method to 
# - train the model
# - predict the glucose concentration from the preprocessed spectra

model = MBPLS(n_components=nr_latent_variables, method='NIPALS')
prediction = model.fit_predict(preprocessed_spectra, train_hplc)

# Then, we will evaluate the performance of the model using the plot_predictions() function.
# Write your code below:
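
Besides a predicted-versus-measured plot, two numbers are commonly used to summarize model performance: the root mean squared error (RMSE) and the coefficient of determination (R²). The sketch below computes both with NumPy on made-up values. Whether ```plot_predictions()``` reports these metrics is not stated here; the helper names and the data are hypothetical.

```python
import numpy as np

def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up reference and predicted concentrations (g/L), for illustration only.
measured = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 39.0])

print("RMSE = {:.3f}".format(rmse(measured, predicted)))
print("R2   = {:.3f}".format(r_squared(measured, predicted)))
```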


## Step 4. Explore the fermentation data

### Exercise 4.1. Load the fermentation data

Fermentation data is directly loaded using the ```load_fermentation_data()``` function. In the cell below, create two variables ```fermentation_spectra``` and ```fermentation_hplc``` to store the fermentation data.

e.g. ```fermentation_spectra, fermentation_hplc = load_fermentation_data()```

In [None]:
# Write your code below:

Now, let's explore the fermentation data:

In [None]:
# Run this cell to get the dimensions of the fermentation_spectra table and assign them to the variable dimensions_fermentation_spectra.
dimensions_fermentation_spectra = fermentation_spectra.shape

# Find the dimensions of the fermentation_hplc and assign it to the variable dimensions_fermentation_hplc.
# Write your code below:
dimensions_fermentation_hplc = 

# Do not modify the code below!
print("The fermentation_spectra table has {} rows and {} columns.".format(dimensions_fermentation_spectra[0], dimensions_fermentation_spectra[1]))
print("The fermentation_hplc table has {} rows and {} columns.".format(dimensions_fermentation_hplc[0], dimensions_fermentation_hplc[1]))

## Step 5. Preprocess the fermentation data

### Exercise 5.1. Preprocess the data using the sequence defined in Step 2

Remember that the preprocessing steps are stored in the variables:

- ```range_cut```
- ```drift_correction```
- ```derivative```

and they can be applied using the ```.apply_to(spectra)``` method.



In [None]:
# Write your code below:
# Range cut
fermentation_range_cut = 

# Drift correction
fermentation_drift_corrected =

# Derivative
fermentation_derivative = 

# In this cell, we will define the variable preprocessed_fermentation_spectra, as the output of the final preprocessing step
preprocessed_fermentation_spectra =

## Step 6. Predict the glucose concentration from the fermentation data

### Exercise 6.1. Predict the glucose concentration with the PLS model trained in Step 3

Use the ```model.predict()``` function to predict the glucose concentration from the fermentation spectra.

In [None]:
# Write your code below:
prediction = 

### Exercise 6.2. Evaluate the predictions using the PLS models

Use the ```plot_fermentation()``` function to visualize the results.

In [None]:
# Write your code below:


### END OF EXERCISE