# Exercise worksheet no 4

# Hackathon I

### Learning an ozone parameterization for an Earth system model

*Machine learning in climate and environmental sciences, winter semester 2023, Jun.-Prof. Peer Nowack, peer.nowack@kit.edu*

*Chair for AI in Climate and Environmental Sciences, https://ki-klima.iti.kit.edu*

**Learning objectives:** The goal of this hackathon is to provide you with a realistic climate science example for training and assessing a range of machine learning (ML) algorithms. You will tackle an actual research question: *what is the best ML set-up to parameterize ozone variability in a global Earth System Model (ESM)?* For this, you receive training data from simulations conducted with the United Kingdom Earth System Model (UKESM), which participated in the [Coupled Model Intercomparison Project phase 6](https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6).

#### Background

From the UKESM data, you will learn to predict ozone - [an important atmospheric trace gas](https://www.ametsoc.org/index.cfm/ams/about-ams/ams-statements/statements-of-the-ams-in-force/atmospheric-ozone1/) - as a function of the state of the atmosphere one day earlier. These functions could later be used to model ozone within UKESM at much reduced computational cost, because simulating ozone is one of the [most expensive ESM components](https://gmd.copernicus.org/articles/11/3089/2018/).

Specifically, you will compare at least **three different ML regression algorithms of your choice** to predict daily mean ozone (O$_3$) concentrations in six distinct regions of the atmosphere (i.e. in six grid boxes of UKESM, which is a numerical model). In other words, you will learn functions $f_{ML}$

$$
\text{O}_3(t)_{i,j,k} = f_{ML}(\mathbf{X};t-1)
$$

where $\mathbf{X}$ contains a large number of meteorological variables from the previous day ($t-1$) that might represent processes driving ozone variability. The indices $i,j,k$ indicate the position (latitude, longitude, height) of the six grid boxes for which we aim to predict ozone. The predictor variables included in $\mathbf{X}$ are: 

- atmospheric temperature
- [chlorine content](https://www.revistascca.unam.mx/atm/index.php/atm/article/view/38656) (important for chemical ozone loss in the stratosphere)
- [zonal wind](https://glossary.ametsoc.org/wiki/Zonal_wind) (characterizes wind speed and direction along the east-west direction; key for dynamics, transport of ozone)
- [meridional wind](https://glossary.ametsoc.org/wiki/Meridional_wind) (characterizes wind speed and direction along the north-south direction)
- [Eliassen-Palm flux](https://www.gfdl.noaa.gov/bibliography/related_files/dga8301.pdf) (characterizes wave propagation and angular momentum transfer onto the stratosphere, which drives the stratospheric [Brewer-Dobson circulation](https://www.fz-juelich.de/en/iek/iek-7/research/atmospheric-coupling-processes/brewer-dobson-circulation))
- specific humidity (the atmospheric water vapour distribution carries imprints of current and past atmospheric circulation variability and water vapour is involved in atmospheric chemistry driving changes in ozone)
- shortwave heating rates (characterizes heating of the grid box due to absorption of sunlight by the atmosphere, coupled to ozone photochemistry)
- longwave heating rates (characterizes heating of the grid box due to the absorption of terrestrial outgoing longwave radiation by the atmosphere)
- pressure on theta/model levels (characterizes the atmospheric dynamical state)

These predictors are provided to you across the entire **atmospheric column** around the ozone grid point in question (an example of such a model column is illustrated below, image source: UK Met Office). Simply speaking, we use column information because we can: ESMs are numerical models which require a high level of parallelization of computations. Due to how parallelization is implemented in UKESM, we will always have the vertical column information available at runtime to predict ozone at a given grid location, but horizontally distant information might not be accessible before other parallel computations have been completed. For more details, do get in touch :)

<div>
<img src="./images/metofficegovuk.jpeg" width="300"/>
</div>

#### Hackathon set-up and rules

Each of you has access to the same training data for the predictand time series (ozone concentrations at six grid locations of the atmosphere) and predictors (the meteorological variables across matching atmospheric columns). **This creates six separate regression problems, one for each ozone target grid cell.**

**Your task** is to use this data to build the **best possible ML model** in terms of generalizable predictive skill. The only *restrictions* are that we ask you to 

- submit one final best model type. For example, you are allowed to submit six different feedforward neural networks or six different random forest regressors for your "best model", but not a mix of these two options.
- use the same set of predictors for all grid cells, i.e. don't use temperature only for one grid cell and temperature and specific humidity for another.
- not take ozone from previous timesteps as an input feature, as this can lead to runaway instabilities in actual ESM simulations! Submissions which include ozone itself as a predictor will not be accepted.

Apart from that, you have complete freedom to use the training data provided as you wish, i.e. you can choose 
- the regression model
- the cross-validation strategy
- the variables you include in your fit
- the variable scaling approach
- if you want, potential dimension reduction pre-processing methods such as [Principal Component Analysis](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) for the predictors, which might be helpful to speed up your calculations for more complex algorithms.
- to experiment with e.g. logarithmic or exponential transformations of the predictors
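
Several of the choices listed above (scaling, dimension reduction, regression model, cross-validation) can be combined in a single `sklearn` `Pipeline`, so that the scaler and the PCA are re-fitted on the training part of every cross-validation fold. The following is only a minimal sketch on synthetic data: the array shapes, the random data and the hyperparameter grid are assumptions made for illustration and are not recommended settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for one grid cell: 400 days, 85 model levels of one variable
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 85))
y_demo = X_demo[:, :5].sum(axis=1) + 0.1 * rng.normal(size=400)

# scaling and PCA live inside the pipeline, so each CV fold fits them on its own training part
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA()),
    ('ridge', Ridge()),
])
# illustrative grid only; step names are used as prefixes for the parameters
param_grid = {
    'pca__n_components': [5, 20, 40],
    'ridge__alpha': [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=4))
search.fit(X_demo, y_demo)
print(search.best_params_)
```

A side effect of this design is that `search.best_estimator_` already contains the fitted scaler and PCA, which makes it easier to apply the model to new data later.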

You may use regression algorithms already discussed in the module, such as [multiple linear regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html), [ridge regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html), [lasso regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html), [elastic nets](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html), [random forest regression](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html), [XGBoost](https://www.kaggle.com/code/stuarthallows/using-xgboost-with-scikit-learn) and [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html), or [feedforward neural networks](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html). However, you are also invited to try out algorithms you might be personally particularly interested in, e.g. as part of your Master's thesis project. For more flexible neural network implementations, we have additionally included *three simple ways to implement a feedforward neural network in Python* at the end of this notebook. Feel free to experiment with the one(s) you like most, and to adapt to more complex network architectures.

You are also allowed to use [multi-output/multivariate regression](https://machinelearningmastery.com/multi-output-regression-models-with-python/) approaches, i.e. to predict all six grid cells with one model object at once. For example, the `sklearn` implementations of `Ridge()` and `RandomForestRegressor()` allow for that: you simply need to use output matrices Y of dimensions (nr_samples, nr_grid_cells). You can read more about this in their function documentation.
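
As a rough illustration of the multi-output option, the sketch below fits `Ridge()` and `RandomForestRegressor()` to an output matrix of shape (nr_samples, nr_grid_cells). The data are synthetic and the single shared predictor matrix `X_multi` is an assumption: in the hackathon each grid cell comes with its own atmospheric column, so how you build a common input matrix is up to you.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
nr_samples, nr_features, nr_grid_cells = 200, 30, 6

# synthetic shared predictors and six synthetic targets (one column per grid cell)
X_multi = rng.normal(size=(nr_samples, nr_features))
W = rng.normal(size=(nr_features, nr_grid_cells))
Y_multi = X_multi @ W + 0.1 * rng.normal(size=(nr_samples, nr_grid_cells))

# both estimators accept a 2D target and return 2D predictions
ridge_multi = Ridge(alpha=1.0).fit(X_multi, Y_multi)
rf_multi = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_multi, Y_multi)

Y_hat_ridge = ridge_multi.predict(X_multi)
Y_hat_rf = rf_multi.predict(X_multi)
print(Y_hat_ridge.shape, Y_hat_rf.shape)
```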

Below we run through the entire process once using a `Ridge()` regression model with only temperatures as predictors. We also include an example for the preparation of your submission files: once you have found your personal "best model", save the model object(s) using packages such as `joblib` to the subfolder `model_objects`. In the saved object, or in the notebook, make sure to include information on how we can easily re-create your predictor matrix. Then upload the entire folder, including your notebook with three solutions, as a zip file to Ilias, as usual. In your Jupyter notebook, make sure to include an example of how we can load the best model and use it to predict (ideally with helpful comments). If you needed to install additional Python packages, please include an extra commented cell with `!conda` statements to install them.

#### How we create a hackathon ranking
The only way to objectively rank your solutions is to hold back separate test data, which we will apply your submitted best solution to. We will measure prediction skill in terms of the `mean_squared_error`. The errors will be compared on standard-scaled ozone time series so that, despite very different ozone concentrations across the different locations, all six grid points will be considered approximately equally in the final scoring function.
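
To see why the scaling matters for the ranking, here is a small sketch of such a score. The helper `average_scaled_mse` is hypothetical, and fitting the scaler on the true series that is passed in is an assumption; the exact scaling used for the held-back test data is not specified here.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

def average_scaled_mse(y_true_cells, y_pred_cells):
    """Mean over cells of the MSE computed on standard-scaled series (hypothetical helper)."""
    errors = []
    for y_true, y_pred in zip(y_true_cells, y_pred_cells):
        y_true = np.asarray(y_true).reshape(-1, 1)
        y_pred = np.asarray(y_pred).reshape(-1, 1)
        # assumption: the scaler is fitted on the true series of this cell
        scaler = StandardScaler().fit(y_true)
        errors.append(mean_squared_error(scaler.transform(y_true), scaler.transform(y_pred)))
    return float(np.mean(errors))

rng = np.random.default_rng(3)
truth_a = rng.normal(size=300)
pred_a = truth_a + 0.2 * rng.normal(size=300)
# second cell: same relative behaviour, but values a million times smaller
truth_b, pred_b = 1e-6 * truth_a, 1e-6 * pred_a

score_a = average_scaled_mse([truth_a], [pred_a])
score_b = average_scaled_mse([truth_b], [pred_b])
raw_a = mean_squared_error(truth_a, pred_a)
raw_b = mean_squared_error(truth_b, pred_b)
print(score_a, score_b, raw_a, raw_b)
```

On the raw values, the second cell would contribute almost nothing to an averaged error; after scaling, both cells contribute the same amount.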

The best performing solutions will be discussed in the exercise class and will be - as is characteristic for this module - rewarded with chocolate (or, if desired, an appropriate alternative). If successful, you might even inspire the approach for a current research project on ozone parameterizations.

## Good luck!

Below is an example run through the data, plus the different ways to fit a feedforward neural network. In case of questions, feel free to post on the **Discussion Board** on Ilias.

#### Load Python and the "ML-climate" kernel

As always: if you are working on your own computer, now select the "ML-climate" kernel. This option should exist for you if you followed the Anaconda 3 and subsequent installation instructions provided on Ilias. Alternatively, you can run the notebook on Google Colab. As usual, you will need to use e.g. the Colab data loader whenever files need to be read from the exercise folder (see Worksheets 1 and 2), and certain packages might still need to be installed using pip.

In [None]:
# ## On Google Colab uncomment the following lines
# !pip install netcdf4
# !pip install keras==2.12.0
# !pip install skorch

In [None]:
## Please comment out after successful installation
!conda install tensorflow -y
!conda install -c conda-forge py-xgboost -y
!conda install pytorch -y
!conda install -c conda-forge skorch -y

In [None]:
# load a few Python packages that might be useful for you
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
from matplotlib import rcParams
import xarray as xr
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRFRegressor
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, KFold, train_test_split, cross_val_score, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import joblib
import netCDF4
### neural network packages
import tensorflow as tf
import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
from keras.regularizers import l2
from sklearn.neural_network import MLPRegressor
from tensorflow.keras import layers
from tensorflow.keras import activations
import torch
import torch.nn as nn
import torch.optim as optim
import skorch
from skorch import NeuralNetRegressor

In [None]:
# you might need to adjust the paths depending on your operating system
# on Google Colab you will need the manual loader function (as in previous Worksheets)
file_X_cell1 = netCDF4.Dataset('./data/X_cell1_students.nc','r')
file_X_cell2 = netCDF4.Dataset('./data/X_cell2_students.nc','r')
file_X_cell3 = netCDF4.Dataset('./data/X_cell3_students.nc','r')
file_X_cell4 = netCDF4.Dataset('./data/X_cell4_students.nc','r')
file_X_cell5 = netCDF4.Dataset('./data/X_cell5_students.nc','r')
file_X_cell6 = netCDF4.Dataset('./data/X_cell6_students.nc','r')
#
file_Y_cell1 = netCDF4.Dataset('./data/Y_cell1_students.nc','r')
file_Y_cell2 = netCDF4.Dataset('./data/Y_cell2_students.nc','r')
file_Y_cell3 = netCDF4.Dataset('./data/Y_cell3_students.nc','r')
file_Y_cell4 = netCDF4.Dataset('./data/Y_cell4_students.nc','r')
file_Y_cell5 = netCDF4.Dataset('./data/Y_cell5_students.nc','r')
file_Y_cell6 = netCDF4.Dataset('./data/Y_cell6_students.nc','r')

In [None]:
### read out each cell's 3D spatial coordinates
cell1_lon = file_Y_cell1['lon'][:]
cell2_lon = file_Y_cell2['lon'][:]
cell3_lon = file_Y_cell3['lon'][:]
cell4_lon = file_Y_cell4['lon'][:]
cell5_lon = file_Y_cell5['lon'][:]
cell6_lon = file_Y_cell6['lon'][:]
#
cell1_lat = file_Y_cell1['lat'][:]
cell2_lat = file_Y_cell2['lat'][:]
cell3_lat = file_Y_cell3['lat'][:]
cell4_lat = file_Y_cell4['lat'][:]
cell5_lat = file_Y_cell5['lat'][:]
cell6_lat = file_Y_cell6['lat'][:]
#
cell1_z = file_Y_cell1['hybrid_ht'][:]/1000.0
cell2_z = file_Y_cell2['hybrid_ht'][:]/1000.0
cell3_z = file_Y_cell3['hybrid_ht'][:]/1000.0
cell4_z = file_Y_cell4['hybrid_ht'][:]/1000.0
cell5_z = file_Y_cell5['hybrid_ht'][:]/1000.0
cell6_z = file_Y_cell6['hybrid_ht'][:]/1000.0

In [None]:
### Select which variables to include as input features
### field2200 = Lumped Cl - grid cell chlorine content in [kg/kg air]
### field1079 = Divergence of Eliassen-Palm flux - a quantity measuring the stirring/momentum transfer on the atmospheric flow
### q = specific humidity [kg/kg air] 
### lwhr = Long-wave heating rates [K/s]
### p_2 = Pressure [Pa]
### swhr = Short-wave heating rates [K/s]
### temp = Temperature [K]
### u = Westerly wind component [m/s]
### v = Southerly wind component [m/s]
# columns = ['field2200', 'field1079', 'q', 'lwhr', 'p_2', 'swhr', 'temp', 'u', 'v']
### Here, we first only include temperature as predictor
columns = ['temp']

In [None]:
'''Because of the way the regression would be parallelized within an ESM on a supercomputer, we are only interested in
regressions using variable information from the same atmospheric column = up to 85 vertical UKESM model levels.
Therefore, we initially have array dimensions of (number of time steps, 85) for temperature.
Let's read in all the input features we are interested in - the second dimension grows as we stack up more features'''
X_cell1 = file_X_cell1.variables[columns[0]][:,:,0,0]
X_cell2 = file_X_cell2.variables[columns[0]][:,:,0,0]
X_cell3 = file_X_cell3.variables[columns[0]][:,:,0,0]
X_cell4 = file_X_cell4.variables[columns[0]][:,:,0,0]
X_cell5 = file_X_cell5.variables[columns[0]][:,:,0,0]
X_cell6 = file_X_cell6.variables[columns[0]][:,:,0,0]
print(X_cell1.shape)
for i in columns[1:]:
    X_add_cell1 = file_X_cell1.variables[i][:,:,0,0]
    X_cell1 = np.hstack((X_cell1,X_add_cell1))
    print(X_cell1.shape)
    #
    X_add_cell2 = file_X_cell2.variables[i][:,:,0,0]
    X_cell2 = np.hstack((X_cell2,X_add_cell2))
    #
    X_add_cell3 = file_X_cell3.variables[i][:,:,0,0]
    X_cell3 = np.hstack((X_cell3,X_add_cell3))
    #
    X_add_cell4 = file_X_cell4.variables[i][:,:,0,0]
    X_cell4 = np.hstack((X_cell4,X_add_cell4))
    #
    X_add_cell5 = file_X_cell5.variables[i][:,:,0,0]
    X_cell5 = np.hstack((X_cell5,X_add_cell5))
    #
    X_add_cell6= file_X_cell6.variables[i][:,:,0,0]
    X_cell6 = np.hstack((X_cell6,X_add_cell6))
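
Once several variables are stacked side by side as in the cell above, it is easy to lose track of which column of the predictor matrix belongs to which variable and model level. The sketch below shows one possible way to keep a label per column. The helper `stack_column_variables`, the label format and the synthetic arrays are assumptions for illustration; they are not part of the provided data files.

```python
import numpy as np

def stack_column_variables(variables, names):
    """Stack (n_time, n_levels) arrays side by side and label each column (hypothetical helper)."""
    blocks = [np.asarray(variables[name]) for name in names]
    X = np.hstack(blocks)
    labels = [f'{name}_level{lev}'
              for name, block in zip(names, blocks)
              for lev in range(block.shape[1])]
    return X, labels

# synthetic stand-ins for two variables on 85 model levels over 7 days
rng = np.random.default_rng(4)
toy_variables = {'temp': rng.normal(size=(7, 85)), 'q': rng.normal(size=(7, 85))}

X_toy_stack, col_labels = stack_column_variables(toy_variables, ['temp', 'q'])
print(X_toy_stack.shape, col_labels[0], col_labels[85])
```

Saving such a list of labels together with the model is one way to document how the predictor matrix can be re-created.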

In [None]:
### read in ozone time series for each grid cell = field2101 [kg of ozone/kg of air]
Y_cell1 = file_Y_cell1.variables['field2101'][:,0,0,0]
print(Y_cell1.shape)
Y_cell2 = file_Y_cell2.variables['field2101'][:,0,0,0]
Y_cell3 = file_Y_cell3.variables['field2101'][:,0,0,0]
Y_cell4 = file_Y_cell4.variables['field2101'][:,0,0,0]
Y_cell5 = file_Y_cell5.variables['field2101'][:,0,0,0]
Y_cell6 = file_Y_cell6.variables['field2101'][:,0,0,0]

In [None]:
#### offset ozone and input features by one day to actually train the predictions on the desired prediction timescale
lag=1
Y_cell1 = Y_cell1[lag:]
Y_cell2 = Y_cell2[lag:]
Y_cell3 = Y_cell3[lag:]
Y_cell4 = Y_cell4[lag:]
Y_cell5 = Y_cell5[lag:]
Y_cell6 = Y_cell6[lag:]
#
X_cell1 = X_cell1[:-lag,:]
X_cell2 = X_cell2[:-lag,:]
X_cell3 = X_cell3[:-lag,:]
X_cell4 = X_cell4[:-lag,:]
X_cell5 = X_cell5[:-lag,:]
X_cell6 = X_cell6[:-lag,:]
# number timesteps in total after lagging = nt
nt=len(Y_cell1[:])
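
The effect of the one-day offset in the cell above can be checked on a toy example in which the day index is encoded in the values themselves. The arrays and the name `lag_demo` are made up for this illustration.

```python
import numpy as np

lag_demo = 1
days = np.arange(5)                    # day indices 0..4
X_toy = days[:, np.newaxis] * 10.0     # predictor value on each day: 0, 10, 20, 30, 40
Y_toy = days * 1.0                     # "ozone" on each day: 0, 1, 2, 3, 4

Y_lagged = Y_toy[lag_demo:]            # ozone on days 1..4
X_lagged = X_toy[:-lag_demo, :]        # predictors on days 0..3

# each row now pairs the predictors of day t-1 with the ozone value of day t
for x_row, y_val in zip(X_lagged, Y_lagged):
    print(f'predictors from day {int(x_row[0] // 10)} -> ozone on day {int(y_val)}')
```

Note that this slicing pattern only works for `lag >= 1`: with a lag of 0, the slice `X[:-0]` is empty.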

In [None]:
#### plot ozone time series
### clearly you can see that ozone mass mixing ratios take on very different values in different grid cells
# the time behaviour is also very different...
plt.figure(figsize=(15,5))
plt.plot(np.arange(0,nt),Y_cell1,label='cell 1')
plt.plot(np.arange(0,nt),Y_cell2,label='cell 2')
plt.plot(np.arange(0,nt),Y_cell3,label='cell 3')
plt.plot(np.arange(0,nt),Y_cell4,label='cell 4')
plt.plot(np.arange(0,nt),Y_cell5,label='cell 5')
plt.plot(np.arange(0,nt),Y_cell6,label='cell 6')
plt.xlabel('Number timesteps (days)',size=16)
plt.ylabel('Ozone mmr (kg/kg)',size=16)
plt.legend(fontsize=16)
plt.show()

In [None]:
### example for a possible split of the data into training and test datasets
### once you have found your best parameter settings you might want to retrain your model on the entire dataset
### note: train_test_split() rejects test_size=0.0, so for that final fit use the full X_cell*, Y_cell* arrays directly
X_cell1_train, X_cell1_test = train_test_split(X_cell1,test_size=0.2,shuffle=False)
X_cell2_train, X_cell2_test = train_test_split(X_cell2,test_size=0.2,shuffle=False)
X_cell3_train, X_cell3_test = train_test_split(X_cell3,test_size=0.2,shuffle=False)
X_cell4_train, X_cell4_test = train_test_split(X_cell4,test_size=0.2,shuffle=False)
X_cell5_train, X_cell5_test = train_test_split(X_cell5,test_size=0.2,shuffle=False)
X_cell6_train, X_cell6_test = train_test_split(X_cell6,test_size=0.2,shuffle=False)
print(X_cell1_train.shape,X_cell1_test.shape)
Y_cell1_train, Y_cell1_test = train_test_split(Y_cell1,test_size=0.2,shuffle=False)
Y_cell2_train, Y_cell2_test = train_test_split(Y_cell2,test_size=0.2,shuffle=False)
Y_cell3_train, Y_cell3_test = train_test_split(Y_cell3,test_size=0.2,shuffle=False)
Y_cell4_train, Y_cell4_test = train_test_split(Y_cell4,test_size=0.2,shuffle=False)
Y_cell5_train, Y_cell5_test = train_test_split(Y_cell5,test_size=0.2,shuffle=False)
Y_cell6_train, Y_cell6_test = train_test_split(Y_cell6,test_size=0.2,shuffle=False)
print(Y_cell1_train.shape,Y_cell1_test.shape)
nt_train = len(Y_cell1_train)
nt_test = len(Y_cell1_test)

In [None]:
### scale the ozone grid cell values to make their errors comparable
### each grid cell should be of equal importance for the final score
### so make sure you always do this
scaler_y_cell1 = StandardScaler().fit(Y_cell1_train[:,np.newaxis])
Y_cell1_train_norm = scaler_y_cell1.transform(Y_cell1_train[:,np.newaxis])
Y_cell1_test_norm = scaler_y_cell1.transform(Y_cell1_test[:,np.newaxis])
#
scaler_y_cell2 = StandardScaler().fit(Y_cell2_train[:,np.newaxis])
Y_cell2_train_norm = scaler_y_cell2.transform(Y_cell2_train[:,np.newaxis])
Y_cell2_test_norm = scaler_y_cell2.transform(Y_cell2_test[:,np.newaxis])
#
scaler_y_cell3 = StandardScaler().fit(Y_cell3_train[:,np.newaxis])
Y_cell3_train_norm = scaler_y_cell3.transform(Y_cell3_train[:,np.newaxis])
Y_cell3_test_norm = scaler_y_cell3.transform(Y_cell3_test[:,np.newaxis])
#
scaler_y_cell4 = StandardScaler().fit(Y_cell4_train[:,np.newaxis])
Y_cell4_train_norm = scaler_y_cell4.transform(Y_cell4_train[:,np.newaxis])
Y_cell4_test_norm = scaler_y_cell4.transform(Y_cell4_test[:,np.newaxis])
#
scaler_y_cell5 = StandardScaler().fit(Y_cell5_train[:,np.newaxis])
Y_cell5_train_norm = scaler_y_cell5.transform(Y_cell5_train[:,np.newaxis])
Y_cell5_test_norm = scaler_y_cell5.transform(Y_cell5_test[:,np.newaxis])
#
scaler_y_cell6 = StandardScaler().fit(Y_cell6_train[:,np.newaxis])
Y_cell6_train_norm = scaler_y_cell6.transform(Y_cell6_train[:,np.newaxis])
Y_cell6_test_norm = scaler_y_cell6.transform(Y_cell6_test[:,np.newaxis])

del Y_cell1_train, Y_cell1_test, Y_cell2_train, Y_cell2_test, Y_cell3_train, Y_cell3_test
del Y_cell4_train, Y_cell4_test, Y_cell5_train, Y_cell5_test, Y_cell6_train, Y_cell6_test

In [None]:
#### plot normalized ozone time series after scaling
plt.figure(figsize=(10,5))
plt.plot(np.arange(0,nt_train),Y_cell1_train_norm,label='cell 1')
plt.plot(np.arange(0,nt_train),Y_cell2_train_norm,label='cell 2')
plt.plot(np.arange(0,nt_train),Y_cell3_train_norm,label='cell 3')
plt.plot(np.arange(0,nt_train),Y_cell4_train_norm,label='cell 4')
plt.plot(np.arange(0,nt_train),Y_cell5_train_norm,label='cell 5')
plt.plot(np.arange(0,nt_train),Y_cell6_train_norm,label='cell 6')
plt.xlabel('Number timesteps (days)',size=16)
plt.ylabel('Ozone mmr (normalized)',size=16)
plt.legend(fontsize=16)
plt.show()

In [None]:
#### depending on the input features and regression function you choose, you might also want to scale the input features
#### this would for example not be necessary for random forests
scaler_x_cell1 = StandardScaler().fit(X_cell1_train[:,:])
X_cell1_train_norm = scaler_x_cell1.transform(X_cell1_train[:,:])
X_cell1_test_norm = scaler_x_cell1.transform(X_cell1_test[:,:])
#
scaler_x_cell2 = StandardScaler().fit(X_cell2_train[:,:])
X_cell2_train_norm = scaler_x_cell2.transform(X_cell2_train[:,:])
X_cell2_test_norm = scaler_x_cell2.transform(X_cell2_test[:,:])
#
scaler_x_cell3 = StandardScaler().fit(X_cell3_train[:,:])
X_cell3_train_norm = scaler_x_cell3.transform(X_cell3_train[:,:])
X_cell3_test_norm = scaler_x_cell3.transform(X_cell3_test[:,:])
#
scaler_x_cell4 = StandardScaler().fit(X_cell4_train[:,:])
X_cell4_train_norm = scaler_x_cell4.transform(X_cell4_train[:,:])
X_cell4_test_norm = scaler_x_cell4.transform(X_cell4_test[:,:])
#
scaler_x_cell5 = StandardScaler().fit(X_cell5_train[:,:])
X_cell5_train_norm = scaler_x_cell5.transform(X_cell5_train[:,:])
X_cell5_test_norm = scaler_x_cell5.transform(X_cell5_test[:,:])
#
scaler_x_cell6 = StandardScaler().fit(X_cell6_train[:,:])
X_cell6_train_norm = scaler_x_cell6.transform(X_cell6_train[:,:])
X_cell6_test_norm = scaler_x_cell6.transform(X_cell6_test[:,:])
### note you still have the non-normalized input features available as X_cell1_train, ...

## Define the six regression models
Example implementation using a typical `sklearn` pipeline.

In [None]:
### we use the same forward-directed TimeSeriesSplit() CV method here
### alternatively you might try KFold() etc
### https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
cv_obj = TimeSeriesSplit(n_splits=4)
### you might want to see how changing the number of splits affects your results

### now we will define separate sets of hyperparameters for the GridSearch
### define dictionary to collect the best estimators for the six regressions
best_estimators = {}

### set parameters for a typical Ridge regression
alpha_i=[0.03,0.1,0.3,1,10,30,100,300,1000,3000,10000,30000]
parameters = {
    'alpha': alpha_i,
    'fit_intercept': [False,True],
    'max_iter':[1000],
    'random_state':[100]
             }

regressor_type = Ridge()

### Triple step: 1) Regression object definition, 2) Fit function, 3) Make prediction
### Cell 1
regr1 = GridSearchCV(regressor_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr1.fit(X_cell1_train_norm,Y_cell1_train_norm)
# might want to check error on training data below, to compare to test error 
Y_pred1_train = regr1.best_estimator_.predict(X_cell1_train_norm)
Y_pred1 = regr1.best_estimator_.predict(X_cell1_test_norm)
print(regr1.best_estimator_.alpha)
best_estimators['cell1'] = regr1.best_estimator_
### Cell 2
regr2 = GridSearchCV(regressor_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr2.fit(X_cell2_train_norm,Y_cell2_train_norm)
Y_pred2_train = regr2.best_estimator_.predict(X_cell2_train_norm)
Y_pred2 = regr2.best_estimator_.predict(X_cell2_test_norm)
print(regr2.best_estimator_.alpha)
best_estimators['cell2'] = regr2.best_estimator_
### Cell 3
regr3 = GridSearchCV(regressor_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr3.fit(X_cell3_train_norm,Y_cell3_train_norm)
Y_pred3_train = regr3.best_estimator_.predict(X_cell3_train_norm)
Y_pred3 = regr3.best_estimator_.predict(X_cell3_test_norm)
print(regr3.best_estimator_.alpha)
best_estimators['cell3'] = regr3.best_estimator_
### Cell 4
regr4 = GridSearchCV(regressor_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr4.fit(X_cell4_train_norm,Y_cell4_train_norm)
Y_pred4_train = regr4.best_estimator_.predict(X_cell4_train_norm)
Y_pred4 = regr4.best_estimator_.predict(X_cell4_test_norm)
print(regr4.best_estimator_.alpha)
best_estimators['cell4'] = regr4.best_estimator_
### Cell 5
regr5 = GridSearchCV(regressor_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr5.fit(X_cell5_train_norm,Y_cell5_train_norm)
Y_pred5_train = regr5.best_estimator_.predict(X_cell5_train_norm)
Y_pred5 = regr5.best_estimator_.predict(X_cell5_test_norm)
print(regr5.best_estimator_.alpha)
best_estimators['cell5'] = regr5.best_estimator_
### Cell 6
regr6 = GridSearchCV(regressor_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr6.fit(X_cell6_train_norm,Y_cell6_train_norm)
Y_pred6_train = regr6.best_estimator_.predict(X_cell6_train_norm)
Y_pred6 = regr6.best_estimator_.predict(X_cell6_test_norm)
print(regr6.best_estimator_.alpha)
best_estimators['cell6'] = regr6.best_estimator_
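
To get a feeling for what the forward-directed `TimeSeriesSplit()` in the cell above does, the following sketch prints the index ranges of each fold for a toy series of 10 days. The series length is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_days = 10
tscv = TimeSeriesSplit(n_splits=4)
folds = list(tscv.split(np.zeros((n_days, 1))))

# in every fold the validation block lies strictly after the training block
for k, (train_idx, val_idx) in enumerate(folds):
    print(f'fold {k}: train {train_idx.min()}..{train_idx.max()}, '
          f'validate {val_idx.min()}..{val_idx.max()}')
```

In contrast to a shuffled `KFold()`, the model is never validated on days that precede its training data, and the training set grows from fold to fold.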

In [None]:
### visualize the CV results for cell 1 in a pandas table
### you can use this to see which parameters are easiest to tune
pd.DataFrame(regr1.cv_results_)

#### A few visualizations of the results for test data + example of error calculation

In [None]:
### a few lists that make it easier to iterate through plots etc
lat_list = [cell1_lat[0], cell2_lat[0], cell3_lat[0], cell4_lat[0], cell5_lat[0], cell6_lat[0]]
lon_list = [cell1_lon[0], cell2_lon[0], cell3_lon[0], cell4_lon[0], cell5_lon[0], cell6_lon[0]]
z_list = [cell1_z[0], cell2_z[0], cell3_z[0], cell4_z[0], cell5_z[0], cell6_z[0]]
ypred_list = [Y_pred1, Y_pred2, Y_pred3, Y_pred4, Y_pred5, Y_pred6]
y_norms = [Y_cell1_test_norm, Y_cell2_test_norm, Y_cell3_test_norm, Y_cell4_test_norm, Y_cell5_test_norm, Y_cell6_test_norm]

In [None]:
### plot time series of predictions and the ground truth (i.e. the UKESM simulations) for the test data
### good sanity check for the quality of the fits
plt.rcParams['figure.figsize'] = [25,50]
figs, axs = plt.subplots(6)
for i in range(0,6):
    axs[i].set_title('Cell '+str(i+1)+ '; lat: '+str("%.2f" % lat_list[i])+', lon: '+str("%.2f" % lon_list[i])+', altitude: '+str("%.2f" % z_list[i])+'km',size=20)
    axs[i].plot(np.arange(0,nt_test),y_norms[i],color='k',linewidth=2,label='UKESM data')
    axs[i].plot(np.arange(0,nt_test),ypred_list[i],color='r',linewidth=2,label='ML predictions')
    axs[i].set_xlabel('Timesteps (days)',size=20)
    axs[i].set_ylabel('Ozone mmr (normalized)',size=20)
    axs[i].legend(fontsize=20)
plt.show()

In [None]:
#### measure your test set error for each grid cell individually
err1 = mean_squared_error(Y_pred1,Y_cell1_test_norm)
err1_train = mean_squared_error(Y_pred1_train,Y_cell1_train_norm)
print('Test error grid cell 1: ', "%.5f" % err1, '; Error on training data: ',"%.5f" % err1_train)
err2 = mean_squared_error(Y_pred2,Y_cell2_test_norm)
err2_train = mean_squared_error(Y_pred2_train,Y_cell2_train_norm)
print('Test error grid cell 2: ', "%.5f" % err2, '; Error on training data: ',"%.5f" % err2_train)
err3 = mean_squared_error(Y_pred3,Y_cell3_test_norm)
err3_train = mean_squared_error(Y_pred3_train,Y_cell3_train_norm)
print('Test error grid cell 3: ', "%.5f" % err3, '; Error on training data: ',"%.5f" % err3_train)
err4 = mean_squared_error(Y_pred4,Y_cell4_test_norm)
err4_train = mean_squared_error(Y_pred4_train,Y_cell4_train_norm)
print('Test error grid cell 4: ', "%.5f" % err4, '; Error on training data: ',"%.5f" % err4_train)
err5 = mean_squared_error(Y_pred5,Y_cell5_test_norm)
err5_train = mean_squared_error(Y_pred5_train,Y_cell5_train_norm)
print('Test error grid cell 5: ', "%.5f" % err5, '; Error on training data: ',"%.5f" % err5_train)
err6 = mean_squared_error(Y_pred6,Y_cell6_test_norm)
err6_train = mean_squared_error(Y_pred6_train,Y_cell6_train_norm)
print('Test error grid cell 6: ', "%.5f" % err6, '; Error on training data: ',"%.5f" % err6_train)

In [None]:
### calculate average error for all grid cells
err_average = (err1+err2+err3+err4+err5+err6)/6.0
print('This is the final score you are trying to improve (but be aware of overfitting the test data)')
print(err_average)

### Your turn!

Implement your own three solutions in the cells below.

In [None]:
# Model 1
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Model 2
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Model 3
# YOUR CODE HERE
raise NotImplementedError()

### Save your best model

Note: save the model(s) that you think is (are) best only. It should be clear from the dictionary you save which regression belongs to which grid cell.

In addition, make sure to add an entry of which predictors you used, as we do below. For transformations, name the additional predictors sensibly and also alert us to this in your notebook.

In [None]:
print(best_estimators)

In [None]:
### add predictor names to dictionary
best_estimators['predictors'] = columns
### specify a sensible group name for your submission
### CHANGE
group_name = 'your_names_final'
### save the model that you think is BEST only.
### for most model types you should be able to use joblib
### and the dictionary format used in the example above
joblib.dump(best_estimators, './model_objects/'+group_name+'_model.pkl')
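
The submission should also show how the saved model can be loaded and used to predict. A minimal sketch of such a round trip is given below. It uses synthetic data and a temporary directory so that it runs on its own; storing the fitted input scaler under the key `'scaler_x_cell1'` is a suggestion of this sketch, not a required format.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for one grid cell with temperature on 85 levels
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 85))
y_demo = X_demo[:, 0] + 0.1 * rng.normal(size=100)

scaler_demo = StandardScaler().fit(X_demo)
model_demo = Ridge(alpha=1.0).fit(scaler_demo.transform(X_demo), y_demo)

# hypothetical dictionary layout: model, matching input scaler, predictor names
submission = {'cell1': model_demo, 'scaler_x_cell1': scaler_demo, 'predictors': ['temp']}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_group_model.pkl')
    joblib.dump(submission, path)
    loaded = joblib.load(path)

# apply the loaded scaler and model to new, unscaled predictors
X_new = rng.normal(size=(10, 85))
y_new = loaded['cell1'].predict(loaded['scaler_x_cell1'].transform(X_new))
print(loaded['predictors'], y_new.shape)
```

Keeping the scaler next to the model means that whoever loads the file does not need access to your training data to reproduce the scaling.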

## Three ways to fit a feedforward neural network

Three different examples written using `sklearn`, `keras`, and `pytorch`. 

#### sklearn

See also Worksheet 1. You can find the function documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html), including an explanation of the many hyperparameters.

In [None]:
# https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
#
reg_type = MLPRegressor()
parameters = {
    'hidden_layer_sizes': [(10,10),(100,100)],
    'activation': ['tanh','relu'],
    'alpha': [0,1e-3,1e-2],
    'shuffle': [False],
    'solver': ['adam'],
    'max_iter': [1000]
}
regr_cv = GridSearchCV(reg_type,parameters,cv=cv_obj,n_jobs=-1,refit=True)
regr_cv.fit(X_cell1_train_norm, Y_cell1_train_norm[:,0])
y_pred_NN = regr_cv.best_estimator_.predict(X_cell1_test_norm)
mse_NN = mean_squared_error(Y_cell1_test_norm,y_pred_NN)
print('MLP regressor sklearn error on test data: ',round(mse_NN,5))
plt.rcParams["figure.figsize"] = (8,6)
plt.plot(np.arange(0,y_pred_NN.shape[0]),Y_cell1_test_norm,color='k',label='Ground truth')
plt.plot(np.arange(0,y_pred_NN.shape[0]),y_pred_NN,color='r',label='Predictions')
plt.legend()
plt.show()
# pd.DataFrame(regr_cv.cv_results_)

#### Keras

Keras is a powerful yet easy-to-use Python library for developing and evaluating deep learning models.

It is part of the `TensorFlow` library (older stand-alone Keras versions could also use a `Theano` backend). There are many good online tutorials. If you have never used it before, this [blog post](https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/) by Jason Brownlee provides a nice walk-through. 

In [None]:
### define a simple two-layer neural network using keras with the tensorflow backend
### models in Keras are defined as a sequence of network layers
### So we create a 'Sequential' model https://keras.io/models/sequential/
### or see for example tutorials like this one here https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
### concerning how to find the optimal number of layers, see e.g. https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/
def create_NN(dropout_rate=0.0,kernel_reg1=0.0,bias_reg1=0.0,kernel_reg2=0.0,bias_reg2=0.0):
    # print(kernel_reg1,bias_reg1,kernel_reg2,bias_reg2,'Regularization parameters')
    # print(dropout_rate,'dropout rate')
    model = Sequential()
    ### the first hidden layer receives the inputs. It must accept as many inputs as we have input features
    ### we use the relu activation function. Feel free to try out others, see https://keras.io/activations/
    ### for kernel_initializers see https://keras.io/initializers/
    ### concerning the role of weight/bias regularizers, see e.g. 
    ### here: https://stats.stackexchange.com/questions/383310/difference-between-kernel-bias-and-activity-regulizers-in-keras
    ### e.g. you could also use an activity regularizer that works on the loss function itself - just as we did for Ridge regression
    nodes_layer1 = 8
    nodes_layer2 = 8
    ### fully connected layers are defined using the Dense class
    model.add(Dense(nodes_layer1,kernel_initializer='random_uniform',kernel_regularizer=l2(kernel_reg1),
                    bias_regularizer=l2(bias_reg1),input_dim=(X_cell1_train_norm.shape[1]),activation='relu'))
    ### we can use dropout regularization; default is none here
    model.add(Dropout(dropout_rate))
    ### then we add a second hidden layer
    model.add(Dense(nodes_layer2,kernel_initializer='random_uniform',kernel_regularizer=l2(kernel_reg2),
                    bias_regularizer=l2(bias_reg2),activation='relu'))
    ### then we define the connection to the outputs; for a classification task you would use e.g. a sigmoid or softmax
    ### activation here; for regression we use a linear activation function; we have one output per grid cell
    ### activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
    model.add(Dense(1,kernel_initializer='random_uniform',activation='linear'))
    ### now construct the model. Define a loss function https://keras.io/losses/
    ### you can also choose different gradient descent optimizers and metrics to evaluate your models' performance 
    ### https://keras.io/metrics/
    ### metrics are similar to a loss function, except that the results from evaluating a metric
    ### are not used when training the model.
    ### You may use any of the loss functions as a metric function.
    model.compile(loss='mean_squared_error',optimizer='adam',metrics=['mse'])
    return model
### define neural network parameters for cross-validation    
### for more tuning parameters see for example: https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/
### https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-regularization/
parameters_NN = {
        'epochs': [2,10],
        'batch_size': [10],
        'dropout_rate': [0,0.2],
        'kernel_reg1': [0.00],
        'bias_reg1': [0.00],
        'kernel_reg2': [0.00],
        'bias_reg2': [0.00]
        
}
model_NN = KerasRegressor(build_fn=create_NN,verbose=10)
regr1 = GridSearchCV(model_NN,parameters_NN,cv=cv_obj,n_jobs=-1,refit=True)
regr1.fit(X_cell1_train_norm,Y_cell1_train_norm)
history = regr1.best_estimator_.model.history
Y_pred1 = regr1.best_estimator_.predict(X_cell1_test_norm)
### one might also want to test the predictions on the training dataset. Do we overfit in comparison to the test data?
Y_pred1_train = regr1.best_estimator_.predict(X_cell1_train_norm)
print(regr1.best_estimator_)
# best_estimators['cell1'] = regr1.best_estimator_
# plot history
# summarize history for loss
plt.plot(history.history['loss'], label='loss')
plt.xlabel('No epochs',size=16)
plt.ylabel('Error',size=16)
plt.legend()
plt.show()
plt.plot(np.arange(0,Y_pred1.shape[0]),Y_cell1_test_norm,color='k',label='Ground truth')
plt.plot(np.arange(0,Y_pred1.shape[0]),Y_pred1,color='r',label='Predictions')
plt.legend()
plt.show()
mse_NN = mean_squared_error(Y_cell1_test_norm,Y_pred1)
print('Keras regressor error on test data: ',round(mse_NN,5))
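
The comment in the cell above asks whether the model overfits relative to the test data. A simple way to check is to compare the training and test errors side by side. The sketch below is an illustration under assumptions rather than part of the worksheet: `train_test_mse` is a hypothetical helper, the data are synthetic, and a nearly unregularized `Ridge` model stands in for the neural network so that the example runs quickly.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def train_test_mse(estimator, X_train, y_train, X_test, y_test):
    """Return (train MSE, test MSE) for an already fitted estimator."""
    mse_train = mean_squared_error(y_train, estimator.predict(X_train))
    mse_test = mean_squared_error(y_test, estimator.predict(X_test))
    return mse_train, mse_test

# Synthetic stand-in data (assumption): many features, few samples,
# and only the first feature carries signal
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 30))
y = X[:, 0] + 0.5 * rng.normal(size=120)
X_tr, X_te, y_tr, y_te = X[:60], X[60:], y[:60], y[60:]

# Nearly unregularized fit, so the model can also fit noise in the training set
model = Ridge(alpha=1e-6).fit(X_tr, y_tr)
mse_tr, mse_te = train_test_mse(model, X_tr, y_tr, X_te, y_te)
print(f'train MSE: {mse_tr:.4f}, test MSE: {mse_te:.4f}, ratio: {mse_te / mse_tr:.2f}')
```

A test error far above the training error suggests overfitting. The same helper should work with any fitted estimator that exposes `predict`, for example `regr1.best_estimator_` together with `X_cell1_train_norm`, `Y_cell1_train_norm`, `X_cell1_test_norm` and `Y_cell1_test_norm`.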

#### PyTorch

PyTorch is a Deep Learning framework developed by Facebook’s AI Research Lab. 

Once again, there are many great online tutorials, see e.g. its own [quickstart tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html). Below, we provide an example implementation combined with `skorch`, which allows for the use of `sklearn` cross-validation methods. For detailed explanations see for example [this blog post](https://machinelearningmastery.com/use-pytorch-deep-learning-models-with-scikit-learn/).

Feel free to use the Hackathon to try out your own ideas and network architectures! Data science always involves independent experimentation, and online searches for interesting network designs...

In [None]:
### need to convert data to PyTorch tensor
X_cell1_train_norm_pyT = torch.tensor(X_cell1_train_norm, dtype=torch.float32)
X_cell1_test_norm_pyT = torch.tensor(X_cell1_test_norm, dtype=torch.float32)
Y_cell1_train_norm_pyT = torch.tensor(Y_cell1_train_norm, dtype=torch.float32)
Y_cell1_test_norm_pyT = torch.tensor(Y_cell1_test_norm, dtype=torch.float32)
### training parameters
### define the model
class OzoneRegressor(nn.Module):
    def __init__(self, dropout_rate=0.5, weight_constraint=1.0, nodes_layer1 = 8, nodes_layer2 = 8):
        super().__init__()
        self.layer1 = nn.Linear(X_cell1_train_norm.shape[1], nodes_layer1)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(nodes_layer1, nodes_layer2)
        self.act2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout_rate)
        self.output = nn.Linear(nodes_layer2, 1)
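        ### note: weight_constraint is only stored here; it is never applied in forward(), so it does not affect training
        self.weight_constraint = weight_constraint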
        
    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.dropout2(x)
        x = self.output(x)
        return x
 
# create model with skorch
model = NeuralNetRegressor(
    OzoneRegressor,
    criterion=nn.MSELoss,  # skorch documents the criterion as an uninitialized class
    optimizer=optim.Adam,
    max_epochs=100,
    batch_size=360,
    verbose=False
)
 
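# define the grid search parameters, thanks to skorch we can use GridSearchCV
# note: the module does not apply weight_constraint in forward(), so its two values
# yield equivalent models and only double the size of the grid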
param_grid = {
    'module__weight_constraint': [0.0,0.1],
    'module__dropout_rate': [0.0,0.1],
    'module__nodes_layer1': [8,16],
    'module__nodes_layer2': [8,16]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_cell1_train_norm_pyT, Y_cell1_train_norm_pyT)
 
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
 
Y_pred1 = grid.best_estimator_.predict(X_cell1_test_norm_pyT)
plt.plot(np.arange(0,Y_pred1.shape[0]),Y_cell1_test_norm,color='k',label='Ground truth')
plt.plot(np.arange(0,Y_pred1.shape[0]),Y_pred1,color='r',label='Predictions')
plt.legend()
plt.show()
mse_NN = mean_squared_error(Y_cell1_test_norm,Y_pred1)
print('PyTorch regressor error on test data: ',round(mse_NN,5))
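
The `OzoneRegressor` above accepts a `weight_constraint` argument but does not use it in `forward`. One possible way to make such a parameter effective is a max-norm constraint that rescales the weight vector of each hidden unit before the forward pass. The following is a minimal sketch under assumptions, not the worksheet's implementation: the class `MaxNormRegressor`, its `apply_max_norm` method and the `n_features` argument are hypothetical names introduced for illustration.

```python
import torch
from torch import nn

class MaxNormRegressor(nn.Module):
    """Hypothetical one-hidden-layer regressor with a max-norm weight constraint."""

    def __init__(self, n_features, nodes=8, weight_constraint=1.0):
        super().__init__()
        self.layer1 = nn.Linear(n_features, nodes)
        self.act1 = nn.ReLU()
        self.output = nn.Linear(nodes, 1)
        self.weight_constraint = weight_constraint

    def apply_max_norm(self):
        # Rescale each hidden unit's incoming weight vector (one row of the
        # weight matrix) so that its L2 norm does not exceed weight_constraint.
        with torch.no_grad():
            norms = self.layer1.weight.norm(2, dim=1, keepdim=True)
            # Scale factor is 1 for rows already within the limit
            scale = torch.clamp(self.weight_constraint / (norms + 1e-12), max=1.0)
            self.layer1.weight.mul_(scale)

    def forward(self, x):
        self.apply_max_norm()  # enforce the constraint before every forward pass
        return self.output(self.act1(self.layer1(x)))

torch.manual_seed(0)
demo_model = MaxNormRegressor(n_features=5, nodes=8, weight_constraint=0.1)
demo_out = demo_model(torch.randn(4, 5))
print(demo_out.shape, float(demo_model.layer1.weight.norm(2, dim=1).max()))
```

Note that with this scheme a value of `weight_constraint = 0.0` would shrink all first-layer weights to zero, so a grid for such a parameter would need strictly positive values.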