# MLOps Walkthrough - Model Development


## How to train your ~~dragon🐉~~ machine learning model 🤖 (reproducibly, efficiently at scale)

### Key Principles
* Reusable components
* Running at scale
* Keeping track of experiments for reproducibility


### Key Tasks for RSEs
* Setting up infrastructure for training models
* Facilitating running code on ML-targeted platforms, e.g. GPUs and TPUs
* Supporting good practices for ML development, e.g. experiment tracking
* Applying FAIR principles to all ML assets (data, code, trained models)
* Applying good code management practices to ML code
* Setting up test suites for ML pipelines


### Key Terms
* Experiment tracking
* Machine learning pipeline
* Train/test split
* Training
* Model architecture
* Hyperparameters
* Hyperparameter tuning

### Key Tools
* ML frameworks (scikit-learn, TensorFlow, PyTorch)
* Workflow tools (Ray)
* Experiment tracking (MLflow)

### Running this notebook
This notebook should run from a conda environment created with the [requirements_model_development.yml file](requirements_model_development.yml). See the readme file for info on how to set up a conda environment for using this notebook.

## Example problem - Predicting wind rotors

TODO: brief introduction (refer to the data preparation notebook) and a summary of what this notebook does.


In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pathlib
import datetime
import os
import functools
import math

In [3]:
import matplotlib
%matplotlib inline

In [4]:
import numpy
import pandas
import dask

In [5]:
import sklearn
import sklearn.preprocessing
import sklearn.model_selection

In [6]:
import tensorflow

import tensorflow.keras
import tensorflow.keras.layers
import tensorflow.keras.models
import tensorflow.keras.optimizers
import tensorflow.keras.metrics
import tensorflow.keras.callbacks
import tensorflow.keras.constraints

In [7]:
import tensorboard

In [8]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [9]:
import mlflow
mlflow.tensorflow.autolog()



In [10]:
import intake

In [11]:
mlflow_dash_port = 5001
tensorboard_dash_port = 5002
ray_dash_port = 5003

### Load the data

In [12]:
try:
    rse_root_data_dir = pathlib.Path(os.environ['RSE22_ROOT_DATA_DIR'])
    print('reading from environment variable')
except KeyError as ke1:
    rse_root_data_dir = pathlib.Path(os.environ['HOME'])  / 'data' / 'ukrse2022'
    print('using default path')
rse_root_data_dir

using default path


PosixPath('/Users/stephen.haddad/data/ukrse2022')
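The lookup above can also be wrapped in a small helper so that other notebooks resolve the data directory the same way. This is a minimal sketch: the function name `resolve_data_dir` and its `environ` argument are hypothetical and only there to make the behaviour easy to check.

```python
import os
import pathlib


def resolve_data_dir(env_var="RSE22_ROOT_DATA_DIR", environ=None):
    """Return (path, source): the directory from env_var, else ~/data/ukrse2022."""
    # Allow a plain dict to stand in for os.environ (hypothetical convenience for testing).
    environ = os.environ if environ is None else environ
    if env_var in environ:
        return pathlib.Path(environ[env_var]), "environment variable"
    return pathlib.Path.home() / "data" / "ukrse2022", "default path"


data_dir, source = resolve_data_dir(environ={"RSE22_ROOT_DATA_DIR": "/tmp/rotors"})
print(data_dir, "from", source)
```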

In [13]:
rotors_catalog = intake.open_catalog(rse_root_data_dir / 'rotors_catalog.yml')
rotors_catalog 

rotors_catalog:
  args:
    path: /Users/stephen.haddad/data/ukrse2022/rotors_catalog.yml
  description: ''
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}


In [14]:
list(rotors_catalog)

['rotors', 'rotors_preprocessed']

We see that our catalog contains preprocessed data ready to use in our machine learning development pipeline.

In [15]:
rotors_df = rotors_catalog['rotors_preprocessed'].read()

In [16]:
rotors_df

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,air_temp_obs,dewpoint_obs,wind_direction_obs,wind_speed_obs,wind_gust_obs,air_temp_1,air_temp_2,air_temp_3,...,v_wind_18,u_wind_19,v_wind_19,u_wind_20,v_wind_20,u_wind_21,v_wind_21,u_wind_22,v_wind_22,time
0,1,1,283.9,280.7,110.0,4.1,-9999999.0,284.000,283.625,283.250,...,5.756768,-1.953409,5.673111,-2.674064,5.482644,-3.000000e+00,5.196152,-2.987221,4.971570,2015-01-01 00:00:00
1,2,2,280.7,279.7,90.0,7.7,-9999999.0,281.500,281.250,280.750,...,6.502872,-1.460878,5.094687,-0.790064,3.716961,-7.837740e-16,3.200000,0.727691,3.423517,2015-01-01 03:00:00
2,3,3,279.8,278.1,100.0,7.7,-9999999.0,279.875,279.625,279.125,...,5.481273,-1.423505,5.312592,-0.174497,4.996954,7.293223e-01,4.136193,2.462646,3.152043,2015-01-01 06:00:00
3,4,4,279.9,277.0,120.0,7.2,-9999999.0,279.625,279.250,278.875,...,2.475770,-1.311123,3.245143,-0.407661,3.878635,6.883116e-01,4.345829,1.723190,4.265046,2015-01-01 09:00:00
4,5,5,279.9,277.4,120.0,8.7,-9999999.0,279.250,278.875,278.375,...,-0.775695,-1.997259,0.104672,-1.928942,1.252670,-1.287595e+00,2.142918,-0.899056,2.225241,2015-01-01 12:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17481,20101,20101,276.7,275.5,270.0,3.6,-9999999.0,277.875,277.750,277.625,...,-8.555992,-8.047581,-8.629974,-7.479073,-8.603689,-7.111320e+00,-8.781749,-6.538771,-9.338333,2020-12-31 06:00:00
17482,20102,20102,277.9,276.9,270.0,3.1,-9999999.0,277.875,277.625,277.875,...,-6.956383,-8.273280,-6.942106,-8.886116,-7.456336,-8.995651e+00,-8.388580,-8.029567,-8.917738,2020-12-31 09:00:00
17483,20103,20103,283.5,277.1,220.0,3.6,-9999999.0,281.125,280.625,280.125,...,-8.332875,-7.326372,-9.377328,-8.397556,-9.660283,-7.962654e+00,-8.843423,-7.495332,-7.495332,2020-12-31 12:00:00
17484,20104,20104,286.1,276.9,250.0,3.6,-9999999.0,284.625,284.125,283.625,...,-6.646804,-5.294689,-6.776892,-4.398330,-7.038799,-5.356255e+00,-6.855694,-7.265332,-7.016050,2020-12-31 15:00:00


In [17]:
list(rotors_df.columns)

['Unnamed: 0',
 'Unnamed: 0.1',
 'air_temp_obs',
 'dewpoint_obs',
 'wind_direction_obs',
 'wind_speed_obs',
 'wind_gust_obs',
 'air_temp_1',
 'air_temp_2',
 'air_temp_3',
 'air_temp_4',
 'air_temp_5',
 'air_temp_6',
 'air_temp_7',
 'air_temp_8',
 'air_temp_9',
 'air_temp_10',
 'air_temp_11',
 'air_temp_12',
 'air_temp_13',
 'air_temp_14',
 'air_temp_15',
 'air_temp_16',
 'air_temp_17',
 'air_temp_18',
 'air_temp_19',
 'air_temp_20',
 'air_temp_21',
 'air_temp_22',
 'sh_1',
 'sh_2',
 'sh_3',
 'sh_4',
 'sh_5',
 'sh_6',
 'sh_7',
 'sh_8',
 'sh_9',
 'sh_10',
 'sh_11',
 'sh_12',
 'sh_13',
 'sh_14',
 'sh_15',
 'sh_16',
 'sh_17',
 'sh_18',
 'sh_19',
 'sh_20',
 'sh_21',
 'sh_22',
 'winddir_1',
 'windspd_1',
 'winddir_2',
 'windspd_2',
 'winddir_3',
 'windspd_3',
 'winddir_4',
 'windspd_4',
 'winddir_5',
 'windspd_5',
 'winddir_6',
 'windspd_6',
 'winddir_7',
 'windspd_7',
 'winddir_8',
 'windspd_8',
 'winddir_9',
 'windspd_9',
 'winddir_10',
 'windspd_10',
 'winddir_11',
 'windspd_11',
 'winddi

In [18]:
# one small bit of cleaning: ensuring the correct datetime type for our time feature
rotors_df['time'] = pandas.to_datetime(rotors_df['time'])

In [19]:

temp_feature_names = [f'air_temp_{i1}' for i1 in range(1,23)]
humidity_feature_names = [f'sh_{i1}' for i1 in range(1,23)]
wind_direction_feature_names = [f'winddir_{i1}' for i1 in range(1,23)]
wind_speed_feature_names = [f'windspd_{i1}' for i1 in range(1,23)]
u_wind_feature_names = [f'u_wind_{i1}' for i1 in range(1,23)]
v_wind_feature_names = [f'v_wind_{i1}' for i1 in range(1,23)]
target_feature_name = 'rotors_present'

### Train/test split

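Split by year rather than at random: samples before 2020 are used for training and samples from 2020 are held out for validation. Neighbouring time steps are strongly correlated, so a random split would leak information between the two sets.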

In [20]:
train_df = rotors_df[rotors_df['time'] < datetime.datetime(2020,1,1,0,0)]
val_df = rotors_df[rotors_df['time'] >= datetime.datetime(2020,1,1,0,0)]
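A time-based split is easy to get subtly wrong at the boundary. The sketch below uses a toy 3-hourly time axis (illustrative data, not the rotors dataset) to show that `<` paired with `>=` assigns every row to exactly one subset, whereas a strict `>` would leave the row at exactly midnight in neither.

```python
import datetime

import pandas

# Toy 3-hourly time axis straddling the split boundary (illustrative only).
toy_df = pandas.DataFrame(
    {
        "time": pandas.date_range(
            "2019-12-31 18:00", periods=5, freq=pandas.Timedelta(hours=3)
        )
    }
)
split_time = datetime.datetime(2020, 1, 1, 0, 0)

toy_train = toy_df[toy_df["time"] < split_time]
toy_val = toy_df[toy_df["time"] >= split_time]
# With a strict comparison the sample at exactly split_time is lost.
toy_val_strict = toy_df[toy_df["time"] > split_time]

print(len(toy_df), len(toy_train), len(toy_val), len(toy_val_strict))
```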

In [21]:
input_feature_names = temp_feature_names + humidity_feature_names + u_wind_feature_names + v_wind_feature_names

In [22]:
preproc_dict = {}
for if1 in input_feature_names:
    scaler1 = sklearn.preprocessing.StandardScaler()
    scaler1.fit(train_df[[if1]])
    preproc_dict[if1] = scaler1
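Fitting each scaler on the training rows only means the validation data is later standardised with training statistics, so nothing about the held-out period leaks into preprocessing. The NumPy-only sketch below illustrates that idea on toy numbers; it mirrors what standardisation does (subtract the mean, divide by the population standard deviation) and is not the scikit-learn implementation.

```python
import numpy

# Toy single-feature columns (illustrative values only).
toy_train_col = numpy.array([[1.0], [2.0], [3.0], [4.0]])
toy_val_col = numpy.array([[2.5], [5.0]])

# Statistics come from the training data only.
train_mean = toy_train_col.mean(axis=0)
train_std = toy_train_col.std(axis=0)  # population standard deviation (ddof=0)

toy_train_scaled = (toy_train_col - train_mean) / train_std
toy_val_scaled = (toy_val_col - train_mean) / train_std

print(toy_train_scaled.ravel())
print(toy_val_scaled.ravel())
```

The scaled training column has zero mean and unit variance by construction; the validation column generally does not, which is expected.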

In [23]:
target_encoder = sklearn.preprocessing.LabelEncoder()
target_encoder.fit(train_df[target_feature_name])  # LabelEncoder expects a 1D array of labels

LabelEncoder()

In [24]:
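def preproc_input(data_subset, pp_dict):
    # Scale each input feature with its own fitted scaler, then stack the columns side by side
    return numpy.concatenate(
        [scaler1.transform(data_subset[[if1]]) for if1, scaler1 in pp_dict.items()],
        axis=1,
    )

def preproc_target(data_subset, enc1):
    # LabelEncoder expects a 1D array of labels, so select the column as a Series
    return enc1.transform(data_subset[target_feature_name])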

In [25]:
X_train = preproc_input(train_df, preproc_dict)
y_train = numpy.concatenate(
    [preproc_target(train_df, target_encoder).reshape((-1,1)),
    1.0 - (preproc_target(train_df, target_encoder).reshape((-1,1))),],
    axis=1
)

In [26]:
X_val = preproc_input(val_df, preproc_dict)
y_val = numpy.concatenate(
    [preproc_target(val_df, target_encoder).reshape((-1,1)),
    1.0 - (preproc_target(val_df, target_encoder).reshape((-1,1))),],
    axis=1
)
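The targets built above have two columns: the encoded label and its complement. A minimal sketch with toy labels shows the resulting layout, which gives one column per unit of a two-unit output layer and rows that each sum to one.

```python
import numpy

# Toy encoded labels, standing in for the output of the label encoder.
toy_labels = numpy.array([0, 1, 1, 0])

label_col = toy_labels.reshape((-1, 1))
toy_y = numpy.concatenate([label_col, 1.0 - label_col], axis=1)

print(toy_y.shape)
print(toy_y)
```

Note the column order: column 0 holds the encoded label itself and column 1 its complement, so a positive sample is `[1, 0]`.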

### Set up experiment tracking - MLflow

In [27]:
rse_rotors_experiment_name = 'rse_mlops_demo_rotors'

In [28]:
timestamp_template = '{dt.year:04d}{dt.month:02d}{dt.day:02d}T{dt.hour:02d}{dt.minute:02d}{dt.second:02d}'

In [29]:
rse_run_name_template = 'rse_rotors_{network_name}_' + timestamp_template
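The two templates combine into a single format string that takes a network name and a `datetime`. The sketch below formats it with a fixed, arbitrary timestamp so the result is reproducible, and checks it against the equivalent `strftime` pattern.

```python
import datetime

timestamp_template = (
    "{dt.year:04d}{dt.month:02d}{dt.day:02d}T{dt.hour:02d}{dt.minute:02d}{dt.second:02d}"
)
run_name_template = "rse_rotors_{network_name}_" + timestamp_template

# A fixed timestamp chosen for illustration; the notebook uses datetime.datetime.now().
fixed_time = datetime.datetime(2022, 8, 17, 19, 8, 52)
example_run_name = run_name_template.format(network_name="ffnn", dt=fixed_time)

print(example_run_name)  # rse_rotors_ffnn_20220817T190852
```

Because the timestamp sorts lexicographically, run names for the same network also sort chronologically in a run listing.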

In [30]:
mlflow_server_address = '127.0.0.1'
mlflow_server_port = mlflow_dash_port
mlflow_server_uri = f'http://{mlflow_server_address}:{mlflow_server_port:d}'
mlflow_server_uri

'http://127.0.0.1:5001'

In [31]:
mlflow.set_tracking_uri(mlflow_server_uri)

In [32]:
try: 
    print('creating experiment')
    rse_rotors_exp_id = mlflow.create_experiment(rse_rotors_experiment_name)
    rse_rotors_exp = mlflow.get_experiment(rse_rotors_exp_id)
except mlflow.exceptions.RestException:
    rse_rotors_exp = mlflow.get_experiment_by_name(rse_rotors_experiment_name)
rse_rotors_exp



creating experiment


<Experiment: artifact_location='/Users/stephen.haddad/data/ukrse2022/artifacts/1', experiment_id='1', lifecycle_stage='active', name='rse_mlops_demo_rotors', tags={}>

### Set up model architecture hyperparameters

In [33]:
def build_ffnn_model(hyperparameters, input_shape):
    """
    Build a feed-forward neural network in TensorFlow for predicting the occurrence
    of rotors (turbulent, orographically driven wind gusts).
    """
    model = tensorflow.keras.models.Sequential()
    model.add(tensorflow.keras.layers.Dropout(hyperparameters['drop_out_rate'], 
                                              input_shape=input_shape))
    for _ in range(hyperparameters['n_layers']):
        model.add(tensorflow.keras.layers.Dense(hyperparameters['n_nodes'], 
                                                activation=hyperparameters['activation'], 
                                                kernel_constraint=tensorflow.keras.constraints.max_norm(3)))
        model.add(tensorflow.keras.layers.Dropout(hyperparameters['drop_out_rate']))
    model.add(tensorflow.keras.layers.Dense(2, activation='softmax'))             # This is the output layer
    return model


In [34]:
nx = X_train.shape[1]
input_shape = (nx,)

In [35]:
hyperparameters_dict = {
    'initial_learning_rate': 1.0e-4,
    'drop_out_rate': 0.2,
    'n_epochs': 100,
    'batch_size': 1000,
    'n_nodes': 1000,
    'n_layers': 4,
    'activation': 'relu',
    'loss': 'mse'
}
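It can help to estimate the size of the network these hyperparameters describe before training it. The sketch below counts weights and biases by hand for a stack of dense layers followed by a two-unit output; `count_ffnn_parameters` is a hypothetical helper, and the input width of 88 assumes the four groups of 22 input features selected earlier (temperature, humidity, u wind, v wind).

```python
def count_ffnn_parameters(n_inputs, n_nodes, n_layers, n_outputs=2):
    """Count weights and biases of the dense layers; dropout layers add none."""
    total = 0
    fan_in = n_inputs
    for _ in range(n_layers):
        total += fan_in * n_nodes + n_nodes  # weight matrix plus bias vector
        fan_in = n_nodes
    total += fan_in * n_outputs + n_outputs  # output layer
    return total


n_inputs = 4 * 22  # assumption: 4 feature groups x 22 features each
n_params = count_ffnn_parameters(n_inputs, n_nodes=1000, n_layers=4)
print(f"{n_params:,} trainable parameters")
```

With these values most of the parameters sit in the three hidden-to-hidden weight matrices, so `n_nodes` dominates the model size.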

### Train model and set up monitoring

Tools:
* tensorboard
* tensorflow
* mlflow


In [36]:
log_dir_tensorboard = rse_root_data_dir / 'log_tensorboard' 
if not log_dir_tensorboard.is_dir():
    log_dir_tensorboard.mkdir()
    print(f'created tensorboard log directory {log_dir_tensorboard}')    
log_dir_tensorboard

PosixPath('/Users/stephen.haddad/data/ukrse2022/log_tensorboard')
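If the parent data directory might not exist yet, or if each run should log to its own subdirectory, `pathlib` can create the whole path in one call. This is a minimal sketch run inside a temporary directory; the subdirectory name `run_example` is hypothetical.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp_root:
    run_log_dir = pathlib.Path(tmp_root) / "log_tensorboard" / "run_example"
    # parents=True creates missing parent directories;
    # exist_ok=True makes the call safe to repeat.
    run_log_dir.mkdir(parents=True, exist_ok=True)
    run_log_dir.mkdir(parents=True, exist_ok=True)
    log_dir_created = run_log_dir.is_dir()

print(log_dir_created)
```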

In [37]:
tensorboard_callback = tensorflow.keras.callbacks.TensorBoard(log_dir=log_dir_tensorboard, 
                                                              histogram_freq=1)

In [38]:
log_dir_tensorboard

PosixPath('/Users/stephen.haddad/data/ukrse2022/log_tensorboard')

In [39]:
%tensorboard --logdir /Users/stephen.haddad/data/ukrse2022/log_tensorboard

In [40]:
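%%time
# Note: %%time must be the first line of the cell to time the training run; a bare %time line
# times an empty statement (hence the microsecond timings in the recorded output).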
current_run_name = rse_run_name_template.format(network_name='ffnn',
                                                dt=datetime.datetime.now()
                                               )
with mlflow.start_run(experiment_id=rse_rotors_exp.experiment_id, run_name=current_run_name) as current_run:
    rotors_ffnn_model = build_ffnn_model(hyperparameters=hyperparameters_dict,
                                     input_shape=input_shape,
                                    )
    rotors_ffnn_optimizer = tensorflow.optimizers.Adam(
        learning_rate=hyperparameters_dict['initial_learning_rate'])  
    
    rotors_ffnn_model.compile(optimizer=rotors_ffnn_optimizer, 
                          loss=hyperparameters_dict['loss'], 
                          metrics=[tensorflow.keras.metrics.RootMeanSquaredError()])
    
    history=rotors_ffnn_model.fit(
        X_train, 
        y_train, 
        validation_data=(X_val, 
                          y_val), 
        epochs=hyperparameters_dict['n_epochs'], 
        batch_size=hyperparameters_dict['batch_size'], 
        shuffle=True,
        verbose=0,
        callbacks=[tensorboard_callback],
    )    
    

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 7.87 µs


2022-08-17 19:08:52.222486: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-17 19:12:25.175249: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: /var/folders/w0/2x361bn95wj7lfgl33vksx1w0000gn/T/tmpl9fy9nkn/model/data/model/assets


In [44]:
rse_rotors_exp

<Experiment: artifact_location='/Users/stephen.haddad/data/ukrse2022/artifacts/1', experiment_id='1', lifecycle_stage='active', name='rse_mlops_demo_rotors', tags={}>

# Example: hyperparameter tuning with Ray

https://docs.ray.io/en/latest/tune/examples/tune_mnist_keras.html#tune-mnist-keras


In [None]:
import ray
import ray.tune
import ray.tune.schedulers 
import ray.tune.integration.keras

In [None]:
import rotors_hpt

In [None]:
import importlib

In [None]:
importlib.reload(rotors_hpt)

In [None]:
num_training_terations=20

In [None]:
ray.init(num_cpus=4, dashboard_port=ray_dash_port, dashboard_host='127.0.0.1')


In [None]:
# rse_rotors_sched = 
# set up local ray cluster

In [None]:
rotors_hpt_config = {
    'initial_learning_rate': ray.tune.uniform(1e-5,1e-3),
    'n_nodes': ray.tune.randint(100,500),
    'n_layers': ray.tune.randint(2,6)
}
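To get a feel for what this search space produces, the sketch below draws a few configurations with the Python standard library. It is not Ray's sampler, only an illustration of the same ranges; as with `ray.tune.randint`, the upper bound of the integer ranges is exclusive.

```python
import random


def sample_config(rng):
    """Draw one configuration from ranges matching the search space above."""
    return {
        "initial_learning_rate": rng.uniform(1e-5, 1e-3),
        "n_nodes": rng.randrange(100, 500),  # 100 to 499
        "n_layers": rng.randrange(2, 6),  # 2 to 5
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
toy_configs = [sample_config(rng) for _ in range(10)]
for toy_config in toy_configs[:3]:
    print(toy_config)
```

A uniform draw over the learning rate spends most samples near the upper end of the range on a log scale; a log-uniform distribution is a common alternative when the range spans several orders of magnitude.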

In [None]:
rotors_hpt_analysis = ray.tune.run(
    rotors_hpt.run_ml_pipeline,
    name="rse_rotors_hpt",
    scheduler=ray.tune.schedulers.AsyncHyperBandScheduler(
        time_attr="training_iteration", 
        max_t=400, 
        grace_period=20,
    ),
    metric="root_mean_squared_error",
    mode="min",  # RMSE is an error metric, so lower is better
    stop={"root_mean_squared_error": 0.99, 
          "training_iteration": num_training_terations},
    num_samples=10,
    resources_per_trial={"cpu": 2,
                         "gpu": 0},
    config=rotors_hpt_config,
    progress_reporter=ray.tune.JupyterNotebookReporter(overwrite=True),
    )
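The `metric` and `mode` arguments together define what "best" means for a trial. Root mean squared error is an error, so the best trial is the one with the lowest value. The sketch below makes that explicit on hypothetical trial results; the trial names and RMSE values are made up for illustration and `pick_best` is not part of Ray.

```python
# Hypothetical per-trial results; the RMSE values are made up for illustration.
toy_trials = [
    {"trial": "a", "n_layers": 2, "root_mean_squared_error": 0.41},
    {"trial": "b", "n_layers": 4, "root_mean_squared_error": 0.35},
    {"trial": "c", "n_layers": 5, "root_mean_squared_error": 0.52},
]


def pick_best(trials, metric, mode):
    """Return the trial with the lowest (mode='min') or highest (mode='max') metric."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    chooser = min if mode == "min" else max
    return chooser(trials, key=lambda trial: trial[metric])


best_by_min = pick_best(toy_trials, "root_mean_squared_error", mode="min")
best_by_max = pick_best(toy_trials, "root_mean_squared_error", mode="max")
print(best_by_min["trial"], best_by_max["trial"])
```

Selecting with `mode="max"` on an error metric returns the worst trial, which is why the mode has to match the direction of the metric.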

In [None]:
rotors_hpt_analysis.best_config  # assumption: the unfinished line was meant to inspect the best trial's configuration

# Example: save the model

* Use ONNX
* Use MLflow


In [None]:
# TODO: save model in MLflow

In [None]:
# stretch goal - save model in ONNX?

### Next Steps / Further Reading

### References
* MLflow
* scikit-learn
* TensorFlow
* TensorBoard