# ASMSA: Tune AAE model hyperparameters

**Previous step**
- [prepare.ipynb](prepare.ipynb): Download and sanity check input files

**Next steps**
- [train.ipynb](train.ipynb): Use results of previous tuning in more thorough training
- [md.ipynb](md.ipynb): Use a trained model in MD simulation with Gromacs

## Notebook setup

In [None]:
threads = 2
import os
os.environ['OMP_NUM_THREADS']=str(threads)
import tensorflow as tf

# PyTorch favours OMP_NUM_THREADS in environment
import torch

# TensorFlow needs explicit config calls
tf.config.threading.set_inter_op_parallelism_threads(threads)
tf.config.threading.set_intra_op_parallelism_threads(threads)

In [None]:
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
import mdtraj as md
import numpy as np
import urllib.request
from tensorflow import keras
import keras_tuner
import asmsa
from datetime import datetime

## Input files

All input files are prepared (up- or downloaded) in [prepare.ipynb](prepare.ipynb). 


In [None]:
exec(open('inputs.py').read())

## Load dataset

In [None]:
# load train dataset
X_train = tf.data.Dataset.load('datasets/intcoords/train')
X_train_np = np.stack(list(X_train))
X_train_np = X_train_np[:int(0.5*len(X_train_np))]

# load validation dataset
X_validate = tf.data.Dataset.load('datasets/intcoords/validate')
X_validate_np = np.stack(list(X_validate))

X_train_np.shape, X_validate_np.shape
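
The slice in the cell above keeps the first half of the training frames, in stored order. If the stored order follows simulation time (an assumption; `prepare.ipynb` may already shuffle the data), a random subsample would cover the whole trajectory instead. A minimal sketch on a synthetic array; `frames`, `rng`, and the array shape are illustrative and not taken from the ASMSA data:

```python
import numpy as np

# Synthetic stand-in for X_train_np: 10 frames, 3 features each (hypothetical shape)
frames = np.arange(30, dtype=np.float32).reshape(10, 3)

# What the notebook does: keep the first 50 % of frames in stored order
first_half = frames[:int(0.5 * len(frames))]

# Alternative: draw the same number of frames at random, without replacement
rng = np.random.default_rng(seed=0)
picked = np.sort(rng.choice(len(frames), size=len(first_half), replace=False))
random_half = frames[picked]

print(first_half.shape, random_half.shape)
```

Both variants halve the tuning cost; they differ only in which frames the tuner sees.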

## Hyperparameter definition
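Specify the hyperparameter ranges to search. `medium_hp` is the search space used for real tuning; `tiny_hp` is a reduced one intended for quick tests.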

In [None]:
medium_hp = {
    'activation' : ['relu','gelu','selu'],
    'ae_neuron_number_seed' : [32,96,128],
    'disc_neuron_number_seed' : [32,96],
    'ae_number_of_layers' : [2,5],
    'disc_number_of_layers' : [2,5],
    'batch_size' : [64,128,256,512],
    'optimizer' : ['Adam'],
    'learning_rate' : 0.0002,
    'ae_loss_fn' : ['MeanSquaredError'],
    'disc_loss_fn' : ['BinaryCrossentropy']
}

tiny_hp = {
    'activation' : ['relu'],
    'ae_neuron_number_seed' : [32,96],
    'disc_neuron_number_seed' : [32,96],
    'ae_number_of_layers' : [2,2],
    'disc_number_of_layers' : [3,3],
    'batch_size' : [64,128,256],
    'optimizer' : ['Adam'],
    'learning_rate' : 0.0002,
    'ae_loss_fn' : ['MeanSquaredError'],
    'disc_loss_fn' : ['BinaryCrossentropy']
}
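
To get a feeling for the size of the search space, the combinations in `medium_hp` can be counted. This is a rough sketch that assumes every list is a set of discrete choices and every scalar is a fixed value; `asmsa.AAEHyperModel` may interpret some two-element lists as (min, max) ranges, in which case the real number differs. The helper `choices` is hypothetical.

```python
from math import prod

# Copy of medium_hp from the cell above
medium_hp = {
    'activation' : ['relu','gelu','selu'],
    'ae_neuron_number_seed' : [32,96,128],
    'disc_neuron_number_seed' : [32,96],
    'ae_number_of_layers' : [2,5],
    'disc_number_of_layers' : [2,5],
    'batch_size' : [64,128,256,512],
    'optimizer' : ['Adam'],
    'learning_rate' : 0.0002,
    'ae_loss_fn' : ['MeanSquaredError'],
    'disc_loss_fn' : ['BinaryCrossentropy']
}

def choices(value):
    # ASSUMPTION: a list holds discrete choices, a scalar is a single fixed value
    return len(set(value)) if isinstance(value, list) else 1

n_combinations = prod(choices(v) for v in medium_hp.values())
print(n_combinations)  # 288 under the assumption above
```

Under this assumption a random search with a few dozen trials samples only part of the space, which is what a quick tuning pass is meant to do.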

## Sequential hyperparameter tuning

This approach is robust: it needs no Kubernetes environment for additional job submission, and it does not depend on the (currently broken) parallel tuning section below. A GPU in the notebook itself is strongly recommended to get reasonable speed.


In [None]:
# Just testing numbers of epochs and hyperparameter setting trials
# Don't expect anything meaningful
trials=50
epochs=30

# Set RESULTS_DIR env variable for results of tuning
os.environ['RESULTS_DIR'] = datetime.today().strftime("%m%d%Y-%H%M%S")
tuner = keras_tuner.RandomSearch(
    max_trials=trials,
    hypermodel=
        asmsa.AAEHyperModel(
            (X_validate_np.shape[1],),
            hp=medium_hp,
            prior=tfp.distributions.Normal(loc=0, scale=1)),
    objective=keras_tuner.Objective("score", direction="min"),
    directory="./results",
    project_name="Random",
    overwrite=True
)
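
The `RESULTS_DIR` name above uses a month-first timestamp (`%m%d%Y-%H%M%S`), so plain string sorting does not order runs chronologically across months or years. The sketch below shows the format and one way to find the newest run by parsing the names. The directory names in `runs` are hypothetical, and how `TuningAnalyzer` itself selects the latest tuning is not shown in this notebook.

```python
from datetime import datetime

FMT = "%m%d%Y-%H%M%S"  # same format string as in the cell above

# Format a fixed date to show what a directory name looks like
name = datetime(2023, 5, 9, 13, 52, 49).strftime(FMT)
print(name)  # 05092023-135249

# Hypothetical result directory names
runs = ["12312022-235959", "05092023-135249", "01022023-080000"]

# Parse each name back into a datetime before comparing
latest = max(runs, key=lambda r: datetime.strptime(r, FMT))
print(latest)  # 05092023-135249, whereas max(runs) would give 12312022-235959
```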

In [None]:
tuner.search(train=X_train_np,validation=X_validate_np,epochs=epochs,verbose=2)
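
Random search draws hyperparameter combinations at random, scores each one, and keeps the best. The toy sketch below illustrates the idea in plain Python; it is not how `keras_tuner` or `asmsa` are implemented. `search_space` and `toy_score` are hypothetical stand-ins for the real hyperparameters and the `score` objective.

```python
import random

# Hypothetical, reduced search space
search_space = {
    "batch_size": [64, 128, 256, 512],
    "activation": ["relu", "gelu", "selu"],
}

def toy_score(config):
    # Hypothetical objective, lower is better (stands in for a trained model's score)
    penalty = 0.0 if config["activation"] == "gelu" else 0.5
    return abs(config["batch_size"] - 128) / 128 + penalty

rng = random.Random(0)  # fixed seed for reproducibility
trials = []
for _ in range(10):
    # Sample one value per hyperparameter, independently
    config = {name: rng.choice(values) for name, values in search_space.items()}
    trials.append((toy_score(config), config))

# direction="min": the trial with the lowest score wins
best_score, best_config = min(trials, key=lambda t: t[0])
print(best_score, best_config)
```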

In [None]:
from asmsa.tuning_analyzer import TuningAnalyzer

# Create an analyzer object that analyses the results of tuning
# By default it uses the latest tuning, but another one can be chosen with the tuning flag,
#  e.g. TuningAnalyzer(tuning='analysis/05092023-135249')
analyzer = TuningAnalyzer()

In [None]:
# Get hyperparameters sorted by score; the 10 best are returned by default. For a different number:
#  analyzer.get_best_hp(num_trials=3)
analyzer.get_best_hp()

In [None]:
# Matplotlib visualization: not the recommended way. It does not look great and does not scale
#  well, but at least the colors are consistent across measures. It could look better with more work.
# - By default, the best 10 trials are visualized
# - A single trial can be selected... analyzer.visualize_tuning(trial='15d9fa928a7517004bcb28771bb6e5f17ad66dd7013c6aa1572a84773d91393c')
# - The number of best trials to visualize can be set... analyzer.visualize_tuning(num_trials=3)
analyzer.visualize_tuning()

In [None]:
# Recommended option via TensorBoard. This function populates TB events,
#  which can then be viewed natively in TensorBoard.
# May not work in all JupyterHub setups, though.

# By default it chooses the latest tuning and populates its _TB directory, e.g. analysis/05092023-135249/_TB
# - Can override the output directory... analyzer.populate_TB(out_dir='MyTBeventDir')
# - Can choose only specific trials via a list... analyzer.populate_TB(trials=['15d9fa928a7517004bcb28771bb6e5f17ad66dd7013c6aa1572a84773d91393c'])
# - Can select how many of the best trials to visualize... analyzer.populate_TB(num_trials=3)
analyzer.populate_TB(num_trials=3)

In [None]:
%load_ext tensorboard
%tensorboard --logdir analysis

## Parallel hyperparameter tuning

**BROKEN**: ignore the rest of this notebook for the time being.

In [None]:
# Finally, this is the real stuff
# medium settings known to be working for trpcage

epochs=15
trials=3
hp=medium_hp

# testing only
#epochs=8
#trials=6
#hp=tiny_hp

In [None]:
# number of parallel workers; each runs a single trial at a time
# choose it as a balance between resource availability and the size of the problem
# currently each slave runs on 4 cores and 4 GB RAM (hardcoded in src/asmsa/tunewrapper.py)

slaves=3
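
The total resource request follows directly from the per-worker figures quoted in the comment above. The sketch below does the arithmetic; the `rounds` estimate additionally assumes that all trials take about the same time, which is a simplification. The constant names are illustrative.

```python
from math import ceil

slaves = 3   # number of parallel workers, as set above
trials = 3   # number of trials, as set in the earlier cell

# Per-worker figures quoted from the comment above (hardcoded in src/asmsa/tunewrapper.py)
CORES_PER_WORKER = 4
RAM_GB_PER_WORKER = 4

total_cores = slaves * CORES_PER_WORKER
total_ram_gb = slaves * RAM_GB_PER_WORKER

# ASSUMPTION: equally long trials, so the workers proceed in lockstep "rounds"
rounds = ceil(trials / slaves)

print(f"{total_cores} cores, {total_ram_gb} GB RAM, about {rounds} round(s) of trials")
```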

In [None]:
# XXX: Kubernetes magic: find out names of container image and volume
# check the result, it can go wrong

with open('IMAGE') as img:
    image=img.read().rstrip()

import re
mnt=os.popen('mount | grep /home/jovyan').read()
pvcid=re.search('pvc-[0-9a-z-]+',mnt).group(0)
pvc=os.popen(f'kubectl get pvc | grep {pvcid} | cut -f1 -d" "').read().rstrip()

print(f"""\
image: {image}
volume: {pvc}
""")

In [None]:
# Python wrapper around scripts that prepare and execute parallel Keras Tuner in Kubernetes
from asmsa.tunewrapper import TuneWrapper

wrapper = TuneWrapper(ds=X_validate_np,hp=hp,output=datetime.today().strftime("%m%d%Y-%H%M%S"),epochs=epochs,trials=trials,pdb=conf,top=topol,xtc=traj,ndx=index, pvc=pvc)

In [None]:
# Necessary but destructive cleanup before hyperparameter tuning

# DON'T RUN THIS CELL BLINDLY
# it kills any running processes including the workers, and it purges previous results

!kubectl delete job/tuner
!kill $(ps ax | grep tuning.py | awk '{print $1}')
!rm -rf results

In [None]:
# start the master (chief) of tuners in background
# the computation takes rather long; this is a more robust approach than keeping it in the notebook

wrapper.master_start()

In [None]:
# therefore one should check the status occasionally; it should show a tuning.py process running
print(wrapper.master_status())

In [None]:
# spawn the requested number of workers as a separate Kubernetes job with several pods
# they receive work from the master (chief) started above

wrapper.workers_start(num=slaves)

In [None]:
# This status should show {slaves} pods; all of them start in the Pending state, then pass through ContainerCreating
# to Running, and finally Completed

# This takes time, minutes to hours depending on size of the model, number of trials, and number of slaves
# Run this cell repeatedly, waiting until all the pods are completed

wrapper.workers_status()

In [None]:
# Same steps for analysis as with serial tuning
analyzer = TuningAnalyzer()
analyzer.get_best_hp()

In [None]:
# We can choose output dir for TB event this time
out = 'dist_tuning'

analyzer.populate_TB(out_dir=out)

In [None]:
# Might need to kill previous tensorboard instance to change logdir
!pkill -f 'tensorboard'

%load_ext tensorboard
%tensorboard --logdir $out