# ASMSA: Tune AAE model hyperparameters

**Previous step**
- [prepare.ipynb](prepare.ipynb): Download and sanity check input files

**Next steps**
- [train.ipynb](train.ipynb): Use results of previous tuning in more thorough training
- [md.ipynb](md.ipynb): Use a trained model in MD simulation with Gromacs

## Notebook setup

In [1]:
#%cd villin

In [2]:
threads = 2
import os
os.environ['OMP_NUM_THREADS']=str(threads)
import tensorflow as tf

# PyTorch favours OMP_NUM_THREADS in environment
import torch

# TensorFlow needs explicit config calls
tf.config.threading.set_inter_op_parallelism_threads(threads)
tf.config.threading.set_intra_op_parallelism_threads(threads)

2025-04-14 14:20:37.039328: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-04-14 14:20:37.054224: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-04-14 14:20:37.058703: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-14 14:20:37.071726: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [3]:
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
import mdtraj as md
import numpy as np
import urllib.request
from tensorflow import keras
import keras_tuner
import asmsa
from datetime import datetime

2025-04-14 14:20:40.472373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1204 MB memory:  -> device: 0, name: NVIDIA A100 80GB PCIe MIG 1g.10gb, pci bus id: 0000:61:00.0, compute capability: 8.0


## Input files

All input files are prepared (up- or downloaded) in [prepare.ipynb](prepare.ipynb). 


In [4]:
# Execute the definitions in inputs-des.py, closing the file afterwards
with open('inputs-des.py') as f:
    exec(f.read())

## Load dataset

In [5]:
# load train dataset
X_train = tf.data.Dataset.load('datasets/intcoords/train')
X_train_np = np.stack(list(X_train))
X_train_np = X_train_np[:int(0.5*len(X_train_np))]

# load validation dataset
X_validate = tf.data.Dataset.load('datasets/intcoords/validate')
X_validate_np = np.stack(list(X_validate))

X_train_np.shape, X_validate_np.shape

2025-04-14 14:20:45.964989: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
2025-04-14 14:20:46.841494: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


((15532, 1355), (7830, 1355))

## Hyperparameter definition
Specify hyperparameter ranges

In [10]:
medium_hp = {
    'activation' : ['selu'],
    'ae_neuron_number_seed' : [32,64,96,128],
    'disc_neuron_number_seed' : [32,64,96],
    'ae_number_of_layers' : [2,3,5],
    'disc_number_of_layers' : [2,3,5],
    'batch_size' : [64,128],
    'optimizer' : ['Adam'],
    'learning_rate' : 0.0002,
    'ae_loss_fn' : ['MeanSquaredError'],
    'disc_loss_fn' : ['BinaryCrossentropy']
}
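
To get a feel for how large this search space is, the number of distinct configurations can be counted directly. The sketch below repeats the dictionary from the cell above and assumes that a scalar entry (such as `learning_rate`) counts as a single fixed choice; how `asmsa.AAEHyperModel` actually interprets scalar values is not shown in this notebook. The helper name `search_space_size` is made up for illustration.

```python
import math

medium_hp = {
    'activation' : ['selu'],
    'ae_neuron_number_seed' : [32,64,96,128],
    'disc_neuron_number_seed' : [32,64,96],
    'ae_number_of_layers' : [2,3,5],
    'disc_number_of_layers' : [2,3,5],
    'batch_size' : [64,128],
    'optimizer' : ['Adam'],
    'learning_rate' : 0.0002,
    'ae_loss_fn' : ['MeanSquaredError'],
    'disc_loss_fn' : ['BinaryCrossentropy']
}

def search_space_size(hp):
    """Count the combinations; a scalar value is treated as one fixed choice (assumption)."""
    return math.prod(len(v) if isinstance(v, (list, tuple)) else 1
                     for v in hp.values())

n_configs = search_space_size(medium_hp)
print(n_configs)  # 216
```

With 216 combinations, a random search limited to 50 trials can visit at most about 23% of the grid.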

## Sequential hyperparameter tuning

This approach is robust: it does not require a Kubernetes environment for additional job submission, and it does not depend on the (currently broken) parallel tuning section below. A GPU available to the notebook itself is strongly recommended to get reasonable speed.


In [11]:
# Just testing numbers of epochs and hyperparameter setting trials
# Don't expect anything meaningful
trials=50
epochs=30

# Set RESULTS_DIR env variable for results of tuning
os.environ['RESULTS_DIR'] = datetime.today().strftime("%m%d%Y-%H%M%S")
tuner = keras_tuner.RandomSearch(
    max_trials=trials,
    hypermodel=
        asmsa.AAEHyperModel(
            (X_validate_np.shape[1],),
            hp=medium_hp,
            prior=tfp.distributions.Normal(loc=0, scale=1)),
    objective=keras_tuner.Objective("score", direction="min"),
    directory="./results",
    project_name="Random",
    overwrite=True
)

In [12]:
tuner.search(train=X_train_np,validation=X_validate_np,epochs=epochs,verbose=2)

Trial 2 Complete [00h 00m 12s]

Best score So Far: None
Total elapsed time: 00h 00m 24s

Search: Running Trial #3

Value             |Best Value So Far |Hyperparameter
selu              |selu              |activation
3                 |5                 |ae_number_of_layers
2                 |2                 |disc_number_of_layers
128               |128               |batch_size
Adam              |Adam              |optimizer
0.0002            |0.0002            |learning_rate
MeanSquaredError  |MeanSquaredError  |ae_loss_fn
BinaryCrossentropy|BinaryCrossentropy|disc_loss_fn

Trial ID: bda123c7238d4cf34987bb34685e25a9749df4734052d01ef3e14b3710d3ad7d


2025-04-14 14:22:21.471507: W external/local_tsl/tsl/framework/bfc_allocator.cc:482] Allocator (GPU_0_bfc) ran out of memory trying to allocate 485.67MiB (rounded to 509263360)requested by op Pack
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
2025-04-14 14:22:21.471559: I external/local_tsl/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2025-04-14 14:22:21.471575: I external/local_tsl/tsl/framework/bfc_allocator.cc:1046] Bin (256): 	Total Chunks: 83, Chunks in use: 83. 20.8KiB allocated for chunks. 20.8KiB in use in bin. 765B client-requested in use in bin.
2025-04-14 14:22:21.471585: I external/local_tsl/tsl/framework/bfc_allocator.cc:1046] Bin (512): 	Total Chunks: 25, Chunks in use: 25. 13.2KiB allocated for chunks. 13.2KiB in use in bin. 12.6KiB client-requested in use in bin.
2025-04-14 14:22:21.471592

RuntimeError: Number of consecutive failures exceeded the limit of 3.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 274, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 239, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 233, in _build_and_fit_model
    results = self.hypermodel.fit(hp, model, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/asmsa/aae_hyper_model.py", line 212, in fit
    super().fit(hp, model, train, callbacks=callbacks + [logcb], **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/keras_tuner/src/engine/hypermodel.py", line 149, in fit
    return model.fit(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/opt/conda/lib/python3.10/site-packages/asmsa/aae_hyper_model.py", line 89, in on_train_begin
    self.multibatch = tf.stack([self.valdata]*self.model.n_models,axis=1)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: {{function_node __wrapped__Pack_N_12_device_/job:localhost/replica:0/task:0/device:GPU:0}} OOM when allocating tensor with shape[7830,12,1355] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Pack] name: stack
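
The size of the failed allocation can be checked by hand from the shape in the error message. The sketch below assumes 4 bytes per element (the message reports type float, i.e. float32) and rounding up to a multiple of 256 bytes; the rounding rule is inferred from the numbers in the log, not taken from the allocator's documentation.

```python
import math

shape = (7830, 12, 1355)   # shape reported in the ResourceExhaustedError
bytes_per_float32 = 4      # assumption: "type float" means 32-bit floats

nbytes = math.prod(shape) * bytes_per_float32
mib = nbytes / 2**20
rounded = math.ceil(nbytes / 256) * 256   # assumed 256-byte allocation granularity

print(nbytes)        # 509263200
print(f"{mib:.2f}")  # 485.67
print(rounded)       # 509263360
```

Both figures agree with the log (485.67MiB, rounded to 509263360). The GPU device was created with 1204 MB of memory, so this single stacked copy of the validation set takes roughly 40% of it before any model weights or activations are counted.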


In [None]:
from asmsa.tuning_analyzer import TuningAnalyzer

# Create analyzer object that analyses results of tuning
# By default it is the latest tuning, but another one can be chosen with the tuning argument,
#  e.g. TuningAnalyzer(tuning='analysis/05092023-135249')
analyzer = TuningAnalyzer()
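
The tuning directory named in the comment above follows the `%m%d%Y-%H%M%S` pattern (month first), so sorting such names as plain strings does not give chronological order. If a specific run has to be picked by hand, the names can be parsed back into timestamps. This is a minimal sketch: the helper `latest_run` and all directory names except the one from the comment are made up, and it does not reflect how `TuningAnalyzer` itself selects the latest tuning.

```python
from datetime import datetime

FMT = "%m%d%Y-%H%M%S"

def latest_run(names):
    """Return the name whose embedded timestamp is the most recent."""
    return max(names, key=lambda n: datetime.strptime(n, FMT))

# Only the first name appears in the notebook; the others are hypothetical.
runs = ["05092023-135249", "12312022-235959", "04142025-142037"]

print(datetime.strptime(runs[0], FMT))  # 2023-05-09 13:52:49
print(sorted(runs)[-1])                 # 12312022-235959 (string order, not the newest)
print(latest_run(runs))                 # 04142025-142037
```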

In [None]:
# Get hyperparameters sorted by score; the 10 best by default.
# For a different number: analyzer.get_best_hp(num_trials=3)
analyzer.get_best_hp()

In [None]:
# Matplotlib visualization. Not the recommended way: it does not look that good and does not scale
#  that well, but at least the colors are consistent across measures. It could look better after more work.
# - By default visualizes the best 10 trials
# - Can specify only one specific trial... analyzer.visualize_tuning(trial='15d9fa928a7517004bcb28771bb6e5f17ad66dd7013c6aa1572a84773d91393c')
# - Can specify the number of best trials to be visualized... analyzer.visualize_tuning(num_trials=3)
analyzer.visualize_tuning()

In [None]:
# Recommended option via TensorBoard. This function populates TB events
#  which can be viewed natively in TensorBoard.
# May not work in all JupyterHub setups, though.

# By default it chooses the latest tuning and populates its _TB directory, e.g.: analysis/05092023-135249/_TB
# - Can override the directory to populate... analyzer.populate_TB(out_dir='MyTBeventDir')
# - Can choose only specific trials via a list... analyzer.populate_TB(trials=['15d9fa928a7517004bcb28771bb6e5f17ad66dd7013c6aa1572a84773d91393c'])
# - Can select how many best trials to visualize... analyzer.populate_TB(num_trials=3)
analyzer.populate_TB(num_trials=3)

In [None]:
%load_ext tensorboard
%tensorboard --logdir analysis

## Parallel hyperparameter tuning

**BROKEN**, ignore the rest of this notebook for the time being

In [None]:
# Finally, this is the real stuff
# medium settings known to be working for trpcage

epochs=15
trials=3
hp=medium_hp

# testing only
#epochs=8
#trials=6
#hp=tiny_hp

In [None]:
# number of parallel workers, each runs a single trial at a time
# balance between resource availability and size of the problem
# currently each slave runs on 4 cores and 4 GB RAM (hardcoded in src/asmsa/tunewrapper.py)

slaves=3

In [None]:
# XXX: Kubernetes magic: find out names of container image and volume
# check the result, it can go wrong

with open('IMAGE') as img:
    image=img.read().rstrip()

import re
mnt=os.popen('mount | grep /home/jovyan').read()
pvcid=re.search('pvc-[0-9a-z-]+',mnt).group(0)
pvc=os.popen(f'kubectl get pvc | grep {pvcid} | cut -f1 -d" "').read().rstrip()

print(f"""\
image: {image}
volume: {pvc}
""")
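
Because `re.search` returns `None` when nothing matches, the lookup above fails with an `AttributeError` if the home directory is not mounted from a PVC. A more defensive variant could look like the sketch below; the helper name `find_pvc_id` and the sample mount lines are hypothetical and only illustrate the pattern matching.

```python
import re

def find_pvc_id(mount_output):
    """Return the first 'pvc-...' identifier in the text, or None if absent."""
    match = re.search(r'pvc-[0-9a-z-]+', mount_output)
    return match.group(0) if match else None

# Hypothetical mount line; real output depends on the cluster storage backend.
sample = ("10.0.0.1:/export/pvc-1a2b3c4d-0000-4e5f-8a9b-abcdef012345 "
          "on /home/jovyan type nfs4 (rw,relatime)")

print(find_pvc_id(sample))                            # pvc-1a2b3c4d-0000-4e5f-8a9b-abcdef012345
print(find_pvc_id("overlay on / type overlay (rw)"))  # None
```

Checking for `None` before calling `kubectl` makes the "it can go wrong" case visible as a clear message instead of a traceback.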

In [None]:
# Python wrapper around scripts that prepare and execute parallel Keras Tuner in Kubernetes
from asmsa.tunewrapper import TuneWrapper

wrapper = TuneWrapper(ds=X_validate_np,hp=hp,output=datetime.today().strftime("%m%d%Y-%H%M%S"),epochs=epochs,trials=trials,pdb=conf,top=topol,xtc=traj,ndx=index, pvc=pvc)

In [None]:
# Necessary but destructive cleanup before hyperparameter tuning

# DON'T RUN THIS CELL BLINDLY
# it kills any running processes including the workers, and it purges previous results

!kubectl delete job/tuner
# The [t] in the pattern keeps the command from matching its own command line
!pkill -f '[t]uning.py'
!rm -rf results

In [None]:
# start the master (chief) of tuners in background
# the computation takes rather long, this is a more robust approach than keeping it in the notebook

wrapper.master_start()

In [None]:
# therefore one should check the status occasionally; it should show a tuning.py process running
print(wrapper.master_status())

In [None]:
# spawn the requested number of workers as a separate Kubernetes job with several pods
# they receive work from the master (chief) started above

wrapper.workers_start(num=slaves)

In [None]:
# This status should show {slaves} pods; all of them start in the Pending state, pass through ContainerCreating
# to Running, and finally reach Completed

# This takes time, minutes to hours depending on the size of the model, number of trials, and number of slaves
# Run this cell repeatedly, waiting until all the pods are completed

wrapper.workers_status()
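
Instead of re-running the status cell by hand, the check can be wrapped in a simple polling loop. The sketch below is generic: `poll_until` is a hypothetical helper, and `check` stands for whatever condition decides that all pods are completed. How to derive that condition from `wrapper.workers_status()` is not specified in this notebook, so a stand-in function is used here.

```python
import time

def poll_until(check, interval_s=60.0, timeout_s=3600.0):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Stand-in for a real status check: reports completion on the third call.
calls = {"n": 0}

def fake_all_completed():
    calls["n"] += 1
    return calls["n"] >= 3

done = poll_until(fake_all_completed, interval_s=0.01, timeout_s=5.0)
print(done, calls["n"])  # True 3
```

For real tuning runs an interval of a minute or more is likely sufficient, since the pods take minutes to hours to finish.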

In [None]:
# Same steps for analysis as with serial tuning
analyzer = TuningAnalyzer()
analyzer.get_best_hp()

In [None]:
# We can choose output dir for TB event this time
out = 'dist_tuning'

analyzer.populate_TB(out_dir=out)

In [None]:
# Might need to kill previous tensorboard instance to change logdir
!pkill -f 'tensorboard'

%load_ext tensorboard
%tensorboard --logdir $out