# Training Neural Networks on DICE

### This notebook runs through two quick examples of training neural networks: a) on a single machine with six GPUs and b) scaling out to a cluster of GPU nodes.

Please note that option b is still under development at the time of writing, so get in touch if your workflow requires these resources. Alternatively, CERN staff can request cloud compute vouchers for either Amazon Web Services (AWS) or Google Cloud Platform (GCP) to use Tensor Processing Units (TPUs) by sending a SNOW ticket for resource evaluation to the IT department.

### Step 1: Environment

Set up either a Python virtual environment or a conda environment in the usual way. If you're creating a new environment from scratch, make sure your `environment.yml` file lists the `tensorflow-gpu` and `keras-tuner` dependencies. If you want to add these dependencies to an existing environment, execute: `conda install -c anaconda tensorflow-gpu && conda install -c conda-forge keras-tuner`
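Before connecting to the GPU node it can help to confirm that the environment really contains both packages. The snippet below is a minimal sketch using only the standard library; the import names `tensorflow` and `keras_tuner` are assumptions about how the two conda packages are imported (older Keras Tuner releases used `kerastuner` instead), and `missing_modules` is a hypothetical helper, not part of any library.

```python
import importlib.util

# Assumed import names for the conda packages tensorflow-gpu and keras-tuner.
# Older Keras Tuner releases are imported as "kerastuner" instead.
REQUIRED_MODULES = ["tensorflow", "keras_tuner"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it inside the activated environment; anything reported as missing can be installed with the `conda install` commands above.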

### Step 2: Connect to worker

- The IP address of the node with the GPUs: `10.129.5.43`
- Now execute: `ssh <my_dice_username>@10.129.5.43`
- This machine still mounts all storage systems available on DICE (big files are best kept in the `/scratch/$USER` directory).

- Note: these machines don't yet support dynamic allocation, so check they're not currently in use by others. To monitor the NVIDIA GPU cards, execute `nvidia-smi`, or to track usage continuously, `watch -d -n 0.5 nvidia-smi`

- If you want to kill a running GPU job, execute: `kill -9 <PID>`
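Since the node has no dynamic allocation, the "is anyone using it?" check can also be scripted. The sketch below parses the CSV output of `nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits` and lists GPUs that look idle. The helper names and the idle thresholds (100 MiB, 5 %) are illustrative assumptions, and the sample text is made up, not output from the DICE node.

```python
import subprocess


def parse_gpu_usage(csv_text):
    """Parse 'index, memory.used, utilization.gpu' CSV rows into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, memory_mib, util_pct = (field.strip() for field in line.split(","))
        gpus.append({
            "index": int(index),
            "memory_mib": int(memory_mib),
            "util_pct": int(util_pct),
        })
    return gpus


def idle_gpus(gpus, max_memory_mib=100, max_util_pct=5):
    """Return indices of GPUs below both thresholds (thresholds are assumptions)."""
    return [
        gpu["index"]
        for gpu in gpus
        if gpu["memory_mib"] <= max_memory_mib and gpu["util_pct"] <= max_util_pct
    ]


def query_gpus():
    """Call nvidia-smi on the current machine; requires the NVIDIA driver."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_usage(result.stdout)


# Made-up sample output: GPU 0 is busy, GPUs 1 and 2 are idle.
sample = "0, 10240, 97\n1, 3, 0\n2, 5, 0\n"
print(idle_gpus(parse_gpu_usage(sample)))  # [1, 2]
```

On the GPU node itself, `idle_gpus(query_gpus())` would give the same kind of list for the real cards. Some drivers report `[N/A]` for certain fields, which this simple parser does not handle.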

### Step 3: Training

Example training script:

In [3]:
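#import dependencies

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Flatten, MaxPooling2D)
from keras_tuner import RandomSearch  #older releases: from kerastuner.tuners import RandomSearch
from sklearn.model_selection import train_test_split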

In [None]:
#Memory management: let TensorFlow allocate GPU memory on demand rather than
#reserving it all up front. This must run before the GPUs are first used.

gpus = tf.config.list_physical_devices('GPU')
all_devices = len(gpus)
print("Num GPUs Available: ", all_devices)
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

#Replicate the model across the six GPUs on this machine
mirrored_strategy = tf.distribute.MirroredStrategy(devices=[f"/GPU:{GPU_id}" for GPU_id in range(6)])


In [None]:
#Define some model, e.g. two convolutional feature extraction layers 
#and a feed forward classification layer
#we will also perform hyperparameter search

def create_model(hp):
    with mirrored_strategy.scope():
        model = tf.keras.Sequential()
        for i in range (1, hp.Int("conv_layers",3,4)):
            if i == 1:
                model.add(Conv2D(4, kernel_size=hp.Choice('kernel_size', values=[2,3,4]), input_shape=(20,12,1),  padding='same'))
            else:
                model.add(Conv2D(8, kernel_size=hp.Choice('kernel_size', values=[2,3,4]), padding='same'))
            model.add(MaxPooling2D((2,2), padding='same'))
            model.add(BatchNormalization(axis=1))
            model.add(Activation('relu'))

        model.add(Flatten())

        for j in range (1, hp.Int("FCN_layers",3,4)):
            model.add(Dense(128))
            model.add(BatchNormalization(axis=1))
            model.add(Activation('relu'))

        model.add(Dense(1))
        model.add(BatchNormalization(axis=1))
        model.add(Activation('sigmoid'))

        model.build(input_shape=(None,20,12,1))  #leading None is the batch dimension

        opt = tf.keras.optimizers.SGD(
            learning_rate=hp.Choice("lr", values=[0.1,0.01])
        )

        model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
        return model


In [13]:
#use Keras tuner to define hyperparameter search
tuner=RandomSearch(create_model,
    objective='val_loss',
    max_trials=5,
    overwrite=True)

In [None]:
#load in your dataset
data = np.load('/software/ys20884/training_data/data_hh4b_20x12_160000.npz')
train_X= data['train_X']      
train_y = data['train_y']   
test_X = data['test_X']
test_y = data['test_y']

In [None]:
#further split into validation set
#could also use k-fold cross validation
x_train, x_valid, y_train, y_valid = train_test_split(train_X, train_y, test_size=0.15, shuffle=True)


In [None]:
tuner.search(x_train, y_train, epochs=1,  validation_data=(x_valid,y_valid), workers=all_devices)


In [None]:
#print your best found parameters (good idea to also save these!)

tuner.results_summary()  #prints the summary itself and returns None

bestHP = tuner.get_best_hyperparameters(num_trials=1)[0]
print("[INFO] optimal conv_layers setting: {}".format(
    bestHP.get("conv_layers")))
print("[INFO] optimal FCN_layers setting: {}".format(
    bestHP.get("FCN_layers")))
print("[INFO] optimal kernel size: {}".format(
    bestHP.get("kernel_size")))
print("[INFO] optimal learning rate: {:.4f}".format(
    bestHP.get("lr")))
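One simple way to save the best parameters is to write them to a JSON file. The sketch below uses only the standard library; `save_hyperparameters` and `load_hyperparameters` are hypothetical helper names, and the example dictionary holds placeholder values, not results from a real search. It assumes the tuner's values are available as a plain dictionary (in Keras Tuner this is expected to be `bestHP.values`, but check your installed version).

```python
import json
import tempfile
from pathlib import Path


def save_hyperparameters(values, path):
    """Write a dict of hyperparameter values to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(values, indent=2, sort_keys=True))
    return path


def load_hyperparameters(path):
    """Read hyperparameter values back from a JSON file."""
    return json.loads(Path(path).read_text())


# Placeholder values for illustration only, not the outcome of a real search.
# In the notebook you would pass the tuner's values dict instead (assumed: bestHP.values).
example_values = {"conv_layers": 3, "FCN_layers": 4, "kernel_size": 3, "lr": 0.01}

# A temporary directory keeps the demo self-contained; use a persistent path in practice.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_hyperparameters(example_values, Path(tmp_dir) / "best_hyperparameters.json")
    restored = load_hyperparameters(saved_path)

print(restored == example_values)  # True
```

Keeping the file next to the trained model makes it easier to reproduce a run later without repeating the search.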

In [14]:
# you can now train with these optimal parameters:
model = tuner.hypermodel.build(bestHP)
history = model.fit(train_X, train_y, epochs=50, validation_split=0.2)

### Troubleshooting

If TensorFlow is not seeing the CUDA devices (i.e. `all_devices` is 0), try executing the following commands:

- `pip uninstall tensorflow`
- `pip uninstall tensorflow-gpu`
- `pip install --upgrade --force-reinstall tensorflow-gpu`
- `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5`
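The last command can also be applied from inside a notebook, which is handy when the kernel was not started from a shell with the variable exported. This is a minimal sketch; `visible_devices_value` is a hypothetical helper, and the count of six matches the machine described above. The variable should be set before TensorFlow is imported, because CUDA reads it when it is first initialised.

```python
import os


def visible_devices_value(num_gpus):
    """Build the comma-separated device list, e.g. 3 -> '0,1,2'."""
    return ",".join(str(index) for index in range(num_gpus))


# Set this before `import tensorflow`; changing it afterwards has no effect
# on a process that has already initialised CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_value(6)
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0,1,2,3,4,5
```

Passing a smaller number, or a hand-written list such as `"2,3"`, restricts the job to a subset of the cards so others can use the rest.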

### Part B) Training on cluster w/ Dask

NOTE: the backend is still under development and will sometimes experience memory (RAM) issues.

In [None]:
from dask.distributed import Client, SSHCluster
import asyncio, asyncssh, sys
#The first host in the list runs the scheduler; the remaining hosts run workers
cluster = SSHCluster(
    ["10.129.5.2", "deepthought.phy.bris.ac.uk"],
    connect_options={"known_hosts": None,
                     "username": "",   #fill in your own credentials
                     "password": ""},
    scheduler_options={"port": 0, "dashboard_address": ":8797"})
    #optional extra argument for GPU-aware workers: worker_class="dask_cuda.CUDAWorker"
client = Client(cluster)

In [None]:
client.scheduler_info()['services']

In [None]:
client

In [None]:
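import os
from multiprocessing.pool import ThreadPool
import dask
#run Dask tasks on a single thread per worker process
dask.config.set(pool=ThreadPool(1))
#let TensorFlow grow GPU memory on demand instead of reserving it all
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'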

In [None]:
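#assumed import: the legacy scikit-learn wrapper bundled with older TensorFlow releases
#(SciKeras provides a maintained equivalent)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
#NOTE: here create_model must accept these keyword arguments (kernel_size, pool_size, ...),
#unlike the Keras Tuner version in Part A, which takes a single `hp` argument
model = KerasClassifier(build_fn=create_model, batch_size=2, epochs=1, kernel_size=(3,3), pool_size=(2,2), dropout=0.1, conv_layers=1, hidden_layers=1, FCN_dense=24, CNN_dense=4,lr=0.1, momentum=0.5)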

In [None]:
param_grid = {
    'hidden_layers' : [3,4],
    'conv_layers' : [3,4],
    'lr': [0.1,0.01],
    'momentum': [0.05],
    'kernel_size': [(4,4),(3,3)],
    'pool_size': [(2,2)],
    'epochs': [10,20,50],
    'FCN_dense': [64], 
    'batch_size': [32],
    'dropout': [0.01, 0]
}

In [None]:
from sklearn.model_selection import GridSearchCV

kfold_splits = 4
grid = GridSearchCV(estimator=model,  
                    return_train_score=True,
                    cv=kfold_splits,
                    param_grid=param_grid)
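Given the RAM issues noted above, it is worth estimating how much work a grid search will submit before launching it. The sketch below counts the parameter combinations in the grid defined above and multiplies by the number of folds. `count_fits` is a hypothetical helper, and the grid is copied here only so the snippet runs on its own.

```python
from math import prod

# Copy of the grid and fold count defined above, repeated so this runs standalone.
param_grid = {
    'hidden_layers': [3, 4],
    'conv_layers': [3, 4],
    'lr': [0.1, 0.01],
    'momentum': [0.05],
    'kernel_size': [(4, 4), (3, 3)],
    'pool_size': [(2, 2)],
    'epochs': [10, 20, 50],
    'FCN_dense': [64],
    'batch_size': [32],
    'dropout': [0.01, 0],
}
kfold_splits = 4


def count_fits(grid, n_splits):
    """Return (number of parameter combinations, number of cross-validation fits)."""
    n_combinations = prod(len(values) for values in grid.values())
    return n_combinations, n_combinations * n_splits


n_combinations, n_fits = count_fits(param_grid, kfold_splits)
print(n_combinations, n_fits)  # 96 384
```

So this grid trains 384 models during cross-validation, plus one final refit on the full training set if `refit` is left at its default of `True`. Many of those runs use 20 or 50 epochs, so trimming the grid is the quickest way to reduce load on the cluster.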

In [None]:
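import joblib

#run the scikit-learn grid search on the Dask cluster
with joblib.parallel_backend('dask'):
    grid.fit(train_X, train_y)
    print(grid.best_params_)

#shut down the scheduler and workers, then close the client connection
client.shutdown()
client.close()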