# DRYML Tutorial 6 - Hyperparameter Searches with `ray`

## Ray tune

One primary use case for DRYML is hyperparameter tuning of your models. With `ObjectDef` providing an easy way to define model templates, and trainables providing a uniform training interface, we can slot DRYML into the ray hyperparameter tuning process.

We'll briefly introduce `ray`, and write a simple hyperparameter tuning example.

Ray is a platform for remote process execution. It creates a server which manages connected resources. Jobs can then be sent to those resources as processes, each confined to the resources it was assigned. This is ideal for hyperparameter tuning, and Ray provides the `ray.tune` library for exactly this.

This tutorial isn't a tutorial for ray itself. For that, please consult the ray documentation: https://docs.ray.io/en/latest/index.html and, for tune, https://docs.ray.io/en/latest/tune/index.html

Let's start the ray server, and write a simple method for generating models.

In [2]:
import ray

In [3]:
ray.init(num_gpus=1, num_cpus=8)

2023-03-21 13:25:29,956	INFO worker.py:1518 -- Started a local Ray instance.


Python version: 3.8.13
Ray version: 2.0.0




## DRYML support

DRYML supports `ray.tune` through the `dryml.ray.tune.Trainer` class. This class defines a callable compatible with the ray tune functional API. We just need to supply it with a special callable which provides the callables needed to set up and run the tune experiment.

`dryml.ray.tune.Trainer` expects the arguments:
* `name`: The name of the experiment to use
* `prep_method`: The callable for creating the necessary callables for setting up the experiment. This must be picklable via `dill`.
* `metrics`: A dictionary of metrics to compute after each step of training.

Once created, the user can then design their tune experiment in the usual way, and pass the `Trainer` as the callable trainable method.

We'll create a `prep_method` which yields all the callables needed for a simple experiment: finding how large a convolutional kernel is appropriate for a two-layer convolutional model classifying MNIST digits.

## Define `prep_method`

In [3]:
import dryml
import os

In [4]:
# Name experiment so we can set model directory
experiment_name = 'TF_ray_test'
model_dir = os.path.realpath(os.path.join('./models', experiment_name))

# A callable to create the train/test `Dataset`
def data_gen():
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset
    
    # Check whether tensorflow support exists for the current GPU.
    dryml.context.context_check({'tf': {}})
    
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True)
    
    train_ds = TFDataset(
        ds_train,
        supervised=True
    )
    test_ds = TFDataset(
        ds_test,
        supervised=True
    )
    return {
        'train': train_ds,
        'test': test_ds,
    }


def prep_method():
    # A callable to create a repo. This is needed to store completed models for later use.
    def repo_gen():
        return dryml.Repo(directory=model_dir, create=True)


    # We need another callable because the input datasets have a context
    # requirement. We want the Trainer function to incorporate this
    # requirement when building the compute context.
    def data_ctx_gen():
        return {'tf': {}}

    # Model generator method which takes a config and generates a model.
    # It can also take a repo keyword argument so that already trained
    # components can be grabbed from the repo.
    def model_gen(config, repo=None):
        import dryml
        import dryml.models
        import dryml.models.tf
        import tensorflow as tf
        
        # Grab the existing Best Category data transformation
        best_cat_def = dryml.ObjectDef(dryml.data.transforms.BestCat)
        best_step = repo.get(best_cat_def)
        
        kernel_size = int(config['kernel_size'])

        filters = 32
        n_layers = 2
        layer_defs = []
        for i in range(n_layers):
            layer_defs.append(
                ['Conv2D', {'filters': filters, 'kernel_size': kernel_size, 'activation': 'relu'}])
        layer_defs.append(['Flatten', {}])
        layer_defs.append(['Dense', {'units': 10, 'activation': 'linear'}])
        
        mdl_def = dryml.ObjectDef(
            dryml.models.tf.keras.base.SequentialFunctionalModel,
            input_shape=(28, 28, 1),
            layer_defs=layer_defs,
        )
        
        # Create model definition
        mdl_def = dryml.ObjectDef(
            dryml.models.Pipe,
            dryml.ObjectDef(
                dryml.models.tf.keras.Trainable,
                # Model definition
                model=mdl_def,
                # Train method
                train_fn=dryml.ObjectDef(
                    dryml.models.tf.keras.base.BasicTraining,
                    epochs=5,
                ),
                # Optimizer
                optimizer=dryml.ObjectDef(
                    dryml.models.tf.Wrapper,
                    tf.keras.optimizers.Adam,
                ),
                # Loss
                loss=dryml.ObjectDef(
                    dryml.models.tf.Wrapper,
                    tf.keras.losses.SparseCategoricalCrossentropy,
                    from_logits=True
                )
            ),
            best_step,
        )
        
        ctx_reqs = {'tf': {'num_gpus': 1}}

        # Return dictionary with the model and, optionally, its context requirements.
        return {
            'model': mdl_def.build(repo=repo),
            'ctx_reqs': ctx_reqs,
        }

    # Return dictionary with defined callables.
    return {
        'repo': repo_gen,
        'data_ctx': data_ctx_gen,
        'data': data_gen,
        'model': model_gen,
    }
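
To make the contract concrete, here is a rough sketch of how a trainer might consume the dictionary returned by `prep_method`. This is not DRYML's implementation: `toy_prep_method`, `run_trial`, `toy_accuracy`, the dict-based stand-ins for the repo, datasets, and model, and the shallow merge of context requirements are all assumptions made for illustration. Only the four dictionary keys and the `'model'`/`'ctx_reqs'` return shape come from the code above.

```python
# Hypothetical stand-ins; none of this is DRYML's actual implementation.

def toy_prep_method():
    """Return the same four keys as the tutorial's prep_method."""
    def repo_gen():
        # Stand-in for a repo: a plain dict of stored objects.
        return {}

    def data_ctx_gen():
        # Context requirements of the datasets.
        return {'tf': {}}

    def data_gen():
        # Stand-in datasets: lists of (x, y) pairs.
        return {'train': [(0, 0), (1, 1)], 'test': [(2, 0), (3, 1)]}

    def model_gen(config, repo=None):
        # Stand-in model: predicts the parity of x and records the config value.
        return {
            'model': {'kernel_size': int(config['kernel_size']),
                      'predict': lambda x: x % 2},
            'ctx_reqs': {'tf': {'num_gpus': 1}},
        }

    return {
        'repo': repo_gen,
        'data_ctx': data_ctx_gen,
        'data': data_gen,
        'model': model_gen,
    }


def toy_accuracy(model, dataset):
    # Fraction of (x, y) pairs the stand-in model predicts correctly.
    correct = sum(1 for x, y in dataset if model['predict'](x) == y)
    return correct / len(dataset)


def run_trial(prep_method, config, metrics):
    """One way a trainer could use the callables (an assumption)."""
    callables = prep_method()
    repo = callables['repo']()
    built = callables['model'](config, repo=repo)
    # Shallow merge of data and model context requirements (assumed behavior).
    ctx_reqs = {**callables['data_ctx'](), **built['ctx_reqs']}
    data = callables['data']()
    results = {name: fn(built['model'], data['test'])
               for name, fn in metrics.items()}
    return results, ctx_reqs


results, ctx_reqs = run_trial(
    toy_prep_method, {'kernel_size': 3}, {'accuracy': toy_accuracy})
print(results, ctx_reqs)
```

The point of the sketch is that nothing heavy is built until a trial calls the returned callables, which is why `prep_method` itself only needs to be picklable.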

Next, let's prepare the `Repo` with needed objects and the `Repo` directory.

In [5]:
# Let's create a repo pointing to the same directory.
repo = dryml.Repo(directory=model_dir, create=True)

# Create a Best Category trainable, and save it to the repo.
best_cat_def = dryml.ObjectDef(dryml.data.transforms.BestCat)
# Repo's get method has a special ability when `build_missing_def=True`.
# If a non-concrete definition is not in the repo, one instance will be
# created and stored in the repo.
repo.get(best_cat_def, build_missing_def=True)

# Save the objects
repo.save()

Now, let's design the tune experiment!

In [6]:
from ray import tune
import dryml.ray
import dryml.metrics
import datetime

local_dir = os.path.realpath('./ray_results')

# Define study space
config = {
    'kernel_size': tune.randint(3, 8),  # upper bound is exclusive: samples 3..7
}

# Create model trainer
model_trainer = dryml.ray.tune.Tune1Trainer(
    name=experiment_name,
    prep_method=prep_method,
    metrics={'accuracy': dryml.metrics.scalar.categorical_accuracy},
)

analysis = tune.run(
    model_trainer,
    config=config,
    metric='accuracy',
    mode='max',
    resources_per_trial={'cpu': 8, 'gpu': 1},
    num_samples=2,
    keep_checkpoints_num=2,
    local_dir=local_dir,
    log_to_file=True,
    reuse_actors=False,
    fail_fast=True,
    progress_reporter=tune.JupyterNotebookReporter(
        True,
        max_report_frequency=5,
    ),
    # resume=True, # Currently needs more experimentation to work properly
)

| Trial name | status | loc | kernel_size | iter | total time (s) | accuracy |
|---|---|---|---|---|---|---|
| TF_ray_test_17b5d_00000 | TERMINATED | 192.168.20.6:4185204 | 3 | 6 | 40.3327 | 0.973758 |
| TF_ray_test_17b5d_00001 | TERMINATED | 192.168.20.6:4186752 | 6 | 6 | 43.6363 | 0.972556 |


2023-02-21 15:04:15,881	INFO tune.py:701 -- Total run time: 88.96 seconds (88.81 seconds for the tuning loop).
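
With `num_samples=2`, tune draws two configurations from the search space and, given `metric='accuracy'` and `mode='max'`, ranks them by highest accuracy. The idea can be sketched with the standard library alone. In this sketch `toy_objective` and `random_search` are hypothetical names, the objective is a made-up score standing in for model training, and `randrange(3, 8)` mirrors `tune.randint(3, 8)`, whose upper bound is exclusive.

```python
import random


def toy_objective(config):
    # Made-up score that peaks at kernel_size == 4; stands in for training.
    return 1.0 - 0.01 * abs(config['kernel_size'] - 4)


def random_search(objective, num_samples, seed=0):
    """Sample configs at random and return the best one by max accuracy."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        # Integer drawn uniformly from 3..7 (8 is excluded).
        config = {'kernel_size': rng.randrange(3, 8)}
        trials.append({'config': config, 'accuracy': objective(config)})
    best = max(trials, key=lambda t: t['accuracy'])
    return best, trials


best, trials = random_search(toy_objective, num_samples=2)
print(best)
```

Two samples cover little of a five-value space, so in practice `num_samples` would be raised once the pipeline is known to work.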


In [8]:
# Get the best trial by max accuracy
results_df = analysis.results_df
best_trial_id = results_df['accuracy'].idxmax()



In [9]:
results_df

| trial_id | accuracy | dry_id | done | time_this_iter_s | timesteps_total | episodes_total | training_iteration | experiment_id | date | timestamp | time_total_s | pid | hostname | node_ip | time_since_restore | timesteps_since_restore | iterations_since_restore | warmup_time | experiment_tag | config.kernel_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17b5d_00000 | 0.973758 | cdc796d7-e227-4782-8d6f-40f1eebcf43f | True | 0.920265 | | | 6 | 1d90aba3a0604099bf7c862f9791a8c8 | 2023-02-21_15-03-29 | 1677013409 | 40.332708 | 4185204 | hal06 | 192.168.20.6 | 40.332708 | 0 | 6 | 0.004792 | 0_kernel_size=3 | 3 |
| 17b5d_00001 | 0.972556 | f917488b-06ee-404f-88e5-2353f4015da7 | True | 1.417008 | | | 6 | b9c052c5dbfd46b6967139f1aef41a15 | 2023-02-21_15-04-15 | 1677013455 | 43.636255 | 4186752 | hal06 | 192.168.20.6 | 43.636255 | 0 | 6 | 0.004236 | 1_kernel_size=6 | 6 |


In [10]:
# Get and show config of best trial
best_trial_data = results_df.loc[best_trial_id]
config_data = best_trial_data.loc[best_trial_data.index.str.contains('config')]
config_data

config.kernel_size    3
Name: 17b5d_00000, dtype: object

In [11]:
# Report accuracy and id
print(f"Accuracy: {best_trial_data.loc['accuracy']}")
best_model_id = best_trial_data.loc['dry_id'] 
print(f"id: {best_model_id}")

Accuracy: 0.9737580128205128
id: cdc796d7-e227-4782-8d6f-40f1eebcf43f
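
For readers less familiar with pandas indexing, the selection above can be sketched with plain dictionaries. The records below are copied from the `results_df` output and trimmed to three fields; the variable names mirror the notebook, but the dict layout is only an illustration, not how `analysis.results_df` is stored.

```python
# Results keyed by trial id, trimmed to the fields used in the selection.
results = {
    '17b5d_00000': {
        'accuracy': 0.973758,
        'dry_id': 'cdc796d7-e227-4782-8d6f-40f1eebcf43f',
        'config.kernel_size': 3,
    },
    '17b5d_00001': {
        'accuracy': 0.972556,
        'dry_id': 'f917488b-06ee-404f-88e5-2353f4015da7',
        'config.kernel_size': 6,
    },
}

# Equivalent of results_df['accuracy'].idxmax(): the id with the highest accuracy.
best_trial_id = max(results, key=lambda tid: results[tid]['accuracy'])

# Equivalent of results_df.loc[best_trial_id].
best_trial_data = results[best_trial_id]

# Equivalent of filtering the index with str.contains('config').
config_data = {k: v for k, v in best_trial_data.items() if 'config' in k}

print(best_trial_id, config_data, best_trial_data['dry_id'])
```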


In [12]:
# Refresh the repository
repo.load_objects_from_directory()



In [13]:
# Fetch the best performing model
model = repo.get_obj_by_id(best_model_id)

In [14]:
# Let's define a method to test the model's accuracy
@dryml.compute
def test_model(model):
    import dryml.metrics
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset

    # Check whether tensorflow support exists for the current GPU.
    dryml.context.context_check({'tf': {}})

    (ds_test,) = tfds.load(
        'mnist',
        split=['test'],
        shuffle_files=True,
        as_supervised=True)

    test_ds = TFDataset(
        ds_test,
        supervised=True
    )

    return dryml.metrics.scalar.categorical_accuracy(model, test_ds)

In [15]:
# And verify recorded accuracy!
test_model(model)



0.9737580128205128