# DRYML Tutorial

## Ray tune

One primary use case for DRYML is hyperparameter tuning of your models. `ObjectDef` provides an easy way to define and construct models, the `train` method of trainables allows uniform training for all models, and DRYML provides a set of interface methods to easily run hyperparameter searches on your models.

We'll briefly introduce `ray`, and write a simple hyperparameter tuning example.

Ray is a platform for remote process execution. It starts a server which manages the connected compute resources. Jobs are then dispatched to those resources as processes, each confined to the resources assigned to it. This is ideal for hyperparameter tuning, and in fact Ray provides the `ray.tune` library for exactly this.

This tutorial isn't a tutorial for Ray itself. For that, please consult the Ray documentation at https://docs.ray.io/en/latest/index.html and the Ray Tune documentation at https://docs.ray.io/en/latest/tune/index.html

Let's start the ray server, and write a simple method for generating models.

In [15]:
import ray

In [16]:
ray.init(num_gpus=1, num_cpus=1)

2022-10-04 17:17:50,820	INFO worker.py:1518 -- Started a local Ray instance.


Python version: 3.8.13
Ray version: 2.0.0


## DRYML support

DRYML supports `ray.tune` through the `dryml.ray.tune.Trainer` class. An instance of this class is a callable compatible with the Ray Tune functional API. We just need to supply it with a special callable that returns the few callables needed to set up and run the tune experiment.

`dryml.ray.tune.Trainer` expects the arguments:
* `name`: The name of the experiment to use
* `prep_method`: A callable that returns the callables needed to set up the experiment. This must be picklable via `dill`.
* `metrics`: A dictionary of metrics to compute after each step of training.

Once the `Trainer` is created, we can design the tune experiment in the usual way and pass the `Trainer` as the trainable callable.

We'll create a `prep_method` which returns all the callables needed for a simple experiment: how large a convolutional kernel is appropriate for a two-layer convolutional model classifying MNIST digits?
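Before writing the real thing, it can help to see the shape of what a `prep_method` returns. The sketch below is a toy, not DRYML code: `check_prep_result` and `toy_prep_method` are hypothetical names, and the four keys are simply the ones this tutorial uses. Whether `Trainer` strictly requires all of them is an assumption.

```python
# Hypothetical helper, not part of DRYML: checks the shape of the
# dictionary returned by a prep_method as used in this tutorial.
TUTORIAL_KEYS = ('repo', 'data_ctx', 'data', 'model')


def check_prep_result(prep_result):
    """Return a list of problems found in a prep_method result."""
    problems = []
    for key in TUTORIAL_KEYS:
        if key not in prep_result:
            problems.append(f"missing key: {key!r}")
        elif not callable(prep_result[key]):
            problems.append(f"{key!r} is not callable")
    return problems


def toy_prep_method():
    # Stand-in callables; a real prep_method would build DRYML objects.
    return {
        'repo': lambda: None,
        'data_ctx': lambda: {'tf': {}},
        'data': lambda: {'train': [], 'test': []},
        'model': lambda config, repo=None: {'model': None, 'ctx_reqs': {}},
    }


print(check_prep_result(toy_prep_method()))
```

Note that each value is itself a callable rather than a finished object, so nothing expensive is constructed until the callable is invoked.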

## Define `prep_method`

In [48]:
# A callable to create the train/test `Dataset`
def data_gen():
    import dryml
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset

    # Check that TensorFlow support is available
    # in the current compute context.
    dryml.context.context_check({'tf': {}})
    
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True)
    
    train_ds = TFDataset(
        ds_train,
        supervised=True
    )
    test_ds = TFDataset(
        ds_test,
        supervised=True
    )
    return {
        'train': train_ds,
        'test': test_ds,
    }


def prep_method():
    # A callable to create a repo. This is needed to store completed models for later use.
    # Adjust the path to your own models directory; it should be the same
    # directory the `Repo` is prepared in below.
    def repo_gen():
        return dryml.Repo(directory='/data0/matthew/Software/NCSA/DRYML/tutorials/models')

    # The input datasets have a context requirement, so we need another
    # callable that reports it. The Trainer function incorporates this
    # requirement when building the compute context.
    def data_ctx_gen():
        return {'tf': {}}

    # Model generator method which takes a config and generates a model.
    # It also takes a `repo` keyword argument so that already trained
    # components can be grabbed from the repo.
    def model_gen(config, repo=None):
        import dryml
        import dryml.models
        import dryml.models.tf
        import tensorflow as tf
        
        # Grab the existing Best Category data transformation
        best_cat_def = dryml.ObjectDef(dryml.data.transforms.BestCat)
        best_step = repo.get(best_cat_def)
        
        kernel_size = int(config['kernel_size'])

        filters = 32
        n_layers = 2
        layer_defs = []
        for i in range(n_layers):
            layer_defs.append(
                ['Conv2D', {'filters': filters, 'kernel_size': kernel_size, 'activation': 'relu'}])
        layer_defs.append(['Flatten', {}])
        layer_defs.append(['Dense', {'units': 10, 'activation': 'linear'}])
        
        mdl_def = dryml.ObjectDef(
            dryml.models.tf.keras.base.SequentialFunctionalModel,
            input_shape=(28, 28, 1),
            layer_defs=layer_defs,
        )
        
        # Create the full pipeline definition: the trainable network
        # followed by the Best Category step
        mdl_def = dryml.ObjectDef(
            dryml.models.Pipe,
            dryml.ObjectDef(
                dryml.models.tf.keras.Trainable,
                # Model definition
                model=mdl_def,
                # Train method
                train_fn=dryml.ObjectDef(
                    dryml.models.tf.keras.base.BasicTraining,
                    epochs=5,
                ),
                # Optimizer
                optimizer=dryml.ObjectDef(
                    dryml.models.tf.ObjectWrapper,
                    tf.keras.optimizers.Adam,
                ),
                # Loss
                loss=dryml.ObjectDef(
                    dryml.models.tf.ObjectWrapper,
                    tf.keras.losses.SparseCategoricalCrossentropy,
                    obj_kwargs={
                        'from_logits': True
                    }
                )
            ),
            best_step,
        )
        
        ctx_reqs = {'tf': {'num_gpus': 1}}

        # Return a dictionary with the model and, optionally,
        # its context requirements.
        return {
            'model': mdl_def.build(repo=repo),
            'ctx_reqs': ctx_reqs,
        }

    # Return dictionary with defined callables.
    return {
        'repo': repo_gen,
        'data_ctx': data_ctx_gen,
        'data': data_gen,
        'model': model_gen,
    }
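To get a feel for what changing `kernel_size` does to this model, here is a small sketch computing the feature-map size after the two convolutional layers. It assumes the Keras `Conv2D` defaults, which `model_gen` leaves unchanged: stride 1 and no padding. The helper name `valid_conv_output_size` is hypothetical, and the kernel sizes 3 through 7 are the ones searched later in this tutorial.

```python
def valid_conv_output_size(input_size, kernel_size, n_layers):
    """Spatial size after n_layers stride-1, unpadded convolutions."""
    size = input_size
    for _ in range(n_layers):
        # Each unpadded convolution trims kernel_size - 1 pixels per side pair.
        size = size - kernel_size + 1
    return size


FILTERS = 32  # Same filter count as in model_gen

for kernel_size in range(3, 8):
    side = valid_conv_output_size(28, kernel_size, n_layers=2)
    # Number of values fed into the final Dense layer after Flatten.
    print(kernel_size, side, side * side * FILTERS)
```

Larger kernels see more context per layer but shrink the feature map faster, so the input to the final `Dense` layer gets smaller as `kernel_size` grows.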

In [38]:
import dryml
import os

Next, let's prepare the `Repo` directory and populate the `Repo` with the objects we need.

In [39]:
# Let's create a repo in the local `models` directory. This should be
# the same directory that `repo_gen` points to.
model_dir = os.path.realpath('./models')
os.makedirs(model_dir, exist_ok=True)
repo = dryml.Repo(directory=model_dir)

# Create a Best Category trainable, and save it to the repo.
best_cat_def = dryml.ObjectDef(dryml.data.transforms.BestCat)
# Repo's get method has a special ability when `build_missing_def=True`.
# If a non-concrete definition is not in the repo, one instance will be
# created and stored in the repo.
repo.get(best_cat_def, build_missing_def=True)

# Save the objects
repo.save()
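The get-or-create behavior described in the comments above can be pictured with a toy in-memory stand-in. This is only a sketch of the pattern, not DRYML's implementation: `ToyRepo`, its `factory` argument, and the `build_missing` flag are all hypothetical names.

```python
class ToyRepo:
    """Hypothetical in-memory stand-in; not DRYML's Repo."""

    def __init__(self):
        self._objects = {}

    def get(self, key, factory, build_missing=False):
        if key not in self._objects:
            if not build_missing:
                raise KeyError(key)
            # Build one instance and store it for later lookups.
            self._objects[key] = factory()
        return self._objects[key]


toy_repo = ToyRepo()
first = toy_repo.get('best_cat', object, build_missing=True)
second = toy_repo.get('best_cat', object)
print(first is second)
```

The point of the pattern is that later lookups, such as the one inside `model_gen`, receive the already stored object instead of building a new one.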

Now, let's design the tune experiment!

In [40]:
from ray.air.checkpoint import Checkpoint
from ray.air.config import RunConfig, CheckpointConfig
from ray.tune.tune_config import TuneConfig
import dryml.ray
import dryml.metrics
import datetime

experiment_name = 'TF_ray_test'
local_dir = os.path.realpath('./ray_results')

# Define study space
# `randint` excludes the upper bound, so this samples kernel sizes 3 through 7.
config = {
    'kernel_size': ray.tune.randint(3, 8),
}

# Create model trainer
model_trainer = dryml.ray.tune.Trainer(
    name=experiment_name,
    prep_method=prep_method,
    metrics={'accuracy': dryml.metrics.scalar.categorical_accuracy},
)

# Setup Tuner
checkpoint_config = CheckpointConfig(
    num_to_keep=2,
    checkpoint_score_attribute='accuracy',
    checkpoint_score_order="max",
)

run_config = RunConfig(
    name=experiment_name,
    local_dir=local_dir,
    log_to_file=True,
    checkpoint_config=checkpoint_config,
    verbose=2,
)

# We must set 'reuse_actors=False' so compute contexts can be reset between trials
tune_config = TuneConfig(
    metric='accuracy',
    mode='max',
    num_samples=20,
    time_budget_s=datetime.timedelta(hours=9),
    reuse_actors=False)

tuner = ray.tune.Tuner(
    ray.tune.with_resources(
        model_trainer,
        {'cpu': 1, 'gpu': 1},
    ),
    param_space=config,
    tune_config=tune_config,
    run_config=run_config,
)
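The `CheckpointConfig` above asks for the two best checkpoints ranked by `accuracy`. As a rough picture of that retention policy, and not of how Ray implements it, here is a sketch with a hypothetical `keep_best` helper and made-up scores.

```python
import heapq


def keep_best(checkpoints, num_to_keep, score_attribute):
    """Return the num_to_keep checkpoints with the highest score."""
    return heapq.nlargest(
        num_to_keep, checkpoints, key=lambda c: c[score_attribute])


# Made-up scores, for illustration only.
history = [
    {'step': 1, 'accuracy': 0.91},
    {'step': 2, 'accuracy': 0.95},
    {'step': 3, 'accuracy': 0.93},
]

kept = keep_best(history, num_to_keep=2, score_attribute='accuracy')
print([c['step'] for c in kept])
```

With `checkpoint_score_order="max"`, higher scores are better, which is why the sketch selects the largest values.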

In [None]:
# Now we can start the tune experiment.
results = tuner.fit()

In [45]:
# Finally, we can inspect the results of this experiment and have a look at the best model!
best_result = results.get_best_result(metric='accuracy', mode='max')
print(f"Best configuration: {best_result.config}")
print(f"Accuracy: {best_result.metrics['accuracy']}")
best_model_id = best_result.metrics['dry_id']
print(f"id: {best_model_id}")

Best configuration: {'kernel_size': 4}
Accuracy: 0.9782652243589743
id: ce6ba022-cbad-464c-b308-815d80d9a936


In [46]:
# Refresh the repository
repo.load_objects_from_directory()

In [47]:
# Fetch the best performing model
model = repo.get_obj_by_id(best_model_id)

In [55]:
# Let's define a method to test the model's accuracy
@dryml.compute
def test_model(model):
    import dryml.metrics
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset

    # Check that TensorFlow support is available
    # in the current compute context.
    dryml.context.context_check({'tf': {}})

    (ds_test,) = tfds.load(
        'mnist',
        split=['test'],
        shuffle_files=True,
        as_supervised=True)

    test_ds = TFDataset(
        ds_test,
        supervised=True
    )

    return dryml.metrics.scalar.categorical_accuracy(model, test_ds)

In [56]:
# And verify the recorded accuracy!
test_model(model)

0.9782652243589743