# DRYML Tutorial

## Ray tune

One primary use case for DRYML is hyperparameter tuning of your models. `ObjectDef` provides an easy way to define and construct models, the `train` method of trainables allows uniform training for all models, and DRYML provides a set of interface methods for running hyperparameter searches on your models.

We'll briefly introduce `ray`, and write a simple hyperparameter tuning example.

Ray is a platform for remote process execution. It creates a server which manages connected resources. Jobs can then be sent to those resources as multiple processes, each confined to specific resources. This is ideal for hyperparameter tuning, and Ray provides the `ray.tune` library for exactly this purpose.

This is not a tutorial for Ray itself. For that, please consult the Ray documentation (https://docs.ray.io/en/latest/index.html) and the Ray Tune documentation (https://docs.ray.io/en/latest/tune/index.html).

Let's start the ray server, and write a simple method for generating models.

In [1]:
import ray

In [2]:
ray.init(num_gpus=1, num_cpus=1)

2023-03-21 13:33:07,884	INFO worker.py:1518 -- Started a local Ray instance.


* Python version: 3.8.13
* Ray version: 2.0.0


## DRYML support

DRYML provides support for `ray.tune` in the form of the `dryml.ray.tune.Trainer` class. This class defines a callable compatible with the ray tune functional API. We just need to supply it with a callable that returns the few callables needed to set up and run the tune experiment.

`dryml.ray.tune.Trainer` expects the arguments:
* `name`: The name of the experiment.
* `prep_method`: A callable that returns the callables needed to set up the experiment. It must be picklable via `dill`.
* `metrics`: A dictionary of metrics to compute after each step of training.

Once the `Trainer` is created, you can design the tune experiment in the usual way and pass the `Trainer` as the trainable callable.

We'll create a `prep_method` which yields all the callables needed for a simple experiment: how large a convolutional kernel is appropriate for a two-layer convolutional model that classifies MNIST digits?
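To make the contract concrete before the real definition, here is a minimal sketch in plain Python with no DRYML or Ray imports. The dictionary keys (`repo`, `data_ctx`, `data`, `model`) and the `model`/`ctx_reqs` return keys mirror the `prep_method` defined in this tutorial. Everything else is a hypothetical stand-in: `toy_prep_method`, `run_trial_sketch`, the `offset` config entry, the metric signature `metric(model, dataset)`, and the call order are assumptions for illustration, not DRYML's actual implementation.

```python
from typing import Any, Callable, Dict


def toy_prep_method() -> Dict[str, Callable[..., Any]]:
    """Hypothetical stand-in for a DRYML `prep_method`."""

    def repo_gen():
        # Stand-in for a model repository: a plain dict.
        return {}

    def data_ctx_gen():
        # Context requirements of the datasets.
        return {'tf': {}}

    def data_gen():
        # Stand-in datasets: lists of (x, y) pairs.
        return {
            'train': [(0, 0), (1, 1), (2, 0), (3, 1)],
            'test': [(4, 0), (5, 1)],
        }

    def model_gen(config, repo=None):
        # Stand-in "model": parity of the input shifted by a config value.
        offset = int(config['offset'])
        return {
            'model': lambda x: (x + offset) % 2,
            'ctx_reqs': {'tf': {}},
        }

    return {
        'repo': repo_gen,
        'data_ctx': data_ctx_gen,
        'data': data_gen,
        'model': model_gen,
    }


def accuracy(model, dataset):
    # Fraction of (x, y) pairs the model predicts correctly.
    correct = sum(1 for x, y in dataset if model(x) == y)
    return correct / len(dataset)


def run_trial_sketch(prep_method, config, metrics):
    """One plausible call order for a single trial (an assumption)."""
    parts = prep_method()
    repo = parts['repo']()
    data = parts['data']()
    built = parts['model'](config, repo=repo)
    model = built['model']
    return {name: metric(model, data['test']) for name, metric in metrics.items()}


result = run_trial_sketch(toy_prep_method, {'offset': 0}, {'accuracy': accuracy})
print(result)
```

The point of the indirection is that only `prep_method` itself has to be pickled and shipped to the worker process; datasets, repos and models are then built inside that process.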

## Define `prep_method`

In [3]:
# A callable to create the train/test `Dataset`
def data_gen():
    # Import inside the function so it also works in the remote worker process.
    import dryml
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset

    # Check that the current compute context provides tensorflow support.
    dryml.context.context_check({'tf': {}})
    
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True)
    
    train_ds = TFDataset(
        ds_train,
        supervised=True
    )
    test_ds = TFDataset(
        ds_test,
        supervised=True
    )
    return {
        'train': train_ds,
        'test': test_ds,
    }


def prep_method():
    # A callable to create a repo. This is needed to store completed models for later use.
    def repo_gen():
        import dryml
        return dryml.Repo(directory='/data0/matthew/Software/NCSA/DRYML/tutorials/models')

    # We need another callable because the input datasets have a context
    # requirement. We want the Trainer function to incorporate this
    # requirement when building the compute context.
    def data_ctx_gen():
        return {'tf': {}}

    # Model generator method which takes a config and generates a model.
    # It can also take a `repo` keyword argument so already-trained components
    # can be grabbed from the repo.
    def model_gen(config, repo=None):
        import dryml
        import dryml.models
        import dryml.models.tf
        import tensorflow as tf
        
        # Grab the existing Best Category data transformation
        best_cat_def = dryml.ObjectDef(dryml.data.transforms.BestCat)
        best_step = repo.get(best_cat_def)
        
        kernel_size = int(config['kernel_size'])

        filters = 32
        n_layers = 2
        layer_defs = []
        for i in range(n_layers):
            layer_defs.append(
                ['Conv2D', {'filters': filters, 'kernel_size': kernel_size, 'activation': 'relu'}])
        layer_defs.append(['Flatten', {}])
        layer_defs.append(['Dense', {'units': 10, 'activation': 'linear'}])
        
        mdl_def = dryml.ObjectDef(
            dryml.models.tf.keras.base.SequentialFunctionalModel,
            input_shape=(28, 28, 1),
            layer_defs=layer_defs,
        )
        
        # Create model definition
        mdl_def = dryml.ObjectDef(
            dryml.models.Pipe,
            dryml.ObjectDef(
                dryml.models.tf.keras.Trainable,
                # Model definition
                model=mdl_def,
                # Train method
                train_fn=dryml.ObjectDef(
                    dryml.models.tf.keras.base.BasicTraining,
                    epochs=5,
                ),
                # Optimizer
                optimizer=dryml.ObjectDef(
                    dryml.models.tf.Wrapper,
                    tf.keras.optimizers.Adam,
                ),
                # Loss
                loss=dryml.ObjectDef(
                    dryml.models.tf.Wrapper,
                    tf.keras.losses.SparseCategoricalCrossentropy,
                    from_logits=True
                )
            ),
            best_step,
        )
        
        ctx_reqs = {'tf': {'num_gpus': 1}}

        # Return a dictionary with the model and, optionally, its context requirements.
        return {
            'model': mdl_def.build(repo=repo),
            'ctx_reqs': ctx_reqs,
        }

    # Return dictionary with defined callables.
    return {
        'repo': repo_gen,
        'data_ctx': data_ctx_gen,
        'data': data_gen,
        'model': model_gen,
    }
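Since `kernel_size` is the only tuned parameter, it helps to see how it changes the network built by `model_gen`. The sketch below is plain arithmetic, not DRYML code. It assumes the `layer_defs` keyword arguments are passed unchanged to Keras layers, so the `Conv2D` layers use the Keras defaults of stride 1 and `'valid'` padding. The function name `conv_stack_summary` is hypothetical.

```python
def conv_stack_summary(kernel_size, input_hw=28, in_channels=1,
                       filters=32, n_layers=2, n_classes=10):
    """Feature-map size and parameter count for the two-layer conv model.

    Assumes stride 1 and 'valid' padding for every convolution.
    """
    hw = input_hw
    channels = in_channels
    params = 0
    for _ in range(n_layers):
        # A 'valid' convolution shrinks each spatial dimension by kernel_size - 1.
        hw -= kernel_size - 1
        # Weights (k * k * in * out) plus one bias per filter.
        params += kernel_size * kernel_size * channels * filters + filters
        channels = filters
    # Flatten, then a dense layer with one output per class.
    flattened = hw * hw * channels
    params += flattened * n_classes + n_classes
    return {'feature_map': hw, 'flattened': flattened, 'params': params}


# The search space in this tutorial covers kernel sizes 3 through 7.
for k in range(3, 8):
    print(k, conv_stack_summary(k))
```

Under these assumptions a larger kernel adds convolution weights but shrinks the flattened feature map, so the total parameter count falls from 193,898 at `kernel_size=3` to 133,738 at `kernel_size=7`.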

In [4]:
import dryml
import os
import shutil

Next, let's prepare the `Repo` with needed objects and the `Repo` directory.

In [5]:
# Let's create a repo pointing to the same directory.
model_dir = os.path.realpath('./models')
if os.path.exists(model_dir):
    # Delete the existing model directory as it may have old models inside
    shutil.rmtree(model_dir)
os.mkdir(model_dir)
repo = dryml.Repo(directory=model_dir)

# Create a Best Category trainable, and save it to the repo.
best_cat_def = dryml.ObjectDef(dryml.data.transforms.BestCat)
# Repo's get method has a special ability when `build_missing_def=True`.
# If a non-concrete definition is not in the repo, one instance will be
# created and stored in the repo.
repo.get(best_cat_def, build_missing_def=True)

# Save the objects
repo.save()
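The get-or-build behaviour described in the comments above can be pictured with a toy in-memory class. This is only an illustration of the pattern under simplified assumptions: `ToyRepo`, its string keys, and the `factory` and `build_missing` arguments are hypothetical and are not part of the DRYML `Repo` API.

```python
class ToyRepo:
    """Toy in-memory stand-in; NOT the DRYML `Repo` implementation."""

    def __init__(self):
        self._objects = {}

    def get(self, key, factory, build_missing=False):
        # Return the stored object if one already exists for this key.
        if key in self._objects:
            return self._objects[key]
        if not build_missing:
            raise KeyError(f'no object stored for {key!r}')
        # Otherwise build exactly one instance and remember it.
        obj = factory()
        self._objects[key] = obj
        return obj


toy_repo = ToyRepo()
# The first call builds and stores the object ...
first = toy_repo.get('best_cat', factory=dict, build_missing=True)
# ... and later calls return that same stored instance.
second = toy_repo.get('best_cat', factory=dict)
print(first is second)
```

Storing the object once is what lets every trial's `model_gen` fetch the same pre-built component from the repo instead of creating its own copy.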

Now, let's design the tune experiment!

In [6]:
from ray.air.checkpoint import Checkpoint
from ray.air.config import RunConfig, CheckpointConfig
from ray.tune.tune_config import TuneConfig
import dryml.ray
import dryml.metrics
import datetime

experiment_name = 'TF_ray_test'
local_dir = os.path.realpath('./ray_results')

# Define study space
config = {
    # randint's upper bound is exclusive, so this samples kernel sizes 3 through 7
    'kernel_size': ray.tune.randint(3, 8),
}

# Create model trainer
model_trainer = dryml.ray.tune.Tune2Trainer(
    name=experiment_name,
    prep_method=prep_method,
    metrics={'accuracy': dryml.metrics.scalar.categorical_accuracy},
)

# Setup Tuner
checkpoint_config = CheckpointConfig(
    num_to_keep=2,
    checkpoint_score_attribute='accuracy',
    checkpoint_score_order="max",
)

run_config = RunConfig(
    name=experiment_name,
    local_dir=local_dir,
    log_to_file=True,
    checkpoint_config=checkpoint_config,
    verbose=2,
)

# We must set 'reuse_actors=False' so compute contexts can be reset between trials
tune_config = TuneConfig(
    metric='accuracy',
    mode='max',
    num_samples=20,
    time_budget_s=datetime.timedelta(hours=9),
    reuse_actors=False)

tuner = ray.tune.Tuner(
    ray.tune.with_resources(
        model_trainer,
        {'cpu': 1, 'gpu': 1},
    ),
    param_space=config,
    tune_config=tune_config,
    run_config=run_config,
)
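The `CheckpointConfig` above asks Ray to keep only the two checkpoints with the highest `accuracy`. The retention rule can be sketched in plain Python as follows. This is an illustration of the selection logic only, not Ray's implementation, and the `history` values are made up for the example.

```python
import heapq


def keep_best(checkpoints, num_to_keep=2, score_attribute='accuracy', order='max'):
    """Return the `num_to_keep` best checkpoint records, best first."""
    def score(record):
        return record[score_attribute]

    if order == 'max':
        return heapq.nlargest(num_to_keep, checkpoints, key=score)
    return heapq.nsmallest(num_to_keep, checkpoints, key=score)


# Made-up per-iteration scores, for illustration only.
history = [
    {'iter': 1, 'accuracy': 0.95},
    {'iter': 2, 'accuracy': 0.97},
    {'iter': 3, 'accuracy': 0.96},
    {'iter': 4, 'accuracy': 0.98},
]

print(keep_best(history))
```

With these values the records for iterations 4 and 2 are kept, and the other two would be discarded.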

In [7]:
# Now we can start the tune experiment.
results = tuner.fit()

2023-03-21 13:33:08,697	INFO registry.py:96 -- Detected unknown callable for trainable. Converting to class.



| Trial name | status | loc | kernel_size | iter | total time (s) | accuracy |
|---|---|---|---|---|---|---|
| TF_ray_test_d3d32_00000 | TERMINATED | 192.168.2.31:2448032 | 4 | 6 | 38.9442 | 0.978666 |
| TF_ray_test_d3d32_00001 | TERMINATED | 192.168.2.31:2449073 | 3 | 6 | 36.6209 | 0.977163 |
| TF_ray_test_d3d32_00002 | TERMINATED | 192.168.2.31:2450063 | 4 | 6 | 38.8038 | 0.974459 |
| TF_ray_test_d3d32_00003 | TERMINATED | 192.168.2.31:2451052 | 3 | 6 | 36.975 | 0.97496 |
| TF_ray_test_d3d32_00004 | TERMINATED | 192.168.2.31:2452048 | 7 | 6 | 41.0417 | 0.973157 |
| TF_ray_test_d3d32_00005 | TERMINATED | 192.168.2.31:2453110 | 4 | 6 | 38.3258 | 0.976262 |
| TF_ray_test_d3d32_00006 | TERMINATED | 192.168.2.31:2454088 | 5 | 6 | 41.0753 | 0.976963 |
| TF_ray_test_d3d32_00007 | TERMINATED | 192.168.2.31:2455115 | 7 | 6 | 40.8627 | 0.976162 |
| TF_ray_test_d3d32_00008 | TERMINATED | 192.168.2.31:2456138 | 5 | 6 | 38.6731 | 0.977364 |
| TF_ray_test_d3d32_00009 | TERMINATED | 192.168.2.31:2457118 | 7 | 6 | 40.3585 | 0.976963 |
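To read the status table in terms of the question the experiment asks, the accuracies can be grouped by kernel size. The sketch below hard-codes the ten rows shown in the table above, so it needs neither Ray nor DRYML. The variable names are illustrative.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Trial suffix, kernel size and accuracy, copied from the status table above.
TABLE = """\
trial,kernel_size,accuracy
00000,4,0.978666
00001,3,0.977163
00002,4,0.974459
00003,3,0.97496
00004,7,0.973157
00005,4,0.976262
00006,5,0.976963
00007,7,0.976162
00008,5,0.977364
00009,7,0.976963
"""

# Group accuracies by kernel size.
by_kernel = defaultdict(list)
for row in csv.DictReader(io.StringIO(TABLE)):
    by_kernel[int(row['kernel_size'])].append(float(row['accuracy']))

# Mean accuracy per kernel size, in ascending kernel-size order.
summary = {k: mean(v) for k, v in sorted(by_kernel.items())}
for k, acc in summary.items():
    print(f'kernel_size={k}: mean accuracy {acc:.4f} over {len(by_kernel[k])} trials')
```

For these ten rows the per-kernel means all fall between roughly 0.975 and 0.977, with only two or three trials per kernel size, so the table alone does not clearly separate the kernel sizes.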


(func pid=2448032) 2023-03-21 13:33:12.563976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1

(func pid=2448032) Epoch 1/5

(func pid=2448032) 2023-03-21 13:33:19.780499: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600

   1/1500 [..............................] - ETA: 33:23 - loss: 39.2893
  40/1500 [..............................] - ETA: 3s - loss: 9.1782  
  74/1500 [>.............................] - ETA: 4s - loss: 5.1303
 105/1500 [=>............................] - ETA: 4s - loss: 3.7029
 139/1500 [=>............................] - ETA: 4s - loss: 2.8441
 173/1500 [==>...........................] - ETA: 3s - loss: 2.3354
 206/1500 [===>..........................] - ETA: 3s - loss: 1.9939
 237/1500 [===>..........................] - ETA: 3s - loss: 1.7534
 274/1500 [====>.........................] - ETA: 3s - loss: 1.5462
 311/1500 [=====>........................] - ETA: 3s - loss: 1.3784
 347/1500 [=====>........................] - ETA: 3s - loss: 1.2544
Trial TF_ray_test_d3d32_00000 reported accuracy=0.98 with parameters={'kernel_size': 4}.
[2m[36m(func pid=2448032)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 53s - loss: 0.0934
  50/1500 [>.............................] - E

[2m[36m(func pid=2449073)[0m 2023-03-21 13:33:53.814426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2449073)[0m Epoch 1/5


[2m[36m(func pid=2449073)[0m 2023-03-21 13:34:00.761492: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  19/1500 [..............................] - ETA: 4s - loss: 11.2825   
  49/1500 [..............................] - ETA: 4s - loss: 4.7901
  89/1500 [>.............................] - ETA: 4s - loss: 2.7912
 130/1500 [=>............................] - ETA: 3s - loss: 2.0043
 163/1500 [==>...........................] - ETA: 3s - loss: 1.6443
 204/1500 [===>..........................] - ETA: 3s - loss: 1.3596
 243/1500 [===>..........................] - ETA: 3s - loss: 1.1702
 283/1500 [====>.........................] - ETA: 3s - loss: 1.0310
 322/1500 [=====>........................] - ETA: 3s - loss: 0.9299
Trial TF_ray_test_d3d32_00001 reported accuracy=0.97 with parameters={'kernel_size': 3}.
[2m[36m(func pid=2449073)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 50s - loss: 0.0220
  50/1500 [>.............................] - ETA: 3s - loss: 0.0550
  93/1500 [>.............................] - ETA: 3s - loss: 0.0629
 140/1500 [=>............................] - ETA

[2m[36m(func pid=2450063)[0m 2023-03-21 13:34:32.760180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2450063)[0m Epoch 1/5


[2m[36m(func pid=2450063)[0m 2023-03-21 13:34:39.914530: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  23/1500 [..............................] - ETA: 3s - loss: 6.2191    
  62/1500 [>.............................] - ETA: 3s - loss: 2.5692
 100/1500 [=>............................] - ETA: 3s - loss: 1.6893
 134/1500 [=>............................] - ETA: 3s - loss: 1.3235
 165/1500 [==>...........................] - ETA: 3s - loss: 1.1094
 196/1500 [==>...........................] - ETA: 3s - loss: 0.9655
 231/1500 [===>..........................] - ETA: 3s - loss: 0.8420
 264/1500 [====>.........................] - ETA: 3s - loss: 0.7581
 299/1500 [====>.........................] - ETA: 3s - loss: 0.6851
 341/1500 [=====>........................] - ETA: 3s - loss: 0.6142
Trial TF_ray_test_d3d32_00002 reported accuracy=0.98 with parameters={'kernel_size': 4}.
[2m[36m(func pid=2450063)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 52s - loss: 0.0011
  50/1500 [>.............................] - ETA: 3s - loss: 0.0573
  94/1500 [>.............................] - ETA

[2m[36m(func pid=2451052)[0m 2023-03-21 13:35:13.782699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2451052)[0m Epoch 1/5


[2m[36m(func pid=2451052)[0m 2023-03-21 13:35:20.875256: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


   1/1500 [..............................] - ETA: 32:46 - loss: 20.3234
  42/1500 [..............................] - ETA: 3s - loss: 4.0437   
  82/1500 [>.............................] - ETA: 3s - loss: 2.2454
 125/1500 [=>............................] - ETA: 3s - loss: 1.5911
 161/1500 [==>...........................] - ETA: 3s - loss: 1.2870
 202/1500 [===>..........................] - ETA: 3s - loss: 1.0639
 236/1500 [===>..........................] - ETA: 3s - loss: 0.9408
 269/1500 [====>.........................] - ETA: 3s - loss: 0.8484
 302/1500 [=====>........................] - ETA: 3s - loss: 0.7764
 341/1500 [=====>........................] - ETA: 3s - loss: 0.7088
Trial TF_ray_test_d3d32_00003 reported accuracy=0.97 with parameters={'kernel_size': 3}.
[2m[36m(func pid=2451052)[0m Epoch 2/5
  29/1500 [..............................] - ETA: 2s - loss: 0.0704 
  78/1500 [>.............................] - ETA: 2s - loss: 0.0751
 120/1500 [=>............................] - 

[2m[36m(func pid=2452048)[0m 2023-03-21 13:35:52.768218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2452048)[0m Epoch 1/5


[2m[36m(func pid=2452048)[0m 2023-03-21 13:35:59.772056: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


   1/1500 [..............................] - ETA: 33:49 - loss: 42.4794
  40/1500 [..............................] - ETA: 3s - loss: 5.2141   
  74/1500 [>.............................] - ETA: 4s - loss: 3.1058
 108/1500 [=>............................] - ETA: 4s - loss: 2.2616
 144/1500 [=>............................] - ETA: 3s - loss: 1.7731
 182/1500 [==>...........................] - ETA: 3s - loss: 1.4765
 216/1500 [===>..........................] - ETA: 3s - loss: 1.2871
 249/1500 [===>..........................] - ETA: 3s - loss: 1.1527
 280/1500 [====>.........................] - ETA: 3s - loss: 1.0518
 328/1500 [=====>........................] - ETA: 3s - loss: 0.9343
 345/1500 [=====>........................] - ETA: 3s - loss: 0.8986
Trial TF_ray_test_d3d32_00004 reported accuracy=0.97 with parameters={'kernel_size': 7}.
[2m[36m(func pid=2452048)[0m Epoch 2/5
  26/1500 [..............................] - ETA: 3s - loss: 0.0379 
  63/1500 [>.............................] - 

[2m[36m(func pid=2453110)[0m 2023-03-21 13:36:35.818244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2453110)[0m Epoch 1/5


[2m[36m(func pid=2453110)[0m 2023-03-21 13:36:42.813204: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  24/1500 [..............................] - ETA: 3s - loss: 8.5141    
  60/1500 [>.............................] - ETA: 3s - loss: 3.7278
  93/1500 [>.............................] - ETA: 3s - loss: 2.5311
 132/1500 [=>............................] - ETA: 3s - loss: 1.8510
 171/1500 [==>...........................] - ETA: 3s - loss: 1.4804
 206/1500 [===>..........................] - ETA: 3s - loss: 1.2575
 241/1500 [===>..........................] - ETA: 3s - loss: 1.0985
 278/1500 [====>.........................] - ETA: 3s - loss: 0.9833
 311/1500 [=====>........................] - ETA: 3s - loss: 0.8960
 345/1500 [=====>........................] - ETA: 3s - loss: 0.8249
Trial TF_ray_test_d3d32_00005 reported accuracy=0.98 with parameters={'kernel_size': 4}.
[2m[36m(func pid=2453110)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 52s - loss: 0.0174
  43/1500 [..............................] - ETA: 3s - loss: 0.0456
  80/1500 [>.............................] - ETA

[2m[36m(func pid=2454088)[0m 2023-03-21 13:37:16.823824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2454088)[0m Epoch 1/5


[2m[36m(func pid=2454088)[0m 2023-03-21 13:37:23.845918: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  14/1500 [..............................] - ETA: 5s - loss: 18.8455   
  56/1500 [>.............................] - ETA: 3s - loss: 5.2068
  90/1500 [>.............................] - ETA: 4s - loss: 3.3540
 126/1500 [=>............................] - ETA: 3s - loss: 2.4817
 161/1500 [==>...........................] - ETA: 3s - loss: 1.9951
 194/1500 [==>...........................] - ETA: 3s - loss: 1.6924
 226/1500 [===>..........................] - ETA: 3s - loss: 1.4908
 258/1500 [====>.........................] - ETA: 3s - loss: 1.3372
 288/1500 [====>.........................] - ETA: 3s - loss: 1.2227
 318/1500 [=====>........................] - ETA: 3s - loss: 1.1223
 349/1500 [=====>........................] - ETA: 3s - loss: 1.0376
Trial TF_ray_test_d3d32_00006 reported accuracy=0.98 with parameters={'kernel_size': 5}.
[2m[36m(func pid=2454088)[0m Epoch 2/5
  23/1500 [..............................] - ETA: 3s - loss: 0.0705 
  65/1500 [>.............................] - ETA

[2m[36m(func pid=2455115)[0m 2023-03-21 13:37:59.815096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2455115)[0m Epoch 1/5


[2m[36m(func pid=2455115)[0m 2023-03-21 13:38:06.699339: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


   1/1500 [..............................] - ETA: 32:15 - loss: 43.2548
  43/1500 [..............................] - ETA: 3s - loss: 5.2201   
  76/1500 [>.............................] - ETA: 3s - loss: 3.1726
 108/1500 [=>............................] - ETA: 4s - loss: 2.3466
 141/1500 [=>............................] - ETA: 4s - loss: 1.8757
 178/1500 [==>...........................] - ETA: 3s - loss: 1.5406
 211/1500 [===>..........................] - ETA: 3s - loss: 1.3394
 244/1500 [===>..........................] - ETA: 3s - loss: 1.1898
 283/1500 [====>.........................] - ETA: 3s - loss: 1.0534
 314/1500 [=====>........................] - ETA: 3s - loss: 0.9669
 345/1500 [=====>........................] - ETA: 3s - loss: 0.8947
Trial TF_ray_test_d3d32_00007 reported accuracy=0.97 with parameters={'kernel_size': 7}.
[2m[36m(func pid=2455115)[0m Epoch 2/5
  23/1500 [..............................] - ETA: 3s - loss: 0.0663 
  59/1500 [>.............................] - 

[2m[36m(func pid=2456138)[0m 2023-03-21 13:38:42.818620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2456138)[0m Epoch 1/5


[2m[36m(func pid=2456138)[0m 2023-03-21 13:38:49.840862: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


   1/1500 [..............................] - ETA: 32:54 - loss: 35.6392
  39/1500 [..............................] - ETA: 3s - loss: 6.5240  
  75/1500 [>.............................] - ETA: 3s - loss: 3.5956
 111/1500 [=>............................] - ETA: 3s - loss: 2.5455
 145/1500 [=>............................] - ETA: 3s - loss: 2.0069
 180/1500 [==>...........................] - ETA: 3s - loss: 1.6629
 213/1500 [===>..........................] - ETA: 3s - loss: 1.4410
 248/1500 [===>..........................] - ETA: 3s - loss: 1.2633
 283/1500 [====>.........................] - ETA: 3s - loss: 1.1328
 316/1500 [=====>........................] - ETA: 3s - loss: 1.0329
Trial TF_ray_test_d3d32_00008 reported accuracy=0.97 with parameters={'kernel_size': 5}.
[2m[36m(func pid=2456138)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 48s - loss: 0.0310
  42/1500 [..............................] - ETA: 3s - loss: 0.0829
  77/1500 [>.............................] - E

[2m[36m(func pid=2457118)[0m 2023-03-21 13:39:23.918716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2457118)[0m Epoch 1/5


[2m[36m(func pid=2457118)[0m 2023-03-21 13:39:30.960036: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  18/1500 [..............................] - ETA: 4s - loss: 6.2048    
  50/1500 [>.............................] - ETA: 4s - loss: 3.0226
  85/1500 [>.............................] - ETA: 4s - loss: 1.9757
 108/1500 [=>............................] - ETA: 4s - loss: 1.6442
 143/1500 [=>............................] - ETA: 4s - loss: 1.3088
 186/1500 [==>...........................] - ETA: 3s - loss: 1.0716
 205/1500 [===>..........................] - ETA: 3s - loss: 0.9919
 248/1500 [===>..........................] - ETA: 3s - loss: 0.8563
 286/1500 [====>.........................] - ETA: 3s - loss: 0.7646
 319/1500 [=====>........................] - ETA: 3s - loss: 0.7044
Trial TF_ray_test_d3d32_00009 reported accuracy=0.97 with parameters={'kernel_size': 7}.
[2m[36m(func pid=2457118)[0m Epoch 2/5
  25/1500 [..............................] - ETA: 3s - loss: 0.0561 
  66/1500 [>.............................] - ETA: 3s - loss: 0.0737
  98/1500 [>.............................] - ETA

[2m[36m(func pid=2458187)[0m 2023-03-21 13:40:06.880747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2458187)[0m Epoch 1/5


[2m[36m(func pid=2458187)[0m 2023-03-21 13:40:13.919871: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  19/1500 [..............................] - ETA: 4s - loss: 11.7085   
  53/1500 [>.............................] - ETA: 4s - loss: 4.8593
  85/1500 [>.............................] - ETA: 4s - loss: 3.2135
 117/1500 [=>............................] - ETA: 4s - loss: 2.4354
 131/1500 [=>............................] - ETA: 4s - loss: 2.2166
 161/1500 [==>...........................] - ETA: 4s - loss: 1.8487
 195/1500 [==>...........................] - ETA: 4s - loss: 1.5716
 227/1500 [===>..........................] - ETA: 4s - loss: 1.3855
 256/1500 [====>.........................] - ETA: 4s - loss: 1.2474
 289/1500 [====>.........................] - ETA: 3s - loss: 1.1344
 320/1500 [=====>........................] - ETA: 3s - loss: 1.0504
Trial TF_ray_test_d3d32_00010 reported accuracy=0.97 with parameters={'kernel_size': 6}.
[2m[36m(func pid=2458187)[0m Epoch 2/5
  24/1500 [..............................] - ETA: 3s - loss: 0.0856 
  68/1500 [>.............................] - ETA

[2m[36m(func pid=2459217)[0m 2023-03-21 13:40:49.864912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2459217)[0m Epoch 1/5


[2m[36m(func pid=2459217)[0m 2023-03-21 13:40:56.746343: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


   1/1500 [..............................] - ETA: 33:30 - loss: 21.0995
  48/1500 [..............................] - ETA: 3s - loss: 3.9327   
  84/1500 [>.............................] - ETA: 3s - loss: 2.5111
 113/1500 [=>............................] - ETA: 3s - loss: 1.9907
 144/1500 [=>............................] - ETA: 3s - loss: 1.6590
 172/1500 [==>...........................] - ETA: 4s - loss: 1.4339
 207/1500 [===>..........................] - ETA: 3s - loss: 1.2352
 249/1500 [===>..........................] - ETA: 3s - loss: 1.0745
 287/1500 [====>.........................] - ETA: 3s - loss: 0.9609
 321/1500 [=====>........................] - ETA: 3s - loss: 0.8880
Trial TF_ray_test_d3d32_00011 reported accuracy=0.96 with parameters={'kernel_size': 7}.
[2m[36m(func pid=2459217)[0m Epoch 2/5
  25/1500 [..............................] - ETA: 3s - loss: 0.0865 
  67/1500 [>.............................] - ETA: 3s - loss: 0.0957
 131/1500 [=>............................] - 

[2m[36m(func pid=2460315)[0m 2023-03-21 13:41:33.890268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2460315)[0m Epoch 1/5


[2m[36m(func pid=2460315)[0m 2023-03-21 13:41:40.949631: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


   1/1500 [..............................] - ETA: 32:53 - loss: 29.8278
  36/1500 [..............................] - ETA: 4s - loss: 5.6392  
  82/1500 [>.............................] - ETA: 4s - loss: 2.7257
  99/1500 [>.............................] - ETA: 4s - loss: 2.3110
 114/1500 [=>............................] - ETA: 4s - loss: 2.0365
 149/1500 [=>............................] - ETA: 4s - loss: 1.6252
 184/1500 [==>...........................] - ETA: 4s - loss: 1.3798
 217/1500 [===>..........................] - ETA: 3s - loss: 1.1919
 256/1500 [====>.........................] - ETA: 3s - loss: 1.0518
 299/1500 [====>.........................] - ETA: 3s - loss: 0.9273
 343/1500 [=====>........................] - ETA: 3s - loss: 0.8406
Trial TF_ray_test_d3d32_00012 reported accuracy=0.97 with parameters={'kernel_size': 5}.
[2m[36m(func pid=2460315)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 50s - loss: 0.0034
  45/1500 [..............................] - E

[2m[36m(func pid=2461317)[0m 2023-03-21 13:42:16.898369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2461317)[0m Epoch 1/5


[2m[36m(func pid=2461317)[0m 2023-03-21 13:42:24.209777: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  19/1500 [..............................] - ETA: 4s - loss: 8.1153    
  54/1500 [>.............................] - ETA: 4s - loss: 3.1697
  87/1500 [>.............................] - ETA: 4s - loss: 2.1095
 121/1500 [=>............................] - ETA: 4s - loss: 1.5926
 156/1500 [==>...........................] - ETA: 4s - loss: 1.2759
 188/1500 [==>...........................] - ETA: 3s - loss: 1.1038
 225/1500 [===>..........................] - ETA: 3s - loss: 0.9545
 263/1500 [====>.........................] - ETA: 3s - loss: 0.8427
 299/1500 [====>.........................] - ETA: 3s - loss: 0.7587
 317/1500 [=====>........................] - ETA: 3s - loss: 0.7267
Trial TF_ray_test_d3d32_00013 reported accuracy=0.98 with parameters={'kernel_size': 5}.
[2m[36m(func pid=2461317)[0m Epoch 2/5
  24/1500 [..............................] - ETA: 3s - loss: 0.0425 
  66/1500 [>.............................] - ETA: 3s - loss: 0.0640
 105/1500 [=>............................] - ETA

[2m[36m(func pid=2462325)[0m 2023-03-21 13:42:57.885372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2462325)[0m Epoch 1/5


[2m[36m(func pid=2462325)[0m 2023-03-21 13:43:04.880829: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  21/1500 [..............................] - ETA: 3s - loss: 21.4343   
  54/1500 [>.............................] - ETA: 4s - loss: 8.8194 
  87/1500 [>.............................] - ETA: 4s - loss: 5.6271
 119/1500 [=>............................] - ETA: 4s - loss: 4.2026
 151/1500 [==>...........................] - ETA: 4s - loss: 3.3741
 185/1500 [==>...........................] - ETA: 4s - loss: 2.7944
 220/1500 [===>..........................] - ETA: 3s - loss: 2.3904
 255/1500 [====>.........................] - ETA: 3s - loss: 2.0964
 300/1500 [=====>........................] - ETA: 3s - loss: 1.8091
 342/1500 [=====>........................] - ETA: 3s - loss: 1.6063
Trial TF_ray_test_d3d32_00014 reported accuracy=0.97 with parameters={'kernel_size': 3}.
[2m[36m(func pid=2462325)[0m Epoch 2/5
  24/1500 [..............................] - ETA: 3s - loss: 0.0624 
  64/1500 [>.............................] - ETA: 3s - loss: 0.0611
  97/1500 [>.............................] - ET

[2m[36m(func pid=2463330)[0m 2023-03-21 13:43:36.958527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2463330)[0m Epoch 1/5


[2m[36m(func pid=2463330)[0m 2023-03-21 13:43:44.118539: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  18/1500 [..............................] - ETA: 4s - loss: 7.3311    
  52/1500 [>.............................] - ETA: 4s - loss: 2.9736
  84/1500 [>.............................] - ETA: 4s - loss: 1.9641
 121/1500 [=>............................] - ETA: 4s - loss: 1.4497
 152/1500 [==>...........................] - ETA: 4s - loss: 1.2127
 183/1500 [==>...........................] - ETA: 4s - loss: 1.0445
 215/1500 [===>..........................] - ETA: 3s - loss: 0.9219
 248/1500 [===>..........................] - ETA: 3s - loss: 0.8201
 290/1500 [====>.........................] - ETA: 3s - loss: 0.7238
 331/1500 [=====>........................] - ETA: 3s - loss: 0.6524
Trial TF_ray_test_d3d32_00015 reported accuracy=0.98 with parameters={'kernel_size': 5}.
[2m[36m(func pid=2463330)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 51s - loss: 0.0185
  43/1500 [..............................] - ETA: 3s - loss: 0.0685
  79/1500 [>.............................] - ETA

[2m[36m(func pid=2464340)[0m 2023-03-21 13:44:18.928883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2464340)[0m Epoch 1/5


[2m[36m(func pid=2464340)[0m 2023-03-21 13:44:25.911178: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  20/1500 [..............................] - ETA: 4s - loss: 7.5585    
  57/1500 [>.............................] - ETA: 3s - loss: 3.1331
  95/1500 [>.............................] - ETA: 3s - loss: 2.0602
 126/1500 [=>............................] - ETA: 3s - loss: 1.6477
 157/1500 [==>...........................] - ETA: 3s - loss: 1.3796
 190/1500 [==>...........................] - ETA: 3s - loss: 1.1984
 221/1500 [===>..........................] - ETA: 3s - loss: 1.0708
 253/1500 [====>.........................] - ETA: 3s - loss: 0.9646
 282/1500 [====>.........................] - ETA: 3s - loss: 0.8885
 296/1500 [====>.........................] - ETA: 3s - loss: 0.8567
 326/1500 [=====>........................] - ETA: 3s - loss: 0.7949
Trial TF_ray_test_d3d32_00016 reported accuracy=0.97 with parameters={'kernel_size': 7}.
[2m[36m(func pid=2464340)[0m Epoch 2/5
  27/1500 [..............................] - ETA: 2s - loss: 0.0877 
  69/1500 [>.............................] - ETA

[2m[36m(func pid=2465364)[0m 2023-03-21 13:45:01.898577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2465364)[0m Epoch 1/5


[2m[36m(func pid=2465364)[0m 2023-03-21 13:45:08.844714: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  22/1500 [..............................] - ETA: 3s - loss: 11.8777   
  59/1500 [>.............................] - ETA: 3s - loss: 4.8011
  91/1500 [>.............................] - ETA: 3s - loss: 3.2228
 124/1500 [=>............................] - ETA: 3s - loss: 2.4336
 140/1500 [=>............................] - ETA: 4s - loss: 2.1795
 178/1500 [==>...........................] - ETA: 3s - loss: 1.7682
 217/1500 [===>..........................] - ETA: 3s - loss: 1.4857
 251/1500 [====>.........................] - ETA: 3s - loss: 1.3098
 283/1500 [====>.........................] - ETA: 3s - loss: 1.1812
 315/1500 [=====>........................] - ETA: 3s - loss: 1.0781
 347/1500 [=====>........................] - ETA: 3s - loss: 0.9920
Trial TF_ray_test_d3d32_00017 reported accuracy=0.97 with parameters={'kernel_size': 4}.
[2m[36m(func pid=2465364)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 50s - loss: 0.0130
  52/1500 [>.............................] - ETA

[2m[36m(func pid=2466369)[0m 2023-03-21 13:45:41.924514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6451 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2466369)[0m Epoch 1/5


[2m[36m(func pid=2466369)[0m 2023-03-21 13:45:48.921572: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  20/1500 [..............................] - ETA: 4s - loss: 10.8118   
  52/1500 [>.............................] - ETA: 4s - loss: 4.6253
  86/1500 [>.............................] - ETA: 4s - loss: 2.9453
 119/1500 [=>............................] - ETA: 4s - loss: 2.1930
 136/1500 [=>............................] - ETA: 4s - loss: 1.9524
 169/1500 [==>...........................] - ETA: 4s - loss: 1.6206
 203/1500 [===>..........................] - ETA: 3s - loss: 1.3810
 236/1500 [===>..........................] - ETA: 3s - loss: 1.2143
 270/1500 [====>.........................] - ETA: 3s - loss: 1.0896
 308/1500 [=====>........................] - ETA: 3s - loss: 0.9849
Trial TF_ray_test_d3d32_00018 reported accuracy=0.97 with parameters={'kernel_size': 5}.
[2m[36m(func pid=2466369)[0m Epoch 2/5
   1/1500 [..............................] - ETA: 53s - loss: 0.0161
  50/1500 [>.............................] - ETA: 3s - loss: 0.0500
  91/1500 [>.............................] - ETA

[2m[36m(func pid=2467346)[0m 2023-03-21 13:46:22.948498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6448 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1


[2m[36m(func pid=2467346)[0m Epoch 1/5


[2m[36m(func pid=2467346)[0m 2023-03-21 13:46:30.085639: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8600


  21/1500 [..............................] - ETA: 3s - loss: 5.5651    
  69/1500 [>.............................] - ETA: 3s - loss: 2.1311
 133/1500 [=>............................] - ETA: 3s - loss: 1.3150
 173/1500 [==>...........................] - ETA: 3s - loss: 1.0852
 204/1500 [===>..........................] - ETA: 3s - loss: 0.9768
 248/1500 [===>..........................] - ETA: 3s - loss: 0.8515
 288/1500 [====>.........................] - ETA: 3s - loss: 0.7667
 321/1500 [=====>........................] - ETA: 3s - loss: 0.7149
Trial TF_ray_test_d3d32_00019 reported accuracy=0.96 with parameters={'kernel_size': 7}.
[2m[36m(func pid=2467346)[0m Epoch 2/5
  25/1500 [..............................] - ETA: 3s - loss: 0.1494 
  65/1500 [>.............................] - ETA: 3s - loss: 0.1066
 109/1500 [=>............................] - ETA: 3s - loss: 0.0900
 152/1500 [==>...........................] - ETA: 3s - loss: 0.0851
 216/1500 [===>..........................] - ETA

2023-03-21 13:47:00,813	INFO tune.py:758 -- Total run time: 832.12 seconds (831.49 seconds for the tuning loop).


In [8]:
# Finally, we can inspect the results of this experiment and have a look at the best model!
best_result = results.get_best_result(metric='accuracy', mode='max')
print(f"Best configuration: {best_result.config}")
print(f"Accuracy: {best_result.metrics['accuracy']}")
best_model_id = best_result.metrics['dry_id']
print(f"id: {best_model_id}")

Best configuration: {'kernel_size': 5}
Accuracy: 0.9792668269230769
id: b41fab6c-9b2b-4ea9-86d8-51fee82ccf7e
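
`get_best_result` returns only the single best trial. Since several kernel sizes were sampled more than once, it can also help to compare them on average. The following is a minimal sketch in plain Python, assuming only the six "Trial ... reported" lines visible in the log excerpt above (trials `00014` through `00019`); the helper `summarize_by_kernel_size` is a hypothetical name and is not part of DRYML or `ray.tune`.

```python
from collections import defaultdict
from statistics import mean

# (kernel_size, accuracy) pairs copied from the "Trial ... reported" lines
# for trials 00014 through 00019 in the log above. Earlier trials of the
# experiment are not included here.
reported = [
    (3, 0.97),
    (5, 0.98),
    (7, 0.97),
    (4, 0.97),
    (5, 0.97),
    (7, 0.96),
]


def summarize_by_kernel_size(pairs):
    """Group accuracies by kernel size and return {kernel_size: mean accuracy}."""
    grouped = defaultdict(list)
    for kernel_size, accuracy in pairs:
        grouped[kernel_size].append(accuracy)
    return {k: mean(v) for k, v in sorted(grouped.items())}


summary = summarize_by_kernel_size(reported)
for kernel_size, avg in summary.items():
    print(f"kernel_size={kernel_size}: mean accuracy {avg:.3f}")
```

The reported accuracies are rounded to two decimals in the log, so differences of this size should be read as indicative rather than conclusive.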


In [9]:
# Refresh the repository so it picks up objects saved to disk during tuning
repo.load_objects_from_directory()

In [10]:
# Fetch the best performing model
model = repo.get_obj_by_id(best_model_id)

In [11]:
# Let's define a method to test the model's accuracy
@dryml.compute
def test_model(model):
    import dryml.metrics
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset

    # Check that the current compute context provides
    # TensorFlow support.
    dryml.context.context_check({'tf': {}})

    (ds_test,) = tfds.load(
        'mnist',
        split=['test'],
        shuffle_files=True,
        as_supervised=True)

    test_ds = TFDataset(
        ds_test,
        supervised=True
    )

    return dryml.metrics.scalar.categorical_accuracy(model, test_ds)

In [12]:
# And verify that it matches the recorded accuracy!
test_model(model)

0.9792668269230769
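
Here the recomputed value is identical to the recorded one. In general, it may be safer to compare floating-point metrics with a tolerance instead of by eye. A minimal sketch, assuming the two values printed above; `accuracies_match` is a hypothetical helper, not a DRYML function, and the tolerance is an arbitrary choice.

```python
import math

# Values copied from the notebook output above.
recorded_accuracy = 0.9792668269230769    # stored in the tune result
recomputed_accuracy = 0.9792668269230769  # returned by test_model(model)


def accuracies_match(recorded, recomputed, rel_tol=1e-6):
    """Return True when two accuracy values agree within a relative tolerance."""
    return math.isclose(recorded, recomputed, rel_tol=rel_tol)


print(accuracies_match(recorded_accuracy, recomputed_accuracy))
```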