In [1]:
import dryml
from dryml import ObjectDef
import numpy as np

# DRYML Tutorial 4

## DRYML Contexts 2

We've had some experience with contexts, but now we'll discuss how to avoid creating a context in our current process. This lets us change how resources are distributed depending on the model. DRYML provides the `dryml.compute_context` decorator generator, which gives a wrapped method the ability to inspect existing compute contexts or to launch itself in a new process with an appropriate context. This makes it easy to interleave code requiring a context with manager code that may need to run a variety of models with conflicting context requirements.

`dryml.compute_context` is a decorator generator, meaning you need to call it to create the decorator you want to use. This lets the user customize how a given method gets wrapped and how compute contexts are spawned. DRYML also provides the `dryml.compute` decorator, which is just a shortcut for `compute_context()` when generic behavior is fine.
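The decorator-generator pattern itself can be sketched in plain Python. This is a toy illustration, not DRYML's implementation: `with_settings`, `add`, and `mul` are hypothetical names, and the single `ctx_verbose`/`call_verbose` pair only borrows the tutorial's naming convention to show a decoration-time default being overridden at call time.

```python
import functools


def with_settings(ctx_verbose=False):
    """Hypothetical decorator generator; a toy stand-in, not DRYML code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, call_verbose=None, **kwargs):
            # A call-time value, when given, overrides the decoration-time default.
            verbose = ctx_verbose if call_verbose is None else call_verbose
            if verbose:
                print(f"calling {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Calling the generator creates the decorator that is then applied.
@with_settings(ctx_verbose=False)
def add(a, b):
    return a + b


# A pre-built decorator for the generic case, similar in spirit to a shortcut.
plain = with_settings()


@plain
def mul(a, b):
    return a * b
```

Calling `add(1, 2, call_verbose=True)` prints a diagnostic line for that one call only, while `add(1, 2)` stays silent.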

`compute_context` has several important arguments which can be specified when the decorator is created (when calling `compute_context`) and overridden when actually calling the function.
* `ctx_context_reqs`: Probably the most important. Specifies the set of `context_reqs` to use when checking for an existing context or launching a new context. Override at call time with `call_context_reqs`.
* `ctx_use_existing_context` (Default `True`): When `True`, DRYML will try to use an existing context if one is available. If the existing context doesn't satisfy the given requirements, it will raise a `WrongContextError` exception rather than create a new context. Override at call time with `call_use_existing_context`.
* `ctx_dont_create_context` (Default `False`): When `True`, DRYML will never try to create a new context. If no context exists, it will raise a `NoContextError` exception, and if the existing context doesn't satisfy the given requirements, it will raise a `WrongContextError` exception. Override at call time with `call_dont_create_context`.
* `ctx_update_objs` (Default `False`): When `True`, DRYML will update objects in the current process with the state of the corresponding objects in the remote process upon completion. Override at call time with `call_update_objs`.
* `ctx_verbose` (Default `False`): When `True`, DRYML will print some diagnostic information about the whole compute procedure. Override at call time with `call_verbose`.
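One way to read the rules in the list above is as a small decision function. The sketch below is a toy model under stated assumptions, not DRYML's code: the exception classes are local stand-ins, `satisfies` only checks that each requested framework key is present, and the behavior when `use_existing_context` is `False` (always launch a new context unless creation is disabled) is an assumption rather than something the list states.

```python
class NoContextError(Exception):
    """Toy stand-in for the exception named in the list above."""


class WrongContextError(Exception):
    """Toy stand-in for the exception named in the list above."""


def satisfies(existing, reqs):
    # Simplification: a context satisfies the requirements when it provides
    # every requested framework key. Real resource matching is richer.
    return all(key in existing for key in reqs)


def decide(existing, reqs, use_existing_context=True, dont_create_context=False):
    """Return 'use_existing' or 'create_new', or raise, for a single call."""
    if existing is not None and (use_existing_context or dont_create_context):
        if not satisfies(existing, reqs):
            raise WrongContextError("existing context does not meet requirements")
        return "use_existing"
    if dont_create_context:
        raise NoContextError("no context exists and creation is disabled")
    # Assumption: with no usable existing context, a new one is launched.
    return "create_new"
```

For example, `decide(None, {'tf': {}})` returns `'create_new'`, while `decide(None, {'tf': {}}, dont_create_context=True)` raises `NoContextError`.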


First, we'll write a function that builds our MNIST dataset after checking that an appropriate context is in place. Then we'll write a function for training a model, using the `compute_context` decorator to indicate that this method needs a compute context. Next, we'll create three simple models using `sklearn`, `tensorflow`, and `pytorch`. Finally, we'll explore the statistical variation of these models by training multiple copies of each and measuring their accuracies.

One complication we'll need to overcome is that, by default, DRYML uses the `spawn` method for creating new subprocesses. This means an entirely new Python process is created, and any functions defined in this Jupyter notebook will not be available in it. The typical solution is to create a Python module and place the needed functions there, but we don't want our users to open an external editor just for the sake of one method. So we've come up with a workaround here that you can take with you and use in other situations when needed!

In [2]:
# Create function to generate datasets
def gen_dataset():
    # import some names
    import dryml
    import tensorflow_datasets as tfds
    from dryml.data.tf import TFDataset

    # Check that the context has tensorflow ability, but don't get specific.
    dryml.context.context_check({'tf': {}})

    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True)
    
    train_ds = TFDataset(
        ds_train,
        supervised=True,
    )
    
    test_ds = TFDataset(
        ds_test,
        supervised=True,
    )
    
    return train_ds, test_ds

class temp_mod_file(object):
    def __init__(self, name: str):
        self.name = name
        self.file = open(self.name, mode='w')

    def write_obj_source(self, obj):
        import inspect
        obj_source = inspect.getsource(obj)
        self.file.write(obj_source)
        self.file.write("\n")
        self.file.flush()

    def __del__(self):
        self.file.close()
        import os
        os.remove(self.name)
        del self.file

# Drop any earlier temp_mod_file instance (e.g. from re-running this cell) so its file is removed
try:
    del mod_file
except NameError:
    pass
mod_file = temp_mod_file('temp_mod.py')
mod_file.write_obj_source(gen_dataset)
del gen_dataset
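To see why the helper has to live in a module file, the following standalone sketch starts a brand-new interpreter and shows that it can use a function once that function is importable from disk. It uses `subprocess` rather than DRYML's own process launching, and `helper_mod` and `double` are hypothetical names chosen for illustration.

```python
import os
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical helper source, written out by hand for this sketch.
helper_source = "def double(x):\n    return 2 * x\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Place the helper in a module file; the module name is arbitrary.
    pathlib.Path(tmp_dir, "helper_mod.py").write_text(helper_source)

    # Make the temporary directory importable in the child interpreter.
    env = dict(os.environ, PYTHONPATH=tmp_dir)

    # A fresh interpreter shares no memory with this process, so it can only
    # reach `double` by importing it from the file written above.
    result = subprocess.run(
        [sys.executable, "-c", "from helper_mod import double; print(double(21))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )

output = result.stdout.strip()
print(output)
```

The child process prints `42`, even though `double` was never defined in the parent interpreter.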

In [3]:
# Create function to train a model.
# We use ctx_update_objs=True to indicate any objects we give the method should be updated with their
# state at the end of the method.
@dryml.compute_context(ctx_update_objs=True)
def train_model(model):
    from temp_mod import gen_dataset
    train_ds, _ = gen_dataset()
    
    model.prep_train()
    model.train(train_ds)


# Create function to test model
# Since this method doesn't change the models, we don't have to update them after calling it.
@dryml.compute
def test_model(model):
    from dryml.metrics import categorical_accuracy
    from temp_mod import gen_dataset
    _, test_ds = gen_dataset()
    
    model.prep_eval()
    return categorical_accuracy(model, test_ds)
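The need for `ctx_update_objs=True` on the training function can be illustrated without DRYML. In the sketch below, `copy.deepcopy` stands in for the serialization that happens when an object is handed to another process; `Counter` is a hypothetical stand-in for a model, and copying `__dict__` back is only an illustration of the idea, not a description of DRYML internals.

```python
import copy


class Counter:
    """Hypothetical object standing in for a model with trainable state."""

    def __init__(self):
        self.steps = 0

    def train(self):
        self.steps += 1


original = Counter()

# A worker process operates on a copy of the object, never on the original.
remote_copy = copy.deepcopy(original)
remote_copy.train()

# The original is unchanged because only the copy was trained.
steps_before_update = original.steps

# Copying the worker's final state back makes the training visible locally.
original.__dict__.update(remote_copy.__dict__)
steps_after_update = original.steps

print(steps_before_update, steps_after_update)
```

A function that only reads the model, such as the test function, has nothing to copy back, which is why the plain shortcut decorator is enough there.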

## Create ML Models

Now we'll create a few model classes using `ObjectDef`s. We'll then use `ObjectDef.build` to create instances of these models.

In [4]:
import dryml.models
import dryml.data
import dryml.models.sklearn
import sklearn.neighbors

In [5]:
# Let's define some common processing steps so we don't have to build full definitions for them every time.
flatten_def = ObjectDef(dryml.data.transforms.Flatten)
best_cat_def = ObjectDef(dryml.data.transforms.BestCat)

In [6]:
# First, we'll build an sklearn model.
sklearn_mdl_def = ObjectDef(
    dryml.models.Pipe,
    flatten_def,
    ObjectDef(
        dryml.models.sklearn.Trainable,
        model=ObjectDef(
            dryml.models.sklearn.ClassifierModel,
            sklearn.neighbors.KNeighborsClassifier,
            n_neighbors=10,
        ),
        train_fn=ObjectDef(
            dryml.models.sklearn.BasicTraining,
            num_examples=500,
            shuffle=True,
            shuffle_buffer_size=5000,
        )
    ),
    best_cat_def,
)

In [7]:
# Now, we can generate, train and test a model.
simple_tf_reqs = {'tf': {}}
temp_model = sklearn_mdl_def.build()
train_model(temp_model, call_context_reqs=simple_tf_reqs)
test_model(temp_model, call_context_reqs=simple_tf_reqs)

2022-09-28 10:13:40.056233: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.


0.8205128205128205

In [8]:
def train_multiple(model_def=None, num_to_train=None, ctx_reqs=None):
    models = []
    accuracies = []
    for i in range(num_to_train):
        new_model = model_def.build()
        train_model(new_model, call_context_reqs=ctx_reqs)
        acc = test_model(new_model, call_context_reqs=ctx_reqs)
        accuracies.append(acc)
        models.append(new_model)

    return models, accuracies

In [9]:
num_to_train = 5

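The `train_multiple` function defined above takes a model definition, trains and tests the requested number of models, and returns the trained models along with their accuracies. Let's apply it to the sklearn definition, then compute the mean and standard deviation of the accuracies.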

In [10]:
sklearn_models, sklearn_accuracies = train_multiple(
    model_def=sklearn_mdl_def,
    num_to_train=num_to_train,
    ctx_reqs={'tf': {}})

2022-09-28 10:13:48.730414: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2022-09-28 10:13:57.274460: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2022-09-28 10:14:05.671212: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. I

In [11]:
# compute accuracy mean/stddev
print(f"sklearn accuracy: {np.mean(sklearn_accuracies)}+/-{np.std(sklearn_accuracies)}")

sklearn accuracy: 0.8109575320512821+/-0.009176691219372033
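A note on the summary statistic: `np.std` uses `ddof=0` by default, which is the population standard deviation. With only a handful of trained models, the sample standard deviation (`ddof=1`) is noticeably larger. The sketch below compares the two on made-up accuracy values that are purely illustrative and are not results from this tutorial.

```python
import numpy as np

# Illustrative accuracy values only; these are not results from this tutorial.
accuracies = np.array([0.80, 0.82, 0.81, 0.83, 0.79])

mean = np.mean(accuracies)
population_std = np.std(accuracies)      # ddof=0, NumPy's default
sample_std = np.std(accuracies, ddof=1)  # divides by n - 1 instead of n

print(f"mean={mean:.4f} population_std={population_std:.4f} sample_std={sample_std:.4f}")
```

For five values the two estimates differ by a factor of `sqrt(5/4)`, so which one is reported is worth stating alongside the number.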


Now, let's build a TensorFlow model.

In [12]:
import tensorflow as tf
import dryml.models.tf

In [13]:
mdl_def = ObjectDef(
    dryml.models.tf.keras.SequentialFunctionalModel,
    input_shape=(28, 28, 1),
    layer_defs=[
        ['Conv2D', {'filters': 16, 'kernel_size': 3, 'activation': 'relu'}],
        ['Conv2D', {'filters': 16, 'kernel_size': 3, 'activation': 'relu'}],
        ['Flatten', {}],
        ['Dense', {'units': 10, 'activation': 'linear'}],
    ]
)
tf_mdl_def = ObjectDef(
    dryml.models.Pipe,
    ObjectDef(
        dryml.models.tf.keras.Trainable,
        model=mdl_def,
        train_fn=ObjectDef(
            dryml.models.tf.keras.BasicTraining,
            epochs=2
        ),
        optimizer=ObjectDef(
            dryml.models.tf.ObjectWrapper,
            tf.keras.optimizers.Adam,
        ),
        loss=ObjectDef(
            dryml.models.tf.ObjectWrapper,
            tf.keras.losses.SparseCategoricalCrossentropy,
            obj_kwargs={
                'from_logits': True,
            }
        )
    ),
    ObjectDef(
        dryml.data.transforms.BestCat
    )
)

In [14]:
tf_models, tf_accuracies = train_multiple(
    model_def=tf_mdl_def,
    num_to_train=num_to_train,
    ctx_reqs={'tf': {'gpu/1': 1.}}
)

2022-09-28 10:14:32.557766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1


Epoch 1/2


2022-09-28 10:14:39.641183: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8500


Epoch 2/2


2022-09-28 10:14:52.909037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1
2022-09-28 10:14:57.836955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1


Epoch 1/2


2022-09-28 10:15:04.720139: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8500


Epoch 2/2


2022-09-28 10:15:18.332227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1
2022-09-28 10:15:23.170913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1


Epoch 1/2


2022-09-28 10:15:29.976351: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8500


Epoch 2/2


2022-09-28 10:15:42.584342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1
2022-09-28 10:15:47.596784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1


Epoch 1/2


2022-09-28 10:15:54.816107: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8500


Epoch 2/2


2022-09-28 10:16:07.720902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1
2022-09-28 10:16:12.781477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1


Epoch 1/2


2022-09-28 10:16:19.580021: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8500


Epoch 2/2


2022-09-28 10:16:32.670055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7369 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1


In [15]:
# compute accuracy mean/stddev
print(f"tf accuracy: {np.mean(tf_accuracies)}+/-{np.std(tf_accuracies)}")

tf accuracy: 0.9724959935897436+/-0.0018658863817620085


And now, let's have a look at a similar PyTorch model. We'll have to add another step to change the order of the indices of the data, since PyTorch expects data in NCHW format while TensorFlow uses NHWC format. We'll also have to add a `TorchDevice` transformation to make sure the data is on the CPU.
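The index reordering can be checked on a single array with NumPy. This is a minimal sketch that assumes the transpose is applied per sample, so an MNIST image of shape `(28, 28, 1)` (height, width, channel) becomes `(1, 28, 28)` (channel, height, width); the batch dimension then gives NHWC and NCHW respectively.

```python
import numpy as np

# One grayscale image in height-width-channel (HWC) layout.
image_hwc = np.zeros((28, 28, 1), dtype=np.uint8)

# axes=(2, 0, 1) moves the channel axis to the front: HWC -> CHW.
image_chw = np.transpose(image_hwc, axes=(2, 0, 1))

# Cast to float32, matching the dtype requested in this section's pipeline.
image_chw = image_chw.astype(np.float32)

print(image_hwc.shape, "->", image_chw.shape)
```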

In [16]:
import dryml.models.torch
import dryml.data.torch
import torch



In [17]:
mdl_def = ObjectDef(
    dryml.models.torch.generic.Sequential,
    layer_defs=[
        [torch.nn.LazyConv2d, (16, 3), {}],
        [torch.nn.ReLU, (), {}],
        [torch.nn.LazyConv2d, (16, 3), {}],
        [torch.nn.ReLU, (), {}],
        [torch.nn.Flatten, (), {}],
        [torch.nn.LazyLinear, (10,), {}],
    ]
)
torch_mdl_def = ObjectDef(
    dryml.models.Pipe,
    ObjectDef(
        dryml.data.transforms.Transpose,
        axes=(2, 0, 1)
    ),
    ObjectDef(
        dryml.data.transforms.Cast,
        dtype='float32'
    ),
    ObjectDef(
        dryml.models.torch.generic.Trainable,
        model=mdl_def,
        train_fn=ObjectDef(
            dryml.models.torch.generic.BasicTraining,
            optimizer=ObjectDef(
                dryml.models.torch.generic.TorchOptimizer,
                torch.optim.Adam,
                mdl_def,
            ),
            loss=ObjectDef(
                dryml.models.torch.generic.TorchObject,
                torch.nn.CrossEntropyLoss
            )
        )
    ),
    ObjectDef(
        dryml.data.torch.transforms.TorchDevice,
        device='cpu'
    ),
    ObjectDef(
        dryml.data.transforms.BestCat
    )
)

In [18]:
torch_models, torch_accuracies = train_multiple(
    model_def=torch_mdl_def,
    num_to_train=num_to_train,
    ctx_reqs={'tf': {}, 'torch': {'gpu/1': 1.}}
)

100%|██████████| 1875/1875 [00:16<00:00, 111.09it/s, loss=0.00905]


Epoch 1 - Average Loss: 0.009048364644496662


100%|██████████| 1875/1875 [00:17<00:00, 109.30it/s, loss=0.00799]


Epoch 1 - Average Loss: 0.007985602132917848


100%|██████████| 1875/1875 [00:17<00:00, 108.86it/s, loss=0.00789]

Epoch 1 - Average Loss: 0.007888825566699962



100%|██████████| 1875/1875 [00:17<00:00, 109.16it/s, loss=0.00702]


Epoch 1 - Average Loss: 0.007018250606326425


100%|██████████| 1875/1875 [00:17<00:00, 109.60it/s, loss=0.00712]

Epoch 1 - Average Loss: 0.007122587668206446





In [19]:
# compute accuracy mean/stddev
print(f"torch accuracy: {np.mean(torch_accuracies)}+/-{np.std(torch_accuracies)}")

torch accuracy: 0.9678285256410255+/-0.002065730925720642
