2. Image classification
-----------------------

Let us begin with a classic machine learning task: image classification
with convolutional neural networks (CNNs). The general idea is as
follows:

1.  Train a CNN on normal training data. Evaluate its performance on a
    conventional ("unmixed") validation set and on a MixUp ("mixed")
    version of the same validation set.
2.  Train a CNN on MixUp training data. Evaluate its performance on both
    unmixed and mixed validation data.

When training on MixUp training data, we compute a new MixUp of each
batch in every epoch. As explained in the introduction, this effectively
augments the training set and hopefully makes the network more robust.
Evaluating the performance of both networks on unmixed and mixed
validation data allows us to compare the generalization properties of
both networks, the working hypothesis being that training on MixUp data
enhances generalization. To reduce the dependence of our results on the
specific choice of hyperparameters, we train several CNNs with varying
numbers of convolutional and dense layers. This is done for both kinds
of training data (unmixed, mixed) in a distributed fashion using Ray
Tune.
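
As a rough illustration of this experimental design, the sketch below enumerates every combination of architecture and training data. The layer counts mirror the grid passed to Ray Tune later in this notebook (2, 4, 6 convolutional layers and 0, 1, 2 dense layers). The helper name `enumerate_configs` is hypothetical and is not part of Ray Tune.

```python
from itertools import product

def enumerate_configs(conv_options=(2, 4, 6), dense_options=(0, 1, 2)):
    """List every (architecture, training data) combination to be trained.

    `enumerate_configs` is a hypothetical helper used only for illustration.
    """
    return [
        {"number_conv": c, "number_dense": d, "train_with_mixed_data": mixed}
        for mixed, c, d in product((False, True), conv_options, dense_options)
    ]

configs = enumerate_configs()
# 2 kinds of training data x 3 conv depths x 3 dense depths
print(len(configs))
```

In the notebook itself, the two kinds of training data are handled in separate Tune runs of 9 trials each rather than in one combined list.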

In [None]:
# Imports
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense,Conv2D,Flatten,BatchNormalization,Dropout
from tensorflow.keras import Sequential
from ray import tune
from ray.tune import CLIReporter
from sklearn.metrics import confusion_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from functools import partial

# Fixes the issue "AttributeError: 'ConsoleBuffer has no attribute 'fileno'"
import sys
sys.stdout.fileno = lambda: False

  

  

#### The data set

We will use the Intel Image Classification data set \[\[3\]\]. It
consists of 25k 150x150 RGB images from 6 different classes: buildings,
forest, glacier, mountain, sea, and street. To keep training cheap, the
images are downscaled to 32x32 when they are loaded.

In [None]:
"""
The global parameters for training.
"""

img_height,img_width,channels = 32,32,3
batch_size = 32
train_data_dir,test_data_dir = "/dbfs/FileStore/tables/Group20/seg_train/seg_train/", "/dbfs/FileStore/tables/Group20/seg_test/seg_test/"
num_classes = 6
alpha = 0.2 # Degree of mixup is ~ Beta(alpha,alpha)

  

  

#### MixUp data generator

To create MixUp data, we will define a custom data generator. It takes
an underlying image generator as argument, and outputs convex
combinations of two randomly selected (example,label) pairs drawn
according to the underlying generator.
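
The convex combination itself can be sketched in plain NumPy. This is a minimal illustration, not the generator used in this notebook: it pairs each example with another example of the same batch via a random permutation, whereas the notebook's generator draws the partner from a second, independently shuffled iterator. The function name `mixup_batch` and the toy arrays are assumptions made for the example.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix a batch with a randomly permuted copy of itself.

    x: array of shape (batch, height, width, channels)
    y: one-hot labels of shape (batch, num_classes)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    lam = rng.beta(alpha, alpha, size=n)   # one mixing coefficient per example
    perm = rng.permutation(n)              # partner example for each row
    lam_x = lam.reshape(n, 1, 1, 1)        # broadcast over pixels and channels
    lam_y = lam.reshape(n, 1)              # broadcast over classes
    x_mix = lam_x * x + (1 - lam_x) * x[perm]
    y_mix = lam_y * y + (1 - lam_y) * y[perm]
    return x_mix, y_mix

rng = np.random.default_rng(0)
x = rng.random((8, 32, 32, 3))             # toy "images" with values in [0, 1)
y = np.eye(6)[rng.integers(0, 6, size=8)]  # toy one-hot labels for 6 classes
x_mix, y_mix = mixup_batch(x, y, alpha=0.2, rng=rng)
print(x_mix.shape, y_mix.sum(axis=1))
```

Because each mixed label is a convex combination of two one-hot vectors, its entries still sum to one, so it remains a valid target for a categorical cross-entropy loss.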

In [None]:
import os, shutil
def copy_data():
  src = "/dbfs/FileStore/tables/Group20/seg_train/seg_train"
  dst = os.path.join(os.getcwd(), 'seg_train')
  print("Copying data/files to local horovod folder...")
  shutil.copytree(src, dst)
  print("Done with copying!")
  train_data_dir = dst

  src = "/dbfs/FileStore/tables/Group20/seg_test/seg_test"
  dst = os.path.join(os.getcwd(), 'seg_test')
  print("Copying data/files to local horovod folder...")
  shutil.copytree(src, dst)
  print("Done with copying!")
  test_data_dir = dst

  return train_data_dir,test_data_dir

  

  


In [None]:
class MixupImageDataGenerator(tf.keras.utils.Sequence):
    def __init__(self, generator, directory, batch_size, img_height, img_width, alpha=0.2, subset=None):
        self.batch_size = batch_size
        self.batch_index = 0
        self.alpha = alpha

        # First iterator yielding tuples of (x, y)
        self.generator1 = generator.flow_from_directory(directory,
                                                        target_size=(
                                                            img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset=subset)

        # Second iterator yielding tuples of (x, y)
        self.generator2 = generator.flow_from_directory(directory,
                                                        target_size=(
                                                            img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset=subset)

        # Number of images across all classes in image directory.
        self.n = self.generator1.samples


    def __len__(self):
        # returns the number of batches
        return (self.n + self.batch_size - 1) // self.batch_size

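    def __getitem__(self, index):
        # Draw one batch from each of the two independently shuffled iterators.
        # Note that `index` is ignored: batches are consumed sequentially.
        X1, y1 = next(self.generator1)
        X2, y2 = next(self.generator2)

        # Sample one mixing coefficient per example from Beta(alpha, alpha).
        lam = np.random.beta(self.alpha, self.alpha, X1.shape[0])

        # Reshape so that the coefficients broadcast over images and labels.
        X_l = lam.reshape(X1.shape[0], 1, 1, 1)
        y_l = lam.reshape(X1.shape[0], 1)

        # Perform the mixup: convex combination of inputs and of labels.
        X = X1 * X_l + X2 * (1 - X_l)
        y = y1 * y_l + y2 * (1 - y_l)
        return X, y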

    def reset_index(self):
        """Reset the generator indexes array.
        """

        self.generator1._set_index_array()
        self.generator2._set_index_array()


    def on_epoch_end(self):
        self.reset_index()
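
The batch count returned by `__len__` is a ceiling division. The small sketch below reproduces it for the split sizes reported in the cell outputs of this notebook (12632 training and 1402 validation images, batch size 32). The helper name `num_batches` is hypothetical.

```python
def num_batches(n_samples, batch_size):
    """Ceiling division, as in MixupImageDataGenerator.__len__."""
    return (n_samples + batch_size - 1) // batch_size

print(num_batches(12632, 32))  # 395
print(num_batches(1402, 32))   # 44

# Size of the final, incomplete training batch.
print(12632 - (num_batches(12632, 32) - 1) * 32)  # 24
```

The value 395 agrees with the `1/395` progress counter in the training logs further down.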

In [None]:
"""
Returns the data loaders that we need for training and validation.

With for_training set to True, the function returns the data loaders
* train_mix_loader: mixed training data
* train_loader:     unmixed training data
* val_mix_loader:   mixed validation data
* val_loader:       unmixed validation data

With for_training set to False, the function returns the data loader
* test_loader: unmixed and unshuffled test data. The data is not shuffled so that the order of the predictions matches the order of the labels, which simplifies evaluation.
"""
def get_data_loaders(train_data_dir,test_data_dir,for_training = True):
  
    #For training data
    if for_training:
        datagen_train_val = ImageDataGenerator(rescale=1./255,
                                rotation_range=5,
                                width_shift_range=0.05,
                                height_shift_range=0,
                                shear_range=0.05,
                                zoom_range=0,
                                brightness_range=(1, 1.3),
                                horizontal_flip=True,
                                fill_mode='nearest',
                                validation_split=0.1)

        train_mix_loader = MixupImageDataGenerator(generator = datagen_train_val,
                                                   directory = train_data_dir,
                                                   batch_size = batch_size,
                                                   img_height = img_height,
                                                   img_width = img_width,
                                                   alpha=alpha,
                                                   subset="training")
        
        val_mix_loader = MixupImageDataGenerator(generator = datagen_train_val,
                                                 directory = train_data_dir,
                                                 batch_size = batch_size,
                                                 img_height = img_height,
                                                 img_width = img_width,
                                                 alpha=alpha,
                                                 subset="validation")

        train_loader = datagen_train_val.flow_from_directory(train_data_dir,
                                                        target_size=(img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset="training")

        val_loader = datagen_train_val.flow_from_directory(train_data_dir,
                                                        target_size=(img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset="validation")
        
        return train_mix_loader,train_loader, val_mix_loader, val_loader

    #For test data
    else:
        datagen_test = ImageDataGenerator(rescale=1./255,
                                rotation_range=0,
                                width_shift_range=0,
                                height_shift_range=0,
                                shear_range=0,
                                zoom_range=0,
                                brightness_range=(1, 1),
                                horizontal_flip=False,
                                fill_mode='nearest',
                                validation_split=0)

        test_loader = datagen_test.flow_from_directory(test_data_dir,
                                                    target_size=(img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=False,
                                                        subset=None)

        return test_loader

  

  

#### Network architecture

In [None]:
"""
Creates a CNN with number_conv convolutional layers followed by number_dense dense layers. The model is compiled with the Adam optimizer and a categorical cross-entropy loss.
"""
def create_model(number_conv,number_dense):
    model = Sequential()
    model.add(Conv2D(24,kernel_size = 3, activation='relu',padding="same", input_shape=(img_height, img_width,channels)))
    model.add(BatchNormalization())
    for s in range(1,number_conv):
        model.add(Conv2D(24+12*s,kernel_size = 3,padding="same", activation = 'relu'))
        model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dropout(0.4))
    for s in range(number_dense):
        model.add(Dense(units=num_classes, activation='relu'))
        model.add(Dropout(0.4))
    model.add(BatchNormalization())
    model.add(Dense(num_classes,activation= "softmax"))
    model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
    return model
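
To get a feeling for how the architecture scales with `number_conv`, the sketch below recomputes the filter counts and the size of the flattened feature vector from the layer definitions in `create_model`. It assumes `number_conv >= 1` and relies on the fact that `padding="same"` with the default stride and no pooling leaves the 32x32 spatial size unchanged. The helper names are hypothetical.

```python
def conv_filter_counts(number_conv):
    """Filters per convolutional layer, following create_model: 24, 36, 48, ..."""
    return [24 + 12 * s for s in range(number_conv)]

def flattened_size(number_conv, img_height=32, img_width=32):
    """Number of features after Flatten.

    padding="same" with stride 1 and no pooling keeps the spatial size,
    so only the filter count of the last convolutional layer matters.
    """
    return img_height * img_width * conv_filter_counts(number_conv)[-1]

for n in (2, 4, 6):
    print(n, conv_filter_counts(n), flattened_size(n))
```

For `number_conv = 2`, for instance, the last convolutional layer has 36 filters, so 32 * 32 * 36 = 36864 features enter the dense part of the network.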

  

  

#### Training function

\*\*This should be replaced by a Horovod training function if we manage
to complete the installation. For now, it uses the
Olofs\_implementation version.\*\*

In [None]:
def training_function(config, checkpoint_dir=None):
    # Hyperparameters
    number_conv, number_dense = config["number_conv"], config["number_dense"]
    train_with_mixed_data = config["train_with_mixed_data"]
    
    
    """
    Get the different dataloaders
    One with training data using mixing
    One with training without mixing
    One with validation data with mixing
    One with validation without mixing
    Set for_training to False to get testing data
    """
    #train_data_dir,test_data_dir = "/dbfs/FileStore/tables/Group20/seg_train/seg_train","/dbfs/FileStore/tables/Group20/seg_test/seg_test"

    #train_data_dir, test_data_dir = copy_data()
    train_mix_dataloader,train_dataloader,val_mix_dataloader,val_dataloader = get_data_loaders(train_data_dir, test_data_dir, for_training = True)

    """
    Construct the model based on hyperparameters
    """
    model = create_model( number_conv,number_dense )

    
    """
    Adds early stopping to the training, based on the accuracy on the unmixed validation data. Should we monitor the validation loss instead?
    """
    callbacks = [tf.keras.callbacks.EarlyStopping(patience=10,monitor="val_accuracy",min_delta=0.01,restore_best_weights=True)]

    """
    Train the model and give the training history.
    """
    # Model.fit accepts generators directly; fit_generator is deprecated.
    train_data = train_mix_dataloader if train_with_mixed_data else train_dataloader
    history = model.fit(train_data, validation_data=val_dataloader, callbacks=callbacks, verbose=True, epochs=200)
    
    """
    Log the results
    """
    #x_mix, y_mix = mixup_data( x_val, y_val)
    #mix_loss, mix_acc = model.evaluate( x_mix, y_mix )
    train_loss_unmix, train_acc_unmix = model.evaluate( train_dataloader )
    val_mix_loss, val_mix_acc = model.evaluate( val_mix_dataloader )
    ind_max = np.argmax(history.history['val_accuracy'])
    train_mix_acc = history.history['accuracy'][ind_max]
    train_loss = history.history['loss'][ind_max]
    val_acc = history.history['val_accuracy'][ind_max]
    val_loss = history.history['val_loss'][ind_max]
    
    tune.report(mean_loss=train_loss, train_mix_accuracy=train_mix_acc, train_accuracy=train_acc_unmix, val_mix_accuracy=val_mix_acc, val_accuracy=val_acc)
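
The way the reported metrics are read off at the epoch with the best validation accuracy can be illustrated in isolation. In the sketch below, `toy_history` contains made-up numbers that stand in for `history.history`; they are not results from this notebook. The helper name `metrics_at_best_epoch` is hypothetical.

```python
import numpy as np

# Made-up numbers standing in for `history.history`; not real results.
toy_history = {
    "loss":         [1.20, 0.90, 0.70, 0.60],
    "accuracy":     [0.55, 0.66, 0.74, 0.78],
    "val_loss":     [1.10, 0.95, 0.85, 0.90],
    "val_accuracy": [0.58, 0.65, 0.71, 0.69],
}

def metrics_at_best_epoch(history, monitor="val_accuracy"):
    """Return the index of the best epoch and all metrics at that epoch."""
    best = int(np.argmax(history[monitor]))
    return best, {name: values[best] for name, values in history.items()}

best_epoch, metrics = metrics_at_best_epoch(toy_history)
print(best_epoch, metrics["val_accuracy"])  # 2 0.71
```

Reading all metrics at the same epoch keeps the training and validation numbers comparable, instead of mixing the final training accuracy with the best validation accuracy.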


In [None]:
#train_mix_dataloader,train_dataloader,val_mix_dataloader,val_dataloader = get_data_loaders(for_training = True)
train_data_dir,test_data_dir = copy_data()

  

>     Copying data/files to local horovod folder...
>     Done with copying!
>     Copying data/files to local horovod folder...
>     Done with copying!

  

### Connection between MixUp performance and generalization

First, we will train our neural networks using a standard procedure,
with normal training data. We then measure their performance on a
validation set as well as on a MixUp version of the same validation set,
the idea being to study the connection between these metrics.

In [None]:
training_function( config={"number_conv": 2, "number_dense": 2, "train_with_mixed_data": False} )

  

>     Found 12632 images belonging to 6 classes.
>     Found 12632 images belonging to 6 classes.
>     Found 1402 images belonging to 6 classes.
>     Found 1402 images belonging to 6 classes.
>     Found 12632 images belonging to 6 classes.
>     Found 1402 images belonging to 6 classes.

In [None]:
# Limit the number of rows.
reporter = CLIReporter(max_progress_rows=10)
# Add a custom metric column, in addition to the default metrics.
# Note that this must be a metric that is returned in your training results.
reporter.add_metric_column("val_mix_accuracy")
reporter.add_metric_column("val_accuracy")
reporter.add_metric_column("train_accuracy")
reporter.add_metric_column("train_mix_accuracy")

#config = {"number_conv" : 3,"number_dense" : 5}
#training_function(config)

#get_data_loaders()

analysis = tune.run(
    training_function,
    config={
        "number_conv": tune.grid_search(np.arange(2,7,2).tolist()),
        "number_dense": tune.grid_search(np.arange(0,3,1).tolist()),
        "train_with_mixed_data": False
    },
    local_dir='ray_results',
    progress_reporter=reporter
) 
  #resources_per_trial={'gpu': 1})

print("Best config: ", analysis.get_best_config(
    metric="val_accuracy", mode="max"))

#Get a dataframe for analyzing trial results.
df = analysis.results_df


  

>     2021-01-12 18:12:47,354	WARNING tune.py:409 -- Tune detects GPUs, but no trials are using GPUs. To enable trials to use GPUs, set tune.run(resources_per_trial={'gpu': 1}...) which allows Tune to expose 1 GPU to each trial. You can also override `Trainable.default_resource_request` if using the Trainable API.
>     == Status ==
>     Memory usage on this node: 8.0/10.8 GiB
>     Using FIFO scheduling algorithm.
>     Resources requested: 1/4 CPUs, 0/1 GPUs, 0.0/2.49 GiB heap, 0.0/0.83 GiB objects (0/1.0 accelerator_type:T4)
>     Result logdir: /databricks/driver/ray_results/training_function_2021-01-12_18-12-47
>     Number of trials: 1/9 (1 RUNNING)
>     +-------------------------------+----------+-------+---------------+----------------+
>     | Trial name                    | status   | loc   |   number_conv |   number_dense |
>     |-------------------------------+----------+-------+---------------+----------------|
>     | training_function_c6336_00000 | RUNNING  |       |             2 |              0 |
>     +-------------------------------+----------+-------+---------------+----------------+
>
>
>     (pid=6948) 2021-01-12 18:12:49.118221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>     (pid=6943) 2021-01-12 18:12:49.275935: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>     (pid=6945) 2021-01-12 18:12:49.365509: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>     (pid=6949) 2021-01-12 18:12:49.345345: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>     (pid=6948) Found 12632 images belonging to 6 classes.
>     (pid=6949) Found 12632 images belonging to 6 classes.
>     (pid=6945) Found 12632 images belonging to 6 classes.
>     (pid=6943) Found 12632 images belonging to 6 classes.
>     (pid=6945) Found 12632 images belonging to 6 classes.
>     (pid=6948) Found 12632 images belonging to 6 classes.
>     (pid=6949) Found 12632 images belonging to 6 classes.
>     (pid=6943) Found 12632 images belonging to 6 classes.
>     (pid=6948) Found 1402 images belonging to 6 classes.
>     (pid=6949) Found 1402 images belonging to 6 classes.
>     (pid=6945) Found 1402 images belonging to 6 classes.
>     (pid=6943) Found 1402 images belonging to 6 classes.
>     (pid=6948) Found 1402 images belonging to 6 classes.
>     (pid=6949) Found 1402 images belonging to 6 classes.
>     (pid=6945) Found 1402 images belonging to 6 classes.
>     (pid=6943) Found 1402 images belonging to 6 classes.
>     (pid=6948) Found 12632 images belonging to 6 classes.
>     (pid=6945) Found 12632 images belonging to 6 classes.
>     (pid=6943) Found 12632 images belonging to 6 classes.
>     (pid=6949) Found 12632 images belonging to 6 classes.
>     (pid=6948) Found 1402 images belonging to 6 classes.
>     (pid=6949) Found 1402 images belonging to 6 classes.
>     (pid=6945) Found 1402 images belonging to 6 classes.
>     (pid=6943) Found 1402 images belonging to 6 classes.
>     (pid=6948) 2021-01-12 18:13:07.019617: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
>     (pid=6948) 2021-01-12 18:13:07.028237: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
>     (pid=6948) 2021-01-12 18:13:07.028287: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 1120-144117-apses921-10-149-224-88
>     (pid=6948) 2021-01-12 18:13:07.028304: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 1120-144117-apses921-10-149-224-88
>     (pid=6948) 2021-01-12 18:13:07.028400: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 450.80.2
>     (pid=6948) 2021-01-12 18:13:07.028428: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 450.80.2
>     (pid=6948) 2021-01-12 18:13:07.028438: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 450.80.2
>     (pid=6948) 2021-01-12 18:13:07.028756: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
>     (pid=6948) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>     (pid=6948) 2021-01-12 18:13:07.038165: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz
>     (pid=6948) 2021-01-12 18:13:07.038446: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa37c3100b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
>     (pid=6948) 2021-01-12 18:13:07.038471: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
>     (pid=6943) 2021-01-12 18:13:07.027957: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
>     (pid=6943) 2021-01-12 18:13:07.036413: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
>     (pid=6943) 2021-01-12 18:13:07.036466: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 1120-144117-apses921-10-149-224-88
>     (pid=6943) 2021-01-12 18:13:07.036480: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 1120-144117-apses921-10-149-224-88
>     (pid=6943) 2021-01-12 18:13:07.036587: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 450.80.2
>     (pid=6943) 2021-01-12 18:13:07.036639: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 450.80.2
>     (pid=6943) 2021-01-12 18:13:07.036655: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 450.80.2
>     (pid=6943) 2021-01-12 18:13:07.037129: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
>     (pid=6943) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>     (pid=6943) 2021-01-12 18:13:07.048693: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz
>     (pid=6943) 2021-01-12 18:13:07.049004: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f38dc310160 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
>     (pid=6943) 2021-01-12 18:13:07.049038: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
>     (pid=6945) 2021-01-12 18:13:07.017585: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
>     (pid=6945) 2021-01-12 18:13:07.044730: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
>     (pid=6945) 2021-01-12 18:13:07.044782: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 1120-144117-apses921-10-149-224-88
>     (pid=6945) 2021-01-12 18:13:07.044798: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 1120-144117-apses921-10-149-224-88
>     (pid=6945) 2021-01-12 18:13:07.044912: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 450.80.2
>     (pid=6945) 2021-01-12 18:13:07.044951: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 450.80.2
>     (pid=6945) 2021-01-12 18:13:07.044965: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 450.80.2
>     (pid=6945) 2021-01-12 18:13:07.045332: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
>     (pid=6945) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>     (pid=6949) 2021-01-12 18:13:07.027482: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
>     (pid=6949) 2021-01-12 18:13:07.034677: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
>     (pid=6949) 2021-01-12 18:13:07.034744: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 1120-144117-apses921-10-149-224-88
>     (pid=6949) 2021-01-12 18:13:07.034762: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 1120-144117-apses921-10-149-224-88
>     (pid=6949) 2021-01-12 18:13:07.034876: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 450.80.2
>     (pid=6949) 2021-01-12 18:13:07.034928: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 450.80.2
>     (pid=6949) 2021-01-12 18:13:07.034943: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 450.80.2
>     (pid=6949) 2021-01-12 18:13:07.035171: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
>     (pid=6949) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>     (pid=6949) 2021-01-12 18:13:07.050296: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz
>     (pid=6949) 2021-01-12 18:13:07.050606: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7efeec3101d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
>     (pid=6949) 2021-01-12 18:13:07.050637: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
>     (pid=6945) 2021-01-12 18:13:07.064856: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz
>     (pid=6945) 2021-01-12 18:13:07.065243: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f93c8310320 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
>     (pid=6945) 2021-01-12 18:13:07.065279: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
>     (pid=6948) WARNING:tensorflow:From <command-685894176419834>:37: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
>     (pid=6948) Instructions for updating:
>     (pid=6948) Please use Model.fit, which supports generators.
>     (pid=6943) WARNING:tensorflow:From <command-685894176419834>:37: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
>     (pid=6943) Instructions for updating:
>     (pid=6943) Please use Model.fit, which supports generators.
>     (pid=6945) WARNING:tensorflow:From <command-685894176419834>:37: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
>     (pid=6945) Instructions for updating:
>     (pid=6945) Please use Model.fit, which supports generators.
>     (pid=6949) WARNING:tensorflow:From <command-685894176419834>:37: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
>     (pid=6949) Instructions for updating:
>     (pid=6949) Please use Model.fit, which supports generators.
>     (pid=6945) Epoch 1/200
>     (pid=6943) Epoch 1/200
>     (pid=6949) Epoch 1/200
>     (pid=6948) Epoch 1/200
>     (pid=6943)   1/395 [..............................] - ETA: 0s - loss: 2.2131 - accuracy: 0.1562
>     (pid=6948)   1/395 [..............................] - ETA: 0s - loss: 2.3414 - accuracy: 0.1875
>     (pid=6945)   1/395 [..............................] - ETA: 0s - loss: 2.0423 - accuracy: 0.2500
>     (pid=6949)   1/395 [..............................] - ETA: 0s - loss: 2.2062 - accuracy: 0.1875
>     (pid=6943)   2/395 [..............................] - ETA: 4:41 - loss: 2.0237 - accuracy: 0.1875
>     (pid=6948)   2/395 [..............................] - ETA: 5:47 - loss: 2.6242 - accuracy: 0.2500
>     (pid=6945)   2/395 [..............................] - ETA: 6:03 - loss: 2.1231 - accuracy: 0.2969
>     (pid=6949)   2/395 [..............................] - ETA: 5:13 - loss: 2.6741 - accuracy: 0.2500
>     (pid=6943)   3/395 [..............................] - ETA: 6:33 - loss: 1.9714 - accuracy: 0.1771
>     (pid=6945)   3/395 [..............................] - ETA: 7:06 - loss: 2.1580 - accuracy: 0.3229
>     (pid=6948)   3/395 [..............................] - ETA: 7:45 - loss: 2.4659 - accuracy: 0.3229
>     (pid=6949)   3/395 [..............................] - ETA: 6:58 - loss: 2.9112 - accuracy: 0.2812
>     (pid=6943)   4/395 [..............................] - ETA: 7:29 - loss: 1.8858 - accuracy: 0.2344
>     (pid=6945)   4/395 [..............................] - ETA: 7:33 - loss: 2.2433 - accuracy: 0.3438
>     (pid=6948)   4/395 [..............................] - ETA: 8:22 - loss: 2.4949 - accuracy: 0.3750
>     (pid=6949)   4/395 [..............................] - ETA: 7:30 - loss: 2.7770 - accuracy: 0.3516
>     (pid=6943)   5/395 [..............................] - ETA: 7:50 - loss: 1.8617 - accuracy: 0.2375
>     (pid=6945)   5/395 [..............................] - ETA: 7:48 - loss: 2.4487 - accuracy: 0.3750
>     (pid=6949)   5/395 [..............................] - ETA: 7:48 - loss: 2.7115 - accuracy: 0.3875
>     (pid=6948)   5/395 [..............................] - ETA: 8:39 - loss: 2.7588 - accuracy: 0.3750
>     (pid=6943)   6/395 [..............................] - ETA: 8:15 - loss: 1.8445 - accuracy: 0.2396
>     (pid=6945)   6/395 [..............................] - ETA: 8:32 - loss: 2.5224 - accuracy: 0.3854
>     (pid=6949)   6/395 [..............................] - ETA: 8:10 - loss: 2.7145 - accuracy: 0.4010
>     (pid=6948)   6/395 [..............................] - ETA: 8:57 - loss: 2.8308 - accuracy: 0.3802
>     (pid=6943)   7/395 [..............................] - ETA: 8:34 - loss: 1.8105 - accuracy: 0.2545
>     (pid=6949)   7/395 [..............................] - ETA: 8:20 - loss: 2.8188 - accuracy: 0.3973
>     (pid=6945)   7/395 [..............................] - ETA: 8:51 - loss: 2.5645 - accuracy: 0.3839
>     (pid=6948)   7/395 [..............................] - ETA: 9:21 - loss: 2.9240 - accuracy: 0.3839
>     (pid=6943)   8/395 [..............................] - ETA: 8:41 - loss: 1.8109 - accuracy: 0.2656
>     (pid=6949)   8/395 [..............................] - ETA: 8:36 - loss: 2.9181 - accuracy: 0.3945
>     (pid=6945)   8/395 [..............................] - ETA: 9:09 - loss: 2.5546 - accuracy: 0.4062
>     (pid=6948)   8/395 [..............................] - ETA: 9:39 - loss: 2.9288 - accuracy: 0.3789
>     (pid=6943)   9/395 [..............................] - ETA: 8:55 - loss: 1.8119 - accuracy: 0.2674
>     (pid=6949)   9/395 [..............................] - ETA: 9:02 - loss: 2.9599 - accuracy: 0.4062
>     (pid=6945)   9/395 [..............................] - ETA: 9:24 - loss: 2.6355 - accuracy: 0.3958
>     (pid=6948)   9/395 [..............................] - ETA: 10:00 - loss: 2.8595 - accuracy: 0.3785
>     (pid=6943)  10/395 [..............................] - ETA: 9:18 - loss: 1.8093 - accuracy: 0.2656
>     (pid=6945)  10/395 [..............................] - ETA: 9:42 - loss: 2.7097 - accuracy: 0.4031
>     (pid=6949)  10/395 [..............................] - ETA: 9:26 - loss: 2.8147 - accuracy: 0.4313
>     (pid=6948)  10/395 [..............................] - ETA: 9:58 - loss: 2.8107 - accuracy: 0.3969 
>     (pid=6943)  11/395 [..............................] - ETA: 9:24 - loss: 1.8248 - accuracy: 0.2642
>     (pid=6949)  11/395 [..............................] - ETA: 9:15 - loss: 2.8705 - accuracy: 0.4290
>     (pid=6945)  11/395 [..............................] - ETA: 9:46 - loss: 2.6304 - accuracy: 0.4176
>     (pid=6948)  11/395 [..............................] - ETA: 10:00 - loss: 2.7871 - accuracy: 0.4176
>     (pid=6943)  12/395 [..............................] - ETA: 9:26 - loss: 1.8038 - accuracy: 0.2682
>     (pid=6949)  12/395 [..............................] - ETA: 9:17 - loss: 2.9361 - accuracy: 0.4245
>     (pid=6945)  12/395 [..............................] - ETA: 9:53 - loss: 2.5781 - accuracy: 0.4297
>     (pid=6948)  12/395 [..............................] - ETA: 10:03 - loss: 2.8041 - accuracy: 0.4089
>     (pid=6943)  13/395 [..............................] - ETA: 9:28 - loss: 1.7958 - accuracy: 0.2716
>     (pid=6949)  13/395 [..............................] - ETA: 9:23 - loss: 2.9888 - accuracy: 0.4231
>     (pid=6945)  13/395 [..............................] - ETA: 9:54 - loss: 2.6444 - accuracy: 0.4207
>     (pid=6948)  13/395 [..............................] - ETA: 10:11 - loss: 2.7499 - accuracy: 0.4111
>     (pid=6949)  14/395 [>.............................] - ETA: 9:33 - loss: 3.0002 - accuracy: 0.4263
>     (pid=6945)  14/395 [>.............................] - ETA: 9:58 - loss: 2.7610 - accuracy: 0.4174
>     (pid=6943)  14/395 [>.............................] - ETA: 10:05 - loss: 1.7887 - accuracy: 0.2835
>     (pid=6948)  14/395 [>.............................] - ETA: 10:08 - loss: 2.7568 - accuracy: 0.4107
>     (pid=6949)  15/395 [>.............................] - ETA: 9:40 - loss: 3.0620 - accuracy: 0.4208
>     (pid=6945)  15/395 [>.............................] - ETA: 9:56 - loss: 2.8249 - accuracy: 0.4083
>     (pid=6943)  15/395 [>.............................] - ETA: 10:09 - loss: 1.7787 - accuracy: 0.2833
>     (pid=6948)  15/395 [>.............................] - ETA: 10:11 - loss: 2.7702 - accuracy: 0.4104
>     (pid=6949)  16/395 [>.............................] - ETA: 9:50 - loss: 3.0118 - accuracy: 0.4258
>     (pid=6945)  16/395 [>.............................] - ETA: 10:02 - loss: 2.8379 - accuracy: 0.4062
>     (pid=6943)  16/395 [>.............................] - ETA: 10:09 - loss: 1.7643 - accuracy: 0.2871
>
>     *** WARNING: skipped 18076112 bytes of output ***
>
>     (pid=14102) 377/395 [===========================>..] - ETA: 11s - loss: 1.0448 - accuracy: 0.6059
>     (pid=14102) 378/395 [===========================>..] - ETA: 10s - loss: 1.0445 - accuracy: 0.6065
>     (pid=14102) 379/395 [===========================>..] - ETA: 10s - loss: 1.0443 - accuracy: 0.6066
>     (pid=14102) 380/395 [===========================>..]
>     (pid=14102)  - ETA: 9s - loss: 1.0445 - accuracy: 0.6066 
>     (pid=14102) 381/395 [===========================>..] - ETA: 8s - loss: 1.0444 - accuracy: 0.6067
>     (pid=14102) 382/395 [============================>.] - ETA: 8s - loss: 1.0449 - accuracy: 0.6064
>     (pid=14102) 383/395 [============================>.] - ETA: 7s - loss: 1.0449 - accuracy: 0.6062
>     (pid=14102) 384/395 [============================>.] - ETA: 7s - loss: 1.0448 - accuracy: 0.6059
>     (pid=14102) 385/395 [============================>.] - ETA: 6s - loss: 1.0447 - accuracy: 0.6059
>     (pid=14102) 386/395 [============================>.] - ETA: 5s - loss: 1.0449 - accuracy: 0.6057
>     (pid=14102) 387/395 [============================>.] - ETA: 5s - loss: 1.0446 - accuracy: 0.6059
>     (pid=14102) 388/395 [============================>.] - ETA: 4s - loss: 1.0441 - accuracy: 0.6062
>     (pid=14102) 389/395 [============================>.] - ETA: 3s - loss: 1.0439 - accuracy: 0.6064
>     (pid=14102) 390/395 [============================>.] - ETA: 3s - loss: 1.0438 - accuracy: 0.6067
>     (pid=14102) 391/395 [============================>.] - ETA: 2s - loss: 1.0437 - accuracy: 0.6068
>     (pid=14102) 392/395 [============================>.] - ETA: 1s - loss: 1.0437 - accuracy: 0.6071
>     (pid=14102) 393/395 [============================>.] - ETA: 1s - loss: 1.0436 - accuracy: 0.6070
>     (pid=14102) 394/395 [============================>.] - ETA: 0s - loss: 1.0437 - accuracy: 0.6072
>     (pid=14102) 395/395 [==============================] - ETA: 0s - loss: 1.0434 - accuracy: 0.6077395/395 [==============================] - 252s 637ms/step - loss: 1.0434 - accuracy: 0.6077
>     (pid=14102)  1/44 [..............................] - ETA: 0s - loss: 1.3407 - accuracy: 0.5000
>     (pid=14102)  2/44 [>.............................] - ETA: 24s - loss: 1.2272 - accuracy: 0.5938
>     (pid=14102)  3/44 [=>............................] - ETA: 34s - loss: 1.2597 - accuracy: 0.5521
>     (pid=14102)  4/44 [=>............................] - ETA: 37s - loss: 1.3114 - accuracy: 0.5391
>     (pid=14102)  5/44 [==>...........................] - ETA: 37s - loss: 1.2837 - accuracy: 0.5312
>     (pid=14102)  6/44 [===>..........................] - ETA: 38s - loss: 1.2913 - accuracy: 0.5312
>     (pid=14102)  7/44 [===>..........................] - ETA: 38s - loss: 1.3062 - accuracy: 0.5402
>     (pid=14102)  8/44 [====>.........................] - ETA: 38s - loss: 1.3110 - accuracy: 0.5508
>     (pid=14102)  9/44 [=====>........................] - ETA: 38s - loss: 1.2941 - accuracy: 0.5660
>     (pid=14102) 10/44 [=====>........................] - ETA: 37s - loss: 1.2770 - accuracy: 0.5719
>     (pid=14102) 11/44 [======>.......................] - ETA: 36s - loss: 1.2895 - accuracy: 0.5653
>     (pid=14102) 12/44 [=======>......................] - ETA: 35s - loss: 1.2831 - accuracy: 0.5547
>     (pid=14102) 13/44 [=======>......................] - ETA: 34s - loss: 1.2693 - accuracy: 0.5625
>     (pid=14102) 14/44 [========>.....................] - ETA: 34s - loss: 1.2614 - accuracy: 0.5625
>     (pid=14102) 15/44 [=========>....................] - ETA: 33s - loss: 1.2722 - accuracy: 0.5625
>     (pid=14102) 16/44 [=========>....................] - ETA: 32s - loss: 1.2561 - accuracy: 0.5625
>     (pid=14102) 17/44 [==========>...................] - ETA: 31s - loss: 1.2545 - accuracy: 0.5662
>     (pid=14102) 18/44 [===========>..................] - ETA: 30s - loss: 1.2468 - accuracy: 0.5694
>     (pid=14102) 19/44 [===========>..................] - ETA: 30s - loss: 1.2473 - accuracy: 0.5691
>     (pid=14102) 20/44 [============>.................] - ETA: 29s - loss: 1.2446 - accuracy: 0.5672
>     (pid=14102) 21/44 [=============>................] - ETA: 28s - loss: 1.2428 - accuracy: 0.5655
>     (pid=14102) 22/44 [==============>...............] - ETA: 27s - loss: 1.2506 - accuracy: 0.5611
>     (pid=14102) 23/44 [==============>...............] - ETA: 25s - loss: 1.2489 - accuracy: 0.5611
>     (pid=14102) 24/44 [===============>..............] - ETA: 24s - loss: 1.2492 - accuracy: 0.5638
>     (pid=14102) 25/44 [================>.............] - ETA: 23s - loss: 1.2480 - accuracy: 0.5663
>     (pid=14102) 26/44 [================>.............] - ETA: 22s - loss: 1.2478 - accuracy: 0.5673
>     (pid=14102) 27/44 [=================>............] - ETA: 20s - loss: 1.2499 - accuracy: 0.5671
>     (pid=14102) 28/44 [==================>...........] - ETA: 19s - loss: 1.2494 - accuracy: 0.5681
>     (pid=14102) 29/44 [==================>...........] - ETA: 18s - loss: 1.2509 - accuracy: 0.5700
>     (pid=14102) 30/44 [===================>..........] - ETA: 17s - loss: 1.2494 - accuracy: 0.5698
>     (pid=14102) 31/44 [====================>.........] - ETA: 16s - loss: 1.2489 - accuracy: 0.5685
>     (pid=14102) 32/44 [====================>.........] - ETA: 15s - loss: 1.2472 - accuracy: 0.5684
>     (pid=14102) 33/44 [=====================>........] - ETA: 13s - loss: 1.2489 - accuracy: 0.5701
>     (pid=14102) 34/44 [======================>.......] - ETA: 12s - loss: 1.2514 - accuracy: 0.5708
>     (pid=14102) 35/44 [======================>.......] - ETA: 11s - loss: 1.2525 - accuracy: 0.5679
>     (pid=14102) 36/44 [=======================>......] - ETA: 10s - loss: 1.2528 - accuracy: 0.5686
>     (pid=14102) 37/44 [========================>.....] - ETA: 8s - loss: 1.2544 - accuracy: 0.5684 
>     (pid=14102) 38/44 [========================>.....] - ETA: 7s - loss: 1.2610 - accuracy: 0.5658
>     (pid=14102) 39/44 [=========================>....] - ETA: 6s - loss: 1.2627 - accuracy: 0.5657
>     (pid=14102) 40/44 [==========================>...] - ETA: 5s - loss: 1.2619 - accuracy: 0.5633
>     (pid=14102) 41/44 [==========================>...] - ETA: 3s - loss: 1.2618 - accuracy: 0.5648
>     (pid=14102) 42/44 [===========================>..] - ETA: 2s - loss: 1.2591 - accuracy: 0.5655
>     (pid=14102) 43/44 [============================>.] - ETA: 1s - loss: 1.2609 - accuracy: 0.5620
>     (pid=14102) 2021-01-13 00:53:00,732	ERROR function_runner.py:254 -- Runner Thread raised error.
>     (pid=14102) Traceback (most recent call last):
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
>     (pid=14102)     self._entrypoint()
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoint
>     (pid=14102)     self._status_reporter.get_checkpoint())
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 575, in _trainable_func
>     (pid=14102)     output = fn()
>     (pid=14102)   File "<command-685894176419834>", line 52, in training_function
>     (pid=14102) NameError: name 'train_mix_loss' is not defined
>     (pid=14102) Exception in thread Thread-2:
>     (pid=14102) Traceback (most recent call last):
>     (pid=14102)   File "/databricks/python/lib/python3.7/threading.py", line 926, in _bootstrap_inner
>     (pid=14102)     self.run()
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 267, in run
>     (pid=14102)     raise e
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
>     (pid=14102)     self._entrypoint()
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoint
>     (pid=14102)     self._status_reporter.get_checkpoint())
>     (pid=14102)   File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 575, in _trainable_func
>     (pid=14102)     output = fn()
>     (pid=14102)   File "<command-685894176419834>", line 52, in training_function
>     (pid=14102) NameError: name 'train_mix_loss' is not defined
>     (pid=14102) 
>     (pid=14102) 44/44 [==============================] - ETA: 0s - loss: 1.2616 - accuracy: 0.561344/44 [==============================] - 55s 1s/step - loss: 1.2616 - accuracy: 0.5613
>     2021-01-13 00:53:00,926	ERROR trial_runner.py:607 -- Trial training_function_c6336_00008: Error processing event.
>     Traceback (most recent call last):
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 519, in _process_trial
>         result = self.trial_executor.fetch_result(trial)
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 497, in fetch_result
>         result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
>       File "/databricks/python/lib/python3.7/site-packages/ray/worker.py", line 1379, in get
>         raise value.as_instanceof_cause()
>     ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=14102, ip=10.149.224.88)
>       File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
>       File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/trainable.py", line 183, in train
>         result = self.step()
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 366, in step
>         self._report_thread_runner_error(block=True)
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 513, in _report_thread_runner_error
>         .format(err_tb_str)))
>     ray.tune.error.TuneError: Trial raised an exception. Traceback:
>     ray::ImplicitFunc.train() (pid=14102, ip=10.149.224.88)
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
>         self._entrypoint()
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoint
>         self._status_reporter.get_checkpoint())
>       File "/databricks/python/lib/python3.7/site-packages/ray/tune/function_runner.py", line 575, in _trainable_func
>         output = fn()
>       File "<command-685894176419834>", line 52, in training_function
>     NameError: name 'train_mix_loss' is not defined
>     Result for training_function_c6336_00008:
>       {}
>       
>     == Status ==
>     Memory usage on this node: 9.1/10.8 GiB
>     Using FIFO scheduling algorithm.
>     Resources requested: 0/4 CPUs, 0/1 GPUs, 0.0/2.49 GiB heap, 0.0/0.83 GiB objects (0/1.0 accelerator_type:T4)
>     Result logdir: /databricks/driver/ray_results/training_function_2021-01-12_18-12-47
>     Number of trials: 9/9 (9 ERROR)
>     +-------------------------------+----------+-------+---------------+----------------+
>     | Trial name                    | status   | loc   |   number_conv |   number_dense |
>     |-------------------------------+----------+-------+---------------+----------------|
>     | training_function_c6336_00000 | ERROR    |       |             2 |              0 |
>     | training_function_c6336_00001 | ERROR    |       |             4 |              0 |
>     | training_function_c6336_00002 | ERROR    |       |             6 |              0 |
>     | training_function_c6336_00003 | ERROR    |       |             2 |              1 |
>     | training_function_c6336_00004 | ERROR    |       |             4 |              1 |
>     | training_function_c6336_00005 | ERROR    |       |             6 |              1 |
>     | training_function_c6336_00006 | ERROR    |       |             2 |              2 |
>     | training_function_c6336_00007 | ERROR    |       |             4 |              2 |
>     | training_function_c6336_00008 | ERROR    |       |             6 |              2 |
>     +-------------------------------+----------+-------+---------------+----------------+
>     Number of errored trials: 9
>     +-------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
>     | Trial name                    |   # failures | error file                                                                                                                                                      |
>     |-------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
>     | training_function_c6336_00000 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00000_0_number_conv=2,number_dense=0_2021-01-12_18-12-47/error.txt |
>     | training_function_c6336_00001 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00001_1_number_conv=4,number_dense=0_2021-01-12_18-12-47/error.txt |
>     | training_function_c6336_00002 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00002_2_number_conv=6,number_dense=0_2021-01-12_18-12-47/error.txt |
>     | training_function_c6336_00003 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00003_3_number_conv=2,number_dense=1_2021-01-12_18-12-47/error.txt |
>     | training_function_c6336_00004 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00004_4_number_conv=4,number_dense=1_2021-01-12_19-41-14/error.txt |
>     | training_function_c6336_00005 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00005_5_number_conv=6,number_dense=1_2021-01-12_19-46-24/error.txt |
>     | training_function_c6336_00006 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00006_6_number_conv=2,number_dense=2_2021-01-12_20-35-10/error.txt |
>     | training_function_c6336_00007 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00007_7_number_conv=4,number_dense=2_2021-01-12_20-35-23/error.txt |
>     | training_function_c6336_00008 |            1 | /databricks/driver/ray_results/training_function_2021-01-12_18-12-47/training_function_c6336_00008_8_number_conv=6,number_dense=2_2021-01-12_22-22-53/error.txt |
>     +-------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+

In [None]:
df
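
The trial names in the table above encode a 3 x 3 grid over `number_conv` and `number_dense`. The following is a minimal sketch of how such a grid expands into nine configurations. It assumes the same `np.arange` ranges as the `tune.run` call in this section and uses plain `itertools` for illustration, not Ray Tune's own expansion logic.

```python
from itertools import product

import numpy as np

number_conv = np.arange(2, 7, 2).tolist()   # [2, 4, 6]
number_dense = np.arange(0, 3, 1).tolist()  # [0, 1, 2]

# Ordered so that number_conv varies fastest, which matches the order of
# the trial names in the table (conv=2,dense=0; conv=4,dense=0; ...).
grid = [
    {"number_conv": c, "number_dense": d}
    for d, c in product(number_dense, number_conv)
]

for trial_index, config in enumerate(grid):
    print(trial_index, config)
```

Each dictionary corresponds to one trial, so a grid search over two parameters with three values each yields nine runs per kind of training data.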

  

  

### Directly training on MixUp data

As the preliminary trials in the previous parts indicated, accuracy on
MixUp data was a reasonably good indicator of accuracy on held-out
validation data (**this still needs to be confirmed**). This suggests
that performance may improve when training directly on MixUp data, which
we do now by setting `train_with_mixed_data` to `True`.
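
To make concrete what training on MixUp data means, here is a minimal NumPy sketch of mixing a single batch. It is an illustration under stated assumptions, not the notebook's own implementation: the function name `mixup_batch`, the value `alpha=0.2`, and the choice of one mixing coefficient per batch are all assumptions. The toy batch uses the 32x32x3 image shape and six classes from the global parameters.

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Return a MixUp version of one batch of images and one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: a single mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Pair every example with a randomly chosen partner from the same batch.
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels

# Toy batch: 8 random "images" with pixel values in [0, 1) and 6 classes.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype("float32")
labels = np.eye(6, dtype="float32")[rng.integers(0, 6, size=8)]

mixed_images, mixed_labels = mixup_batch(images, labels, alpha=0.2, rng=rng)
print(mixed_images.shape, mixed_labels.sum(axis=1))
```

Calling such a function anew on each batch in every epoch produces a different mixture each time, which is what effectively augments the training set.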

In [None]:
# Limit the number of rows shown in the progress table.
reporter = CLIReporter(max_progress_rows=10)
# Add custom metric columns in addition to the default metrics.
# Each of these must be a metric that the training function reports.
reporter.add_metric_column("val_mix_accuracy")
reporter.add_metric_column("val_accuracy")
reporter.add_metric_column("train_accuracy")

# For a quick check without Ray Tune, call the training function directly:
# config = {"number_conv": 3, "number_dense": 5}
# training_function(config)

analysis = tune.run(
    training_function,
    config={
        "number_conv": tune.grid_search(np.arange(2,7,2).tolist()),
        "number_dense": tune.grid_search(np.arange(0,3,1).tolist()),
        "train_with_mixed_data": True
    },
    local_dir='ray_results',
    progress_reporter=reporter)
    
# To run each trial on a GPU, also pass resources_per_trial={'gpu': 1} to tune.run.

print("Best config: ", analysis.get_best_config(
    metric="val_accuracy", mode="max"))

# Get a DataFrame with the last reported result of each trial.
df = analysis.results_df


In [None]:
df
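
One way to read such a results frame is to pivot a single metric into a table with one row per `number_dense` and one column per `number_conv`. The sketch below is hedged: the helper `accuracy_table` is hypothetical, and the column names `config.number_conv` and `config.number_dense` are assumptions that may differ between Ray versions (check `df.columns`). The values in the stand-in frame are placeholders, not results from this experiment.

```python
import pandas as pd

def accuracy_table(results, metric="val_accuracy",
                   conv_col="config.number_conv",
                   dense_col="config.number_dense"):
    """Pivot one metric into a number_dense x number_conv table."""
    return results.pivot(index=dense_col, columns=conv_col, values=metric)

# Stand-in frame with placeholder values; NOT results from this experiment.
demo = pd.DataFrame({
    "config.number_conv": [2, 4, 2, 4],
    "config.number_dense": [0, 0, 1, 1],
    "val_accuracy": [0.1, 0.2, 0.3, 0.4],
})

table = accuracy_table(demo)
print(table)
```

Building the same table for `val_mix_accuracy`, and for both the unmixed and the MixUp training runs, makes it easier to compare the two training procedures architecture by architecture.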

  

### Conclusions

**These conclusions are preliminary and still need to be verified.** For
networks trained with the standard procedure, we found some agreement
between accuracy on a MixUp version of the training set and accuracy on
the validation set, across the tested numbers of convolutional and dense
layers. Training directly on MixUp data gave further gains on held-out
validation data, again across the tested architectures. If confirmed,
this indicates that, at least for image data and convolutional neural
networks, MixUp and generalization are closely connected.