Introduction
------------

A pertinent question in machine learning is to explain why a model
generalizes and to use the answer to improve learning algorithms.
Recently, a training procedure called MixUp was proposed to address this
\[\[1\]\]. The basic idea is that instead of feeding the raw training
data to our supervised learning algorithm, we instead use convex
combinations of two randomly selected data points. The benefit of this
is two-fold. First, it plays the role of data augmentation: the network
will never see two completely identical training samples, since we
constantly produce new random combinations. Second, the network is
encouraged to behave nicely in-between training samples, which has the
potential to reduce overfitting. A connection between performance on
MixUp data and generalization abilities of networks trained without the
MixUp procedure was also studied in \[\[2\]\].
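
To make the idea of a convex combination concrete, here is a minimal sketch of mixing a single pair of samples. The helper `mixup_pair` and the toy arrays are assumptions made for illustration only; they are not part of the training code used later in this notebook.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, lam):
    """Return the convex combination of two (example, label) pairs."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Two toy "images" (2x2 pixels, one channel) with one-hot labels for 3 classes.
x_a, y_a = np.zeros((2, 2, 1)), np.array([1.0, 0.0, 0.0])
x_b, y_b = np.ones((2, 2, 1)), np.array([0.0, 1.0, 0.0])

# lam = 0.75 keeps 75% of the first sample and 25% of the second.
x_mix, y_mix = mixup_pair(x_a, y_a, x_b, y_b, lam=0.75)
print(y_mix)
```

Since the same weight is applied to the inputs and to the labels, the mixed label remains a valid probability vector.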

In this project, we will investigate these connections at a large scale
by performing a distributed hyperparameter search. First, we will train
neural networks without MixUp, and study the connection between MixUp
performance and test error. Then, we will train on MixUp data, and see
whether directly optimizing MixUp performance yields lower test
errors.

To make the hyperparameter search distributed and scalable, we will use
the Ray Tune package \[\[3\]\]. Furthermore, we will use Horovod to
enable the individual networks to handle data in a distributed fashion
as well \[\[4\]\].
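
Because each hyperparameter configuration is trained independently, a grid search parallelizes naturally. As a rough illustration that does not use Ray Tune itself, the sketch below enumerates the configurations of a small grid in plain Python. The helper `expand_grid` and the value ranges are assumptions made for this example; only the hyperparameter names `number_conv` and `number_dense` come from this notebook.

```python
from itertools import product

def expand_grid(search_space):
    """Enumerate every combination of the listed hyperparameter values."""
    keys = list(search_space)
    value_lists = [search_space[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]

# Illustrative values only; these are not the ranges used in this project.
search_space = {"number_conv": [2, 3, 4], "number_dense": [1, 2]}

configs = expand_grid(search_space)
for config in configs:
    # Each config could be handed to a separate worker.
    print(config)
```

A grid with 3 and 2 candidate values gives 6 independent trials, which is what a scheduler can then distribute over the available workers.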

In [None]:
# Imports
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense,Conv2D,Flatten,BatchNormalization,Dropout
from tensorflow.keras import Sequential
from ray import tune
from ray.tune import CLIReporter
from sklearn.metrics import confusion_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from functools import partial

# Fixes the issue "AttributeError: 'ConsoleBuffer has no attribute 'fileno'"
import sys
sys.stdout.fileno = lambda: False

  

  

#### The data set

We will use the Intel Image Classification data set \[\[5\]\]. It
consists of 25k 150x150 RGB images from 6 different classes: buildings,
forest, glacier, mountain, sea, and street.

In [None]:
"""
The global parameters for training.
"""

img_height,img_width,channels = 32*2,32*2,3
batch_size = 32
# Both paths are absolute; the test path was missing its leading slash.
train_data_dir = "/dbfs/FileStore/tables/Group20/seg_train/seg_train/"
test_data_dir = "/dbfs/FileStore/tables/Group20/seg_test/seg_test/"
num_classes = 6
alpha = 0.2 # Degree of mixup is ~ Beta(alpha,alpha)

  

  

#### MixUp data generator

To create MixUp data, we will define a custom data generator. It takes
an underlying image generator as argument, and outputs convex
combinations of two randomly selected (example,label) pairs drawn
according to the underlying generator.
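
The batch-level mixing step can be sketched independently of Keras. The function below is a NumPy-only illustration, assuming image batches of shape `(batch, height, width, channels)` and one-hot labels; the name `mixup_batch` and the random toy data are assumptions made for this example. One mixing weight is drawn per example from Beta(alpha, alpha) and reshaped so that it broadcasts over the image and label dimensions.

```python
import numpy as np

def mixup_batch(X1, y1, X2, y2, alpha=0.2, rng=None):
    """Mix two batches with one Beta(alpha, alpha) weight per example."""
    rng = np.random.default_rng() if rng is None else rng
    n = X1.shape[0]
    lam = rng.beta(alpha, alpha, size=n)
    X_lam = lam.reshape(n, 1, 1, 1)  # broadcasts over height, width, channels
    y_lam = lam.reshape(n, 1)        # broadcasts over the class dimension
    X = X1 * X_lam + X2 * (1.0 - X_lam)
    y = y1 * y_lam + y2 * (1.0 - y_lam)
    return X, y, lam

# Toy data: two batches of 4 random 8x8 RGB "images" with 6 classes.
rng = np.random.default_rng(0)
X1 = rng.random((4, 8, 8, 3))
X2 = rng.random((4, 8, 8, 3))
y1 = np.eye(6)[[0, 1, 2, 3]]
y2 = np.eye(6)[[5, 4, 3, 2]]

X, y, lam = mixup_batch(X1, y1, X2, y2, alpha=0.2, rng=rng)
print(X.shape, y.shape)
```

For alpha below 1, the Beta(alpha, alpha) density is U-shaped, so most mixing weights fall close to 0 or 1 and a mixed sample usually stays close to one of its two source samples.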

In [None]:
class MixupImageDataGenerator(tf.keras.utils.Sequence):
    def __init__(self, generator, directory, batch_size, img_height, img_width, alpha=0.2, subset=None):
        self.batch_size = batch_size
        self.batch_index = 0
        self.alpha = alpha

        # First iterator yielding tuples of (x, y)
        self.generator1 = generator.flow_from_directory(directory,
                                                        target_size=(
                                                            img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset=subset)

        # Second iterator yielding tuples of (x, y)
        self.generator2 = generator.flow_from_directory(directory,
                                                        target_size=(
                                                            img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset=subset)

        # Number of images across all classes in image directory.
        self.n = self.generator1.samples


    def __len__(self):
        # returns the number of batches
        return (self.n + self.batch_size - 1) // self.batch_size

    def __getitem__(self, index):
        # Draw one batch from each of the two iterators.
        X1, y1 = next(self.generator1)
        X2, y2 = next(self.generator2)


        # Sample one mixing weight per example from Beta(alpha, alpha).
        l = np.random.beta(self.alpha, self.alpha, X1.shape[0])

        X_l = l.reshape(X1.shape[0], 1, 1, 1)
        y_l = l.reshape(X1.shape[0], 1)


        # Perform the mixup.
        X = X1 * X_l + X2 * (1 - X_l)
        y = y1 * y_l + y2 * (1 - y_l)
        return X, y

    def reset_index(self):
        """Reset the generator indexes array.
        """

        self.generator1._set_index_array()
        self.generator2._set_index_array()


    def on_epoch_end(self):
        self.reset_index()

In [None]:
"""
A function that returns the data loaders needed for training and validation.

With for_training set to True, the function returns the data loaders
* train_mix_loader: A data loader that gives us mixed data for training
* train_loader: A data loader that gives us the unmixed training data
* val_mix_loader: A data loader that gives us mixed validation data
* val_loader: A data loader with the unmixed validation data

With for_training set to False, the function returns the data loader
* test_loader: Unmixed and unshuffled data loader for the test data. The data is left unshuffled in order to simplify the validation process.
"""
def get_data_loaders(for_training = True):
  
    #For training data
    if for_training:
        datagen_train_val = ImageDataGenerator(rescale=1./255,
                                rotation_range=5,
                                width_shift_range=0.05,
                                height_shift_range=0,
                                shear_range=0.05,
                                zoom_range=0,
                                brightness_range=(1, 1.3),
                                horizontal_flip=True,
                                fill_mode='nearest',
                                validation_split=0.1)

        train_mix_loader = MixupImageDataGenerator(generator = datagen_train_val,
                                                   directory = train_data_dir,
                                                   batch_size = batch_size,
                                                   img_height = img_height,
                                                   img_width = img_width,
                                                   alpha=alpha,
                                                   subset="training")
        
        val_mix_loader = MixupImageDataGenerator(generator = datagen_train_val,
                                                 directory = train_data_dir,
                                                 batch_size = batch_size,
                                                 img_height = img_height,
                                                 img_width = img_width,
                                                 alpha=alpha,
                                                 subset="validation")

        train_loader = datagen_train_val.flow_from_directory(train_data_dir,
                                                        target_size=(img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset="training")

        val_loader = datagen_train_val.flow_from_directory(train_data_dir,
                                                        target_size=(img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        subset="validation")
        
        return train_mix_loader,train_loader, val_mix_loader, val_loader

    #For test data
    else:
        datagen_test = ImageDataGenerator(rescale=1./255,
                                rotation_range=0,
                                width_shift_range=0,
                                height_shift_range=0,
                                shear_range=0,
                                zoom_range=0,
                                brightness_range=(1, 1),
                                horizontal_flip=False,
                                fill_mode='nearest',
                                validation_split=0)

        test_loader = datagen_test.flow_from_directory(test_data_dir,
                                                    target_size=(img_height, img_width),
                                                        class_mode="categorical",
                                                        batch_size=batch_size,
                                                        shuffle=False,
                                                        subset=None)

        return test_loader

  

  

##### Network architecture

In [None]:
"""
Creates the CNN with number_conv convolutional layers followed by number_dense dense layers. The model is compiled with the Adam optimizer and a categorical cross-entropy loss.
"""
def create_model(number_conv,number_dense):
    model = Sequential()
    model.add(Conv2D(24,kernel_size = 3, activation='relu',padding="same", input_shape=(img_height, img_width,channels)))
    model.add(BatchNormalization())
    for s in range(1,number_conv):
        model.add(Conv2D(24+12*s,kernel_size = 3,padding="same", activation = 'relu'))
        model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dropout(0.4))
    for s in range(number_dense):
        model.add(Dense(units=num_classes, activation='relu'))
        model.add(Dropout(0.4))
    model.add(BatchNormalization())
    model.add(Dense(num_classes,activation= "softmax"))
    model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
    return model

  

  

#### Training function

\*\*This should be replaced by a Horovod training function if we manage to
complete the installation. For now, I have put in the
Olofs\_implementation version.\*\*

In [None]:
def training_function(config, checkpoint_dir=None):
    # Hyperparameters
    number_conv, number_dense = config["number_conv"], config["number_dense"]
    train_with_mixed_data = config["train_with_mixed_data"]
    
    
    """
    Get the different dataloaders
    One with training data using mixing
    One with training without mixing
    One with validation data with mixing
    One with validation without mixing
    Set for_training to False to get testing data
    """
    train_mix_dataloader,train_dataloader,val_mix_dataloader,val_dataloader = get_data_loaders(for_training = True)

    """
    Construct the model based on hyperparameters
    """
    model = create_model( number_conv,number_dense )

    
    """
    Adds early stopping to training, based on the accuracy on the validation dataset. Should we monitor the validation loss here instead?
    """
    callbacks = [tf.keras.callbacks.EarlyStopping(patience=10,monitor="val_accuracy",min_delta=0.01,restore_best_weights=True)]

    """
    Train the model and give the training history.
    """
    if train_with_mixed_data:
      history = model.fit_generator(train_mix_dataloader, validation_data = val_mix_dataloader,callbacks = callbacks,verbose = False,epochs = 200)
    else:
      history = model.fit_generator(train_dataloader, validation_data = val_mix_dataloader,callbacks = callbacks,verbose = False,epochs = 200)
    
    """
    Log the results
    """
    #x_mix, y_mix = mixup_data( x_val, y_val)
    #mix_loss, mix_acc = model.evaluate( x_mix, y_mix )
    train_loss_unmix, train_acc_unmix = model.evaluate( train_dataloader )
    val_loss, val_acc = model.evaluate( val_dataloader )
    ind_max = np.argmax(history.history['val_accuracy'])
    train_acc = history.history['accuracy'][ind_max]
    val_mix_acc = history.history['val_accuracy'][ind_max]
    
    tune.report(mean_loss=train_acc, train_accuracy = train_acc_unmix, val_mix_accuracy = val_mix_acc, val_accuracy = val_acc)


In [None]:
train_mix_dataloader,train_dataloader,val_mix_dataloader,val_dataloader = get_data_loaders(for_training = True)

  

>     Found 12632 images belonging to 6 classes.
>     Found 12632 images belonging to 6 classes.
>     Found 1402 images belonging to 6 classes.
>     Found 1402 images belonging to 6 classes.
>     Found 12632 images belonging to 6 classes.
>     Found 1402 images belonging to 6 classes.

  

### Connection between MixUp performance and generalization

First, we will train our neural networks using a standard procedure,
with normal training data. We then measure their performance on a MixUp
version of the training set as well as on a validation set to study the
connection between these metrics.
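
One simple way to quantify such a connection is the Pearson correlation between the two metrics across trials. The sketch below assumes that the per-trial values reported as `val_mix_accuracy` and `val_accuracy` have been collected into two sequences; the helper `metric_correlation` is hypothetical, and the numbers are placeholders chosen only to exercise the function, not results from this project.

```python
import numpy as np

def metric_correlation(metric_a, metric_b):
    """Pearson correlation between two per-trial metric sequences."""
    a = np.asarray(metric_a, dtype=float)
    b = np.asarray(metric_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

# Placeholder values, NOT measured results: one entry per trial.
placeholder_val_mix_accuracy = [0.50, 0.55, 0.60, 0.65]
placeholder_val_accuracy = [0.60, 0.66, 0.72, 0.78]

r = metric_correlation(placeholder_val_mix_accuracy, placeholder_val_accuracy)
print(f"Pearson correlation: {r:.3f}")
```

A correlation close to 1 across many trials would suggest that MixUp accuracy is a useful proxy for accuracy on unmixed data, while a value near 0 would suggest no linear relationship.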

In [None]:
# Limit the number of rows.
reporter = CLIReporter(max_progress_rows=10)
# Add a custom metric column, in addition to the default metrics.
# Note that this must be a metric that is returned in your training results.
reporter.add_metric_column("val_mix_accuracy")
reporter.add_metric_column("val_accuracy")
reporter.add_metric_column("train_accuracy")

#config = {"number_conv" : 3,"number_dense" : 5}
#training_function(config)

#get_data_loaders()

analysis = tune.run(
    training_function,
    config={
        "number_conv": tune.grid_search(np.arange(2,3,3).tolist()),
        "number_dense": tune.grid_search(np.arange(2,3,1).tolist()),
        "train_with_mixed_data": False
    },
    local_dir='ray_results',
    progress_reporter=reporter,
    resources_per_trial={'gpu': 1})

print("Best config: ", analysis.get_best_config(
    metric="val_accuracy", mode="max"))

#Get a dataframe for analyzing trial results.
df = analysis.results_df


  

>     2021-01-10 13:49:58,077	INFO services.py:1173 -- View the Ray dashboard at http://127.0.0.1:8265
>     == Status ==
>     Memory usage on this node: 5.1/10.8 GiB
>     Using FIFO scheduling algorithm.
>     Resources requested: 1/4 CPUs, 1/1 GPUs, 0.0/3.66 GiB heap, 0.0/1.27 GiB objects (0/1.0 accelerator_type:T4)
>     Result logdir: /databricks/driver/ray_results/training_function_2021-01-10_13-49-59
>     Number of trials: 1/1 (1 RUNNING)
>     +-------------------------------+----------+-------+---------------+----------------+
>     | Trial name                    | status   | loc   |   number_conv |   number_dense |
>     |-------------------------------+----------+-------+---------------+----------------|
>     | training_function_baf52_00000 | RUNNING  |       |             2 |              2 |
>     +-------------------------------+----------+-------+---------------+----------------+
>
>
>     (pid=2962) 2021-01-10 13:50:00.038310: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
>     (pid=2962) 2021-01-10 13:50:00.038359: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>     (pid=2962) Found 12632 images belonging to 6 classes.
>     (pid=2962) Found 12632 images belonging to 6 classes.
>     (pid=2962) Found 1402 images belonging to 6 classes.
>     (pid=2962) Found 1402 images belonging to 6 classes.
>     (pid=2962) Found 12632 images belonging to 6 classes.
>     (pid=2962) Found 1402 images belonging to 6 classes.
>     (pid=2962) 2021-01-10 13:50:06.218881: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
>     (pid=2962) 2021-01-10 13:50:06.219867: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
>     (pid=2962) 2021-01-10 13:50:06.245830: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
>     (pid=2962) 2021-01-10 13:50:06.246687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
>     (pid=2962) pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
>     (pid=2962) coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
>     (pid=2962) 2021-01-10 13:50:06.246797: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
>     (pid=2962) 2021-01-10 13:50:06.246861: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
>     (pid=2962) 2021-01-10 13:50:06.246913: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
>     (pid=2962) 2021-01-10 13:50:06.248674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
>     (pid=2962) 2021-01-10 13:50:06.248985: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
>     (pid=2962) 2021-01-10 13:50:06.251099: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
>     (pid=2962) 2021-01-10 13:50:06.251210: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
>     (pid=2962) 2021-01-10 13:50:06.251298: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
>     (pid=2962) 2021-01-10 13:50:06.251319: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
>     (pid=2962) Skipping registering GPU devices...
>     (pid=2962) 2021-01-10 13:50:06.252511: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
>     (pid=2962) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>     (pid=2962) 2021-01-10 13:50:06.252728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
>     (pid=2962) 2021-01-10 13:50:06.252748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
>     (pid=2962) 2021-01-10 13:50:06.252760: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
>     (pid=2962) /databricks/python/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
>     (pid=2962)   warnings.warn('`Model.fit_generator` is deprecated and '

  

### Directly training on MixUp data

As we saw in the previous parts (**probably. my preliminary trials
indicated this, at least**)