# Neural Networks
To see whether we can get better results, we will train a neural network based on VGG16, a model created by the Visual Geometry Group (VGG) at Oxford. VGG16 has 16 weight layers: 13 convolutional layers and 3 fully connected layers.

Our first attempt trains this architecture on our dataset as-is, changing only the output layer to a single sigmoid unit for binary classification. We will then apply data augmentation to see whether it improves the results.
![](../doc/vgg16.png)
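
The 13 + 3 layer count can be checked with a little arithmetic. The sketch below is only an illustration: the helper names (`conv_params`, `vgg16_conv_param_counts`) are hypothetical, and it assumes 3x3 kernels with a bias term and a 3-channel input, as in the standard VGG16 layout.

```python
# Output channels of the 13 convolutional layers, grouped into the 5 blocks.
CONV_BLOCKS = [
    [64, 64],
    [128, 128],
    [256, 256, 256],
    [512, 512, 512],
    [512, 512, 512],
]

def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_ch + 1) * out_ch

def vgg16_conv_param_counts(in_ch: int = 3, kernel: int = 3) -> list:
    """Parameter count of each convolutional layer, in order."""
    counts = []
    for block in CONV_BLOCKS:
        for out_ch in block:
            counts.append(conv_params(kernel, in_ch, out_ch))
            in_ch = out_ch  # the next layer consumes this layer's channels
    return counts

counts = vgg16_conv_param_counts()
print(len(counts))   # number of convolutional layers
print(counts[:4])    # parameters of the first four convolutions
print(sum(counts))   # total convolutional parameters
```

The first four values (1792, 36928, 73856, 147584) are the ones a Keras model summary reports for the first two blocks.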

In [None]:
# Connect to Colab backend
from colabcode import ColabCode
ColabCode(port=10000)

In [8]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D, Rescaling
from tensorflow.keras.models import Sequential

physical_devices = tf.config.list_physical_devices('GPU')
try:
  tf.config.experimental.set_memory_growth(physical_devices[0], True)
  assert tf.config.experimental.get_memory_growth(physical_devices[0])
except (IndexError, ValueError, RuntimeError):
  # No GPU found, invalid device, or virtual devices already initialized.
  pass

> Lion (Evo**L**ved S**i**gned M**o**me**n**tum) optimizer ([paper](https://arxiv.org/abs/2302.06675))

We will use the Lion optimizer to train the model. Lion is a recent momentum-based optimizer that tracks only the first moment (an exponential moving average of the gradient) and steps in the direction of the sign of the update, so every parameter moves by the same magnitude at each step. Because it does not store a second-moment estimate as Adam does, it needs less optimizer memory. The paper reports results competitive with Adam-family optimizers on the tasks it evaluates.

The implementation comes from this [GitHub](https://github.com/GLambard/Lion-tensorflow/blob/main/lion_tensorflow.py) repository.
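
To make the update rule concrete, here is a minimal NumPy sketch of a single Lion step. It is an illustration under stated assumptions, not the repository's code: the function name `lion_step` is hypothetical, and the order of operations (weight decay, sign update, momentum update) simply mirrors the `update_fn` used in this notebook.

```python
import numpy as np

def lion_step(param, grad, exp_avg, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Return the updated (param, exp_avg) after one Lion step."""
    # Decoupled weight decay applied directly to the parameters.
    param = param * (1.0 - lr * weight_decay)
    # Interpolate momentum and gradient with beta1, then keep only the sign.
    update = np.sign(beta1 * exp_avg + (1.0 - beta1) * grad)
    param = param - lr * update
    # The momentum itself is refreshed with a different coefficient, beta2.
    exp_avg = beta2 * exp_avg + (1.0 - beta2) * grad
    return param, exp_avg

p = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, -0.1, 0.0])
m = np.zeros_like(p)

p, m = lion_step(p, g, m, lr=0.1)
print(p)  # each entry moved by exactly lr, or not at all where the update is zero
print(m)
```

Only one state array (`exp_avg`) is kept per parameter, which is where the memory saving over Adam comes from.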

In [9]:
# define the optimizer
from typing import Tuple, Optional, Callable
from tensorflow.keras import optimizers

def exists(val):
    return val is not None

# update functions

@tf.function
def update_fn(p, grad, exp_avg, lr, wd, beta1, beta2):
    # stepweight decay

    p.assign(p * (1 - lr * wd))

    # weight update

    # interpolate between the momentum and the current gradient using beta1
    update = exp_avg * beta1 + grad * (1 - beta1)
    p.assign_add(tf.sign(update) * -lr)

    # decay the momentum running average coefficient

    exp_avg.assign(exp_avg * beta2 + grad * (1 - beta2))

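# class
# Note: slots and `_resource_apply_dense` belong to the legacy Keras optimizer API;
# on TensorFlow 2.11+ this likely requires subclassing `optimizers.legacy.Optimizer`.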

class Lion(optimizers.Optimizer):
    def __init__(
        self,
        lr: float = 1e-4,
        betas: Tuple[float, float] = (0.9, 0.99),
        weight_decay: float = 0.0,
        use_triton: bool = False,
        name: str = "Lion",
        **kwargs
    ):
        assert lr > 0.
        assert all([0. <= beta <= 1. for beta in betas])

        # the base optimizer requires a name
        super().__init__(name=name, **kwargs)

        self.lr = lr
        self.betas = betas
        self.weight_decay = weight_decay

        self.update_fn = update_fn

    def get_config(self):
        config = super().get_config()
        config.update({
            'lr': self.lr,
            'betas': self.betas,
            'weight_decay': self.weight_decay
        })
        return config

    @tf.function
    def _resource_apply_dense(self, grad, var):
        lr = self.lr
        beta1 = self.betas[0]
        beta2 = self.betas[1]
        wd = self.weight_decay

        # init state - exponential moving average of gradient values
        exp_avg = self.get_slot(var, "exp_avg")
        if exp_avg is None:
            exp_avg = self.add_slot(var, "exp_avg", tf.zeros_like(var))

        self.update_fn(
            var,
            grad,
            exp_avg,
            lr,
            wd,
            beta1,
            beta2
        )

    def _resource_apply_sparse(self, grad, var, indices):
        raise NotImplementedError("Sparse gradient updates are not supported.")

    def _resource_apply_sparse_duplicate_indices(self, grad, var, indices):
        raise NotImplementedError("Sparse gradient updates are not supported.")

In [10]:
# define the model
vgg16 = Sequential([
        # Preprocessing stack
        Rescaling(1./255, input_shape=(224, 224, 3)),
        # 1st stack
        Conv2D(64, (3, 3), padding='same', activation='relu'),
        Conv2D(64, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        # 2nd stack
        Conv2D(128, (3, 3), padding='same', activation='relu'),
        Conv2D(128, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        # 3rd stack
        Conv2D(256, (3, 3), padding='same', activation='relu'),
        Conv2D(256, (3, 3), padding='same', activation='relu'),
        Conv2D(256, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        # 4th stack
        Conv2D(512, (3, 3), padding='same', activation='relu'),
        Conv2D(512, (3, 3), padding='same', activation='relu'),
        Conv2D(512, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        # 5th stack
        Conv2D(512, (3, 3), padding='same', activation='relu'),
        Conv2D(512, (3, 3), padding='same', activation='relu'),
        Conv2D(512, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        # 6th stack
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        # 7th stack
        Dense(4096, activation='relu'),
        Dropout(0.5),
        # 8th stack: single sigmoid unit for binary classification
        Dense(1, activation='sigmoid')
    ]
)
# Compile the model. RMSprop is used here; to train with Lion instead,
# uncomment the next line and pass `lion` as the optimizer.
#lion = Lion(lr=1e-3)
vgg16.compile(optimizer=optimizers.RMSprop(learning_rate=1e-3), loss='binary_crossentropy', metrics=['accuracy'])

In [11]:
# Load the data using tensorflow utilities
root = "../data"
train_ds = tf.keras.utils.image_dataset_from_directory(
    root,
    validation_split=0.3,
    subset='training',
    seed=42,
    image_size=(224, 224),
    batch_size=32
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    root,
    validation_split=0.3,
    subset='validation',
    seed=42,
    image_size=(224, 224),
    batch_size=32
)

Found 27558 files belonging to 2 classes.
Using 19291 files for training.
Found 27558 files belonging to 2 classes.
Using 8267 files for validation.
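
The file counts above follow from the 0.3 validation split. The short sketch below reproduces them and derives the number of batches per epoch. It assumes the validation size is the truncated product of the split and the file count, which is consistent with the printed output; `split_counts` is a hypothetical helper, not a Keras function.

```python
import math

def split_counts(total: int, validation_split: float):
    """Return (training files, validation files) for a given split fraction."""
    n_val = int(validation_split * total)  # assumption: truncation, matching the output above
    return total - n_val, n_val

train_n, val_n = split_counts(27558, 0.3)
print(train_n, val_n)

# Batches per epoch with batch_size=32 (the last batch may be partial).
train_batches = math.ceil(train_n / 32)
val_batches = math.ceil(val_n / 32)
print(train_batches, val_batches)
```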


In [12]:
# Configure the dataset for performance
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
vgg16.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 rescaling_3 (Rescaling)     (None, 224, 224, 3)       0         
                                                                 
 conv2d_39 (Conv2D)          (None, 224, 224, 64)      1792      
                                                                 
 conv2d_40 (Conv2D)          (None, 224, 224, 64)      36928     
                                                                 
 max_pooling2d_15 (MaxPoolin  (None, 112, 112, 64)     0         
 g2D)                                                            
                                                                 
 conv2d_41 (Conv2D)          (None, 112, 112, 128)     73856     
                                                                 
 conv2d_42 (Conv2D)          (None, 112, 112, 128)     147584    
                                                      

In [13]:
epochs=10
history = vgg16.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)

Epoch 1/10



> Perform data augmentation

To improve the model performance, we will perform data augmentation on the training set. We will use the following transformations:

- Rotation
- Zoom
- Horizontal and Vertical Flips
- Width and Height Shifts
- Shear Transformation
- Brightness
- Contrast
- Gaussian Noise
- Gaussian Blur
- Motion Blur
- Median Blur
- Random Crop
- To Gray
- Coarse Dropout
- Invert
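
As a rough illustration of what a few of these transformations do to an image array, here is a minimal NumPy sketch covering flips, a brightness shift, and Gaussian noise. It is not the pipeline used in this notebook: the `augment` function and its parameters are hypothetical, it assumes a float image scaled to [0, 1], and a real pipeline would more likely rely on Keras utilities or a dedicated augmentation library.

```python
import numpy as np

def augment(image, rng, max_brightness_delta=0.2, noise_std=0.05):
    """Randomly flip, brighten or darken, and add noise to an HxWxC image in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip (reverse the width axis)
    if rng.random() < 0.5:
        out = out[::-1, :, :]  # vertical flip (reverse the height axis)
    # Brightness: add one random offset to every pixel.
    out = out + rng.uniform(-max_brightness_delta, max_brightness_delta)
    # Gaussian noise: add an independent sample to every pixel and channel.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    # Keep values in the valid range.
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(42)
image = rng.random((224, 224, 3))  # stand-in for a real training image
augmented = augment(image, rng)
print(augmented.shape, augmented.min(), augmented.max())
```

Augmentation should be applied to the training set only, so that validation metrics are measured on unmodified images.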

In [None]:
# we will use the Keras ImageDataGenerator to augment the data
#from keras.preprocessing.image import ImageDataGenerator

# create an instance of the ImageDataGenerator
#datagen = ImageDataGenerator()
# fit the generator to the data - this computes the mean and std of the data,
# which are only needed when feature-wise normalization is enabled
#datagen.fit(X_train)
## get a batch iterator to efficiently iterate over the training data
#train_iterator = datagen.flow(X_train, y_train, batch_size=32)
## get a batch iterator for the validation data
#val_iterator = datagen.flow(X_val, y_val, batch_size=32)
## fit the model
#mod
