# CS-6580 Lecture 12 - Architecture Patterns
**Dylan Zwick**

*Weber State University*

Today, we're going to dive into some architectures that are frequently used to build neural networks - particularly, but not exclusively, convolutional neural networks.

When you're designing a model, what you're really doing is designing its *hypothesis space* - the space of possible functions over which gradient descent can search, parametrized by the model's weights. Like feature engineering, a good hypothesis space encodes *prior knowledge* about the problem you're trying to solve. For example, using convolutional layers means that you expect the relevant patterns to be translation invariant.

A good model architecture is one that reduces the size of the search space while not significantly limiting effective solutions. It's about making the problem simpler for gradient descent to solve - which is a pretty simple process that needs all the help it can get.

Today, we'll review a few essential architecture best practices - *residual connections, batch normalization*, and *separable convolutions*. We'll then apply them to the cats vs. dogs problem we investigated a couple weeks ago.

An important general idea when architecting, well, really anything, is that you want what you do to be *modular*, *hierarchical*, and *reusable*. Being modular means it's broken down into the fundamental classes of problem. Being hierarchical means that patterns are applied at multiple levels of abstraction and complexity. Being reusable means that when the same problems are encountered in different contexts the same solutions are applied.

Deep learning model architecture is primarily about making clever use of modularity, hierarchy, and reuse. Neural network architectures are structured into repeated groups of layers (usually called "blocks"). These blocks are then structured into pyramid-like hierarchies - the number of filters grows with layer depth, while the size of the feature maps shrinks.

Generally speaking, a deep stack of narrow layers performs better than a shallow stack of large layers. However, there's a limit to how deep you can stack layers, due to the problem of *vanishing gradients*. One way to mitigate this problem is with *residual connections*.

### Residual Connections

Backpropagation in a sequential deep learning model is kind of similar to a game of "Telephone", in that every time information is transmitted noise is introduced into the signal.

If you have a chain of functions,

<center>
  $\displaystyle f_{4}(f_{3}(f_{2}(f_{1}(x))))$,
</center>

the goal is to adjust the parameters of each based on the error recorded on the output of $f_{4}$. To adjust $f_{1}$, you need to percolate error information back through $f_{4}$, $f_{3}$, and $f_{2}$. However, each time you do this you introduce some noise. If your function chain is too deep, this noise can overwhelm the signal, and backpropagation stops working.
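One way to make this "percolation" concrete is the chain rule. As a sketch, assume a loss $L$ computed from the output, write $\theta_{1}$ for the parameters of $f_{1}$, and let $y_{k}$ be the output of $f_{k}$, so $y_{1} = f_{1}(x)$ and $y_{k} = f_{k}(y_{k-1})$. These symbols are introduced here only for illustration. The gradient that reaches $f_{1}$ is then a product with one factor per layer it passes through:

<center>
  $\displaystyle \frac{\partial L}{\partial \theta_{1}} = \frac{\partial L}{\partial y_{4}} \cdot \frac{\partial y_{4}}{\partial y_{3}} \cdot \frac{\partial y_{3}}{\partial y_{2}} \cdot \frac{\partial y_{2}}{\partial y_{1}} \cdot \frac{\partial y_{1}}{\partial \theta_{1}}$
</center>

Every extra layer adds another factor to this product, so if the factors tend to be small, the product shrinks quickly with depth. That is the vanishing gradient problem mentioned above.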

Residual connections attempt to ameliorate this problem. The approach is simple: just add the input of a layer (or block of layers) back to its output.

<center>
    <img src = "https://lh3.googleusercontent.com/drive-viewer/AEYmBYTZ4pAWwc9a_fSGtlNhOZ-lUmMsMxTfE6oIIs3fp3ZgvndfRwMAUUgOiR8A3PIFhWRH8qrG1-LvpgfhN0HhATOvQVnz=s2560" width=200>
</center>
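To get a rough feel for why this helps, here is a toy sketch, not a real network. It assumes every "layer" is the scalar function $f(x) = wx$, so its derivative is just $w$; the weight `0.1` and depth `10` are arbitrary illustrative choices. With a residual connection each layer computes $x + f(x)$ instead, so its derivative becomes $1 + w$.

```python
# Toy model: each "layer" is the scalar function f(x) = w * x, so f'(x) = w.
# The weight and depth used below are illustrative assumptions.

def plain_chain_gradient(w: float, depth: int) -> float:
    """Derivative of f(f(...f(x))) with respect to x: `depth` factors of w."""
    grad = 1.0
    for _ in range(depth):
        grad *= w
    return grad


def residual_chain_gradient(w: float, depth: int) -> float:
    """Same chain, but each layer computes x + f(x), so each factor is 1 + w."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w
    return grad


plain = plain_chain_gradient(0.1, 10)
residual = residual_chain_gradient(0.1, 10)
print(f"plain chain gradient:    {plain:.3e}")
print(f"residual chain gradient: {residual:.3f}")
```

In the plain chain the gradient is a product of small numbers and all but disappears. In the residual chain the identity path contributes a `1` to every factor, so the signal survives however small `w` is.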

Note that adding the input back requires the output to have the same shape as the input. When it doesn't, because the block changes the number of filters or downsamples, the input is first passed through a $1 \times 1$ convolutional layer without activation, with strides to match any downsampling.
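The shape bookkeeping can be sketched in plain Python. This is only a shape calculator under stated assumptions, not Keras code: it assumes every layer uses `padding="same"`, for which the spatial output size is `ceil(size / stride)`, and it assumes a block made of stride-1 convolutions followed by a stride-2 pooling layer. The helper names and the `(176, 176, 32)` starting shape are hypothetical, chosen for illustration.

```python
import math


def same_size(size: int, stride: int) -> int:
    """Spatial size after a layer with padding="same": ceil(size / stride)."""
    return math.ceil(size / stride)


def block_shapes(in_shape, filters):
    """Return (main_path_shape, shortcut_shape) for one downsampling block."""
    height, width, _ = in_shape
    # Main path: stride-1 "same" convolutions keep height and width,
    # then a stride-2 "same" pooling layer halves them (rounding up).
    main = (same_size(same_size(height, 1), 2),
            same_size(same_size(width, 1), 2),
            filters)
    # Shortcut: a 1x1 convolution with stride 2 and `filters` output channels.
    shortcut = (same_size(height, 2), same_size(width, 2), filters)
    return main, shortcut


shape = (176, 176, 32)  # assumed input shape for the illustration
for filters in [32, 64, 128, 256, 512]:
    main, shortcut = block_shapes(shape, filters)
    assert main == shortcut, "shapes must match before the two paths are added"
    print(f"{shape} -> {main}")
    shape = main
```

Without the strided $1 \times 1$ projection, the shortcut would keep the block's input shape and the addition would fail as soon as a block changed the filter count or downsampled.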

### Batch Normalization

*Normalization* is a broad category of methods that seek to make different samples seen by a machine learning model more similar to each other, which helps the model learn and generalize well to new data.

One very common type of data normalization is centering the data on zero, and then giving the data a unit standard deviation. This is just a constant addition applied to all the data points, and then a constant multiplication. This is, in effect, making the assumption that the data follows a normal distribution, which is frequently but not always reasonable.

Batch normalization applies this to layers *within* a network. Just because the data entering a neural network layer has a $0$ mean and unit variance, there's no reason to expect the output will. Well, batch normalization ensures it does.
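Here is a minimal sketch of the core computation, assuming a batch stored as a list of samples with one list of feature values per sample. It covers only the normalization step during training: a real batch normalization layer typically also learns a scale and an offset and tracks moving averages for use at inference time, none of which is modeled here. The `EPSILON` value and the sample activations are illustrative assumptions.

```python
from statistics import fmean, pvariance

EPSILON = 1e-3  # small constant for numerical stability (an assumed value)


def batch_normalize(batch):
    """Normalize each feature (column) of `batch` to ~zero mean, ~unit variance.

    `batch` is a list of samples; each sample is a list of feature values.
    """
    columns = list(zip(*batch))
    means = [fmean(col) for col in columns]
    stds = [(pvariance(col) + EPSILON) ** 0.5 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(sample, means, stds)]
        for sample in batch
    ]


# Two features on very different scales (made-up numbers).
activations = [[1.0, 200.0], [2.0, 220.0], [3.0, 240.0], [4.0, 260.0]]
normalized = batch_normalize(activations)
for col in zip(*normalized):
    print(f"mean={fmean(col):+.3f}  variance={pvariance(col):.3f}")
```

After normalization both features sit on the same scale, whatever scale the layer's raw output had.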

Nobody really understands exactly how batch normalization helps. There are theories, but no certainty. However, there's no debate that for certain types of problems it does.

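One thing to note about batch normalization - if you're using it, the layer feeding into it *doesn't* need or want a bias term, because the normalization subtracts the mean and cancels any constant offset.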
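A quick way to convince yourself: normalize a set of values with and without a constant bias added, and compare. This is a sketch for a single feature, with made-up numbers and a hypothetical `normalize` helper.

```python
from statistics import fmean, pstdev


def normalize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


pre_activations = [0.5, -1.2, 3.3, 0.9]  # illustrative layer outputs
bias = 7.0                               # illustrative constant bias

without_bias = normalize(pre_activations)
with_bias = normalize([v + bias for v in pre_activations])

# The bias shifts the mean by the same constant, so it cancels on subtraction.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(f"largest difference: {max_diff:.2e}")
```

The two results agree up to floating-point rounding, so a bias in the preceding layer would be wasted parameters. That is why the convolution layers in the model below set `use_bias=False`.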

### Depthwise Separable Convolutions

Depthwise separable convolutions are kind of amazing. Not amazing in so far as how they work. That's kind of cool, but nothing profound. What's amazing about them is how well they tend to work. It's essentially a drop-in replacement for *Conv2D* that will make your model smaller, and cause it to perform a few percentage points *better* on many tasks. How cool is that?

The depthwise separable convolution is so named because it deals with the depth dimension — the number of channels. Specifically, it separates the convolution into 2 parts: a depthwise convolution and a pointwise convolution.

For the depthwise convolution, it applies a spatial convolution to each channel of its input *independently*. It then mixes the resulting channels with a pointwise $1 \times 1$ convolution. This is equivalent to separating the learning of spatial features and the learning of channel-wise features. This is basically assuming that spatial locations are highly correlated, but the different channels are highly independent.

Depthwise separable convolution requires significantly fewer parameters and involves fewer computations compared to regular convolutions.
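To put numbers on that, here is a small sketch that counts weights for both kinds of layer. It assumes square kernels and a depth multiplier of 1 (one spatial filter per input channel). The function names are hypothetical, and the 64-to-128-channel example is arbitrary.

```python
def conv2d_params(kernel_size, in_channels, out_channels, use_bias=True):
    """Weights in a regular convolution: one k x k x in filter per output."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


def separable_conv2d_params(kernel_size, in_channels, out_channels,
                            use_bias=True):
    """Weights in a depthwise separable convolution (depth multiplier 1)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k per channel
    pointwise = in_channels * out_channels               # the 1 x 1 convolution
    return depthwise + pointwise + (out_channels if use_bias else 0)


regular = conv2d_params(3, 64, 128, use_bias=False)
separable = separable_conv2d_params(3, 64, 128, use_bias=False)
print(f"regular 3x3 conv, 64 -> 128 channels:   {regular}")
print(f"separable 3x3 conv, 64 -> 128 channels: {separable}")
print(f"ratio: {regular / separable:.1f}x")
```

As a sanity check, the same formula for regular convolutions reproduces the parameter counts in the model summaries later in this notebook: `conv2d_params(5, 3, 32, use_bias=False)` gives 2400, and `conv2d_params(3, 3, 32)` gives 896.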

### Cats vs. Dogs II

In [1]:
# Remove the comment below and run the command. If you do need to run it, you should only need to run it once.
# !pip install gdown

In [2]:
#The Usual Suspects
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Our deep learning libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

#Some OS Libraries
import os
import shutil
import pathlib

#Some libraries for downloading and unzipping files
import gdown
import zipfile

#For creating a dataset from image libraries
from tensorflow.keras.utils import image_dataset_from_directory

#For augmenting our image data (these layers moved out of the experimental namespace in newer TensorFlow releases)
from tensorflow.keras.layers import RandomFlip, RandomRotation, RandomZoom

In [3]:
url = 'https://drive.google.com/uc?id=1m8tc0BAcDy6J9KNkWH1MNKzbBaD5Y32S'
output = 'cats_vs_dogs.zip'
gdown.download(url, output, quiet=False)

Downloading...
From (original): https://drive.google.com/uc?id=1m8tc0BAcDy6J9KNkWH1MNKzbBaD5Y32S
From (redirected): https://drive.google.com/uc?id=1m8tc0BAcDy6J9KNkWH1MNKzbBaD5Y32S&confirm=t&uuid=063189fd-1d29-493b-ba1a-0818242729b7
To: /content/cats_vs_dogs.zip
100%|██████████| 228M/228M [00:03<00:00, 57.9MB/s]


'cats_vs_dogs.zip'

In [6]:
!unzip -qq cats_vs_dogs.zip

In [7]:
new_base_dir = pathlib.Path("cats_vs_dogs")

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

Found 2000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 2000 files belonging to 2 classes.


In [8]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

In [13]:
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)

x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)

for size in [32, 64, 128, 256, 512]:
    residual = x

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    residual = layers.Conv2D(
        size, 1, strides=2, padding="same", use_bias=False)(residual)
    x = layers.add([x, residual])

x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

In [14]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="new_convnet_model.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=20,
    validation_data=validation_dataset,
    callbacks=callbacks)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [15]:
model.summary()

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_3 (InputLayer)        [(None, 180, 180, 3)]        0         []                            
                                                                                                  
 sequential (Sequential)     (None, 180, 180, 3)          0         ['input_3[0][0]']             
                                                                                                  
 rescaling_2 (Rescaling)     (None, 180, 180, 3)          0         ['sequential[2][0]']          
                                                                                                  
 conv2d_12 (Conv2D)          (None, 176, 176, 32)         2400      ['rescaling_2[0][0]']         
                                                                                            

In [16]:
test_model = keras.models.load_model("new_convnet_model.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

Test accuracy: 0.743


In [17]:
inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1./255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
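It can help to trace how the feature maps shrink through this baseline model. The sketch below is plain Python arithmetic, not Keras. It assumes the layer defaults used above: convolutions without padding, which shrink each side to `size - kernel_size + 1`, and pooling with stride equal to the pool size, which gives `size // pool_size`. The helper names are hypothetical.

```python
def valid_conv_size(size: int, kernel_size: int) -> int:
    """Output size of an unpadded ("valid") stride-1 convolution."""
    return size - kernel_size + 1


def max_pool_size(size: int, pool_size: int) -> int:
    """Output size of unpadded max pooling with stride equal to pool_size."""
    return size // pool_size


filter_counts = [32, 64, 128, 256, 256]
size = 180
trace = []
for index, filters in enumerate(filter_counts):
    size = valid_conv_size(size, 3)
    trace.append(size)
    # Every convolution except the last is followed by 2x2 max pooling.
    if index < len(filter_counts) - 1:
        size = max_pool_size(size, 2)
        trace.append(size)

flattened = size * size * filter_counts[-1]
print("spatial sizes:", trace)
print("features after Flatten:", flattened)
```

The first few sizes (178, 89, 87, 43) match the output shapes reported by `model.summary()` for this model.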

In [18]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="old_convnet_model.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=20,
    validation_data=validation_dataset,
    callbacks=callbacks)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [19]:
model.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 180, 180, 3)]     0         
                                                                 
 rescaling_3 (Rescaling)     (None, 180, 180, 3)       0         
                                                                 
 conv2d_18 (Conv2D)          (None, 178, 178, 32)      896       
                                                                 
 max_pooling2d_15 (MaxPooli  (None, 89, 89, 32)        0         
 ng2D)                                                           
                                                                 
 conv2d_19 (Conv2D)          (None, 87, 87, 64)        18496     
                                                                 
 max_pooling2d_16 (MaxPooli  (None, 43, 43, 64)        0         
 ng2D)                                                     

In [20]:
test_model = keras.models.load_model("old_convnet_model.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

Test accuracy: 0.744
