# Ablation studies

Originally, in neurology, ablation is the surgical removal of body tissue. In machine learning, the term 'ablation study' has been adopted for a procedure in which certain parts of a network are removed or modified to gain a better understanding of the network's behavior. By disabling or changing one component at a time and retraining, we can measure how much that component contributes to the performance of the whole system.
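A minimal illustration of the idea on a toy linear-regression problem rather than a network. The data and the helper `ols_r2` are made up for this sketch. Each input feature is removed in turn, the model is refit, and the drop in score shows how much that feature contributed.

```python
import numpy as np

# Toy data (hypothetical): y depends strongly on feature 0, weakly on feature 1, not at all on feature 2
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

def ols_r2(X, y):
    """Fit ordinary least squares with an intercept and return R^2 on the same data."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - residual.var() / y.var()

baseline = ols_r2(X, y)

# Ablate one feature at a time and record how much the score drops
drops = {}
for i in range(X.shape[1]):
    ablated = np.delete(X, i, axis=1)
    drops[f'feature {i}'] = baseline - ols_r2(ablated, y)

for name, drop in drops.items():
    print(f'{name}: R^2 drop = {drop:.4f}')
```

The experiments below follow the same pattern, except that the "components" are convolutional blocks, layer widths, dropout and activation functions, and each variant is a freshly trained network.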


In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
import plotly.express as px
import plotly
import keras
import keras.layers as kl
from keras.datasets import mnist

In [2]:
def onehot(n):
    out = np.zeros(10)
    out[n] = 1
    return out
def onehot_data(data):
    return np.array([onehot(row) for row in data])
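The same encoding can be done without a Python loop: row `i` of a 10x10 identity matrix is the one-hot vector for class `i`, so indexing the identity matrix with the whole label array builds every row at once. A sketch, with `onehot_data_vectorized` as a hypothetical name (Keras also ships `keras.utils.to_categorical` for this purpose):

```python
import numpy as np

def onehot_data_vectorized(labels, num_classes=10):
    # Fancy-indexing the identity matrix picks one one-hot row per label
    return np.eye(num_classes)[labels]

labels = np.array([3, 0, 9])
print(onehot_data_vectorized(labels))
```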

In [3]:
def plothistory(history):
    # Copy the history dict so that adding the 'Epoch' column does not mutate the Keras History object
    h = dict(history.history)
    h['Epoch'] = history.epoch

    # One line chart per metric, comparing the training and validation curves
    for metric, title in [('accuracy', 'Accuracy'), ('recall', 'Recall'), ('precision', 'Precision')]:
        fig = px.line(h, x='Epoch', y=[metric, 'val_' + metric],
                      title=title, width=700, height=300)
        fig.show()

In [4]:
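# Get data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# One-hot encode the labels to match the 10-unit softmax output
y_test = onehot_data(y_test)
y_train = onehot_data(y_train)

# Add a trailing channel axis, (N, 28, 28) -> (N, 28, 28, 1), so the images fit Conv2D later.
# Pixel values are intentionally left in the 0-255 range (see the Sigmoid discussion at the end).
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)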
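Keras `Conv2D` layers default to channels-last input of shape `(samples, height, width, channels)`, while `mnist.load_data()` returns grayscale images without a channel axis. A small shape check on a dummy batch (the zero array is only a stand-in for real images):

```python
import numpy as np

# Stand-in for 5 grayscale 28x28 images in MNIST's raw layout: (samples, height, width)
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Append a channel axis of size 1 for channels-last convolution layers
with_channel = np.expand_dims(batch, -1)
print(with_channel.shape)  # (5, 28, 28, 1)

# Indexing with np.newaxis is an equivalent spelling
assert (batch[..., np.newaxis] == with_channel).all()
```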

In [5]:
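# Block model parts together for future modifications.
# Each block is a function returning a list of layers, so it can be spliced into
# keras.Sequential with * and parameterized by its dropout rate d
# (dense_block additionally takes the hidden layer width n).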
conv_block_1 = lambda d: [
    kl.Conv2D(32,kernel_size=(5,5),padding='Same',activation='relu'),
    kl.Conv2D(32,kernel_size=(5,5),padding='Same',activation='relu'),
    kl.MaxPool2D(pool_size=(2,2)),
    kl.Dropout(d)
    ]


conv_block_2 = lambda d: [
    kl.Conv2D(64,kernel_size=(3,3),padding='Same',activation='relu'),
    kl.Conv2D(64,kernel_size=(3,3),padding='Same',activation='relu'),
    kl.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    kl.Dropout(d),
]

dense_block = lambda n,d: [
    kl.Flatten(),
    kl.Dense(n, activation = "relu"),
    kl.Dropout(d),
]
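Because `dense_block` starts with `Flatten`, its input size is fixed by what the two convolutional blocks do to a 28x28 image. A small sketch of that arithmetic, using a helper `pooled_size` made up for this illustration:

```python
def pooled_size(size, num_pools):
    # 'same' padding with stride 1 leaves height and width unchanged, so only the
    # 2x2, stride-2 max pools change the spatial size (halving it, rounding down)
    for _ in range(num_pools):
        size //= 2
    return size

side = pooled_size(28, num_pools=2)  # one pooling layer per conv block
channels = 64                        # filters in the last Conv2D of conv_block_2
print(side, side * side * channels)  # 7 3136
```

So `Flatten` hands 3,136 features to the hidden dense layer in every model that keeps both convolutional blocks.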

# Reference model

In [6]:
# Standard CNN for MNIST
model = keras.Sequential([

    keras.Input(shape=x_train.shape[1:]),

    *conv_block_1(0.25),

    *conv_block_2(0.25),

    *dense_block(256,0.5),
    
    kl.Dense(10,activation='softmax') 
])
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
model.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy","Recall","Precision"])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 32)        832       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 32)        25632     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 64)        18496     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64)          0
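The parameter counts in the summary follow from two formulas: a `Conv2D` layer has one `kernel_h * kernel_w * in_channels` weight block plus one bias per filter, and a `Dense` layer has one weight per input-unit pair plus one bias per unit. A sketch with hypothetical helper names, checked against the layer names in the summaries:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter: kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input for every unit, plus one bias per unit
    return (inputs + 1) * units

print(conv2d_params(5, 5, 1, 32))     # conv2d:   832
print(conv2d_params(5, 5, 32, 32))    # conv2d_1: 25632
print(conv2d_params(3, 3, 32, 64))    # conv2d_2: 18496
print(conv2d_params(3, 3, 64, 64))    # conv2d_3: 36928
print(dense_params(7 * 7 * 64, 256))  # hidden dense layer of the reference model: 803072
print(dense_params(28 * 28, 1024))    # dense_2 of the dense model: 803840
```

By the same formula, the tight model's 24-unit layer has only `dense_params(3136, 24)` = 75,288 parameters, while the reference's hidden dense layer has about as many as the dense-only model's.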

# Dense model

Model without the convolution and pooling blocks: the image is flattened and fed straight into a wider dense block (1024 units instead of 256).

In [7]:
model_dense = keras.Sequential([

    keras.Input(shape=x_train.shape[1:]),

    # *conv_block_1(0.25),

    # *conv_block_2(0.25),

    *dense_block(1024,0.5),
    
    kl.Dense(10,activation='softmax') 
])
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
model_dense.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy","Recall","Precision"])
model_dense.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1024)              803840    
_________________________________________________________________
dropout_3 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                10250     
Total params: 814,090
Trainable params: 814,090
Non-trainable params: 0
_________________________________________________________________


# Tight model

Same convolutional blocks as the reference, but the hidden dense layer is shrunk from 256 to 24 units.

In [8]:
model_tight = keras.Sequential([

    keras.Input(shape=x_train.shape[1:]),

    *conv_block_1(0.25),

    *conv_block_2(0.25),

    *dense_block(24,0.5),
    
    kl.Dense(10,activation='softmax') 
])
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
model_tight.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy","Recall","Precision"])
model_tight.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 28, 28, 32)        832       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 28, 28, 32)        25632     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 64)        18496     
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 64)         

# Dropoutless model

Model without dropout: every dropout rate is set to 0, so the Dropout layers are still present but have no effect.
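During training, Keras' `Dropout` zeroes each input with probability `rate` and scales the surviving inputs by `1 / (1 - rate)`, so the expected activation is unchanged (this is usually called inverted dropout); at inference it passes inputs through untouched. A minimal NumPy sketch of that training-time behavior, with `inverted_dropout` as a hypothetical helper rather than the Keras implementation:

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero each element with probability `rate`, scale the survivors by 1 / (1 - rate)."""
    if rate == 0:
        return x  # nothing is dropped: the layer is an identity map
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(100_000)
print(inverted_dropout(x, 0.0, rng).mean())  # 1.0
print(inverted_dropout(x, 0.5, rng).mean())  # close to 1.0
```

With `rate=0` the mask keeps everything, which is why the dropoutless model can reuse the same block definitions.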

In [9]:
model_dropoutless = keras.Sequential([

    keras.Input(shape=x_train.shape[1:]),

    *conv_block_1(0),

    *conv_block_2(0),

    *dense_block(256,0),
    
    kl.Dense(10,activation='softmax') 
])
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
model_dropoutless.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy","Recall","Precision"])
model_dropoutless.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 28, 28, 32)        832       
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 28, 28, 32)        25632     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 14, 14, 64)        18496     
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 7, 7, 64)         

# Sigmoid model

Model with sigmoid activations instead of ReLU in all hidden layers; the output layer keeps softmax.

In [10]:
model_sigmoid = keras.Sequential([

    keras.Input(shape=x_train.shape[1:]),
    kl.Conv2D(32,kernel_size=(5,5),padding='Same',activation='sigmoid'),
    kl.Conv2D(32,kernel_size=(5,5),padding='Same',activation='sigmoid'),
    kl.MaxPool2D(pool_size=(2,2)),
    kl.Dropout(0.25),
    kl.Conv2D(64,kernel_size=(3,3),padding='Same',activation='sigmoid'),
    kl.Conv2D(64,kernel_size=(3,3),padding='Same',activation='sigmoid'),
    kl.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    kl.Dropout(0.25),
    kl.Flatten(),
    kl.Dense(256, activation = "sigmoid"),
    kl.Dropout(0.5),    
    kl.Dense(10,activation='softmax') 
])
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
model_sigmoid.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy","Recall","Precision"])
model_sigmoid.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_12 (Conv2D)           (None, 28, 28, 32)        832       
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 28, 28, 32)        25632     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 14, 14, 64)        18496     
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 7, 7, 64)         

In [11]:
def fit_model(model):
    return model.fit(
        x_train,y_train,
        epochs=4,
        validation_data=(x_test,y_test))
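Every variant is trained with the same RMSprop configuration (learning rate 0.001, rho 0.9, epsilon 1e-8). A minimal NumPy sketch of the textbook RMSprop update (no momentum, not centered), run on a toy one-dimensional quadratic of our own choosing. Keras' implementation may differ in details such as exactly where epsilon is added.

```python
import numpy as np

def rmsprop_step(w, grad, v, learning_rate=0.001, rho=0.9, epsilon=1e-8):
    """One plain RMSprop update; returns the new weight and the new squared-gradient average."""
    # Exponential moving average of squared gradients
    v = rho * v + (1.0 - rho) * grad ** 2
    # Per-parameter step: the gradient divided by the root of that average
    w = w - learning_rate * grad / (np.sqrt(v) + epsilon)
    return w, v

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1
w, v = 1.0, 0.0
for _ in range(2000):
    w, v = rmsprop_step(w, 2 * w, v, learning_rate=0.01)
print(w)
```

Dividing by the running root-mean-square makes the step size roughly `learning_rate` regardless of the gradient's scale, which is one reason RMSprop copes with raw 0-255 inputs reasonably well.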

In [33]:
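def results(history):
    # Metrics from the last epoch; the val_* entries are computed on the test set
    # that fit_model passes as validation_data
    h = history.history
    val_acc = h['val_accuracy'][-1]
    val_rec = h['val_recall'][-1]
    val_prec = h['val_precision'][-1]
    train_acc = h['accuracy'][-1]
    return (val_acc, val_rec, val_prec, train_acc)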

In [13]:
histories = {
    'reference':fit_model(model),
    'dense': fit_model(model_dense),
    'tight': fit_model(model_tight),
    'dropoutless': fit_model(model_dropoutless),
    'sigmoid': fit_model(model_sigmoid),
}

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


# Reference

In [14]:
plothistory(histories['reference'])

# Dense

In [15]:
plothistory(histories['dense'])

# Tight

In [16]:
plothistory(histories['tight'])

# Dropoutless

In [17]:
plothistory(histories['dropoutless'])

# Sigmoid

In [18]:
plothistory(histories['sigmoid'])

# Comparison

In [34]:
for k in histories:
    print(k)
    res = [f'{e:.5f}' for e in results(histories[k])]
    depth = '    '
    print(depth + 'train accuracy: ' + res[3])
    print(depth + 'test accuracy: ' + res[0])
    print(depth + 'recall: ' + res[1])
    print(depth + 'precision: ' + res[2])

reference
    train accuracy: 0.95783
    test accuracy: 0.97990
    recall: 0.97770
    precision: 0.98281
dense
    train accuracy: 0.93190
    test accuracy: 0.96160
    recall: 0.95980
    precision: 0.96375
tight
    train accuracy: 0.11237
    test accuracy: 0.11350
    recall: 0.00000
    precision: 0.00000
dropoutless
    train accuracy: 0.98113
    test accuracy: 0.98040
    recall: 0.98020
    precision: 0.98059
sigmoid
    train accuracy: 0.97167
    test accuracy: 0.98450
    recall: 0.98350
    precision: 0.98626
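The printout is easier to read as differences from the reference. A small sketch that takes the test-set numbers printed above and reports each variant's change in percentage points. These are single 4-epoch runs, so differences of a few tenths of a point may be within run-to-run noise.

```python
# Test-set metrics copied from the printout above (after 4 epochs)
results = {
    'reference':   {'accuracy': 0.97990, 'recall': 0.97770, 'precision': 0.98281},
    'dense':       {'accuracy': 0.96160, 'recall': 0.95980, 'precision': 0.96375},
    'tight':       {'accuracy': 0.11350, 'recall': 0.00000, 'precision': 0.00000},
    'dropoutless': {'accuracy': 0.98040, 'recall': 0.98020, 'precision': 0.98059},
    'sigmoid':     {'accuracy': 0.98450, 'recall': 0.98350, 'precision': 0.98626},
}

# Difference from the reference model, in percentage points
base = results['reference']
deltas = {}
for name, metrics in results.items():
    if name == 'reference':
        continue
    deltas[name] = {m: 100 * (v - base[m]) for m, v in metrics.items()}
    row = ', '.join(f'{m} {d:+.2f} pp' for m, d in deltas[name].items())
    print(f'{name:12s} {row}')
```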


### Dense

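The model is viable and generalizes well, but its test accuracy is 1.83 percentage points below the reference (0.9616 vs 0.9799), so its error rate is almost twice as high (3.84% vs 2.01%). Without convolutions it has a harder time detecting local patterns that repeat across the image: a convolutional filter reuses the same weights at every position, whereas each dense unit has a separate weight for every pixel.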
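Weight sharing is what lets a convolution recognize a pattern wherever it appears. A NumPy sketch with a hand-written single-channel cross-correlation (the operation a `Conv2D` filter computes) and a made-up 3x3 "stroke" placed at two different positions; `correlate2d_valid` and `peak` are hypothetical helpers for this illustration:

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the response map."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def peak(response):
    # Strongest response and its (row, column) position as plain ints
    pos = np.unravel_index(response.argmax(), response.shape)
    return float(response.max()), tuple(int(p) for p in pos)

# A small diagonal stroke, standing in for a local pattern such as part of a digit
stroke = np.eye(3)

# The same stroke in two different places of a blank 10x10 image
img_a = np.zeros((10, 10))
img_a[1:4, 1:4] = stroke
img_b = np.zeros((10, 10))
img_b[6:9, 5:8] = stroke

# One shared kernel gives the same peak response wherever the stroke appears
print(peak(correlate2d_valid(img_a, stroke)))  # (3.0, (1, 1))
print(peak(correlate2d_valid(img_b, stroke)))  # (3.0, (6, 5))
```

A dense layer has no such tying: weights that respond to the stroke in one corner say nothing about the other corner, so each position has to be learned separately.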

### Tight

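This model is no better than always guessing the same digit, and is therefore useless. Its test accuracy of 0.1135 matches the share of the digit 1 in the MNIST test set (1,135 of 10,000 images), and its training accuracy of 0.1124 matches the share of 1s in the training set (6,742 of 60,000), which suggests it predicts the most frequent class for every image. Its recall and precision of 0 mean that no output probability ever exceeded the metrics' default threshold of 0.5. Combined with the results from the dense model, I conclude that a sufficiently wide dense layer is critical to the model's function.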
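How a collapsed model produces exactly these numbers can be sketched with synthetic labels (made up for illustration, not MNIST) and a constant, near-uniform softmax output. Keras' `Recall` and `Precision` treat each of the 10 outputs as a separate binary prediction that counts as positive only above the 0.5 threshold.

```python
import numpy as np

# Synthetic labels: class 1 is the most frequent, as it is in MNIST
labels = np.array([1, 1, 1, 0, 2, 3, 1, 4, 5, 1])
num_classes = 10

# A collapsed model: the same near-uniform softmax output for every input,
# with class 1 marginally ahead (each row sums to 1.0)
probs = np.full((len(labels), num_classes), 0.099)
probs[:, 1] = 0.109

# Accuracy uses argmax, so it equals the share of the favored class
accuracy = np.mean(probs.argmax(axis=1) == labels)
print(accuracy)  # 0.5, the share of class 1 in `labels`

# Thresholded metrics: no probability exceeds 0.5, so nothing is predicted positive
onehot = np.eye(num_classes)[labels]
predicted_positive = probs > 0.5
true_positives = np.sum(predicted_positive & (onehot == 1))
print(predicted_positive.sum(), true_positives)  # 0 0
```

With zero true positives recall is 0, and with zero predicted positives precision is undefined; Keras reports it as 0, consistent with the printout above.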

### Dropoutless

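Surprisingly, the dropoutless model generalizes well: its training accuracy (0.9811) is only slightly above its test accuracy (0.9804), which after 4 epochs is on par with the reference (0.9799). With longer training, however, the dropoutless model is surpassed by the reference model (those longer runs are not shown in this notebook), which suggests that dropout is a necessary addition. Note that the reference model's training accuracy (0.9578) is below its test accuracy because Keras computes training metrics with dropout active, averaged over the epoch.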

### Sigmoid

The sigmoid model is neither clearly better nor clearly worse than the reference model (test accuracy 0.9845 vs 0.9799 in this single run), which makes it a viable alternative. Because I did not normalize the input data to the 0-1 range, the raw pixel values (0-255) produce large pre-activations in the first convolution, so its sigmoid saturates and the data becomes essentially thresholded. This dataset responds well to that, since MNIST pixels are almost all fully black or fully white, but it would probably not be the case for a dataset with less sharply defined edges between objects and the background.
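The thresholding effect can be seen on a single hypothetical unit with weight 0.5 and bias -1 (values chosen only for illustration; real first-layer units sum 25 weighted pixels, which makes the effect stronger). On raw pixels every non-black value lands on the flat top of the sigmoid, while on inputs scaled to 0-1 the output stays graded.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw MNIST pixels are integers in 0..255; dividing by 255 maps them to 0..1
pixels = np.array([0.0, 30.0, 128.0, 255.0])

# Hypothetical unit: weight 0.5, bias -1
w, b = 0.5, -1.0
raw = sigmoid(w * pixels + b)             # every non-zero pixel saturates to ~1
scaled = sigmoid(w * pixels / 255.0 + b)  # outputs increase smoothly with brightness
print(np.round(raw, 3))
print(np.round(scaled, 3))
```

Scaling with `x_train / 255.0` would avoid the saturation, but it would also change the comparison with the reference model, which was trained on the same unnormalized inputs.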