StructureAdaptionFramework: a framework for handling neuron-level and layer-level structure adaptions in
neural networks.

Copyright (C) 2023  Roman Frels, roman.frels@gmail.com

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

# Structure adaption demonstration

This is a demonstration of the capabilities of the structure adaption framework. In short, it supports growing and pruning neurons and (multiple) layers while the model is being trained, and it handles related tasks for convenience,
including preserving all weights and keeping the optimizer slots for those weights intact.
It is designed to work in conjunction with growing and pruning criteria that optimize the network architecture during training.

First we load a simple image recognition dataset and set up a simple convolutional neural network that we train for one epoch.

In [1]:
import tensorflow as tf
import StructureAdaption.structure_adaption as structure_adaption
import tensorflow_datasets as tfds

num_pixels = 60
ds_train = tfds.load('citrus_leaves', shuffle_files=True, as_supervised=True, with_info=False)
ds_train = ds_train['train']
def resize_image(image, label):
    resized_image = tf.image.resize(image, (num_pixels, num_pixels))
    return resized_image, label
ds_train = ds_train.map(resize_image)
ds_train = ds_train.batch(32)

def example_model():
    inputs = tf.keras.Input(shape=[num_pixels, num_pixels, 3], dtype=tf.dtypes.float32, name='x0')

    x1 = tf.keras.layers.Conv2D(8, 3, strides=1, padding='same', activation='relu', name='x1')(inputs)
    x2 = tf.keras.layers.BatchNormalization(name='x2')(x1)
    x3 = tf.keras.layers.Conv2D(8, 3, strides=1, padding='valid', activation='relu', name='x3')(x2)
    x4 = tf.keras.layers.BatchNormalization(name='x4')(x3)
    x5 = tf.keras.layers.Conv2D(8, 3, strides=2, padding='same', activation='relu', name='x5')(x4)
    x6 = tf.keras.layers.BatchNormalization(name='x6')(x5)
    x7 = tf.keras.layers.Conv2D(8, 3, strides=2, padding='valid', activation='relu', name='x7')(x6)
    x8 = tf.keras.layers.BatchNormalization(name='x8')(x7)

    x9 = tf.keras.layers.Flatten(name='x9')(x8)
    x10 = tf.keras.layers.Dense(units=12, activation='relu', name='x10')(x9)
    x11 = tf.keras.layers.Dense(units=8, activation='relu', name='x11')(x10)
    x12 = tf.keras.layers.Dense(units=4, activation='relu', name='x12')(x11)
    outputs = tf.keras.layers.Softmax(name='x13')(x12)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

base_model = example_model()
parsed_model = structure_adaption.parse_model(base_model)

def compile_fn():
    # Run one forward pass on a dummy batch with the expected input shape.
    parsed_model(tf.random.uniform(shape=(1, num_pixels, num_pixels, 3)))

parsed_model.summary()

optimizer = tf.keras.optimizers.SGD(momentum=0.01)
# The model ends in a Softmax layer, so the loss receives probabilities rather than logits.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]

parsed_model.internal_model.compile(optimizer, loss_fn, metrics)
parsed_model.internal_model.fit(ds_train, epochs=1)


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 x0 (InputLayer)             [(None, 60, 60, 3)]       0         
                                                                 
 x1 (AdaptionLayer)          (None, 60, 60, 8)         224       
                                                                 
 x2 (AdaptionLayer)          (None, 60, 60, 8)         32        
                                                                 
 x3 (AdaptionLayer)          (None, 58, 58, 8)         584       
                                                                 
 x4 (AdaptionLayer)          (None, 58, 58, 8)         32        
                                                                 
 x5 (AdaptionLayer)          (None, 29, 29, 8)         584       
                                                                 
 x6 (AdaptionLayer)          (None, 29, 29, 8)         32    

Since this is a simple sequential model, a growing criterion might choose to open a parallel branch of convolutional layers with larger receptive fields. Keep in mind that arbitrarily complicated structures are possible; the criterion decides.
We introduce new layers and connect them to the insert start layer x2. The insert branch ends at layer x7, where a new add layer is introduced; it appears in the summary as `new_add_node0`. After the branch has been grown, the internal model is
recompiled with the optimizer, loss function and metrics, and training continues. An important detail of the framework is that the internal TensorFlow model is copied with each adaption, which invalidates
all references to the old internal model.
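
An added branch can only be merged by an add layer if its output shape matches the tensor it is added to. The following is a minimal sketch of that arithmetic in plain Python, using the standard Keras rules for `'same'` and `'valid'` padding. It assumes that the new add node merges the branch output with the tensor entering x7 (the output of x6); that reading is an assumption based on the shapes, and `conv_out` is a hypothetical helper, not part of the framework.

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a 2D convolution along one axis (Keras conventions)."""
    if padding == 'same':
        return math.ceil(size / stride)
    if padding == 'valid':
        return (size - kernel) // stride + 1
    raise ValueError(f'unknown padding: {padding}')

# Main path: x1 -> x3 -> x5 (the batch normalization layers keep the shape).
main = 60
for kernel, stride, padding in [(3, 1, 'same'), (3, 1, 'valid'), (3, 2, 'same')]:
    main = conv_out(main, kernel, stride, padding)

# Insert branch: x14 and x16 both start from x2 (60 x 60) and are summed in x18,
# so their outputs must agree; x19 then downsamples the sum.
x14 = conv_out(60, 5, 1, 'same')
x16 = conv_out(60, 9, 1, 'same')
branch = conv_out(x14, 3, 2, 'valid')

print(main, x14, x16, branch)  # prints: 29 60 60 29
```

Under this assumption both paths arrive with a 29 x 29 spatial size and 8 channels, so they can be added element-wise.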

In [2]:
insert_start = parsed_model.internal_model.layers[2]
insert_end = parsed_model.internal_model.layers[7]
print('start of insert branch: ' + insert_start.name + ', end of insert branch: ' + insert_end.name)

l14 = tf.keras.layers.Conv2D(8, 5, strides=1, padding='same', activation='relu', name='x14')
x14 = l14(insert_start.output)
l15 = tf.keras.layers.BatchNormalization(name='x15')
x15 = l15(x14)

l16 = tf.keras.layers.Conv2D(8, 9, strides=1, padding='same', activation='relu', name='x16')
x16 = l16(insert_start.output)
l17 = tf.keras.layers.BatchNormalization(name='x17')
x17 = l17(x16)

l18 = tf.keras.layers.Add(name='x18')
x18 = l18([x15, x17])

l19 = tf.keras.layers.Conv2D(8, 3, strides=2, padding='valid', activation='relu', name='x19')
x19 = l19(x18)
l20 = tf.keras.layers.BatchNormalization(name='x20')
x20 = l20(x19)

insert_branch = structure_adaption.InsertBranch([insert_start, l14, l15, l16, l17, l18, l19, l20], insert_end)
parsed_model.grow_branch(insert_branch, optimizer, compile_fn, carry_optimizer=True)
parsed_model.summary()
parsed_model.internal_model.compile(optimizer, loss_fn, metrics)
parsed_model.internal_model.fit(ds_train, epochs=1)

start of insert branch: x2, end of insert branch: x7
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 60, 60, 3)]  0           []                               
                                                                                                  
 x1 (AdaptionLayer)             (None, 60, 60, 8)    224         ['x0[0][0]']                     
                                                                                                  
 x2 (AdaptionLayer)             (None, 60, 60, 8)    32          ['x1[0][0]']                     
                                                                                                  
 x14 (AdaptionLayer)            (None, 60, 60, 8)    1608        ['x2[0][0]']                     
                                         

<keras.callbacks.History at 0x7f124475bf10>

A pruning criterion might choose to remove one of the layers of the classifier head. We select the preceding layer (x10), the layer in question (x11) and the subsequent layer (x12) as start, middle and end of the newly created pruning branch.
When pruning the layer, `leave_residual=True` is chosen to preserve the connection from the start to the end of the pruning branch. Furthermore, `skip_mismatch=True` is chosen. This allows new weights to be initialized for layer x12,
since the old weights expect an input with dimension `[... 8]` but will now receive dimension `[... 12]`. Training continues after the adaption step.
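
The mismatch can be made concrete with the kernel shape of a Keras `Dense` layer, which is `(input_dim, units)`. The sketch below only does the shape bookkeeping for x12 before and after x11 is removed; `dense_shapes` is a hypothetical helper for illustration and says nothing about how the framework handles the mismatch internally.

```python
def dense_shapes(in_features, units):
    """Kernel shape, bias shape and parameter count of a Keras Dense layer."""
    kernel = (in_features, units)
    bias = (units,)
    return kernel, bias, in_features * units + units

# x12 before pruning: fed by x11, which has 8 units.
old_kernel, old_bias, old_params = dense_shapes(8, 4)
# x12 after pruning x11: fed directly by x10, which has 12 units.
new_kernel, new_bias, new_params = dense_shapes(12, 4)

kernel_reusable = old_kernel == new_kernel
print(old_kernel, '->', new_kernel, 'reusable:', kernel_reusable)
print(old_params, '->', new_params)
```

The old `(8, 4)` kernel cannot be copied into the new `(12, 4)` slot, which is why the weights of x12 have to be initialized anew.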

In [3]:
layers = parsed_model.internal_model.layers
remove_start = layers[18]
remove_layer = layers[19]
remove_end = layers[20]
print('start layer: ' + remove_start.name + ', to be removed layer: ' + remove_layer.name + ', end layer: ' + remove_end.name)
remove_branch = structure_adaption.Branch([remove_start, remove_layer, remove_end])
parsed_model.prun_branch(remove_branch, optimizer, compile_fn, carry_optimizer=True, leave_residual=True, skip_mismatch=True)
parsed_model.summary()

parsed_model.internal_model.compile(optimizer, loss_fn, metrics)
parsed_model.internal_model.fit(ds_train, epochs=1)

start layer: x10, to be removed layer: x11, end layer: x12
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 60, 60, 3)]  0           []                               
                                                                                                  
 x1 (AdaptionLayer)             (None, 60, 60, 8)    224         ['x0[0][0]']                     
                                                                                                  
 x2 (AdaptionLayer)             (None, 60, 60, 8)    32          ['x1[0][0]']                     
                                                                                                  
 x14 (AdaptionLayer)            (None, 60, 60, 8)    1608        ['x2[0][0]']                     
                                   

<keras.callbacks.History at 0x7f1244606340>

A criterion for neuron-level pruning might choose to remove neurons from the classification layer x10. To do so, we first retrieve the grow prun tuples from the parsed model. These contain all valid combinations for neuron-level adaptions.
The first layer of a tuple is the layer where neurons are added or removed, and the last layer has its input weights adjusted accordingly.
Intermediate layers of a grow prun tuple preserve the output dimension of the previous layer and have no notion of neurons.

We retrieve the grow prun tuple matching layer x10 and prune four neurons. Training again continues afterward.
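
What removing neurons means for the weights can be sketched with plain Python lists: the columns of the first layer's kernel and the entries of its bias that belong to the pruned neurons disappear, and so do the matching input rows of the last layer's kernel. This is an illustration under simplifying assumptions (two dense layers, no intermediate layers, a toy input size of 3), not the framework's implementation; `prune_neurons` is a hypothetical helper.

```python
def prune_neurons(kernel, bias, next_kernel, indices):
    """Remove the neurons `indices` from a dense layer and the matching inputs of the next layer.

    kernel:      rows = inputs, columns = neurons of the pruned layer
    bias:        one entry per neuron of the pruned layer
    next_kernel: rows = neurons of the pruned layer, columns = neurons of the next layer
    """
    drop = set(indices)
    keep = [j for j in range(len(bias)) if j not in drop]
    new_kernel = [[row[j] for j in keep] for row in kernel]
    new_bias = [bias[j] for j in keep]
    new_next_kernel = [next_kernel[j] for j in keep]
    return new_kernel, new_bias, new_next_kernel

# Toy sizes: 12 neurons in the pruned layer (like x10), 4 in the following layer (like x12).
n_in, n_neurons, n_out = 3, 12, 4
kernel = [[10 * i + j for j in range(n_neurons)] for i in range(n_in)]
bias = list(range(n_neurons))
next_kernel = [[100 * j + k for k in range(n_out)] for j in range(n_neurons)]

k, b, nk = prune_neurons(kernel, bias, next_kernel, [1, 4, 5, 8])
print(len(k[0]), len(b), len(nk))  # prints: 8 8 8
print(b)                           # prints: [0, 2, 3, 6, 7, 9, 10, 11]
```

The weights of the surviving neurons are carried over unchanged; only the slices belonging to the removed indices are dropped.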

In [4]:
gp_tuples = parsed_model.grow_prun_tuples
print('pruning neurons in layer: ' + gp_tuples[1].first_layer.name)
parsed_model.prun(gp_tuples[1], [1, 4, 5, 8], optimizer, compile_fn)
parsed_model.summary()

parsed_model.internal_model.compile(optimizer, loss_fn, metrics)
parsed_model.internal_model.fit(ds_train, epochs=1)

pruning neurons in layer: x10
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 60, 60, 3)]  0           []                               
                                                                                                  
 x1 (AdaptionLayer)             (None, 60, 60, 8)    224         ['x0[0][0]']                     
                                                                                                  
 x2 (AdaptionLayer)             (None, 60, 60, 8)    32          ['x1[0][0]']                     
                                                                                                  
 x14 (AdaptionLayer)            (None, 60, 60, 8)    1608        ['x2[0][0]']                     
                                                                

<keras.callbacks.History at 0x7f12446022e0>

A criterion for neuron-level growing might choose to add neurons/filters to the layer x3. For the sake of brevity, a simple zero initializer is supplied for the newly initialized weights in the first layer, the intermediate layer and the last layer.
Note, however, that the newly initialized weights can also be specified as NumPy arrays or TensorFlow tensors. After growing, the model is trained one last time.
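
One consequence of zero initialization is that the grown network computes the same function as before, because the new neurons contribute nothing to the next layer. The sketch below checks this for two small dense layers in plain Python. It is a simplified stand-in for the convolutional case in this demonstration (no intermediate batch normalization layer), and `dense` and `grow_with_zeros` are hypothetical helpers, not framework functions.

```python
def dense(x, kernel, bias, relu=True):
    """y_j = act(sum_i x_i * kernel[i][j] + bias[j]) for a single input vector x."""
    out = [sum(xi * row[j] for xi, row in zip(x, kernel)) + bias[j] for j in range(len(bias))]
    return [max(0.0, v) for v in out] if relu else out

def grow_with_zeros(kernel, bias, next_kernel, n_new):
    """Append n_new zero-initialized neurons and matching zero input rows in the next layer."""
    new_kernel = [row + [0.0] * n_new for row in kernel]
    new_bias = bias + [0.0] * n_new
    n_out = len(next_kernel[0])
    new_next_kernel = next_kernel + [[0.0] * n_out for _ in range(n_new)]
    return new_kernel, new_bias, new_next_kernel

x = [0.5, -1.0, 2.0]
kernel = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]   # 3 inputs, 2 neurons
bias = [0.05, -0.1]
next_kernel = [[1.0, -1.0], [0.5, 2.0]]           # 2 inputs, 2 outputs
next_bias = [0.0, 0.1]

before = dense(dense(x, kernel, bias), next_kernel, next_bias, relu=False)
k, b, nk = grow_with_zeros(kernel, bias, next_kernel, n_new=2)
after = dense(dense(x, k, b), nk, next_bias, relu=False)

print(len(b), len(nk))  # prints: 4 4
print(all(abs(p - q) < 1e-12 for p, q in zip(before, after)))  # prints: True
```

Supplying explicit arrays or tensors instead of zeros lets a growing criterion choose different starting values for the new neurons.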

In [5]:
gp_tuples = parsed_model.grow_prun_tuples
print('growing neurons in layer: ' + gp_tuples[0].first_layer.name)
zeros_init = tf.keras.initializers.Zeros()
init_dict = {'first': [zeros_init, zeros_init], 'intermediate': [[zeros_init, zeros_init, zeros_init, zeros_init]], 'last': [zeros_init, zeros_init]}
parsed_model.grow(gp_tuples[0], 2, init_dict, optimizer, compile_fn)

parsed_model.internal_model.compile(optimizer, loss_fn, metrics)
parsed_model.internal_model.fit(ds_train, epochs=1)

growing neurons in layer: x3


<keras.callbacks.History at 0x7f12442bf2b0>

In this demonstration, layer-level and neuron-level growing and pruning were performed, intermixed with training the model. Consult the documentation for all caveats that apply.