StructureAdaptionFramework: a framework for handling neuron-level and layer-level structure adaptions in
neural networks.

Copyright (C) 2023  Roman Frels, roman.frels@gmail.com

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

# Non-sequential structure adaption with training example

In this example we combine structure adaption with training. Keep in mind that each structure adaption initializes a new internally wrapped TensorFlow model. It behaves like any other TensorFlow model, but all references to the previous internal model are invalidated and need to be updated. This usually includes the optimizer slots, which refer to the weights of the model, as well as the grow-prune tuples, sequential branches, and parallel branches.
This example includes:
- Adding and removing neurons while training the network with an optimizer in between
- Adding and removing branches while training the network with an optimizer in between

For further caveats, please refer to the documentation.
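The point about invalidated references can be pictured with a small toy, independent of TensorFlow. The classes `ToyInternalModel` and `ToyParsedModel` below are hypothetical stand-ins invented for this illustration; they are not the framework's classes and only mimic the "rebuild instead of mutate" behaviour described above.

```python
# Toy illustration only: these classes are hypothetical and NOT part of the framework.
class ToyInternalModel:
    def __init__(self, units):
        self.units = list(units)  # number of neurons per layer


class ToyParsedModel:
    def __init__(self, units):
        self.internal_model = ToyInternalModel(units)

    def prun(self, layer_index, neuron_indices):
        # Each adaption builds a brand-new internal model instead of mutating the old one.
        units = list(self.internal_model.units)
        units[layer_index] -= len(neuron_indices)
        self.internal_model = ToyInternalModel(units)


parsed = ToyParsedModel([6, 2, 4])
stale = parsed.internal_model   # reference taken before the adaption
parsed.prun(0, [3, 5])
fresh = parsed.internal_model   # reference fetched again afterwards

print(stale.units)     # [6, 2, 4]
print(fresh.units)     # [4, 2, 4]
print(stale is fresh)  # False
```

The old reference still exists but describes a model that is no longer in use, which is why layers, tuples, and branches should be fetched again after every adaption step.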

In [15]:
import tensorflow as tf
import StructureAdaption.structure_adaption as structure_adaption
import numpy as np

For simplicity, we reuse the dense model from the first example.

In [16]:
num_features = 10

def example_model():
    inputs = tf.keras.Input(shape=[num_features], dtype=tf.dtypes.float32, name='x0')
    x1 = tf.keras.layers.Dense(units=6, activation='relu', name='x1')(inputs)
    x2 = tf.keras.layers.Dense(units=2, activation='relu', name='x2')(x1)
    x3 = tf.keras.layers.Dense(units=4, activation='relu', name='x3')(x2)
    outputs = tf.keras.layers.Softmax(name='x4')(x3)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

base_model = example_model()
parsed_model = structure_adaption.parse_model(base_model)

def compile_fn():
    # Call the model once on a dummy batch so the new internal model is run after each adaption.
    parsed_model(tf.random.uniform(shape=(1, num_features)))

parsed_model.summary()



Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 x0 (InputLayer)             [(None, 10)]              0         
                                                                 
 x1 (AdaptionLayer)          (None, 6)                 66        
                                                                 
 x2 (AdaptionLayer)          (None, 2)                 14        
                                                                 
 x3 (AdaptionLayer)          (None, 4)                 12        
                                                                 
 x4 (Softmax)                (None, 4)                 0         
                                                                 
Total params: 92
Trainable params: 92
Non-trainable params: 0
_________________________________________________________________
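The parameter counts in the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` kernel weights plus `n_out` biases. The helper name `dense_params` is made up for this check.

```python
def dense_params(n_in, n_out):
    # Kernel weights plus one bias per unit.
    return n_in * n_out + n_out


layer_sizes = [10, 6, 2, 4]  # x0 -> x1 -> x2 -> x3
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [66, 14, 12]
print(sum(counts))  # 92
```

The same arithmetic is handy after each adaption step to confirm that a layer grew or shrank as intended.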


Next we define an optimizer and a training set, and train the model on it.
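Since the optimizer in this example is SGD with momentum, it is a convenient case for seeing what an optimizer slot is: one extra array per weight, with the same shape as that weight. The NumPy sketch below follows the usual momentum update rule and is only meant as an illustration; the function name `sgd_momentum_step` and the chosen values are assumptions of this sketch, not framework code.

```python
import numpy as np


def sgd_momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.01):
    # The velocity array is the optimizer "slot" that belongs to the weight w.
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity


kernel = np.ones((10, 6))             # weights of a layer with 10 inputs and 6 units
velocity = np.zeros_like(kernel)      # slot with the same shape as the kernel
grad = np.full_like(kernel, 2.0)      # made-up gradient

kernel, velocity = sgd_momentum_step(kernel, velocity, grad)
print(kernel[0, 0], velocity[0, 0])   # 0.98 -0.02

# If units 3 and 5 are removed, weight and slot have to be sliced identically.
keep = [0, 1, 2, 4]
kernel_pruned, velocity_pruned = kernel[:, keep], velocity[:, keep]
print(kernel_pruned.shape, velocity_pruned.shape)  # (10, 4) (10, 4)
```

This is the reason the optimizer is handed to the adaption calls: the slots must follow every change in weight shape.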

In [17]:
num_samples = 96
random_features = tf.random.uniform(shape=(num_samples, num_features))
random_label = tf.math.reduce_mean(random_features, axis=1, keepdims=True)
random_labels = tf.math.exp(tf.concat([random_label, 2* random_label, 3* random_label, 4*random_label], axis=1))
dataset = tf.data.Dataset.from_tensor_slices((random_features, random_labels))
dataset = dataset.batch(32)

optimizer = tf.keras.optimizers.SGD(momentum=0.01)  # TODO: jit_compile=False
loss_fn = tf.keras.losses.MeanSquaredError()
parsed_model.internal_model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])
parsed_model.internal_model.fit(dataset, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f90605a2250>

We remove two neurons from layer x1. After that, we recompile the model and train again. The optimizer slots are carried over automatically. We then grow two neurons in layer x2 and train again. Between growing and pruning, the internal model can be treated like a regular TensorFlow model.
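Conceptually, removing or adding neurons touches two layers at once: the outgoing weights of the layer that owns the neurons and the incoming weights of the layer that consumes them. The NumPy sketch below illustrates this for a generic pair of dense layers. It is not the framework's implementation, and all names in it are invented for the illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(x, w1, b1, w2, b2):
    # Two stacked dense layers with ReLU activations.
    return relu(relu(x @ w1 + b1) @ w2 + b2)


rng = np.random.default_rng(0)
x = rng.uniform(size=(3, 10))
w1, b1 = rng.normal(size=(10, 6)), rng.normal(size=6)  # layer owning the neurons
w2, b2 = rng.normal(size=(6, 2)), rng.normal(size=2)   # layer consuming them

# Pruning neurons 3 and 5: drop their columns in w1, entries in b1, and rows in w2.
pruned = [3, 5]
w1_p = np.delete(w1, pruned, axis=1)
b1_p = np.delete(b1, pruned)
w2_p = np.delete(w2, pruned, axis=0)
print(w1_p.shape, w2_p.shape)  # (10, 4) (4, 2)

# Growing two zero-initialized neurons: append zero columns, biases, and rows.
w1_g = np.concatenate([w1_p, np.zeros((10, 2))], axis=1)
b1_g = np.concatenate([b1_p, np.zeros(2)])
w2_g = np.concatenate([w2_p, np.zeros((2, 2))], axis=0)

before = forward(x, w1_p, b1_p, w2_p, b2)
after = forward(x, w1_g, b1_g, w2_g, b2)
print(np.allclose(before, after))  # True
```

With zeros on both sides the grown network computes exactly the same function as before. In this toy setting the new neurons also receive zero gradients, so a non-zero initializer on at least one side is worth considering when the new neurons are expected to learn.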

In [18]:
gp_tuples = parsed_model.grow_prun_tuples
first = gp_tuples[0]
print('number of grow prun tuples: ' + str(len(gp_tuples)))
print("Name of the layer of the first gp tuple: " + gp_tuples[0].first_layer.name)
parsed_model.prun(first, [3, 5], optimizer, compile_fn=compile_fn)
parsed_model.internal_model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])
parsed_model.internal_model.fit(dataset, epochs=10)
gp_tuples = parsed_model.grow_prun_tuples
second = gp_tuples[1]
zeros_init = tf.keras.initializers.Zeros()
init_dict = {'first': [zeros_init, zeros_init], 'last': [zeros_init, zeros_init]}
parsed_model.grow(second, 2, init_dict, optimizer, compile_fn)
parsed_model.internal_model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])
parsed_model.internal_model.fit(dataset, epochs=10)

number of grow prun tuples: 2
Name of the layer of the first gp tuple: x1
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9060413160>

As after every growing or pruning step, we need to fetch the sequential branches from the parsed model again. We grow a new branch from layer x0 to layer x3, introducing a new add node in the process.
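The resulting data flow can be sketched in NumPy: the main path and the new branch both start at x0, and their outputs are summed element-wise before reaching x3. For the sum to be defined, the last branch layer needs as many units as x2 at that point (four). This is a plain illustration with random weights; the helper `dense` is invented here and nothing below uses the framework.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, units):
    # Dense layer with random kernel, zero bias, and ReLU activation (illustration only).
    w = rng.normal(size=(x.shape[1], units))
    return np.maximum(x @ w + np.zeros(units), 0.0)


x0 = rng.uniform(size=(3, 10))   # batch of 3 samples with 10 features

x2 = dense(dense(x0, 4), 4)      # main path:  x0 -> x1 (4 units) -> x2 (4 units)
x6 = dense(dense(x0, 2), 4)      # new branch: x0 -> x5 (2 units) -> x6 (4 units)

merged = x2 + x6                 # the add node requires identical shapes
x3 = dense(merged, 4)            # x3 now consumes the merged tensor

print(x2.shape, x6.shape, merged.shape, x3.shape)  # (3, 4) (3, 4) (3, 4) (3, 4)
```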

In [19]:
sequential_branch = parsed_model.sequential_branches[0]
insert_start = sequential_branch.layers[0]
insert_end = sequential_branch.layers[3]
l5 = tf.keras.layers.Dense(units=2, activation='relu', name='x5')
x5 = l5(insert_start.output)
l6 = tf.keras.layers.Dense(units=4, activation='relu', name='x6')
x6 = l6(x5)

grow_branch = structure_adaption.InsertBranch([insert_start, l5, l6], insert_end)
parsed_model.grow_branch(grow_branch, optimizer, compile_fn, carry_optimizer=True)
parsed_model.summary()
parsed_model.internal_model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])
parsed_model.internal_model.fit(dataset, epochs=10)

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 10)]         0           []                               
                                                                                                  
 x1 (AdaptionLayer)             (None, 4)            44          ['x0[0][0]']                     
                                                                                                  
 x5 (AdaptionLayer)             (None, 2)            22          ['x0[0][0]']                     
                                                                                                  
 x2 (AdaptionLayer)             (None, 4)            20          ['x1[0][0]']                     
                                                                                            

<keras.callbacks.History at 0x7f90603cb820>

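We prune the branch over the layers x0, x1, and x2 and leave a residual connection. Note that the output shown below lists x0, x5, and x6 instead; judging by the skipped cell number, it appears to stem from a second execution of this cell, after x1 had already been removed.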
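Once x1 is removed and x0 feeds x2 directly, the kernel of x2 changes from shape (4, 4) to (10, 4), so its old values no longer fit, while its bias still does. The sketch below shows one plausible way to handle such a carry-over. It assumes that `skip_mismatch` behaves like the identically named option of Keras weight loading, which is an assumption about this framework and not confirmed here; `carry_over` is a hypothetical helper.

```python
import numpy as np


def carry_over(old_weights, new_shapes, skip_mismatch=True):
    """Reuse old arrays whose shape still fits; otherwise start fresh (hypothetical helper)."""
    carried = {}
    for name, shape in new_shapes.items():
        old = old_weights.get(name)
        if old is not None and old.shape == shape:
            carried[name] = old.copy()
        elif old is None or skip_mismatch:
            carried[name] = np.zeros(shape)  # stand-in for a fresh initialization
        else:
            raise ValueError(f"shape mismatch for {name}: {old.shape} vs {shape}")
    return carried


old = {"x2/kernel": np.ones((4, 4)), "x2/bias": np.ones(4)}
new_shapes = {"x2/kernel": (10, 4), "x2/bias": (4,)}

new = carry_over(old, new_shapes)
n_params = sum(w.size for w in new.values())
print(new["x2/kernel"].shape, n_params)  # (10, 4) 44
```

The 44 parameters agree with the value reported for x2 in the summary printed after pruning.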

In [21]:
layers = parsed_model.internal_model.layers
prun_branch = structure_adaption.Branch([layers[0], layers[1], layers[3]])
print('Prun branch consisting of layers: ')
for layer in prun_branch.layers:
    print(layer.name)
parsed_model.prun_branch(prun_branch, optimizer, compile_fn, carry_optimizer=True, leave_residual=True, skip_mismatch=True)
parsed_model.summary()
parsed_model.internal_model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])
parsed_model.internal_model.fit(dataset, epochs=10)

Prun branch consisting of layers: 
x0
x5
x6
Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 10)]         0           []                               
                                                                                                  
 x2 (AdaptionLayer)             (None, 4)            44          ['x0[0][0]']                     
                                                                                                  
 x6 (AdaptionLayer)             (None, 4)            44          ['x0[0][0]']                     
                                                                                                  
 new_add_node0 (Add)            (None, 4)            0           ['x2[0][0]',                     
                                                

<keras.callbacks.History at 0x7f906026eb80>

As we have seen, between growing and pruning the internal model can be treated like any other TensorFlow model; no restrictions apply. When growing or pruning, the optimizer should be provided so that its slots are carried over to the cloned network. Because the network is cloned in each adaption step, references obtained before the step (such as layers, grow-prune tuples, and branches) no longer point into the current model and have to be fetched again. For further details, consult the documentation.