StructureAdaptionFramework: a framework for handling neuron-level and layer-level structure adaptions in
neural networks.

Copyright (C) 2023  Roman Frels, roman.frels@gmail.com

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

# Non-sequential structure adaption example

In the first example, basic structure adaption, we removed and added neurons and layers sequentially. In this example we remove and add layers non-sequentially and look at further caveats of the structure adaption framework.
This example covers:
- Sequential, parallel, and sequential-parallel branches
- Grow prun tuples and their relation to sequential branches
- Adding and removing neurons with intermediate layers
- Adding and removing branches non-sequentially

To understand the caveats that arise in combination with training, see the training-focused examples.

In [1]:
import tensorflow as tf
import StructureAdaption.structure_adaption as structure_adaption
import numpy as np
# Load the TensorBoard notebook extension
#%load_ext tensorboard



This time we define a convolutional network as our example model. It contains a parallel path that serves only to illustrate the framework and is not meant as a sensible architecture. We make sure that the shapes of both inputs to the Add layer x11 match.

In [2]:
def example_model():
    inputs = tf.keras.Input(shape=[32, 32, 3], dtype=tf.dtypes.float32, name='x0')
    x1 = tf.keras.layers.Conv2D(6, 3, strides=1, padding='same', activation='relu', name='x1')(inputs)
    x2 = tf.keras.layers.BatchNormalization(name='x2')(x1)
    x3 = tf.keras.layers.Conv2D(6, 3, strides=2, padding='valid', activation='relu', name='x3')(x2)
    x4 = tf.keras.layers.BatchNormalization(name='x4')(x3)

    x5 = tf.keras.layers.Conv2D(6, 3, strides=1, padding='same', activation='relu', name='x5')(x4)
    x6 = tf.keras.layers.BatchNormalization(name='x6')(x5)
    x7 = tf.keras.layers.Conv2D(6, 3, strides=2, padding='valid', activation='relu', name='x7')(x6)

    x8 = tf.keras.layers.Conv2D(6, 3, strides=1, padding='same', activation='relu', name='x8')(x4)
    x9 = tf.keras.layers.BatchNormalization(name='x9')(x8)
    x10 = tf.keras.layers.Conv2D(6, 3, strides=2, padding='valid', activation='relu', name='x10')(x9)

    x11 = tf.keras.layers.Add(name='x11')([x7, x10])

    x12 = tf.keras.layers.BatchNormalization(name='x12')(x11)
    x13 = tf.keras.layers.Conv2D(9, 3, strides=1, padding='same', activation='relu', name='x13')(x12)
    x14 = tf.keras.layers.BatchNormalization(name='x14')(x13)
    x15 = tf.keras.layers.Conv2D(6, 3, strides=2, padding='valid', activation='relu', name='x15')(x14)

    x16 = tf.keras.layers.Flatten(name='x16')(x15)
    x17 = tf.keras.layers.Dense(units=2, activation='relu', name='x17')(x16)
    outputs = tf.keras.layers.Softmax(name='x18')(x17)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
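
The claim that both inputs to the Add layer have matching dimensions can be checked by hand with the usual Keras output-size rules for convolutions. The following is a minimal sketch in plain Python; the helper name `conv_output_size` is hypothetical and is not part of TensorFlow or the framework.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a Conv2D layer along one axis (Keras rules)."""
    if padding == 'same':
        return math.ceil(size / stride)
    if padding == 'valid':
        return (size - kernel) // stride + 1
    raise ValueError(f'unknown padding: {padding}')

# Shared stem: x1 ('same', stride 1) followed by x3 ('valid', stride 2).
stem = conv_output_size(conv_output_size(32, 3, 1, 'same'), 3, 2, 'valid')

# Both parallel paths use the same pair of convolutions (x5/x7 and x8/x10).
branch_a = conv_output_size(conv_output_size(stem, 3, 1, 'same'), 3, 2, 'valid')
branch_b = conv_output_size(conv_output_size(stem, 3, 1, 'same'), 3, 2, 'valid')

print(stem, branch_a, branch_b)  # 15 7 7
```

The BatchNormalization layers do not change the shape, and both paths end in a convolution with six filters, so x7 and x10 both produce tensors of shape (None, 7, 7, 6).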

We parse the model and retrieve the grow prun tuples, the sequential branches, and the parallel branches. There are four, four, and one of them, respectively. This means the model has four supported layers that are followed by another supported layer and are therefore eligible for neuron-level adaptions (neuron-level growing and pruning). The BatchNormalization layers preserve the output dimension and form the intermediate layers of the grow prun tuples. Furthermore, the model has four purely sequential branches and one portion in which different branches form parallel paths.
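
To build an intuition for these counts, the sketch below splits the layer graph of the example model into chains. It assumes that a sequential branch is a maximal chain of layers between fork and merge points; the framework's own parser may draw the boundaries differently. The graph dictionary and the function `split_into_chains` are hypothetical and do not use the framework's API.

```python
from collections import defaultdict

# Layer name -> list of input layer names, mirroring example_model().
graph = {'x0': []}
for i in range(1, 19):
    graph[f'x{i}'] = [f'x{i - 1}']
graph['x8'] = ['x4']           # second parallel path also starts at x4
graph['x11'] = ['x7', 'x10']   # Add layer merges both paths

def split_into_chains(graph):
    """Split a DAG (given in topological order) into chains and merge nodes."""
    consumers = defaultdict(list)
    for name, inputs in graph.items():
        for src in inputs:
            consumers[src].append(name)

    chains, chain_of = [], {}
    for name, inputs in graph.items():
        if len(inputs) > 1:
            continue  # merge node: belongs to no sequential chain
        prev = inputs[0] if inputs else None
        # Extend the chain of the predecessor unless the predecessor forks.
        if prev in chain_of and len(consumers[prev]) == 1:
            chain = chain_of[prev]
        else:
            chain = []
            chains.append(chain)
        chain.append(name)
        chain_of[name] = chain

    merges = [name for name, inputs in graph.items() if len(inputs) > 1]
    return chains, merges

chains, merges = split_into_chains(graph)
print(len(chains), merges)  # 4 ['x11']
print(chains[1])            # ['x5', 'x6', 'x7']
```

Under this assumption the sketch finds four chains and a single merge point, which is consistent with the four sequential branches and one parallel branch reported by the framework.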

In [3]:
base_model = example_model()
base_model.summary()

parsed_model = structure_adaption.parse_model(base_model)
gp_tuples = parsed_model.grow_prun_tuples
sequential_branches = parsed_model.sequential_branches
parallel_branches = parsed_model.parallel_branches
print('number of grow prun tuples: ' + str(len(gp_tuples)))
print('number of sequential branches: ' + str(len(sequential_branches)))
print('number of parallel branches: ' + str(len(parallel_branches)))

def compile_fn():
    parsed_model(tf.keras.Input((32, 32, 3)))

parsed_model.summary()
#%tensorboard --logdir logs



Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 32, 32, 3)]  0           []                               
                                                                                                  
 x1 (Conv2D)                    (None, 32, 32, 6)    168         ['x0[0][0]']                     
                                                                                                  
 x2 (BatchNormalization)        (None, 32, 32, 6)    24          ['x1[0][0]']                     
                                                                                                  
 x3 (Conv2D)                    (None, 15, 15, 6)    330         ['x2[0][0]']                     
                                                                                              

With intermediate layers, growing and pruning neurons changes only marginally. The pruning statement is exactly the same; in the growing statement we additionally need to specify initializers for the intermediate layers. Note the doubly nested list under the `intermediate` key: the outer list has one entry per intermediate layer, because there could be several of them. We add two neurons to layer x1 and remove three neurons (indices 0, 2, and 5) from the first layer of the third grow prun tuple. After each adaption we need to retrieve the grow prun tuples, sequential branches, and parallel branches again, because the old ones refer to the old layers.


In [4]:
print("Name of the layer of the first gp tuple: " + gp_tuples[0].first_layer.name)
first_tuple = gp_tuples[0]

init_units_kernel = tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)
init_units_bias = np.array([2, 2])

inits_intermediate = [tf.keras.initializers.Ones(), tf.keras.initializers.Ones(), tf.keras.initializers.Ones(), tf.keras.initializers.Ones()]

init_inputs = [tf.keras.initializers.Ones(), tf.keras.initializers.Ones()]
init_dict = dict(first=[init_units_kernel, init_units_bias], intermediate=[inits_intermediate], last=init_inputs)
parsed_model.grow(first_tuple, 2, init_dict, None, compile_fn)
parsed_model.summary()

gp_tuples = parsed_model.grow_prun_tuples
sequential_branches = parsed_model.sequential_branches
parallel_branches = parsed_model.parallel_branches

print("Name of the layer of the third gp tuple: " + gp_tuples[2].first_layer.name)
third_tuple = gp_tuples[2]
parsed_model.prun(third_tuple, [0, 2, 5], None, compile_fn)
parsed_model.summary()

Name of the layer of the first gp tuple: x1
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 32, 32, 3)]  0           []                               
                                                                                                  
 x1 (AdaptionLayer)             (None, 32, 32, 8)    224         ['x0[0][0]']                     
                                                                                                  
 x2 (AdaptionLayer)             (None, 32, 32, 8)    32          ['x1[0][0]']                     
                                                                                                  
 x3 (AdaptionLayer)             (None, 15, 15, 6)    438         ['x2[0][0]']                     
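
The effect of growing x1 by two neurons is visible in the parameter counts of the two summaries. As a sanity check, the counts can be reproduced with the standard formulas for Conv2D and BatchNormalization layers. The helper names below are hypothetical and independent of the framework.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels

# (x1, x2, x3) before growing: x1 has 6 filters.
before = (conv2d_params(3, 3, 6), batchnorm_params(6), conv2d_params(3, 6, 6))
# (x1, x2, x3) after growing: x1 has 8 filters, so x2 and the inputs of x3 grow too.
after = (conv2d_params(3, 3, 8), batchnorm_params(8), conv2d_params(3, 8, 6))

print(before)  # (168, 24, 330)
print(after)   # (224, 32, 438)
```

The output dimension of x3 is unchanged; only its number of input channels grows, which is why its parameter count rises from 330 to 438.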
                                                  

In the basic structure adaption example we saw sequential growing and pruning of layers. Now we will grow and prun parallel layers. First we remove one of the existing branches in the parallel branch. Note in the summary that the Add node x11 has also been removed, since it had only one remaining input.
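
Why the Add node disappears can be illustrated on a plain graph of layer names. The following is a minimal sketch, not the framework's implementation: the graph dictionary and the function `prune_branch` are hypothetical, and the bypass rule is an assumption based on the behavior described above.

```python
# Layer name -> list of input layer names, mirroring example_model().
graph = {'x0': []}
for i in range(1, 19):
    graph[f'x{i}'] = [f'x{i - 1}']
graph['x8'] = ['x4']
graph['x11'] = ['x7', 'x10']

def prune_branch(graph, branch):
    """Remove the layers in `branch` and bypass merge nodes left with one input."""
    removed = set(branch)
    pruned = {name: [src for src in inputs if src not in removed]
              for name, inputs in graph.items() if name not in removed}
    for name, inputs in list(pruned.items()):
        # A former merge node with a single remaining input is redundant.
        if len(graph[name]) > 1 and len(inputs) == 1:
            del pruned[name]
            for other in list(pruned):
                pruned[other] = [inputs[0] if src == name else src
                                 for src in pruned[other]]
    return pruned

result = prune_branch(graph, ['x5', 'x6', 'x7'])
print('x11' in result)  # False
print(result['x12'])    # ['x10']
```

After the branch x5, x6, x7 is removed, x11 would add a single tensor to nothing, so the sketch connects x12 directly to x10.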

In [5]:
sequential_branches = parsed_model.sequential_branches
print("Name of the intermediate layers of the second sequential branch: " + str([intermediate_layer.name for intermediate_layer in sequential_branches[1].intermediate_layers]))
parsed_model.prun_branch(sequential_branches[1], None, compile_fn)
parsed_model.summary()

Name of the intermediate layers of the second sequential branch: ['x5', 'x6', 'x7']
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 x0 (InputLayer)             [(None, 32, 32, 3)]       0         
                                                                 
 x1 (AdaptionLayer)          (None, 32, 32, 8)         224       
                                                                 
 x2 (AdaptionLayer)          (None, 32, 32, 8)         32        
                                                                 
 x3 (AdaptionLayer)          (None, 15, 15, 6)         438       
                                                                 
 x4 (AdaptionLayer)          (None, 15, 15, 6)         24        
                                                                 
 x8 (AdaptionLayer)          (None, 15, 15, 6)         330       
                                            

Then we add a new parallel branch made of different layers. This time we insert it directly in front of the Flatten layer x16, starting from layer x12. Note that the framework introduces a new merge node named `new_add_node0`. The layers passed to InsertBranch were chosen so that their output shape matches that of the other input branch of the merge layer.
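
The rewiring can again be pictured on a graph of layer names. This is a sketch under the assumption that the new merge node combines the existing input of the end layer with the output of the inserted branch; the function `grow_branch` and the graph dictionary are hypothetical, and only the name `new_add_node0` is taken from the framework's output.

```python
def grow_branch(graph, start, branch, end, merge_name='new_add_node0'):
    """Add `branch` from `start` and merge it with the current input of `end`."""
    grown = {name: list(inputs) for name, inputs in graph.items()}
    previous = start
    for name in branch:
        grown[name] = [previous]
        previous = name
    # The merge node combines the existing input of `end` with the new branch.
    grown[merge_name] = grown[end] + [previous]
    grown[end] = [merge_name]
    return grown

# Tail of the model after the branch x5, x6, x7 has been removed.
graph = {'x12': ['x10'], 'x13': ['x12'], 'x14': ['x13'],
         'x15': ['x14'], 'x16': ['x15']}

grown = grow_branch(graph, 'x12', ['x19', 'x20', 'x21'], 'x16')
print(grown['new_add_node0'])  # ['x15', 'x21']
print(grown['x16'])            # ['new_add_node0']
```

For the addition to be valid, x21 has to produce the same shape as x15. In the cell that follows, x21 uses the same filter count, kernel size, stride, and padding as x15, and the Dense layer x19 leaves the spatial dimensions untouched.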

In [6]:
layers = parsed_model.internal_model.layers
print("Name of the layer 8: " + layers[8].name)
start_layer = layers[8]
print("Name of the layer 12: " + layers[12].name)
end_layer = layers[12]

l19 = tf.keras.layers.Dense(units=6, activation='relu', name='x19')
x19 = l19(start_layer.output)
l20 = tf.keras.layers.BatchNormalization(name='x20')
x20 = l20(x19)
l21 = tf.keras.layers.Conv2D(6, 3, strides=2, padding='valid', activation='relu', name='x21')
x21 = l21(x20)

insert_branch = structure_adaption.InsertBranch([l19, l20, l21], end_layer)

parsed_model.grow_branch(insert_branch, None, compile_fn)
parsed_model.summary()

Name of the layer 8: x12
Name of the layer 12: x16
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 x0 (InputLayer)                [(None, 32, 32, 3)]  0           []                               
                                                                                                  
 x1 (AdaptionLayer)             (None, 32, 32, 8)    224         ['x0[0][0]']                     
                                                                                                  
 x2 (AdaptionLayer)             (None, 32, 32, 8)    32          ['x1[0][0]']                     
                                                                                                  
 x3 (AdaptionLayer)             (None, 15, 15, 6)    438         ['x2[0][0]']                     
                                           

We have removed and added neurons in conjunction with intermediate layers, and we have removed and inserted branches non-sequentially. For more details on limitations and caveats, consult the documentation and the other examples.