StructureAdaptionFramework: a framework for handling neuron-level and layer-level structure adaptions in
neural networks.

Copyright (C) 2023  Roman Frels, roman.frels@gmail.com

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

# Basic structure adaption example

This example demonstrates the basic functionality of the structure adaption framework:
- adding and removing neurons
- adding and removing layers sequentially

To understand the caveats that arise in combination with training, see the training-focused examples.

In [1]:
import tensorflow as tf
import StructureAdaption.structure_adaption as structure_adaption
import numpy as np



First, we define the model to work on: a simple dense network.

In [2]:
def example_model():
    inputs = tf.keras.Input(shape=[10], dtype=tf.dtypes.float32, name='x0')
    x1 = tf.keras.layers.Dense(units=6, activation='relu', name='x1')(inputs)
    x2 = tf.keras.layers.Dense(units=2, activation='relu', name='x2')(x1)
    x3 = tf.keras.layers.Dense(units=4, activation='relu', name='x3')(x2)
    outputs = tf.keras.layers.Softmax(name='x4')(x3)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

In order to use the model with the framework, we need to parse it. Parsing returns a wrapped model that we use from now on; the internally wrapped model is still accessible via `parsed_model.internal_model`. Furthermore, we retrieve the tuples of layers that can be grown or pruned. A tuple consists of a first layer where neurons are added or removed, possibly intermediate layers that preserve the output dimension, and a last layer whose inputs need to be adjusted accordingly.
A compile function must be defined to bring the model into a valid state after an adaption. Note in the summary that the dense layers get the type of a dynamically created class (mixin) that is used to expand their functionality.
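
To make the notion of a grow/prune tuple more concrete, here is a minimal sketch in plain Python. It is not the framework's implementation: it assumes a purely sequential chain, ignores intermediate layers, and the helper `find_tuples` as well as the `supported` flags are hypothetical. The flags only mirror which layers the parser reports as supported for this example model.

```python
# Hypothetical sketch: pair up directly connected supported layers in a chain.
# Each entry is (layer name, supported by the framework?).
layers = [
    ("x0", False),  # InputLayer
    ("x1", True),   # Dense
    ("x2", True),   # Dense
    ("x3", True),   # Dense
    ("x4", False),  # Softmax
]


def find_tuples(chain):
    """Return (first, last) pairs where both neighbouring layers are supported."""
    tuples = []
    for (first, first_ok), (last, last_ok) in zip(chain, chain[1:]):
        if first_ok and last_ok:
            tuples.append((first, last))
    return tuples


tuples = find_tuples(layers)
print(tuples)  # [('x1', 'x2'), ('x2', 'x3')]
```

Under these assumptions the sketch finds two tuples. The pair `x3`/`x4` is skipped because the softmax layer is not supported.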

In [3]:
base_model = example_model()
parsed_model = structure_adaption.parse_model(base_model)
gp_tuples = parsed_model.grow_prun_tuples
print('number of grow prun tuples: ' + str(len(gp_tuples)))

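def compile_fn():
    # Call the model on a symbolic input to bring it into a valid state after an adaption.
    # Note the trailing comma: (10,) is a tuple, whereas (10) is just the integer 10.
    parsed_model(tf.keras.Input((10,)))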

parsed_model.summary()

DEBUG structure_adaption.py:722: Registered not supported layer: x0
DEBUG structure_adaption.py:719: Registered supported layer: x1
DEBUG structure_adaption.py:719: Registered supported layer: x2
DEBUG structure_adaption.py:719: Registered supported layer: x3
DEBUG structure_adaption.py:722: Registered not supported layer: x4
DEBUG structure_adaption.py:449: current node: x0
DEBUG structure_adaption.py:449: current node: x1
DEBUG structure_adaption.py:449: current node: x2
DEBUG structure_adaption.py:449: current node: x3
DEBUG structure_adaption.py:449: current node: x4
number of grow prun tuples: 2
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 x0 (InputLayer)             [(None, 10)]              0         
                                                                 
 x1 (AdaptionLayer)          (None, 6)                 66        
                                               



As you can see from the summary, we have three adaption layers with 6, 2 and 4 neurons respectively. Let's say we want them all to have 4 neurons. Conveniently, we have two tuples that can be used to change the number of neurons in the first two dense layers. Tuples can only be formed with layers listed in `SupportedLayersConfig.py`. That is why there is no tuple for the last dense layer: it is followed by the softmax layer `x4`, which is registered as not supported.
Now let's start by pruning two neurons from the first dense layer. Which neurons to prune would normally be determined by some criterion; here we simply choose neurons 0 and 2. Executing the pruning then takes one line of code.
Note that we provide `None` for the optimizer since we don't have an optimizer yet. This will come in a subsequent example, where we also train the model. All weights of the model are preserved, leaving out only the weights of the two pruned neurons. Please note that we need to re-fetch the grow/prune tuples, since the network is copied and re-initialized internally. That means the old tuples refer to outdated layers.
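
As an illustration of what pruning neurons means at the weight level, the following NumPy sketch removes neurons 0 and 2 from a dense layer. It is a sketch under stated assumptions, not the framework's code: the random weights and all variable names are made up, and only the shapes match the layers `x1` (10 inputs, 6 units) and `x2` (6 inputs, 2 units). It relies on the Keras convention that a dense kernel has shape `(inputs, units)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up weights of two consecutive dense layers, shaped like x1 and x2.
kernel_first = rng.normal(size=(10, 6))
bias_first = rng.normal(size=(6,))
kernel_last = rng.normal(size=(6, 2))

pruned = [0, 2]  # indices of the neurons to remove from the first layer

# Removing a neuron drops its kernel column and its bias entry in the first layer ...
new_kernel_first = np.delete(kernel_first, pruned, axis=1)
new_bias_first = np.delete(bias_first, pruned)
# ... and the matching input row in the kernel of the last layer.
new_kernel_last = np.delete(kernel_last, pruned, axis=0)

print(new_kernel_first.shape, new_bias_first.shape, new_kernel_last.shape)
# (10, 4) (4,) (4, 2)
```

The weights of the remaining neurons are untouched, which corresponds to the statement that all weights except those of the pruned neurons are preserved.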

In [4]:
first_tuple = gp_tuples[0]
parsed_model.prun(first_tuple, [0, 2], None, compile_fn)

parsed_model.summary()

gp_tuples = parsed_model.grow_prun_tuples
second_tuple = gp_tuples[1]

INFO structure_adaption.py:253: Weight name x2/bias:0 (filtered to: bias:0) not recognised in x2.
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x0
DEBUG structure_adaption.py:1171: outbound layers: ['x1']
DEBUG structure_adaption.py:1191: clone and connect: current layer x0 and outbound layer x1
DEBUG structure_adaption.py:1102: normal edge detected from x0 to x1
DEBUG structure_adaption.py:1104: connecting: x0 to x1
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x1
DEBUG structure_adaption.py:1171: outbound layers: ['x2']
DEBUG structure_adaption.py:1191: clone and connect: current layer x1 and outbound layer x2
DEBUG structure_adaption.py:1102: normal edge detected from x1 to x2
DEBUG structure_adaption.py:1104: connecting: x1 to x2
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x2
DEBUG structure_adaption.py:1171: outbound layer

Now we need to add two neurons to the second dense layer. We need to specify how to initialize the new neurons. This can be done in four ways:
- We provide a Keras initializer
- We provide a NumPy array with the matching shape
- We provide a TensorFlow tensor with the matching shape
- We provide `None` and need to verify that a default initializer is defined for the particular weight of this layer type in `SupportedLayersConfig.py`

We don't provide any initialization for intermediate layers in this case, because there are none.
Observe in the summary that we now have 4 neurons in each adaption layer.
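
The shapes that count as "matching" can be checked with a small NumPy sketch. It is only an illustration under assumptions: the existing weights are placeholders, all variable names are hypothetical, and appending the new neurons at the end is an assumption about the ordering, not a documented property of the framework. The shapes correspond to `x2` (4 inputs, 2 units after the pruning step) and `x3` (2 inputs, 4 units).

```python
import numpy as np

# Placeholder weights shaped like x2 and x3 after the pruning step.
kernel_first = np.ones((4, 2))
bias_first = np.zeros(2)
kernel_last = np.ones((2, 4))

# Values for the two new neurons of the first layer.
new_units_kernel = np.random.default_rng(0).uniform(-0.05, 0.05, size=(4, 2))
new_units_bias = np.array([2.0, 2.0])
# One new input row per new neuron for the last layer.
new_inputs = np.array([[2, 2, 0.5, 1.5], [1, 2, 3, 0.8]])

# Growing adds kernel columns and bias entries to the first layer ...
grown_kernel_first = np.concatenate([kernel_first, new_units_kernel], axis=1)
grown_bias_first = np.concatenate([bias_first, new_units_bias])
# ... and input rows to the kernel of the last layer.
grown_kernel_last = np.concatenate([kernel_last, new_inputs], axis=0)

print(grown_kernel_first.shape, grown_bias_first.shape, grown_kernel_last.shape)
# (4, 4) (4,) (4, 4)
```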

In [5]:
init_units_kernel = tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)
init_units_bias = np.array([2, 2])

init_intermediate = None

init_inputs = tf.constant([[2, 2, 0.5, 1.5], [1, 2, 3, 0.8]])

init_dict = dict(first=[init_units_kernel, init_units_bias], intermediate=[init_intermediate], last=[init_inputs, init_inputs])
parsed_model.grow(second_tuple, 2, init_dict, None, compile_fn)
parsed_model.summary()

INFO structure_adaption.py:208: Weight name x3/bias:0 (filtered to: bias:0) not recognised in x3.
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x0
DEBUG structure_adaption.py:1171: outbound layers: ['x1']
DEBUG structure_adaption.py:1191: clone and connect: current layer x0 and outbound layer x1
DEBUG structure_adaption.py:1102: normal edge detected from x0 to x1
DEBUG structure_adaption.py:1104: connecting: x0 to x1
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x1
DEBUG structure_adaption.py:1171: outbound layers: ['x2']
DEBUG structure_adaption.py:1191: clone and connect: current layer x1 and outbound layer x2
DEBUG structure_adaption.py:1102: normal edge detected from x1 to x2
DEBUG structure_adaption.py:1104: connecting: x1 to x2
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x2
DEBUG structure_adaption.py:1171: outbound layer

Since this model is not deep enough for us, we add a few layers sequentially. To do so, we define an insert branch. We begin by defining the layers we want to add and attach them to the layer after which we want to insert them. Then we provide the layer the new layers should connect to, without modifying it. We specify `insert_sequential=True` and grow the network. Again, all weights are preserved. If weights do not match while inserting, e.g. because the input shape of a layer changes, we can specify `skip_mismatch=True` to ignore these mismatches and initialize the layer with its own initializer. Again we provide `None` for the optimizer.
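
Conceptually, a sequential insertion replaces one edge of the layer graph by a chain of edges. The sketch below shows this on a plain list of edges. It is a hypothetical illustration, not the framework's algorithm; the helper `insert_sequentially` and the edge-list representation are assumptions made for this example.

```python
# Hypothetical sketch: the example model as an ordered list of edges.
edges = [("x0", "x1"), ("x1", "x2"), ("x2", "x3"), ("x3", "x4")]


def insert_sequentially(edges, start, new_layers, end):
    """Replace the edge start -> end by the chain start -> new_layers ... -> end."""
    chain = [start, *new_layers, end]
    new_edges = list(zip(chain, chain[1:]))
    result = []
    for edge in edges:
        if edge == (start, end):
            result.extend(new_edges)  # the old edge is removed, the chain is added
        else:
            result.append(edge)
    return result


edges = insert_sequentially(edges, "x1", ["x5", "x6"], "x2")
print(edges)
# [('x0', 'x1'), ('x1', 'x5'), ('x5', 'x6'), ('x6', 'x2'), ('x2', 'x3'), ('x3', 'x4')]
```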

In [6]:
start_layer = parsed_model.internal_model.layers[1]
new_layer1 = tf.keras.layers.Dense(units=4, activation='relu', name='x5')
output_nl1 = new_layer1(start_layer.output)
new_layer2 = tf.keras.layers.Dense(units=4, activation='relu', name='x6')
output_nl2 = new_layer2(output_nl1)
end_layer = parsed_model.internal_model.layers[2]

insert_branch = structure_adaption.InsertBranch([start_layer, new_layer1, new_layer2], end_layer)

parsed_model.grow_branch(insert_branch, None, compile_fn, insert_sequential=True)
parsed_model.summary()

DEBUG structure_adaption.py:719: Registered supported layer: x5
DEBUG structure_adaption.py:719: Registered supported layer: x6
DEBUG structure_adaption.py:1235: remove edge found: from x1 to x2
DEBUG structure_adaption.py:1240: add edge found: from: x6 to x2
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x0
DEBUG structure_adaption.py:1171: outbound layers: ['x1']
DEBUG structure_adaption.py:1191: clone and connect: current layer x0 and outbound layer x1
DEBUG structure_adaption.py:1102: normal edge detected from x0 to x1
DEBUG structure_adaption.py:1104: connecting: x0 to x1
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x1
DEBUG structure_adaption.py:1171: outbound layers: ['x2', 'x5']
DEBUG structure_adaption.py:1191: clone and connect: current layer x1 and outbound layer x2
DEBUG structure_adaption.py:1066: new add edge layer detected: x6 and remove edge detected from x1 to x2
D

Now we change our mind, because the model has to run on resource-constrained hardware. The layers x2 and x3 need to be removed. Because we want to leave a connection after the pruning, so that x6 is connected directly to x4, we specify `leave_residual=True`.
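
The effect on the layer graph can again be sketched with a plain list of edges. This is a hypothetical illustration and not the framework's implementation: the helper `prune_branch` is made up, and its `leave_residual` flag only imitates the idea of reconnecting the start layer to the end layer.

```python
# Hypothetical sketch: the model after the sequential insertion, as a list of edges.
edges = [("x0", "x1"), ("x1", "x5"), ("x5", "x6"),
         ("x6", "x2"), ("x2", "x3"), ("x3", "x4")]


def prune_branch(edges, start, removed, end, leave_residual=True):
    """Drop every edge touching a removed layer; optionally reconnect start -> end."""
    removed = set(removed)
    kept = [(a, b) for a, b in edges if a not in removed and b not in removed]
    if leave_residual:
        kept.append((start, end))
    return kept


pruned_edges = prune_branch(edges, "x6", ["x2", "x3"], "x4")
print(pruned_edges)
# [('x0', 'x1'), ('x1', 'x5'), ('x5', 'x6'), ('x6', 'x4')]
```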

In [7]:
sequential_branches = parsed_model.sequential_branches
sequential_branch_layers = sequential_branches[0].layers
start_layer = sequential_branch_layers[3]  # layer x6
rm_l1 = sequential_branch_layers[4]  # layer x2, to be removed
rm_l2 = sequential_branch_layers[5]  # layer x3, to be removed
end_layer = sequential_branch_layers[6]  # layer x4
prun_branch = structure_adaption.Branch([start_layer, rm_l1, rm_l2, end_layer])
parsed_model.prun_branch(prun_branch, None, compile_fn, leave_residual=True)
parsed_model.summary()

DEBUG structure_adaption.py:1235: remove edge found: from x3 to x4
DEBUG structure_adaption.py:1235: remove edge found: from x6 to x2
DEBUG structure_adaption.py:1235: remove edge found: from x2 to x3
DEBUG structure_adaption.py:1240: add edge found: from: x6 to x4
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x0
DEBUG structure_adaption.py:1171: outbound layers: ['x1']
DEBUG structure_adaption.py:1191: clone and connect: current layer x0 and outbound layer x1
DEBUG structure_adaption.py:1102: normal edge detected from x0 to x1
DEBUG structure_adaption.py:1104: connecting: x0 to x1
DEBUG structure_adaption.py:1168: outer_nodes: []
DEBUG structure_adaption.py:1169: current_layer: x1
DEBUG structure_adaption.py:1171: outbound layers: ['x5']
DEBUG structure_adaption.py:1191: clone and connect: current layer x1 and outbound layer x5
DEBUG structure_adaption.py:1102: normal edge detected from x1 to x5
DEBUG structure_adaption.py:1104: con

As we can see, the layers have been successfully removed.
We have added and removed neurons in individual layers, and we have inserted and removed layers sequentially. For more details on limitations and caveats, consult the documentation and the other examples.