# ACETONE tutorial #4

**Implementing a new version of a layer**

Efficiency is a key concern in the embedded sector, where code is usually tailored to a specific target. We therefore need to be able to create new implementations of each layer.

In this notebook, we'll explain how to create specific versions of a layer with ACETONE and how to use them.
We will then use the debug mode introduced in [tutorial #3](./tutorial3_using_debug_mode.ipynb) to check and correct our implementation.

* When running this notebook on Colab, we need to install ACETONE
* If you run this notebook locally, run it in the environment in which you installed ACETONE

In [1]:
# Cleaning the working environment
from pathlib import Path
from os import remove, listdir

# Path to the example files
PATH_DIR = Path("../tests/models/squeezenet1")

# Path to generated directories
output_path = Path("demo_squeezenet_variants")

def clean_directory(directory):
    if directory.exists():
        for file in listdir(directory):
            if not (directory / file).is_dir():
                remove(directory / file)

clean_directory(output_path)

## Imports

In this notebook, we'll use the `SqueezeNet 1.0` model (with `opset-version==12`) from [*ONNX's model zoo*](https://github.com/onnx/models?tab=readme-ov-file) as an example. The beginning of the model is illustrated below.

![squeezenet](./data/squeezenet1.png)


In [2]:
# External imports
import numpy as np
import numpy.random as rd
import pystache

# ACETONE's imports
from acetone_nnet import CodeGenerator, conv2d_factory
from acetone_nnet import debug
from acetone_nnet.generator import Conv2D



In [3]:
model_path = PATH_DIR / "squeezenet1.onnx"
test_dataset = np.float32(rd.random((1,3,224,224)))
function_name = "demo_squeezenet"
nb_tests = 1

## Adding a new implementation

Let's assume that, after studies and tests, we have found a new way to perform a convolution: setting each element of the output to `0.42`.

Since this method is far more efficient and simpler than any other, we want to use it with ACETONE. Sadly, the framework doesn't provide an implementation for it, so we have to add it ourselves.
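Before touching ACETONE, it can help to state the "algorithm" precisely in plain NumPy. The sketch below is not ACETONE code: `demo_conv2d` and `conv2d_output_shape` are hypothetical helpers that compute the spatial output size with the standard convolution formula (symmetric padding) and fill the result with `0.42`. The layer parameters in the usage line are arbitrary examples.

```python
import numpy as np


def conv2d_output_shape(h, w, kernel_h, kernel_w, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution (standard formula, symmetric padding)."""
    out_h = (h + 2 * padding - dilation * (kernel_h - 1) - 1) // stride + 1
    out_w = (w + 2 * padding - dilation * (kernel_w - 1) - 1) // stride + 1
    return out_h, out_w


def demo_conv2d(x, nb_filters, kernel_h, kernel_w, stride=1, padding=0, dilation=1):
    """'Convolution' that ignores its input and writes 0.42 in every output cell."""
    batch, _, h, w = x.shape
    out_h, out_w = conv2d_output_shape(h, w, kernel_h, kernel_w, stride, padding, dilation)
    return np.full((batch, nb_filters, out_h, out_w), 0.42, dtype=x.dtype)


# Hypothetical layer: 96 filters, 7x7 kernel, stride 2, no padding
x = np.random.random((1, 3, 224, 224)).astype(np.float32)
y = demo_conv2d(x, nb_filters=96, kernel_h=7, kernel_w=7, stride=2)
print(y.shape)  # (1, 96, 109, 109)
```

Only the output shape depends on the layer's hyperparameters; the values never depend on the input, which is exactly what makes this implementation easy to spot in debug mode later on.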

In [4]:
# Printing all the algorithms implemented in ACETONE for a convolution
print("Base implementations : ")
print(conv2d_factory.list_implementations)

Base implementations : 
['6loops', 'indirect_gemm_nn', 'indirect_gemm_tn', 'indirect_gemm_nt', 'indirect_gemm_tt', 'std_gemm_nn', 'std_gemm_tn', 'std_gemm_nt', 'std_gemm_tt']


To implement it, we have to create a new class inheriting from the `Conv2D` class (or one of its child classes).

* The first method we must implement is `generate_inference_code_layer`. It builds the C code corresponding to the layer and returns it as a string.
* The second method, `forward_path_layer`, is optional. It tells the framework how to compute the output of the layer using Python. If it is not overridden, the method defined in the parent class is used.
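The fallback rule for `forward_path_layer` is ordinary Python inheritance. The sketch below illustrates it with hypothetical stand-in classes (`BaseConv`, `DemoConv`), not ACETONE's real `Conv2D`: the child provides its own C code but inherits the parent's Python computation. The base method raising `NotImplementedError` is an assumption made to mirror "must implement".

```python
import numpy as np


class BaseConv:
    """Hypothetical stand-in for ACETONE's Conv2D (not the real class)."""

    def __init__(self, out_shape):
        self.out_shape = out_shape

    def generate_inference_code_layer(self) -> str:
        # Every concrete implementation must provide its own C code
        raise NotImplementedError

    def forward_path_layer(self, input_array) -> np.ndarray:
        # Placeholder for the parent's reference Python computation
        return np.zeros(self.out_shape)


class DemoConv(BaseConv):
    def generate_inference_code_layer(self) -> str:
        return "for (k = 0; k < N; ++k) output[k] = 0.42;"
    # forward_path_layer is not overridden: BaseConv's version is used


layer = DemoConv((1, 2, 3, 3))
print(layer.generate_inference_code_layer())
print(layer.forward_path_layer(None).shape)  # (1, 2, 3, 3), from BaseConv
```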

In [6]:
# Creating a new implementation
class Conv2D_Demo(Conv2D):

    def __init__(self, **kwargs: int) -> None:
        """Build a Convolution layer with a demo implementation."""
        super().__init__(**kwargs)

    def generate_inference_code_layer(self) -> str:
        """Generate computation code for layer."""
        # Names of the input buffers (unused: this implementation ignores its inputs)
        input_str = [prev_layer.output_str for prev_layer in self.previous_layer]
        output_str = f"output_{self.path}"

        # Mustache template: fill the whole output buffer with 0.42
        code_str = "    // {{name}}_{{idx}}\n    for (k = 0; k < {{size}}; ++k) {{output_str}}[k] = 0.42;"
        return pystache.render(code_str, {"name": self.name, "idx": self.idx, "size": self.size, "output_str": output_str})

    def forward_path_layer(self, input_array) -> np.ndarray:
        # Python reference: the same constant output, shaped (1, C, H, W)
        return 0.42*np.ones((1, self.output_channels, self.output_height, self.output_width))
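To see what C code the template above produces, the sketch below replaces `pystache.render` with a tiny hypothetical `render` helper that substitutes `{{key}}` tags only (no HTML escaping, no sections, unlike real Mustache). The values for `name`, `idx`, `size` and `output_str` are made up for illustration; the generated loop also assumes that `k` is declared elsewhere in the inference function.

```python
import re


def render(template: str, context: dict) -> str:
    """Minimal stand-in for pystache.render: replace each {{key}} with str(context[key])."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(context[m.group(1)]), template)


code_str = "    // {{name}}_{{idx}}\n    for (k = 0; k < {{size}}; ++k) {{output_str}}[k] = 0.42;"
# Hypothetical layer attributes
c_code = render(code_str, {"name": "Conv2D", "idx": 29, "size": 1000, "output_str": "output_1"})
print(c_code)
```

This prints a comment line identifying the layer followed by a single loop that writes `0.42` into the `size` elements of the output buffer.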

When parsing the neural network, each time ACETONE encounters a layer that has several versions, it inserts a temporary placeholder layer. Once the model has been fully extracted, each placeholder is replaced by a final layer with the requested implementation: the values it stores (weights, size, biases, ...) are extracted and used to initialize the new layer.

In [7]:
# Creating a Conv2D_Demo layer using the attributes of old_layer
def conv2d_demo_implementation(
        old_layer: Conv2D,
        conv_algo: str,
) -> Conv2D_Demo:
    return Conv2D_Demo(
        idx=old_layer.idx,
        conv_algorithm=conv_algo,
        size=old_layer.size,
        padding=old_layer.padding,
        strides=old_layer.strides,
        kernel_h=old_layer.kernel_h,
        kernel_w=old_layer.kernel_w,
        dilation_rate=old_layer.dilation_rate,
        nb_filters=old_layer.nb_filters,
        input_shape=[1, old_layer.input_channels, old_layer.input_height, old_layer.input_width],
        output_shape=[1, old_layer.output_channels, old_layer.output_height, old_layer.output_width],
        weights=old_layer.weights,
        biases=old_layer.biases,
        activation_function=old_layer.activation_function,
    )
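The placeholder mechanism can be pictured with the simplified sketch below. All classes (`ConvPlaceholder`, `SixLoopsConv`, `DemoConv`), the `FACTORIES` dictionary and `replace_placeholders` are hypothetical; they only mimic the idea that a builder such as `conv2d_demo_implementation` copies the stored values of the temporary layer into a new layer of the chosen implementation.

```python
from dataclasses import dataclass


@dataclass
class ConvPlaceholder:  # hypothetical temporary layer created while parsing
    idx: int
    weights: list


@dataclass
class SixLoopsConv:  # hypothetical final implementations
    idx: int
    weights: list


@dataclass
class DemoConv:
    idx: int
    weights: list


# Each builder rebuilds a final layer from the values stored in the placeholder
FACTORIES = {
    "6loops": lambda old: SixLoopsConv(idx=old.idx, weights=old.weights),
    "demo": lambda old: DemoConv(idx=old.idx, weights=old.weights),
}


def replace_placeholders(layers, algorithm):
    """Swap every placeholder for a final layer built by the chosen builder."""
    return [FACTORIES[algorithm](layer) if isinstance(layer, ConvPlaceholder) else layer
            for layer in layers]


parsed = ["Input", ConvPlaceholder(idx=1, weights=[0.5]), "ReLU"]
final = replace_placeholders(parsed, "demo")
print(final[1])  # DemoConv(idx=1, weights=[0.5])
```

Other layers pass through untouched, and the new layer reuses the placeholder's stored data rather than re-reading the model.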

Finally, to add the newly created implementation to ACETONE, we need to register it within the layer's version manager.

In [8]:
conv2d_factory.register_implementation("demo", conv2d_demo_implementation)

print("Updated implementations : ")
print(conv2d_factory.list_implementations)

Updated implementations : 
['6loops', 'indirect_gemm_nn', 'indirect_gemm_tn', 'indirect_gemm_nt', 'indirect_gemm_tt', 'std_gemm_nn', 'std_gemm_tn', 'std_gemm_nt', 'std_gemm_tt', 'demo']
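Conceptually, the version manager behaves like a registry mapping names to builder functions. The sketch below is a hypothetical `ImplementationFactory`, not ACETONE's `conv2d_factory`; it reuses the method and property names seen above, and assumes (like `conv2d_demo_implementation`) that each builder receives the old layer and the algorithm name. Rejecting duplicate names is a choice made for this sketch only.

```python
from typing import Callable, Dict, List


class ImplementationFactory:
    """Hypothetical registry of layer implementations (not ACETONE's class)."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable] = {}

    def register_implementation(self, name: str, builder: Callable) -> None:
        if name in self._builders:
            raise KeyError(f"implementation {name!r} already registered")
        self._builders[name] = builder

    @property
    def list_implementations(self) -> List[str]:
        # Names in registration order
        return list(self._builders)

    def __call__(self, name: str, old_layer):
        # Build the final layer with the builder registered under `name`
        return self._builders[name](old_layer, name)


factory = ImplementationFactory()
factory.register_implementation("6loops", lambda old, algo: ("6loops layer", old))
factory.register_implementation("demo", lambda old, algo: ("demo layer", old, algo))
print(factory.list_implementations)  # ['6loops', 'demo']
print(factory("demo", "conv_29"))    # ('demo layer', 'conv_29', 'demo')
```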


Now that the new version appears in the list of implementations, we can use it to generate code.

In [9]:
# Create an ACETONE CodeGenerator from the model
demo_generator = CodeGenerator(file=model_path,
                               function_name=function_name,
                               external_input=True,
                               versions={"Conv2D": "demo"},
                               nb_tests=nb_tests,
                               verbose=False)

demo_generator.generate_c_files(output_path)

Finished model initialization.


The generated code now uses our optimized implementation and is ready to be deployed on any target!

## Using the debug mode

We will now test our implementation to ensure its correctness.

We first create our reference against which the generated code will be compared.

In [10]:
to_save = True
saving_path = output_path / "debug_squeezenet.onnx"
otpimize_inputs = True

model, _, outputs_onnx = debug.debug_onnx(target_model=str(model_path),
                                          dataset=test_dataset,
                                          otpimize_inputs=otpimize_inputs,
                                          to_save=to_save,
                                          path=saving_path)


We then generate new code, this time in debug mode. The new implementation is applied only to the convolution with index `29`, to isolate it and locate any problem it may cause.
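The `versions` argument was keyed by a layer type name (`"Conv2D"`) for the first generator and is keyed by a layer index (`29`) here. The sketch below shows one way such a mapping could be resolved for each convolution. The function `resolve_version`, the default `"6loops"`, and the rule that an index entry takes precedence over a type entry are assumptions for illustration, not ACETONE's documented behaviour.

```python
def resolve_version(layer_idx, layer_type, versions, default):
    """Pick the implementation name for one layer.

    Assumption: an entry keyed by the layer index wins over one keyed
    by the layer type name; otherwise the default is used.
    """
    if layer_idx in versions:
        return versions[layer_idx]
    if layer_type in versions:
        return versions[layer_type]
    return default


# Hypothetical convolutions, as (index, type) pairs
convs = [(1, "Conv2D"), (3, "Conv2D"), (29, "Conv2D")]
print([resolve_version(i, t, {29: "demo"}, "6loops") for i, t in convs])
# ['6loops', '6loops', 'demo']
print([resolve_version(i, t, {"Conv2D": "demo"}, "6loops") for i, t in convs])
# ['demo', 'demo', 'demo']
```

Keying by index restricts the change to a single layer, which is what makes the debug comparison point at one specific place in the network.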

In [11]:
# Cleaning the directory to generate new code
clean_directory(output_path)
# Generating the C code
debug_generator = CodeGenerator(file=model_path,
                                test_dataset=test_dataset,
                                function_name=function_name,
                                nb_tests=nb_tests,
                                debug_mode="onnx",
                                versions={29:"demo"},
                                verbose=False)

debug_generator.generate_c_files(output_path)

Finished model initialization.


In [12]:
# Running the inference
! make -C demo_squeezenet_variants all
! ./demo_squeezenet_variants/demo_squeezenet ./demo_squeezenet_variants/output_c.txt

make: Entering directory '/tmp_user/ldtim610h/yaitaiss/acetone/tutorials/demo_squeezenet_variants'
gcc  -g -w -lm   -c -o inference.o inference.c
gcc  -g -w -lm   -c -o global_vars.o global_vars.c
gcc  -g -w -lm   -c -o main.o main.c
gcc  -g -w -lm   -c -o test_dataset.o test_dataset.c
gcc   -o demo_squeezenet inference.o global_vars.o main.o test_dataset.o  inference.h test_dataset.h   -g -w -lm
make: Leaving directory '/tmp_user/ldtim610h/yaitaiss/acetone/tutorials/demo_squeezenet_variants'


In [13]:
debug_file_path = output_path / "debug_file.txt"
outputs_c, targets_c = debug.extract_outputs_c(path_to_output=debug_file_path,
                                               data_type=debug_generator.data_type,
                                               nb_targets=len(debug_generator.debug_target))

In [14]:
same = debug.compare_result(acetone_result=outputs_c,
                            reference_result=outputs_onnx,
                            targets=targets_c,
                            verbose=True)

--------------------------------------------
Comparing Conv2D 1
--------------------------------------------
--------------------------------------------
Comparing MaxPooling2D 2
--------------------------------------------
--------------------------------------------
Comparing Conv2D 3
--------------------------------------------
--------------------------------------------
Comparing Conv2D 4
--------------------------------------------
--------------------------------------------
Comparing Conv2D 5
--------------------------------------------
--------------------------------------------
Comparing Concatenate 6
--------------------------------------------
--------------------------------------------
Comparing Conv2D 7
--------------------------------------------
--------------------------------------------
Comparing Conv2D 8
--------------------------------------------
--------------------------------------------
Comparing Conv2D 9
--------------------------------------------
--------

We can see that, starting from layer Conv2D_29, every layer reports an error, which precisely locates the faulty implementation.
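The reasoning behind this diagnosis is "the first layer whose output diverges from the reference is the culprit; later layers only inherit the error". The sketch below expresses that idea with `np.allclose`. The helper `first_mismatch`, its tolerances, and the assumption that outputs are given as arrays in network order are illustrative; it does not reproduce `debug.compare_result`, and the data is synthetic.

```python
import numpy as np


def first_mismatch(acetone_outputs, reference_outputs, targets, rtol=1e-5, atol=1e-6):
    """Return the name of the first layer whose outputs differ, or None."""
    for out_c, out_ref, name in zip(acetone_outputs, reference_outputs, targets):
        if not np.allclose(out_c, out_ref, rtol=rtol, atol=atol):
            return name
    return None


# Synthetic example: the second layer is wrong and the error propagates
ref = [np.ones(4), np.full(4, 2.0), np.full(4, 3.0)]
acet = [np.ones(4), np.full(4, 0.42), np.full(4, 0.42)]
names = ["layer_a", "layer_b", "layer_c"]
print(first_mismatch(acet, ref, names))  # layer_b
```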