In [1]:
# TODO: tutorial on using / implementing a variant in ACETONE

# ACETONE tutorial #2

**Implementing and using other versions of a layer**

Efficiency is a key concern in the embedded sector, where code is usually adapted specifically to its target. We therefore need to be able to choose the implementation of each layer.

In this notebook, we'll explain how to create and use specific versions of a layer in ACETONE.

* When running this notebook on Colab, you first need to install ACETONE.
* If you run this notebook locally, use the environment in which ACETONE is installed.

In [2]:
# TODO: installation commands for Colab

In [3]:
# Cleaning the working environment: delete the files left by a previous run
from pathlib import Path

files_directories = [Path("demo_lenet_indirect_gemm"), Path("demo_lenet_std_gemm"), Path("demo_lenet_optimized")]

for directory in files_directories:
    if directory.exists():
        for file in directory.iterdir():
            if file.is_file():  # skip subdirectories, which cannot be removed like files
                file.unlink()

## Imports

In this notebook, we use as an example a simple LeNet-5 model exported in Keras' HDF5 (`.h5`) format. The test dataset is randomly generated for testing purposes.

![lenet5](./data/lenet5_trained.png)

In [4]:
import numpy as np

from acetone_nnet import CodeGenerator, cli_compare, list_all_implementations, conv2d_factory
from acetone_nnet.generator import Conv2D

In [5]:
model_path = "../tests/models/lenet5/lenet5_trained/lenet5_trained.h5"
test_dataset = "../tests/models/lenet5/lenet5_trained/test_input_lenet5.txt"
function_name = "demo_lenet"
nb_tests = 1

## Using ACETONE's native implementations

For some layers, the framework already provides several versions from which we can choose before generating the code.
In this notebook, we will focus on the convolution layer.

In [6]:
implemented = list_all_implementations()
for layer_name in implemented:
    print(layer_name,":")
    for implementation in implemented[layer_name]:
        print("   ", implementation)
    print("\n")

Conv2D :
    6loops
    indirect_gemm_nn
    indirect_gemm_tn
    indirect_gemm_nt
    indirect_gemm_tt
    std_gemm_nn
    std_gemm_tn
    std_gemm_nt
    std_gemm_tt
    gemm_target




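We can change the implementation used for a layer through the `versions` argument of the **CodeGenerator** class.

This argument takes a dictionary whose keys reference the layers to modify and whose values are the names of the versions to use. A key is usually a layer type name, such as `"Conv2D"`, which applies the version to every layer of that type; a layer index can also be used to target a single layer, as done below with `versions={1: conv_algorithm, 3: conv_algorithm}`.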
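How a dictionary mixing both kinds of keys is resolved is not stated here. As a sketch only, the hypothetical helper `resolve_version` below assumes that an index key takes precedence over a type-name key, and that layers not listed keep a default version (represented by the placeholder string `"default"`). It is not part of ACETONE.

```python
# Hypothetical helper, not part of ACETONE: one possible way to resolve a
# `versions` dictionary keyed by layer type name or by layer index.
# Assumption: an index key overrides a type-name key.


def resolve_version(layer_type: str, layer_idx: int, versions: dict, default: str) -> str:
    """Return the version name to use for one layer."""
    if layer_idx in versions:      # per-layer choice: the most specific one
        return versions[layer_idx]
    if layer_type in versions:     # choice made for every layer of this type
        return versions[layer_type]
    return default                 # no choice made: keep the default version


# Indices of the two convolutions, as reported in this notebook's logs ("Conv2D 1", "Conv2D 3")
conv_layers = [("Conv2D", 1), ("Conv2D", 3)]

for versions in ({"Conv2D": "indirect_gemm_nn"},
                 {1: "std_gemm_nn", 3: "std_gemm_nn"},
                 {"Conv2D": "indirect_gemm_nn", 3: "std_gemm_nn"}):
    print({idx: resolve_version(t, idx, versions, "default") for t, idx in conv_layers})
```

Under the assumed precedence, the last dictionary would give `indirect_gemm_nn` to layer 1 and `std_gemm_nn` to layer 3.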

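In this example, we want to use the `indirect_gemm_nn` algorithm to compute the convolution. In general terms, an indirect GEMM convolution first builds an indirection buffer that records, for each output pixel, where the input values covered by the kernel are stored. The convolution is then computed as a matrix multiplication (GEMM) between the weights and the patch matrix, whose elements are read through this buffer instead of being copied into memory as in the im2col approach.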
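The technique can be sketched in plain Python. This sketch is an illustration under stated assumptions, not ACETONE's generated code: a single image stored as a flat channel-height-width list, valid padding, no dilation, and one indirection entry per input channel, kernel position and output pixel. With that last choice, the buffer holds C_in × k_h × k_w × H_out × W_out entries, which reproduces the "patches size" values printed when the code is generated if the two convolutions receive 1 × 28 × 28 and 6 × 12 × 12 inputs (an assumption about this LeNet-5): 1 × 5 × 5 × 24 × 24 = 14400 and 6 × 5 × 5 × 8 × 8 = 9600.

```python
# Sketch of an indirect-GEMM convolution (valid padding, no dilation).
# Assumptions: one image, CHW layout in a flat list, and one indirection entry
# per (input channel, kernel row, kernel column, output pixel).


def build_indirection(c_in, h_in, w_in, k_h, k_w, stride=1):
    """Return (buffer, h_out, w_out); buffer[k][p] is the flat input index
    read by kernel element k for output pixel p."""
    h_out = (h_in - k_h) // stride + 1
    w_out = (w_in - k_w) // stride + 1
    buffer = []
    for c in range(c_in):
        for i in range(k_h):
            for j in range(k_w):
                row = []  # one row of the (virtual) patch matrix
                for oh in range(h_out):
                    for ow in range(w_out):
                        ih, iw = oh * stride + i, ow * stride + j
                        row.append((c * h_in + ih) * w_in + iw)
                buffer.append(row)
    return buffer, h_out, w_out


def indirect_gemm_conv(x, weights, c_in, h_in, w_in, k_h, k_w, stride=1):
    """weights[f] is a flat list of c_in*k_h*k_w values, ordered like the buffer rows."""
    buffer, h_out, w_out = build_indirection(c_in, h_in, w_in, k_h, k_w, stride)
    n_pix = h_out * w_out
    out = []
    for w_f in weights:  # GEMM: (filters x K) times (K x pixels)
        acc = [0.0] * n_pix
        for k, row in enumerate(buffer):
            for p in range(n_pix):
                # the patch matrix is never materialized: x is read through the buffer
                acc[p] += w_f[k] * x[row[p]]
        out.append(acc)
    return out


def direct_conv(x, weights, c_in, h_in, w_in, k_h, k_w, stride=1):
    """Reference convolution with plain nested loops."""
    h_out = (h_in - k_h) // stride + 1
    w_out = (w_in - k_w) // stride + 1
    out = []
    for w_f in weights:
        acc = []
        for oh in range(h_out):
            for ow in range(w_out):
                s = 0.0
                for c in range(c_in):
                    for i in range(k_h):
                        for j in range(k_w):
                            ih, iw = oh * stride + i, ow * stride + j
                            s += w_f[(c * k_h + i) * k_w + j] * x[(c * h_in + ih) * w_in + iw]
                acc.append(s)
        out.append(acc)
    return out


# Check against the direct version: 2 channels, 5x5 input, 3x3 kernels, 2 filters
x = [float(v % 7) for v in range(2 * 5 * 5)]
weights = [[float((f + k) % 3 - 1) for k in range(2 * 3 * 3)] for f in range(2)]
assert indirect_gemm_conv(x, weights, 2, 5, 5, 3, 3) == direct_conv(x, weights, 2, 5, 5, 3, 3)

# Buffer sizes for the two assumed LeNet-5 convolution inputs
for shape in [(1, 28, 28), (6, 12, 12)]:
    buffer, h_out, w_out = build_indirection(*shape, 5, 5)
    print(shape, "->", sum(len(row) for row in buffer))  # 14400, then 9600
```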

In [7]:
# Version of the layer to use
conv_algorithm = "indirect_gemm_nn"
indirect_gemm_output_path = "demo_lenet_indirect_gemm"

# Create an ACETONE CodeGenerator from the model
indirect_gemm_generator = CodeGenerator(file=model_path,
                                            function_name=function_name,
                                            test_dataset=test_dataset,
                                            versions={"Conv2D":conv_algorithm},
                                            nb_tests=nb_tests)

Finished model initialization.


Once the generator has been created, we can generate the corresponding C code, then compute the inference in Python; its result, written to `output_python.txt`, serves as a reference for the C code.

In [8]:
indirect_gemm_generator.generate_c_files(indirect_gemm_output_path)
indirect_gemm_generator.compute_inference(indirect_gemm_output_path)

Generated function source file.
Generated function header file.
Conv2D 1 patches size: (14400,)
Conv2D 3 patches size: (9600,)
Generated globalvars .c file.
Generated main file.
Generated Makefile.
Generated testdataset files.
(1, 5, 5, 6)
(6, 5, 5, 16)
File output_python.txt generated.


array([2.14688426e-05, 9.42424937e-09, 2.48964154e-03, 9.30244851e-01,
       4.02413366e-09, 4.87522280e-07, 1.02479293e-09, 2.22011113e-08,
       6.72312127e-02, 1.23013641e-05])

In [9]:
# Version of the layer to use
# This time, versions are selected per layer: 1 and 3 are the indices of the two Conv2D layers
conv_algorithm = "std_gemm_nn"
std_gemm_output_path = "demo_lenet_std_gemm"

# Create an ACETONE CodeGenerator from the model
std_gemm_generator = CodeGenerator(file=model_path,
                                    function_name=function_name,
                                    test_dataset=test_dataset,
                                    versions={1:conv_algorithm, 3:conv_algorithm},
                                    nb_tests=nb_tests)

Finished model initialization.


In [10]:
std_gemm_generator.generate_c_files(std_gemm_output_path)
std_gemm_generator.compute_inference(std_gemm_output_path)

Generated function source file.
Generated function header file.
Generated globalvars .c file.
Generated main file.
Generated Makefile.
Generated testdataset files.
(1, 5, 5, 6)


(6, 5, 5, 16)
File output_python.txt generated.


array([2.14688426e-05, 9.42424937e-09, 2.48964154e-03, 9.30244851e-01,
       4.02413366e-09, 4.87522280e-07, 1.02479293e-09, 2.22011113e-08,
       6.72312127e-02, 1.23013641e-05])

In [11]:
# Compiling the code
! make -C demo_lenet_indirect_gemm all

# Running the executable
! ./demo_lenet_indirect_gemm/demo_lenet ./demo_lenet_indirect_gemm/output_c.txt

make: Entering directory '/tmp_user/ldtim610h/yaitaiss/acetone/tutorials/demo_lenet_indirect_gemm'
gcc  -g -w -lm   -c -o inference.o inference.c
gcc  -g -w -lm   -c -o global_vars.o global_vars.c
gcc  -g -w -lm   -c -o main.o main.c
gcc  -g -w -lm   -c -o test_dataset.o test_dataset.c
gcc   -o demo_lenet inference.o global_vars.o main.o test_dataset.o  inference.h test_dataset.h   -g -w -lm
make: Leaving directory '/tmp_user/ldtim610h/yaitaiss/acetone/tutorials/demo_lenet_indirect_gemm'
   Average time over 1 tests: 1.155000e-05 s 
   ACETONE framework's inference output: 
2.14688262e-05 9.42424627e-09 0.00248963805 0.930245101 4.02412859e-09 4.8752247e-07 1.02479236e-09 2.22010641e-08 0.0672310665 1.23013542e-05 


In [12]:
# Compiling the code
! make -C demo_lenet_std_gemm all

# Running the executable
! ./demo_lenet_std_gemm/demo_lenet ./demo_lenet_std_gemm/output_c.txt

make: Entering directory '/tmp_user/ldtim610h/yaitaiss/acetone/tutorials/demo_lenet_std_gemm'
gcc  -g -w -lm   -c -o inference.o inference.c
gcc  -g -w -lm   -c -o global_vars.o global_vars.c
gcc  -g -w -lm   -c -o main.o main.c
gcc  -g -w -lm   -c -o test_dataset.o test_dataset.c
gcc   -o demo_lenet inference.o global_vars.o main.o test_dataset.o  inference.h test_dataset.h   -g -w -lm
make: Leaving directory '/tmp_user/ldtim610h/yaitaiss/acetone/tutorials/demo_lenet_std_gemm'
   Average time over 1 tests: 1.263000e-05 s 
   ACETONE framework's inference output: 
2.14688262e-05 9.42424627e-09 0.00248963805 0.930245101 4.02412859e-09 4.8752247e-07 1.02479236e-09 2.22010641e-08 0.0672310665 1.23013542e-05 


In [13]:
cli_compare(reference_file="./demo_lenet_indirect_gemm/output_c.txt", c_file="./demo_lenet_std_gemm/output_c.txt", nb_tests=1)

    Max absolute error for 1 test(s): 0.0
    Max relative error for 1 test(s): 0.0
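Both C versions give exactly the same output here. Comparing a C output with the Python reference instead yields small, non-zero errors. As a sketch of such a comparison (the whitespace-separated format and the relative-error definition |value − reference| / |reference| are assumptions, not `cli_compare`'s documented behavior), the values printed above for the `indirect_gemm_nn` run can be compared directly:

```python
# Sketch of an output comparison in the spirit of cli_compare.
# Assumptions: one test = one line of whitespace-separated floats;
# relative error = |value - reference| / |reference| (zero references skipped).


def parse_output(line: str) -> list[float]:
    return [float(v) for v in line.split()]


def max_errors(reference: list[float], values: list[float]) -> tuple[float, float]:
    """Return (max absolute error, max relative error) over matching elements."""
    if len(reference) != len(values):
        raise ValueError("outputs have different sizes")
    abs_err = [abs(v - r) for r, v in zip(reference, values)]
    rel_err = [e / abs(r) for e, r in zip(abs_err, reference) if r != 0.0]
    return max(abs_err), max(rel_err, default=0.0)


# Python reference (returned by compute_inference) and C output of the indirect_gemm_nn run
python_ref = parse_output(
    "2.14688426e-05 9.42424937e-09 2.48964154e-03 9.30244851e-01 4.02413366e-09 "
    "4.87522280e-07 1.02479293e-09 2.22011113e-08 6.72312127e-02 1.23013641e-05")
c_out = parse_output(
    "2.14688262e-05 9.42424627e-09 0.00248963805 0.930245101 4.02412859e-09 "
    "4.8752247e-07 1.02479236e-09 2.22010641e-08 0.0672310665 1.23013542e-05")

max_abs, max_rel = max_errors(python_ref, c_out)
print(f"max abs error: {max_abs:.3e}, max rel error: {max_rel:.3e}")
```

The remaining differences are of the order expected from single-precision arithmetic in the C code, whereas the two C outputs, produced with the same arithmetic, match exactly.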



## Adding a new implementation

Let's now assume that, after studies and tests, we have found a new way to perform a convolution: setting each element of the output to `0.42`.

This method being far more efficient and simpler than any other, we want to use it with ACETONE. Unfortunately, the framework doesn't provide an implementation for it, so we have to add it ourselves.

In [14]:
# Printing all the algorithms implemented in ACETONE for a convolution
print("Base implementations : ")
print(conv2d_factory.list_implementations)

Base implementations : 
['6loops', 'indirect_gemm_nn', 'indirect_gemm_tn', 'indirect_gemm_nt', 'indirect_gemm_tt', 'std_gemm_nn', 'std_gemm_tn', 'std_gemm_nt', 'std_gemm_tt', 'gemm_target']


To implement it, we have to create a new class inheriting from the `Conv2D` class (or from one of its child classes).

* The first method we must implement is `generate_inference_code_layer`. It builds the C code corresponding to the layer and returns it as a string.
* The second method, `forward_path_layer`, is optional. It tells the framework how to compute the output of the layer in Python. If it is not overridden, the method defined in the parent class is used.



In [15]:
# Creating a new implementation
class Conv2D_Demo(Conv2D):
    """Convolution layer whose output is filled with the constant 0.42."""

    def __init__(self, **kwargs) -> None:
        """Build a Convolution layer with a demo implementation."""
        super().__init__(**kwargs)

    def generate_inference_code_layer(self) -> str:
        """Generate the C code computing the layer's output."""
        output_str = f"output_{self.path}"

        # Every output element is set to 0.42; the inputs are not read at all
        code_str = f"    // {self.name}_{self.idx}\n    for (k = 0; k < {self.size}; ++k) {output_str}[k] = 0.42;"
        return code_str

    def forward_path_layer(self, input_array: np.ndarray) -> np.ndarray:
        """Python counterpart of the generated C code."""
        return np.full((1, self.output_channels, self.output_height, self.output_width), 0.42)
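To preview the C code this method produces, the same f-string can be evaluated outside ACETONE. In this sketch, the hypothetical function `demo_inference_code` mirrors the method body, and the layer is a stand-in `SimpleNamespace` whose attributes are illustrative values: `size` is taken as 6 × 24 × 24, the output size the first convolution would have with a 28 × 28 input (an assumption). The emitted loop also assumes that the surrounding generated function declares the counter `k`.

```python
# Hypothetical stand-in: evaluate the string built by
# Conv2D_Demo.generate_inference_code_layer without ACETONE.
# name, idx, path and size are illustrative values, not read from the real model.
from types import SimpleNamespace


def demo_inference_code(layer) -> str:
    output_str = f"output_{layer.path}"
    return f"    // {layer.name}_{layer.idx}\n    for (k = 0; k < {layer.size}; ++k) {output_str}[k] = 0.42;"


layer = SimpleNamespace(name="Conv2D", idx=1, path=1, size=6 * 24 * 24)
print(demo_inference_code(layer))
# prints:
#     // Conv2D_1
#     for (k = 0; k < 3456; ++k) output_1[k] = 0.42;
```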

When parsing the neural network, each time ACETONE encounters a layer that has several versions, it inserts a temporary placeholder layer. Once the model has been completely extracted, each placeholder is replaced by a definitive layer with the chosen implementation: the stored values (weights, biases, size, ...) are extracted and used to initialize the new layer.

In [16]:
# Creating a Conv2D_Demo layer using the attributes of old_layer
def conv2d_demo_implementation(
        old_layer: Conv2D,
        conv_algo: str,
) -> Conv2D_Demo:
    return Conv2D_Demo(
        idx=old_layer.idx,
        conv_algorithm=conv_algo,
        size=old_layer.size,
        padding=old_layer.padding,
        strides=old_layer.strides,
        kernel_h=old_layer.kernel_h,
        kernel_w=old_layer.kernel_w,
        dilation_rate=old_layer.dilation_rate,
        nb_filters=old_layer.nb_filters,
        input_shape=[1, old_layer.input_channels, old_layer.input_height, old_layer.input_width],
        output_shape=[1, old_layer.output_channels, old_layer.output_height, old_layer.output_width],
        weights=old_layer.weights,
        biases=old_layer.biases,
        activation_function=old_layer.activation_function,
    )

Finally, to add the newly created implementation to ACETONE, we need to register it within the layer's version manager.

In [17]:
conv2d_factory.register_implementation("demo", conv2d_demo_implementation)

print("Updated implementations : ")
print(conv2d_factory.list_implementations)

Updated implementations : 
['6loops', 'indirect_gemm_nn', 'indirect_gemm_tn', 'indirect_gemm_nt', 'indirect_gemm_tt', 'std_gemm_nn', 'std_gemm_tn', 'std_gemm_nt', 'std_gemm_tt', 'gemm_target', 'demo']
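The placeholder mechanism and the registration step can be illustrated with a toy registry. This is a hypothetical sketch, not ACETONE's internals: `ImplementationFactory`, `PlaceholderConv` and `DemoConv` are invented for the example, and the factory is assumed to call each registered function with the placeholder layer and the version name, matching the `(old_layer, conv_algo)` signature of `conv2d_demo_implementation`.

```python
# Toy registry: placeholder layers are swapped for concrete ones built by a
# registered function. All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlaceholderConv:
    idx: int
    size: int
    weights: list


@dataclass
class DemoConv:
    idx: int
    size: int
    weights: list
    algorithm: str


class ImplementationFactory:
    def __init__(self) -> None:
        self._builders: dict[str, Callable] = {}

    def register_implementation(self, name: str, builder: Callable) -> None:
        self._builders[name] = builder

    @property
    def list_implementations(self) -> list[str]:
        return list(self._builders)

    def create(self, name: str, old_layer):
        if name not in self._builders:
            raise KeyError(f"unknown implementation: {name}")
        return self._builders[name](old_layer, name)  # (old_layer, version name)


factory = ImplementationFactory()
factory.register_implementation(
    "demo",
    lambda old, algo: DemoConv(idx=old.idx, size=old.size, weights=old.weights, algorithm=algo),
)

# After parsing: replace each placeholder by a layer built with the chosen version
model = [PlaceholderConv(idx=1, size=3456, weights=[0.1]), PlaceholderConv(idx=3, size=1024, weights=[0.2])]
model = [factory.create("demo", layer) for layer in model]
print(factory.list_implementations, [type(layer).__name__ for layer in model])
```

With such a registry, adding a version only requires a new builder function; the parsing code does not change.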


Now that the new version appears in the list of implementations, we can use it to generate code.

In [18]:
# Version of the layer to use
conv_algorithm = "demo"
demo_output_path = "demo_lenet_optimized"

# Create an ACETONE CodeGenerator from the model
demo_generator = CodeGenerator(file=model_path,
                                    function_name=function_name,
                                    test_dataset=test_dataset,
                                    versions={"Conv2D":conv_algorithm},
                                    nb_tests=nb_tests)

demo_generator.generate_c_files(demo_output_path)

Finished model initialization.
Generated function source file.
Generated function header file.
Generated globalvars .c file.
Generated main file.
Generated Makefile.
Generated testdataset files.


The generated code now uses the optimized implementation and is ready to be deployed on any target!