# Implement your own model

This article shows how to deploy a trained model on the ESP32.

## Prerequisites

1. A trained model whose coefficients have been exported to NumPy files (`.npy`)
2. A development board with an ESP32
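The first prerequisite depends on the training framework, so the following is only a minimal sketch. It assumes the trained parameters are already available as NumPy arrays; the layer names and shapes are hypothetical. Since the file name is later reused as the C variable name, each file is named with a valid C identifier.

```python
import os
import tempfile

import numpy as np

# Hypothetical trained parameters; in practice they come from your framework.
rng = np.random.default_rng(0)
params = {
    "conv1_kernel": rng.standard_normal((3, 3, 1, 4)).astype(np.float32),  # (H, W, C, N)
    "conv1_bias": rng.standard_normal(4).astype(np.float32),
    "fc1_kernel": rng.standard_normal((16, 10)).astype(np.float32),
}

def save_params(params, out_dir):
    """Save every array as <name>.npy and return the sorted file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, value in params.items():
        path = os.path.join(out_dir, name + ".npy")
        np.save(path, value)
        paths.append(path)
    return sorted(paths)

# A temporary directory is used here; the article itself reads from `weights`.
out_dir = os.path.join(tempfile.mkdtemp(), "weights")
saved = save_params(params, out_dir)
```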



## Let's start

First we need to import some packages to load the data.

In [1]:
import numpy as np
import os
import re
import json

- `numpy` loads the coefficients
- `re` extracts part of each filename to use as the variable name
- `json` loads the bias/BN offset exponents when converting to fixed point

Before we start loading the weights, we need to write the preamble of the generated header file.

In [2]:
os.makedirs("output", exist_ok=True)  # make sure the output directory exists
with open("output/cnn.h", mode='w', encoding='utf-8') as fc:
    hdr = '#pragma once\n'
    hdr += '#include "dl_lib_matrix3d.h"\n'
    hdr += '#include "dl_lib_matrix3dq.h"\n\n'
    fc.write(hdr)

The network contains two types of layer, convolutional and fully connected, so we need two different conversions.

In [3]:
def convert_3d_conv_c(data):
    (H, W, C, N) = data.shape
    c_data = data.copy().reshape(N, H, W, C)
    for n in range(N):
        c_data[n, :, :, :] = data[:, :, :, n]
    return c_data
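As a sanity check, the reordering done by `convert_3d_conv_c` can be compared with `np.transpose`. This is a sketch for illustration only: the function is copied here so the block runs on its own, and the test kernel is made up.

```python
import numpy as np

def convert_3d_conv_c(data):
    # Copy of the conversion above: (H, W, C, N) -> (N, H, W, C)
    (H, W, C, N) = data.shape
    c_data = data.copy().reshape(N, H, W, C)
    for n in range(N):
        c_data[n, :, :, :] = data[:, :, :, n]
    return c_data

# Made-up kernel with distinct values so that any misplacement would show up
kernel = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

looped = convert_3d_conv_c(kernel)
transposed = np.transpose(kernel, (3, 0, 1, 2))  # move the N axis to the front
```

Both arrays have shape `(5, 2, 3, 4)` and hold the same values, so the conversion simply moves the filter index from the last axis to the first.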

In [4]:
def convert_3d_fc_w(data):
    (W, H) = data.shape
    fc_data = data.copy()
    return fc_data.T.reshape([1, H, W, 1])
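The fully connected conversion is easier to follow on a tiny matrix. The sketch below copies `convert_3d_fc_w` so it runs on its own and applies it to a made-up weight matrix of shape `(3, 2)`.

```python
import numpy as np

def convert_3d_fc_w(data):
    # Copy of the conversion above: (W, H) -> (1, H, W, 1)
    (W, H) = data.shape
    fc_data = data.copy()
    return fc_data.T.reshape([1, H, W, 1])

weights = np.arange(6, dtype=np.float32).reshape(3, 2)  # made-up values
converted = convert_3d_fc_w(weights)
```

Here `converted` has shape `(1, 2, 3, 1)`, and `converted[0, h, w, 0]` equals `weights[w, h]`: the matrix is transposed and then padded with two unit axes.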

We also need to decide on the data precision: floating point as is, or fixed point with quantization.

In [5]:
data_type = 'fptp_t'    # 'fptp_t' for floating point, 'qtp_t' for fixed point

To get the quantized coefficients, we scale the data by a power of two so that the largest magnitude uses the upper half of the signed 16-bit range.

In [6]:
def convert_3d_quantization(data):
    shape = data.shape
    q_data = data.flatten()
    # Largest magnitude in the tensor; this also covers all-negative data,
    # which would otherwise leave _max negative and never end the loop below
    _max = float(np.max(np.abs(q_data)))

    exponent = 0
    qtp_range = 2**15 - 1
    if _max != 0:
        # Scale by powers of two until _max lies in [qtp_range / 2, qtp_range]
        while _max > qtp_range:
            exponent += 1
            _max = _max / 2
        while _max < (qtp_range / 2):
            exponent -= 1
            _max = _max * 2

    q_data = q_data * 2**(-exponent)
    # astype('int16') truncates toward zero
    q_data = np.array(q_data).reshape(shape).astype('int16')
    return q_data, exponent
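A small worked example makes the exponent search concrete. The helper `quantize_16bit` below is a hypothetical, self-contained restatement of the same scheme, and the input values are made up.

```python
import numpy as np

def quantize_16bit(data):
    """Hypothetical helper following the same scheme as convert_3d_quantization."""
    max_abs = float(np.max(np.abs(data)))
    qtp_range = 2**15 - 1
    exponent = 0
    if max_abs != 0:
        while max_abs > qtp_range:
            exponent += 1
            max_abs /= 2
        while max_abs < qtp_range / 2:
            exponent -= 1
            max_abs *= 2
    # Truncates toward zero, like astype('int16') in the article
    q_data = (data * 2.0 ** (-exponent)).astype("int16")
    return q_data, exponent

weights = np.array([0.5, -0.25, 0.125], dtype=np.float32)
q_weights, exponent = quantize_16bit(weights)

# Dequantize: real value = stored integer * 2**exponent
restored = q_weights.astype(np.float32) * 2.0 ** exponent
```

The largest magnitude, 0.5, has to be doubled 15 times to reach 16384, so `exponent` is -15 and the stored integers are 16384, -8192 and 4096. Multiplying them by `2**exponent` recovers the original values exactly in this case; in general the truncation loses less than one quantization step.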

Sometimes, though, we already have our own exponents, obtained from testing or debugging.

In [7]:
def convert_3d_quant_exponent(data, exponent):
    shape = data.shape
    q_data = data.flatten()
    q_data = q_data * 2**(-exponent)
    
    q_data = np.array(q_data).reshape(shape).astype('int16')
    return q_data, exponent
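The article does not show where such exponents are stored. One possible arrangement, sketched below, keeps them in a JSON object that maps a coefficient name to its exponent. The JSON layout, the name `conv1_bias` and the helper `quantize_with_exponent` are all assumptions made for illustration. The helper also adds a range check, because a fixed exponent that is too small pushes values outside the `int16` range and `astype('int16')` does not guard against that.

```python
import json

import numpy as np

def quantize_with_exponent(data, exponent):
    """Hypothetical variant of convert_3d_quant_exponent with a range check."""
    scaled = data * 2.0 ** (-exponent)
    if np.max(np.abs(scaled)) > 2**15 - 1:
        raise ValueError(f"exponent {exponent} overflows int16")
    return scaled.astype("int16"), exponent

# Hypothetical file content: coefficient name -> exponent
exponents = json.loads('{"conv1_bias": -10}')

bias = np.array([0.5, -0.25], dtype=np.float32)  # made-up values
q_bias, expo = quantize_with_exponent(bias, exponents["conv1_bias"])
```

With an exponent of -10 the values are multiplied by 1024, giving 512 and -256.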

Now we load the coefficients by file name.

In this example, the coefficients are stored in the `weights` directory as `.npy` files, so we collect all of them and process them one by one.

In [8]:
files = os.popen("find weights -name '*.npy' | sort")
# Pattern that extracts the coefficient name from the file path
coef_pat = re.compile(r"weights/(.*)\.npy")
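The `find` command is only available on Unix-like systems. Where it is not, the same listing can be done from Python alone. The sketch below is an alternative under that assumption, not the method used in this article; `list_coefficients` and `coefficient_name` are hypothetical helpers, and the demonstration files are made up.

```python
import glob
import os
import tempfile

import numpy as np

def list_coefficients(root="weights"):
    """Return all .npy files under root, sorted by path."""
    return sorted(glob.glob(os.path.join(root, "**", "*.npy"), recursive=True))

def coefficient_name(path):
    """Use the file name without its extension as the C variable name."""
    return os.path.splitext(os.path.basename(path))[0]

# Small demonstration in a temporary directory with made-up file names
demo_root = os.path.join(tempfile.mkdtemp(), "weights")
os.makedirs(demo_root)
for name in ("fc1_kernel", "conv1_bias"):
    np.save(os.path.join(demo_root, name + ".npy"), np.zeros(2, dtype=np.float32))

names = [coefficient_name(p) for p in list_coefficients(demo_root)]
```

Unlike the regular expression, `coefficient_name` drops any subdirectory from the name, which keeps the result a valid C identifier when files are nested.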

In [9]:
for f in files:
    f = f.rstrip()
    coef = np.load(f)
    coef_name = coef_pat.search(f).group(1)
    
    if len(coef.shape) == 2:
        # Fully connected layer
        coef = convert_3d_fc_w(coef)
    else:
        if len(coef.shape) == 1:
            # Bias/BN: reshape to 4-D so it can go through the conv conversion
            coef = coef.reshape([1, 1, -1, 1])
        coef = convert_3d_conv_c(coef)
        
    if data_type == 'qtp_t':
        coef, expo = convert_3d_quantization(coef)
        
    # Generate files
    item_f = "const static "
    if data_type == 'fptp_t':
        item_f += f"fptp_t {coef_name}_item_array[] = "
        data_template = "{:.6f}f, "
    else:
        item_f += f"qtp_t {coef_name}_item_array[] = "
        data_template = "{:d}, "
    item_f += "{\n\t"
    intend = 0
    for d in coef.flat:
        item_f += data_template.format(d)
        intend += 1
        if intend % 8 == 0:
            item_f += "\n\t"
    item_f += "\n};\n\n"
    
    (N, H, W, C) = coef.shape
    struct_f = "const static dl_matrix3d"
    if data_type == 'fptp_t':
        struct_f += f"_t {coef_name} = {{\n"
    else:
        struct_f += f"q_t {coef_name} = {{\n"
    struct_f += f"\t.w = {W},\n"
    struct_f += f"\t.h = {H},\n"
    struct_f += f"\t.c = {C},\n"
    struct_f += f"\t.n = {N},\n"
    struct_f += f"\t.stride = {W * C},\n"
    if data_type == 'qtp_t':
        struct_f += f"\t.exponent = {expo},\n"
    struct_f += f"\t.item = ({data_type} *)(&{coef_name}_item_array[0])\n}};\n\n"
    with open("output/cnn.h", mode='a', encoding='utf-8') as fc:
        fc.write(item_f)
        fc.write(struct_f)

`output/cnn.h` now holds the coefficients as C code.
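To see what one generated array looks like, the formatting step can be isolated and run on a tiny tensor. `render_item_array` is a hypothetical helper that mirrors the string building in the loop above; the name `conv1_bias` and the values are made up.

```python
import numpy as np

def render_item_array(name, coef, data_type="fptp_t"):
    """Hypothetical helper mirroring the array formatting of the loop above."""
    template = "{:.6f}f, " if data_type == "fptp_t" else "{:d}, "
    text = f"const static {data_type} {name}_item_array[] = {{\n\t"
    for count, value in enumerate(coef.flat, start=1):
        text += template.format(value)
        if count % 8 == 0:
            text += "\n\t"  # start a new line after every 8 values
    text += "\n};\n\n"
    return text

bias = np.array([0.5, -0.25], dtype=np.float32).reshape(1, 1, 1, 2)
header_text = render_item_array("conv1_bias", bias)
print(header_text)
```

The returned text declares `conv1_bias_item_array` and lists the values as `0.500000f, -0.250000f, ` between the braces. With `data_type="qtp_t"` the values are written as plain integers instead.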

To test this simple network, we use a sample from MNIST and convert it to a header file that can be included.

In [10]:
in_data = np.load("2.npy")
hdr = '#pragma once\n'
hdr += '#include "dl_lib_matrix3d.h"\n'
item_f = "const static uc_t input_item_array[] = {\n\t"
data_template = "{:d}, "
intend = 0
for d in in_data.flat:
    item_f += data_template.format(d)
    intend += 1
    if intend % 8 == 0:
        item_f += "\n\t"
item_f += "\n};\n\n"
with open("output/input.h", mode='w', encoding='utf-8') as fc:
    fc.write(hdr)
    fc.write(item_f)
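The `{:d}` template only accepts integers, so it can help to validate the sample before formatting it. The sketch below assumes that `uc_t` is an unsigned 8-bit type and that the sample holds pixel values from 0 to 255; `as_uc_t` is a hypothetical helper, and the 2x2 array stands in for the real MNIST sample.

```python
import numpy as np

def as_uc_t(sample):
    """Hypothetical check: return the sample as uint8, or raise if it does not fit."""
    arr = np.asarray(sample)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"expected an integer array, got {arr.dtype}")
    if arr.size and (arr.min() < 0 or arr.max() > 255):
        raise ValueError("values must lie in 0..255")
    return arr.astype(np.uint8)

# Made-up 2x2 stand-in for the MNIST sample
sample = np.array([[0, 128], [255, 7]], dtype=np.int64)
pixels = as_uc_t(sample)

# Same formatting as in the cell above
line = "".join("{:d}, ".format(v) for v in pixels.flat)
```

A sample stored as floating point, for example one normalized to the range 0 to 1, is rejected by this check and would need to be rescaled and cast first.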
        

Now that we have the input and the weights, we can test the network.

In the `test` directory, run `idf.py build flash`, as with other ESP32 examples, and there we get the result!

![image.png](result.png)