Keras (https://keras.io/) is a free, open-source library that provides a Python interface for building neural networks (NNs). It is now integrated into the TensorFlow library.
With Keras we can define and train neural networks. QKeras (https://github.com/google/qkeras) is a quantization extension to Keras. It provides drop-in replacements for some Keras layers, in particular layers that create parameters (such as Conv2D and Dense), activation layers, and layers that perform arithmetic operations. This lets us quickly create a deeply quantized version of a Keras network.

In this example we explore the capabilities of QKeras by defining and training a convolutional neural network (CNN) on the MNIST dataset.
First, we import the necessary packages and check the library versions and the available GPUs.

In [1]:
import os

import keras
import tensorflow as tf
import qkeras

tf.keras.backend.clear_session()
print("TensorFlow version:", tf.__version__)
print("QKeras version:", qkeras.__version__)
print("Keras version:", keras.__version__)

# Report how many GPUs TensorFlow can see (the training cells below request '/GPU:0')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))



TensorFlow version: 2.13.0
QKeras version: 0.9.0
Keras version: 2.13.1
Num GPUs Available:  0
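The cell above only reports how many GPUs TensorFlow can see. The training cells below always open `tf.device('/GPU:0')`, and on a machine with no GPU (as here) TensorFlow evidently falls back to the CPU, since training still runs. As a minimal sketch, the device string can instead be chosen explicitly from the GPU count. `pick_device` is a hypothetical helper, not part of TensorFlow or QKeras, and it is kept free of TensorFlow so that it runs on its own.

```python
def pick_device(num_gpus: int) -> str:
    """Return '/GPU:0' if at least one GPU is visible, otherwise '/CPU:0'.

    Hypothetical helper for illustration; not a TensorFlow or QKeras API.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return "/GPU:0" if num_gpus > 0 else "/CPU:0"


# Possible use in this notebook (needs TensorFlow, so shown as a comment):
#   device = pick_device(len(tf.config.list_physical_devices("GPU")))
#   with tf.device(device):
#       model.fit(...)

print(pick_device(0))  # /CPU:0
print(pick_device(2))  # /GPU:0
```

Choosing the device explicitly makes the CPU fallback visible in the code rather than leaving it to TensorFlow's device placement.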


Next, we create an output folder in which to store the trained models and, optionally, the converted QONNX model.

In [2]:
# Name of the folder that will hold the results
folder_name = 'Mnist_Training'

# Use the parent of the current working directory as the base directory
script_path = os.getcwd()
current_directory = os.path.dirname(script_path)
print("Base directory:", current_directory)

# Build the full path of the output folder
output_path = os.path.join(current_directory, folder_name)

# Create the folder only if it does not exist yet
if not os.path.exists(output_path):
    os.makedirs(output_path)
    print(f"Folder '{folder_name}' created successfully.")
else:
    print(f"Folder '{folder_name}' already exists.")

print(output_path)

Base directory: /home/fede/Assegno_UNISS/budva_2024
Folder 'Mnist_Training' already exists.
/home/fede/Assegno_UNISS/budva_2024/Mnist_Training
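The existence check followed by `os.makedirs` can also be written with `pathlib`. There, `mkdir(parents=True, exist_ok=True)` creates any missing parent folders and does nothing if the folder already exists. This is a minimal sketch. `ensure_output_dir` is a hypothetical helper name, and the temporary directory only stands in for the notebook's base folder.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(base: Path, folder_name: str = "Mnist_Training") -> Path:
    """Create base/folder_name if needed and return its path (hypothetical helper).

    parents=True creates missing intermediate folders;
    exist_ok=True makes the call a no-op when the folder already exists.
    """
    output = Path(base) / folder_name
    output.mkdir(parents=True, exist_ok=True)
    return output


# In the notebook the base would be the parent of the working directory:
#   output_path = ensure_output_dir(Path.cwd().parent)

with tempfile.TemporaryDirectory() as tmp:
    first = ensure_output_dir(Path(tmp))
    second = ensure_output_dir(Path(tmp))  # calling twice must not raise
    print(first == second, first.is_dir())  # True True
```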


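Next, we define some flags. The model flags select which quantization configuration to train, and `convert_to_qonnx` controls whether the trained model is converted to QONNX. The models are selected with an if/elif chain, so only the first flag set to True is used. Exactly one model flag should therefore be enabled.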

In [3]:
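#.......FLAGS.........
# Enable exactly ONE model flag. Names roughly follow mnist<activation bits>_<weight bits>
# (mnist8_8 and mnist2_2 mix precisions; see each branch below for the exact quantizers).
mnist_baseline = False  # floating-point reference model, no quantization
mnist16_8 = False
mnist16_4 = True
mnist8_8 = False
mnist8_4 = False
mnist4_4 = False
mnist4_2 = False
mnist2_2 = False
mnist_bin = False       # binary weights, biases and activations

# Set to True to convert the trained QKeras model into a QONNX model
convert_to_qonnx = True

#.........................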
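The model cell uses an `if`/`elif` chain. If several flags are enabled, only the first one is trained and the rest are ignored without any warning. If none is enabled, `model` stays undefined in a fresh kernel and the conversion cell fails. Below is a minimal sketch of a guard, assuming the flags are collected in a dictionary (`selected_config` is a hypothetical helper):

```python
def selected_config(flags: dict) -> str:
    """Return the name of the single enabled flag (hypothetical helper).

    Raises ValueError when zero or several flags are True, cases that the
    notebook's if/elif chain would otherwise handle silently.
    """
    enabled = [name for name, on in flags.items() if on]
    if len(enabled) != 1:
        raise ValueError(f"Exactly one model flag must be True, got {enabled}")
    return enabled[0]


flags = {
    "mnist_baseline": False,
    "mnist16_8": False,
    "mnist16_4": True,
    "mnist8_8": False,
    "mnist8_4": False,
    "mnist4_4": False,
    "mnist4_2": False,
    "mnist2_2": False,
    "mnist_bin": False,
}
print(selected_config(flags))  # mnist16_4
```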

Here we import the MNIST dataset, the Sequential class for defining a sequential Keras model, and the layers needed to define the neural network. We then pre-process the dataset in three steps. First, we reshape the images to 28x28x1 by adding a channel dimension. Next, we normalize the pixel values to [0, 1]. Finally, we apply to_categorical, which one-hot encodes the class labels as expected by the categorical cross-entropy loss.

In [4]:
from keras.datasets import mnist
from keras.models import Sequential
from keras.utils import to_categorical
from keras.optimizers import Adam
# Dropout is needed by the baseline model below
from keras.layers import MaxPool2D, Conv2D, Flatten, Dense, Activation, Dropout

# Prepare the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
# Scale pixel intensities from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
# One-hot encode the labels (10 classes)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
# Input shape with a leading batch dimension of 1
mnist_input_shape = (1,28,28,1)
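To make the pre-processing explicit, here is a dependency-free sketch of the same three transformations on plain Python values. It covers scaling 8-bit pixel values into [0, 1], the shape change produced by `reshape(..., 28, 28, 1)`, and the one-hot encoding performed by `to_categorical`. The helper names are hypothetical; Keras and NumPy apply these operations to whole arrays at once.

```python
def normalize(pixels):
    """Scale 8-bit intensities (0..255) into [0.0, 1.0], like x / 255.0."""
    return [p / 255.0 for p in pixels]


def one_hot(label: int, num_classes: int = 10):
    """One-hot encode an integer label, like to_categorical(label, 10)."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def add_channel_axis(shape):
    """Shape after reshape(n, 28, 28, 1): append a channel dimension of size 1."""
    return tuple(shape) + (1,)


print(normalize([0, 51, 255]))            # [0.0, 0.2, 1.0]
print(one_hot(3))                         # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(add_channel_axis((60000, 28, 28)))  # (60000, 28, 28, 1)
```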

Next, we define the different models. The baseline model is a plain floating-point Keras CNN. In the quantized models, each QConv2D and QDense layer takes a kernel_quantizer and a bias_quantizer, which quantize its weights and its biases respectively, while QActivation layers quantize the activations. All the ReLU activations are quantized, with a different precision in each model; the binary model uses binary activations instead. We always leave the final sigmoid activation unquantized to preserve accuracy.
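The following pure-Python sketch illustrates the value grids these quantizers are understood to produce with `alpha=1`. It rests on three assumptions. For `quantized_bits(bits, integer)`, one bit holds the sign and `integer` bits hold the integer part, giving a step of 2^-(bits-1-integer). For `quantized_relu(bits, integer)` with `negative_slope=0.0`, there is no sign bit, giving a step of 2^-(bits-integer). `binary(use_01=False)` maps values to -1 or +1. The function names are hypothetical. QKeras itself also handles scaling options, stochastic rounding and gradients during training, all of which this sketch ignores.

```python
def fixed_point_signed(x: float, bits: int, integer: int) -> float:
    """Assumed behaviour of quantized_bits(bits, integer, alpha=1).

    1 sign bit, `integer` integer bits, bits - 1 - integer fractional bits.
    """
    step = 2.0 ** -(bits - 1 - integer)
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = min(max(round(x / step), q_min), q_max)  # round to nearest code, then saturate
    return q * step


def fixed_point_relu(x: float, bits: int, integer: int) -> float:
    """Assumed behaviour of quantized_relu(bits, integer, negative_slope=0.0).

    No sign bit: negative inputs clip to 0, the step is 2**-(bits - integer)
    and the largest output is 2**integer - step.
    """
    step = 2.0 ** -(bits - integer)
    q = min(max(round(x / step), 0), 2 ** bits - 1)
    return q * step


def binary_sign(x: float) -> float:
    """Assumed behaviour of binary(use_01=False, alpha=1): -1 or +1 (0 maps to +1)."""
    return 1.0 if x >= 0 else -1.0


print(fixed_point_signed(0.8, bits=4, integer=2))   # 1.0
print(fixed_point_signed(10.0, bits=4, integer=2))  # 3.5 (saturated)
print(fixed_point_relu(-1.0, bits=4, integer=2))    # 0.0
print(fixed_point_relu(1.3, bits=4, integer=2))     # 1.25
print(binary_sign(-0.3), binary_sign(0.0))          # -1.0 1.0
# The bits=2, integer=1 grid used for the mnist4_2 weights and biases:
print(sorted({fixed_point_signed(v / 4, 2, 1) for v in range(-20, 21)}))  # [-2.0, -1.0, 0.0, 1.0]
```

Under these assumptions, every weight and bias of `mnist16_4` (`quantized_bits(bits=4, integer=2)`) would be one of the 16 values from -4.0 to 3.5 in steps of 0.5.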

In [5]:
            

if mnist_baseline:

    model = Sequential()

    model.add(Conv2D(32, (3,3), padding='same', activation='relu', input_shape=mnist_input_shape[1:]))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3,3), padding='same', activation='relu'))
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(Flatten())
    model.add(Dropout(0.7))
    model.add(Dense(10,activation = "sigmoid"))

    batch_size = 128
    epochs = 15

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

        model.save( output_path +'/mnist_baseline.keras')

elif mnist16_8:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model

    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=8, integer=4, alpha=1),
        bias_quantizer= quantized_bits(bits=8, integer=4, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=16, integer=8, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=8, integer=4, alpha=1),
        bias_quantizer= quantized_bits(bits=8, integer=4, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=16, integer=8, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.5))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=8, integer=4, alpha=1),
        bias_quantizer= quantized_bits(bits=8, integer=4, alpha=1)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 15

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

        model.save( output_path +'/mnist16_8.keras')
    
elif mnist16_4:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model

    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=16, integer=8, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=16, integer=8, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.6))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 2

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

        model.save( output_path +'/mnist16_4.keras')

elif mnist8_8:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model

    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=8, integer=4, alpha=1),
        bias_quantizer= quantized_bits(bits=8, integer=4, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=8, integer=4, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=4, integer=2, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.5))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=8, integer=4, alpha=1),
        bias_quantizer= quantized_bits(bits=8, integer=4, alpha=1)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 15

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

        model.save( output_path +'/mnist8_8_mxd.keras')

elif mnist8_4:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model

    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=8, integer=4, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=8, integer=4, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.5))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 15

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

        model.save( output_path +'/mnist8_4.keras')

elif mnist4_4:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model

    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=4, integer=2, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)))(x)
    x =(QActivation(quantized_relu(bits=4, integer=2, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.5))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=4, integer=2, alpha=1),
        bias_quantizer= quantized_bits(bits=4, integer=2, alpha=1)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 40

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

        model.save( output_path +'/mnist4_4.keras')

elif mnist4_2:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model
    from keras.optimizers import Adam
    from keras.callbacks import EarlyStopping
    from keras import regularizers

    # L2 weight decay; 0 makes the kernel regularizers a no-op
    weight_decay = 0
    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=2, integer=1, alpha=1),
        bias_quantizer= quantized_bits(bits=2, integer=1, alpha=1),kernel_regularizer=regularizers.l2(weight_decay)))(x)
    x =(QActivation(quantized_relu(bits=4, integer=2, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=2, integer=1, alpha=1),
        bias_quantizer= quantized_bits(bits=2, integer=1, alpha=1),kernel_regularizer=regularizers.l2(weight_decay)))(x)
    x =(QActivation(quantized_relu(bits=4, integer=2, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.05))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=2, integer=1, alpha=1),
        bias_quantizer= quantized_bits(bits=2, integer=1, alpha=1),kernel_regularizer=regularizers.l2(weight_decay)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 40

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate = 0.001), metrics=["accuracy"])
        early_stopping_monitor = EarlyStopping(
                                        monitor='accuracy',
                                        min_delta=0,
                                        patience=4,
                                        verbose=0,
                                        mode='auto',
                                        baseline=None,
                                        restore_best_weights=True
                                    )

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.25, callbacks = [early_stopping_monitor])

        model.save( output_path +'/mnist4_2.keras')

elif mnist2_2:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model
    from keras.optimizers import Adam
    from keras import regularizers

    # L2 weight decay; 0 makes the kernel regularizers a no-op
    weight_decay = 0
    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=2, integer=1, alpha=1),
        bias_quantizer= quantized_bits(bits=2, integer=1, alpha=1),kernel_regularizer=regularizers.l2(weight_decay)))(x)
    x =(QActivation(quantized_relu(bits=2, integer=1, use_sigmoid=0, negative_slope=0.0), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= quantized_bits(bits=2, integer=1, alpha=1),
        bias_quantizer= quantized_bits(bits=2, integer=1, alpha=1),kernel_regularizer=regularizers.l2(weight_decay)))(x)
    x =(QActivation(quantized_relu(bits=2, integer=1, use_sigmoid=0, negative_slope=0.0), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    #x =(Dropout(0.3))(x)
    x = QDense((10),kernel_quantizer= quantized_bits(bits=16, integer=8, alpha=1),
        bias_quantizer= quantized_bits(bits=2, integer=1, alpha=1),kernel_regularizer=regularizers.l2(weight_decay)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    batch_size = 128
    epochs = 40

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate = 0.01), metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.25)

        model.save( output_path +'/mnist_mxd_16_2.keras')

elif mnist_bin:

    from keras.layers import *
    from qkeras import *
    from qkeras.qlayers import QDense, QActivation, quantized_bits, quantized_relu
    from qkeras import QConv2D
    from keras.models import Model

    x = x_in = Input(mnist_input_shape[1:])

    x = (QConv2D(32, (3,3), padding='same',kernel_quantizer= binary(use_01=False, alpha=1, use_stochastic_rounding=False),
        bias_quantizer= binary(use_01=False, alpha=1, use_stochastic_rounding=False)))(x)
    x =(QActivation(binary(use_01=False, alpha=1, use_stochastic_rounding=False), name="act_1"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(QConv2D(32, (3,3), padding='same',kernel_quantizer= binary(use_01=False, alpha=1, use_stochastic_rounding=False),
        bias_quantizer= binary(use_01=False, alpha=1, use_stochastic_rounding=False)))(x)
    x =(QActivation(binary(use_01=False, alpha=1, use_stochastic_rounding=False), name="act_2"))(x)
    x =(MaxPool2D(pool_size=(2,2)))(x)
    x =(Flatten())(x)
    x =(Dropout(0.5))(x)
    x = QDense((10),kernel_quantizer= binary(use_01=False, alpha=1, use_stochastic_rounding=False),
        bias_quantizer= binary(use_01=False, alpha=1, use_stochastic_rounding=False)) (x)   # num_classes = 10
    x =(Activation(activation='sigmoid', name='out_activation'))(x)

    model = Model(inputs=x_in, outputs=x)

    callbacks = [
            tf.keras.callbacks.EarlyStopping(patience=10, verbose=1),
            tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1),
        ]

    batch_size = 128
    epochs = 60

    with tf.device('/GPU:0'):
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1, callbacks = callbacks)

        model.save( output_path +'/mnist_binary_1.keras')






Epoch 1/2
Epoch 2/2
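A back-of-the-envelope count shows what quantization saves in storage for the quantized architecture above. That architecture has two 3x3 'same' convolutions with 32 filters, each followed by 2x2 max pooling that takes 28x28 down to 14x14 and then 7x7, plus a Dense layer with 10 outputs. The sketch counts only stored weights and biases and ignores any file-format overhead; its helper names are hypothetical. The activation bit-widths (for example the 16-bit ReLUs of `mnist16_4`) affect the width of the arithmetic rather than the stored parameters.

```python
def conv2d_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights (kernel*kernel*in_ch*out_ch) plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


def weight_bytes(n_params: int, bits_per_param: int) -> float:
    """Storage for n_params values at the given bit-width, in bytes."""
    return n_params * bits_per_param / 8


conv1 = conv2d_params(3, 1, 32)       # 320
conv2 = conv2d_params(3, 32, 32)      # 9248
dense = dense_params(7 * 7 * 32, 10)  # 15690 (pooling and flatten have no parameters)
total = conv1 + conv2 + dense

print(total)                    # 25258
print(weight_bytes(total, 32))  # 101032.0 (float32)
print(weight_bytes(total, 4))   # 12629.0  (4-bit weights and biases, as in mnist16_4)
```

Under these assumptions, the 4-bit weights and biases of `mnist16_4` need one eighth of the float32 storage.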


In this last section, if convert_to_qonnx is True, we convert the trained QKeras model into a QONNX model, which represents the quantizers explicitly in the ONNX graph.
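QONNX represents quantization in the graph with `Quant` nodes. Each node takes a scale, a zero point and a bit width, and has `signed`, `narrow` and `rounding_mode` attributes. Below is a scalar sketch of that operator's quantize-then-dequantize semantics with `rounding_mode='ROUND'` (round half to even). One assumption has not been checked against the converter's output: a `quantized_bits(bits=4, integer=2)` quantizer corresponds to a signed 4-bit `Quant` node with scale 0.5 and zero point 0. `qonnx_quant` is a hypothetical name.

```python
def qonnx_quant(x: float, scale: float, zeropt: float, bitwidth: int,
                signed: bool = True, narrow: bool = False) -> float:
    """Scalar sketch of the QONNX Quant operator (rounding_mode='ROUND')."""
    if signed:
        q_min = -(2 ** (bitwidth - 1)) + int(narrow)
        q_max = 2 ** (bitwidth - 1) - 1
    else:
        q_min = 0
        q_max = 2 ** bitwidth - 1 - int(narrow)
    q = min(max(round(x / scale + zeropt), q_min), q_max)  # integer code, saturated
    return (q - zeropt) * scale                            # dequantized value


# Assumed mapping of quantized_bits(bits=4, integer=2): scale 0.5, signed 4-bit
print(qonnx_quant(0.8, 0.5, 0, 4))    # 1.0
print(qonnx_quant(10.0, 0.5, 0, 4))   # 3.5
print(qonnx_quant(-10.0, 0.5, 0, 4))  # -4.0
```

With `narrow=True`, the most negative code is dropped, so the signed 4-bit range becomes symmetric (-3.5 to 3.5 in this example).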

In [6]:

if convert_to_qonnx:

    from qonnx.converters import from_keras

    # Convert the trained QKeras model and save it in the output folder
    path = os.path.join(output_path, 'qonnx_model.onnx')
    print("conversion to qonnx...")
    qonnx_model, _  = from_keras(
        model,
        name="qkeras_to_qonnx_converted",
        input_signature=None,
        opset=None,
        custom_ops=None,
        custom_op_handlers=None,
        custom_rewriter=None,
        inputs_as_nchw=None,
        extra_opset=None,
        shape_override=None,
        target=None,
        large_model=False,
        output_path = path,
    )



conversion to qonnx...
