
Difference between Qkeras model and Keras model #116

Open
sandeep1404 opened this issue Apr 3, 2023 · 9 comments



sandeep1404 commented Apr 3, 2023

I have two models: a baseline Keras model and its equivalent QKeras model, both taken from QKerasTutorial.ipynb.
My Keras model is shown below:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_1 (Conv2D)           (None, 26, 26, 18)        180       
                                                                 
 act_1 (Activation)          (None, 26, 26, 18)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 24, 24, 32)        5216      
                                                                 
 act_2 (Activation)          (None, 24, 24, 32)        0         
                                                                 
 flatten (Flatten)           (None, 18432)             0         
                                                                 
 dense (Dense)               (None, 10)                184330    
                                                                 
 softmax (Activation)        (None, 10)                0         
                                                                 
=================================================================
Total params: 189,726
Trainable params: 189,726
Non-trainable params: 0
_________________________________________________________________

My equivalent QKeras model is:

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_1 (QConv2D)          (None, 26, 26, 18)        180       
                                                                 
 act_1 (QActivation)         (None, 26, 26, 18)        0         
                                                                 
 conv2d_2 (QConv2D)          (None, 24, 24, 32)        5216      
                                                                 
 act_2 (QActivation)         (None, 24, 24, 32)        0         
                                                                 
 flatten (Flatten)           (None, 18432)             0         
                                                                 
 dense (QDense)              (None, 10)                184330    
                                                                 
 softmax (Activation)        (None, 10)                0         
                                                                 
=================================================================
Total params: 189,726
Trainable params: 189,726
Non-trainable params: 0
_________________________________________________________________
qmodel.save('comparison_models/qkeras_model.h5')

I cannot see a difference in model size between the two models: both have the same size on disk, even though the weights are quantized in the QKeras model when I check each layer's weights. My questions: where can we observe the actual difference between the two models, and what metrics capture it? Usually a quantized model should run inference faster than an unquantized Keras model, but I observed slower training and inference for the quantized model. Is that expected, given that inference should be faster for QKeras models? What are the key metrics that expose the differences between the baseline Keras model and the quantized QKeras model at the software level? We can spot differences in inference time and model size once we port them to hardware (FPGA), but during software simulation how can we see the difference, when the model size stays the same and inference time does not give a clear picture? Thanks in advance.
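For reference, one minimal sketch for confirming the quantization at the software level: QKeras stores latent float weights and applies the quantizer on the forward pass, so applying each layer's kernel quantizer (QConv2D/QDense expose `get_quantizers()`) shows the restricted value grid, while the saved .h5 stays float. The exact printout below is illustrative, not from the original thread.

```python
import numpy as np

# qmodel is the QKeras model from the summary above.
for layer in qmodel.layers:
    get_q = getattr(layer, "get_quantizers", None)
    if get_q is None or not layer.get_weights():
        continue  # skip InputLayer/Flatten/Activation layers
    kernel_quantizer = get_q()[0]
    if kernel_quantizer is None:
        continue
    latent = layer.get_weights()[0]            # float "latent" weights
    quantized = np.array(kernel_quantizer(latent))
    print(layer.name,
          "latent uniques:", len(np.unique(latent)),
          "quantized uniques:", len(np.unique(quantized)))
```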

@jurevreca12

QKeras is a library for quantization-aware training. It still uses float parameters and activations while training; it just simulates quantization by restricting each float tensor to the set of values representable with fixed-point parameters. That is why you will not see a difference in model size when saving to .h5, and also why training a model in QKeras is slower than training a normal model. QKeras does not currently provide a way to deploy these models to a CPU, where they would use actual n-bit parameters. There are, however, tools such as hls4ml that can deploy such a model to an FPGA circuit.
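To make the "simulated quantization" concrete, here is a minimal sketch (the 4-bit settings are just an example, not taken from the thread): a QKeras quantizer snaps a float tensor onto a fixed-point grid, but the result is still a float tensor.

```python
import numpy as np
import tensorflow as tf
from qkeras import quantized_bits

# 4-bit fixed-point quantizer: outputs are restricted to one of
# 2**4 representable levels, but the dtype remains float32.
quantizer = quantized_bits(bits=4, integer=0, alpha=1)

x = tf.constant(np.linspace(-1.0, 1.0, 9), dtype=tf.float32)
print(x.numpy())             # arbitrary float values
print(quantizer(x).numpy())  # still float32, but only 16 possible levels
```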

@sandeep1404
Author

Hi @jurevreca12, thanks for your reply. So to see the actual difference between the QKeras model and the unquantized baseline Keras model in terms of inference time and model size (utilization), I need to port the model to an FPGA using the hls4ml tool, am I right? Is there any other way to measure inference time and utilization at the software level without actually porting to an FPGA, e.g., any memory-profiling tools that can tell the difference between the quantized and unquantized models in terms of inference time and resource utilization?

@jurevreca12

QKeras can quantize to arbitrary n bits. This doesn't map well onto processors, which typically have ALUs supporting 8-, 16-, and 32-bit operations. That doesn't mean you can't run a model quantized to, say, 3 bits on a CPU, but it is really not straightforward. If you are looking to run your model on a CPU, I suggest you use another quantization library (e.g. https://www.tensorflow.org/model_optimization/guide/quantization/training). It quantizes to 8 bits, but you will be able to deploy the model on the CPU.
On an FPGA you can make the ALU any size you want, which is why it is a good fit for QKeras.
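For illustration, a minimal sketch of that 8-bit path using the TensorFlow Model Optimization Toolkit and TFLite (the fine-tuning call is elided; `model` is an existing float Keras model). Unlike a QKeras .h5, the converted TFLite file genuinely stores 8-bit weights, so the size and CPU-inference differences become measurable:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap the float model with fake-quantization nodes, then fine-tune.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
# q_aware_model.fit(x_train, y_train, epochs=1)

# Convert to a TFLite flatbuffer with real int8 weights; this file is
# roughly 4x smaller than the float .h5 and runs int8 kernels on CPU.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
open("model_int8.tflite", "wb").write(tflite_model)
```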

@sandeep1404
Author

@jurevreca12 Thanks for your answer, I have one more query. I actually want to train a binary neural network (BNN) with 1-bit weights and activations using QKeras, and I have trained such a model. If I now want to observe inference time and model size, I cannot do that on a CPU, correct, since the CPU architecture is not designed for 1-bit precision? Please correct me if I am wrong. So to observe inference time for the BNN model, or for any other precision that is not a natural fit for the CPU architecture, such as ternary, 3-bit, or 6-bit, I need to port the model to an FPGA using hls4ml and then measure model size and inference time, right?

@jurevreca12

You can deploy binarized neural networks to a CPU (binarized networks use the XNOR operation, which CPUs do support), but QKeras doesn't let you directly generate a CPU implementation. For binarized neural networks you can also train a network with the Larq library. I believe they even have a CPU deployment engine, although I am not sure if it is openly available (https://docs.larq.dev/compute-engine/).
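For reference, a minimal binarized-layer sketch using Larq's documented quantizer names (the layer sizes are illustrative only); `lq.models.summary` reports the 1-bit vs 32-bit parameter breakdown, which is one software-level way to see the size difference:

```python
import tensorflow as tf
import larq as lq

# ste_sign binarizes weights/activations to {-1, +1} in the forward
# pass and uses a straight-through estimator for the gradient.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    lq.layers.QuantDense(10,
                         input_quantizer="ste_sign",
                         kernel_quantizer="ste_sign",
                         kernel_constraint="weight_clip"),
    tf.keras.layers.Activation("softmax"),
])
lq.models.summary(model)  # per-layer memory in 1-bit vs 32-bit params
```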

@sandeep1404
Author

Thanks @jurevreca12, thank you for answering all my queries patiently. I have worked with the Larq platform for training BNNs before, but those models cannot be ported to an FPGA with hls4ml, since hls4ml doesn't support Larq. So if I want a comparative analysis of my model in terms of model size and inference time across precisions from 32-bit (baseline) through 16-, 8-, 6-, 4-, 3-, 2-, and 1-bit using QKeras, I cannot do it on a CPU; the only way is to port the models to an FPGA using the hls4ml tool flow. Am I correct? Please correct me if I am wrong, or is there any other way to get a comparative analysis of the models in terms of model size and inference time at different bit precisions without porting them to hardware (FPGA)?

@Prince5867

Hello, may I ask which versions of Keras and QKeras you are using? I am using the latest version, TF 2.11, and there is an incompatibility problem.

@sandeep1404
Author

Hi, I am using QKeras==0.9.0 and TensorFlow==2.12.0, and I didn't face any incompatibility problem. If there is anything, please let us know.

@Prince5867

> Hi, I am using QKeras==0.9.0 and TensorFlow==2.12.0, and I didn't face any incompatibility problem. If there is anything, please let us know.

Strangely, I was using TF version 2.11.2 at the time, and installing QKeras would uninstall my Keras first; but now that I have tried your TF version, the problem has disappeared.
