# Fixed-Point Quantisation of CNN

This tutorial introduces fixed-point quantisation of CNNs using our Plumber tool-chain.

### Fixed point or Q representation

To represent a non-integer (fractional) number, a developer usually has two options. The first is floating-point representation, which supports a trade-off between numerical range and precision. The name gives away its main property: the point separating the integer part from the fractional part floats rather than staying fixed. However, this data type and its arithmetic are challenging to implement efficiently in hardware unless the processing device has enough space and resources for a dedicated *F*loating *P*oint *U*nit (FPU).

That is why most low-power, low-performance embedded devices that require constant resolution use fixed-point representation, also called *Q*-representation. A number is stored in a fixed total number of bits, be it 16, 32 or 64. These bits are split into two parts by an imaginary binary point: the first part holds the _Integer_ part (IP) and the second holds the _Fractional_ part. For example, given 16 bits in total, a Q16 number has 16 fractional bits and no integer bits; a Q2.14 number has 2 integer bits and 14 fractional bits. Note that to represent signed numbers, one bit of the integer part is usually reserved for the sign.
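To make the split concrete, the sketch below interprets 16-bit words under different formats. The helper `decode_q` is a hypothetical name introduced here for illustration and is not part of Plumber. It assumes two's complement for signed values, with the sign bit counted among the integer bits.

```python
def decode_q(raw, int_bits, frac_bits, signed=False):
    """Interpret the integer `raw` as a fixed-point number (hypothetical helper)."""
    total_bits = int_bits + frac_bits
    # Keep only the bits that belong to the word
    raw &= (1 << total_bits) - 1
    # Two's complement (assumed here): a set top bit means the value is negative
    if signed and raw >= 1 << (total_bits - 1):
        raw -= 1 << total_bits
    # Placing the binary point is a division by 2**frac_bits
    return raw / (1 << frac_bits)

pattern = 0b0110000000000000  # one 16-bit word, read under two formats

as_uq2_14 = decode_q(pattern, 2, 14)  # binary point after the top 2 bits
as_uq8_8 = decode_q(pattern, 8, 8)    # binary point after the top 8 bits
as_sq8_8_negative = decode_q(0xFF80, 8, 8, signed=True)

print(as_uq2_14, as_uq8_8, as_sq8_8_negative)
```

The same bit pattern yields different values depending only on where the binary point is assumed to be, which is why the format has to be agreed on by every piece of code that touches the data.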

This representation has its pros and cons. On the one hand, it is very easy to [implement](https://en.wikipedia.org/wiki/Q_(number_format)#Math_operations) in low-level designs, giving improved performance and lower power consumption. On the other hand, its precision is limited. Let's see that in an example.

In [13]:
import numpy as np

# Number of fractional bits
f = 2

# Scale factor 2**f, applied to the input and removed again from the output
scale = 1 << f

a = np.linspace(1, 2, 10)
# Scale up, round to the nearest integer, then scale back down
a_fix = np.round(a * scale) * (1.0 / scale)

print(a)
print(a_fix)

[1.         1.11111111 1.22222222 1.33333333 1.44444444 1.55555556
 1.66666667 1.77777778 1.88888889 2.        ]
[1.   1.   1.25 1.25 1.5  1.5  1.75 1.75 2.   2.  ]
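The rounding seen above is the price paid for cheap arithmetic: once values are stored as scaled integers, addition and multiplication reduce to plain integer operations. The following is a minimal sketch of that idea for Q8.8, not a description of how Plumber implements it. The names `FRAC_BITS`, `to_q` and `to_float` are hypothetical, and overflow handling is deliberately left out.

```python
FRAC_BITS = 8  # Q8.8: values are stored as integers scaled by 2**8

def to_q(x):
    """Float -> raw scaled integer (hypothetical helper)."""
    return int(round(x * (1 << FRAC_BITS)))

def to_float(q):
    """Raw scaled integer -> float (hypothetical helper)."""
    return q / (1 << FRAC_BITS)

a = to_q(1.5)
b = to_q(2.25)

# Addition: both operands share the scale 2**8, so the raw integers add directly
q_sum = a + b

# Multiplication: the raw product carries the scale 2**16,
# so shift right by 8 bits to return to Q8.8 (this truncates, it does not round)
q_prod = (a * b) >> FRAC_BITS

print(to_float(q_sum), to_float(q_prod))
```

No floating-point unit is needed for either operation, only an integer adder, an integer multiplier and a shifter.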


Let's look at the range of the numbers and their resolution, e.g. in *U* (unsigned) Q2.14 and *S* (signed, i.e. one bit of the integer part represents the sign) Q8.8:

In [14]:
# In total we have 16 bits to operate on
# UQ2.14
i = 2
f = 14
print("UQ2.14")
print("Range is: {} to {}".format(0, 2**i-2**(-f)))
print("Resolution is: {}".format(2**(-f)))

# SQ8.8
i = 8
f = 8
print("SQ8.8")
print("Range is: {} to {}".format(-2**(i-1), 2**(i-1)-2**(-f)))
print("Resolution is: {}".format(2**(-f)))

UQ2.14
Range is: 0 to 3.99993896484375
Resolution is: 6.103515625e-05
SQ8.8
Range is: -128 to 127.99609375
Resolution is: 0.00390625


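This toy example shows how much the resolution can differ between two formats of the same 16-bit width: spending more bits on the integer part widens the range but coarsens the resolution.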
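The printed ranges follow a general pattern. As a summary, for $m$ integer bits and $n$ fractional bits, and assuming (as in the code above) that the sign bit of a signed format is counted among the $m$ integer bits:

$$
\begin{aligned}
\text{UQ}m.n &: \quad 0 \le x \le 2^{m} - 2^{-n} \\
\text{SQ}m.n &: \quad -2^{m-1} \le x \le 2^{m-1} - 2^{-n} \\
\text{resolution} &= 2^{-n}
\end{aligned}
$$

For UQ2.14 and SQ8.8 the resolutions therefore differ by a factor of $2^{-8} / 2^{-14} = 2^{6} = 64$.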

##### Float to Q
To convert a number from floating point to Qm.n format:
1. Multiply the floating-point number by 2<sup>n</sup>, which is equivalent to shifting it left by *n* binary places
2. Round to the nearest integer

##### Q to float
To convert a number from Qm.n format back to floating point:
1. Interpret the stored bits as an integer (in other words, ignore the binary point) and convert that integer to floating point
2. Multiply by 2<sup>-n</sup>

In [20]:
# Given that we have the UQ8.8 format
m = 8  # integer bits
n = 8  # fractional bits
f = 2.1
# Step 1: scale by 2**n; step 2: round to the nearest integer
q = f * 2**n
rounded = round(q)
print("The number in Q format is: {} and its binary representation is then {}".format(rounded, bin(rounded)))
print("The number back in float is: {}".format(rounded*2**(-n)))

The number in Q format is: 538 and its binary representation is then 0b1000011010
The number back in float is: 2.1015625


This is another problem with this representation: every conversion from one format to another loses precision and, eventually, accuracy. In the example above, 2.1 came back as 2.1015625.
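To get a feel for the size of this loss, the sketch below applies the two conversion recipes to a whole array and compares two unsigned formats. The helpers `quantise` and `dequantise` are hypothetical names, not Plumber functions, and the input values are made up for illustration. Saturating out-of-range values is an assumption added here; the recipes above do not specify what happens on overflow.

```python
import numpy as np

def quantise(x, int_bits, frac_bits):
    """Float array -> raw integers in an unsigned UQ format (hypothetical helper)."""
    raw = np.round(np.asarray(x, dtype=np.float64) * 2.0**frac_bits)
    # Assumption: values outside the representable range saturate
    max_raw = 2**(int_bits + frac_bits) - 1
    return np.clip(raw, 0, max_raw).astype(np.int64)

def dequantise(raw, frac_bits):
    """Raw integers -> float array (hypothetical helper)."""
    return raw * 2.0**(-frac_bits)

x = np.array([0.1, 0.7, 2.1, 3.3])  # made-up values, all within both ranges

errors = {}
for int_bits, frac_bits in [(8, 8), (4, 4)]:
    x_hat = dequantise(quantise(x, int_bits, frac_bits), frac_bits)
    errors[(int_bits, frac_bits)] = np.max(np.abs(x - x_hat))
    # For in-range values, rounding to nearest keeps the error within half a step
    print("UQ{}.{}: max error {:.6f}, half step {:.6f}".format(
        int_bits, frac_bits, errors[(int_bits, frac_bits)], 2.0**(-frac_bits - 1)))
```

With round-to-nearest, the round-trip error of an in-range value is at most half the resolution, so each fractional bit removed doubles the worst-case error.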