# Part 2: Using the xfDNN Quantizer to Recalibrate Models

## Introduction 

In this part of the lab, we will quantize 32-bit floating point models to Int16 or Int8 in preparation for deployment. Deploying Int16/Int8 models substantially improves inference performance and lowers latency. While floating point precision is useful in model training, deploying models in lower precision is more energy efficient and has lower latency.

The xfDNN Quantizer uses a quantization technique known as recalibration. This technique does not require full retraining of the model and can be completed in a matter of seconds, as you will see below. It also allows you to maintain the accuracy of the high-precision model.

Quantization does not alter the original high-precision model. Instead, it calculates the dynamic range of the model and produces scaling parameters, recorded in a JSON file, which the xDNN overlay uses during execution of the network/model. Quantization is an offline process that only needs to be performed once per model. The quantizer produces an optimal target quantization from a given network (prototxt and caffemodel) and calibration set (unlabeled input images) without requiring hours of retraining or a labeled dataset.
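
To make the idea of a dynamic range and a scaling parameter concrete, here is a minimal NumPy sketch of symmetric linear quantization. It is an illustration only, not the xfDNN implementation: the function names `symmetric_quantize` and `dequantize` are hypothetical, and taking the threshold as the largest absolute value is a simplifying assumption.

```python
import numpy as np

def symmetric_quantize(x, bitwidth, threshold=None):
    """Map float values to signed integer codes of the given bitwidth."""
    x = np.asarray(x, dtype=np.float64)
    # Assumption: with no threshold given, use max |x| as the dynamic range.
    if threshold is None:
        threshold = float(np.max(np.abs(x)))
    qmax = 2 ** (bitwidth - 1) - 1        # 127 for Int8, 32767 for Int16
    scale = threshold / qmax              # float value represented by one integer step
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from integer codes."""
    return q.astype(np.float64) * scale

# Synthetic stand-in for a layer's activations (not real model data).
rng = np.random.RandomState(0)
activations = rng.normal(0.0, 1.0, size=1000)

for bits in (16, 8):
    q, scale = symmetric_quantize(activations, bits)
    err = np.max(np.abs(dequantize(q, scale) - activations))
    print("Int%d: scale=%.6g, max abs error=%.6g" % (bits, scale, err))
```

The float values are never modified; only integer codes and one scale factor are derived from them, which mirrors how the quantizer leaves the original model untouched. Values inside the threshold are reproduced to within half a step, so the Int8 error is larger than the Int16 error.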

In this lab, we will quantize an optimized model generated in Part 1, defined by a Caffe prototxt and caffemodel, to Int16 and Int8. Depending on your earlier notebook, this will be either a GoogLeNet-v1 or ResNet-50 model.

Just like in Part 1, first we will run through an example, then you will get a chance to try the quantizer yourself. 

### 1. Import required packages 

In [1]:
# __future__ imports must be the first statement in the cell
from __future__ import print_function
import os,sys

# Bring in Xilinx ML-Suite Quantizer
from xfdnn.tools.quantize.quantize import CaffeFrontend as xfdnnQuantizer

### 1. Create Quantizer Instance and run it

To simplify handling of arguments, a config dictionary is used. Take a look at the dictionary below.

The arguments that need to be passed are:
- `caffemodel` - Base filename, without extension, of the optimized prototxt and caffemodel generated by the compiler (the compiler's `outmodel`).
- `quantizecfg` - Output JSON filename of quantization scaling parameters. 
- `bitwidths` - Desired quantization precision, given as [image data, weight bitwidth, conv output]. All three values must be the same. The valid options are `16` for Int16 and `8` for Int8.
- `in_shape` - Sets the desired input image size of the first layer. Images will be resized to these dimensions, which must match the network data/placeholder layer.
- `transpose` - Images start as H,W,C (H=0,W=1,C=2); transpose swaps them to C,H,W (2,0,1) for typical networks.
- `channel_swap` - Depending on network training and how images are read, swaps channels from RGB (R=0,G=1,B=2) to BGR (2,1,0).
- `raw_scale` - Depending on network training, scale pixel values before mean subtraction.
- `img_mean` - Depending on network training, subtract image mean if available.
- `input_scale` - Depending on network training, scale after subtracting mean.
- `calibration_size` - Number of images the quantizer will use to calculate the dynamic range.
- `calibration_directory` - Directory containing the images used for the calibration process.
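
The preprocessing parameters above are applied in sequence. The sketch below shows one plausible ordering in plain NumPy, assuming the image has already been resized and is held as floats in [0, 1] (which would explain a `raw_scale` of 255.0). The `preprocess` function is hypothetical and only illustrates the described steps; it is not the quantizer's own code.

```python
import numpy as np

def preprocess(image_hwc, transpose, channel_swap, raw_scale, img_mean, input_scale):
    """Apply transformer-style preprocessing to one H,W,C image with values in [0, 1]."""
    x = np.asarray(image_hwc, dtype=np.float32)
    x = x.transpose(transpose)                  # (H,W,C) -> (C,H,W)
    x = x[channel_swap, :, :]                   # reorder channels, e.g. RGB -> BGR
    x = x * raw_scale                           # scale pixel values before mean subtraction
    mean = np.asarray(img_mean, dtype=np.float32).reshape(-1, 1, 1)
    x = x - mean                                # subtract the per-channel mean
    x = x * input_scale                         # scale after mean subtraction
    return x

# Synthetic image: every pixel is R=1.0, G=0.5, B=0.0 (not a real calibration image).
pixel = np.array([1.0, 0.5, 0.0], dtype=np.float32)
image = np.ones((224, 224, 3), dtype=np.float32) * pixel

out = preprocess(
    image,
    transpose=[2, 0, 1],
    channel_swap=[2, 1, 0],
    raw_scale=255.0,
    img_mean=[104.007, 116.669, 122.679],
    input_scale=1.0,
)
print(out.shape)
print(out[:, 0, 0])  # one pixel, now in B,G,R order with the mean removed
```

Because `channel_swap` indexes the first axis, it only makes sense after `transpose` has moved the channels to the front, and the mean values must be listed in the swapped (here B,G,R) order.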

Below is an example with all the parameters filled in. `channel_swap`, `raw_scale`, `img_mean`, and `input_scale` are expert parameters that should be left at the default values shown below.

In [2]:
# Use a config dictionary to pass parameters to the quantizer
config = {}

config["caffemodel"] = "work/optimized_model" # String for naming intermediate prototxt, caffemodel

# Quantizer Arguments
#config["outmodel"] = Defined in Step 1 # String for naming intermediate prototxt, caffemodel
config["quantizecfg"] = "work/quantization_params.json" # Quantizer will generate quantization params
config["bitwidths"] = [16,16,16] # Supported quantization precision
config["in_shape"] = [3,224,224] # Images will be resized to this shape -> Needs to match prototxt
config["transpose"] = [2,0,1] # (H,W,C)->(C,H,W) transpose argument to quantizer
config["channel_swap"] = [2,1,0] # (R,G,B)->(B,G,R) Channel Swap argument to quantizer
config["raw_scale"] = 255.0
config["img_mean"] = [104.007, 116.669, 122.679] # Mean of the training set (From Imagenet)
config["input_scale"] = 1.0
config["calibration_size"] = 8 # Number of calibration images quantizer will use
config["calibration_directory"] = "../xfdnn/tools/quantize/calibration_directory" # Directory of images

quantizer = xfdnnQuantizer(
    deploy_model=config["caffemodel"]+".prototxt",        # Model filename: input file
    weights=config["caffemodel"]+".caffemodel",           # Floating Point weights
    output_json=config["quantizecfg"],                    # Quantization JSON output filename
    bitwidths=config["bitwidths"],                        # Fixed Point precision: 8,8,8 or 16,16,16
    dims=config["in_shape"],                              # Image dimensions [C,H,W]
    transpose=config["transpose"],                        # Transpose argument to caffe transformer
    channel_swap=config["channel_swap"],                  # Channel swap argument to caffe transformer
    raw_scale=config["raw_scale"],                        # Raw scale argument to caffe transformer
    mean_value=config["img_mean"],                        # Image mean per channel to caffe transformer
    input_scale=config["input_scale"],                    # Input scale argument to caffe transformer
    calibration_size=config["calibration_size"],          # Number of calibration images to use
    calibration_directory=config["calibration_directory"] # Directory containing calibration images
)

# Invoke quantizer
try:
    quantizer.quantize()

    import json
    # Read the generated JSON back and close the file when done
    with open(config["quantizecfg"]) as f:
        data = json.load(f)
    print("**********\nSuccessfully produced quantization JSON file for %d layers.\n"%len(data['network']))
except Exception as e:
    print("Failed to quantize:",e)

Mean : [104.007 116.669 122.679]
Adding ../xfdnn/tools/quantize/calibration_directory/14931486720_37bd588ce9_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/15439525724_97d7cc2c81_z.jpg to calibration batch.


  warn('`as_grey` has been deprecated in favor of `as_gray`')
  warn("The default mode, 'constant', will be changed to 'reflect' in "
  warn("Anti-aliasing will be enabled by default in skimage 0.15 to "


Adding ../xfdnn/tools/quantize/calibration_directory/3272651417_27976a64b3_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/3591612840_33710806df_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/36085792773_b9a3d115a3_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/4814953542_de4b973dc2_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/8289365270_82b20ef781_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/AdrianStoica_Rory_discdog.jpg to calibration batch.
--------------------------------------------------------------------------------
Processing layer 0 of 139
Layer Name:data Type:Input
Inputs: [], Outputs: ['data']
Quantizing layer output...
n:  32768 , len(bin_edges):  1099
Mean : th_layer_out:  150.9929962158203 , sf_layer_out:  0.00460808118582172
bw_layer_out:  16
th_layer_out:  150.9929962158203
--------------------------

bw_layer_out:  16
th_layer_out:  10.017965316772461
--------------------------------------------------------------------------------
Processing layer 18 of 139
Layer Name:res2b_branch2c Type:Convolution
Inputs: ['res2b_branch2b'], Outputs: ['res2b_branch2c']
Quantizing conv input layer ... res2b_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res2b_branch2c...
Threshold params shape= (256,)
n:  32768 , len(bin_edges):  2536
Mean : th_layer_out:  15.793405532836914 , sf_layer_out:  0.00048199119641214986
Threshold out shape= ()
n:  32768 , len(bin_edges):  2536
Mean : th_layer_out:  15.793405532836914 , sf_layer_out:  0.00048199119641214986
bw_layer_in:  16
th_layer_in:  10.017965316772461
bw_layer_out:  16
th_layer_out:  15.793405532836914
--------------------------------------------------------------------------------
Processing layer 19 of 139
Layer Name:res2b Type:Eltwise
Inputs: ['res2a_res2a_relu_0_split_1', 'res2b_branch2c'], Outputs: ['res2b']
bw_layer_in:  16


n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  17.47499656677246 , sf_layer_out:  0.0005333108483160637
bw_layer_in:  16
th_layer_in:  8.553567886352539
bw_layer_out:  16
th_layer_out:  17.47499656677246
--------------------------------------------------------------------------------
Processing layer 36 of 139
Layer Name:res3a Type:Eltwise
Inputs: ['res3a_branch1', 'res3a_branch2c'], Outputs: ['res3a']
bw_layer_in:  16
th_layer_in:  17.47499656677246
bw_layer_out:  16
th_layer_out:  17.47499656677246
--------------------------------------------------------------------------------
Processing layer 37 of 139
Layer Name:res3a_relu Type:ReLU
Inputs: ['res3a'], Outputs: ['res3a']
n:  32768 , len(bin_edges):  1794
Mean : th_layer_out:  20.071392059326172 , sf_layer_out:  0.0006125489687590006
bw_layer_out:  16
th_layer_out:  20.071392059326172
--------------------------------------------------------------------------------
Processing layer 38 of 139
Layer Name:res3a_res3a_relu_0_spl

n:  32768 , len(bin_edges):  1793
Mean : th_layer_out:  25.08395767211914 , sf_layer_out:  0.0007655249999120805
bw_layer_out:  16
th_layer_out:  25.08395767211914
--------------------------------------------------------------------------------
Processing layer 54 of 139
Layer Name:res3c_res3c_relu_0_split Type:Split
Inputs: ['res3c'], Outputs: ['res3c_res3c_relu_0_split_0', 'res3c_res3c_relu_0_split_1']
bw_layer_in:  16
th_layer_in:  25.08395767211914
bw_layer_out:  16
th_layer_out:  25.08395767211914
--------------------------------------------------------------------------------
Processing layer 55 of 139
Layer Name:res3d_branch2a Type:Convolution
Inputs: ['res3c_res3c_relu_0_split_0'], Outputs: ['res3d_branch2a']
Quantizing conv input layer ... res3d_branch2a
Threshold in shape= ()
Quantizing conv weights for layer res3d_branch2a...
Threshold params shape= (128,)
n:  32768 , len(bin_edges):  898
Mean : th_layer_out:  20.83589744567871 , sf_layer_out:  0.0006358805336368514
Threshol

bw_layer_in:  16
th_layer_in:  8.478489875793457
bw_layer_out:  16
th_layer_out:  9.1516752243042
--------------------------------------------------------------------------------
Processing layer 75 of 139
Layer Name:res4b_branch2b_relu Type:ReLU
Inputs: ['res4b_branch2b'], Outputs: ['res4b_branch2b']
n:  32768 , len(bin_edges):  635
Mean : th_layer_out:  7.449094295501709 , sf_layer_out:  0.00022733525484486554
bw_layer_out:  16
th_layer_out:  7.449094295501709
--------------------------------------------------------------------------------
Processing layer 76 of 139
Layer Name:res4b_branch2c Type:Convolution
Inputs: ['res4b_branch2b'], Outputs: ['res4b_branch2c']
Quantizing conv input layer ... res4b_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res4b_branch2c...
Threshold params shape= (1024,)
n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  10.426902770996094 , sf_layer_out:  0.0003182135310219457
Threshold out shape= ()
n:  32768 , len(bin_edges):  1269


n:  32768 , len(bin_edges):  1269
Mean : th_layer_out:  22.687528610229492 , sf_layer_out:  0.0006923895568782462
bw_layer_out:  16
th_layer_out:  22.687528610229492
--------------------------------------------------------------------------------
Processing layer 95 of 139
Layer Name:res4d_res4d_relu_0_split Type:Split
Inputs: ['res4d'], Outputs: ['res4d_res4d_relu_0_split_0', 'res4d_res4d_relu_0_split_1']
bw_layer_in:  16
th_layer_in:  22.687528610229492
bw_layer_out:  16
th_layer_out:  22.687528610229492
--------------------------------------------------------------------------------
Processing layer 96 of 139
Layer Name:res4e_branch2a Type:Convolution
Inputs: ['res4d_res4d_relu_0_split_0'], Outputs: ['res4e_branch2a']
Quantizing conv input layer ... res4e_branch2a
Threshold in shape= ()
Quantizing conv weights for layer res4e_branch2a...
Threshold params shape= (256,)
n:  32768 , len(bin_edges):  635
Mean : th_layer_out:  10.931966781616211 , sf_layer_out:  0.00033362733181604087
Th

n:  32768 , len(bin_edges):  450
Mean : th_layer_out:  8.515145301818848 , sf_layer_out:  0.0002598695425830515
Threshold out shape= ()
n:  32768 , len(bin_edges):  450
Mean : th_layer_out:  8.515145301818848 , sf_layer_out:  0.0002598695425830515
bw_layer_in:  16
th_layer_in:  24.951187133789062
bw_layer_out:  16
th_layer_out:  8.515145301818848
--------------------------------------------------------------------------------
Processing layer 114 of 139
Layer Name:res5a_branch2a_relu Type:ReLU
Inputs: ['res5a_branch2a'], Outputs: ['res5a_branch2a']
n:  32768 , len(bin_edges):  449
Mean : th_layer_out:  7.7006988525390625 , sf_layer_out:  0.00023501385090301407
bw_layer_out:  16
th_layer_out:  7.7006988525390625
--------------------------------------------------------------------------------
Processing layer 115 of 139
Layer Name:res5a_branch2b Type:Convolution
Inputs: ['res5a_branch2a'], Outputs: ['res5a_branch2b']
Quantizing conv input layer ... res5a_branch2b
Threshold in shape= ()
Q

n:  32768 , len(bin_edges):  449
Mean : th_layer_out:  6.708596706390381 , sf_layer_out:  0.0002047363721546184
bw_layer_out:  16
th_layer_out:  6.708596706390381
--------------------------------------------------------------------------------
Processing layer 133 of 139
Layer Name:res5c_branch2c Type:Convolution
Inputs: ['res5c_branch2b'], Outputs: ['res5c_branch2c']
Quantizing conv input layer ... res5c_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res5c_branch2c...
Threshold params shape= (2048,)
n:  32768 , len(bin_edges):  898
Mean : th_layer_out:  24.105989456176758 , sf_layer_out:  0.0007356788676466187
Threshold out shape= ()
n:  32768 , len(bin_edges):  898
Mean : th_layer_out:  24.105989456176758 , sf_layer_out:  0.0007356788676466187
bw_layer_in:  16
th_layer_in:  6.708596706390381
bw_layer_out:  16
th_layer_out:  24.105989456176758
--------------------------------------------------------------------------------
Processing layer 134 of 139
Layer Name:res5
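
Once the run finishes, the generated JSON file can be inspected without re-running the quantizer. The sketch below relies only on the top-level `network` key that the cell above already reads; the helper name `count_quantized_layers` is hypothetical, and nothing is assumed about the structure of the individual entries.

```python
import json
import os

def count_quantized_layers(json_path):
    """Return the sorted top-level keys and the number of entries under 'network'."""
    with open(json_path) as f:
        data = json.load(f)
    return sorted(data.keys()), len(data["network"])

# Path used for config["quantizecfg"] in this notebook.
path = "work/quantization_params.json"
if os.path.exists(path):
    keys, n_layers = count_quantized_layers(path)
    print("Top-level keys: %s" % keys)
    print("Entries under 'network': %d" % n_layers)
else:
    print("No quantization file found at %s" % path)
```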

### 2. Try it yourself by changing the quantization precision

Now that you have seen how this works, it's time to get some hands-on experience.  
Change the following from the example above:
1. Precision of quantization, by adjusting `bitwidths`

In the next code cell, set `quantizer.bitwidths` to one of the supported precisions: `[8,8,8]` or `[16,16,16]`.
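
Since all three bitwidth values must match and only 8 and 16 are listed as valid, a small guard can catch typos before a run. The helper `validate_bitwidths` below is a hypothetical convenience for this notebook, not part of the quantizer.

```python
def validate_bitwidths(bitwidths):
    """Return bitwidths as a list if it is [8, 8, 8] or [16, 16, 16]; raise otherwise."""
    values = list(bitwidths)
    if len(values) != 3:
        raise ValueError("bitwidths needs three entries: [image data, weights, conv output]")
    if len(set(values)) != 1:
        raise ValueError("all three bitwidths must be the same, got %s" % values)
    if values[0] not in (8, 16):
        raise ValueError("bitwidth must be 8 or 16, got %s" % values[0])
    return values

print(validate_bitwidths([8, 8, 8]))
```

It could be used as `quantizer.bitwidths = validate_bitwidths([8, 8, 8])`.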

In [None]:
# Since we already have an instance of the quantizer, you can just update these params:

quantizer.bitwidths = [8,8,8] 

# Invoke quantizer
try:
    quantizer.quantize()

    import json
    # Read the generated JSON back and close the file when done
    with open(config["quantizecfg"]) as f:
        data = json.load(f)
    print("**********\nSuccessfully produced quantization JSON file for %d layers.\n"%len(data['network']))
except Exception as e:
    print("Failed to quantize:",e)

Mean : [104.007 116.669 122.679]
Adding ../xfdnn/tools/quantize/calibration_directory/13923040300_b4c8521b4d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/15439525724_97d7cc2c81_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/16247716843_b419e8b111_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/4788821373_441cd29c9f_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/7291910830_86a8ebb15d_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/7647574936_ffebfa2bea_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/8289365270_82b20ef781_z.jpg to calibration batch.
Adding ../xfdnn/tools/quantize/calibration_directory/AdrianStoica_Rory_discdog.jpg to calibration batch.
--------------------------------------------------------------------------------
Processing layer 0 of 139
Layer Name:data Type:Input
Inpu

Mean : th_layer_out:  5.799498172024447 , sf_layer_out:  0.04566533993720037
bw_layer_out:  8
th_layer_out:  5.799498172024447
--------------------------------------------------------------------------------
Processing layer 16 of 139
Layer Name:res2b_branch2b Type:Convolution
Inputs: ['res2b_branch2a'], Outputs: ['res2b_branch2b']
Quantizing conv input layer ... res2b_branch2b
Threshold in shape= ()
Quantizing conv weights for layer res2b_branch2b...
Threshold params shape= (64,)
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  7.027280066667672 , sf_layer_out:  0.05533291391076907
Threshold out shape= ()
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  7.027280066667672 , sf_layer_out:  0.05533291391076907
bw_layer_in:  8
th_layer_in:  5.799498172024447
bw_layer_out:  8
th_layer_out:  7.027280066667672
--------------------------------------------------------------------------------
Processing layer 17 of 139
Layer Name:res2b_branch2b_relu Type:ReLU
Inputs: ['res2b_branch2b'

Mean : th_layer_out:  6.247526285241679 , sf_layer_out:  0.049193120356233695
bw_layer_out:  8
th_layer_out:  6.247526285241679
--------------------------------------------------------------------------------
Processing layer 33 of 139
Layer Name:res3a_branch2b Type:Convolution
Inputs: ['res3a_branch2a'], Outputs: ['res3a_branch2b']
Quantizing conv input layer ... res3a_branch2b
Threshold in shape= ()
Quantizing conv weights for layer res3a_branch2b...
Threshold params shape= (128,)
n:  128 , len(bin_edges):  898
Mean : th_layer_out:  6.681866611259571 , sf_layer_out:  0.05261312292330371
Threshold out shape= ()
n:  128 , len(bin_edges):  898
Mean : th_layer_out:  6.681866611259571 , sf_layer_out:  0.05261312292330371
bw_layer_in:  8
th_layer_in:  6.247526285241679
bw_layer_out:  8
th_layer_out:  6.681866611259571
--------------------------------------------------------------------------------
Processing layer 34 of 139
Layer Name:res3a_branch2b_relu Type:ReLU
Inputs: ['res3a_branch2b'

Mean : th_layer_out:  7.601435224348087 , sf_layer_out:  0.05985382066415817
bw_layer_in:  8
th_layer_in:  4.082589310194765
bw_layer_out:  8
th_layer_out:  7.601435224348087
--------------------------------------------------------------------------------
Processing layer 50 of 139
Layer Name:res3c_branch2b_relu Type:ReLU
Inputs: ['res3c_branch2b'], Outputs: ['res3c_branch2b']
n:  128 , len(bin_edges):  897
Mean : th_layer_out:  5.853347948619298 , sf_layer_out:  0.0460893539261362
bw_layer_out:  8
th_layer_out:  5.853347948619298
--------------------------------------------------------------------------------
Processing layer 51 of 139
Layer Name:res3c_branch2c Type:Convolution
Inputs: ['res3c_branch2b'], Outputs: ['res3c_branch2c']
Quantizing conv input layer ... res3c_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res3c_branch2c...
Threshold params shape= (512,)
n:  128 , len(bin_edges):  1793
Mean : th_layer_out:  10.271786723818098 , sf_layer_out:  0.08088021042

n:  128 , len(bin_edges):  635
Mean : th_layer_out:  6.654018668346224 , sf_layer_out:  0.052393847782253734
bw_layer_in:  8
th_layer_in:  5.611795273489005
bw_layer_out:  8
th_layer_out:  6.654018668346224
--------------------------------------------------------------------------------
Processing layer 67 of 139
Layer Name:res4a_branch2b_relu Type:ReLU
Inputs: ['res4a_branch2b'], Outputs: ['res4a_branch2b']
n:  128 , len(bin_edges):  635
Mean : th_layer_out:  4.844061912798355 , sf_layer_out:  0.03814221978581382
bw_layer_out:  8
th_layer_out:  4.844061912798355
--------------------------------------------------------------------------------
Processing layer 68 of 139
Layer Name:res4a_branch2c Type:Convolution
Inputs: ['res4a_branch2b'], Outputs: ['res4a_branch2c']
Quantizing conv input layer ... res4a_branch2c
Threshold in shape= ()
Quantizing conv weights for layer res4a_branch2c...
Threshold params shape= (1024,)
n:  128 , len(bin_edges):  1269
Mean : th_layer_out:  6.1293173738834
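
In the logs above, each `sf_layer_out` appears to equal `th_layer_out` divided by the largest signed integer for the chosen bitwidth (32767 for Int16, 127 for Int8). This is an observation from the printed numbers, not documented behavior; the sketch below checks it against a few pairs copied from the two runs.

```python
import math

# (bitwidth, th_layer_out, sf_layer_out) copied from the Int16 and Int8 logs above.
logged = [
    (16, 150.9929962158203, 0.00460808118582172),
    (16, 15.793405532836914, 0.00048199119641214986),
    (8, 5.799498172024447, 0.04566533993720037),
    (8, 7.027280066667672, 0.05533291391076907),
]

results = []
for bits, th, sf in logged:
    qmax = 2 ** (bits - 1) - 1          # largest positive signed integer code
    predicted = th / qmax               # assumed relation: scale = threshold / qmax
    ok = math.isclose(predicted, sf, rel_tol=1e-6)
    results.append(ok)
    print("Int%d: th/qmax=%.10g, logged sf=%.10g, match=%s" % (bits, predicted, sf, ok))
```

If the relation holds, the scale factor is the float value of one integer step, which is why the Int8 scale factors are much coarser than the Int16 ones for thresholds of similar size.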

Well done! That concludes Part 2. Now you are ready to put Parts 1 and 2 together and deploy a network/model.

## [Part 3: Putting it all together: Compile, Quantize and Deploy][]

[Part 3: Putting it all together: Compile, Quantize and Deploy]: image_classification_caffe.ipynb