Refactor Quantization Aware Training of TF backend (#250)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
zehao-intel committed Dec 9, 2022
1 parent c61be34 commit 1deb7d2
Showing 36 changed files with 2,064 additions and 300 deletions.
3 changes: 3 additions & 0 deletions .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2436,6 +2436,9 @@ npmjs
AWSSageMakerSupport
sagemaker
xpu
dgpu
BenchmarkConfig
QuantizationAwareTrainingConfig
Startup
doesn
startup
14 changes: 14 additions & 0 deletions examples/.config/model_params_tensorflow.json
@@ -1944,6 +1944,13 @@
"batch_size": 1,
"new_benchmark": false
},
"mnist_keras": {
"model_src_dir": "image_recognition/keras_models/mnist/quantization/qat",
"dataset_location": "",
"input_model": "/tf_dataset2/models/tensorflow/mnist_keras/saved_model/",
"main_script": "main.py",
"batch_size": 32
},
"resnet50_fashion": {
"model_src_dir": "image_recognition/keras_models/resnet50_fashion/quantization/ptq",
"dataset_location": "/tf_dataset2/datasets/mnist/FashionMNIST",
@@ -1962,6 +1969,13 @@
"batch_size": 1,
"new_benchmark": true
},
"resnet50_keras_qat": {
"model_src_dir": "image_recognition/keras_models/resnet50/quantization/qat",
"dataset_location": "/tf_dataset/dataset/imagenet",
"input_model": "/tf_dataset2/models/tensorflow/resnet50_keras/resnet50",
"main_script": "main.py",
"batch_size": 32
},
"resnet50_keras_h5": {
"model_src_dir": "image_recognition/keras_models/resnet50/quantization/ptq",
"dataset_location": "/tf_dataset/dataset/imagenet",
@@ -1,56 +1,118 @@
Step-by-Step
============

This document describes how to apply QAT to TensorFlow Keras models using Intel® Neural Compressor.
This example can run on Intel CPUs and GPUs.


## Prerequisite

### 1. Installation
```shell
# Install Intel® Neural Compressor
pip install neural-compressor
```
### 2. Install requirements
TensorFlow and Intel Extension for TensorFlow are mandatory for running this QAT example.
Intel Extension for TensorFlow for Intel CPUs is installed by default.
```shell
pip install -r requirements.txt
```
> Note: Supported TensorFlow [versions](../../../../../../../README.md).
### 3. Benchmarking the model on Intel GPU (Optional)

To run a benchmark of the model on Intel GPUs, Intel Extension for TensorFlow for Intel GPUs is required.

```shell
pip install --upgrade intel-extension-for-tensorflow[gpu]
```
Please refer to the [Installation Guides](https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-focal-dc.html) for the latest Intel GPU driver installation.
For more details, please follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).

### 4. Prepare Pretrained model

The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). To prepare the model, run:
```
python prepare_model.py --output_model=/path/to/model
```
`--output_model` specifies where the model is saved, in SavedModel or H5 format.

### 5. Prepare QAT model

Run the `qat.py` script to get the QAT model, which is in fact an fp32 model with quant/dequant pairs inserted.
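
A quant/dequant pair keeps a tensor in fp32 but rounds it through a simulated int8 grid during training, so the network learns to tolerate quantization error. Below is a minimal illustration of the idea (not the exact operators Intel® Neural Compressor inserts), using TensorFlow's built-in fake-quant op:

```python
import tensorflow as tf

# A toy fp32 tensor.
x = tf.constant([[0.12, -1.3], [2.7, 0.45]])

# Quantize to 8-bit levels within [min, max], then immediately dequantize
# back to fp32: the values stay floats, but carry int8 rounding error.
x_qdq = tf.quantization.fake_quant_with_min_max_args(x, min=-3.0, max=3.0, num_bits=8)
print(x_qdq)
```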

## Run Command
```shell
bash run_tuning.sh --input_model=./path/to/model --output_model=./result
bash run_benchmark.sh --input_model=./path/to/model --mode=performance --batch_size=32
```
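`--mode` accepts `performance` or `accuracy`; the benchmark script asserts one of these two values (see the benchmark code below).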

Details of enabling Intel® Neural Compressor to apply QAT
=========================

This is a tutorial on how to apply QAT with Intel® Neural Compressor.
## User Code Analysis
1. The user specifies the fp32 *model* to quantize; the dataset is downloaded automatically. In this step, QDQ patterns are inserted into the Keras model, but the fp32 model is not yet converted to an int8 model.

2. The user specifies the *model* with QDQ patterns inserted and an evaluation function to run the benchmark. The model obtained from the previous step runs on the ITEX backend, where it is fused and then inferred.

### Quantization Config
The QuantizationAwareTrainingConfig class has default parameter settings for running on Intel CPUs. If running this example on Intel GPUs, the 'backend' parameter should be set to 'itex' and the 'device' parameter should be set to 'gpu'.

```python
from neural_compressor import QuantizationAwareTrainingConfig

config = QuantizationAwareTrainingConfig(
    device="gpu",
    backend="itex",
    ...
)
```

### Code update

After the preparation steps are done, we add quantization and benchmark code to generate the quantized model and run the benchmark.

#### Tune
```python
import tensorflow as tf
from neural_compressor import training, QuantizationAwareTrainingConfig

# FLAGS, logger, and the dataset arrays (train_images, train_labels,
# test_images, test_labels) are defined elsewhere in the example script.
logger.info('start quantizing the model...')

config = QuantizationAwareTrainingConfig()
# create a compression_manager instance to implement QAT
compression_manager = training.prepare_compression(FLAGS.input_model, config)
# QDQ patterns will be inserted into the input keras model
compression_manager.callbacks.on_train_begin()
# get the model with QDQ patterns inserted
q_aware_model = compression_manager.model.model

# training code defined by users
q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])
q_aware_model.summary()
train_images_subset = train_images[0:1000]
train_labels_subset = train_labels[0:1000]
q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=1, validation_split=0.1)
_, q_aware_model_accuracy = q_aware_model.evaluate(
    test_images, test_labels, verbose=0)
print('Quant test accuracy:', q_aware_model_accuracy)

# apply some post process steps and save the output model
compression_manager.callbacks.on_train_end()
compression_manager.save(FLAGS.output_model)
```
#### Benchmark
```python
from neural_compressor.benchmark import fit
from neural_compressor.experimental import common
from neural_compressor.config import BenchmarkConfig
assert FLAGS.mode == 'performance' or FLAGS.mode == 'accuracy', \
    "Benchmark only supports performance or accuracy mode."

# convert the quantized keras model to graph_def so that it can be fused by ITEX
model = common.Model(FLAGS.input_model).graph_def
if FLAGS.mode == 'performance':
    conf = BenchmarkConfig(cores_per_instance=4, num_of_instance=7)
    fit(model, conf, b_func=evaluate)
elif FLAGS.mode == 'accuracy':
    accuracy = evaluate(model)
    print('Batch size = %d' % FLAGS.batch_size)
    print("Accuracy: %.5f" % accuracy)
```
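
The `evaluate` callable passed to `fit` is defined by the user elsewhere in the example. A minimal sketch of what such a function might look like, where `dataloader` and `infer` are hypothetical stand-ins for the example's data pipeline and graph-execution helper:

```python
import time
import numpy as np

def evaluate(model):
    """Run the (fused) model over a dataloader and return top-1 accuracy."""
    correct, total = 0, 0
    start = time.time()
    for images, labels in dataloader:   # hypothetical user-defined dataloader
        logits = infer(model, images)   # hypothetical inference helper, e.g. a
                                        # tf.compat.v1.Session run over the graph_def
        preds = np.argmax(logits, axis=1)
        correct += int((preds == np.asarray(labels)).sum())
        total += len(labels)
    print('evaluation took %.2f s' % (time.time() - start))
    return correct / total
```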
