diff --git a/README.md b/README.md
index f734b177dcb..e4c2f798579 100755
--- a/README.md
+++ b/README.md
@@ -17,8 +17,6 @@ Intel® Neural Compressor
 Intel® Neural Compressor, formerly known as Intel® Low Precision Optimization Tool, is an open-source Python library that runs on Intel CPUs and GPUs, which delivers unified interfaces across multiple deep-learning frameworks for popular network compression technologies such as quantization, pruning, and knowledge distillation. This tool supports automatic accuracy-driven tuning strategies to help the user quickly find out the best quantized model. It also implements different weight-pruning algorithms to generate a pruned model with predefined sparsity goal. It also supports knowledge distillation to distill the knowledge from the teacher model to the student model. 
 Intel® Neural Compressor is a critical AI software component in the [Intel® oneAPI AI Analytics Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).
 
-> **Note:**
-> GPU support is under development.
 
 **Visit the Intel® Neural Compressor online document website at: <https://intel.github.io/neural-compressor>.**   
 
@@ -107,6 +105,10 @@ Intel® Neural Compressor supports systems based on [Intel 64 architecture or co
 * Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, and Icelake)
 * Future Intel Xeon Scalable processor (code name Sapphire Rapids)
 
+Intel® Neural Compressor supports the following Intel GPUs built on Intel's Xe architecture:
+
+* [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html)
+
 ### Validated Software Environment
 
 * OS version: CentOS 8.4, Ubuntu 20.04  
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/README.md b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/README.md
index 6b623a13c62..8be431fe876 100644
--- a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow SavedModel format using Intel® Neural Compressor for performance only.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow >= 2.4.0.
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow support for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 Download the model from tensorflow-hub.
 
 image recognition
@@ -25,6 +42,8 @@ image recognition
 - [mobilenetv2](https://hub.tensorflow.google.cn/google/imagenet/mobilenet_v2_035_224/classification/5)
 - [efficientnet_v2_b0](https://hub.tensorflow.google.cn/google/imagenet/efficientnet_v2_imagenet1k_b0/classification/2)
 
+## Write YAML config file
+The examples directory contains mobilenet_v1.yaml, mobilenet_v2.yaml and efficientnet_v2_b0.yaml for tuning the model on Intel CPUs; in these files, 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu' in the YAML file. The files mobilenet_v1_itex.yaml, mobilenet_v2_itex.yaml and efficientnet_v2_b0_itex.yaml are prepared for the GPU case. Most items can be removed, keeping only the mandatory ones needed for tuning. We also implement a calibration dataloader and provide an evaluation field that neural_compressor uses internally to create the evaluation function.
 
 ## Run Command
   ```shell
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0.yaml b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0.yaml
index c4dcbaf0272..bce410a1004 100644
--- a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0.yaml
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. neural_compres
   name: efficientnet_v2_b0
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 5, 10, 50, 100                    # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0_itex.yaml b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0_itex.yaml
new file mode 100644
index 00000000000..05aa4502599
--- /dev/null
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/efficientnet_v2_b0_itex.yaml
@@ -0,0 +1,89 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. neural_compressor uses this model name and framework name to decide where to save tuning history and deploy yaml.
+  name: efficientnet_v2_b0
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 5, 10, 50, 100                    # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
+    dataloader:
+      dataset:
+        ImagenetRaw:
+          data_path: /path/to/calibration/dataset     # NOTE: modify to calibration dataset location if needed
+          image_list: /path/to/calibration/label      # data file, record image_names and their labels
+      transform:
+        PaddedCenterCrop:
+          size: 224
+          crop_padding: 32
+        Resize:
+          size: 224
+          interpolation: bicubic
+        Normalize:
+          mean: [123.675, 116.28, 103.53]
+          std: [58.395, 57.12, 57.375]
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImagenetRaw:
+          data_path: /path/to/evaluation/dataset     # NOTE: modify to evaluation dataset location if needed
+          image_list: /path/to/evaluation/label      # data file, record image_names and their labels
+      transform:
+        PaddedCenterCrop:
+          size: 224
+          crop_padding: 32
+        Resize:
+          size: 224
+          interpolation: bicubic
+        Normalize:
+          mean: [123.675, 116.28, 103.53]
+          std: [58.395, 57.12, 57.375]
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImagenetRaw:
+          data_path: /path/to/evaluation/dataset     # NOTE: modify to evaluation dataset location if needed
+          image_list: /path/to/evaluation/label      # data file, record image_names and their labels
+      transform:
+        PaddedCenterCrop:
+          size: 224
+          crop_padding: 32
+        Resize:
+          size: 224
+          interpolation: bicubic
+        Normalize:
+          mean: [123.675, 116.28, 103.53]
+          std: [58.395, 57.12, 57.375]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1.yaml b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1.yaml
index 9eb2c3782e9..72e7ff3ddf9 100644
--- a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1.yaml
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: mobilenet_v1
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1_itex.yaml b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1_itex.yaml
new file mode 100644
index 00000000000..29b4dcde606
--- /dev/null
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v1_itex.yaml
@@ -0,0 +1,73 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mobilenet_v1
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2.yaml b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2.yaml
index b8d9b7bfd87..0b0c25b458b 100644
--- a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2.yaml
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: mobilenet_v2
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2_itex.yaml b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2_itex.yaml
new file mode 100644
index 00000000000..0ae6a06a429
--- /dev/null
+++ b/examples/tensorflow/image_recognition/SavedModel/quantization/ptq/mobilenet_v2_itex.yaml
@@ -0,0 +1,82 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mobilenet_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+
+  op_wise: {
+             'MobilenetV2/expanded_conv/depthwise/depthwise': {
+               'activation':  {'dtype': ['fp32']},
+             },
+             'MobilenetV2/Conv_1/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/README.md
index 83cdebf94bd..6be94f7d20b 100644
--- a/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,24 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow support for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +43,9 @@ python prepare_model.py   --output_model=/path/to/model
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write YAML config file
+The examples directory contains inception_resnet_v2.yaml for tuning the model on Intel CPUs; in this file, 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu' in the YAML file. The file inception_resnet_v2_itex.yaml is prepared for the GPU case. Most items can be removed, keeping only the mandatory ones needed for tuning. We also implement a calibration dataloader and provide an evaluation field that neural_compressor uses internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=inception_resnet_v2.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2.yaml b/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2.yaml
index 91da8e41f96..0fb00e3407a 100644
--- a/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: inception_resnet_v2
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2_itex.yaml b/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2_itex.yaml
new file mode 100644
index 00000000000..85028bffccd
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/inception_resnet_v2/quantization/ptq/inception_resnet_v2_itex.yaml
@@ -0,0 +1,44 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_resnet_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/README.md
index 238a2dd756f..6f6803fa35a 100644
--- a/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,24 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow support for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +43,9 @@ python prepare_model.py   --output_model=/path/to/model
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write YAML config file
+The examples directory contains inception_v3.yaml for tuning the model on Intel CPUs; in this file, 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu' in the YAML file. The file inception_v3_itex.yaml is prepared for the GPU case. Most items can be removed, keeping only the mandatory ones needed for tuning. We also implement a calibration dataloader and provide an evaluation field that neural_compressor uses internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=inception_v3.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3.yaml b/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3.yaml
index 60a9857147e..d7cc15e64dc 100644
--- a/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: inception_v3
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3_itex.yaml b/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3_itex.yaml
new file mode 100644
index 00000000000..6156c1bcd5e
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/inception_v3/quantization/ptq/inception_v3_itex.yaml
@@ -0,0 +1,49 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_v3
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+  op_wise: {
+             'v0/cg/conv0/conv2d/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/README.md b/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/README.md
index 80001e08ee2..0183244fd16 100644
--- a/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow keras Intel® Neural Compressor QAT conversion.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -18,14 +19,34 @@ pip install tensorflow_model_optimization==0.5.0
 ```
 > Note: To generate correct qat model with tensorflow_model_optimization 0.5.0, pls use TensorFlow 2.4 or above.
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for Tensorflow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for Tensorflow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 Run the `train.py` script to get pretrained fp32 model.
 
-### 4. Prepare QAT model
+### 5. Prepare QAT model
 
 Run the `qat.py` script to get QAT model which in fact is a fp32 model with quant/dequant pair inserted.
 
+## Write Yaml config file
+The examples directory contains `mnist.yaml` for tuning the model on Intel CPUs; its `framework` field is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu` in the yaml file; `mnist_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory items needed for tuning. The example also implements a calibration dataloader, and the yaml's evaluation field lets neural_compressor create the evaluation function internally.
+
 ## Run Command
   ```shell
   python convert.py     # to convert QAT model to quantized model.
diff --git a/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist.yaml b/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist.yaml
index e27b54a0044..30c89e41ce6 100644
--- a/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: mnist
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
   accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
     metric:
diff --git a/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist_itex.yaml b/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist_itex.yaml
new file mode 100644
index 00000000000..5681e991f3a
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/mnist/quantization/qat/mnist_itex.yaml
@@ -0,0 +1,26 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mnist
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      Accuracy: {}                                   # built-in metrics are topk, map, f1, allow user to register new metric.
+
diff --git a/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/README.md
index 5725f5e941c..ad338937a28 100644
--- a/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,22 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for Tensorflow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for Tensorflow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +41,9 @@ python prepare_model.py   --output_model=/path/to/model
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `mobilenet_v2.yaml` for tuning the model on Intel CPUs; its `framework` field is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu` in the yaml file; `mobilenet_v2_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory items needed for tuning. The example also implements a calibration dataloader, and the yaml's evaluation field lets neural_compressor create the evaluation function internally.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=mobilenet_v2.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2.yaml b/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2.yaml
index 45961aceb1c..c96be536aa7 100644
--- a/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: mobilenet_v2
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2_itex.yaml b/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2_itex.yaml
new file mode 100644
index 00000000000..818a2944e30
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/mobilenet_v2/quantization/ptq/mobilenet_v2_itex.yaml
@@ -0,0 +1,55 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mobilenet_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+
+  op_wise: {
+             'MobilenetV2/expanded_conv/depthwise/depthwise': {
+               'activation':  {'dtype': ['fp32']},
+             },
+             'MobilenetV2/Conv_1/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/README.md
index 2b4baba61d1..f736d68d598 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -16,8 +17,23 @@ pip install neural-compressor
 pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
+### 3. Install Intel Extension for Tensorflow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
 
-### 3. Prepare Pretrained model
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for Tensorflow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +41,9 @@ The pretrained model is provided by [Keras Applications](https://keras.io/api/ap
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `resnet101.yaml` for tuning the model on Intel CPUs; its `framework` field is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu` in the yaml file; `resnet101_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory items needed for tuning. The example also implements a calibration dataloader, and the yaml's evaluation field lets neural_compressor create the evaluation function internally.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=resnet101.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101.yaml b/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101.yaml
index d8a765fd219..e97409b1b8f 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v1_101
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101_itex.yaml b/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101_itex.yaml
new file mode 100644
index 00000000000..ff5447efe92
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/resnet101/quantization/ptq/resnet101_itex.yaml
@@ -0,0 +1,44 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet_v1_101
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/README.md
index 9deab724472..139ed6b323b 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for Tensorflow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for Tensorflow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +42,9 @@ The pretrained model is provided by [Keras Applications](https://keras.io/api/ap
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `resnet50.yaml` for tuning the model on Intel CPUs; its `framework` field is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu` in the yaml file; `resnet50_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory items needed for tuning. The example also implements a calibration dataloader, and the yaml's evaluation field lets neural_compressor create the evaluation function internally.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=resnet50.yaml --input_model=./path/to/model --output_model=./result
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50.yaml b/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50.yaml
index 5b8ba356cd4..783cea3f3f8 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: resnet50
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50_itex.yaml b/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50_itex.yaml
new file mode 100644
index 00000000000..29744d16c86
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/resnet50/quantization/ptq/resnet50_itex.yaml
@@ -0,0 +1,78 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet50
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/README.md
index 52644cb1b1c..9005c39cd5a 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow keras Intel® Neural Compressor tuning zoo result.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,11 +18,27 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md#supported-frameworks).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for Tensorflow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 Run the `resnet50_fashion_mnist_train.py` script located in `LowPrecisionInferenceTool/examples/tensorflow/keras`, and it will generate a saved model called `resnet50_fashion` at current path.
 
-### 4. Prepare dataset
+### 5. Prepare dataset
 
 If users download FashionMNIST dataset in advance, please set dataset part in yaml as follows:
 
@@ -33,10 +50,11 @@ dataset:
 
 Otherwise, if users do not download dataset in advance, please set the dataset root to any place which can be accessed and it will automatically download FashionMNIST dataset to the corresponding path.
 
+## Write Yaml config file
+The examples directory contains `resnet50_fashion.yaml` for tuning the model on Intel CPUs; its `framework` is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu`; `resnet50_fashion_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory ones for tuning. A calibration dataloader is also implemented, and the evaluation field is used by neural_compressor internally to create the evaluation function.
 
 ## Run Command
   ```shell
   bash run_tuning.sh --config=resnet50_fashion.yaml --input_model=./resnet50_fashion --output_model=./result
   bash run_benchmark.sh --config=resnet50_fashion.yaml --input_model=./resnet50_fashion --mode=performance
   ```
-
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion.yaml b/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion.yaml
index f8238eebf27..da172682c22 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet50_fashion
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion_itex.yaml b/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion_itex.yaml
new file mode 100644
index 00000000000..e8c90ff093d
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/resnet50_fashion/quantization/ptq/resnet50_fashion_itex.yaml
@@ -0,0 +1,64 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet50_fashion
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
+    dataloader:
+      dataset:
+        FashionMNIST:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        Rescale: {}
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      Accuracy: {}                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 1
+      dataset:
+        FashionMNIST:
+          root: /path/to/evaluation/dataset         # NOTE: modify to evaluation dataset location if needed
+      transform:
+        Rescale: {}
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        FashionMNIST:
+          root: /path/to/evaluation/dataset         # NOTE: modify to evaluation dataset location if needed
+      transform:
+        Rescale: {}
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/README.md
index ce8e1ee6a54..367cda7f3b3 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +42,9 @@ python prepare_model.py   --output_model=/path/to/model
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `resnetv2_101.yaml` for tuning the model on Intel CPUs; its `framework` is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu`; `resnetv2_101_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory ones for tuning. A calibration dataloader is also implemented, and the evaluation field is used by neural_compressor internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=resnetv2_101.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101.yaml b/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101.yaml
index 77797a682b8..f5b1165bf44 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: resnetv2_101
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101_itex.yaml b/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101_itex.yaml
new file mode 100644
index 00000000000..3c13302f138
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/resnetv2_101/quantization/ptq/resnetv2_101_itex.yaml
@@ -0,0 +1,43 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnetv2_101
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/README.md
index 835d05634da..76f08100ef0 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +42,9 @@ python prepare_model.py   --output_model=/path/to/model
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `resnetv2_50.yaml` for tuning the model on Intel CPUs; its `framework` is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu`; `resnetv2_50_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory ones for tuning. A calibration dataloader is also implemented, and the evaluation field is used by neural_compressor internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=resnetv2_50.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50.yaml b/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50.yaml
index c1368f55ffc..e65e6efce3c 100644
--- a/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50.yaml
@@ -18,6 +18,8 @@ model:                                               # mandatory. used to specif
   name: resnetv2_50
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50_itex.yaml b/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50_itex.yaml
new file mode 100644
index 00000000000..efbd1a2b95a
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/resnetv2_50/quantization/ptq/resnetv2_50_itex.yaml
@@ -0,0 +1,42 @@
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnetv2_50
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/README.md
index cd2efee10a8..1919c9d5716 100644
--- a/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md#supported-frameworks).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +42,9 @@ The pretrained model is provided by [Keras Applications](https://keras.io/api/ap
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `vgg16.yaml` for tuning the model on Intel CPUs; its `framework` is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu`; `vgg16_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory ones for tuning. A calibration dataloader is also implemented, and the evaluation field is used by neural_compressor internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=vgg16.yaml --input_model=./path/to/model --output_model=./result
diff --git a/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16.yaml b/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16.yaml
index aba238629d9..11395b8c1d6 100644
--- a/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: vgg_16
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16_itex.yaml b/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16_itex.yaml
new file mode 100644
index 00000000000..a72a24b5380
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/vgg16/quantization/ptq/vgg16_itex.yaml
@@ -0,0 +1,78 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: vgg_16
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/README.md
index a30d399c6d6..a0352b95001 100644
--- a/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md#supported-frameworks).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +42,9 @@ The pretrained model is provided by [Keras Applications](https://keras.io/api/ap
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+The examples directory contains `vgg19.yaml` for tuning the model on Intel CPUs; its `framework` is set to `tensorflow`. To run this example on Intel GPUs, set `framework` to `tensorflow_itex` and `device` to `gpu`; `vgg19_itex.yaml` is provided for the GPU case. Most items in the yaml can be removed, keeping only the mandatory ones for tuning. A calibration dataloader is also implemented, and the evaluation field is used by neural_compressor internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=vgg19.yaml --input_model=./path/to/model --output_model=./result
diff --git a/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19.yaml b/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19.yaml
index 165e5d538aa..00961c48d8d 100644
--- a/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: vgg_19
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19_itex.yaml b/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19_itex.yaml
new file mode 100644
index 00000000000..1d3a4f77c9e
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/vgg19/quantization/ptq/vgg19_itex.yaml
@@ -0,0 +1,78 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: vgg_19
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
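The README text in this change says that running on Intel GPUs means setting 'framework' to 'tensorflow_itex' and 'device' to 'gpu'. A minimal sketch of that switch, assuming the YAML has already been loaded into a plain Python dict; the helper name `to_itex_gpu` is hypothetical and is not part of neural_compressor:

```python
import copy


def to_itex_gpu(config: dict) -> dict:
    """Return a copy of a CPU tuning config switched to the ITEX GPU settings.

    Hypothetical helper for illustration only; it changes just the two
    fields named in the README ('framework' and 'device').
    """
    gpu_config = copy.deepcopy(config)  # leave the caller's CPU config untouched
    gpu_config.setdefault("model", {})["framework"] = "tensorflow_itex"
    gpu_config["device"] = "gpu"
    return gpu_config


# Mirrors the top of vgg19.yaml as a dict.
cpu_config = {"model": {"name": "vgg_19", "framework": "tensorflow"}, "device": "cpu"}
gpu_config = to_itex_gpu(cpu_config)
print(gpu_config["model"]["framework"], gpu_config["device"])
```

Every other field (calibration, evaluation, tuning) is carried over unchanged, which matches how vgg19_itex.yaml relates to vgg19.yaml in this diff.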
diff --git a/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/README.md b/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/README.md
index 6901d048b83..91e72783e80 100644
--- a/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to enable Tensorflow Keras models using Intel® Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -17,7 +18,23 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../../README.md).
 
-### 3. Prepare Pretrained model
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Pretrained model
 
 The pretrained model is provided by [Keras Applications](https://keras.io/api/applications/). prepare the model, Run as follow: 
  ```
@@ -25,6 +42,9 @@ python prepare_model.py   --output_model=/path/to/model
  ```
 `--output_model ` the model should be saved as SavedModel format or H5 format.
 
+## Write Yaml config file
+In the examples directory, xception.yaml is provided for tuning the model on Intel CPUs; its 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu' in the yaml file; xception_itex.yaml is prepared for the GPU case. Most items can be removed, keeping only the mandatory items for tuning. The yaml also defines a calibration dataloader and an evaluation field, which neural_compressor uses internally to create the evaluation function.
+
 ## Run Command
   ```shell
   bash run_tuning.sh --config=xception.yaml --input_model=./path/to/model --output_model=./result --eval_data=/path/to/evaluation/dataset --calib_data=/path/to/calibration/dataset
diff --git a/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception.yaml b/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception.yaml
index 2c742fd68a9..49eb1519cf8 100644
--- a/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception.yaml
+++ b/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   name: xception
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception_itex.yaml b/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception_itex.yaml
new file mode 100644
index 00000000000..e0521d17216
--- /dev/null
+++ b/examples/tensorflow/image_recognition/keras_models/xception/quantization/ptq/xception_itex.yaml
@@ -0,0 +1,49 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: xception
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+  op_wise: {
+             'v0/cg/conv0/conv2d/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
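The `accuracy_criterion` comment in these configs says the example allows a relative accuracy loss of 1%. A minimal sketch of what such a check amounts to, assuming a higher-is-better metric such as top-1 accuracy; the function name and the exact comparison are illustrative assumptions, not neural_compressor's implementation:

```python
def within_relative_loss(baseline_acc: float, tuned_acc: float, tolerance: float = 0.01) -> bool:
    """Check whether the tuned model loses at most `tolerance` of the baseline accuracy.

    Illustrative only: assumes a higher-is-better metric and a positive baseline.
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    relative_loss = (baseline_acc - tuned_acc) / baseline_acc
    return relative_loss <= tolerance


# With an FP32 baseline of 0.750, a quantized model at 0.745 loses about 0.67%
# of the baseline, while one at 0.740 loses about 1.33%.
print(within_relative_loss(0.750, 0.745))
print(within_relative_loss(0.750, 0.740))
```

An absolute criterion would instead compare `baseline_acc - tuned_acc` directly against the tolerance.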
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/README.md b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/README.md
index d8dd42ecd5a..1700f78970d 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/README.md
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document list steps of reproducing Intel Optimized TensorFlow image recognition models tuning results via Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 > **Note**: 
 > Most of those models are both supported in Intel optimized TF 1.15.x and Intel optimized TF 2.x.
@@ -9,14 +10,42 @@ This document list steps of reproducing Intel Optimized TensorFlow image recogni
 # Prerequisite
 
 ### 1. Installation
-  Recommend python 3.6 or higher version.
+Python 3.6 or higher is recommended.
 
-  ```shell
-  cd examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq
-  pip install -r requirements.txt
-  ```
+```shell
+# Install Intel® Neural Compressor
+pip install neural-compressor
+```
+
+### 2. Install Intel Tensorflow
+```shell
+pip install intel-tensorflow
+```
+> Note: Supported Tensorflow [Version](../../../../../../README.md#supported-frameworks).
+
+### 3. Install Dependency Packages
+```shell
+cd examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq
+pip install -r requirements.txt
+```
+
+### 4. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
 
-### 2. Prepare Dataset
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 5. Prepare Dataset
 
   TensorFlow [models](https://github.com/tensorflow/models) repo provides [scripts and instructions](https://github.com/tensorflow/models/tree/master/research/slim#an-automated-script-for-processing-imagenet-data) to download, process and convert the ImageNet dataset to the TF records format.
   We also prepared related scripts in `imagenet_prepare` directory. To download the raw images, the user must create an account with image-net.org. If you have downloaded the raw data and preprocessed the validation data by moving the images into the appropriate sub-directory based on the label (synset) of the image. we can use below command ro convert it to tf records format.
@@ -36,7 +65,7 @@ This document list steps of reproducing Intel Optimized TensorFlow image recogni
   tar -xvf caffe_ilsvrc12.tar.gz
   ```
 
-### 3. Prepare pre-trained model
+### 6. Prepare pre-trained model
   In this version, Intel® Neural Compressor just support PB file as input for TensorFlow backend, so we need prepared model pre-trained pb files. For some models pre-trained pb can be found in [IntelAI Models](https://github.com/IntelAI/models/tree/v1.6.0/benchmarks#tensorflow-use-cases), we can found the download link in README file of each model. And for others models in Google [models](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models), we can get the pb files by convert the checkpoint files. We will give a example with Inception_v1 to show how to get the pb file by a checkpoint file.
 
   1. Download the checkpoint file from [here](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models)
@@ -314,7 +343,7 @@ As ResNet50 V1.5 is a typical image recognition model, use Top-K as metric which
 
 ### Write Yaml config file
 
-In examples directory, there is a template.yaml. We could remove most of the items and only keep mandatory item for tuning. 
+In the examples directory, resnet50_v1_5.yaml is provided for tuning the model on Intel CPUs; its 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu' in the yaml file; resnet50_v1_5_itex.yaml is prepared for the GPU case. Most of the items can be removed, keeping only the mandatory items for tuning.
 
 
 ```yaml
@@ -326,6 +355,8 @@ model:                                               # mandatory. used to specif
   inputs: input_tensor
   outputs: softmax_tensor
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 5, 10                             # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121.yaml
index b62e4bae4a2..b9da893f6da 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: densenet121
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 5, 10, 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121_itex.yaml
new file mode 100644
index 00000000000..34492100da2
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet121_itex.yaml
@@ -0,0 +1,89 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: densenet121
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 5, 10, 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+  op_wise: {
+             'densenet121/MaxPool2D/MaxPool': {
+               'activation':  {'dtype': ['fp32']}
+             },
+             'densenet121/transition_block[1-3]/AvgPool2D/AvgPool': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
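The `op_wise` section above keeps some pooling ops in fp32, and one key contains the character class `[1-3]`. A minimal sketch of which op names such a key would select, assuming the key is treated as a regular expression that must match the whole op name; that matching rule and the sample op names are assumptions for illustration, not confirmed behavior of neural_compressor:

```python
import re

# Key copied from the op_wise section of densenet121_itex.yaml.
pattern = r"densenet121/transition_block[1-3]/AvgPool2D/AvgPool"

# Hypothetical op names used only to exercise the pattern.
op_names = [
    "densenet121/transition_block1/AvgPool2D/AvgPool",
    "densenet121/transition_block3/AvgPool2D/AvgPool",
    "densenet121/transition_block4/AvgPool2D/AvgPool",
    "densenet121/MaxPool2D/MaxPool",
]

# Keep only the names that the pattern matches in full.
fp32_ops = [name for name in op_names if re.fullmatch(pattern, name)]
print(fp32_ops)
```

Under that assumption, the single key covers transition blocks 1 through 3 without listing each op separately.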
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161.yaml
index 88af07e77d8..5312ed341fa 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: densenet161
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161_itex.yaml
new file mode 100644
index 00000000000..58f17d9f479
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet161_itex.yaml
@@ -0,0 +1,89 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: densenet161
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+  op_wise: {
+             'densenet161/MaxPool2D/MaxPool': {
+               'activation':  {'dtype': ['fp32']}
+             },
+             'densenet161/transition_block[1-3]/AvgPool2D/AvgPool': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169.yaml
index f115bd440ef..b63414d8acf 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: densenet169
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169_itex.yaml
new file mode 100644
index 00000000000..55360042c94
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/densenet169_itex.yaml
@@ -0,0 +1,89 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: densenet169
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+  op_wise: {
+             'densenet169/MaxPool2D/MaxPool': {
+               'activation':  {'dtype': ['fp32']}
+             },
+             'densenet169/transition_block[1-3]/AvgPool2D/AvgPool': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          scale: 0.017
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0.yaml
index 704e43f2d3f..35dd2e08edf 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. neural_compres
   inputs: truediv
   outputs: Squeeze
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 5, 10, 50, 100                    # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0_itex.yaml
new file mode 100644
index 00000000000..c986d701a38
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/efficientnet-b0_itex.yaml
@@ -0,0 +1,90 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. neural_compressor uses this model name and framework name to decide where to save tuning history and deploy yaml.
+  name: efficientnet-b0
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: truediv
+  outputs: Squeeze
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+  calibration:
+    sampling_size: 5, 10, 50, 100                    # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
+    dataloader:
+      dataset:
+        ImagenetRaw:
+          data_path: /path/to/calibration/dataset     # NOTE: modify to calibration dataset location if needed
+          image_list: /path/to/calibration/label      # data file, record image_names and their labels
+      transform:
+        PaddedCenterCrop:
+          size: 224
+          crop_padding: 32
+        Resize:
+          size: 224
+          interpolation: bicubic
+        Normalize:
+          mean: [123.675, 116.28, 103.53]
+          std: [58.395, 57.12, 57.375]
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImagenetRaw:
+          data_path: /path/to/evaluation/dataset     # NOTE: modify to evaluation dataset location if needed
+          image_list: /path/to/evaluation/label      # data file, record image_names and their labels
+      transform:
+        PaddedCenterCrop:
+          size: 224
+          crop_padding: 32
+        Resize:
+          size: 224
+          interpolation: bicubic
+        Normalize:
+          mean: [123.675, 116.28, 103.53]
+          std: [58.395, 57.12, 57.375]
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImagenetRaw:
+          data_path: /path/to/evaluation/dataset     # NOTE: modify to evaluation dataset location if needed
+          image_list: /path/to/evaluation/label      # data file, record image_names and their labels
+      transform:
+        PaddedCenterCrop:
+          size: 224
+          crop_padding: 32
+        Resize:
+          size: 224
+          interpolation: bicubic
+        Normalize:
+          mean: [123.675, 116.28, 103.53]
+          std: [58.395, 57.12, 57.375]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2.yaml
index 347bd905105..51910f449bb 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: inception_resnet_v2
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2_itex.yaml
new file mode 100644
index 00000000000..a0728bf6684
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_resnet_v2_itex.yaml
@@ -0,0 +1,71 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_resnet_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+  model_wise:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 299
+          width: 299
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1.yaml
index 22a0947cf58..7968e9528af 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: inception_v1
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1_itex.yaml
new file mode 100644
index 00000000000..b901f93db6f
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v1_itex.yaml
@@ -0,0 +1,71 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_v1
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2.yaml
index fc4f739341c..25f6ec304be 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: inception_v2
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2_itex.yaml
new file mode 100644
index 00000000000..88b4c0d1eb7
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v2_itex.yaml
@@ -0,0 +1,71 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3.yaml
index ed9f8154263..219a99a7bbf 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: inception_v3
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3_itex.yaml
new file mode 100644
index 00000000000..21d7453ee01
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v3_itex.yaml
@@ -0,0 +1,76 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_v3
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+  model_wise:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+    activation:
+      algorithm: minmax
+  op_wise: {
+             'v0/cg/conv0/conv2d/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4.yaml
index a247f2413ad..e4942fa1090 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: inception_v4
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4_itex.yaml
new file mode 100644
index 00000000000..09697ce5307
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/inception_v4_itex.yaml
@@ -0,0 +1,71 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: inception_v4
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints that let advanced users reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 299
+          width: 299
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 299
+          width: 299
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
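
Reviewer note on `tuning.accuracy_criterion.relative: 0.01` above: the following is a minimal Python sketch of how a relative accuracy criterion can be checked. It assumes a higher-is-better metric such as top-1 and that the relative loss is measured against the FP32 baseline. The function name `meets_relative_criterion` is hypothetical and is not part of the neural_compressor API.

```python
def meets_relative_criterion(baseline_acc: float,
                             quantized_acc: float,
                             max_relative_loss: float = 0.01) -> bool:
    """Return True when the relative accuracy drop stays within the tolerance.

    Assumption: the metric is higher-is-better (e.g. top-1 accuracy) and the
    loss is expressed relative to the FP32 baseline accuracy.
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    relative_loss = (baseline_acc - quantized_acc) / baseline_acc
    return relative_loss <= max_relative_loss


if __name__ == "__main__":
    # A drop from 0.800 to 0.795 is a 0.625% relative loss.
    print(meets_relative_criterion(0.800, 0.795))
    # A drop from 0.800 to 0.780 is a 2.5% relative loss.
    print(meets_relative_criterion(0.800, 0.780))
```

Under these assumptions, a quantized model that scores higher than the baseline always satisfies the criterion, because its relative loss is negative.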
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1.yaml
index 9eb2c3782e9..309677a5210 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1.yaml
@@ -13,10 +13,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+version: 1.0
+
 model:                                               # mandatory. used to specify model specific information.
   name: mobilenet_v1
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
@@ -24,7 +28,7 @@ quantization:                                        # optional. tuning constrai
       batch_size: 10
       dataset:
         ImageRecord:
-          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+          root: /path/to/evaluation/dataset          # NOTE: modify to calibration dataset location if needed
       transform:
         BilinearImagenet: 
           height: 224
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1_itex.yaml
new file mode 100644
index 00000000000..29b4dcde606
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v1_itex.yaml
@@ -0,0 +1,73 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mobilenet_v1
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+  calibration:
+    sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
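
Reviewer note on `sampling_size: 20, 50` with `batch_size: 10` above: the comma-separated value appears to list candidate calibration sample counts. The sketch below shows, in Python, how such a value could be split into integers and converted into a number of calibration batches. Both helper names are hypothetical, and rounding up the batch count is an assumption rather than confirmed library behavior.

```python
import math


def parse_sampling_sizes(raw) -> list:
    """Split a value such as '20, 50' (or a bare int like 10) into integers."""
    return [int(part) for part in str(raw).split(",") if part.strip()]


def calibration_batches(sampling_size: int, batch_size: int) -> int:
    """Number of batches needed to cover sampling_size samples (assumed: round up)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(sampling_size / batch_size)


if __name__ == "__main__":
    for size in parse_sampling_sizes("20, 50"):
        print(size, calibration_batches(size, batch_size=10))
```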
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2.yaml
index b8d9b7bfd87..0b0c25b458b 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: mobilenet_v2
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2_itex.yaml
new file mode 100644
index 00000000000..0ae6a06a429
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v2_itex.yaml
@@ -0,0 +1,82 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mobilenet_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+  calibration:
+    sampling_size: 20, 50                            # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+
+  op_wise: {
+             'MobilenetV2/expanded_conv/depthwise/depthwise': {
+               'activation':  {'dtype': ['fp32']},
+             },
+             'MobilenetV2/Conv_1/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             }
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3.yaml
index 9b7961be538..accb37095f3 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3.yaml
@@ -18,6 +18,8 @@ model:                                               # mandatory. used to specif
   name: mobilenet_v3
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 10                               # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3_itex.yaml
new file mode 100644
index 00000000000..5561f206a52
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/mobilenet_v3_itex.yaml
@@ -0,0 +1,90 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mobilenet_v3
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+  calibration:
+    sampling_size: 10                                # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  recipes:                                           # optional. used to switch neural_compressor int8 recipes ON or OFF.
+    scale_propagation_max_pooling: False
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+  op_wise: {
+             'MobilenetV3/expanded_conv_3/project/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             },
+             'MobilenetV3/expanded_conv_3/squeeze_excite/Conv/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             },
+             'MobilenetV3/expanded_conv_1/expand/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             },
+             'MobilenetV3/Conv/Conv2D': {
+               'activation':  {'dtype': ['fp32']},
+             },
+           }
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.02                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 2%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile.yaml
index 3b20901e9c4..46566d61c7e 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   inputs: input
   outputs: final_layer/predictions
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50                                # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile_itex.yaml
new file mode 100644
index 00000000000..a1c86faec21
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/nasnet_mobile_itex.yaml
@@ -0,0 +1,75 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: nasnet_mobile
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: input
+  outputs: final_layer/predictions
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+  calibration:
+    sampling_size: 50                                # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101.yaml
index 9209330d22a..20d1fbaad94 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v1_101
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101_itex.yaml
new file mode 100644
index 00000000000..27f43a3020a
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet101_itex.yaml
@@ -0,0 +1,77 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet_v1_101
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        ResizeCropImagenet:
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1.yaml
index 0cde9d1e551..3e59160457a 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet50_v1
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5.yaml
index e2699a436c2..6857e013035 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5.yaml
@@ -13,11 +13,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+version: 1.0
+
 model:                                               # mandatory. used to specify model specific information.
   name: resnet50_v1_5
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   outputs: softmax_tensor
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
@@ -57,7 +61,7 @@ evaluation:                                          # optional. required if use
       batch_size: 1 
       dataset:
         ImageRecord:
-          root: /path/to/evaluation/dataset
+          root: /path/to/calibration/dataset
       transform:
         ResizeCropImagenet: 
           height: 224
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5_itex.yaml
new file mode 100644
index 00000000000..49e9a7f674c
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_5_itex.yaml
@@ -0,0 +1,74 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet50_v1_5
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  outputs: softmax_tensor
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1 
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_itex.yaml
new file mode 100644
index 00000000000..fdadaf7d412
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet50_v1_itex.yaml
@@ -0,0 +1,71 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet50_v1
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet:
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101.yaml
index 830cadb4e6a..bbdaeed4511 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v2_101
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101_itex.yaml
new file mode 100644
index 00000000000..69ed4cc2a73
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_101_itex.yaml
@@ -0,0 +1,70 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet_v2_101
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1 
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152.yaml
index 32b8929b104..398e8115237 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v2_152
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152_itex.yaml
new file mode 100644
index 00000000000..44e1e3dff49
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_152_itex.yaml
@@ -0,0 +1,70 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet_v2_152
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1 
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50.yaml
index 2bbd781babc..f3f996c85cd 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v2_50
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50_itex.yaml
new file mode 100644
index 00000000000..28efda5dee8
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/resnet_v2_50_itex.yaml
@@ -0,0 +1,70 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: resnet_v2_50
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1 
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        BilinearImagenet: 
+          height: 224
+          width: 224
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/README.md b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/README.md
index 967130469c7..fc625d75b72 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/README.md
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/README.md
@@ -200,6 +200,8 @@ model:                                               # mandatory. used to specif
   inputs: input
   outputs: InceptionV1/Logits/Predictions/Reshape_1
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 5, 10                             # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/inception_v3.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/inception_v3.yaml
index ed9f8154263..219a99a7bbf 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/inception_v3.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/inception_v3.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: inception_v3
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_152.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_152.yaml
index ebf3c215306..b96faa54cd3 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_152.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_152.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v1_152
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_50.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_50.yaml
index cb3e3768e2d..d92354e4b49 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_50.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/slim/resnet_v1_50.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: resnet_v1_50
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16.yaml
index c1f414f5971..928903f8b20 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16.yaml
@@ -17,6 +17,8 @@ model:                                               # mandatory. used to specif
   name: vgg_16
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16_itex.yaml
new file mode 100644
index 00000000000..716cb757642
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg16_itex.yaml
@@ -0,0 +1,76 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: vgg_16
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19.yaml
index bf298bd1bcb..62453705a94 100644
--- a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19.yaml
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19.yaml
@@ -13,10 +13,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+version: 1.0
+
 model:                                               # mandatory. used to specify model specific information.
   name: vgg_19
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19_itex.yaml b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19_itex.yaml
new file mode 100644
index 00000000000..9b7b37ebbf8
--- /dev/null
+++ b/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq/vgg19_itex.yaml
@@ -0,0 +1,76 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: vgg_19
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      batch_size: 10
+      dataset:
+        ImageRecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 32
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet: 
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+    postprocess:
+      transform:
+        LabelShift: 1
+  performance:                                       # optional. used to benchmark performance of passing model.
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 7
+    dataloader:
+      batch_size: 1
+      dataset:
+        ImageRecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ResizeCropImagenet:
+          height: 224
+          width: 224
+          mean_value: [123.68, 116.78, 103.94]
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/README.md b/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/README.md
index 3cd305c5b8a..caf6231e040 100644
--- a/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/README.md
+++ b/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning zoo result of bert base model on mrpc task.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -11,13 +12,29 @@ This document is used to list steps of reproducing TensorFlow Intel® Neural Com
 # Install Intel® Neural Compressor
 pip install neural-compressor
 ```
-### 2. Install Intel Tensorflow 1.15 up2
-Check your python version and use pip install 1.15.0 up2 from links below:
-https://storage.googleapis.com/intel-optimized-tensorflow/intel_tensorflow-1.15.0up2-cp36-cp36m-manylinux2010_x86_64.whl                
-https://storage.googleapis.com/intel-optimized-tensorflow/intel_tensorflow-1.15.0up2-cp37-cp37m-manylinux2010_x86_64.whl
-https://storage.googleapis.com/intel-optimized-tensorflow/intel_tensorflow-1.15.0up2-cp35-cp35m-manylinux2010_x86_64.whl
+### 2. Install Intel Tensorflow
+```shell
+pip install intel-tensorflow
+```
+
+### 3. Install Intel Extension for Tensorflow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
 
-### 3. Prepare Dataset
+### 4. Prepare Dataset
 
 #### Automatic dataset download
 Run the `prepare_dataset.sh` script located in `examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq`.
@@ -28,7 +45,7 @@ cd examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq
 python prepare_dataset.py --tasks='MRPC' --output_dir=./data
 ```
 
-### 4. Prepare pretrained model
+### 5. Prepare pretrained model
 
 #### Automatic model download
 Run the `prepare_model.sh` script located in `examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq`.
@@ -93,7 +110,7 @@ This is a tutorial of how to enable bert model with Intel® Neural Compressor.
 For bert, we applied the first one as we  already have write dataset and metric for bert mrpc task. 
 
 ### Write Yaml config file
-In examples directory, there is a mrpc.yaml. We could remove most of items and only keep mandatory item for tuning. We also implement a calibration dataloader and have evaluation field for creation of evaluation function at internal neural_compressor.
+The examples directory contains mrpc.yaml for tuning the model on Intel CPUs, with 'framework' set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; mrpc_itex.yaml is already prepared for the GPU case. We can remove most items and keep only the mandatory ones for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
 model:
@@ -102,6 +119,8 @@ model:
   inputs: input_file, batch_size
   outputs: loss/Softmax:0, IteratorGetNext:3
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 evaluation:
   accuracy: {}
   performance:
diff --git a/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc.yaml b/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc.yaml
index 21842430a71..8e2a4141f91 100644
--- a/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc.yaml
+++ b/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc.yaml
@@ -19,6 +19,8 @@ model:
   inputs: input_file, batch_size
   outputs: loss/Softmax:0, IteratorGetNext:3
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 evaluation:
   accuracy: {}
   performance:
diff --git a/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc_itex.yaml b/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc_itex.yaml
new file mode 100644
index 00000000000..000daa59cdd
--- /dev/null
+++ b/examples/tensorflow/nlp/bert_base_mrpc/quantization/ptq/mrpc_itex.yaml
@@ -0,0 +1,46 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:
+  name: bert
+  framework: tensorflow_itex
+  inputs: input_file, batch_size
+  outputs: loss/Softmax:0, IteratorGetNext:3
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+evaluation:
+  accuracy: {}
+  performance:
+    iteration: 20
+    warmup: 5
+    configs:
+      num_of_instance: 1
+      cores_per_instance: 28 
+      kmp_blocktime: 1
+
+quantization:            
+  calibration:
+    sampling_size: 500
+  model_wise:
+    weight:
+      granularity: per_channel
+tuning:
+  accuracy_criterion:
+    relative:  0.01   
+  exit_policy:
+    timeout: 0       
+    max_trials: 100 
+  random_seed: 9527
diff --git a/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md b/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md
index be65cd2b51f..49ce4fb8be7 100644
--- a/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md
+++ b/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning zoo result of bert large model on squad v1.1 task.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -16,7 +17,24 @@ pip install neural-compressor
 pip install intel-tensorflow
 ```
 
-### 3. Prepare Dataset
+### 3. Install Intel Extension for Tensorflow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Dataset
 ```shell
 wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip
 ```
@@ -44,7 +62,7 @@ Then create the tf_record file and you need to config the tf_record path in yaml
 python create_tf_record.py --vocab_file=data/vocab.txt --predict_file=data/dev-v1.1.json --output_file=./eval.tf_record
 ```
 
-### 4. Prepare Pretrained model
+### 5. Prepare Pretrained model
 
 #### Manual approach
 
@@ -91,7 +109,7 @@ This is a tutorial of how to enable bert model with Intel® Neural Compressor.
 For bert, we applied the first one as we  already have built-in dataset and metric for bert squad task. 
 
 ### Write Yaml config file
-In examples directory, there is a bert.yaml. We could remove most of items and only keep mandatory item for tuning. We also implement a calibration dataloader and have evaluation field for creation of evaluation function at internal neural_compressor.
+The examples directory contains bert.yaml for tuning the model on Intel CPUs, with 'framework' set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; bert_itex.yaml is already prepared for the GPU case. We can remove most items and keep only the mandatory ones for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
 model: 
@@ -100,6 +118,8 @@ model:
   inputs: input_file, batch_size
   outputs: IteratorGetNext:3, unstack:0, unstack:1
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 evaluation:
   accuracy:
     metric:
diff --git a/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert.yaml b/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert.yaml
index 2c3f3b50a22..8db7ebc194f 100644
--- a/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert.yaml
+++ b/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert.yaml
@@ -19,6 +19,8 @@ model:
   inputs: input_file, batch_size
   outputs: IteratorGetNext:3, unstack:0, unstack:1
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 evaluation:
   accuracy:
     metric:
diff --git a/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert_itex.yaml b/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert_itex.yaml
new file mode 100644
index 00000000000..e4198c3a3d4
--- /dev/null
+++ b/examples/tensorflow/nlp/bert_large_squad/quantization/ptq/bert_itex.yaml
@@ -0,0 +1,69 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:
+  name: bert
+  framework: tensorflow_itex
+  inputs: input_file, batch_size
+  outputs: IteratorGetNext:3, unstack:0, unstack:1
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+evaluation:
+  accuracy:
+    metric:
+      SquadF1:
+    dataloader:
+      dataset:
+        bert:
+          root: /path/to/eval.tf_record
+          label_file: /path/to/dev-v1.1.json
+      batch_size: 64
+    postprocess:
+      transform:
+        SquadV1:
+          label_file: /path/to/dev-v1.1.json
+          vocab_file: /path/to/vocab.txt
+  performance:
+    iteration: 10
+    configs:
+        num_of_instance: 4
+        cores_per_instance: 7
+    dataloader:
+      dataset:
+        bert:
+          root: /path/to/eval.tf_record
+          label_file: /path/to/dev-v1.1.json
+      batch_size: 64
+
+quantization:            
+  calibration:
+    sampling_size: 500
+    dataloader:
+      dataset:
+        bert:
+          root: /path/to/eval.tf_record
+          label_file: /path/to/dev-v1.1.json
+      batch_size: 64
+  model_wise:
+    weight:
+      granularity: per_channel
+tuning:
+  accuracy_criterion:
+    relative:  0.01   
+  exit_policy:
+    timeout: 0       
+    max_trials: 100 
+  random_seed: 9527
diff --git a/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/README.md b/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/README.md
index b8eddd989b3..db79c5d9dc4 100644
--- a/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/README.md
+++ b/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning result of Intel® Model Zoo bert large model on squad v1.1 task.
+This example can run on Intel CPUs and GPUs.
 
 
 ## Prerequisite
@@ -16,7 +17,24 @@ pip install neural-compressor
 pip install intel-tensorflow
 ```
 
-### 3. Prepare Dataset
+### 3. Install Intel Extension for Tensorflow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Dataset
 ```shell
 wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip
 ```
@@ -44,11 +62,14 @@ Then create the tf_record file and you need to config the tf_record path in yaml
 python create_tf_record.py --vocab_file=data/vocab.txt --predict_file=data/dev-v1.1.json --output_file=./eval.tf_record
 ```
 
-### 4. Prepare Pretrained model
+### 5. Prepare Pretrained model
 ```shell
 wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_7_0/fp32_bert_squad.pb
 ```
 
+## Write Yaml config file
+The examples directory contains bert.yaml for tuning the model on Intel CPUs, with 'framework' set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; bert_itex.yaml is already prepared for the GPU case. We can remove most items and keep only the mandatory ones for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
+
 ## Run Command
   <b><font color='red'>Please make sure below command should be executed with the same Tensorflow runtime version as above step.</font></b>
 
diff --git a/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert.yaml b/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert.yaml
index de7cd81c4b1..5a9080d6f15 100644
--- a/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert.yaml
+++ b/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert.yaml
@@ -13,12 +13,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+version: 1.0
+
 model:
   name: bert
   framework: tensorflow
   inputs: input_ids, input_mask, segment_ids
   outputs: start_logits, end_logits
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 evaluation:
   accuracy:
     metric:
@@ -64,4 +68,4 @@ tuning:
   exit_policy:
     timeout: 0       
     max_trials: 100 
-  random_seed: 9527
\ No newline at end of file
+  random_seed: 9527
diff --git a/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert_itex.yaml b/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert_itex.yaml
new file mode 100644
index 00000000000..36730dbfa1a
--- /dev/null
+++ b/examples/tensorflow/nlp/bert_large_squad_model_zoo/quantization/ptq/bert_itex.yaml
@@ -0,0 +1,69 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:
+  name: bert
+  framework: tensorflow_itex
+  inputs: input_ids, input_mask, segment_ids
+  outputs: start_logits, end_logits
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+evaluation:
+  accuracy:
+    metric:
+      SquadF1:
+    dataloader:
+      dataset:
+        mzbert:
+          root: /path/to/eval.tf_record
+          label_file: /path/to/dev-v1.1.json
+      batch_size: 64
+    postprocess:
+      transform:
+        SquadV1ModelZoo:
+          label_file: /path/to/dev-v1.1.json
+          vocab_file: /path/to/vocab.txt
+  performance:
+    iteration: 10
+    configs:
+        num_of_instance: 4
+        cores_per_instance: 7
+    dataloader:
+      dataset:
+        mzbert:
+          root: /path/to/eval.tf_record
+          label_file: /path/to/dev-v1.1.json
+      batch_size: 64
+
+quantization:            
+  calibration:
+    sampling_size: 500
+    dataloader:
+      dataset:
+        mzbert:
+          root: /path/to/eval.tf_record
+          label_file: /path/to/dev-v1.1.json
+      batch_size: 64
+  model_wise:
+    weight:
+      granularity: per_channel
+tuning:
+  accuracy_criterion:
+    relative:  0.01   
+  exit_policy:
+    timeout: 0       
+    max_trials: 100 
+  random_seed: 9527
\ No newline at end of file
diff --git a/examples/tensorflow/nlp/transformer_lt/quantization/ptq/README.md b/examples/tensorflow/nlp/transformer_lt/quantization/ptq/README.md
index cc8346486dd..1735a0f5d7f 100644
--- a/examples/tensorflow/nlp/transformer_lt/quantization/ptq/README.md
+++ b/examples/tensorflow/nlp/transformer_lt/quantization/ptq/README.md
@@ -1,7 +1,7 @@
 Step-by-Step
 ============
 
-This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning zoo result of Transformer-LT.
+This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning zoo result of Transformer-LT. This example can run on Intel CPUs and GPUs.
 
 ## Prerequisite
 
@@ -17,7 +17,24 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../README.md#supported-frameworks).
 
-### 3. Prepare Dataset & Pretrained model
+### 3. Install Intel Extension for Tensorflow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental and is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Dataset & Pretrained model
 
 ```shell
 wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_2_0/transformer-lt-official-fp32-inference.tar.gz
@@ -76,7 +93,7 @@ class Dataset(object):
 We evaluate the model with BLEU score, its source: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/bleu_hook.py
 
 ### Write Yaml config file
-In examples directory, there is a transformer_lt.yaml. We could remove most of items and only keep mandatory item for tuning.
+The examples directory contains transformer_lt.yaml for tuning the model on Intel CPUs, with 'framework' set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; transformer_lt_itex.yaml is already prepared for the GPU case. We can remove most items and keep only the mandatory ones for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
 model:
@@ -85,6 +102,8 @@ model:
   inputs: input_tensor
   outputs: model/Transformer/strided_slice_19
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:
   calibration:
     sampling_size: 500
diff --git a/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt.yaml b/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt.yaml
index 185bb2b8979..f8de7826c45 100644
--- a/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt.yaml
+++ b/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt.yaml
@@ -13,12 +13,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+version: 1.0
+
 model:
   name: transformer_lt
   framework: tensorflow
   inputs: input_tensor
   outputs: model/Transformer/strided_slice_19
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:
   calibration:
     sampling_size: 500
diff --git a/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt_itex.yaml b/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt_itex.yaml
new file mode 100644
index 00000000000..1e285c55dd9
--- /dev/null
+++ b/examples/tensorflow/nlp/transformer_lt/quantization/ptq/transformer_lt_itex.yaml
@@ -0,0 +1,37 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:
+  name: transformer_lt
+  framework: tensorflow_itex
+  inputs: input_tensor
+  outputs: model/Transformer/strided_slice_19
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:
+  calibration:
+    sampling_size: 500
+  model_wise:
+    weight:
+      granularity: per_channel
+
+tuning:
+  accuracy_criterion:
+    relative: 0.01
+  exit_policy:
+    timeout: 0
+    max_trials: 100
+  random_seed: 9527
diff --git a/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/README.md b/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/README.md
index d0eb150df41..9e136c2419e 100644
--- a/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/README.md
+++ b/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning zoo result of Transformer_LT_mlperf. Part of the inference code is based on the transformer mlperf evaluation code. Detailed information on mlperf benchmark can be found in [mlcommons/training](https://github.com/mlperf/training/tree/master/translation/tensorflow/transformer).
+This example can run on Intel CPUs and GPUs.
 
 ## Prerequisite
 
@@ -17,7 +18,24 @@ pip install tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../README.md#validated-software-environment).
 
-### 3. Prepare Dataset & Frozen Model
+### 3. Install Intel Extension for TensorFlow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Dataset & Frozen Model
 Follow the [instructions](https://github.com/IntelAI/models/blob/master/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/fp32/README.md) on [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models) to download and preprocess the WMT English-German dataset and generate a FP32 frozen model. Please make sure there are the following files in the dataset directory and the input model directory.
 * DATASET_DIR: newstest2014.de, newstest2014.en, vocab.ende.32768
 
@@ -114,7 +132,7 @@ class Dataset(object):
 We evaluate the model with BLEU score, its source: https://github.com/IntelAI/models/blob/master/models/language_translation/tensorflow/transformer_mlperf/inference/fp32/transformer/compute_bleu.py
 
 ### Write Yaml config file
-In examples directory, there is a transformer_lt_mlperf.yaml. We could remove most of items and only keep mandatory item for tuning.
+The examples directory contains transformer_lt_mlperf.yaml for tuning the model on Intel CPUs; its 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; transformer_lt_mlperf_itex.yaml is prepared for the GPU case. Most items can be removed, keeping only the items that are mandatory for tuning. We also implement a calibration dataloader and include an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
 model:
@@ -123,6 +141,8 @@ model:
   inputs: input_tokens
   outputs: model/Transformer/strided_slice_15
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:
   calibration:
     sampling_size: 500
diff --git a/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf.yaml b/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf.yaml
index e2dd4628755..8973f47e33d 100644
--- a/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf.yaml
+++ b/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf.yaml
@@ -21,6 +21,8 @@ model:
   inputs: input_tokens
   outputs: model/Transformer/strided_slice_15
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:
   calibration:
     sampling_size: 500
diff --git a/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf_itex.yaml b/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf_itex.yaml
new file mode 100644
index 00000000000..ead560394db
--- /dev/null
+++ b/examples/tensorflow/nlp/transformer_lt_mlperf/quantization/ptq/transformer_lt_mlperf_itex.yaml
@@ -0,0 +1,39 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:
+  name: transformer_lt_mlperf
+  framework: tensorflow_itex
+  inputs: input_tokens
+  outputs: model/Transformer/strided_slice_15
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:
+  calibration:
+    sampling_size: 500
+  model_wise:
+    weight:
+      granularity: per_channel
+
+tuning:
+  accuracy_criterion:
+    relative: 0.01
+  exit_policy:
+    timeout: 0
+    max_trials: 100
+  random_seed: 9527
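
The README change above states that, for the TensorFlow examples in this PR, running on Intel GPUs needs `framework: tensorflow_itex` together with `device: gpu`, and the YAML comments state that `device` defaults to `cpu`. The following is a minimal sketch of that pairing rule as a pre-flight check on an already-parsed config. `check_device_config` is a hypothetical helper written for illustration; it is not part of neural_compressor, and the rule it encodes is an assumption drawn only from the README text in this diff.

```python
# Hypothetical pre-flight check; NOT part of the neural_compressor API.
# It operates on a plain dict, e.g. the result of parsing one of the YAML files.
VALID_DEVICES = {"cpu", "gpu"}


def check_device_config(cfg):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    framework = cfg.get("model", {}).get("framework")
    # The YAML comments in this PR say the default device is cpu.
    device = cfg.get("device", "cpu")

    if device not in VALID_DEVICES:
        problems.append(f"unknown device: {device!r}")
    # Assumption from the README text: the GPU path goes through tensorflow_itex.
    if device == "gpu" and framework != "tensorflow_itex":
        problems.append("device 'gpu' requires framework 'tensorflow_itex'")
    return problems


# Mirrors the model/device fields of transformer_lt_mlperf_itex.yaml.
itex_cfg = {
    "model": {"name": "transformer_lt_mlperf", "framework": "tensorflow_itex"},
    "device": "gpu",
}
# A mismatched config: GPU requested with the stock TensorFlow backend.
bad_cfg = {
    "model": {"name": "transformer_lt_mlperf", "framework": "tensorflow"},
    "device": "gpu",
}

print(check_device_config(itex_cfg))
print(check_device_config(bad_cfg))
```

A check like this could run before tuning starts so that a mismatched `framework`/`device` pair fails fast rather than partway through calibration.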
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/README.md b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/README.md
index 96cdfe2476a..fe69338bff7 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/README.md
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/README.md
@@ -1,7 +1,7 @@
 Step-by-Step
 ============
 
-This document is used to list steps of reproducing TensorFlow Object Detection models tuning results.
+This document lists the steps for reproducing the TensorFlow Object Detection model tuning results. This example can run on Intel CPUs and GPUs.
 Currently, we've enabled below models.
  * ssd_resnet50_v1
  * ssd_resnet34
@@ -20,11 +20,13 @@ Recommend python 3.6 or higher version.
 # Install Intel® Neural Compressor
 pip install neural-compressor
 ```
+
 ### 2. Install Intel Tensorflow
 ```shell
 pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../README.md#supported-frameworks).
+
 ### 3. Installation Dependency packages
 ```shell
 cd examples/tensorflow/object_detection/tensorflow_models/quantization/ptq
@@ -36,7 +38,24 @@ pip install -r requirements.txt
 `Protocol Buffer Compiler` in version higher than 3.0.0 is necessary ingredient for automatic COCO dataset preparation. To install please follow
 [Protobuf installation instructions](https://grpc.io/docs/protoc-installation/#install-using-a-package-manager).
 
-### 5. Prepare Dataset
+### 5. Install Intel Extension for TensorFlow if needed
+
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 6. Prepare Dataset
 
 #### Automatic dataset download
 
@@ -56,7 +75,7 @@ tensorflow records using the `https://github.com/tensorflow/models.git` dedicate
 #### Manual dataset download
 Download CoCo Dataset from [Official Website](https://cocodataset.org/#download).
 
-### 6. Download Model
+### 7. Download Model
 
 #### Automated approach
 Run the `prepare_model.py` script located in `examples/tensorflow/object_detection/tensorflow_models/quantization/ptq`.
@@ -203,15 +222,17 @@ if input_graph:
 ```
 
 ### Write Yaml config file
-In examples directory, there is a ssd_resnet50_v1.yaml. We could remove most of items and only keep mandatory item for tuning.
+The examples directory contains ssd_resnet50_v1.yaml for tuning the model on Intel CPUs; its 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; ssd_resnet50_v1_itex.yaml is prepared for the GPU case. Most items can be removed, keeping only the items that are mandatory for tuning. We also implement a calibration dataloader and include an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
 model:                                               # mandatory. used to specify model specific information.
   name: ssd_resnet50_v1
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 100                               # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2.yaml
index 5fef8f7170d..2452828cee3 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2.yaml
@@ -15,10 +15,12 @@
 
 model:                                               # mandatory. used to specify model specific information.
   name: faster_rcnn_inception_resnet_v2
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2_itex.yaml
new file mode 100644
index 00000000000..24039f9700d
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_inception_resnet_v2_itex.yaml
@@ -0,0 +1,77 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: faster_rcnn_inception_resnet_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image_tensor
+  outputs: num_detections,detection_boxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 600
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_tensor
+      algorithm: minmax
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric: 
+      COCOmAPv2:
+        output_index_mapping:
+          num_detections: 0
+          boxes: 1
+          scores: 2
+          classes: 3
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 600
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        Resize:
+          size: 600
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101.yaml
index 8fe1deaaf62..a879c83451d 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101.yaml
@@ -15,10 +15,12 @@
 
 model:                                               # mandatory. used to specify model specific information.
   name: faster_rcnn_resnet101
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101_itex.yaml
new file mode 100644
index 00000000000..e7ab49ab0dc
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet101_itex.yaml
@@ -0,0 +1,71 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: faster_rcnn_resnet101
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image_tensor
+  outputs: num_detections,detection_boxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 600
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric: 
+      COCOmAPv2:
+        output_index_mapping:
+          num_detections: 0
+          boxes: 1
+          scores: 2
+          classes: 3
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 600
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        Resize:
+          size: 600
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50.yaml
index b07bc4fca5b..91f07504f4a 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50.yaml
@@ -17,10 +17,12 @@ version: 1.0
 
 model:                                               # mandatory. used to specify model specific information.
   name: faster_rcnn_resnet50
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50_itex.yaml
new file mode 100644
index 00000000000..61254dfd224
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/faster_rcnn_resnet50_itex.yaml
@@ -0,0 +1,73 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+model:                                               # mandatory. used to specify model specific information.
+  name: faster_rcnn_resnet50
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image_tensor
+  outputs: num_detections,detection_boxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 600
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric: 
+      COCOmAPv2:
+        output_index_mapping:
+          num_detections: 0
+          boxes: 1
+          scores: 2
+          classes: 3
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 600
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        Resize:
+          size: 600
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2.yaml
index 5114765fc69..c89ac397954 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2.yaml
@@ -15,10 +15,12 @@
 
 model:                                               # mandatory. used to specify model specific information.
   name: mask_rcnn_inception_v2
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 50                                # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2_itex.yaml
new file mode 100644
index 00000000000..a253f9e1a27
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/mask_rcnn_inception_v2_itex.yaml
@@ -0,0 +1,82 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: mask_rcnn_inception_v2
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image_tensor
+  outputs: num_detections,detection_boxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 50                                # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation dataset TFRecord
+      filter:
+        LabelBalance: 
+          size: 1
+  #op_wise: {
+  #        'FirstStageFeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/separable_conv2d': {
+  #          'activation':  {'dtype': ['fp32']},
+  #          }
+  #        }
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric: 
+      COCOmAPv2:
+        output_index_mapping:
+          num_detections: 0
+          boxes: 1
+          scores: 2
+          classes: 3
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 1
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        ResizeWithRatio: 
+          min_dim: 800
+          max_dim: 1356
+          padding: False
+
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        ResizeWithRatio: 
+          min_dim: 800
+          max_dim: 1356
+          padding: True
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1.yaml
index 66d8e77a8ad..ded63da480e 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1.yaml
@@ -15,10 +15,12 @@
 
 model:                                               # mandatory. used to specify model specific information.
   name: ssd_mobilenet_v1
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1_itex.yaml
new file mode 100644
index 00000000000..c312ecc348f
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_mobilenet_v1_itex.yaml
@@ -0,0 +1,74 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: ssd_mobilenet_v1
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image_tensor
+  outputs: num_detections,detection_boxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 10, 50, 100, 200                  # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      algorithm: minmax
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric: 
+      COCOmAPv2:
+        output_index_mapping:
+          num_detections: 0
+          boxes: 1
+          scores: 2
+          classes: 3
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 300
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        Resize:
+          size: 300
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34.yaml
index 6d0fd28e76b..e4e1b978277 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34.yaml
@@ -15,10 +15,12 @@
 
 model:                                               # mandatory. used to specify model specific information.
   name: ssd_resnet34
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image
   outputs: detection_bboxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 100                  # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34_itex.yaml
new file mode 100644
index 00000000000..caa95464a5e
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet34_itex.yaml
@@ -0,0 +1,90 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: ssd_resnet34
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image
+  outputs: detection_bboxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 100                  # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 1
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation raw image folder
+      transform:
+        Rescale: {}
+        Normalize:
+          mean: [0.485, 0.456, 0.406]
+          std: [0.229, 0.224, 0.225]
+        Resize:
+          size: 1200
+
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      algorithm: minmax
+      granularity: per_channel
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      COCOmAP:
+        anno_path: /path/to/annotation
+
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 1
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset/          # NOTE: modify to coco2017 validation raw image datafolder
+      transform:
+        Rescale: {}
+        Normalize:
+          mean: [0.485, 0.456, 0.406]
+          std: [0.229, 0.224, 0.225]
+        Resize:
+          size: 1200
+
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 1
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset/
+      transform:
+        Rescale: {}
+        Normalize:
+          mean: [0.485, 0.456, 0.406]
+          std: [0.229, 0.224, 0.225]
+        Resize:
+          size: 1200
+
+tuning:
+  accuracy_criterion:
+    absolute:  0.01                                  # optional. default value is relative, other value is absolute. this example allows absolute accuracy loss: 0.01.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1.yaml
index b7c505a2751..7ea3c76f660 100644
--- a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1.yaml
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1.yaml
@@ -15,10 +15,12 @@
 
 model:                                               # mandatory. used to specify model specific information.
   name: ssd_resnet50_v1
-  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  framework: tensorflow                              # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
   inputs: image_tensor
   outputs: num_detections,detection_boxes,detection_scores,detection_classes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 100                               # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1_itex.yaml b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1_itex.yaml
new file mode 100644
index 00000000000..faff3306989
--- /dev/null
+++ b/examples/tensorflow/object_detection/tensorflow_models/quantization/ptq/ssd_resnet50_v1_itex.yaml
@@ -0,0 +1,79 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: ssd_resnet50_v1
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: image_tensor
+  outputs: num_detections,detection_boxes,detection_scores,detection_classes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 100                               # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset         # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 640
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      algorithm: minmax
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric: 
+      COCOmAPv2:
+        output_index_mapping:
+          num_detections: 0
+          boxes: 1
+          scores: 2
+          classes: 3
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset          # NOTE: modify to coco2017 validation dataset TFRecord
+      transform:
+        Resize:
+          size: 640
+ 
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        Resize:
+          size: 640
+ 
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+    max_trials: 100                                  # optional. max tune times. default value is 100. combine with timeout field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/README.md b/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/README.md
index 2677d7c26a8..5f5a5a22033 100644
--- a/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/README.md
+++ b/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/README.md
@@ -1,4 +1,4 @@
-This document describes the step-by-step to reproduce Yolo-v3 tuning result with Neural Compressor.
+This document describes the step-by-step instructions to reproduce the Yolo-v3 tuning result with Neural Compressor. This example can run on Intel CPUs and GPUs.
 
 ## Prerequisite
 
@@ -10,11 +10,10 @@ Recommend python 3.6 or higher version.
 # Install Intel® Neural Compressor
 pip install neural-compressor
 ```
+
 ### 2. Install Intel Tensorflow
 ```shell
-Check your python version and pip install 1.15.0 up3 from links as below:   
-pip install https://storage.googleapis.com/intel-optimized-tensorflow/intel_tensorflow-1.15.0up3-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl   
-pip install https://storage.googleapis.com/intel-optimized-tensorflow/intel_tensorflow-1.15.0up3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
+pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow versions please refer to Neural Compressor readme file.
 
@@ -24,28 +23,44 @@ cd examples/tensorflow/object_detection/yolo_v3/quantization/ptq
 pip install -r requirements.txt
 ```
 
-### 4. Downloaded Yolo-v3 model
+### 4. Install Intel Extension for Tensorflow if needed
+#### Tuning the model on Intel GPU(Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, please follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers)
+
+#### Tuning the model on Intel CPU(Experimental)
+Intel Extension for Tensorflow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 5. Download Yolo-v3 model
 ```shell
 git clone https://github.com/mystic123/tensorflow-yolo-v3.git
 cd tensorflow-yolo-v3
 ```
 
-### 5. Download COCO Class Names File
+### 6. Download COCO Class Names File
 ```shell
 wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names
 ```
 
-### 6. Download Model Weights (Full):
+### 7. Download Model Weights (Full):
 ```shell
 wget https://pjreddie.com/media/files/yolov3.weights
 ```
 
-### 7. Generate PB:
+### 8. Generate PB:
 ```shell
 python convert_weights_pb.py --class_names coco.names --weights_file yolov3.weights --data_format NHWC --size 416 --output_graph yolov3.pb
 ```
 
-### 8. Prepare Dataset
+### 9. Prepare Dataset
 
 #### Automatic dataset download
 
@@ -67,9 +82,12 @@ Download CoCo Dataset from [Official Website](https://cocodataset.org/#download)
 
 ## Get Quantized Yolo-v3 model with Neural Compressor
 
-### 1.Config the yolo_v3.yaml with the valid cocoraw data path.
+### 1.Config yolo_v3.yaml, or yolo_v3_itex.yaml if using Intel Extension for Tensorflow, with a valid cocoraw data path.
+
+### 2.Config the yaml file
+In the example directory, yolo_v3.yaml is provided for tuning the model on Intel CPUs; its 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; yolo_v3_itex.yaml is prepared for this GPU case. Most items in the yaml can be removed, keeping only the mandatory items for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
-### 2.Run below command one by one.
+### 3.Run the commands below one by one.
 Usage
 ```shell
 cd examples/tensorflow/object_detection/yolo_v3/quantization/ptq
diff --git a/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3.yaml b/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3.yaml
index 2ea0e167e18..91197b50762 100644
--- a/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3.yaml
+++ b/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3.yaml
@@ -4,6 +4,8 @@ model:                                               # mandatory. neural_compres
   inputs: inputs
   outputs: output_boxes
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 2                  # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
diff --git a/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3_itex.yaml b/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3_itex.yaml
new file mode 100644
index 00000000000..6a8c4483d9b
--- /dev/null
+++ b/examples/tensorflow/object_detection/yolo_v3/quantization/ptq/yolo_v3_itex.yaml
@@ -0,0 +1,85 @@
+model:                                               # mandatory. neural_compressor uses this model name and framework name to decide where to save tuning history and deploy yaml.
+  name: yolo_v3
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, or mxnet; allow new framework backend extension.
+  inputs: inputs
+  outputs: output_boxes
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 2                  # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
+    dataloader:                                      # optional. if not specified, user need construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 1
+      dataset:
+        COCORecord:
+          root: /path/to/calibration/dataset
+      filter:
+        LabelBalance:
+          size: 1
+      transform:
+        ParseDecodeCoco:
+        ResizeWithRatio:
+          min_dim: 416
+          max_dim: 416
+          padding: True
+
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      granularity: per_channel
+  op_wise: {                                         # optional. tuning constraints on op-wise for advance user to reduce tuning space.
+         'detector/yolo-v3/Conv_6/Conv2D': {
+           'activation':  {'dtype': ['fp32']},
+         },
+         'detector/yolo-v3/Conv_14/Conv2D': {
+           'activation':  {'dtype': ['fp32']},
+         },
+         'detector/yolo-v3/Conv_22/Conv2D': {
+           'activation':  {'dtype': ['fp32']},
+         }
+       }
+
+evaluation:                                          # optional. used to config evaluation process.
+  accuracy:
+    metric:
+      COCOmAP:
+        map_key: 'DetectionBoxes_Precision/mAP@.50IOU'
+    dataloader:
+      batch_size: 1
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        ParseDecodeCoco: {}
+        ResizeWithRatio:
+          min_dim: 416
+          max_dim: 416
+          padding: True
+          constant_value: 128
+  performance:
+    iteration: 100
+    configs:
+      cores_per_instance: 28
+      num_of_instance: 1
+      kmp_blocktime: 1
+    dataloader:
+      batch_size: 10
+      dataset:
+        COCORecord:
+          root: /path/to/evaluation/dataset
+      transform:
+        ParseDecodeCoco:
+        ResizeWithRatio:
+          min_dim: 416
+          max_dim: 416
+          padding: True
+          constant_value: 128
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/oob_models/quantization/ptq/README.md b/examples/tensorflow/oob_models/quantization/ptq/README.md
index 85134117ee3..341956d52fc 100644
--- a/examples/tensorflow/oob_models/quantization/ptq/README.md
+++ b/examples/tensorflow/oob_models/quantization/ptq/README.md
@@ -15,11 +15,27 @@ This document is used to list steps of reproducing Intel Optimized TensorFlow OO
   ```
 > Note: Supported Tensorflow [Version](../../../../../README.md#supported-frameworks).
 
-## 2. Prepare Dataset
+## 2. Install Intel Extension for Tensorflow if needed
+### Tuning the model on Intel GPU(Mandatory)
+Intel Extension for Tensorflow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, please follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers)
+
+### Tuning the model on Intel CPU(Experimental)
+Intel Extension for Tensorflow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+## 3. Prepare Dataset
 
   We use dummy data to do benchmarking with Tensorflow OOB models.
 
-## 3. Prepare pre-trained model
+## 4. Prepare pre-trained model
 
   * Get model from [open_model_zoo](https://github.com/openvinotoolkit/open_model_zoo/tree/2021.4/tools/downloader/README.md) 
 
@@ -70,6 +86,8 @@ List models names can get with open_model_zoo:
 |	ssd_inception_v2_coco	|	http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz	|
 |	ssd-resnet34 300x300	|	https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/ssd_resnet34_fp32_bs1_pretrained_model.pb	|
 
+## 5. Config the yaml file
+In the example directory, config.yaml is provided for tuning the model on Intel CPUs; its 'framework' is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'; config_itex.yaml is prepared for this GPU case. Most items in the yaml can be removed, keeping only the mandatory items for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
 # Run
 ## run tuning
diff --git a/examples/tensorflow/oob_models/quantization/ptq/config.yaml b/examples/tensorflow/oob_models/quantization/ptq/config.yaml
index 7f86addab02..86e26e1ff8c 100644
--- a/examples/tensorflow/oob_models/quantization/ptq/config.yaml
+++ b/examples/tensorflow/oob_models/quantization/ptq/config.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   inputs: input
   outputs: output
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 1                                 # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/oob_models/quantization/ptq/config_itex.yaml b/examples/tensorflow/oob_models/quantization/ptq/config_itex.yaml
new file mode 100644
index 00000000000..44db3b3910c
--- /dev/null
+++ b/examples/tensorflow/oob_models/quantization/ptq/config_itex.yaml
@@ -0,0 +1,38 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: oob_models
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: input
+  outputs: output
+
+device: gpu                                          # optional. set cpu if installed intel-extension-for-tensorflow[cpu], set gpu if installed intel-extension-for-tensorflow[gpu].
+
+quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+  calibration:
+    sampling_size: 1                                 # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    activation:
+      algorithm: minmax
+    weight:
+      algorithm: minmax
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/README.md b/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/README.md
index 708c4001395..c4c79149fb7 100644
--- a/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/README.md
+++ b/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow Wide & Deep tuning zoo result.
+This example can run on Intel CPUs and GPUs.
 
 ## Prerequisite
 
@@ -16,13 +17,29 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../README.md#supported-frameworks).
 
-### 3. Install Additional Dependency packages
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Install Additional Dependency packages
 ```shell
 cd examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq
 pip install -r requirements.txt
 ```
 
-### 4. Prepare Dataset
+### 5. Prepare Dataset
 Download training dataset: (8 million samples)
 ```bash
 $ wget https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/train.csv
@@ -32,7 +49,7 @@ Download evaluation dataset (2 million samples)
 $ wget https://storage.googleapis.com/dataset-uploader/criteo-kaggle/large_version/eval.csv
 ```
 
-### 5. Process Dataset
+### 6. Process Dataset
 Process calib dataset
 ```bash
 python preprocess_csv_tfrecords.py \
@@ -51,12 +68,15 @@ Two .tfrecords files are generated and will be used later on:
 1) train_processed_data.tfrecords
 2) eval_processed_data.tfrecords
 
-### 6. Download Frozen PB
+### 7. Download Frozen PB
 ```shell
 wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/wide_deep_fp32_pretrained_model.pb
 ```
 
-### 7. Run Command
+### 8. Configure the yaml file
+The examples directory contains wide_deep_large_ds.yaml for tuning the model on Intel CPUs; its 'framework' field is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'. The wide_deep_large_ds_itex.yaml file is already prepared for the GPU case. Most items can be removed from the yaml file, keeping only the mandatory items for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
+
+### 9. Run Command
   # The cmd of running WnD
   ```shell
   bash run_tuning.sh --dataset_location=/path/to/datasets --input_model=/path/to/wide_deep_fp32_pretrained_model.pb --output_model=./wnd_int8_opt.pb
diff --git a/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds.yaml b/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds.yaml
index ec9152c457f..f464c17260d 100644
--- a/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds.yaml
+++ b/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds.yaml
@@ -19,6 +19,8 @@ model:                                               # mandatory. used to specif
   inputs: new_numeric_placeholder,new_categorical_placeholder
   outputs: import/head/predictions/probabilities     # optional. inputs and outputs fields are only required for tensorflow backend.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
   calibration:
     sampling_size: 2000                              # optional. default value is 100. used to set how many samples should be used in calibration.
diff --git a/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds_itex.yaml b/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds_itex.yaml
new file mode 100644
index 00000000000..59a9d2e6ee3
--- /dev/null
+++ b/examples/tensorflow/recommendation/wide_deep_large_ds/quantization/ptq/wide_deep_large_ds_itex.yaml
@@ -0,0 +1,42 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: wide_deep_large_ds
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: new_numeric_placeholder,new_categorical_placeholder
+  outputs: import/head/predictions/probabilities     # optional. inputs and outputs fields are only required for tensorflow backend.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 2000                              # optional. default value is 100. used to set how many samples should be used in calibration.
+  model_wise:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+    activation:
+      algorithm: minmax
+  op_wise: {
+             'import/dnn/hiddenlayer_0/MatMul': {
+               'activation':  {'dtype': ['uint8'], 'algorithm': ['minmax'], 'scheme':['asym']},
+             }
+           }
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+    max_trials: 100                                  # optional. max tune times. default value is 100. combine with timeout field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf.yaml b/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf.yaml
index 529197e59c0..2d973ab66e9 100644
--- a/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf.yaml
+++ b/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf.yaml
@@ -15,6 +15,8 @@
 
 version: 1.0
 
+device: cpu                     # optional. default value is cpu, other value is gpu.
+
 model:
   name: 3dunet-mlperf
   framework: tensorflow
diff --git a/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf_itex.yaml b/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf_itex.yaml
new file mode 100644
index 00000000000..801aec72d52
--- /dev/null
+++ b/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/3dunet-mlperf_itex.yaml
@@ -0,0 +1,34 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+version: 1.0
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+model:
+  name: 3dunet-mlperf
+  framework: tensorflow_itex
+
+quantization:
+  calibration:
+    sampling_size: 40
+
+tuning:
+  accuracy_criterion:
+    relative: 0.01
+  exit_policy:
+    timeout: 0
+    max_trials: 100
+  random_seed: 9527
diff --git a/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/README.md b/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/README.md
index d8a6023978e..5a2048e00b5 100644
--- a/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/README.md
+++ b/examples/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow Intel® Neural Compressor tuning zoo result of 3dunet-mlperf.
+This example can run on Intel CPUs and GPUs.
 
 ## Prerequisite
 
@@ -17,22 +18,41 @@ pip install intel-tensorflow
 ```
 > Note: Supported Tensorflow [Version](../../../../../../README.md#supported-frameworks).
 
-### 3. Download BraTS 2019 dataset
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Download BraTS 2019 dataset
    Please download [Brats 2019](https://www.med.upenn.edu/cbica/brats2019/data.html)
    separately and unzip the dataset. The directory that contains the dataset files will be
    passed to the launch script when running the benchmarking script.
 
-### 4. Download Pre-trained model
+### 5. Download Pre-trained model
    Download the pre-trained model from the
    [3DUnetCNN](https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_7_0/3dunet_dynamic_ndhwc.pb).
    In this example, we are using the model,
    trained using the fold 1 BRATS 2019 data.
    The validation files have been copied from [here](https://github.com/mlcommons/inference/tree/r0.7/vision/medical_imaging/3d-unet/folds)
 
-### 5. Prepare Calibration set
+### 6. Prepare Calibration set
    The calibration set is the forty images listed in brats_cal_images_list.txt. They are randomly selected from Fold 0, Fold 2, Fold 3, and Fold 4 of BraTS 2019 Training Dataset.
 
-### 6. Test command
+### 7. Configure the yaml file
+The examples directory contains 3dunet-mlperf.yaml for tuning the model on Intel CPUs; its 'framework' field is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'. The 3dunet-mlperf_itex.yaml file is already prepared for the GPU case. Most items can be removed from the yaml file, keeping only the mandatory items for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
+
+### 8. Test command
 * `export nnUNet_preprocessed=<path/to/build>/build/preprocessed_data`
 * `export nnUNet_raw_data_base=<path/to/build>/build/raw_data`
 * `export RESULTS_FOLDER=<path/to/build>/build/result`
diff --git a/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/README.md b/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/README.md
index 574afc379f4..47629480edd 100644
--- a/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/README.md
+++ b/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/README.md
@@ -2,6 +2,7 @@ Step-by-Step
 ============
 
 This document list steps of reproducing Intel Optimized TensorFlow image recognition models tuning results via Neural Compressor.
+This example can run on Intel CPUs and GPUs.
 
 > **Note**: 
 > Most of those models are both supported in Intel optimized TF 1.15.x and Intel optimized TF 2.x.
@@ -16,14 +17,36 @@ This document list steps of reproducing Intel Optimized TensorFlow image recogni
   pip install -r requirements.txt
   ```
 
-### 2. Prepare Dataset
+### 2. Install Intel Tensorflow
+```shell
+pip install intel-tensorflow
+```
+> Note: Supported Tensorflow [Version](../../../../../../README.md#supported-frameworks).
+
+### 3. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 4. Prepare Dataset
 Please use the script under the folder `datasets` to download and convert PASCAL VOC 2012 semantic segmentation dataset to TFRecord. Refer to [Running DeepLab on PASCAL VOC 2012 Semantic Segmentation Dataset](https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/pascal.md#running-deeplab-on-pascal-voc-2012-semantic-segmentation-dataset) for more details.
 ```shell
 # From the examples/tensorflow/semantic_image_segmentation/deeplab/datasets directory.
 sh download_and_convert_voc2012.sh
 ```
 
-### 3. Prepare pre-trained model
+### 5. Prepare pre-trained model
 Refer to [Export trained deeplab model to frozen inference graph](https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/export_model.md#export-trained-deeplab-model-to-frozen-inference-graph) for more details.
 
 1. Download the checkpoint file
@@ -80,12 +103,11 @@ We provide Deeplab model pretrained on PASCAL VOC 2012, Using mIOU as metric whi
 
 ### Write Yaml config file
 
-In examples directory, there is a template.yaml. We could remove most of the items and only keep mandatory item for tuning. 
-
+The examples directory contains deeplab.yaml for tuning the model on Intel CPUs; its 'framework' field is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'. The deeplab_itex.yaml file is already prepared for the GPU case. Most items can be removed from the yaml file, keeping only the mandatory items for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
 # deeplab.yaml
-
+device: cpu                                          # optional. default value is cpu, other value is gpu.
 model:                                               # mandatory. neural_compressor uses this model name and framework name to decide where to save tuning history and deploy yaml.
   name: deeplab
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
diff --git a/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab.yaml b/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab.yaml
index ad98faedd8e..98b362456f5 100644
--- a/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab.yaml
+++ b/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab.yaml
@@ -13,6 +13,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 model:                                               # mandatory. neural_compressor uses this model name and framework name to decide where to save tuning history and deploy yaml.
   name: deeplab
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
diff --git a/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab_itex.yaml b/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab_itex.yaml
new file mode 100644
index 00000000000..5f5ae67361d
--- /dev/null
+++ b/examples/tensorflow/semantic_image_segmentation/deeplab/quantization/ptq/deeplab_itex.yaml
@@ -0,0 +1,65 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+model:                                               # mandatory. neural_compressor uses this model name and framework name to decide where to save tuning history and deploy yaml.
+  name: deeplab
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: ImageTensor
+  outputs: SemanticPredictions
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    sampling_size: 50, 100                           # optional. default value is 100. used to set how many samples should be used in calibration.
+    dataloader:
+      dataset:
+        VOCRecord:
+          root: /path/to/pascal_voc_seg/tfrecord     # NOTE: modify to calibration dataset location if needed
+      transform:
+        ParseDecodeVoc: {}
+
+
+evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
+    metric:
+      mIOU: 
+        num_classes: 21                              # built-in metrics are topk, map, f1, allow user to register new metric.
+    dataloader:
+      batch_size: 1
+      dataset:
+        VOCRecord:
+          root: /path/to/pascal_voc_seg/tfrecord     # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ParseDecodeVoc: {}
+  performance:                                       # optional. used to benchmark performance of passing model.
+    iteration: 100
+    configs:
+      cores_per_instance: 4
+      num_of_instance: 6
+    dataloader:
+      batch_size: 1
+      dataset:
+        VOCRecord:
+          root: /path/to/pascal_voc_seg/tfrecord     # NOTE: modify to evaluation dataset location if needed
+      transform:
+        ParseDecodeVoc: {}
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.
diff --git a/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/README.md b/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/README.md
index 18fc867bb93..c454a7f3d6a 100644
--- a/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/README.md
+++ b/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/README.md
@@ -2,7 +2,7 @@ Step-by-Step
 ============
 
 This document is used to list steps of reproducing TensorFlow style transfer Intel® Neural Compressor tuning zoo result.
-
+This example can run on Intel CPUs and GPUs.
 
 ## Prerequisite
 
@@ -23,12 +23,28 @@ cd examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq
 pip install -r requirements.txt
 ```
 
-### 4. Prepare Dataset
+### 4. Install Intel Extension for TensorFlow if needed
+#### Tuning the model on Intel GPU (Mandatory)
+Intel Extension for TensorFlow must be installed to tune the model on Intel GPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+```
+For more details, follow the procedure in [install-gpu-drivers](https://github.com/intel-innersource/frameworks.ai.infrastructure.intel-extension-for-tensorflow.intel-extension-for-tensorflow/blob/master/docs/install/install_for_gpu.md#install-gpu-drivers).
+
+#### Tuning the model on Intel CPU (Experimental)
+Intel Extension for TensorFlow for Intel CPUs is currently experimental. It is not required for tuning the model on Intel CPUs.
+
+```shell
+pip install --upgrade intel-extension-for-tensorflow[cpu]
+```
+
+### 5. Prepare Dataset
 There are two folders named style_images and content_images
 you can use these two folders to generated stylized images for test
 you can also prepare your own style_images or content_images
 
-### 5. Prepare Pretrained model
+### 6. Prepare Pretrained model
 
 #### Automated approach
 Run the `prepare_model.py` script located in `LowPrecisionInferenceTool/examples/tensorflow/style_transfer`.
@@ -81,9 +97,11 @@ def eval_func(model):
 ```
 
 ### Write Yaml config file
-In examples directory, there is a conf.yaml. We could remove most of items and only keep mandatory item for tuning. We also implement a calibration dataloader
+The examples directory contains conf.yaml for tuning the model on Intel CPUs; its 'framework' field is set to 'tensorflow'. To run this example on Intel GPUs, set 'framework' to 'tensorflow_itex' and 'device' to 'gpu'. The conf_itex.yaml file is already prepared for the GPU case. Most items can be removed from the yaml file, keeping only the mandatory items for tuning. We also implement a calibration dataloader and provide an evaluation field so that neural_compressor can create the evaluation function internally.
 
 ```yaml
+device: cpu                                          # NOTE: optional. default value is cpu, other value is gpu.
+
 model:
   name: style_transfer
   framework: tensorflow
diff --git a/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf.yaml b/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf.yaml
index 68160b1e254..89664a00e35 100644
--- a/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf.yaml
+++ b/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf.yaml
@@ -13,6 +13,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+device: cpu                                          # optional. default value is cpu, other value is gpu.
+
 model:                                               # mandatory. used to specify model specific information.
   name: style_transfer
   framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
diff --git a/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf_itex.yaml b/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf_itex.yaml
new file mode 100644
index 00000000000..23a38a18893
--- /dev/null
+++ b/examples/tensorflow/style_transfer/arbitrary_style_transfer/quantization/ptq/conf_itex.yaml
@@ -0,0 +1,48 @@
+#
+# Copyright (c) 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+device: gpu                                          # optional. set to cpu if intel-extension-for-tensorflow[cpu] is installed, or to gpu if intel-extension-for-tensorflow[gpu] is installed.
+
+model:                                               # mandatory. used to specify model specific information.
+  name: style_transfer
+  framework: tensorflow_itex                         # mandatory. supported values are tensorflow, tensorflow_itex, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
+  inputs: style_input,content_input                  # optional. inputs and outputs fields are only required for tensorflow backend.
+  outputs: transformer/expand/conv3/conv/Sigmoid
+
+quantization:                                        # optional. model-wise tuning constraints for advanced users to reduce the tuning space.
+  calibration:
+    dataloader:                                      # optional. if not specified, the user needs to construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 2
+      dataset:
+        style_transfer:
+          content_folder: ./content_images/          # NOTE: modify to content images path if needed
+          style_folder: ./style_images/              # NOTE: modify to style images path if needed
+
+evaluation:
+  performance:
+    dataloader:                                      # optional. if not specified, the user needs to construct a q_dataloader in code for neural_compressor.Quantization.
+      batch_size: 2
+      dataset:
+        style_transfer:
+          content_folder: ./content_images/          # NOTE: modify to content images path if needed
+          style_folder: ./style_images/              # NOTE: modify to style images path if needed
+
+tuning:
+  accuracy_criterion:
+    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
+  exit_policy:
+    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
+    max_trials: 100                                  # optional. max tune times. default value is 100. combine with timeout field to decide when to exit.
+  random_seed: 9527                                  # optional. random seed for deterministic tuning.