Neural Insights: step by step debug example docs (#1103)
Signed-off-by: aradys-intel <agata.radys@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
aradys and chensuyue committed Aug 7, 2023
1 parent da4c92c commit 99c3b06
Showing 14 changed files with 243 additions and 5 deletions.
12 changes: 7 additions & 5 deletions docs/source/diagnosis.md
@@ -1,17 +1,19 @@
# Diagnosis
1. [Diagnosis introduction](#diagnosis-introduction)
1. [Diagnosis Introduction](#diagnosis-introduction)
2. [Supported Feature Matrix](#supported-feature-matrix)
3. [Get started](#get-started)
3. [Get Started](#get-started)
4. [Example](#example)
5. [Step by Step Diagnosis Example with TensorFlow](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/tf_accuracy_debug.md)
6. [Step by Step Diagnosis Example with ONNXRT](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/onnx_accuracy_debug.md)

# Diagnosis introduction
# Diagnosis Introduction
The diagnosis feature provides methods to debug accuracy loss during quantization and to profile the performance gap during benchmarking.
There are two ways to diagnose a model with Intel® Neural Compressor: the non-GUI mode described below, and the GUI mode provided by the [Neural Insights](https://github.com/intel/neural-compressor/tree/master/neural_insights) component.

The workflow is described in the diagram below. First, configure the scripts for diagnosis, then run them and check the diagnosis information in the terminal. Verify whether the result is satisfactory and repeat the steps if needed.
![workflow](./imgs/workflow.jpg)

# Supported feature matrix
# Supported Feature Matrix
<table class="center">
<thead>
<tr>
@@ -45,7 +47,7 @@
</tbody>
</table>

# Get started
# Get Started
## Install Intel® Neural Compressor
First you need to install Intel® Neural Compressor.
```shell
3 changes: 3 additions & 0 deletions neural_insights/README.md
@@ -113,6 +113,9 @@

> Note that the above example uses dummy data to illustrate the usage of Neural Insights. For diagnosis purposes, you should use a real dataset specific to your use case.
## Step by Step Diagnosis Example
Refer to [Step by Step Diagnosis Example with TensorFlow](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/tf_accuracy_debug.md) and [Step by Step Diagnosis Example with ONNXRT](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/onnx_accuracy_debug.md) to get started with some basic quantization accuracy diagnostic skills.

## Research Collaborations

Welcome to raise any interesting research ideas on model compression techniques and feel free to reach out to us (inc.maintainers@intel.com). We look forward to collaborating on Neural Insights!
Binary file added neural_insights/docs/source/imgs/min-max.jpg
Binary file added neural_insights/docs/source/imgs/ops_weights.png
Binary file added neural_insights/docs/source/imgs/tune_result.png
Binary file added neural_insights/docs/source/imgs/tune_result2.png
120 changes: 120 additions & 0 deletions neural_insights/docs/source/onnx_accuracy_debug.md
@@ -0,0 +1,120 @@
# Step-by-step example of how to debug accuracy with Neural Insights
1. [Introduction](#introduction)
2. [Preparation](#preparation)
3. [Running the quantization](#running-the-quantization)
4. [Analyzing the result of quantization](#analyzing-the-result-of-quantization)

# Introduction
In this guide, an accuracy issue will be debugged using Neural Insights. The ONNX LayoutLMv3 model is used as an example: it will be quantized, and the results will be analyzed to find the cause of the accuracy loss.

# Preparation
## Requirements
First you need to install Intel® Neural Compressor and other requirements.
```shell
pip install neural-compressor
pip install datasets transformers torch torchvision
pip install onnx onnxruntime onnxruntime-extensions
pip install accelerate seqeval tensorboard sentencepiece timm fvcore Pillow einops textdistance shapely protobuf setuptools optimum
```

## Model
Get the LayoutLMv3 model from Intel® Neural Compressor [LayoutLMv3 example](https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/token_classification/layoutlmv3/quantization/ptq_static).
```shell
optimum-cli export onnx --model HYPJUDY/layoutlmv3-base-finetuned-funsd layoutlmv3-base-finetuned-funsd-onnx/ --task=token-classification
```

# Running the quantization
Generate a quantized model.
```python
import onnx
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader

onnx_model = onnx.load(input_model)  # path to the exported LayoutLMv3 ONNX model
calib_dataset = IncDataset(eval_dataset, onnx_model)  # calibration dataset wrapper from the LayoutLMv3 example
config = PostTrainingQuantConfig(approach='static', quant_format="QOperator")
q_model = quantization.fit(onnx_model,
                           config,
                           calib_dataloader=DataLoader(framework='onnxruntime', dataset=calib_dataset))
```

Execute the benchmark to get the F1 scores of both the FP32 and INT8 models and then compute the relative accuracy ratio.
The results indicate that the quantized model's accuracy is noticeably poor.

```
fp32 f1 = 0.9049, int8 f1 = 0.2989, accuracy ratio = -66.9631%
```
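
For reference, the `accuracy ratio` above is simply the relative F1 drop of the INT8 model with respect to the FP32 baseline; a minimal sketch of that arithmetic (using the rounded scores from this run) is:

```python
# Relative accuracy ratio: the F1 change of the INT8 model relative to the FP32 baseline (negative means worse).
fp32_f1 = 0.9049
int8_f1 = 0.2989
accuracy_ratio = (int8_f1 - fp32_f1) / fp32_f1 * 100
print(f"fp32 f1 = {fp32_f1}, int8 f1 = {int8_f1}, accuracy ratio = {accuracy_ratio:.4f}%")  # ~ -66.97%
```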

# Analyzing the result of quantization
In this section, the diagnosis tool is used for debugging to achieve higher INT8 model accuracy.
We need to set the `diagnosis` parameter to `True` as shown below.
```python
config = PostTrainingQuantConfig(approach="static", quant_format="QOperator", quant_level=1, diagnosis=True)  # set 'diagnosis' to True
q_model = quantization.fit(onnx_model,
                           config,
                           eval_func=eval_func,
                           calib_dataloader=DataLoader(framework='onnxruntime', dataset=calib_dataset))
```
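
The `eval_func` passed above is not shown in this walkthrough. It is assumed to be a thin wrapper that evaluates a candidate model on the FUNSD evaluation set and returns a single F1 score, roughly like the sketch below (`evaluate_funsd` is a hypothetical placeholder for the evaluation loop in the LayoutLMv3 example):

```python
# Sketch only: quantization.fit() calls eval_func(model) and expects one scalar metric back,
# which it uses to compare candidate INT8 models against the FP32 baseline.
def eval_func(model):
    # `evaluate_funsd` stands in for the token-classification evaluation from the example.
    return evaluate_funsd(model, eval_dataset)
```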
The diagnosis tool will output an `Activations summary` and a `Weights summary` in the terminal.

For easier inspection, we reload the dumped .csv files as shown below.
```python
import glob
import os

import pandas as pd
from IPython.display import display  # available by default inside Jupyter notebooks

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Pick the most recent Neural Compressor workspace, where the tables were dumped.
subfolders = glob.glob("./nc_workspace" + "/*/")
subfolders.sort(key=os.path.getmtime, reverse=True)
if subfolders:
    activations_table = os.path.join(subfolders[0], "activations_table.csv")
    weights_table = os.path.join(subfolders[0], "weights_table.csv")

    activations_table = pd.read_csv(activations_table)
    weights_table = pd.read_csv(weights_table)

    print("Activations summary")
    display(activations_table)

    print("\nWeights summary")
    display(weights_table)
```

## Weights summary
These are the top 10 rows of the weights summary table:

![weights_summary_onnx](./imgs/weights_summary_onnx.jpg)

## Activations summary
These are the top 10 rows of the activations summary table:

![activations_summary_onnx](./imgs/activations_summary_onnx.jpg)

In the Activations summary table, some nodes show a dispersed activation data range. Therefore, we calculate the `Min-Max data range` of the activation data and sort the results in descending order.

```python
activations_table["Min-Max data range"] = activations_table["Activation max"] - activations_table["Activation min"]
sorted_data = activations_table.sort_values(by="Min-Max data range", ascending=False)
display(sorted_data)
```

The results should look like the following:

![min-max](./imgs/min-max.jpg)

According to the results displayed above, it is evident that the nodes of type `/layoutlmv3/encoder/layer.\d+/output/Add` and `/layoutlmv3/encoder/layer.\d+/output/dense/MatMul` have significantly higher `Min-Max data range` values than the other node types. This indicates that they may have caused the loss of accuracy, so we can try to fall back these nodes to FP32.

Refer to [diagnosis.md](https://github.com/intel/neural-compressor/blob/master/docs/source/diagnosis.md) for more tips for diagnosis.

```python
from neural_compressor.utils.constant import FP32

config = PostTrainingQuantConfig(approach="static",
                                 quant_format="QOperator",
                                 op_name_dict={r"/layoutlmv3/encoder/layer.\d+/output/dense/MatMul": FP32,
                                               r"/layoutlmv3/encoder/layer.\d+/output/Add": FP32})  # raw strings for the regex Op patterns
q_model = quantization.fit(onnx_model,
                           config,
                           calib_dataloader=DataLoader(framework='onnxruntime', dataset=calib_dataset))
q_model.save(output_model)
```

Execute the benchmark on the new quantized model again; the accuracy ratio improves to less than 1%.
```
fp32 f1 = 0.9049, int8 f1 = 0.8981, accuracy ratio = -0.7502%
```
113 changes: 113 additions & 0 deletions neural_insights/docs/source/tf_accuracy_debug.md
@@ -0,0 +1,113 @@
# Step-by-step example of how to debug accuracy with Neural Insights
1. [Introduction](#introduction)
2. [Preparation](#preparation)
3. [Running the quantization](#running-the-quantization)
4. [Analyzing the result of quantization](#analyzing-the-result-of-quantization)
5. [Analyzing weight histograms](#analyzing-weight-histograms)

# Introduction
In this guide, an accuracy issue will be debugged using Neural Insights. The TensorFlow Inception_v3 model is used as an example: it will be quantized, and the results will be analyzed to find the cause of the accuracy loss.

# Preparation
## Source
First you need to install Intel® Neural Compressor.
```shell
# Install Neural Compressor
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install

# Install Neural Insights
pip install -r neural_insights/requirements.txt
python setup.py install neural_insights
```

## Requirements
```shell
cd examples/tensorflow/image_recognition/tensorflow_models/inception_v3/quantization/ptq
pip install -r requirements.txt
```

## Model
Download the pre-trained PB model file.
```shell
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/inceptionv3_fp32_pretrained_model.pb
```

## Prepare the dataset
Download the dataset from ImageNet and convert the data to the TensorFlow Record format.
```shell
cd examples/tensorflow/image_recognition/tensorflow_models/
bash prepare_dataset.sh --output_dir=./inception_v3/quantization/ptq/data --raw_dir=/PATH/TO/img_raw/val/ --subset=validation
bash prepare_dataset.sh --output_dir=./inception_v3/quantization/ptq/data --raw_dir=/PATH/TO/img_raw/train/ --subset=train
```

# Running the quantization
Before applying quantization, modify some code to enable Neural Insights:
1. Set the `diagnosis` argument to `True` in `PostTrainingQuantConfig` so that Neural Insights dumps the weights and activations of the quantizable Ops in this model.
2. Delete the `op_name_dict` argument, because finding the right value for it is the goal of this investigation.
```python
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], diagnosis=True)
```
3. Quantize the model with the following command:
```shell
bash run_tuning.sh --input_model=/PATH/TO/inceptionv3_fp32_pretrained_model.pb --output_model=./nc_inception_v3.pb --dataset_location=/path/to/ImageNet/
```

The accuracy of this model decreases significantly if all Ops are quantized to int8 with the default strategy:

![accuracy_decrease](./imgs/accuracy_decrease.png)

# Analyzing the result of quantization
When you run the quantization, you will find the following table in the output:

![activations_summary](./imgs/activations_summary.png)

The MSE (Mean Squared Error) of each Op's activations is listed from high to low, together with the min-max values.
MSE is usually one of the typical indicators of accuracy loss.
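
For context, the sketch below shows one way such a per-Op activation MSE can be computed, by comparing an FP32 activation tensor with its int8 quantize-dequantize round trip. The tensor here is synthetic and only for illustration; this is not Neural Insights' exact implementation.

```python
import numpy as np

# Synthetic stand-in for an FP32 activation tensor captured during calibration.
fp32_activation = np.random.default_rng(0).normal(0.0, 1.0, size=(1, 64, 32, 32)).astype(np.float32)

# Per-tensor symmetric int8 quantization followed by dequantization.
scale = np.abs(fp32_activation).max() / 127.0
int8_activation = np.clip(np.round(fp32_activation / scale), -127, 127)
dequantized = int8_activation * scale

# MSE between the original activation and its quantize-dequantize round trip.
mse = np.mean((fp32_activation - dequantized) ** 2)
print(f"activation MSE = {mse:.8f}")
```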

![ops_weights](./imgs/ops_weights.png)

There is also relevant information about the Ops' weights.
Often the Op with the highest MSE causes the highest accuracy loss, but that is not always the case.

Experimenting with disabling quantization for some of the Ops with the top 5 highest MSE in both tables is not satisfactory, as the results in this example show:

![tune_result](./imgs/tune_result.png)
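
For reference, the fallback experiment above could be expressed roughly as in the sketch below. The Op names are hypothetical placeholders; in practice you would copy the top-MSE Op names from the tables printed by the diagnosis run.

```python
# Hypothetical sketch: fall back a few of the highest-MSE Ops to FP32.
# The Op names below are placeholders, not the actual top-5 Ops of this model.
op_name_dict = {
    'v0/cg/conv1/conv2d/Conv2D': {'activation': {'dtype': ['fp32']}},
    'v0/cg/mpool0/MaxPool': {'activation': {'dtype': ['fp32']}},
}
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], diagnosis=True, op_name_dict=op_name_dict)
```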

Then, the weight histograms can be analyzed to find the reason for the accuracy loss.

# Analyzing weight histograms
## Open Neural Insights
```shell
neural_insights
```

Then you will get a web address for the Neural Insights GUI. There you can find histograms of weights and activations.
```
Neural Insights Server started.
Open address [...]
```

The weights of Ops are usually distributed in a single spike, as in the following graph:

![weights_histograms](./imgs/weights_histograms.png)

When you click on an Op in the Op list, its weight and activation histograms appear at the bottom of the page.
One of the weight histograms looks different from the examples above.

![weights_histogram](./imgs/weights_histogram.png)

As shown in the chart, when the accuracy loss of an Op is tolerable, the distribution of its weights usually concentrates in a small min-max range. In this Op, however, the min-max values of the weights are significantly large (the range exceeds [-20, 20]) because of a few outliers. The values near the zero point, which are the majority, are mapped to a very small range of int8 values, leading to a huge accuracy loss. Besides, since the min-max values vary across channels, accuracy also decreases when channel-wise quantization is not used.
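
To make the effect concrete, here is a small NumPy sketch (synthetic weights, not taken from the model) of how a few outliers stretch the per-tensor scale so that the near-zero majority of weights collapses onto only a handful of int8 levels:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=1024)    # the bulk of the weights sits close to zero
weights[:4] = [-22.0, 21.5, 20.0, -20.5]      # a few outliers stretch the range beyond [-20, 20]

def int8_levels_used(values, scale):
    """Number of distinct int8 codes the values occupy under per-tensor symmetric quantization."""
    return np.unique(np.clip(np.round(values / scale), -127, 127)).size

scale_with_outliers = np.abs(weights).max() / 127.0
scale_without_outliers = np.abs(weights[4:]).max() / 127.0

bulk = weights[4:]  # the near-zero majority
print("int8 levels used by the bulk:",
      int8_levels_used(bulk, scale_with_outliers), "with outliers,",
      int8_levels_used(bulk, scale_without_outliers), "without")
```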

Therefore, you can disable this Op:
```python
op_name_dict = {'v0/cg/conv0/conv2d/Conv2D': {
    'activation': {'dtype': ['fp32']}}}
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], op_name_dict=op_name_dict)
```

After running the quantization again, you can see that the accuracy has increased. The Op that caused the accuracy loss has been found.

![tune_result2](./imgs/tune_result2.png)
