Neural Insights: step by step debug example docs (#1103)
Signed-off-by: aradys-intel <agata.radys@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
aradys and chensuyue committed Aug 7, 2023
1 parent da4c92c commit 99c3b06
Showing 14 changed files with 243 additions and 5 deletions.
12 changes: 7 additions & 5 deletions docs/source/diagnosis.md
@@ -1,17 +1,19 @@
# Diagnosis
1. [Diagnosis introduction](#diagnosis-introduction)
1. [Diagnosis Introduction](#diagnosis-introduction)
2. [Supported Feature Matrix](#supported-feature-matrix)
3. [Get started](#get-started)
3. [Get Started](#get-started)
4. [Example](#example)
5. [Step by Step Diagnosis Example with TensorFlow](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/tf_accuracy_debug.md)
6. [Step by Step Diagnosis Example with ONNXRT](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/onnx_accuracy_debug.md)

# Diagnosis introduction
# Diagnosis Introduction
The diagnosis feature provides methods to debug accuracy loss during quantization and to profile the performance gap during benchmarking.
There are two ways to diagnose a model with Intel® Neural Compressor: the non-GUI mode described below, and the GUI mode provided by the [Neural Insights](https://github.com/intel/neural-compressor/tree/master/neural_insights) component.

The workflow is described in the diagram below. First, configure the scripts for diagnosis, then run them and check the diagnosis information in the terminal. Verify whether the result is satisfactory and repeat the steps if needed.
![workflow](./imgs/workflow.jpg)

# Supported feature matrix
# Supported Feature Matrix
<table class="center">
<thead>
<tr>
@@ -45,7 +47,7 @@
</tbody>
</table>

# Get started
# Get Started
## Install Intel® Neural Compressor
First you need to install Intel® Neural Compressor.
```shell
3 changes: 3 additions & 0 deletions neural_insights/README.md
@@ -113,6 +113,9 @@

> Note that the above example uses dummy data to illustrate the usage of Neural Insights. For diagnosis purposes, you should use a real dataset specific to your use case.
## Step by Step Diagnosis Example
Refer to [Step by Step Diagnosis Example with TensorFlow](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/tf_accuracy_debug.md) and [Step by Step Diagnosis Example with ONNXRT](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/onnx_accuracy_debug.md) to get started with some basic quantization accuracy diagnostic skills.

## Research Collaborations

Welcome to raise any interesting research ideas on model compression techniques and feel free to reach out to us (inc.maintainers@intel.com). We look forward to collaborating on Neural Insights!
Binary file added neural_insights/docs/source/imgs/min-max.jpg
Binary file added neural_insights/docs/source/imgs/ops_weights.png
Binary file added neural_insights/docs/source/imgs/tune_result.png
Binary file added neural_insights/docs/source/imgs/tune_result2.png
120 changes: 120 additions & 0 deletions neural_insights/docs/source/onnx_accuracy_debug.md
@@ -0,0 +1,120 @@
# Step-by-step example of how to debug accuracy with Neural Insights
1. [Introduction](#introduction)
2. [Preparation](#preparation)
3. [Running the quantization](#running-the-quantization)
4. [Analyzing the result of quantization](#analyzing-the-result-of-quantization)

# Introduction
In this guide, an accuracy issue will be debugged using Neural Insights. The ONNX LayoutLMv3 model is used as an example: it will be quantized, and the results will be analyzed to find the cause of the accuracy loss.

# Preparation
## Requirements
First you need to install Intel® Neural Compressor and other requirements.
```shell
pip install neural-compressor
pip install datasets transformers torch torchvision
pip install onnx onnxruntime onnxruntime-extensions
pip install accelerate seqeval tensorboard sentencepiece timm fvcore Pillow einops textdistance shapely protobuf setuptools optimum
```

## Model
Get the LayoutLMv3 model from Intel® Neural Compressor [LayoutLMv3 example](https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/token_classification/layoutlmv3/quantization/ptq_static).
```shell
optimum-cli export onnx --model HYPJUDY/layoutlmv3-base-finetuned-funsd layoutlmv3-base-finetuned-funsd-onnx/ --task=token-classification
```

# Running the quantization
Generate a quantized model.
```python
import onnx
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader

onnx_model = onnx.load(input_model)  # path to the exported LayoutLMv3 ONNX model
calib_dataset = IncDataset(eval_dataset, onnx_model)  # calibration dataset wrapper from the LayoutLMv3 example
config = PostTrainingQuantConfig(approach='static', quant_format="QOperator")
q_model = quantization.fit(onnx_model,
                           config,
                           calib_dataloader=DataLoader(framework='onnxruntime', dataset=calib_dataset))
```

Execute the benchmark to get the F1 scores of both the FP32 and INT8 models and then compute the relative accuracy ratio.
The results indicate that the quantized model's accuracy is noticeably poor.

```
fp32 f1 = 0.9049, int8 f1 = 0.2989, accuracy ratio = -66.9631%
```
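
For reference, the `accuracy ratio` above is simply the relative F1 drop of the INT8 model with respect to the FP32 baseline; a minimal sketch of that arithmetic (using the rounded scores from this run) is:

```python
# Relative accuracy ratio: the F1 change of the INT8 model relative to the FP32 baseline (negative means worse).
fp32_f1 = 0.9049
int8_f1 = 0.2989
accuracy_ratio = (int8_f1 - fp32_f1) / fp32_f1 * 100
print(f"fp32 f1 = {fp32_f1}, int8 f1 = {int8_f1}, accuracy ratio = {accuracy_ratio:.4f}%")  # ~ -66.97%
```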

# Analyzing the result of quantization
In this section, the diagnosis tool is used for debugging to achieve higher INT8 model accuracy.
We need to set the `diagnosis` parameter to `True` as shown below.
```python
config = PostTrainingQuantConfig(approach="static", quant_format="QOperator", quant_level=1, diagnosis=True)  # set 'diagnosis' to True
q_model = quantization.fit(onnx_model,
                           config,
                           eval_func=eval_func,
                           calib_dataloader=DataLoader(framework='onnxruntime', dataset=calib_dataset))
```
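
The `eval_func` passed above is not shown in this walkthrough. It is assumed to be a thin wrapper that evaluates a candidate model on the FUNSD evaluation set and returns a single F1 score, roughly like the sketch below (`evaluate_funsd` is a hypothetical placeholder for the evaluation loop in the LayoutLMv3 example):

```python
# Sketch only: quantization.fit() calls eval_func(model) and expects one scalar metric back,
# which it uses to compare candidate INT8 models against the FP32 baseline.
def eval_func(model):
    # `evaluate_funsd` stands in for the token-classification evaluation from the example.
    return evaluate_funsd(model, eval_dataset)
```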
The diagnosis tool will output an `Activations summary` and a `Weights summary` in the terminal.

For easier inspection, we reload the dumped .csv files as shown below.
```python
import glob
import os

import pandas as pd
from IPython.display import display  # available by default inside Jupyter notebooks

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Pick the most recent Neural Compressor workspace, where the tables were dumped.
subfolders = glob.glob("./nc_workspace" + "/*/")
subfolders.sort(key=os.path.getmtime, reverse=True)
if subfolders:
    activations_table = os.path.join(subfolders[0], "activations_table.csv")
    weights_table = os.path.join(subfolders[0], "weights_table.csv")

    activations_table = pd.read_csv(activations_table)
    weights_table = pd.read_csv(weights_table)

    print("Activations summary")
    display(activations_table)

    print("\nWeights summary")
    display(weights_table)
```

## Weights summary
These are the top 10 rows of the weights summary table:

![weights_summary_onnx](./imgs/weights_summary_onnx.jpg)

## Activations summary
These are the top 10 rows of the activations summary table:

![activations_summary_onnx](./imgs/activations_summary_onnx.jpg)

In the Activations summary table, some nodes show a dispersed activation data range. Therefore, we calculate the `Min-Max data range` of the activation data and sort the results in descending order.

```python
activations_table["Min-Max data range"] = activations_table["Activation max"] - activations_table["Activation min"]
sorted_data = activations_table.sort_values(by="Min-Max data range", ascending=False)
display(sorted_data)
```

The results should look like the following:

![min-max](./imgs/min-max.jpg)

According to the results displayed above, it is evident that the nodes of type `/layoutlmv3/encoder/layer.\d+/output/Add` and `/layoutlmv3/encoder/layer.\d+/output/dense/MatMul` have significantly higher `Min-Max data range` values than the other node types. This indicates that they may have caused the loss of accuracy, so we can try to fall back these nodes to FP32.

Refer to [diagnosis.md](https://github.com/intel/neural-compressor/blob/master/docs/source/diagnosis.md) for more tips for diagnosis.

```python
from neural_compressor.utils.constant import FP32

config = PostTrainingQuantConfig(approach="static",
                                 quant_format="QOperator",
                                 op_name_dict={r"/layoutlmv3/encoder/layer.\d+/output/dense/MatMul": FP32,
                                               r"/layoutlmv3/encoder/layer.\d+/output/Add": FP32})  # raw strings for the regex Op patterns
q_model = quantization.fit(onnx_model,
                           config,
                           calib_dataloader=DataLoader(framework='onnxruntime', dataset=calib_dataset))
q_model.save(output_model)
```

Execute the benchmark on the new quantized model again; the accuracy ratio improves to less than 1%.
```
fp32 f1 = 0.9049, int8 f1 = 0.8981, accuracy ratio = -0.7502%
```
113 changes: 113 additions & 0 deletions neural_insights/docs/source/tf_accuracy_debug.md
@@ -0,0 +1,113 @@
# Step-by-step example of how to debug accuracy with Neural Insights
1. [Introduction](#introduction)
2. [Preparation](#preparation)
3. [Running the quantization](#running-the-quantization)
4. [Analyzing the result of quantization](#analyzing-the-result-of-quantization)
5. [Analyzing weight histograms](#analyzing-weight-histograms)

# Introduction
In this guide, an accuracy issue will be debugged using Neural Insights. The TensorFlow Inception_v3 model is used as an example: it will be quantized, and the results will be analyzed to find the cause of the accuracy loss.

# Preparation
## Source
First you need to install Intel® Neural Compressor.
```shell
# Install Neural Compressor
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install

# Install Neural Insights
pip install -r neural_insights/requirements.txt
python setup.py install neural_insights
```

## Requirements
```shell
cd examples/tensorflow/image_recognition/tensorflow_models/inception_v3/quantization/ptq
pip install -r requirements.txt
```

## Model
Download the pre-trained PB model file.
```shell
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/inceptionv3_fp32_pretrained_model.pb
```

## Prepare the dataset
Download the dataset from ImageNet and convert the data to the TensorFlow Record format.
```shell
cd examples/tensorflow/image_recognition/tensorflow_models/
bash prepare_dataset.sh --output_dir=./inception_v3/quantization/ptq/data --raw_dir=/PATH/TO/img_raw/val/ --subset=validation
bash prepare_dataset.sh --output_dir=./inception_v3/quantization/ptq/data --raw_dir=/PATH/TO/img_raw/train/ --subset=train
```

# Running the quantization
Before applying quantization, modify some code to enable Neural Insights:
1. Set the `diagnosis` argument to `True` in `PostTrainingQuantConfig` so that Neural Insights dumps the weights and activations of the quantizable Ops in this model.
2. Delete the `op_name_dict` argument, because finding the right value for it is the goal of this investigation.
```python
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], diagnosis=True)
```
3. Quantize the model with the following command:
```shell
bash run_tuning.sh --input_model=/PATH/TO/inceptionv3_fp32_pretrained_model.pb --output_model=./nc_inception_v3.pb --dataset_location=/path/to/ImageNet/
```

The accuracy of this model decreases significantly if all Ops are quantized to int8 with the default strategy:

![accuracy_decrease](./imgs/accuracy_decrease.png)

# Analyzing the result of quantization
When you run the quantization, you will find the following table in the output:

![activations_summary](./imgs/activations_summary.png)

The MSE (Mean Squared Error) of each Op's activations is listed from high to low, together with the min-max values.
MSE is usually one of the typical indicators of accuracy loss.
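
For context, the sketch below shows one way such a per-Op activation MSE can be computed, by comparing an FP32 activation tensor with its int8 quantize-dequantize round trip. The tensor here is synthetic and only for illustration; this is not Neural Insights' exact implementation.

```python
import numpy as np

# Synthetic stand-in for an FP32 activation tensor captured during calibration.
fp32_activation = np.random.default_rng(0).normal(0.0, 1.0, size=(1, 64, 32, 32)).astype(np.float32)

# Per-tensor symmetric int8 quantization followed by dequantization.
scale = np.abs(fp32_activation).max() / 127.0
int8_activation = np.clip(np.round(fp32_activation / scale), -127, 127)
dequantized = int8_activation * scale

# MSE between the original activation and its quantize-dequantize round trip.
mse = np.mean((fp32_activation - dequantized) ** 2)
print(f"activation MSE = {mse:.8f}")
```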

![ops_weights](./imgs/ops_weights.png)

There is also relevant information about the Ops' weights.
Often the Op with the highest MSE causes the highest accuracy loss, but that is not always the case.

Experimenting with disabling quantization for some of the Ops with the top 5 highest MSE in both tables is not satisfactory, as the results in this example show:

![tune_result](./imgs/tune_result.png)
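
For reference, the fallback experiment above could be expressed roughly as in the sketch below. The Op names are hypothetical placeholders; in practice you would copy the top-MSE Op names from the tables printed by the diagnosis run.

```python
# Hypothetical sketch: fall back a few of the highest-MSE Ops to FP32.
# The Op names below are placeholders, not the actual top-5 Ops of this model.
op_name_dict = {
    'v0/cg/conv1/conv2d/Conv2D': {'activation': {'dtype': ['fp32']}},
    'v0/cg/mpool0/MaxPool': {'activation': {'dtype': ['fp32']}},
}
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], diagnosis=True, op_name_dict=op_name_dict)
```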

Then, the weight histograms can be analyzed to find the reason for the accuracy loss.

# Analyzing weight histograms
## Open Neural Insights
```shell
neural_insights
```

Then you will get a web address for the Neural Insights GUI. There you can find histograms of weights and activations.
```
Neural Insights Server started.
Open address [...]
```

The weights of Ops are usually distributed in a single spike, as in the following graph:

![weights_histograms](./imgs/weights_histograms.png)

When you click on an Op in the Op list, its weight and activation histograms appear at the bottom of the page.
One of the weight histograms looks different from the examples above.

![weights_histogram](./imgs/weights_histogram.png)

As shown in the chart, when the accuracy loss of an Op is tolerable, the distribution of its weights usually concentrates in a small min-max range. In this Op, however, the min-max values of the weights are significantly large (the range exceeds [-20, 20]) because of a few outliers. The values near the zero point, which are the majority, are mapped to a very small range of int8 values, leading to a huge accuracy loss. Besides, since the min-max values vary across channels, accuracy also decreases when channel-wise quantization is not used.
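
To make the effect concrete, here is a small NumPy sketch (synthetic weights, not taken from the model) of how a few outliers stretch the per-tensor scale so that the near-zero majority of weights collapses onto only a handful of int8 levels:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=1024)    # the bulk of the weights sits close to zero
weights[:4] = [-22.0, 21.5, 20.0, -20.5]      # a few outliers stretch the range beyond [-20, 20]

def int8_levels_used(values, scale):
    """Number of distinct int8 codes the values occupy under per-tensor symmetric quantization."""
    return np.unique(np.clip(np.round(values / scale), -127, 127)).size

scale_with_outliers = np.abs(weights).max() / 127.0
scale_without_outliers = np.abs(weights[4:]).max() / 127.0

bulk = weights[4:]  # the near-zero majority
print("int8 levels used by the bulk:",
      int8_levels_used(bulk, scale_with_outliers), "with outliers,",
      int8_levels_used(bulk, scale_without_outliers), "without")
```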

Therefore, you can disable this Op:
```python
op_name_dict = {'v0/cg/conv0/conv2d/Conv2D': {
    'activation': {'dtype': ['fp32']}}}
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], op_name_dict=op_name_dict)
```

After running the quantization again, you can see that the accuracy has increased. The Op that caused the accuracy loss has been found.

![tune_result2](./imgs/tune_result2.png)
