Migrate onnx nlp and obj examples into 2.x API (#579)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
yuwenzho committed Mar 16, 2023
1 parent ba7db7e commit a4228df
Showing 215 changed files with 22,040 additions and 452 deletions.
3 changes: 3 additions & 0 deletions .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2539,4 +2539,7 @@ zalandoresearch
emCgSTlJaAg
matsubara
yoshitomo
deepset
FAC
electra
parallelizes
328 changes: 175 additions & 153 deletions examples/.config/model_params_onnxrt.json

Large diffs are not rendered by default.

134 changes: 96 additions & 38 deletions examples/README.md
@@ -953,58 +953,58 @@ Intel® Neural Compressor validated examples with multiple compression technique
<td><a href="./onnxrt/body_analysis/onnx_model_zoo/arcface/quantization/ptq">qlinearops</a></td>
</tr>
<tr>
-<td>*BERT base MRPC</td>
+<td>BERT base MRPC</td>
<td>Natural Language Processing</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/bert/quantization/ptq">integerops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/bert/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/nlp/bert/quantization/ptq_static">integerops</a> / <a href="./onnxrt/nlp/bert/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*BERT base MRPC</td>
+<td>BERT base MRPC</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/bert/quantization/ptq">integerops</a></td>
<td><a href="./onnxrt/nlp/bert/quantization/ptq_dynamic">integerops</a></td>
</tr>
<tr>
-<td>*DistilBERT base MRPC</td>
+<td>DistilBERT base MRPC</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/distilbert/quantization/ptq">integerops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/distilbert/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/nlp/distilbert/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/distilbert/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*Mobile bert MRPC</td>
+<td>Mobile bert MRPC</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/mobilebert/quantization/ptq">integerops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/mobilebert/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/nlp/mobilebert/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/mobilebert/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*Roberta base MRPC</td>
+<td>Roberta base MRPC</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/roberta/quantization/ptq">integerops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/nlp/roberta/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/nlp/roberta/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/roberta/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
<td>BERT SQuAD</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/bert-squad/quantization/ptq">integerops</a> / <a href="./onnxrt/nlp/onnx_model_zoo/bert-squad/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/bert-squad/quantization/ptq_dynamic">integerops</a> </td>
</tr>
<tr>
<td>GPT2 lm head WikiText</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic Quantization</td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/gpt2/quantization/ptq">integerops</a></td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/gpt2/quantization/ptq_dynamic">integerops</a></td>
</tr>
<tr>
<td>MobileBERT SQuAD MLPerf</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/mobilebert/quantization/ptq">integerops</a> / <a href="./onnxrt/nlp/onnx_model_zoo/mobilebert/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/mobilebert/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/onnx_model_zoo/mobilebert/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
<td>BiDAF</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic Quantization</td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/BiDAF/quantization/ptq">integerops</a></td>
<td><a href="./onnxrt/nlp/onnx_model_zoo/BiDAF/quantization/ptq_dynamic">integerops</a></td>
</tr>
<tr>
<td>BERT base uncased MRPC (HuggingFace)</td>
@@ -1025,17 +1025,17 @@ Intel® Neural Compressor validated examples with multiple compression technique
<tr>
<td>XLM Roberta base MRPC (HuggingFace)</td>
<td>Natural Language Processing</td>
-<td>Post-Training Dynamic / Static Quantization</td>
+<td>Post-Training Dynamic Quantization</td>
<td>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a></td>
</td>
</tr>
<tr>
<td>Camembert base MRPC (HuggingFace)</td>
<td>Natural Language Processing</td>
-<td>Post-Training Dynamic / Static Quantization</td>
+<td>Post-Training Dynamic Quantization</td>
<td>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a></td>
</td>
</tr>
<tr>
@@ -1057,19 +1057,59 @@ Intel® Neural Compressor validated examples with multiple compression technique
<tr>
<td>Albert base v2 SST-2 (HuggingFace)</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic Quantization</td>
<td>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a></td>
</td>
</tr>
<tr>
<td>MiniLM L6 H384 uncased SST-2 (HuggingFace)</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
</td>
</tr>
<tr>
-<td>MiniLM L6 H384 uncased SST-2 (HuggingFace)</td>
+<td>BERT base cased MRPC (HuggingFace)</td>
<td>Natural Language Processing</td>
<td>Post-Training Dynamic / Static Quantization</td>
<td>
<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
</td>
</tr>
+<tr>
+<td>Electra small discriminator MRPC (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic / Static Quantization</td>
+<td>
+<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
+</td>
+</tr>
+<tr>
+<td>BERT mini MRPC (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic / Static Quantization</td>
+<td>
+<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
+</td>
+</tr>
+<tr>
+<td>Xlnet base cased MRPC (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic / Static Quantization</td>
+<td>
+<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_static">qlinearops</a>
+</td>
+</tr>
+<tr>
+<td>BART large MRPC (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic Quantization</td>
+<td>
+<a href="./onnxrt/nlp/huggingface_model/text_classification/quantization/ptq_dynamic">integerops</a></td>
+</td>
+</tr>
<tr>
<td>Spanbert SQuAD (HuggingFace)</td>
<td>Natural Language Processing</td>
@@ -1083,64 +1123,82 @@ Intel® Neural Compressor validated examples with multiple compression technique
<td><a href="./onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_static">qlinearops</a></td>
</tr>
<tr>
-<td>*SSD MobileNet V1</td>
+<td>DistilBert base uncased SQuAD (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic Quantization</td>
+<td><a href="./onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_dynamic">integerops</a></td>
+</tr>
+<tr>
+<td>BERT large uncased whole word masking SQuAD (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic / Static Quantization</td>
+<td><a href="./onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_dynamic">integerops</a> / <a href="./onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_static">qlinearops</a></td>
+</tr>
+<tr>
+<td>Roberta large SQuAD v2 (HuggingFace)</td>
+<td>Natural Language Processing</td>
+<td>Post-Training Dynamic Quantization</td>
+<td><a href="./onnxrt/nlp/huggingface_model/question_answering/quantization/ptq_dynamic">integerops</a></td>
+</tr>
+<tr>
+<td>SSD MobileNet V1</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/ssd_mobilenet_v1/quantization/ptq">qlinearops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/ssd_mobilenet_v1/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/object_detection/ssd_mobilenet_v1/quantization/ptq_static">qlinearops</a> / <a href="./onnxrt/object_detection/ssd_mobilenet_v1/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*SSD MobileNet V2</td>
+<td>SSD MobileNet V2</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/ssd_mobilenet_v2/quantization/ptq">qlinearops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/ssd_mobilenet_v2/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/object_detection/ssd_mobilenet_v2/quantization/ptq_static">qlinearops</a> / <a href="./onnxrt/object_detection/ssd_mobilenet_v2/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*SSD MobileNet V1 (ONNX Model Zoo)</td>
+<td>SSD MobileNet V1 (ONNX Model Zoo)</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/ssd_mobilenet_v1/quantization/ptq">qlinearops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/ssd_mobilenet_v1/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/ssd_mobilenet_v1/quantization/ptq_static">qlinearops</a> / <a href="./onnxrt/object_detection/onnx_model_zoo/ssd_mobilenet_v1/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
<td>DUC</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/DUC/quantization/ptq">qlinearops</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/DUC/quantization/ptq_static">qlinearops</a></td>
</tr>
<tr>
-<td>*Faster R-CNN</td>
+<td>Faster R-CNN</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/faster_rcnn/quantization/ptq">qlinearops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/faster_rcnn/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/faster_rcnn/quantization/ptq_static">qlinearops</a> / <a href="./onnxrt/object_detection/onnx_model_zoo/faster_rcnn/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*Mask R-CNN</td>
+<td>Mask R-CNN</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/mask_rcnn/quantization/ptq">qlinearops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/mask_rcnn/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/mask_rcnn/quantization/ptq_static">qlinearops</a> / <a href="./onnxrt/object_detection/onnx_model_zoo/mask_rcnn/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*SSD</td>
+<td>SSD</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/ssd/quantization/ptq">qlinearops</a> / <a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/ssd/quantization/ptq">qdq</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/ssd/quantization/ptq_static">qlinearops</a> / <a href="./onnxrt/object_detection/onnx_model_zoo/ssd/quantization/ptq_static">qdq</a></td>
</tr>
<tr>
-<td>*Tiny YOLOv3</td>
+<td>Tiny YOLOv3</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/tiny_yolov3/quantization/ptq">qlinearops</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/yolov3/quantization/ptq_static">qlinearops</a></td>
</tr>
<tr>
-<td>*YOLOv3</td>
+<td>YOLOv3</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/yolov3/quantization/ptq">qlinearops</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/yolov3/quantization/ptq_static">qlinearops</a></td>
</tr>
<tr>
-<td>*YOLOv4</td>
+<td>YOLOv4</td>
<td>Object Detection</td>
<td>Post-Training Static Quantization</td>
<td><a href="https://github.com/intel/neural-compressor/tree/old_api_examples/examples/onnxrt/object_detection/onnx_model_zoo/yolov4/quantization/ptq">qlinearops</a></td>
<td><a href="./onnxrt/object_detection/onnx_model_zoo/yolov4/quantization/ptq_static">qlinearops</a></td>
</tr>
<tr>
<td>Emotion FERPlus</td>
59 changes: 59 additions & 0 deletions examples/onnxrt/nlp/bert/quantization/ptq_dynamic/README.md
@@ -0,0 +1,59 @@
Step-by-Step
============

This example loads a BERT model and confirms its accuracy and speed based on [GLUE data](https://gluebenchmark.com/).

# Prerequisite

## 1. Environment

```shell
pip install neural-compressor
pip install -r requirements.txt
```
> Note: See the validated ONNX Runtime [versions](/docs/source/installation_guide.md#validated-software-environment).

## 2. Prepare Dataset

Download the GLUE data with the `prepare_data.sh` script:
```shell
export GLUE_DIR=path/to/glue_data
export TASK_NAME=MRPC

bash prepare_data.sh --data_dir=$GLUE_DIR --task_name=$TASK_NAME
```
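
To confirm the download landed where the scripts expect, you can run a quick check. This is a sketch only; it assumes the standard GLUE directory layout (`$GLUE_DIR/MRPC/*.tsv`):

```python
import os

glue_dir = os.environ.get("GLUE_DIR", "path/to/glue_data")
for split in ("train.tsv", "dev.tsv"):
    path = os.path.join(glue_dir, "MRPC", split)
    # Both files should exist before running the quantization scripts.
    print(path, "exists:", os.path.exists(path))
```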

## 3. Prepare Model

Please refer to [Bert-GLUE_OnnxRuntime_quantization guide](https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/notebooks/Bert-GLUE_OnnxRuntime_quantization.ipynb) for detailed model export.

Run the `prepare_model.sh` script to export the fine-tuned checkpoint to ONNX.

Usage:
```shell
bash prepare_model.sh --input_dir=./MRPC \
--task_name=$TASK_NAME \
--output_model=path/to/model # model path as *.onnx
```
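
If you prefer to export manually instead of using the script, a minimal sketch with `torch.onnx.export` looks like the following. The checkpoint path, sequence length, and opset version here are illustrative assumptions, not values taken from `prepare_model.sh`:

```python
import torch
from transformers import BertForSequenceClassification

# "./MRPC" is an assumed path to the fine-tuned MRPC checkpoint.
model = BertForSequenceClassification.from_pretrained("./MRPC")
model.eval()

seq_len = 128
dummy = (
    torch.zeros(1, seq_len, dtype=torch.long),  # input_ids
    torch.ones(1, seq_len, dtype=torch.long),   # attention_mask
    torch.zeros(1, seq_len, dtype=torch.long),  # token_type_ids
)

torch.onnx.export(
    model,
    dummy,
    "bert.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    # Mark batch and sequence dimensions dynamic so the quantized
    # model accepts arbitrary batch sizes and sequence lengths.
    dynamic_axes={n: {0: "batch", 1: "seq"}
                  for n in ["input_ids", "attention_mask", "token_type_ids"]},
    opset_version=14,
)
```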

# Run

## 1. Quantization

Dynamic quantization:

```bash
bash run_tuning.sh --input_model=path/to/model \ # model path as *.onnx
--output_model=path/to/model_tune \ # model path as *.onnx
--dataset_location=path/to/glue_data
```
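
The shell entry point wraps Intel Neural Compressor's 2.x Python API. A minimal sketch of the equivalent dynamic-quantization call follows; the paths are placeholders and `eval_func` is a stub standing in for the MRPC accuracy evaluation the example implements in its Python driver:

```python
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

def eval_func(model):
    # Stub: the real example scores MRPC accuracy with an
    # ONNX Runtime session over the GLUE dev set.
    return 1.0

# Dynamic PTQ quantizes weights ahead of time and activations at
# runtime, so no calibration dataloader is required.
config = PostTrainingQuantConfig(approach="dynamic")

q_model = quantization.fit(
    model="path/to/model.onnx",  # FP32 ONNX model from the previous step
    conf=config,
    eval_func=eval_func,
)
q_model.save("path/to/model_tune.onnx")
```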

## 2. Benchmark

```bash
bash run_benchmark.sh --input_model=path/to/model \ # model path as *.onnx
--dataset_location=path/to/glue_data \
--batch_size=batch_size \
--mode=performance # or accuracy
```
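
`run_benchmark.sh` similarly drives the 2.x benchmark API. A rough sketch of a performance run, where `dataloader` is an assumed stand-in for the GLUE MRPC dataloader the example builds internally:

```python
from neural_compressor.benchmark import fit
from neural_compressor.config import BenchmarkConfig

# dataloader is assumed: any iterable yielding (input, label) batches
# compatible with the model's inputs works here.
conf = BenchmarkConfig(
    warmup=10,             # untimed warm-up iterations
    iteration=100,         # timed iterations
    cores_per_instance=4,
    num_of_instance=1,
)
fit(model="path/to/model_tune.onnx", conf=conf, b_dataloader=dataloader)
```

In `--mode=accuracy`, the script evaluates the metric instead of timing iterations.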
