model_quantization

A model quantization example using ONNX.

For more details, please read my blog post in Chinese or in English.

Environment Setup

Make sure git-lfs is installed on your computer.

Install the Python environment:

git lfs install
git submodule update --init
pip install poetry
poetry install --no-root

If you are using a GPU, run the following commands after poetry install --no-root:

poetry remove onnxruntime
poetry add onnxruntime-gpu
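
With onnxruntime-gpu installed, an inference session can request the CUDA execution provider and fall back to CPU if CUDA is unavailable. A minimal sketch (the model path model.onnx is a placeholder, not a file shipped in this repo; main.py may handle this differently):

```python
import onnxruntime as ort

# Prefer the CUDA execution provider, fall back to CPU if it is unavailable.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported ONNX model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers are actually active
```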

Run

poetry run python main.py

Experiment Results on CPU

The experiments use one of the DistilBERT models fine-tuned on the IMDB dataset from Hugging Face, available here.

The results below were measured on a MacBook Air M1 CPU and on Windows 10 WSL with an Intel i5-8400 CPU (results may vary on different platforms):

| Model                  | Size   | Inference Time per Instance | Accuracy |
|------------------------|--------|-----------------------------|----------|
| PyTorch Model (Mac)    | 256 MB | 71.1 ms                     | 93.8%    |
| ONNX Model (Mac)       | 256 MB | 113.5 ms                    | 93.8%    |
| ONNX 8-bit Model (Mac) | 64 MB  | 87.7 ms                     | 93.75%   |
| PyTorch Model (Win)    | 256 MB | 78.6 ms                     | 93.8%    |
| ONNX Model (Win)       | 256 MB | 85.1 ms                     | 93.8%    |
| ONNX 8-bit Model (Win) | 64 MB  | 61.1 ms                     | 93.85%   |
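
For reference, the kind of 8-bit model measured above can be produced with ONNX Runtime's dynamic quantization. A minimal sketch, assuming an already-exported FP32 ONNX model; the file paths are placeholders, not files from this repo, and the exact steps in main.py may differ:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization: weights are converted to 8-bit integers offline,
# while activations are quantized on the fly at inference time.
quantize_dynamic(
    model_input="model.onnx",         # placeholder: exported FP32 ONNX model
    model_output="model_quant.onnx",  # placeholder: where the INT8 model is written
    weight_type=QuantType.QInt8,      # 8-bit signed integer weights
)
```

This roughly quarters the weight storage (hence 256 MB to 64 MB above), while accuracy stays essentially unchanged.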
