A model quantization example using ONNX.
For more details, please read my blog post in Chinese or in English.
Make sure git-lfs is installed on your computer, then fetch the submodules and install the Python environment:

```shell
git lfs install
git submodule update --init
pip install poetry
poetry install --no-root
```
If you are using a GPU, swap in the GPU runtime after `poetry install --no-root`:

```shell
poetry remove onnxruntime
poetry add onnxruntime-gpu
```

Run the example:

```shell
poetry run python main.py
```
This example experiments with one of the DistilBERT models fine-tuned on the IMDB dataset from HuggingFace, available here.
The results below were measured on a MacBook Air M1 CPU and on Windows 10 WSL with an Intel i5-8400 CPU (results may vary on other platforms):
| Model | Size | Inference Time per Instance | Accuracy |
|---|---|---|---|
| PyTorch Model (Mac) | 256MB | 71.1ms | 93.8% |
| ONNX Model (Mac) | 256MB | 113.5ms | 93.8% |
| ONNX 8-bit Model (Mac) | 64MB | 87.7ms | 93.75% |
| PyTorch Model (Win) | 256MB | 78.6ms | 93.8% |
| ONNX Model (Win) | 256MB | 85.1ms | 93.8% |
| ONNX 8-bit Model (Win) | 64MB | 61.1ms | 93.85% |
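The roughly 4x size reduction in the table (256MB to 64MB) comes from storing each weight as a 1-byte int8 instead of a 4-byte float32. A minimal, self-contained sketch of the affine (scale/zero-point) mapping that 8-bit quantization applies per tensor — function and variable names here are illustrative, not taken from this repo or from ONNX Runtime:

```python
def quantize(weights, num_bits=8):
    """Map float weights onto the signed integer range, e.g. [-128, 127] for 8 bits."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin)          # float step represented by one integer step
    zero_point = round(qmin - lo / scale)      # integer that represents float 0.0
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the quantized integers."""
    return [(v - zero_point) * scale for v in q]

weights = [-0.42, -0.1, 0.0, 0.27, 0.8]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
# The rounding error per weight is bounded by the scale, which is why
# accuracy in the table barely moves (93.8% vs. 93.75%/93.85%).
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

In practice ONNX Runtime applies this per tensor (or per channel) with its own calibration logic; this sketch only shows the arithmetic that makes the accuracy loss small while shrinking storage 4x.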