INT8InferenceEngine

An inference engine for 8-bit quantized neural networks.

The computing kernels leverage Intel MKL's GEMM routines and OpenMP to parallelize the computation.
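For intuition, here is a minimal numpy sketch of the arithmetic an INT8 GEMM performs: quantize both operands to int8, accumulate the integer product in int32 (as MKL's integer GEMM routines do), then rescale once in float. The symmetric per-tensor scales below are an illustrative assumption, not necessarily this engine's exact quantization scheme.

```python
import numpy as np

def quantize(x, scale):
    # Symmetric per-tensor quantization to int8: q = round(x / scale),
    # clipped to the int8 range.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)   # activations
w = rng.standard_normal((8, 3)).astype(np.float32)   # weights

sa, sw = np.abs(a).max() / 127, np.abs(w).max() / 127
qa, qw = quantize(a, sa), quantize(w, sw)

# The integer matmul accumulates in int32 (what an int8 GEMM kernel does);
# a single float rescale then recovers the real-valued result.
acc = qa.astype(np.int32) @ qw.astype(np.int32)
y = acc * (sa * sw)

print(np.abs(y - a @ w).max())  # small quantization error
```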

The interface mimics PyTorch's so that PyTorch users can get started quickly.
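As an illustration of what a PyTorch-style interface means in practice, compare the two snippets below. The torch side is real PyTorch; the engine-side names in the comments are hypothetical placeholders, not this repo's documented API.

```python
import torch

# Standard PyTorch inference:
model = torch.nn.Sequential(torch.nn.Linear(784, 10))
model.eval()
with torch.no_grad():
    out = model(torch.rand(100, 784))

# A PyTorch-mimicking engine would read almost identically, e.g.
# (hypothetical names, for illustration only):
#
#   import int8_engine as nn8
#   qmodel = nn8.Linear(784, 10)   # int8-quantized layer
#   out = qmodel(batch)            # same call convention as torch
```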

Additionally, thanks to the numpy array interface, you can transfer data between PyTorch, numpy, and this inference engine with little overhead. Tasks such as data loading and preprocessing can therefore be done in PyTorch or numpy, avoiding duplicated work.
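The zero-copy hand-off itself uses standard PyTorch/numpy calls; only the engine entry point in the comment below is a hypothetical placeholder.

```python
import torch

# Preprocessing can stay in torch; .numpy() on a CPU tensor returns a
# view over the same storage, so handing data to a numpy-based engine
# copies nothing.
images = torch.rand(10, 3, 224, 224)
batch = images.numpy()                  # zero-copy view

# out = engine_model.forward(batch)     # hypothetical engine call

# The reverse direction is equally cheap:
back = torch.from_numpy(batch)          # zero-copy view of the array
assert back.data_ptr() == images.data_ptr()
```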

Requirements

Python packages

  • torch
  • torchvision
  • numpy
  • pybind11

Libraries

  • mkl
  • openmp

Installation troubleshooting

Benchmark

Tested on CIFAR-10 images resized to 224x224, with AlexNet.

CPU: Intel i9-9900K

Original FP32 accuracy: 77.8%

PyTorch INT8 accuracy: 77.4%

This engine's INT8 accuracy: 76.1%

| Batch size | PyTorch FP32 | PyTorch INT8 | My Engine FP32 | My Engine INT8 |
|-----------:|-------------:|-------------:|---------------:|---------------:|
| 10         | 50.4s        | 28.6s        | 1m16s          | 1m2s           |
| 100        | 37s          | 23.9s        | 48.3s          | 36.6s          |
| 1000       | 37.9s        | 27.4s        | 45.9s          | 34.2s          |

Tested with networks and input data of different sizes (minibatch size 100):

| Model / Data                      | PyTorch FP32 | My Engine FP32 | My Engine INT8 |
|-----------------------------------|-------------:|---------------:|---------------:|
| One fully-connected layer / MNIST | 9ms          | 3.76ms         | 19.6ms         |
| Simple convolution / CIFAR-10     | 1.29s        | 1.43s          | 1.39s          |
