Paddle Benchmark

Inference benchmark of deep learning models implemented by paddlepaddle.

Environment

  • Xiaomi Mi 5, Android 7.0, Snapdragon 820 @ 1.8GHz
  • android-ndk-r13b
    • gcc version 4.9.x 20150123 (prerelease) (GCC)
    • Android clang version 3.8.256229 (based on LLVM 3.8.256229)

MobileNet

Benchmark of MobileNet inference (input image 3x224x224).

Currently, on the Mi 5, single-threaded inference takes 122.607 ms and uses 48 MB of system memory.

| version | time (ms) | mem (MB) | size (KB) | optimization (speedup) |
|---------|-----------|----------|-----------|------------------------|
| d2258a4 | 321.682 | - | - | base |
| d2258a4 | 225.044 | - | - | merge bn (30%) |
| b45d020 | 148.201 | - | - | depthwise convolution (34.1%) |
| 0146e8b | 127.032 | - | - | clang compile (14.3%) |
| d59295f | 122.607 | 48 | 4306 -> 1431 | neon::relu (3.5%) |
  • The convolution layers in the base version are implemented the im2col + GEMM way (see the im2col sketch after this list).
  • The merge bn optimization folds the parameters of each batch normalization layer into the parameters of the preceding convolution layer (see the folding sketch below).
  • The depthwise convolution optimization is a depthwise convolution implementation based on ARM NEON intrinsics (see the sketch below).
  • The clang compile row reflects that the clang-built binary runs faster than the gcc-built one.
  • mem (MB) is measured by running the paddle inference program and using the free command to observe the change in system memory usage.
  • In the size (KB) column, the first value is the size of the paddle inference .so, and the second is its size after zip compression.
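For reference, this is roughly what the im2col step of the base path does; a minimal C++ sketch (not the actual paddle code), assuming NCHW float32 input and zero padding. After this transform, the whole convolution reduces to a single GEMM between the filter matrix and the column buffer.

```cpp
// im2col: unpack each (channel, ky, kx) filter tap into one row of `col`,
// one column per output position, so conv becomes
//   output[out_ch][out_h*out_w] = W[out_ch][channels*kh*kw] * col.
void im2col(const float* img, int channels, int height, int width,
            int kh, int kw, int stride, int pad, float* col) {
    const int out_h = (height + 2 * pad - kh) / stride + 1;
    const int out_w = (width + 2 * pad - kw) / stride + 1;
    for (int c = 0; c < channels; ++c) {
        for (int ky = 0; ky < kh; ++ky) {
            for (int kx = 0; kx < kw; ++kx) {
                const int row = (c * kh + ky) * kw + kx;
                for (int oy = 0; oy < out_h; ++oy) {
                    for (int ox = 0; ox < out_w; ++ox) {
                        const int iy = oy * stride - pad + ky;
                        const int ix = ox * stride - pad + kx;
                        col[(row * out_h + oy) * out_w + ox] =
                            (iy >= 0 && iy < height && ix >= 0 && ix < width)
                                ? img[(c * height + iy) * width + ix]
                                : 0.0f;  // zero padding
                    }
                }
            }
        }
    }
}
```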
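The merge bn transform is standard batch-norm folding; a minimal sketch of the arithmetic (not paddle's implementation), assuming per-output-channel gamma/beta/mean/var taken from the trained model:

```cpp
// At inference time, BN applied after a convolution is
//   y = gamma * (conv(x, W) + b - mean) / sqrt(var + eps) + beta,
// which is itself a plain convolution with scaled weights and shifted bias:
//   W' = scale * W,  b' = scale * (b - mean) + beta,  scale = gamma / sqrt(var + eps).
#include <cmath>
#include <vector>

void fold_batch_norm(std::vector<float>& weights,  // [out_ch * in_ch * kh * kw]
                     std::vector<float>& bias,     // [out_ch]
                     const std::vector<float>& gamma,
                     const std::vector<float>& beta,
                     const std::vector<float>& mean,
                     const std::vector<float>& var,
                     float eps = 1e-5f) {
    const size_t out_ch = bias.size();
    const size_t per_ch = weights.size() / out_ch;
    for (size_t oc = 0; oc < out_ch; ++oc) {
        const float scale = gamma[oc] / std::sqrt(var[oc] + eps);
        for (size_t i = 0; i < per_ch; ++i) {
            weights[oc * per_ch + i] *= scale;               // W' = scale * W
        }
        bias[oc] = scale * (bias[oc] - mean[oc]) + beta[oc]; // b' = scale*(b-mean)+beta
    }
}
```

Since the BN layer disappears entirely, this saves one full pass over every feature map, which is where the 30% speedup comes from.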
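A minimal NEON sketch of the depthwise idea (again, not the actual paddle kernel), assuming 3x3 filters, stride 1, and no padding. Each channel is convolved with only its own single filter, so there is no im2col buffer and no large GEMM:

```cpp
#include <arm_neon.h>

// Depthwise 3x3 convolution, vectorized over output width (4 outputs per
// NEON register); leftover columns fall back to scalar code.
void depthwise_conv3x3(const float* in, int channels, int h, int w,
                       const float* filters,  // [channels * 9]
                       float* out) {          // [channels * (h-2) * (w-2)]
    const int out_h = h - 2, out_w = w - 2;
    for (int c = 0; c < channels; ++c) {
        const float* src = in + c * h * w;
        const float* f = filters + c * 9;
        float* dst = out + c * out_h * out_w;
        for (int y = 0; y < out_h; ++y) {
            int x = 0;
            for (; x + 4 <= out_w; x += 4) {
                float32x4_t acc = vdupq_n_f32(0.0f);
                for (int ky = 0; ky < 3; ++ky) {
                    const float* row = src + (y + ky) * w + x;
                    acc = vmlaq_n_f32(acc, vld1q_f32(row),     f[ky * 3 + 0]);
                    acc = vmlaq_n_f32(acc, vld1q_f32(row + 1), f[ky * 3 + 1]);
                    acc = vmlaq_n_f32(acc, vld1q_f32(row + 2), f[ky * 3 + 2]);
                }
                vst1q_f32(dst + y * out_w + x, acc);
            }
            for (; x < out_w; ++x) {  // scalar tail
                float acc = 0.0f;
                for (int ky = 0; ky < 3; ++ky)
                    for (int kx = 0; kx < 3; ++kx)
                        acc += src[(y + ky) * w + x + kx] * f[ky * 3 + kx];
                dst[y * out_w + x] = acc;
            }
        }
    }
}
```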
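The neon::relu row in the table corresponds to vectorizing the activation; a minimal sketch with NEON intrinsics, assuming an in-place float32 buffer whose length need not be a multiple of 4:

```cpp
#include <arm_neon.h>
#include <algorithm>

// relu(x) = max(x, 0), four elements at a time; scalar tail for leftovers.
void neon_relu(float* data, int n) {
    const float32x4_t zero = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(data + i);
        vst1q_f32(data + i, vmaxq_f32(v, zero));
    }
    for (; i < n; ++i) {
        data[i] = std::max(data[i], 0.0f);
    }
}
```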
