benchmark

mobilenet benchmark on xiaomi MI5 armv7-a (interactive mode)
factor=1.0

framework	speed	cpu	memory	size
tensorflow-lite	401ms	42%	120M	801K
ncnn(1 threads)	310ms	25%	43M	322K
ncnn(2 threads)	172ms	40%	43M	322K
ncnn(4 threads)	133ms	70%	43M	322K
paddlepaddle	306ms	25%	210M	3M
caffe2
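
To make the thread-scaling behaviour in the table above easier to read, the following minimal sketch computes speedup and parallel efficiency from the ncnn latencies. The numbers are copied from the interactive-mode table; the function name `scaling` and the dictionary layout are illustrative assumptions, not part of any framework.

```python
# Latencies in milliseconds, copied from the interactive-mode table above.
ncnn_latency_ms = {1: 310, 2: 172, 4: 133}


def scaling(latency_ms):
    """Return {threads: (speedup, efficiency)} relative to the fewest-thread run."""
    base_threads = min(latency_ms)
    base_ms = latency_ms[base_threads]
    result = {}
    for threads, ms in sorted(latency_ms.items()):
        speedup = base_ms / ms
        # Efficiency is the speedup divided by the increase in thread count.
        efficiency = speedup / (threads / base_threads)
        result[threads] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    for threads, (speedup, eff) in scaling(ncnn_latency_ms).items():
        print(f"{threads} thread(s): {speedup:.2f}x speedup, {eff:.0%} efficiency")
```

With these figures ncnn gains roughly 1.8x from 1 to 2 threads but only about 2.3x at 4 threads, so per-thread efficiency drops noticeably at the higher thread count.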

mobilenet benchmark on xiaomi MI5 armv7-a (performance mode)
factor=1.0

framework	speed	cpu	memory	size
tensorflow-lite	380ms	42%	120M	801K
ncnn(1 threads)	297ms	25%	43M	322K
ncnn(2 threads)	160ms	45%	43M	322K
ncnn(4 threads)	133ms	70%	43M	322K
paddlepaddle	303ms	25%	210M	3M
caffe2

mobilenet benchmark on xiaomi MI5 armv7-a (userspace mode)
cpu0 and cpu1 locked at 1363MHz; cpu2 and cpu3 locked at 1401MHz
factor=1.0

framework	speed	cpu	memory	size
tensorflow-lite	405ms	42%	120M	801K
ncnn(1 threads)	376ms	25%	43M	322K
ncnn(2 threads)	206ms	45%	43M	322K
ncnn(4 threads)	138ms	70%	43M	322K
paddlepaddle	353ms	25%	210M	3M
paddlepaddle(2 threads)	290ms	42%	210M	3M
paddlepaddle(4 threads)	253ms	50%	210M	3M
caffe2
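
The text does not say how the cores were pinned for the userspace-mode run above. One common way on Linux and Android is the cpufreq sysfs interface, sketched below. It is an assumption that this device exposes the `userspace` governor and accepts these exact values: cpufreq takes frequencies in kHz, and the kernel may only accept the steps listed in `scaling_available_frequencies`. The helper names `plan_lock` and `apply_lock` are hypothetical.

```python
import os

# Standard Linux cpufreq sysfs location (root access is needed to write here).
CPUFREQ_ROOT = "/sys/devices/system/cpu"


def plan_lock(freq_mhz_by_cpu):
    """Return (relative_path, value) writes that pin each core to one frequency."""
    writes = []
    for cpu, mhz in sorted(freq_mhz_by_cpu.items()):
        base = f"cpu{cpu}/cpufreq"
        # Switch to the userspace governor so a fixed speed can be set.
        writes.append((f"{base}/scaling_governor", "userspace"))
        # cpufreq expresses frequencies in kHz (assumed here to be MHz * 1000).
        writes.append((f"{base}/scaling_setspeed", str(mhz * 1000)))
    return writes


def apply_lock(freq_mhz_by_cpu, root=CPUFREQ_ROOT):
    """Write the plan to sysfs; requires root and userspace-governor support."""
    for rel_path, value in plan_lock(freq_mhz_by_cpu):
        with open(os.path.join(root, rel_path), "w") as f:
            f.write(value)


# Frequencies from the text: cpu0/cpu1 at 1363 MHz, cpu2/cpu3 at 1401 MHz.
MI5_LOCK = {0: 1363, 1: 1363, 2: 1401, 3: 1401}

if __name__ == "__main__":
    # Dry run: print the writes instead of touching sysfs.
    for path, value in plan_lock(MI5_LOCK):
        print(path, "<-", value)
```

Running the script only prints the planned writes. Calling `apply_lock(MI5_LOCK)` on a rooted device would perform them, and the same writes can be issued with `echo` from a root shell.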

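While benchmarking tensorflow-lite, I found that its 8-bit quantized computation is nearly 4x faster than float, so I put together notes on some of the kernel acceleration approaches used in tensorflow-lite. For details, see "tensorflow-lite computation acceleration approaches".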
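
As a rough illustration of what 8-bit quantized computation means, the sketch below uses the generic affine scheme `real = scale * (q - zero_point)` with integer accumulation. It is a plain-Python toy under that assumption, not tensorflow-lite's actual kernels, and all function names are hypothetical. On real hardware the gain would come from SIMD integer instructions processing more 8-bit lanes per register than float lanes, which this toy does not show.

```python
def choose_params(values, qmin=0, qmax=255):
    """Pick scale and zero point so [min, max] (widened to include 0) maps to [qmin, qmax]."""
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers in [qmin, qmax] using the affine scheme."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def quantized_dot(qa, a_params, qb, b_params):
    """Dot product with integer-only accumulation and one float rescale at the end."""
    (scale_a, zp_a), (scale_b, zp_b) = a_params, b_params
    # In a real kernel this accumulator would be a 32-bit integer.
    acc = sum((x - zp_a) * (y - zp_b) for x, y in zip(qa, qb))
    return scale_a * scale_b * acc


if __name__ == "__main__":
    a = [0.5, -1.0, 2.0]
    b = [1.5, 0.25, -0.75]
    pa, pb = choose_params(a), choose_params(b)
    approx = quantized_dot(quantize(a, *pa), pa, quantize(b, *pb), pb)
    exact = sum(x * y for x, y in zip(a, b))
    print("float:", exact, "quantized:", approx)
```

The quantized result differs from the float dot product only by a small rounding error, which is the accuracy traded for the cheaper integer arithmetic.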

framework	speed	cpu	memory	size
paddlepaddle	247ms	25%	91M	3M
paddlepaddle(2 threads)	167ms	42%	91M	3M
paddlepaddle(4 threads)	121ms	50%	91M	3M