benchmark
Houjiang Chen edited this page Dec 21, 2017 · 24 revisions
mobilenet benchmark on xiaomi MI5 armv7-a (interactive mode)

factor=1.0

| framework | speed | cpu | memory | size |
|---|---|---|---|---|
| tensorflow-lite | 401ms | 42% | 120M | 801K |
| ncnn (1 thread) | 310ms | 25% | 43M | 322K |
| ncnn (2 threads) | 172ms | 40% | 43M | 322K |
| ncnn (4 threads) | 133ms | 70% | 43M | 322K |
| paddlepaddle | 306ms | 25% | 210M | 3M |
| caffe2 | - | | | |
mobilenet benchmark on xiaomi MI5 armv7-a (performance mode)

factor=1.0

| framework | speed | cpu | memory | size |
|---|---|---|---|---|
| tensorflow-lite | 380ms | 42% | 120M | 801K |
| ncnn (1 thread) | 297ms | 25% | 43M | 322K |
| ncnn (2 threads) | 160ms | 45% | 43M | 322K |
| ncnn (4 threads) | 133ms | 70% | 43M | 322K |
| paddlepaddle | 303ms | 25% | 210M | 3M |
| caffe2 | - | | | |
mobilenet benchmark on xiaomi MI5 armv7-a (userspace mode)

cpu0 and cpu1 are locked at 1363MHz; cpu2 and cpu3 are locked at 1401MHz.

factor=1.0

| framework | speed | cpu | memory | size |
|---|---|---|---|---|
| tensorflow-lite | 405ms | 42% | 120M | 801K |
| ncnn (1 thread) | 376ms | 25% | 43M | 322K |
| ncnn (2 threads) | 206ms | 45% | 43M | 322K |
| ncnn (4 threads) | 138ms | 70% | 43M | 322K |
| paddlepaddle | 353ms | 25% | 210M | 3M |
| paddlepaddle (2 threads) | 290ms | 42% | 210M | 3M |
| paddlepaddle (4 threads) | 253ms | 50% | 210M | 3M |
| caffe2 | - | | | |
mobilenet benchmark on xiaomi MI5 armv7-a (userspace mode)

cpu0 and cpu1 are locked at 1363MHz; cpu2 and cpu3 are locked at 1401MHz.

factor=1.0, with batchnorm merged

| framework | speed | cpu | memory | size |
|---|---|---|---|---|
| paddlepaddle | 247ms | 25% | 91M | 3M |
| paddlepaddle (2 threads) | 167ms | 42% | 91M | 3M |
| paddlepaddle (4 threads) | 121ms | 50% | 91M | 3M |
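For readers unfamiliar with "merge batchnorm": at inference time a batch normalization layer is a fixed per-channel scale and shift, so it can be folded into the weights and bias of the layer before it, removing one pass over the activations. The sketch below is a minimal illustration of that arithmetic on a plain linear layer. It is not paddlepaddle's implementation, and the helper names (`fold_batchnorm`, `linear`, `batchnorm`) and the `eps` default are assumptions made for this example.

```python
import math


def linear(x, weights, bias):
    """Apply y[c] = sum_i weights[c][i] * x[i] + bias[c]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization, one parameter set per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weights, bias) of a single layer equivalent to linear + batchnorm."""
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)            # per-channel multiplier
        folded_w.append([w * scale for w in row])  # scale every weight of the channel
        folded_b.append((b - m) * scale + bt)      # shift absorbed into the bias
    return folded_w, folded_b


# Hypothetical values, chosen only to exercise the functions.
W = [[0.5, -1.0], [2.0, 0.25]]
B = [0.1, -0.3]
GAMMA, BETA = [1.5, 0.8], [0.2, -0.1]
MEAN, VAR = [0.4, -0.2], [0.9, 1.7]
X = [1.0, 2.0]

reference = batchnorm(linear(X, W, B), GAMMA, BETA, MEAN, VAR)
W_FOLDED, B_FOLDED = fold_batchnorm(W, B, GAMMA, BETA, MEAN, VAR)
merged = linear(X, W_FOLDED, B_FOLDED)
```

The same per-output-channel scaling applies to convolution weights, since a convolution is linear in its input.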
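While benchmarking tensorflow-lite, we found that its 8-bit quantized computation is nearly 4x faster than its float computation, so we put together a summary of the acceleration schemes used by some of the tensorflow-lite kernels. For details, see "tensorflow-lite computation acceleration schemes".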
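As background for the 8-bit result, the sketch below shows a common affine quantization scheme, in which a float value is approximated as `scale * (q - zero_point)` with `q` stored in an unsigned 8-bit integer. It only illustrates the number representation. It is not tensorflow-lite's code, it does not reproduce the optimized integer kernels behind the measured speedup, and the function names are assumptions made for this example.

```python
def choose_quant_params(x_min, x_max, num_bits=8):
    """Pick scale and zero_point so that [x_min, x_max] maps onto [0, 2^bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range must contain 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, 0  # degenerate range: every value is zero
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2^bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate floats from quantized integers."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical activation values in [-1, 1].
SAMPLES = [-1.0, -0.5, 0.0, 0.3, 1.0]
SCALE, ZERO_POINT = choose_quant_params(-1.0, 1.0)
Q = quantize(SAMPLES, SCALE, ZERO_POINT)
RESTORED = dequantize(Q, SCALE, ZERO_POINT)
```

Each restored value differs from its original by at most about one quantization step (`SCALE`), which is the precision given up in exchange for 8-bit storage and integer arithmetic.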