
benchmark

Houjiang Chen edited this page Dec 21, 2017 · 24 revisions
  • mobilenet benchmark on xiaomi MI5 armv7-a (interactive mode)
    factor=1.0

    framework        speed  cpu  memory  size
    tensorflow-lite  401ms  42%  120M    801K
    ncnn(1 thread)   310ms  25%  43M     322K
    ncnn(2 threads)  172ms  40%  43M     322K
    ncnn(4 threads)  133ms  70%  43M     322K
    paddlepaddle     306ms  25%  210M    3M
    caffe2
  • mobilenet benchmark on xiaomi MI5 armv7-a (performance mode)
    factor=1.0

    framework        speed  cpu  memory  size
    tensorflow-lite  380ms  42%  120M    801K
    ncnn(1 thread)   297ms  25%  43M     322K
    ncnn(2 threads)  160ms  45%  43M     322K
    ncnn(4 threads)  133ms  70%  43M     322K
    paddlepaddle     303ms  25%  210M    3M
    caffe2
  • mobilenet benchmark on xiaomi MI5 armv7-a (userspace mode)
    cpu0 and cpu1 locked to 1363MHz, cpu2 and cpu3 locked to 1401MHz
    factor=1.0

    framework                speed  cpu  memory  size
    tensorflow-lite          405ms  42%  120M    801K
    ncnn(1 thread)           376ms  25%  43M     322K
    ncnn(2 threads)          206ms  45%  43M     322K
    ncnn(4 threads)          138ms  70%  43M     322K
    paddlepaddle             353ms  25%  210M    3M
    paddlepaddle(2 threads)  290ms  42%  210M    3M
    paddlepaddle(4 threads)  253ms  50%  210M    3M
    caffe2
  • mobilenet benchmark on xiaomi MI5 armv7-a (userspace mode)
    cpu0 and cpu1 locked to 1363MHz, cpu2 and cpu3 locked to 1401MHz
    factor=1.0, merge batchnorm

    framework                speed  cpu  memory  size
    paddlepaddle             247ms  25%  91M     3M
    paddlepaddle(2 threads)  167ms  42%  91M     3M
    paddlepaddle(4 threads)  121ms  50%  91M     3M
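
The "merge batchnorm" setting in the last table is not described further on this page. Assuming it refers to the common inference-time trick of folding a batch normalization layer into the layer that precedes it, the idea can be sketched as below. The function names are hypothetical and this is not paddlepaddle's API; each output channel's kernel is represented as a flat list of weights to keep the example in pure Python.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batchnorm parameters into the preceding layer.

    weights: one flat list of weights per output channel
    bias, gamma, beta, mean, var: one float per output channel
    Returns (folded_weights, folded_bias) such that the folded layer alone
    computes the same output as layer + batchnorm.
    """
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel multiplier
        folded_w.append([wi * scale for wi in w])
        folded_b.append((b - m) * scale + bt)   # shifted and rescaled bias
    return folded_w, folded_b


def linear(weights, bias, x):
    """Reference layer: one dot product per output channel."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Reference inference-time batchnorm applied per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]
```

After folding, the batchnorm layer no longer needs to run at inference time and its intermediate output no longer needs to be stored.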

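While benchmarking tensorflow-lite, we found that its 8-bit quantized computation is nearly 4x faster than float, so we collected notes on the acceleration schemes used by some of the kernels in tensorflow-lite. For details, see the page on tensorflow-lite's computation acceleration schemes.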
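
To make the 8-bit idea concrete, here is a minimal sketch of affine quantization, where a real value is represented as `scale * (q - zero_point)` with `q` an unsigned 8-bit integer. It illustrates the general scheme only and is not tensorflow-lite's kernel code; all function names are hypothetical.

```python
def choose_quant_params(xmin, xmax, qmin=0, qmax=255):
    """Pick scale and zero point so that [xmin, xmax] maps onto [qmin, qmax]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)
    scale = (xmax - xmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every value is 0.0
    zero_point = int(round(qmin - xmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers in [qmin, qmax], clamping out-of-range values."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate floats."""
    return [scale * (qi - zero_point) for qi in q]


def quantized_dot(qa, zero_a, qb, zero_b):
    """Integer-only accumulation; multiply by scale_a * scale_b afterwards."""
    return sum((a - zero_a) * (b - zero_b) for a, b in zip(qa, qb))
```

The inner loop of `quantized_dot` uses only integer arithmetic, and the two scales are applied once to the accumulated result rather than per element.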
