This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

performance dropped to the bottom when running on Atom device linked with MKL2017 #38

Closed
delock opened this issue Jan 25, 2017 · 5 comments

Comments

@delock

delock commented Jan 25, 2017

We tried to evaluate the performance of Intel Caffe running on Atom. Out of the box (OOB), Intel Caffe outperforms bvlc Caffe, which is nice to have. However, when we ran Intel Caffe with MKL2017, expecting the same performance boost as on Xeon and Xeon Phi, performance dropped by more than 20x instead.

AlexNet, batch size = 1
bvlc Caffe + mkl: 12 FPS
Intel Caffe OOB: 15 FPS
Intel Caffe with MKL2017: 0.7 FPS

GoogLeNet, batch size = 8
bvlc Caffe + mkl: 4.2 FPS
Intel Caffe OOB: 11 FPS
Intel Caffe with MKL2017: 0.25 FPS
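
The size of the drop can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the FPS figures reported above; the dictionary layout and the `slowdown` helper are illustrative names, not part of Caffe or MKL.

```python
# FPS figures copied from the report above.
fps = {
    "AlexNet": {"bvlc_mkl": 12.0, "intel_oob": 15.0, "intel_mkl2017": 0.7},
    "GoogleNet": {"bvlc_mkl": 4.2, "intel_oob": 11.0, "intel_mkl2017": 0.25},
}


def slowdown(baseline_fps, measured_fps):
    """Return how many times slower measured_fps is than baseline_fps."""
    return baseline_fps / measured_fps


# Compare the MKL2017 build against the out-of-box Intel Caffe build.
ratios = {
    net: slowdown(results["intel_oob"], results["intel_mkl2017"])
    for net, results in fps.items()
}

for net, ratio in ratios.items():
    print(f"{net}: MKL2017 build is {ratio:.1f}x slower than Intel Caffe OOB")
```

Relative to Intel Caffe OOB, this works out to roughly 21x for AlexNet and 44x for GoogleNet, consistent with the "more than 20x" claim.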

@michalkuligowski
Contributor

Hi,
Intel® Math Kernel Library 2017 is optimized for Intel® Xeon Phi™ processors. Please try an older Intel® MKL version on the Atom processor.

@pnoga
Contributor

pnoga commented Jan 25, 2017

  1. Did you try this on an Atom or a Xeon Phi processor?
  2. Did you have AVX-512 enabled?
  3. Did you use the proper engine selection for MKL 2017? (On the command line, add -engine=MKL2017.)
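
For the second question, one way to see which SIMD extensions a CPU advertises is to read its feature flags. The following is a rough sketch that assumes an x86 Linux system, where `/proc/cpuinfo` contains a `flags` line with names such as `avx2` and `avx512f`; the helper names `cpu_flags` and `simd_support` are made up for this example and are not part of Caffe or MKL.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(cpuinfo_text):
    """Map a few SIMD flag names to whether the CPU advertises them."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("sse4_2", "avx", "avx2", "avx512f")}


if __name__ == "__main__":
    # Assumption: x86 Linux. Other platforms expose this information differently.
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(simd_support(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found on this system")
```

A processor that reports `False` for `avx2` and `avx512f` cannot run code paths that require those instructions.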

@pnoga
Contributor

pnoga commented Feb 1, 2017

No response - closing

@pnoga pnoga closed this as completed Feb 1, 2017
@delock
Author

delock commented Feb 3, 2017

Back from vacation.

  1. Did you try this on an Atom or a Xeon Phi processor?
    A: On an Atom processor. The model name is Atom C2750 2.4GHz.
  2. Did you have AVX-512 enabled?
    A: This processor does not support AVX-512 or AVX2.
  3. The proper engine selection is used. On the latest Intel Caffe (with the new MKL release 2017.0.2.20170110), performance is better but still lower than OOB Intel Caffe:
    Batch size changed to 1.
    ./build/tools/caffe time -engine=MKL2017 -model models/bvlc_alexnet/deploy.prototxt
    Average Forward pass: 147.002 ms.
    ./build/tools/caffe time -model models/bvlc_alexnet/deploy.prototxt
    Average Forward pass: 66.6948 ms.
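
To compare these timings with the FPS figures in the original report, the forward-pass latency can be converted to throughput. This is a small sketch, assuming throughput is simply batch size divided by forward-pass time; `fps_from_latency` is a hypothetical helper, and the two latencies are the ones quoted above.

```python
def fps_from_latency(ms_per_forward, batch_size=1):
    """Convert an average forward-pass time in milliseconds to images per second."""
    return batch_size * 1000.0 / ms_per_forward


# Average forward-pass times reported by `caffe time` above (batch size 1).
mkl2017_ms = 147.002
default_ms = 66.6948

mkl2017_fps = fps_from_latency(mkl2017_ms)
default_fps = fps_from_latency(default_ms)

print(f"-engine=MKL2017: {mkl2017_fps:.1f} FPS")
print(f"default engine:  {default_fps:.1f} FPS")
print(f"MKL2017 is {mkl2017_ms / default_ms:.1f}x slower")
```

Under that assumption, the MKL2017 engine reaches about 6.8 FPS against about 15.0 FPS for the default engine, a gap of roughly 2.2x rather than the 20x seen with the earlier MKL release.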

@jdukat

jdukat commented Feb 3, 2017

This work is aimed at optimization for Intel(r) Xeon(tm) and Intel Xeon Phi(tm).
Atom processors are not our target, and we will not optimize for them.
