
CPU version with MKL #12

Open
dwSun opened this issue Dec 6, 2017 · 11 comments

@dwSun
Member

dwSun commented Dec 6, 2017

Can you provide a CPU version built with MKL?

@danqing
Contributor

danqing commented Dec 7, 2017

Will provide in the next few days.

@dwSun
Member Author

dwSun commented Dec 7, 2017

Thanks.

@dwSun
Member Author

dwSun commented Dec 8, 2017

TensorFlow 1.4 (CPU, MKL) is slower than TensorFlow 1.4 (CPU).
Inference times:
TensorFlow 1.4 (CPU): about 280 ms
TensorFlow 1.4 (CPU, MKL): about 400 ms
TensorFlow 1.4 (stock wheel from PyPI via pip): about 800 ms

I also noticed that this build doesn't actually require MKL: I didn't have MKL installed on my system the first time I used it.

My system information:
OS: debian-sid
CPU: i7-6500U (Skylake)
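For anyone wanting to reproduce these numbers, a minimal timing harness along these lines (my own sketch, not from this thread; the warm-up and iteration counts are arbitrary) would be:

```python
import time

def avg_inference_ms(run_fn, warmup=3, iters=20):
    """Average wall-clock time of run_fn() in milliseconds.

    Warm-up calls are excluded so one-time graph and kernel
    initialization does not skew the comparison between builds.
    """
    for _ in range(warmup):
        run_fn()
    start = time.time()
    for _ in range(iters):
        run_fn()
    return (time.time() - start) * 1000.0 / iters

# With TF 1.4 this would wrap a session call, e.g.:
#   avg_inference_ms(lambda: sess.run(output, feed_dict={x: batch}))
```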

@danqing
Contributor

danqing commented Dec 9, 2017

That's weird. Will take a look tomorrow.

@WorksWellWithOthers

@dwSun @danqing
I would also like to know the outcome of this.

My 1060 is looking to be about twice as fast as my Xeon Phi on a different benchmark, which makes me believe I'm doing something drastically wrong with the installation (no MKL). I used the wheel provided by Intel for TF 1.4.

Benchmarks:
tensorflow/tensorflow#8584

@chricke

chricke commented Jan 15, 2018

I'm also having issues with the MKL versions provided. They even seem to affect the speed of computations on the GPU. I had to switch back to an older version of TensorFlow without MKL to be able to continue testing and development.

Please also provide wheels for 1.4.1 without MKL.

@danqing
Contributor

danqing commented Jan 15, 2018

Yeah, I'm still investigating this issue. The GPU builds with MKL should be fine though - @chricke are you having problems with those, or just the CPU-only ones?

@chricke

chricke commented Jan 16, 2018

I had this issue with the GPU versions while testing a new GPU server we set up at work. Training on the server with one GPU (a V100) was slower than training on my local GTX 1070.

At first I thought it had to do with Keras, since switching the optimizer from Keras to native TensorFlow had a big impact (training time down from over 4 hours to 30 minutes per epoch). But that was still slower than my local machine. After checking every step of the setup on another machine, it turned out the MKL build of TensorFlow was the cause. After changing to a build without MKL, training on the server was much faster (as expected).

BTW: as mentioned, with the MKL build the difference between the Keras optimizer and the TensorFlow one was huge (4 hours vs. 30 minutes); with the non-MKL build the difference was much smaller (only about 5 minutes).

@danqing
Contributor

danqing commented Jan 24, 2018

That seems to be related to tensorflow/tensorflow#14496. Trying to figure out if there are solutions around this issue.

@vivek-rane

I would strongly recommend reading through https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn to tune CPU performance. It basically involves tweaking a few variables related to your environment.
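For reference, the knobs that guide covers are the OpenMP/MKL threading variables plus TF's own parallelism settings. A sketch of setting them before importing TensorFlow (the specific values here are illustrative, not tuned for any particular machine):

```python
import os

# OpenMP/MKL settings; these must be in the environment before
# TensorFlow (and thus MKL) is loaded.
os.environ["KMP_BLOCKTIME"] = "0"    # ms a thread spins after finishing work
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["OMP_NUM_THREADS"] = "4"  # number of physical cores, not hyperthreads

# The matching graph-level settings in TF 1.x would look like:
#   import tensorflow as tf
#   config = tf.ConfigProto(intra_op_parallelism_threads=4,
#                           inter_op_parallelism_threads=2)
#   sess = tf.Session(config=config)
```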

@beew

beew commented May 18, 2018

On Linux, what if you add this line to .profile or .bashrc?

export MKL_THREADING_LAYER=GNU

I'm not sure if it's related to TF, but for me MKL was much slower than OpenBLAS at things like matrix multiplication. After adding this, it beats OpenBLAS.
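A quick way to check that comparison on your own machine (a sketch: run it once in a shell with `MKL_THREADING_LAYER=GNU` exported and once without, against a NumPy build linked to the BLAS you want to test):

```python
import os
import time
import numpy as np

# MKL reads MKL_THREADING_LAYER at load time, so export it in the
# shell before starting Python; here we only report what is set.
print("MKL_THREADING_LAYER =", os.environ.get("MKL_THREADING_LAYER", "<unset>"))

n = 1024  # large enough that the BLAS call dominates the timing
a = np.random.rand(n, n)
b = np.random.rand(n, n)

a @ b  # warm-up
start = time.time()
for _ in range(5):
    a @ b
print("avg matmul: %.1f ms" % ((time.time() - start) * 1000.0 / 5))
```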
