Inference performance of bvlc_alexnet is far slower on mkl-dnn #17
Hi @etaf, By default IntelCaffe uses a lazy stream for mkl-dnn execution, which means actual execution might be postponed. The MKL-DNN integration also has lazy primitive initialization -- primitive creation happens during the first run.
Thanks for your answer.
Is the gap determined by the position of the dropout layer (the previous layer)? Why?
Thanks.
Hi @etaf,
It is not a performance gap -- all the computations simply happen in the dropout layer (from Caffe's perspective). The overall run-time is absolutely the same. If this is confusing, replace the lazy stream with an eager one: the behavior will be more intuitive in that case.
For now mkl-dnn supports convolution, relu, lrn, pooling, inner-product, concat, split, and elwise. All the other layers fall back to the native Caffe implementation. For popular topologies like AlexNet, GoogleNet, ResNet, and VGG, all the most compute-intensive layers are covered. We don't have particular plans for new primitives for now... our main current focus is to optimize backward computations and provide optimizations at least on the same level as Intel MKL does.
I've replaced the lazy stream with an eager one. The execution time is now spread among the different layers. But I'm still confused why the ratio between the first and second run is more than 30X for AlexNet but less than 2X for GoogleNet. Thanks!
Hi, @emfomenk
I believe I've already answered here:
Compare the mkl vs. mkl-dnn integration.
Closing as no further questions are being posted. |
I found that the inference performance of bvlc_alexnet is far slower on mkl-dnn.
In intel-caffe built with the mkl-dnn engine, I ran the caffe/examples/cpp_classification example and collected the elapsed time of the line: net_->Forward();
It's about 835 ms.
But the result of intel-caffe with mkl-2017 is 16 ms.
Then I added the following code to caffe/examples/cpp_classification/classification.cpp:
The result is:
first time: 835 ms
second time: 15.32 ms
It's weird that there is such a huge gap between the first and second time.
In intel-caffe with mkl2017, the result is
first time: 18 ms
second time: 16 ms
I collected each layer's forward time for the first inference and found that the time is spent in the first dropout layer, in
caffe/src/caffe/mkldnn_memory.cpp => MKLDNNMemoryDescriptor<Dtype, is_diff>::on_to_cpu()
=> StreamHolder::Instance().current_stream()->wait();
The first dropout layer is connected behind a fully connected (fc) layer.
I've tried other models; the results for bvlc_reference_caffenet, vgg_16, and vgg_19 are similar to bvlc_alexnet. They all have a dropout layer behind an fc layer.
But bvlc_googlenet does not have a large gap between the first and second time, and its dropout layer is not connected behind an fc layer.
Is this a known issue?