Why is the speed in mini-caffe much worse than Caffe with the same prototxt? #48

Closed
yonghenglh6 opened this issue Sep 8, 2017 · 22 comments

@yonghenglh6

yonghenglh6 commented Sep 8, 2017

I compared the ResNet from your run_test.cpp, but the performance is as shown below: the speed is worse than Caffe's.
Any ideas?
I'm using the newest Caffe with cuDNN 5.1.5.

Thanks

[screenshot: speed and memory comparison between mini-caffe and Caffe]

@luoyetx (Owner)

luoyetx commented Sep 10, 2017

Would you mind pasting your test code?

@yonghenglh6 (Author)

In mini-caffe, I just added a "for(int i=0;i<100;i++)" loop before "test.Forward();" in run_net.cpp and took the time from your output.
In Caffe, I used "caffe time", which is misleading for memory usage because it includes the backward pass.
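
Roughly like this (a sketch from memory, not the exact run_net.cpp code; the chrono timing wrapper is my own):

```cpp
// Sketch of the change in run_net.cpp ("net" comes from the surrounding code).
#include <chrono>
#include <cstdio>

net.Forward();  // warm-up pass so first-run allocations are excluded
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 100; i++) {
  net.Forward();
}
auto end = std::chrono::high_resolution_clock::now();
double ms = std::chrono::duration<double, std::milli>(end - start).count();
std::printf("average forward time: %.3f ms\n", ms / 100);
```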

@yonghenglh6 (Author)

I got the memory usage from "nvidia-smi", watching it live.

@yonghenglh6 (Author)

Would you mind posting your own ResNet numbers, so I can check whether the bug is on my side? Thank you.

@luoyetx (Owner)

luoyetx commented Sep 11, 2017

I will test the network prototxt on a 1070 later, with more details on mini-caffe and official Caffe.

@yonghenglh6 (Author)

yonghenglh6 commented Sep 12, 2017

I used the code in the screenshot below to time every layer, but I cannot find the reason.
[screenshot: per-layer timing code]
And the result is:
[screenshot: per-layer timing results]

@yonghenglh6 (Author)

yonghenglh6 commented Sep 12, 2017

Your net->Forward(2, 3) gives me an error, so I can only use net->Forward(0, x) to measure the time from the beginning and subtract consecutive totals.
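
Something like this (a sketch; "num_layers" and the exact Forward(start, end) signature are assumptions on my side):

```cpp
// Sketch: time the layer range [0, k) for growing k and subtract
// consecutive totals to approximate per-layer cost.
#include <chrono>
#include <cstdio>
#include <vector>

std::vector<double> prefix_ms(num_layers + 1, 0.0);
for (int k = 1; k <= num_layers; ++k) {
  auto t0 = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < 100; ++i) {
    net->Forward(0, k);  // run layers 0..k-1
  }
  auto t1 = std::chrono::high_resolution_clock::now();
  prefix_ms[k] =
      std::chrono::duration<double, std::milli>(t1 - t0).count() / 100;
}
for (int k = 1; k <= num_layers; ++k) {
  std::printf("layer %d: %.3f ms\n", k - 1, prefix_ms[k] - prefix_ms[k - 1]);
}
```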

@yonghenglh6 (Author)

yonghenglh6 commented Sep 12, 2017

As the net gets longer, the performance gap between mini-caffe and Caffe grows larger; I tested this by constructing the net layer by layer.

@yonghenglh6 (Author)

I have updated the performance numbers above.

@yonghenglh6 (Author)

I checked cuDNN and confirmed it was running correctly by adding some debug output.

@luoyetx (Owner)

luoyetx commented Sep 12, 2017

Please refer to profile.md to check layer-wise performance. I am writing tools to benchmark the whole network.
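
Usage is roughly like this (a sketch; see profile.md for the exact API):

```cpp
// Sketch of whole-network profiling with the built-in Profiler.
#include "caffe/profiler.hpp"

caffe::Profiler* profiler = caffe::Profiler::Get();
profiler->TurnON();                     // start recording scopes
net->Forward();
profiler->TurnOFF();                    // stop recording
profiler->DumpProfile("profile.json");  // load the dump in chrome://tracing
```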

@yonghenglh6 (Author)

I tried the Profiler at the beginning, but the times shown in Chrome were not consistent with my test results. It's quite possible I was using it the wrong way, though.

@luoyetx (Owner)

luoyetx commented Sep 12, 2017

Pay attention to the Timer; it's not accurate. Use Profiler::Now() instead.
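
Something like this (a sketch; I'm assuming Now() returns microsecond ticks, check caffe/profiler.hpp):

```cpp
// Sketch: manual timing with Profiler::Now() instead of the Timer.
#include <cstdint>
#include <cstdio>
#include "caffe/profiler.hpp"

uint64_t t0 = caffe::Profiler::Get()->Now();
net->Forward();
uint64_t t1 = caffe::Profiler::Get()->Now();
std::printf("forward: %.3f ms\n", (t1 - t0) / 1000.0);  // microsecond ticks assumed
```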

@luoyetx (Owner)

luoyetx commented Sep 13, 2017

@yonghenglh6 You can try the benchmark branch. I modified the Profiler log so it now prints the layer name instead of the layer type. You can get the same layer-wise performance through profile.json.

@luoyetx (Owner)

luoyetx commented Sep 13, 2017

I find the performance is not stable on Windows; I will test on Linux later.

@yonghenglh6 (Author)

With your new benchmark tool, I found the BN layer is the main source of the difference: your BN costs twice as much time as conv. So I switched to testing GoogLeNet, which has no BN layer, and the result shows similar performance between Caffe and mini-caffe.
Here are the details. By the way, my Caffe uses the ATLAS lib, which I don't think matters here.
[screenshot: layer-wise timing details]

@luoyetx (Owner)

luoyetx commented Sep 14, 2017

There is an optimization for the BatchNorm layer in this commit. I will port it from official Caffe.

@yonghenglh6 (Author)

yonghenglh6 commented Sep 22, 2017

Every time you request memory from the pool, the blob is in the uninitialized state, so to_gpu() calls the gpu_memset function. That costs about 10% more time than original Caffe, where the blob is kept around and stays in the HEAD_AT_GPU state.
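
The path I mean, simplified from official Caffe's SyncedMemory (mini-caffe's version may differ in detail):

```cpp
// Simplified from Caffe's SyncedMemory::to_gpu() (CUDA path). With the
// memory pool, freshly requested blobs start UNINITIALIZED, so the
// memset runs on nearly every forward pass.
void SyncedMemory::to_gpu() {
  switch (head_) {
  case UNINITIALIZED:
    CUDA_CHECK(cudaMalloc(&gpu_ptr_, size_));  // or a pooled allocation
    caffe_gpu_memset(size_, 0, gpu_ptr_);      // <-- the extra ~10% cost
    head_ = HEAD_AT_GPU;
    break;
  default:
    break;  // HEAD_AT_CPU / HEAD_AT_GPU / SYNCED cases omitted
  }
}
```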

@luoyetx (Owner)

luoyetx commented Sep 22, 2017

The default behavior is the same in official Caffe here.

@yonghenglh6 (Author)

yonghenglh6 commented Sep 22, 2017

Yes, but official Caffe doesn't need to reallocate blobs every forward pass, so it doesn't call that function frequently. Mini-caffe keeps resetting blobs to the uninitialized state so it can reuse pooled memory, and then re-memsets the memory each time to_gpu() is called, which causes the performance problem.
It could be optimized by choosing the memset point in each layer carefully instead of doing it every time to_gpu() is called; see the sketch below.
In short, this may be a problem only for mini-caffe, even though the code is the same.
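
For example, a hypothetical way to sketch it (the zero_init flag is just an illustration, not tested):

```cpp
// Hypothetical sketch of the idea: zero-fill only when a caller actually
// needs zeroed memory, instead of memsetting every pooled allocation
// inside to_gpu().
void SyncedMemory::to_gpu(bool zero_init /* = false */) {
  switch (head_) {
  case UNINITIALIZED:
    CUDA_CHECK(cudaMalloc(&gpu_ptr_, size_));
    if (zero_init) {
      caffe_gpu_memset(size_, 0, gpu_ptr_);  // only when requested
    }
    head_ = HEAD_AT_GPU;
    break;
  default:
    break;  // other cases unchanged
  }
}
```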

@luoyetx (Owner)

luoyetx commented Sep 22, 2017

You mean this function? It is a problem that it gets called every time new memory is requested. I think we can remove the call, since dirty data in the memory seems to be no problem for later use. What do you think?

@yonghenglh6 (Author)

Yes.
