Skip to content
This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

Result not correct when applying it to SSD (Single Shot MultiBox Detector) #10

Closed
heagoo opened this issue Oct 27, 2016 · 15 comments
Closed

Comments

@heagoo
Copy link

heagoo commented Oct 27, 2016

I merged to code of SSD (https://github.com/weiliu89/caffe/tree/ssd) and intel caffe, but when run ssd_detect with SSD300, the result is not correct if MKL2017_AS_DEFAULT_ENGINE is enabled.
After code merge, following 2 things have been confirmed:

  1. when only with MKL, ssd_detect shows correct result
  2. when enable MKL2017_AS_DEFAULT_ENGINE, cpp_classification work correctly with VGG16 (classification result is correct)
    I guess SSD300 has a different input shape (3x300x300, not 3x224x224), thus make the convolution result is not correct.
    Could you please help to look into it? Thanks!
@nickltj86
Copy link

hi @heagoo would you be willing to share your work on the merge with only MKL? would be interesting to see the results and performance with intelcaffe. do you have rough performance numbers per frame?

@heagoo
Copy link
Author

heagoo commented Oct 27, 2016

@nickltj86 The performance is not so good (262ms per frame with 32 cores), I think it could be much better after enable MKL2017_AS_DEFAULT_ENGINE, but unfortunately, the result is not correct yet.

@nickltj86
Copy link

Thanks. Good to know the performance. I think even after enabling MKL2017, the performance may not increase too much. From the benchmark, the new layers which are not optimized with MKL or MKL2017 such as the mbox layer will cause the most delays. The performance of the common cnn layers are very optimized and seems like as fast as those running on CUDA. the other non-optimized layers are a different story.

Do you have a fork with the merge? I think the Intel folks will need that to evaluate it as well.

@heagoo
Copy link
Author

heagoo commented Oct 27, 2016

@jdukat
Copy link

jdukat commented Oct 28, 2016

Could you please try with MKL2017 engine enabled, but for layers that cause problems explicitly select engine CAFFE?
For reference, you can see here how to select different engine is selected layer:
https://github.com/intel/caffe/blob/master/models/mkl2017_googlenet_v1_knl/train_val.prototxt

We are working on fixes to simplify this process and avoid manual engine selection.

@heagoo
Copy link
Author

heagoo commented Oct 28, 2016

@jdukat Yes, I've already do this, but the result is different from just using MKL, much different, and totally not correct. I tried:

  1. only add MKL2017 engine to the first conv layer, works. (but seems still call the im2col implementation)
  2. add MKL2017 to any other conv layers, NOT work.
    It may be related with the DNN function. Is there any detail description for the MKL DNN functions?
    And, the strange thing is that VGG16 works, do you think the input size matters? (VGG16 is 224x224, and SSD is 300x300 or 500x500)

@jdukat
Copy link

jdukat commented Oct 29, 2016

Input size matters for performance with MKL2017 for sure. It should not matter for correctness and it seems to be a bug. We should take a closer look at this, but at the moment I am not able to put this on top of my priorities.

I'll see what I can do about documentation for MKL DNN functions.

@jdukat
Copy link

jdukat commented Oct 30, 2016

@heagoo
Copy link
Author

heagoo commented Oct 31, 2016

Thanks!
Confirmed, it's a bug of MKL, we can solve the issue by downloading https://github.com/intel-mxnet/mkl-release/releases/download/self_containted_MKLGOLD/mklml_lnx_2017.0.1.20161005.tgz
And now, performance boost to 88ms under the same setting, great!

@heagoo heagoo closed this as completed Nov 2, 2016
@jdukat
Copy link

jdukat commented Nov 2, 2016

Great, the same MKL version will be released for Caffe within next few days.
@heagoo do you plan to issue a pull request with your changes for Intel Caffe?

@heagoo
Copy link
Author

heagoo commented Nov 5, 2016

Sure, will do it. @jdukat

@ngaloppo
Copy link

ngaloppo commented Nov 9, 2016

Is it confirmed that this bug exists in current master? If so, can we open this issue again to track the bug until it is fixed?

@heagoo
Copy link
Author

heagoo commented Nov 10, 2016

This bug does not exist any more in the latest update with MKL 2017 update 1.

@amoussawi
Copy link

Hey @heagoo
Could you please tell me what intel processor are you using if you don't mind? Thanks :)

@heagoo
Copy link
Author

heagoo commented Nov 25, 2016 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants