
Added Halide backend support for deep learning layers #1186

Merged — 1 commit, Jun 15, 2017

Conversation

@dkurt (Member) commented May 25, 2017

Implemented some layers using the Halide language backend:

| Layer | Covered all configurations | Test |
|-------|----------------------------|------|
| Concat | by channels (axis = 1) | x |
| Convolution | x | x |
| Deconvolution | no group factor | x |
| (attachable) ReLU, TanH, Sigmoid, Abs, BNLL, Power, PReLU, Batch normalization, Scale | x | x |
| Element-wise addition, multiplication, maximum | x | x |
| Fully-connected | axis = 1 | x |
| LRN (across and within channels) | x | x |
| Maximum unpooling | with indices support | x |
| Maximum pooling | x | x |
| Average pooling | only whole kernels, no padding | x |
| SoftMax | axis == 1, spatial size no larger than 1x1 | x |

Supports CPU only, float32 computations. No ahead-of-time compilation.
Added a tutorial with an installation guide and a sample, tests for some layers and networks, and performance tests.

Generalized the manual scheduling of 5 networks into layer-wise automatic scheduling.

Intel® Core™ i7-6700K CPU @ 4.00GHz x 8

| Architecture | Manual scheduling | Automatic scheduling | Ratio |
|--------------|-------------------|----------------------|-------|
| AlexNet | 22.38 ms | 25.5 ms | x0.87 |
| ResNet-50 | 73.91 ms | 84.26 ms | x0.87 |
| SqueezeNet v1.1 | 8.21 ms | 6.44 ms | x1.27 |
| Inception-5h | 37.06 ms | 32.01 ms | x1.15 |
| ENet @ 3x512x256 | 41.21 ms | 41.07 ms | x1.00 |

For testing, the GoogLeNet network was taken. The L1 diff is 1.26789e-09. Performance is 33 ms using Halide (automatic scheduling) against 34.89 ms using MKL (x1.05).

TODO list:

  • Reproduce the Halide installation from the tutorial (Linux): halide/Halide@abef1d9
  • Reproduce the Halide installation from the tutorial (Windows): installation reproduced, but a SegFault appears (some layers work if the vectorization factor over the x variable is reduced to 4 from 8 or 16; everything is fine with no scheduling)
  • Cover more layers and configurations in tests
  • Write a scheduling tutorial
  • Remove local paths
  • Figure out supportBackend and layer configurations (axis, etc.)
  • Automatic scheduling with no more than 20% efficiency loss

@dkurt dkurt force-pushed the halide_support branch 2 times, most recently from 71cd4b3 to 3ee3f41 Compare May 25, 2017 14:06
```cpp
{ \
net.forward(net.getLayerId(outputLayer)); \
} \
SANITY_CHECK_NOTHING();
```
Member:
Please use a "static void" function instead. A multiline macro is very hard to debug.

Member Author:
Thanks! Replaced with a static function.

```cmake
set(DNN_LIBRARIES
    ${DNN_LIBRARIES}
    ${HALIDE_LIBRARIES}
)
```
Member:
```cmake
list(APPEND DNN_INCLUDE_DIRS ${HALIDE_INCLUDE_DIRS})
```
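For context, `list(APPEND ...)` extends a CMake list variable in place, which is more idiomatic than the `set(VAR ${VAR} ...)` pattern used in the diff above. The two are equivalent here:

```cmake
# Pattern in the diff:
set(DNN_LIBRARIES ${DNN_LIBRARIES} ${HALIDE_LIBRARIES})
# Reviewer's suggested form:
list(APPEND DNN_LIBRARIES ${HALIDE_LIBRARIES})
```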

Member Author:
Removed the CMakeLists changes entirely.


```cpp
inline std::string getOpenCVExtraDir()
{
    return "/home/dkurtaev/opencv_extra/testdata/";
```
Member:
#1162 + findDataFile() function

Member Author:
Modified modules/dnn/perf/perf_main.cpp for this.

Measured on Intel® Core™ i7-6700K CPU @ 4.00GHz x 8.

```html
<html>
<table>
```
Member:
Please use markdown tables (see Example, Tutorial).

Member Author:
Fixed, thanks!

```cpp
using namespace cv;
using namespace dnn;

void loadNet(const std::string& weights, const std::string& proto,
```
Member:
Please use "static" or an anonymous namespace for internal functions.

BTW, where is the #include directive in this file?

Member Author:
It works somehow. Should I add it to prevent failures in different build configurations?

@dkurt dkurt force-pushed the halide_support branch 6 times, most recently from 226db43 to e9b2d2f Compare May 29, 2017 15:13
@dkurt dkurt force-pushed the halide_support branch 14 times, most recently from 1070e98 to 17e9f46 Compare June 8, 2017 10:33
```cpp
*/
enum Backend
{
    DEFAULT,
```
Contributor:
I would rename them to DNN_BACKEND_DEFAULT and DNN_BACKEND_HALIDE to make them more self-explanatory.

```cpp
*/
enum HalideTarget
{
    HALIDE_CPU
```
Contributor:
Again, let's use DNN_TARGET_CPU, DNN_TARGET_GPU, etc. I would not use the word "Halide" here, because other backends, like OpenVX or even the default one, can potentially be customized to use a CPU, GPU, or other device to run the networks.

```cpp
 * it helps prevent some memory management issues (if something is wrong,
 * Halide tests will fail).
 */
virtual std::vector<void*> initHalide(const std::vector<void*> &inputs);
```
@vpisarev (Contributor) commented Jun 9, 2017:
Instead of the very low-level void* I would use some abstract class HalideFunc (that can encapsulate the void*) with a smart pointer to it. It is comparatively low overhead, but it would be much safer from a memory-management point of view. That is, what about changing std::vector<void*> to std::vector<Ptr<HalideFunc>>?

Member Author:
Introduced BackendNode and BackendWrapper, whose derivatives store an explicit Halide::Func and Halide::Buffer respectively. Moreover, this made it possible to remove the Halide::Func members that layers held before.

@dkurt dkurt force-pushed the halide_support branch 3 times, most recently from 2c28d87 to aedce13 Compare June 14, 2017 14:00
@dkurt dkurt force-pushed the halide_support branch 8 times, most recently from fc18830 to 2e94270 Compare June 15, 2017 14:27
@vpisarev (Contributor):
OK, let's merge it in while it's fresh! 👍

@vpisarev vpisarev self-assigned this Jun 15, 2017
@opencv-pushbot opencv-pushbot merged commit dc93eed into opencv:master Jun 15, 2017
@alalek (Member) commented Jun 15, 2017:

This PR relies on opencv/opencv#8794. That one needs to be reviewed/merged too.

@dkurt dkurt deleted the halide_support branch June 21, 2017 13:47