Advanced memory management for DNN blobs #9389

Closed

Conversation

@dkurt (Member) commented Aug 17, 2017

This pull request changes how memory for DNN blobs is allocated.

There are two allocation methods: an optimal one for host memory that will be processed on the CPU, and a less optimal one for device memory (OpenCL). The latter follows the same reuse-or-create strategy as before (sketched after the notes below), but packs blobs more densely. Achieved results:

| Model           | With PR, host memory | With PR, device memory | Before PR, host / dev | No reusing |
|-----------------|----------------------|------------------------|-----------------------|------------|
| AlexNet         | 2.32MB               | 2.6MB                  | 3.72MB                | 6.14MB     |
| Inception-5h    | 5.2MB                | 8.65MB                 | 15.62MB               | 33.13MB    |
| GoogLeNet       | 5.21MB               | 8.76MB                 | 15.74MB               | 33.43MB    |
| ENet, 512x256   | 23.59MB              | 26.24MB                | 72.22MB               | 137.9MB    |
| SqueezeNet v1.1 | 4.87MB               | 5.07MB                 | 11.71MB               | 20.71MB    |
| ResNet-50       | 9.63MB               | 10.43MB                | 31.51MB               | 66.64MB    |
  • Although only one memory block (host or device) is actually in use at a time, so some host allocations could be skipped, we cannot make that decision at the allocation stage. So, for the OpenCL backend, we allocate both CPU and GPU memory (as before this PR).
  • Outputs with no references (such as the indices of MaxPooling) are allocated too; otherwise the code would be more complicated to maintain.
  • Required changes: Torch's Concat and ConcatTable doesn't use Split layer #9384.
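
For reference, the reuse-or-create idea can be sketched as a greedy best-fit pass over blobs in production order. This is only an illustration under assumed data structures (the names Blob and scheduleBuffers and the best-fit policy are assumptions, not code from this PR):

#include <cstddef>
#include <map>
#include <vector>

struct Blob {
    size_t bytes;    // required allocation size
    int produced;    // index of the producing layer
    int lastRead;    // index of the last layer that reads this blob
    int buffer = -1; // physical buffer assigned by the scheduler
};

// Walk blobs in production order; reuse the smallest freed buffer that
// still fits, otherwise create a new one. Returns each buffer's capacity.
std::vector<size_t> scheduleBuffers(std::vector<Blob>& blobs) {
    std::vector<size_t> capacity;         // size of every physical buffer
    std::multimap<size_t, int> freeList;  // capacity -> free buffer id
    std::vector<bool> released(blobs.size(), false);

    for (size_t i = 0; i < blobs.size(); ++i) {
        // Return to the free list every buffer whose blob is already dead.
        for (size_t j = 0; j < i; ++j) {
            if (!released[j] && blobs[j].lastRead < blobs[i].produced) {
                freeList.emplace(capacity[blobs[j].buffer], blobs[j].buffer);
                released[j] = true;
            }
        }
        // Best fit: the smallest free buffer that is large enough.
        auto it = freeList.lower_bound(blobs[i].bytes);
        if (it != freeList.end()) {
            blobs[i].buffer = it->second;
            freeList.erase(it);
        } else {
            blobs[i].buffer = static_cast<int>(capacity.size());
            capacity.push_back(blobs[i].bytes);
        }
    }
    return capacity; // total footprint is the sum of all capacities
}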

@dkurt (Member, Author) commented Aug 29, 2017

Maximum resident set size according to /usr/bin/time --verbose.
Test: load a network and make a single forward pass.

| Model           | Origin framework   | DNN           |
|-----------------|--------------------|---------------|
| AlexNet         | 974MB (Caffe)      | 744MB (x1.3)  |
| Inception-5h    | 483MB (TensorFlow) | 155MB (x3.11) |
| GoogLeNet       | 435MB (Caffe)      | 187MB (x2.32) |
| ENet, 512x256   | 456MB (Torch)      | 62.4MB (x7.3) |
| SqueezeNet v1.1 | 233MB (Caffe)      | 47.4MB (x4.9) |
| ResNet-50       | 373MB (Caffe)      | 238MB (x1.56) |

TensorFlow script:

import numpy as np
import tensorflow as tf

# Read the frozen graph in binary mode so ParseFromString gets bytes.
with tf.gfile.FastGFile('opencv_extra/testdata/dnn/tensorflow_inception_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Session() as sess:
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')

    # Generate input
    np.random.seed(2701)
    inp = np.random.standard_normal([1, 224, 224, 3]).astype(np.float32)

    # Receive output
    outTensor = sess.graph.get_tensor_by_name('softmax2:0')
    out = sess.run(outTensor, feed_dict={'input:0': inp})

DNN:

#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>

int main(int argc, char** argv) {
  cv::dnn::Net net = cv::dnn::readNetFromTensorflow("opencv_extra/testdata/dnn/tensorflow_inception_graph.pb");
  // Random NCHW input matching the TensorFlow test above.
  cv::Mat input({1, 3, 224, 224}, CV_32FC1);
  cv::randu(input, 0.0f, 1.0f);
  net.setInput(input);
  cv::Mat output = net.forward();
  return 0;
}

Caffe:

#include <string>

#include <caffe/caffe.hpp>

int main(int argc, char** argv) {
  std::string proto = "opencv_extra/testdata/dnn/ResNet-50-deploy.prototxt";
  std::string weights = "opencv_extra/testdata/dnn/ResNet-50-model.caffemodel";

  caffe::Caffe::set_mode(caffe::Caffe::CPU);
  caffe::Net<float>* net = new caffe::Net<float>(proto, caffe::TEST);
  net->CopyTrainedLayersFrom(weights);
  net->Forward();
  return 0;
}

Torch (model is in CPU mode):

require 'nn'

torch.setdefaulttensortype('torch.FloatTensor')

net = torch.load('ENet-model.t7'):float()

input = torch.FloatTensor(torch.LongStorage({1, 3, 256, 512}))  -- contents are uninitialized; only the memory footprint matters here
output = net:forward(input)
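
As an optional cross-check of the numbers above, peak RSS can also be read from inside the process on Linux. This helper is an illustration only (peakResidentKB is not part of any of the tested frameworks); it reads the VmHWM figure, which corresponds to what /usr/bin/time --verbose reports:

#include <fstream>
#include <iostream>
#include <string>

// Peak resident set size in kilobytes, as reported by the Linux kernel.
long peakResidentKB() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
        if (line.compare(0, 6, "VmHWM:") == 0)
            return std::stol(line.substr(6)); // "  123456 kB" -> 123456
    return -1;
}

int main() {
    // ... load a network and run a forward pass here ...
    std::cout << "Peak RSS: " << peakResidentKB() << " kB" << std::endl;
    return 0;
}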

@dkurt (Member, Author) commented Oct 24, 2018

Some of the ideas from this PR have been merged in different PRs.
Perhaps we can reuse the proposed memory scheduling approach later, for example as part of a student project.
