
[Proposal] Strategy to reduce memory usage #29

Closed · luoyetx opened this issue Mar 29, 2017 · 7 comments

luoyetx (Owner) commented Mar 29, 2017

Caffe currently consumes too much memory during the Forward phase. This is mainly because of the internal temporary buffers held by each Layer, e.g. the Convolution layer needs to cache the im2col result for its gemm operation. These temporary buffers are not shared between layers, which causes excessive memory usage. Second, since we don't perform the backward pass, network internal buffers can also be reused or freed as soon as no later layer needs them.
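For a sense of scale, here is a rough, self-contained calculation of the im2col buffer for a single 3x3 convolution layer (the layer shape below is an assumed example, not taken from any particular model):

```cpp
// Rough illustration of why per-layer im2col buffers are expensive: for a KxK
// convolution the unrolled buffer holds C*K*K values per output position.
// Shape is an assumed example (stride 1, "same" padding), not a real model.
#include <cstddef>
#include <cstdio>

int main() {
  const size_t C = 256, H = 56, W = 56, K = 3;    // assumed layer shape
  const size_t im2col_elems = C * K * K * H * W;  // elements of the GEMM input
  std::printf("im2col buffer: %.1f MB for this one layer\n",
              im2col_elems * sizeof(float) / (1024.0 * 1024.0));
  // ~27.6 MB; without sharing, every conv layer keeps its own copy alive.
  return 0;
}
```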

Mini-Caffe should change this situation without breaking any high-level API exposed in include (though some APIs may be added).

Some ideas:

  1. A layer that needs temporary memory should request it from a global memory manager (see the sketch after this list). The manager owns the data and lends buffers to whoever requests them. It can be implemented as a full memory pool, or simply reuse the same buffer and resize it when a request is too big. Network internal buffers can also be requested through the manager, but we need to track the dependencies of each named blob and return its memory to the manager once no other layer needs it. This strategy operates within every Forward pass.

  2. Since the Caffe network graph is static, we can plan the memory before forwarding the graph. Some Layer API changes would help: a layer should only tell the network how much memory it needs, and the network holds the memory and lends it to the layer during Forward. This covers bottom, top, and temporary memory, and requires changing the Reshape function of every layer. By counting the dependencies of the network's internal blobs, we can plan the memory and reuse those blobs. This strategy operates before every Forward pass.
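As a rough illustration of what the global manager in Strategy 1 could look like, here is a minimal, hypothetical sketch; class and method names are illustrative, not the actual Mini-Caffe API:

```cpp
// Hypothetical sketch of Strategy 1: a global memory manager that layers
// borrow temporary buffers from and return to, so buffers get reused.
#include <cstdlib>
#include <map>

class MemoryManager {
 public:
  // Borrow a buffer of at least `size` bytes; reuse a free block if one fits.
  void* Borrow(size_t size) {
    auto it = free_blocks_.lower_bound(size);
    if (it != free_blocks_.end()) {
      void* ptr = it->second;
      borrowed_[ptr] = it->first;
      free_blocks_.erase(it);
      return ptr;
    }
    void* ptr = std::malloc(size);
    borrowed_[ptr] = size;
    return ptr;
  }

  // Return a buffer to the pool so later layers can reuse it.
  void Return(void* ptr) {
    auto it = borrowed_.find(ptr);
    if (it == borrowed_.end()) return;
    free_blocks_.emplace(it->second, ptr);
    borrowed_.erase(it);
  }

  ~MemoryManager() {
    for (auto& kv : free_blocks_) std::free(kv.second);
    for (auto& kv : borrowed_) std::free(kv.first);
  }

 private:
  std::multimap<size_t, void*> free_blocks_;  // size -> block, free for reuse
  std::map<void*, size_t> borrowed_;          // currently lent to a layer
};
```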

luoyetx (Owner) commented Apr 5, 2017

#31 follows Strategy 1. I implemented a memory pool which is transparent to Blob and is called by class SyncMem. I also added an API to class Blob that releases its memory gracefully, and Reshape will now always reallocate the SyncMem if the size differs. The memory pool holds all memory for both CPU and GPU, and only lends out a cached block if (real_size, request_size) satisfies certain conditions. Since every network internal buffer has a name, tracking the dependencies is pretty easy: the lifetime of each Blob is tied to the layers, and a Blob is released once no later layer needs it.

Also, the memory pool is thread-local and can share memory between networks in the same thread, which may be helpful in some situations. The current implementation helps the r-fcn example, which uses a ResNet-50 model: memory usage drops from 2 GB to 500 MB in the CPU context. That is reasonable, since the conv3x3 pattern is repeated and all of that memory can be reused between layers.
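As a hedged illustration of the two points above, the borrow condition and the thread-local sharing might look roughly like this; the 2x bound and all names here are assumptions for the sketch, not the actual #31 code:

```cpp
// Sketch only: the real condition and data structures live in #31.
#include <cstddef>
#include <map>

// Reuse a cached block only if it is big enough but not wastefully big.
// The 2x waste bound is an assumed heuristic for illustration.
inline bool CanBorrow(size_t real_size, size_t request_size) {
  return real_size >= request_size && real_size <= 2 * request_size;
}

// One pool instance per thread: networks running on the same thread share
// cached blocks, while different threads never contend for the pool.
struct ThreadLocalMemPool {
  std::multimap<size_t, void*> free_blocks;  // cached, reusable blocks
};

inline ThreadLocalMemPool& ThisThreadPool() {
  static thread_local ThreadLocalMemPool pool;
  return pool;
}
```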

luoyetx (Owner) commented Apr 7, 2017

More memory analysis is needed to improve this pool implementation. I can currently see some fixed patterns in the memory requests which could help improve the pool's memory management, e.g. we could use a Least Frequently Used (LFU) policy to reduce the memory held by the pool. Some of the memory patterns can be viewed in #31.
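A hypothetical sketch of how an LFU-style policy could trim the pool's idle blocks; this only illustrates the idea mentioned above, it is not the code in #31:

```cpp
// Sketch: free the idle blocks whose sizes are requested least often whenever
// the pool holds more idle memory than a budget allows. Names are illustrative.
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

struct FreeBlock {
  void* ptr;
  size_t size;
};

class LfuTrimmer {
 public:
  // Call on every allocation request so the pool learns which sizes are hot.
  void RecordRequest(size_t size) { ++freq_[size]; }

  // Free the least-frequently requested idle blocks until total idle memory
  // drops below `budget` bytes; frequently reused sizes survive.
  void Trim(std::vector<FreeBlock>& free_blocks, size_t budget) {
    std::sort(free_blocks.begin(), free_blocks.end(),
              [this](const FreeBlock& a, const FreeBlock& b) {
                return Freq(a.size) < Freq(b.size);  // coldest sizes first
              });
    size_t idle = 0;
    for (const auto& b : free_blocks) idle += b.size;
    size_t freed = 0;
    while (freed < free_blocks.size() && idle > budget) {
      std::free(free_blocks[freed].ptr);
      idle -= free_blocks[freed].size;
      ++freed;
    }
    free_blocks.erase(free_blocks.begin(), free_blocks.begin() + freed);
  }

 private:
  size_t Freq(size_t size) const {
    auto it = freq_.find(size);
    return it == freq_.end() ? 0 : it->second;
  }
  std::unordered_map<size_t, size_t> freq_;  // request count per block size
};
```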

liyancas commented
Is the memory for model parameters also managed by the memory pool? If multiple images pass through the same model, do you keep the model's memory fixed?

luoyetx (Owner) commented May 22, 2017

Yes, parameters are also managed by the pool. However, parameter memory won't be released unless the Net object is destructed. Clearing the memory pool only releases the memory that is not in use: temporary memory and network internal buffers are marked as unused during the forward pass, but model parameters are not.
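A rough sketch of this clear-only-unused behavior, with hypothetical names rather than the actual SyncMem/pool interface:

```cpp
// Sketch: blocks lent to parameter blobs stay in use for the Net's lifetime,
// so clearing the pool only frees what forward passes have given back.
#include <cstddef>
#include <cstdlib>
#include <vector>

struct PoolBlock {
  void* ptr;
  size_t size;
  bool in_use;  // parameters keep this true until the Net is destructed
};

class Pool {
 public:
  // Hand out a new block and mark it in use.
  void* Borrow(size_t size) {
    void* ptr = std::malloc(size);
    blocks_.push_back({ptr, size, true});
    return ptr;
  }

  // Temporary buffers and internal blobs call this during Forward;
  // parameter blobs never do, so their blocks survive a pool clear.
  void MarkUnused(void* ptr) {
    for (auto& b : blocks_)
      if (b.ptr == ptr) b.in_use = false;
  }

  // Free only the blocks nobody is holding; in-use blocks are kept.
  void ClearUnused() {
    std::vector<PoolBlock> kept;
    for (auto& b : blocks_) {
      if (b.in_use) kept.push_back(b);
      else std::free(b.ptr);
    }
    blocks_.swap(kept);
  }

 private:
  std::vector<PoolBlock> blocks_;
};
```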

liyancas commented
Thanks. Another question: don't the buffers allocated for the feature maps (in/out) share the same memory blocks during the forward pass?

luoyetx (Owner) commented May 22, 2017

The whole idea is pretty simple: determine the lifetime of all Blobs in the network layer by layer, since the network definition itself is already in topological order. If no later layer uses a Blob as input, we can mark it reusable and share its memory. The input Blob could naturally be shared/reused as Forward goes on, but I think it's more convenient not to reuse it: then, at the next Forward with the same input size, you don't need to call Reshape, whereas reusing the input Blob would require a Reshape call to get a memory reference. One more important point is that in most cases the input and output Blobs consume little memory compared to the internal Blobs, so it hardly matters whether their memory is shared or not :)
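A minimal sketch of that lifetime analysis, assuming a simplified layer definition; the names here are illustrative, not the Mini-Caffe data structures:

```cpp
// Sketch: because the layer list is already in topological order, the last
// layer that consumes a blob marks the point after which its memory can be
// returned to the pool for reuse.
#include <map>
#include <string>
#include <vector>

struct LayerDef {
  std::vector<std::string> bottoms;  // input blob names
  std::vector<std::string> tops;     // output blob names
};

// Returns, for each blob name, the index of the last layer that reads it.
// After that layer's Forward, the blob's memory can go back to the pool
// (model parameters and network outputs would be excluded from this).
std::map<std::string, int> LastUse(const std::vector<LayerDef>& layers) {
  std::map<std::string, int> last_use;
  for (int i = 0; i < static_cast<int>(layers.size()); ++i) {
    for (const auto& name : layers[i].bottoms) last_use[name] = i;
  }
  return last_use;
}
```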

luoyetx (Owner) commented Apr 12, 2018

Static memory planning is implemented in #70.
