
[Proposal] Strategy to reduce memory usage #29

Closed · luoyetx opened this issue Mar 29, 2017 · 7 comments

luoyetx (Owner) commented Mar 29, 2017

Caffe currently consumes too much memory during the Forward phase. This is mainly because of the internal temporary buffers held by each Layer, e.g. the Convolution layer needs to cache the im2col result for its gemm operation. These temporary buffers are not shared between layers, which causes excessive memory usage. Second, since we don't perform the backward pass, network internal buffers can also be reused or freed as soon as no later layer needs them.
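For a sense of scale, here is a rough, self-contained calculation of the im2col buffer for a single 3x3 convolution layer (the layer shape below is an assumed example, not taken from any particular model):

```cpp
// Rough illustration of why per-layer im2col buffers are expensive: for a KxK
// convolution the unrolled buffer holds C*K*K values per output position.
// Shape is an assumed example (stride 1, "same" padding), not a real model.
#include <cstddef>
#include <cstdio>

int main() {
  const size_t C = 256, H = 56, W = 56, K = 3;    // assumed layer shape
  const size_t im2col_elems = C * K * K * H * W;  // elements of the GEMM input
  std::printf("im2col buffer: %.1f MB for this one layer\n",
              im2col_elems * sizeof(float) / (1024.0 * 1024.0));
  // ~27.6 MB; without sharing, every conv layer keeps its own copy alive.
  return 0;
}
```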

Mini-Caffe should change this situation without breaking any high-level API exposed in include (though some APIs may be added).

Some ideas:

  1. A layer that needs temporary memory should request it from a global memory manager (see the sketch after this list). The manager owns the data and lends buffers to whoever requests them. It can be implemented as a full memory pool, or simply reuse the same buffer and resize it when a request is too big. Network internal buffers can also be requested through the manager, but we need to track the dependencies of each named blob and return its memory to the manager once no other layer needs it. This strategy operates within every Forward pass.

  2. Since the Caffe network graph is static, we can plan the memory before forwarding the graph. Some Layer API changes would help: a layer should only tell the network how much memory it needs, and the network holds the memory and lends it to the layer during Forward. This covers bottom, top, and temporary memory, and requires changing the Reshape function of every layer. By counting the dependencies of the network's internal blobs, we can plan the memory and reuse those blobs. This strategy operates before every Forward pass.
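As a rough illustration of what the global manager in Strategy 1 could look like, here is a minimal, hypothetical sketch; class and method names are illustrative, not the actual Mini-Caffe API:

```cpp
// Hypothetical sketch of Strategy 1: a global memory manager that layers
// borrow temporary buffers from and return to, so buffers get reused.
#include <cstdlib>
#include <map>

class MemoryManager {
 public:
  // Borrow a buffer of at least `size` bytes; reuse a free block if one fits.
  void* Borrow(size_t size) {
    auto it = free_blocks_.lower_bound(size);
    if (it != free_blocks_.end()) {
      void* ptr = it->second;
      borrowed_[ptr] = it->first;
      free_blocks_.erase(it);
      return ptr;
    }
    void* ptr = std::malloc(size);
    borrowed_[ptr] = size;
    return ptr;
  }

  // Return a buffer to the pool so later layers can reuse it.
  void Return(void* ptr) {
    auto it = borrowed_.find(ptr);
    if (it == borrowed_.end()) return;
    free_blocks_.emplace(it->second, ptr);
    borrowed_.erase(it);
  }

  ~MemoryManager() {
    for (auto& kv : free_blocks_) std::free(kv.second);
    for (auto& kv : borrowed_) std::free(kv.first);
  }

 private:
  std::multimap<size_t, void*> free_blocks_;  // size -> block, free for reuse
  std::map<void*, size_t> borrowed_;          // currently lent to a layer
};
```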

luoyetx (Owner) commented Apr 5, 2017

#31 follows Strategy 1. I implemented a memory pool which is transparent to Blob and is called by class SyncMem. I also added an API to class Blob that releases its memory gracefully, and Reshape will now always reallocate the SyncMem if the size differs. The memory pool holds all memory for both CPU and GPU, and only lends out a cached block if (real_size, request_size) satisfies certain conditions. Since every network internal buffer has a name, tracking the dependencies is pretty easy: the lifetime of each Blob is tied to the layers, and a Blob is released once no later layer needs it.

Also, the memory pool is thread-local and can share memory between networks in the same thread, which may be helpful in some situations. The current implementation helps the r-fcn example, which uses a ResNet-50 model: memory usage drops from 2 GB to 500 MB in the CPU context. That is reasonable, since the conv3x3 pattern is repeated and all of that memory can be reused between layers.
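As a hedged illustration of the two points above, the borrow condition and the thread-local sharing might look roughly like this; the 2x bound and all names here are assumptions for the sketch, not the actual #31 code:

```cpp
// Sketch only: the real condition and data structures live in #31.
#include <cstddef>
#include <map>

// Reuse a cached block only if it is big enough but not wastefully big.
// The 2x waste bound is an assumed heuristic for illustration.
inline bool CanBorrow(size_t real_size, size_t request_size) {
  return real_size >= request_size && real_size <= 2 * request_size;
}

// One pool instance per thread: networks running on the same thread share
// cached blocks, while different threads never contend for the pool.
struct ThreadLocalMemPool {
  std::multimap<size_t, void*> free_blocks;  // cached, reusable blocks
};

inline ThreadLocalMemPool& ThisThreadPool() {
  static thread_local ThreadLocalMemPool pool;
  return pool;
}
```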

luoyetx (Owner) commented Apr 7, 2017

More memory analysis is needed to improve this pool implementation. I can currently see some fixed patterns in the memory requests which could help improve the pool's memory management, e.g. we could use a Least Frequently Used (LFU) policy to reduce the memory held by the pool. Some of the memory patterns can be viewed in #31.
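A hypothetical sketch of how an LFU-style policy could trim the pool's idle blocks; this only illustrates the idea mentioned above, it is not the code in #31:

```cpp
// Sketch: free the idle blocks whose sizes are requested least often whenever
// the pool holds more idle memory than a budget allows. Names are illustrative.
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

struct FreeBlock {
  void* ptr;
  size_t size;
};

class LfuTrimmer {
 public:
  // Call on every allocation request so the pool learns which sizes are hot.
  void RecordRequest(size_t size) { ++freq_[size]; }

  // Free the least-frequently requested idle blocks until total idle memory
  // drops below `budget` bytes; frequently reused sizes survive.
  void Trim(std::vector<FreeBlock>& free_blocks, size_t budget) {
    std::sort(free_blocks.begin(), free_blocks.end(),
              [this](const FreeBlock& a, const FreeBlock& b) {
                return Freq(a.size) < Freq(b.size);  // coldest sizes first
              });
    size_t idle = 0;
    for (const auto& b : free_blocks) idle += b.size;
    size_t freed = 0;
    while (freed < free_blocks.size() && idle > budget) {
      std::free(free_blocks[freed].ptr);
      idle -= free_blocks[freed].size;
      ++freed;
    }
    free_blocks.erase(free_blocks.begin(), free_blocks.begin() + freed);
  }

 private:
  size_t Freq(size_t size) const {
    auto it = freq_.find(size);
    return it == freq_.end() ? 0 : it->second;
  }
  std::unordered_map<size_t, size_t> freq_;  // request count per block size
};
```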

liyancas commented
Is the memory for model parameters also managed by the memory pool? If multiple images pass through the same model, do you keep the model's memory fixed?

luoyetx (Owner) commented May 22, 2017

Yes, parameters are also managed by the pool. However, parameter memory won't be released unless the Net object is destructed. Clearing the memory pool only releases the memory that is not in use: temporary memory and network internal buffers are marked as unused during the forward pass, but model parameters are not.
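A rough sketch of this clear-only-unused behavior, with hypothetical names rather than the actual SyncMem/pool interface:

```cpp
// Sketch: blocks lent to parameter blobs stay in use for the Net's lifetime,
// so clearing the pool only frees what forward passes have given back.
#include <cstddef>
#include <cstdlib>
#include <vector>

struct PoolBlock {
  void* ptr;
  size_t size;
  bool in_use;  // parameters keep this true until the Net is destructed
};

class Pool {
 public:
  // Hand out a new block and mark it in use.
  void* Borrow(size_t size) {
    void* ptr = std::malloc(size);
    blocks_.push_back({ptr, size, true});
    return ptr;
  }

  // Temporary buffers and internal blobs call this during Forward;
  // parameter blobs never do, so their blocks survive a pool clear.
  void MarkUnused(void* ptr) {
    for (auto& b : blocks_)
      if (b.ptr == ptr) b.in_use = false;
  }

  // Free only the blocks nobody is holding; in-use blocks are kept.
  void ClearUnused() {
    std::vector<PoolBlock> kept;
    for (auto& b : blocks_) {
      if (b.in_use) kept.push_back(b);
      else std::free(b.ptr);
    }
    blocks_.swap(kept);
  }

 private:
  std::vector<PoolBlock> blocks_;
};
```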

liyancas commented
Thanks. Another question: don't the buffers allocated for the feature maps (in/out) share the same memory blocks during the forward pass?

luoyetx (Owner) commented May 22, 2017

The whole idea is pretty simple: determine the lifetime of all Blobs in the network layer by layer, since the network definition itself is already in topological order. If no later layer uses a Blob as input, we can mark it reusable and share its memory. The input Blob could naturally be shared/reused as Forward goes on, but I think it's more convenient not to reuse it: then, at the next Forward with the same input size, you don't need to call Reshape, whereas reusing the input Blob would require a Reshape call to get a memory reference. One more important point is that in most cases the input and output Blobs consume little memory compared to the internal Blobs, so it hardly matters whether their memory is shared or not :)
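A minimal sketch of that lifetime analysis, assuming a simplified layer definition; the names here are illustrative, not the Mini-Caffe data structures:

```cpp
// Sketch: because the layer list is already in topological order, the last
// layer that consumes a blob marks the point after which its memory can be
// returned to the pool for reuse.
#include <map>
#include <string>
#include <vector>

struct LayerDef {
  std::vector<std::string> bottoms;  // input blob names
  std::vector<std::string> tops;     // output blob names
};

// Returns, for each blob name, the index of the last layer that reads it.
// After that layer's Forward, the blob's memory can go back to the pool
// (model parameters and network outputs would be excluded from this).
std::map<std::string, int> LastUse(const std::vector<LayerDef>& layers) {
  std::map<std::string, int> last_use;
  for (int i = 0; i < static_cast<int>(layers.size()); ++i) {
    for (const auto& name : layers[i].bottoms) last_use[name] = i;
  }
  return last_use;
}
```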

luoyetx (Owner) commented Apr 12, 2018

Static memory planning is implemented in #70.
