
Fix rpn memory leak and dataType errors. #1657

Merged
1 commit merged into master on Dec 11, 2019

Conversation

tengerye (Contributor) commented:

  1. The current version causes a memory leak, which can easily be reproduced using the repository here. The reason is that rpn._cache is kept alive with the model, and the number of tensors it holds keeps growing as new proposal sizes are encountered (see the sketch after this list);

  2. While fixing the first problem, I also found some data type errors in the development version and fixed them.
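
A minimal sketch of the leak pattern described in point 1 (hypothetical class and method names, not the actual torchvision code): the anchor cache is keyed by the grid sizes it has seen, so every new size adds an entry that is never released.

```python
import torch

class AnchorCache:
    def __init__(self):
        # key derived from the observed grid sizes -> list of anchor tensors
        self._cache = {}

    def cached_grid_anchors(self, grid_sizes, strides):
        key = str(grid_sizes) + str(strides)
        if key not in self._cache:
            # Placeholder anchors; the real module computes these per feature map.
            self._cache[key] = [torch.zeros(h * w * 3, 4) for h, w in grid_sizes]
        return self._cache[key]

cache = AnchorCache()
for size in range(10, 110):
    # Every new input size adds another cache entry that is never released,
    # so memory use grows for as long as new sizes keep appearing.
    cache.cached_grid_anchors([(size, size)], [(16, 16)])
print(len(cache._cache))  # 100 entries still held in memory
```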

codecov-io commented Dec 11, 2019

Codecov Report

Merging #1657 into master will increase coverage by 0.01%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1657      +/-   ##
==========================================
+ Coverage      66%   66.02%   +0.01%     
==========================================
  Files          92       92              
  Lines        7330     7331       +1     
  Branches     1107     1107              
==========================================
+ Hits         4838     4840       +2     
+ Misses       2176     2175       -1     
  Partials      316      316
Impacted Files Coverage Δ
torchvision/models/detection/_utils.py 46.25% <0%> (ø) ⬆️
torchvision/models/detection/rpn.py 81.57% <33.33%> (+0.52%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 333af7a...12d79ac.

fmassa (Member) left a comment:

Thanks for the PR and the fixes!

I have some comments regarding completely dropping the cache, but we can move on with this for now and optimize it again later.

@@ -163,6 +163,8 @@ def forward(self, image_list, feature_maps):
                 anchors_in_image.append(anchors_per_feature_map)
             anchors.append(anchors_in_image)
         anchors = [torch.cat(anchors_per_image) for anchors_per_image in anchors]
+        # Clear the cache in case that memory leaks.
+        self._cache.clear()
fmassa (Member) commented on this change:

I think this is a bit too drastic: the size of the cache is in general fairly small (less than 1MB per key), and we can get some speedup if we cache the most used sizes. This is effectively removing the caching altogether.

Maybe for a follow-up PR: What about using some variant of functools.lru_cache instead of manually keeping a dictionary for the cache?

This way, we could potentially keep the last 32 sizes cached.
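
A rough sketch of that idea (an assumption about how it could look, not what this PR or torchvision implements; the helper name and arguments are hypothetical, and functools.lru_cache needs hashable arguments, so grid sizes and strides would have to be passed as tuples of ints rather than lists of tensors):

```python
import functools
import torch

# Hypothetical free function standing in for the cached anchor computation.
@functools.lru_cache(maxsize=32)  # keep only the 32 most recently used sizes
def grid_anchors_for_size(grid_size, stride, anchors_per_cell=3):
    # Placeholder anchor computation; only the caching behaviour matters here.
    height, width = grid_size
    return torch.zeros(height * width * anchors_per_cell, 4)

# Repeated calls with the same size hit the cache instead of recomputing,
# and the least recently used sizes are evicted once more than 32 are seen.
a1 = grid_anchors_for_size((50, 50), (16, 16))
a2 = grid_anchors_for_size((50, 50), (16, 16))
assert a1 is a2
```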

Labels: none yet
Projects: none yet
Linked issues: none yet
3 participants