
Conversation


@ekagra-ranjan ekagra-ranjan commented Mar 12, 2019

In reference to #691, this PR adds an option for a memory-efficient implementation of the DenseNet models.

I tested the models (the original implementation, and the new implementation with both `efficient=False` and `efficient=True`) on the hymenoptera dataset, which gave the following results.

Benchmark results (Batch size = 8, image size = 224x224):


|                         | Time taken | GPU memory consumption |
|-------------------------|------------|------------------------|
| Original                | 2m 55s     | 1668 MB                |
| New (`efficient=False`) | 2m 58s     | 1667 MB                |
| New (`efficient=True`)  | 4m 6s      | 1115 MB                |

There was no significant change in the accuracy of the trained models.

The new implementation with `efficient=True` consumes roughly 1.5× less GPU memory at the cost of roughly 1.4× longer training time.
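For context, the memory saving comes from PyTorch's checkpointing utility: the BN → ReLU → 1×1-conv bottleneck is bundled into a function that is recomputed during the backward pass instead of having its intermediates stored. A minimal, simplified sketch of the idea (not the exact PR code; the layer sizes here are illustrative):

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint as cp

def _bn_function_factory(norm, relu, conv):
    # Bundle BN -> ReLU -> 1x1 conv over the concatenated inputs into a
    # single function, so it can be handed to torch.utils.checkpoint as a unit.
    def bn_function(*inputs):
        concated_features = torch.cat(inputs, 1)
        return conv(relu(norm(concated_features)))
    return bn_function

norm = nn.BatchNorm2d(8)
relu = nn.ReLU(inplace=True)
conv = nn.Conv2d(8, 4, kernel_size=1, stride=1, bias=False)
bn_function = _bn_function_factory(norm, relu, conv)

x = torch.randn(2, 8, 16, 16, requires_grad=True)

# Plain call: intermediate activations are kept for the backward pass.
out_plain = bn_function(x)
# Checkpointed call: intermediates are discarded in the forward pass and
# recomputed during backward, trading compute time for memory -- this is
# what the efficient=True path enables.
out_ckpt = cp.checkpoint(bn_function, x)
```

Both calls compute the same function; only the bookkeeping for the backward pass differs.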

cc: @soumith

@codecov-io

codecov-io commented Mar 12, 2019

Codecov Report

Merging #797 into master will decrease coverage by 8.15%.
The diff coverage is 92.3%.


@@            Coverage Diff             @@
##           master     #797      +/-   ##
==========================================
- Coverage   60.03%   51.87%   -8.16%     
==========================================
  Files          64       34      -30     
  Lines        5054     3352    -1702     
  Branches      754      534     -220     
==========================================
- Hits         3034     1739    -1295     
+ Misses       1817     1484     -333     
+ Partials      203      129      -74
| Impacted Files | Coverage Δ |
|----------------|------------|
| torchvision/models/densenet.py | 67.4% <92.3%> (-17.82%) ⬇️ |
| torchvision/models/vgg.py | 65.65% <0%> (-23.9%) ⬇️ |
| torchvision/datasets/utils.py | 35.1% <0%> (-13.32%) ⬇️ |
| torchvision/utils.py | 51.92% <0%> (-9.62%) ⬇️ |
| torchvision/datasets/folder.py | 68.23% <0%> (-8.02%) ⬇️ |
| torchvision/datasets/coco.py | 22.41% <0%> (-6.86%) ⬇️ |
| torchvision/datasets/svhn.py | 30% <0%> (-4.62%) ⬇️ |
| torchvision/datasets/semeion.py | 29.82% <0%> (-3.51%) ⬇️ |
| torchvision/datasets/cifar.py | 33.66% <0%> (-3.3%) ⬇️ |

... and 53 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 2b3a1b6...a8e7da1.

@ekagra-ranjan
Contributor Author

@soumith Is the PR fine?

@ekagra-ranjan
Contributor Author

@soumith Is there anything else that needs to be done?

Member

@soumith soumith left a comment


`efficient` as a flag name is a bit misleading. Can you rename it to `memory_efficient`?

```python
self.add_module('denselayer%d' % (i + 1), layer)

def forward(self, init_features):
    features = [init_features]
    for name, layer in self.named_children():
```
Member


Is the ordering of this correct across all Python versions?

Contributor Author


Works on Python 3.6. Will check on 2.7 as well.

Contributor Author


@soumith Works on Python 2.7 as well.

@@ -16,8 +17,17 @@

```python
def _bn_function_factory(norm, relu, conv):
    def bn_function(*inputs):
```
Member


I forgot why this concatenation is needed for checkpointing, can you remind me?

Contributor Author

@ekagra-ranjan ekagra-ranjan Mar 26, 2019


In a DenseNet block, the outputs of all previous layers are concatenated with the input before being passed through the next layer. Checkpointing saves memory by not storing these concatenated intermediate activations in the computation graph for the backward pass; instead, it recomputes them during the backward pass, which makes training slower.
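To make the concatenation pattern concrete, here is a hypothetical, simplified sketch of a dense block's forward pass (the layer shapes are illustrative, not taken from the PR). Each layer receives the concatenation of the block input and every earlier layer's output, so these concatenated intermediates grow with depth; they are exactly what checkpointing avoids storing:

```python
import torch
import torch.nn as nn

def dense_block_forward(init_features, layers):
    # Each layer sees the concatenation (along the channel dimension) of the
    # block input and all earlier layer outputs.
    features = [init_features]
    for layer in layers:
        new_features = layer(torch.cat(features, 1))
        features.append(new_features)
    return torch.cat(features, 1)

growth_rate = 2
# Layer i consumes 4 + i * growth_rate channels and emits growth_rate channels.
layers = [nn.Conv2d(4 + i * growth_rate, growth_rate, kernel_size=1)
          for i in range(3)]
x = torch.randn(1, 4, 8, 8)
out = dense_block_forward(x, layers)
# Output channels: 4 input channels + 3 layers * growth rate 2 = 10.
```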

Member

@fmassa fmassa left a comment


From a quick look, it is not obvious to me that both implementations represent the same model.

Can you add a test that compares the output of the model using both memory_efficient=False and memory_efficient=True?

@ekagra-ranjan
Contributor Author

Okay, I will do it.

@soumith
Member

soumith commented May 5, 2019

I re-reviewed it today as well. After the test above, which makes sure the same function is computed, this PR is good to go.

@ekagra-ranjan
Contributor Author

ekagra-ranjan commented May 31, 2019

I have added the test, but there are conflicts. @fmassa Can you please help me resolve them?

@fmassa
Member

fmassa commented Jun 7, 2019

@ekagra-ranjan do you want me to resolve the conflicts?

@ekagra-ranjan
Contributor Author

Yes @fmassa, that would be very helpful.

@fmassa fmassa mentioned this pull request Jun 7, 2019
@fmassa
Member

fmassa commented Jun 7, 2019

I've sent a new PR in #1003

All the history of changes that you have made has been kept.
Thanks a lot for the awesome work @ekagra-ranjan !
