GPU efficient Densenets #797
Conversation
update before transform
update 9/03/19
11/03/19
12/03/10 10:34pm
Codecov Report

@@            Coverage Diff             @@
##           master     #797      +/-   ##
==========================================
- Coverage   60.03%   51.87%    -8.16%
==========================================
  Files          64       34       -30
  Lines        5054     3352     -1702
  Branches      754      534      -220
==========================================
- Hits         3034     1739     -1295
+ Misses       1817     1484      -333
+ Partials      203      129       -74
@soumith Is the PR fine?
@soumith Is there anything else that needs to be done?
`efficient` as a flag is a bit misleading. Can you rename it to `memory_efficient`?
            self.add_module('denselayer%d' % (i + 1), layer)

    def forward(self, init_features):
        features = [init_features]
        for name, layer in self.named_children():
Is the ordering of this correct across all Python versions?
Worked on Python 3.6. Will check on 2.7 as well.
@soumith Works on Python 2.7.
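For reference, `nn.Module` stores submodules in an ordered dict, so `named_children()` yields them in registration order regardless of Python version. A minimal sketch of the naming scheme used in this dense block (`nn.Identity` is just a stand-in layer for illustration):

```python
import torch.nn as nn

block = nn.Module()
for i in range(3):
    # same naming pattern as the dense block in this PR
    block.add_module('denselayer%d' % (i + 1), nn.Identity())

# submodules are kept in an ordered dict, so iteration order
# matches the order in which they were registered
names = [name for name, _ in block.named_children()]
print(names)  # ['denselayer1', 'denselayer2', 'denselayer3']
```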
@@ -16,8 +17,17 @@
}


def _bn_function_factory(norm, relu, conv):
    def bn_function(*inputs):
I forgot why this concatenation is needed for checkpointing, can you remind me?
In a DenseNet block, the outputs of all previous layers are concatenated with the current input before being passed through a layer. Checkpointing saves memory by not storing these intermediate activations in the computation graph for the backward pass; instead, it recomputes them during the backward pass, which makes training slower.
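A rough sketch of how the checkpointed concatenation works. The factory mirrors the `_bn_function_factory` shown in the diff; the channel sizes and tensor shapes here are made up purely for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

def _bn_function_factory(norm, relu, conv):
    # concatenate all prior feature maps, then BN -> ReLU -> 1x1 conv
    def bn_function(*inputs):
        concated_features = torch.cat(inputs, 1)
        return conv(relu(norm(concated_features)))
    return bn_function

norm = nn.BatchNorm2d(8)
relu = nn.ReLU(inplace=True)
conv = nn.Conv2d(8, 4, kernel_size=1, bias=False)
bn_function = _bn_function_factory(norm, relu, conv)

# two "previous outputs" with 3 + 5 = 8 channels in total
x1 = torch.randn(2, 3, 8, 8, requires_grad=True)
x2 = torch.randn(2, 5, 8, 8, requires_grad=True)

# checkpoint() does not store the activations computed inside
# bn_function; it re-runs the function during the backward pass
out = checkpoint(bn_function, x1, x2)
out.sum().backward()
```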
From a quick look, it is non-trivial to me that both represent the same model. Can you add a test that compares the output of the model using both `memory_efficient=False` and `memory_efficient=True`?
Okay, I will do it.
I re-reviewed it today as well. After the test above, which makes sure the same function is computed, this PR is good to go.
I have added the test but there are conflicts. @fmassa Can you please help me resolve them?
@ekagra-ranjan do you want me to resolve the conflicts?
Yes @fmassa, that would be very helpful.
I've sent a new PR in #1003. All the history of the changes you have made has been kept.
In reference to #691, this PR provides the option for a memory-efficient implementation of the DenseNet models.
I tested the models (the original implementation, and the new implementation with `efficient=False` as well as with `efficient=True`) on the hymenoptera dataset, which gave the following results.

Benchmark results (batch size = 8, image size = 224x224):
There was no significant change in the accuracy of the trained models, i.e. the new implementation does not affect accuracy.
The new implementation with `efficient=True` seems to consume ~1.5x less GPU memory at the cost of ~1.4x more compute time.

cc: @soumith