Feat/unfreeze layers fpn backbone #2160
Conversation
If a pretrained backbone is not used and one intends to train the entire network from scratch, no layers should be frozen.
Depending on the size of the dataset, one might want to control the number of tunable parameters in the backbone and treat it as a hyperparameter to optimize for the dataset. It would be nice to have this function support this.
Handle backbone freezing in the fasterrcnn_resnet50_fpn function rather than in the resnet_fpn_backbone function that it uses to get the backbone.
Layer freezing code has been moved to the fasterrcnn_resnet50_fpn function, which consumes the resnet_fpn_backbone function.
Hi,
Thanks for the PR!
In its current state, it changes the behavior for Mask R-CNN and Keypoint R-CNN (which also build their backbones through resnet_fpn_backbone), so it can't be merged as is.
I propose that we keep the freezing logic inside resnet_fpn_backbone, but expose another argument to allow users to change it.
Let me know what you think
Moved layer freezing logic to resnet_fpn_backbone with an additional parameter.
Layer freezing logic has been moved to resnet_fpn_backbone. This function only ensures that all layers are made trainable if pretrained models are not used.
@fmassa oh, understood. I have made the requested changes and moved the freezing logic back into the resnet_fpn_backbone function. The current behavior is: layers are frozen only when a pretrained backbone or a pretrained model is used; otherwise all layers remain trainable.
Please let me know if you require any other changes.
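For reference, a minimal sketch of what the updated resnet_fpn_backbone could look like, assuming the new argument is named trainable_layers as in the diff below; everything outside the freezing block is paraphrased from the surrounding torchvision code, not from this PR:

from torchvision.models import resnet
from torchvision.ops import misc as misc_nn_ops

def resnet_fpn_backbone(backbone_name, pretrained, trainable_layers=3):
    backbone = resnet.__dict__[backbone_name](
        pretrained=pretrained, norm_layer=misc_nn_ops.FrozenBatchNorm2d)
    # select layers that won't be frozen; a caller training from scratch
    # is expected to pass trainable_layers=5 so that nothing is frozen
    assert 0 <= trainable_layers <= 5
    layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    for name, parameter in backbone.named_parameters():
        if all([layer not in name for layer in layers_to_train]):
            parameter.requires_grad_(False)
    # the real function goes on to wrap the trunk in an FPN; elided here
    return backbone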
Thanks for the improvements and the documentation!
Can you add a test for resnet_fpn_backbone in test/test_detection_utils.py checking that the parameters are correctly set to require / not require gradient?
I think there could be an issue with the conv1 handling. It might just work as is, but it looks a bit fragile to me.
""" | ||
# select layers that wont be frozen | ||
assert trainable_layers<=5 and trainable_layers >=0 | ||
layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers] |
conv1 is part of every Bottleneck and BasicBlock, see vision/torchvision/models/resnet.py, line 91 in bd27e94:

self.conv1 = conv1x1(inplanes, width)

so I'm not sure that the conv1 part in here has the expected behavior (it might, but that could be by accident).
Sure, I will add test cases and update the PR.
Regarding the conv1 part, I understand that it looks hacky and is not intuitive, but it works in this case because of the way the freezing is done.
Because of the order of strings in layers_to_train, 'conv1' is only used when trainable_layers is set to 5 and the entire network should be unfrozen. So even if 'conv1' finds matches in other blocks, it's okay.
The fully qualified layer names that resnet_fpn_backbone observes for resnet50 are:
conv1.weight
layer1.0.conv1.weight
layer1.0.conv2.weight
layer1.0.conv3.weight
layer1.0.downsample.0.weight
layer1.1.conv1.weight
layer1.1.conv2.weight
layer1.1.conv3.weight
layer1.2.conv1.weight
layer1.2.conv2.weight
layer1.2.conv3.weight
layer2.0.conv1.weight
layer2.0.conv2.weight
layer2.0.conv3.weight
layer2.0.downsample.0.weight
layer2.1.conv1.weight
layer2.1.conv2.weight
layer2.1.conv3.weight
layer2.2.conv1.weight
layer2.2.conv2.weight
layer2.2.conv3.weight
layer2.3.conv1.weight
layer2.3.conv2.weight
layer2.3.conv3.weight
layer3.0.conv1.weight
layer3.0.conv2.weight
layer3.0.conv3.weight
layer3.0.downsample.0.weight
layer3.1.conv1.weight
layer3.1.conv2.weight
layer3.1.conv3.weight
layer3.2.conv1.weight
layer3.2.conv2.weight
layer3.2.conv3.weight
layer3.3.conv1.weight
layer3.3.conv2.weight
layer3.3.conv3.weight
layer3.4.conv1.weight
layer3.4.conv2.weight
layer3.4.conv3.weight
layer3.5.conv1.weight
layer3.5.conv2.weight
layer3.5.conv3.weight
layer4.0.conv1.weight
layer4.0.conv2.weight
layer4.0.conv3.weight
layer4.0.downsample.0.weight
layer4.1.conv1.weight
layer4.1.conv2.weight
layer4.1.conv3.weight
layer4.2.conv1.weight
layer4.2.conv2.weight
layer4.2.conv3.weight
fc.weight
fc.bias
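A small self-contained check of this claim, using a few of the names above (a sketch, not the PR's test code):

names = ['conv1.weight', 'layer1.0.conv1.weight', 'layer4.2.conv3.weight']

def frozen(name, trainable_layers):
    layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    return all([layer not in name for layer in layers_to_train])

print([frozen(n, 3) for n in names])  # [True, True, False]
print([frozen(n, 5) for n in names])  # [False, False, False]: nothing is frozen,
                                      # so the extra 'conv1' substring matches are harmless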
One thing I can do to make this more intuitive is to replace:

if all([layer not in name for layer in layers_to_train]):

with

if all([not name.startswith(layer) for layer in layers_to_train]):

This will make sure that only clean matches are found. What are your thoughts on this?
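For example, with one of the nested names from the list above:

name = 'layer1.0.conv1.weight'
print('conv1' in name)           # True:  substring match hits the inner block's conv1
print(name.startswith('conv1'))  # False: only the top-level conv1 would match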
Here is a notebook which tests the desired functionality.
I will add the test cases based on it into test_models_detection_utils.py
One thing I can do to make this more intuitive is to replace:
if all([layer not in name for layer in layers_to_train]):
with
if all([not name.startswith(layer) for layer in layers_to_train]):
this will make sure that only clean matches are found. What are your thoughts on this

Sure, looks good to me!
Test case has been added.
This PR adds functionality to specify the number of trainable layers when initializing Faster R-CNN via the fasterrcnn_resnet50_fpn function. This commit adds a test case for this functionality.
More information in the PR.
Test cases have been added. Let me know if any other change is required.
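A minimal version of such a test might look like the sketch below; it assumes the returned module exposes the resnet trunk as body and that the new argument is named trainable_layers, as discussed above:

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

def test_resnet_fpn_backbone_frozen_layers():
    # with the default trainable_layers=3, only layer2, layer3 and layer4
    # should keep requires_grad=True; conv1 and layer1 should be frozen
    backbone = resnet_fpn_backbone('resnet50', pretrained=False, trainable_layers=3)
    for name, parameter in backbone.body.named_parameters():
        should_train = any(name.startswith(layer) for layer in ('layer2', 'layer3', 'layer4'))
        assert parameter.requires_grad == should_train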
Thanks a lot!
For a follow-up PR, can you add the same flags to Mask R-CNN and Keypoint R-CNN?
@fmassa absolutely. Will do that in a couple of days.
Currently, the fasterrcnn_resnet50_fpn function is used to create a Faster R-CNN with a resnet50 backbone and an FPN. The resnet_fpn_backbone function in backbone utils is used by this function. It freezes the backbone layers in resnet apart from layer2, layer3 and layer4. This freezing is hard-coded to mirror the Faster R-CNN paper, which froze the initial layers of the pretrained backbone.
If a pretrained backbone is not used and one intends to train the entire network from scratch, no layers should be frozen; otherwise the initial layers will always keep their randomly initialized weights. I think this can be considered a bug, because layer freezing is not even mentioned in the function docs, so users are not aware of it.
This resulted in poor AP when we were training Faster R-CNN with a resnet backbone from scratch on the Detrac dataset, and it took a while to figure out.
Furthermore, the number of resnet backbone layers that should be frozen is an important hyperparameter in my experience, and it needs to be tuned for each dataset. So adding an argument to control this enabled me to integrate it into hyperparameter tuning.
I have created this pull request so that others don't run into the same problem when conducting experiments similar to mine.
I have moved the layer freezing logic to fasterrcnn_resnet50_fpn so that layers are frozen if either a pretrained backbone or a pretrained Faster R-CNN is used, and are not frozen otherwise.
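As a usage sketch, and assuming the detection constructor exposes the knob under a name like trainable_backbone_layers (an assumption, not confirmed in this thread), training from scratch would look like:

import torchvision

# hypothetical usage: the argument names below are assumptions based on this thread
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False,
    pretrained_backbone=False,    # training from scratch
    trainable_backbone_layers=5,  # 0 freezes the whole resnet; 5 unfreezes it fully
)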