
Feat/unfreeze layers fpn backbone #2160

Merged
merged 17 commits into pytorch:master from muaz-urwa:feat/unfreeze_layers_fpn_backbone on May 15, 2020

Conversation

@muaz-urwa (Contributor) commented Apr 29, 2020

Currently, the fasterrcnn_resnet50_fpn function is used to create a Faster R-CNN with a ResNet-50 backbone and FPN. It uses the resnet_fpn_backbone function from backbone utils, which freezes the backbone layers of the ResNet apart from layer2, layer3 and layer4. This freezing is hard coded to reflect the Faster R-CNN paper, which froze the initial layers of the pretrained backbone.
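
For reference, the hard-coded freezing inside resnet_fpn_backbone amounts to roughly the following (a sketch of the current behavior, not the verbatim source):

import torchvision

# Sketch of the existing logic: every backbone parameter whose name does not
# contain 'layer2', 'layer3' or 'layer4' is frozen, regardless of whether the
# weights are pretrained or randomly initialized.
backbone = torchvision.models.resnet50(pretrained=False)
for name, parameter in backbone.named_parameters():
    if 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
        parameter.requires_grad_(False)  # conv1, bn1 and layer1 always end up frozen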

If a pretrained backbone is not used and one intends to train the entire network from scratch, no layers should be frozen; otherwise the initial layers will always keep their randomly initialized weights. I think this can be considered a bug, because the layer freezing is not even mentioned in the function docs, so users are not aware of it.

This resulted in poor AP when we were training Faster R-CNN with a ResNet backbone from scratch on the Detrac dataset, and it took a while to figure out.

Furthermore, the number of frozen ResNet backbone layers is an important hyperparameter in my experience, and it needs to be tuned for each dataset. Adding an argument to control it enabled me to integrate it into hyperparameter tuning.

I have created this pull request so that others don't run into the same problem when conducting experiments similar to mine.

I have moved the layer freezing logic into fasterrcnn_resnet50_fpn so that layers are frozen if either a pretrained backbone or a pretrained Faster R-CNN is used, and are not frozen otherwise.

In summary:

  • If a pretrained backbone is not used and one intends to train the entire network from scratch, no layers should be frozen.
  • Depending on the size of the dataset, one might want to control the number of tunable parameters in the backbone and tune this as a hyperparameter for the dataset. It would be nice for this function to support that.
  • Backbone freezing is now handled in the fasterrcnn_resnet50_fpn function rather than in the resnet_fpn_backbone function it uses to get the backbone; the layer freezing code has been moved into fasterrcnn_resnet50_fpn, which consumes resnet_fpn_backbone.
@fmassa (Member) left a comment

Hi,

Thanks for the PR!

In its current state, it changes the behavior for Mask R-CNN and Keypoint R-CNN, so it can't be merged as is.

I propose that we keep the freezing logic inside resnet_fpn_backbone, but we expose another argument to allow users to change it.

Let me know what you think

Moved layer freezing logic to resnet_fpn_backbone with an additional parameter.
Layer freezing logic has been moved to resnet_fpn_backbone. This function only ensures that all layers are made trainable if pretrained models are not used.
@muaz-urwa requested a review from fmassa April 30, 2020 21:02
@muaz-urwa (Contributor, Author) commented:

@fmassa oh, understood.

I have made the requested changes and moved freezing logic back into the resnet_fpn_backbone function.

The current behavior is:

  • Both the resnet_fpn_backbone and fasterrcnn_resnet50_fpn functions take an additional tunable-layers parameter.
  • Default values ensure that existing functionality is not broken if this argument is not set.
  • If neither a pretrained model nor a pretrained backbone is used, no layers are frozen.
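
For illustration, a usage sketch of this behavior; the argument names trainable_backbone_layers and trainable_layers below are assumptions based on this discussion, not confirmed API:

from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Train from scratch: nothing is pretrained, so no layers should end up frozen.
model = fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False,
                                num_classes=5)

# Fine-tune from a pretrained backbone, keeping only the last two ResNet
# stages (layer3 and layer4) trainable; argument name assumed here.
model = fasterrcnn_resnet50_fpn(pretrained_backbone=True, num_classes=5,
                                trainable_backbone_layers=2)

# The same control at the backbone level; argument name assumed here.
backbone = resnet_fpn_backbone('resnet50', pretrained=True, trainable_layers=2)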

Please let me know if you require any other changes.

@fmassa (Member) left a comment

Thanks for the improvements and the documentation!

Can you add a test for resnet_fpn_backbone in test/test_detection_utils.py checking that the parameters are correctly set to require / not require gradient?
I think there could be an issue with the conv1 handling; it might just work as is, but it looks a bit fragile to me.

"""
# select layers that wont be frozen
assert trainable_layers<=5 and trainable_layers >=0
layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
@fmassa (Member) commented:

conv1 is part of every Bottleneck and BasicBlock, see

self.conv1 = conv1x1(inplanes, width)

so I'm not sure that the conv1 part here has the expected behavior (it might, but possibly by accident).

@muaz-urwa (Contributor, Author) commented:

Sure, I will add test cases and update the PR.

Regarding the conv1 part, I understand that it looks hacky and is not intuitive, but it works in this case because of the way the freezing is done.

Because of the order of strings in layers_to_train, 'conv1' is only used when trainable_layers is set to 5 and the entire network should be unfrozen. So even if 'conv1' matches names in other blocks, it's okay (see the short check after the list below).

The fully qualified parameter names that resnet_fpn_backbone observes for ResNet-50 are:

conv1.weight
layer1.0.conv1.weight
layer1.0.conv2.weight
layer1.0.conv3.weight
layer1.0.downsample.0.weight
layer1.1.conv1.weight
layer1.1.conv2.weight
layer1.1.conv3.weight
layer1.2.conv1.weight
layer1.2.conv2.weight
layer1.2.conv3.weight
layer2.0.conv1.weight
layer2.0.conv2.weight
layer2.0.conv3.weight
layer2.0.downsample.0.weight
layer2.1.conv1.weight
layer2.1.conv2.weight
layer2.1.conv3.weight
layer2.2.conv1.weight
layer2.2.conv2.weight
layer2.2.conv3.weight
layer2.3.conv1.weight
layer2.3.conv2.weight
layer2.3.conv3.weight
layer3.0.conv1.weight
layer3.0.conv2.weight
layer3.0.conv3.weight
layer3.0.downsample.0.weight
layer3.1.conv1.weight
layer3.1.conv2.weight
layer3.1.conv3.weight
layer3.2.conv1.weight
layer3.2.conv2.weight
layer3.2.conv3.weight
layer3.3.conv1.weight
layer3.3.conv2.weight
layer3.3.conv3.weight
layer3.4.conv1.weight
layer3.4.conv2.weight
layer3.4.conv3.weight
layer3.5.conv1.weight
layer3.5.conv2.weight
layer3.5.conv3.weight
layer4.0.conv1.weight
layer4.0.conv2.weight
layer4.0.conv3.weight
layer4.0.downsample.0.weight
layer4.1.conv1.weight
layer4.1.conv2.weight
layer4.1.conv3.weight
layer4.2.conv1.weight
layer4.2.conv2.weight
layer4.2.conv3.weight
fc.weight
fc.bias
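
A quick check (a hypothetical snippet, not part of the PR) shows why the accidental substring matches are harmless: 'conv1' does match parameters inside the residual blocks, but it only enters layers_to_train when trainable_layers is 5, and at that point every layer is trainable anyway.

# Substring matching: 'conv1' also hits the conv1 inside every residual block.
names = ['conv1.weight', 'layer1.0.conv1.weight', 'layer4.2.conv3.weight']
print([n for n in names if 'conv1' in n])            # ['conv1.weight', 'layer1.0.conv1.weight']

# Prefix matching catches only the stem convolution.
print([n for n in names if n.startswith('conv1')])   # ['conv1.weight']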

@muaz-urwa (Contributor, Author) commented:

One thing I can do to make this more intuitive is to replace:

if all([layer not in name for layer in layers_to_train]):

with

if all([not name.startswith(layer) for layer in layers_to_train]):

This will make sure that only clean matches are found. What are your thoughts on this?

@muaz-urwa (Contributor, Author) commented:

Here is a notebook which tests the desired functionality.

I will add the test cases based on it into test_models_detection_utils.py
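
A minimal sketch of such a test, assuming resnet_fpn_backbone exposes a trainable_layers argument as in the diff above (the helper below is illustrative, not the exact test that was added):

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

def check_trainable_layers(trainable_layers):
    # Build the backbone and verify that exactly the last `trainable_layers`
    # ResNet stages (plus conv1 when trainable_layers == 5) require gradients.
    backbone = resnet_fpn_backbone('resnet50', pretrained=False,
                                   trainable_layers=trainable_layers)
    expected = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    for name, parameter in backbone.body.named_parameters():
        should_train = any(name.startswith(layer) for layer in expected)
        assert parameter.requires_grad == should_train, name

for n in range(6):
    check_trainable_layers(n)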

@fmassa (Member) commented:

One thing I can do to make this more intuitive is to replace:

if all([layer not in name for layer in layers_to_train]):

with

if all([not name.startswith(layer) for layer in layers_to_train]):

this will make sure that only clean matches are found. What are your thoughts on this?

Sure, looks good to me!
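
With that change agreed, the freezing logic under discussion would look roughly like this (a sketch combining the snippets above; in the PR it lives inline in resnet_fpn_backbone rather than in a helper of this name):

def freeze_backbone_layers(backbone, trainable_layers):
    # Keep the last `trainable_layers` ResNet stages trainable (counting from
    # layer4 down to conv1) and freeze everything else.
    assert 0 <= trainable_layers <= 5
    layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    for name, parameter in backbone.named_parameters():
        # startswith avoids accidental substring matches such as 'conv1'
        # matching 'layer1.0.conv1.weight'.
        if all(not name.startswith(layer) for layer in layers_to_train):
            parameter.requires_grad_(False)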

@muaz-urwa (Contributor, Author) commented:

Test case has been added.

This PR adds functionality to specify the number of trainable layers when initializing Faster R-CNN via the fasterrcnn_resnet50_fpn function. This commit adds a test case for that functionality.
@muaz-urwa requested a review from fmassa May 6, 2020 12:56
@muaz-urwa (Contributor, Author) commented:

Test cases have been added.

Let me know if any other change is required.

@fmassa (Member) left a comment

Thanks a lot!

@fmassa merged commit 348dd5a into pytorch:master May 15, 2020
@fmassa (Member) commented May 15, 2020

For a follow-up PR, can you add the same flags to Mask R-CNN and Keypoint R-CNN?

@muaz-urwa deleted the feat/unfreeze_layers_fpn_backbone branch May 15, 2020 13:07
@muaz-urwa (Contributor, Author) commented:

@fmassa absolutely.

Will do that in a couple of days.
