Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Feature: add DropBlock layer #5416

Merged
merged 45 commits into from
Feb 27, 2022
Merged

New Feature: add DropBlock layer #5416

merged 45 commits into from
Feb 27, 2022

Conversation

xiaohu2015
Copy link
Contributor

@xiaohu2015 xiaohu2015 commented Feb 13, 2022

@facebook-github-bot
Copy link

facebook-github-bot commented Feb 13, 2022

💊 CI failures summary and remediations

As of commit 2af4162 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@xiaohu2015 xiaohu2015 marked this pull request as draft February 13, 2022 13:08
@datumbox datumbox added this to In progress in Batteries Included - Phase 2 via automation Feb 13, 2022
@xiaohu2015 xiaohu2015 marked this pull request as ready for review February 19, 2022 15:43
Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaohu2015 Awesome work as always. Thanks a lot!

The implementation looks very nice overall. I've added a couple of comments concerning nits and boilerplate code needed for ops in TorchVision. My only real question is on the way you do the sampling on bernoulli, see below.

Finally, would you be interested in implementing the 3d version on the same PR? We want to start offering better video support and this contribution will help us offer better models. If you prefer, we can also do it on a follow up.

torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
torchvision/ops/drop_block.py Show resolved Hide resolved
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
torchvision/ops/dropblock.py Outdated Show resolved Hide resolved
xiaohu2015 and others added 2 commits February 21, 2022 18:56
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
torchvision/ops/drop_block.py Outdated Show resolved Hide resolved
Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaohu2015 The changes look good to me. I've pushed to your branch to fix one remaining linter issue. Let's see if the CI passes now.

test/test_ops.py Show resolved Hide resolved
test/test_ops.py Show resolved Hide resolved
@datumbox
Copy link
Contributor

@xiaohu2015 All tests pass. 🚀 🚀

On the positive side, your implementation seems correct to me, fully aligns with the paper. Though some of the implementation details differ from the references you provided, after doing a deep dive I believe that your approach is equivalent. Some limited testing also shows that the implementation works as intended.

On the negative side, the behaviour of the method is not fully tested (see points at #5416 (comment)). Moreover, we haven't trained a model to show-case that using the layer leads to the expected increase.

What is your bandwidth at this point? Do you have the capacity to do the above 2 checks prior merging? Do you think it's something you would do on a follow up PR after merge? It's a bit hard to provide support on this one due to limited bandwidth on our side because of the upcoming release and other work on the pipeline. Another approach could be to ping a couple of other regular contributors to see if they can help us test it. Let me know your preference/thoughts. Thanks!

@xiaohu2015
Copy link
Contributor Author

@datumbox OK. I will do the two checks before merging. I plan to use resnet50 to test.

Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @xiaohu2015.

I've added one last optional test improvement for your consideration below. The testing strategy looks good to me. Let me know when the training checks is completed to merge.

Thanks a lot as always for the awesome contribution!

test/test_ops.py Outdated Show resolved Hide resolved
@xiaohu2015
Copy link
Contributor Author

LGTM, @xiaohu2015.

I've added one last optional test improvement for your consideration below. The testing strategy looks good to me. Let me know when the training checks is completed to merge.

Thanks a lot as always for the awesome contribution!

I just run 90-epoch Resnet50, maybe I can get result next day.

@datumbox
Copy link
Contributor

datumbox commented Feb 23, 2022

@xiaohu2015 Awesome. The paper uses 90 epochs for the baseline and 270 epochs for those with Dropblock (see page 4, section 4.1 on paper). If you don't have enough compute to run this, ping me on slack and we can work something out.

FYI the last update broke the linter.

@datumbox
Copy link
Contributor

datumbox commented Feb 25, 2022

Here is my branch for testing the ResNet50 with DropBlock: #5483

I've hardcoded changes on the LR scheduler and the ResNet. We are not going to use scheduled DropBlock:

$ python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 -b 128 --lr 0.4 --epochs 270

Submitted job_id: 25962

I'll let you know about the results once I got them.

@datumbox
Copy link
Contributor

@xiaohu2015 I got good news. The addition of your DropBlock seems to improve the accuracy.

Here is the baseline ResNet50 accuracy for a 90-epoch training without DropBlock:
Test: Acc@1 76.130 Acc@5 92.862

Here is the one if we add DropBlock on Groups 3&4 wo/ scheduling (implementation details here):
Test: Acc@1 77.064 Acc@5 93.436

Both the baseline and DropBlock values are different from the ones reported on the paper (baseline 76.51 and DropBlock ~77.5), but I think that's the result of other minor differences that we didn't replicate (hyper-params, small architectural improvements etc). I think we have sufficient evidence that your implementation work as expected and we can merge the PR. Thanks so much for the awesome contribution and work!

@datumbox datumbox merged commit 5568744 into pytorch:main Feb 27, 2022
Batteries Included - Phase 2 automation moved this from In progress to Done Feb 27, 2022
facebook-github-bot pushed a commit that referenced this pull request Mar 3, 2022
Summary:
* Create dropblock.py

* add dropblock2d

* fix pylint

* refactor dropblock

* add dropblock

* Rename dropblock.py to drop_block.py

* fix pylint

* add dropblock

* add dropblock3d

* add drop_block3d

* add dropblock

* Update drop_block.py

* Update torchvision/ops/drop_block.py

* Update torchvision/ops/drop_block.py

* Update torchvision/ops/drop_block.py

* Update torchvision/ops/drop_block.py

* Update drop_block.py

* Update drop_block.py

* import torch.fx

* fix lint

* fix lint

* Update drop_block.py

* improve dropblock

* add dropblock

* refactor dropblock

* fix doc

* remove the limitation of block_size

* Update torchvision/ops/drop_block.py

* fix lint

* fix lint

* add dropblock

* Fix linter

* add dropblock random check

* reduce test time

* Update test_ops.py

* speed the dropblock test

* fix lint

Reviewed By: datumbox

Differential Revision: D34579502

fbshipit-source-id: 635df25cfc1dfc4433f38e673e2ee8fd2af96409

Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants