bugfix aspect ratio sampling in transforms.RandomErasing #3344
Conversation
Hi @wbaek! Thank you for your pull request and welcome to our community. Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
@wbaek Thanks for the PR and welcome to our community. I am not sure that working on a log scale has any benefit. Also note that changing this would be a breaking change, because sampling uniformly on a log scale is not the same as sampling uniformly on the original scale. Finally, please note that our implementation is aligned with the official implementation from the paper. Based on the above, we are not going to be able to merge this PR. I will close it to avoid accidental merges, but if you have any concerns please let me know.
I know the aspect ratio is height / width. I already checked the official implementation, but it has the same problem.
@wbaek I understand that your intention is to sample both smaller and larger ratios "equally". In effect, you would like to sample from a distribution that has its middle point around 1 and covers equal areas on its left and on its right; applying the logarithm transform before sampling uniformly gives you that effect. The challenge here is that the original paper does not try to address this, and its official implementation simply samples uniformly. This is why I don't necessarily agree that it's a bug rather than a property of the proposed technique. Changing the implementation would mean introducing a breaking change to the library. @fmassa Any thoughts on this? Perhaps we could start by showing that this bias has a detrimental effect on accuracy. If that's the case, it might be worth fixing?
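(A minimal sketch of the bias under discussion, assuming torchvision's default RandomErasing ratio range of (0.3, 3.3); the variable names are illustrative:)

```python
import torch

ratio = (0.3, 3.3)
n = 1_000_000

# Linear-scale sampling: uniform over (0.3, 3.3).
linear = torch.empty(n).uniform_(ratio[0], ratio[1])

# Log-scale sampling: uniform in log space, then exponentiate,
# so ratios r and 1/r are equally likely.
log_ratio = torch.log(torch.tensor(ratio))
log_scale = torch.exp(torch.empty(n).uniform_(log_ratio[0], log_ratio[1]))

print((linear > 1).float().mean())     # ~0.77: biased toward tall/wide > 1 boxes
print((log_scale > 1).float().mean())  # ~0.50: symmetric around an aspect ratio of 1
```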
I think this bias is not desired, and we had a similar problem in the past that we fixed; see #618 (comment) and #799 for context. So from this perspective I would be inclined to fix this in torchvision (but I agree having more data would be nice). For reference, maybe @zhunzhong07 (one of the RandomErasing authors, who sent the original PR adding random erasing to torchvision in #909) or @rwightman have some experiments with both versions and could shed some light here?
@fmassa @datumbox I fixed mine when I noticed the improvements to torchvision's random resize. I did not do extensive ablations, but I have produced a lot of unmatched model weights with it in the time since. The 'paper didn't do it' / 'it will break compat' line of reasoning shouldn't be a default fallback without further consideration. It will hurt the progress of torch/torchvision in the long run, and I see it more and more. The addition of Gaussian noise erasing in Zhun's original PR for torchvision wasn't in the original paper / impl either, but came from an enhancement I proposed and an ablation study I did.
@rwightman Thanks for the input. I understand that this was adopted in your repo without extensive comparisons, but that since then you have not found any indication that it hurts model performance. As for "falling back without consideration", I hope this discussion proves this is not the case. We do need to be careful and weigh things when breaking changes are introduced, but if there is enough indication that a change is beneficial we are flexible. I'm currently in the process of training a couple of additional models on our side. I'll take this patch for a test and get back to you early next week.
@wbaek Thanks for pointing out this bias. Indeed, I set the aspect ratio with the original function of random resize. As pointed out by @fmassa and @rwightman, this issue was addressed in the new version of random resize (#799) and is fixed by @rwightman in his new version of pytorch-imagenet. Therefore, I think this issue should also be addressed in random erasing. I will run experiments on CIFAR-10 and CIFAR-100 to see if this change hurts model performance. @datumbox Thanks for your efforts, and I would like to hear the feedback from your own experiments.
I have run experiments on CIFAR-10, CIFAR-100, and Fashion-MNIST with ResNet-20. For each dataset, I ran 5 runs with different seeds (0-4); the new and original implementations use the same seeds. CIFAR-10: original 93.41±0.11%, new 93.24±0.10%. We can see that (1) the original and new implementations achieve almost the same results on CIFAR-100 and Fashion-MNIST, but (2) the new implementation achieves slightly lower results on CIFAR-10. I am not sure whether this modification will hurt performance on other datasets, especially ImageNet. Looking forward to your experiments @datumbox @rwightman @wbaek.
@zhunzhong07 Thanks a lot for taking the time to look at this and for providing the statistics of your runs. Based on your results, this looks like it's going to be a "no-harm" change rather than one that improves accuracy. I did a brief check on my side and unfortunately the variation in accuracy caused by random initialization of the weights with different seeds is quite large, so we would need many more than 5 runs on ImageNet to measure the difference. This can be quite expensive and probably not worth it in this case...
@wbaek Given the above, my intention is to complete a few quick baseline runs on my side and merge if no issues are spotted. I left one comment on your implementation below. It's non-blocking, but it would be good to fix before merging. Let me know what you think.
Summary:
* Sample the aspect ratio on a log scale. Reference: https://github.com/pytorch/vision/blob/8317295c1d272e0ba7b2ce31e3fd2c048235fc73/torchvision/transforms/transforms.py#L833-L836
* Add random erasing unit test code
* Increase the threshold for the difference in sampling rate
* Move log_ratio outside of the loop (in RandomResizedCrop as well)

Reviewed By: fmassa
Differential Revision: D26341420
fbshipit-source-id: a9e3ad83aa64918bd2c1bff1e4b64a00de615e48
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
The referenced lines (torchvision/transforms/transforms.py, lines 833 to 836 at commit 8317295) show the log-scale aspect-ratio sampling used by RandomResizedCrop.
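For illustration, a rough sketch of the log-scale sampling described in the summary (the function name sample_erase_box and the fallback behaviour are illustrative rather than the exact merged code; the defaults mirror RandomErasing's scale=(0.02, 0.33) and ratio=(0.3, 3.3)):

```python
import math
import torch


def sample_erase_box(img_h, img_w, scale=(0.02, 0.33), ratio=(0.3, 3.3)):
    """Sample an erasing box whose aspect ratio (h / w) is drawn on a log
    scale, mirroring RandomResizedCrop's aspect-ratio sampling."""
    area = img_h * img_w
    # Computed once, outside the rejection loop ("move outside of the loop").
    log_ratio = torch.log(torch.tensor(ratio))

    for _ in range(10):
        erase_area = area * torch.empty(1).uniform_(scale[0], scale[1]).item()
        aspect_ratio = torch.exp(
            torch.empty(1).uniform_(log_ratio[0], log_ratio[1])
        ).item()

        h = int(round(math.sqrt(erase_area * aspect_ratio)))
        w = int(round(math.sqrt(erase_area / aspect_ratio)))
        if h < img_h and w < img_w:
            i = torch.randint(0, img_h - h + 1, size=(1,)).item()
            j = torch.randint(0, img_w - w + 1, size=(1,)).item()
            return i, j, h, w

    # Fallback: erase nothing if no valid box was found within 10 attempts.
    return 0, 0, 0, 0
```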