Adding RandAugment implementation #4348

Merged: 6 commits merged into pytorch:main on Sep 2, 2021

Conversation

@datumbox (Contributor) commented Sep 1, 2021:

Partially implements #3817

Inspired by work started by @SamuelGabriel in #4221

cc @vfdev-5 @datumbox

@datumbox (Contributor, Author) left a comment:

Below I highlight a few bits of the implementation that I think are noteworthy.

@SamuelGabriel Given your work in this area, I would love it if you could provide your input once you are back.

# op_name: (magnitudes, signed)
"ShearX": (torch.linspace(0.0, 0.3, num_bins), True),
"ShearY": (torch.linspace(0.0, 0.3, num_bins), True),
"TranslateX": (torch.linspace(0.0, 150.0 / 331.0 * image_size[0], num_bins), True),
@datumbox (Contributor, Author) commented Sep 1, 2021:

@SamuelGabriel I noticed in your implementation you restrict the maximum value of Translate to 14.4. I couldn't find a reference for that in the RandAugment paper. I was hoping you could provide some background info.

@SamuelGabriel (Contributor) replied:

I did not use this setting for my RA experiments actually. See for example here: https://github.com/automl/trivialaugment/blob/master/confs/wresnet28x10_svhncore_b128_maxlr.1_ra_fixed.yaml

I used fixed_standard, which is the search space described in the RA paper. It is slightly different from the space in the AutoAugment implementation that the significant parts of this work rely on (https://github.com/tensorflow/models/tree/fd34f711f319d8c6fe85110d9df6e1784cc5a6ca/research/autoaugment), which RA follows for its augmentation space; hence I call it fixed. The setting you mention is for their ImageNet experiments. Since my re-implementations of AA/RA were on 32x32 images (CIFAR/SVHN), I followed the implementation above, where they set the translation to 10: https://github.com/tensorflow/models/blob/fd34f711f319d8c6fe85110d9df6e1784cc5a6ca/research/autoaugment/augmentation_transforms.py#L319 They do not use the same augmentation space across datasets...

@SamuelGabriel (Contributor) added:

I am actually not sure what the best strategy to follow here is. Any idea? The same problem arises for AutoAugment, actually. Should we ask the authors, or focus only on 32x32 images, or only on ImageNet? For TA it is simpler: we use the same setting across datasets.

@datumbox (Contributor, Author) replied:

Not 100% sure either. Here I decided to use this approach because of their comment on Table 6 of the AA paper. This also means that if you have a 32x32 image, as in the case of CIFAR, your Translate max value would be 14.5, which is similar but not equal to yours; hence my interest in how you derived it. Not sure if this sub-pixel difference matters here.
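For reference, the 14.5 figure above is just the 150/331 constant from the snippet applied to a 32-pixel image:

# 150 px of max translation on 331 px inputs, rescaled to a 32 px image (plain arithmetic)
print(150.0 / 331.0 * 32)  # ≈ 14.50, vs. the 14.4 mentioned above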

@@ -178,9 +178,9 @@ def _get_transforms(
         else:
             raise ValueError("The provided policy {} is not recognized.".format(policy))
 
-    def _get_magnitudes(self, num_bins: int, image_size: List[int]) -> Dict[str, Tuple[Tensor, bool]]:
+    def _augmentation_space(self, num_bins: int, image_size: List[int]) -> Dict[str, Tuple[Tensor, bool]]:
@datumbox (Contributor, Author) commented Sep 1, 2021:

Even though this is a private method, I decided to use the terminology of the TrivialAugment paper, as I think it better describes what we get back (the combination of permitted ops and magnitudes) for the given augmentation.
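For anyone skimming, here is a small sketch of how such a {op_name: (magnitudes, signed)} space is typically consumed; this is hypothetical illustration code, not part of the PR:

import torch

space = {"ShearX": (torch.linspace(0.0, 0.3, 31), True),
         "Equalize": (torch.tensor(0.0), False)}
op_name = "ShearX"
magnitudes, signed = space[op_name]
magnitude = float(magnitudes[9]) if magnitudes.ndim > 0 else 0.0  # pick magnitude bin M=9
if signed and torch.rand(()) < 0.5:
    magnitude *= -1.0  # signed ops get a random sign flip
print(op_name, magnitude)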

image. If given a number, the value is used for all bands respectively.
"""

def __init__(self, num_ops: int = 2, magnitude: int = 9, num_magnitude_bins: int = 30,
@datumbox (Contributor, Author) commented:

  • N=2, M=9 are the best ImageNet values for ResNet50 (see A.2.3 in the paper).
  • num_magnitude_bins=30 because the majority of the experiments in the paper used this value. Weirdly, section A.2.3 mentions trying level 31 for EfficientNet B7.
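A quick usage sketch with these defaults (assuming the transform is exposed as torchvision.transforms.RandAugment, as in this PR):

import torch
from torchvision import transforms

augmenter = transforms.RandAugment()  # defaults to num_ops=2, magnitude=9, per A.2.3
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)  # dummy uint8 image tensor
out = augmenter(img)  # augmented image, same shape as the input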

@SamuelGabriel (Contributor) replied:

The num_magnitude_bins should be 31, like for TA, as 0 is also a bin and in the paper the maximal value is 30. That they tried level 31 is definitely weird and, I guess, a typo.
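A tiny sketch of the off-by-one being pointed out: with 31 bins, the integer magnitudes 0..30 index the linspace directly, whereas 30 bins would leave magnitude 30 out of range:

import torch

bins31 = torch.linspace(0.0, 0.3, 31)
print(bins31[0].item(), bins31[30].item())  # 0.0 and ~0.3: magnitude 30 reaches the paper's maximum
bins30 = torch.linspace(0.0, 0.3, 30)
print(bins30.numel())  # 30 values, so only magnitudes 0..29 would be indexable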

@datumbox datumbox marked this pull request as ready for review September 1, 2021 19:12
@datumbox datumbox requested a review from fmassa September 1, 2021 19:13
@datumbox datumbox changed the title [WIP] Adding RandAugment implementation Adding RandAugment implementation Sep 2, 2021
@fmassa (Member) left a comment:

Looks great, thanks!

I haven't checked the bin values for the augmentation space, but the rest is good with me.

I've made a couple of minor comments, none of which are merge-blocking.

Comment on lines +299 to +303
if isinstance(img, Tensor):
    if isinstance(fill, (int, float)):
        fill = [float(fill)] * F.get_image_num_channels(img)
    elif fill is not None:
        fill = [float(f) for f in fill]
@fmassa (Member):

nit: might be worth putting this in a helper function, or maybe pushing it directly into _apply_op?

@datumbox (Contributor, Author) replied:

Good call, I'll add this in a helper method for now. This can move to a base class once we nail the API.

@datumbox (Contributor, Author) added:

The JIT is giving me headaches if I move it into the ops. I'll add a TODO to remove the duplicated code once we have a base class.
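For illustration, the kind of helper being discussed might look roughly like this (hypothetical name _parse_fill; a sketch, not the code that was eventually committed):

from typing import List, Optional, Union

import torchvision.transforms.functional as F
from torch import Tensor


def _parse_fill(img: Tensor, fill: Optional[Union[int, float, List[float]]]) -> Optional[List[float]]:
    # Normalize the user-provided fill into a per-channel list of floats for tensor inputs.
    if isinstance(img, Tensor):
        if isinstance(fill, (int, float)):
            return [float(fill)] * F.get_image_num_channels(img)
        if fill is not None:
            return [float(f) for f in fill]
    return fill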

@@ -239,3 +239,87 @@ def forward(self, img: Tensor) -> Tensor:

    def __repr__(self) -> str:
        return self.__class__.__name__ + '(policy={}, fill={})'.format(self.policy, self.fill)


class RandAugment(torch.nn.Module):
@fmassa (Member):

I wonder if it makes sense to inherit from AutoAugment and override only the _augmentation_space method?

@datumbox (Contributor, Author) replied:

Yes indeed, it's a direct copy-paste. The only reason I didn't make it static or inherit from AutoAugment is that I don't think we have nailed down the API of the base class yet. I was thinking of keeping only the public parts visible and making changes once we add a couple more methods.
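For reference, the shape of the alternative fmassa suggests might look roughly like the sketch below. This is hypothetical: the forward()/sampling logic also differs between AutoAugment and RandAugment, which is presumably part of why the PR keeps RandAugment standalone for now.

from typing import Dict, List, Tuple

import torch
from torch import Tensor
from torchvision.transforms import AutoAugment


class RandAugmentViaInheritance(AutoAugment):
    # Hypothetical subclass that reuses AutoAugment and overrides only the space definition.
    def _augmentation_space(self, num_bins: int, image_size: List[int]) -> Dict[str, Tuple[Tensor, bool]]:
        return {
            "Identity": (torch.tensor(0.0), False),
            "ShearX": (torch.linspace(0.0, 0.3, num_bins), True),
            # ... rest of the RandAugment ops/magnitudes ...
        }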

@datumbox datumbox merged commit 5a81554 into pytorch:main Sep 2, 2021
@datumbox datumbox deleted the transforms/randaugment branch September 2, 2021 12:31
@SamuelGabriel (Contributor) left a comment:

To me it looks good, besides the problems with the augmentation space mentioned above.

"Solarize": (torch.linspace(256.0, 0.0, num_bins), False),
"AutoContrast": (torch.tensor(0.0), False),
"Equalize": (torch.tensor(0.0), False),
"Invert": (torch.tensor(0.0), False),
@SamuelGabriel (Contributor):

Invert should not be here, but Identity should be. I believe you replicated the mistake I made in the TA implementation for Vision. Sorry for that. I'll fix it there, too. TA and RA use the same augmentation operations.

@datumbox (Contributor, Author) replied:

Ah, thanks! Indeed this was copied from you. Your implementation heavily inspired how the entire code was refactored here, so thanks a lot for the contribution.
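Based on SamuelGabriel's correction above, the tail of the augmentation space would presumably swap Invert for Identity, roughly like this (sketch only; num_bins is assumed to be defined as in the PR):

import torch

num_bins = 31  # assumed value, following the bin discussion above
corrected_tail = {
    "Solarize": (torch.linspace(256.0, 0.0, num_bins), False),
    "AutoContrast": (torch.tensor(0.0), False),
    "Equalize": (torch.tensor(0.0), False),
    "Identity": (torch.tensor(0.0), False),  # replaces "Invert", per SamuelGabriel's comment
}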

@datumbox (Contributor, Author) left a comment:

@SamuelGabriel Thanks a lot for your review. I'll fix the issues you highlighted in a separate PR, as this one is already merged.


facebook-github-bot pushed a commit that referenced this pull request Sep 9, 2021
Summary:
Pull Request resolved: facebookresearch/vissl#421

* Adding randaugment implementation

* Refactoring.

* Adding num_magnitude_bins.

* Adding FIXME.

Reviewed By: fmassa

Differential Revision: D30793331

fbshipit-source-id: 7a99c6d2e64931e10672ceea9e81309c62a799af