
Implement AutoAugment for Detection #6609

Open

ain-soph wants to merge 45 commits into main
Conversation

ain-soph (Contributor)

Implements #6224: Learning Data Augmentation Strategies for Object Detection.
Refers to: #3817

Motivation, pitch

It would be good to have this augmentation in torchvision.

cc @vfdev-5 @datumbox

ain-soph (Contributor, Author) commented Sep 19, 2022

@vfdev-5

  • I'm using bboxes.data as a torch.Tensor to avoid type-checking issues. I'm not sure whether that's a good convention.
            elif transform_id == "TranslateX":
                bboxes.data = F.affine_bounding_box(
                    bboxes.data,
                    bboxes.format,
                    bboxes.image_size,
                    angle=0.0,
                    translate=[int(magnitude), 0],
                    scale=1.0,
                    shear=[0.0, 0.0],
                )
  • There are several ops that I couldn't find in our current library (rough sketches of the first two follow below):
    • SolarizeAdd
    • Cutout, Cutout_Only_BBoxes
    • BBox_Cutout
    • Flip_Only_BBoxes (this is actually just hflip, but we need to route it through _AutoAugmentBase._apply_image_transform)
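
For reference, here is a minimal sketch of what SolarizeAdd and Cutout do, following the semantics of the TensorFlow AutoAugment reference implementation. The function names and the uint8 assumption are mine, not existing torchvision API:

    import torch

    def solarize_add(image: torch.Tensor, addition: int, threshold: int = 128) -> torch.Tensor:
        # Assumes a uint8 image in [0, 255]: pixels below `threshold` get
        # `addition` added (clamped back to [0, 255]); the rest are untouched.
        added = (image.to(torch.int16) + addition).clamp(0, 255).to(torch.uint8)
        return torch.where(image < threshold, added, image)

    def cutout(image: torch.Tensor, pad_size: int, replace: int = 128) -> torch.Tensor:
        # Fills a square of side 2 * pad_size, centered at a uniformly random
        # location and clipped at the image border, with the constant `replace`.
        h, w = image.shape[-2:]
        cy, cx = int(torch.randint(h, ())), int(torch.randint(w, ()))
        y1, y2 = max(0, cy - pad_size), min(h, cy + pad_size)
        x1, x2 = max(0, cx - pad_size), min(w, cx + pad_size)
        out = image.clone()
        out[..., y1:y2, x1:x2] = replace
        return out

The *_Only_BBoxes and BBox_Cutout variants apply the same kind of operation restricted to the bounding-box regions.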

datumbox linked an issue on Sep 20, 2022 that may be closed by this pull request.
vfdev-5 (Collaborator) commented Sep 22, 2022

@ain-soph thanks for the PR! Let's remove these stray files:

  • torchvision/image.pyd
  • torchvision/_C.pyd

ain-soph (Contributor, Author)

@ain-soph thanks for the PR! Let's remove these stray files:

  • torchvision/image.pyd
  • torchvision/_C.pyd

Oops! Sorry for missing that... I'll change it soon.

vfdev-5 (Collaborator) commented Sep 22, 2022

@ain-soph do you have any working test code to run it and visually check the output? If so, please share it here; it would be helpful. Thanks!

ain-soph (Contributor, Author)

@vfdev-5 Sorry for the delay; I was busy at ICLR for the past week. I'll add some test code tomorrow.

ain-soph (Contributor, Author)

The recent AutoAugment API modification forces me to redesign the API for detection. I'm still working on this now and need to understand how the new framework works.
I'm also a bit confused: is there any real use case for "video detection" compatibility?

datumbox (Contributor) commented Nov 4, 2022

@ain-soph The API should be more stable now. Which specific AA modification broke your code? Perhaps we can help you.

ain-soph (Contributor, Author) commented Nov 4, 2022

@datumbox Sorry for dragging this out so long. The main confusion is how to extract the bounding box from the inputs and, after augmentation, how to insert the modified bounding box back into them.

The current image classification code doesn't have this part, and I can't find anything to use as a reference.
I'm trying to overload the existing _flatten_and_extract_image_or_video and add bounding box support, but I don't know whether that's recommended. I'll implement an ugly-but-working demo first for review. Hopefully it will be finished today.

datumbox (Contributor) commented Nov 4, 2022

@ain-soph Apologies for the rough edges. You are quite brave for digging into the developer part of the API so early. The public part should be fairly stable, and we will continue polishing the internal part to make the experience smoother.

The method you probably want for extracting the bounding boxes is _utils.query_bounding_box(). I think temporarily overriding _flatten_and_extract_image_or_video is fine; an alternative would be to have a similar separate method for bboxes. At this point I would recommend just doing what you need to do, without worrying about the recommended way to implement the transform. We can help you refactor it once you have it in working condition.

@vfdev-5 / @pmeier, any recommendations on your side at this point?
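
(For context, the extract-and-reinsert pattern under discussion looks roughly like the sketch below. It assumes the prototype transforms internals of the time, pytree flattening plus datapoints type checks; the helper names and the import path are illustrative, not actual torchvision API.)

    import torch.utils._pytree as pytree
    from torchvision.prototype import datapoints  # import path assumed

    def flatten_and_extract(sample):
        # Flatten the arbitrary (nested) sample into a flat list plus a spec
        # that lets us rebuild the original structure later.
        flat_inputs, spec = pytree.tree_flatten(sample)
        image_idx = bbox_idx = None
        for idx, inpt in enumerate(flat_inputs):
            if isinstance(inpt, (datapoints.Image, datapoints.Video)) and image_idx is None:
                image_idx = idx
            elif isinstance(inpt, datapoints.BoundingBox):
                bbox_idx = idx
        return flat_inputs, spec, image_idx, bbox_idx

    def unflatten_and_insert(flat_inputs, spec, image, bboxes, image_idx, bbox_idx):
        # Put the augmented image and boxes back into their original slots,
        # then rebuild the original input structure.
        flat_inputs = list(flat_inputs)
        flat_inputs[image_idx] = image
        flat_inputs[bbox_idx] = bboxes
        return pytree.tree_unflatten(flat_inputs, spec)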

ain-soph (Contributor, Author) commented Nov 5, 2022

I've uploaded a version that should work, along with a very simple unit test. Please take a look; I'd appreciate some guidance on how to improve it. Thanks!

There are still a few to-do items on the plate, especially some augmentation ops not yet implemented in the current transforms library: SolarizeAdd, Cutout, BBox_Cutout, Flip_Only_BBoxes, Cutout_Only_BBoxes.

ain-soph changed the title from "[DRAFT] Implement AutoAugment for Detection" to "Implement AutoAugment for Detection" on Mar 9, 2023.
ain-soph (Contributor, Author)

@vfdev-5 @datumbox This should be ready to merge once all the comments I posted above have been reviewed.

ain-soph (Contributor, Author) commented Apr 6, 2023

@vfdev-5 This PR should be ready to merge; please take a final review. I have annotated the code with detailed comments.

Summary:

  • Implement solarize_add and cutout transforms.
  • Add 4 new transform_id values in apply_transform: "Flip", "SolarizeAdd", "Cutout", "BBox_Cutout".
  • In _transform_image_or_video_in_bboxes, convert the bbox format to XYXY.
  • Strictly follow the tensorflow repo settings (a small sketch of the resulting bins follows this list):
    • Use 11 levels from 0 to 10; in the previous AA code, levels run from 0 to 9.
      To implement this, num_bins is replaced by num_bins + 1 to keep the intervals correct.
    • Follow their magnitude settings:
      • "Solarize": slightly different. It now runs from 256/255 down to 0; in the previous AA code, it runs from 1.0 down to 0.
        (Shall we consider still using 1.0 to 0 for consistency?)
      • "Brightness", "Color", "Contrast", "Sharpness": the meaning of the level differs considerably from the previous AA code.
        In the previous AA code, it's [0, 0.9] in 10 pieces with random negation (so level 0 means 0 and level 9 means either 0.9 or -0.9).
        In the current AAdet code, it's [-0.9, 0.9] in 11 pieces (so level 0 means -0.9 and level 10 means 0.9).
      • "TranslateX", "TranslateY": the magnitude ranges differ from the AA settings.
  • Add a second idx in flat_inputs_with_spec to save the index of the bboxes tensor.
  • Use _apply_image_or_video_and_bboxes_transform to process transforms.
    • For "*_Only_BBoxes" transforms and "BBox_Cutout", call _transform_image_or_video_in_bboxes to apply the transform to the sub-images/videos within the bboxes.
    • For normal transforms, apply the transform to both the image/video and the bboxes.
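
To make the level-to-magnitude mapping above concrete, here is a small sketch of the described bins (the variable names are illustrative, not the in-tree definitions):

    import torch

    num_bins = 10  # levels 0..10, i.e. num_bins + 1 sample points

    # "Brightness"/"Color"/"Contrast"/"Sharpness": signed range in 11 pieces,
    # so level 0 -> -0.9, level 5 -> 0.0, level 10 -> 0.9.
    enhance_magnitudes = torch.linspace(-0.9, 0.9, num_bins + 1)

    # "Solarize": the threshold decreases with the level, from 256/255 down to 0.
    solarize_magnitudes = torch.linspace(256.0 / 255.0, 0.0, num_bins + 1)

    print(enhance_magnitudes[0].item(), enhance_magnitudes[10].item())    # -0.9, 0.9
    print(solarize_magnitudes[0].item(), solarize_magnitudes[10].item())  # ~1.004, 0.0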

Several small concerns remaining:

  • Test file
    • The current tests (for both AA and AAdetection) only use the first bbox and image/video in the test dict, which makes the test cases limited:

      def auto_augment_adapter(transform, input, device):
          adapted_input = {}
          image_or_video_found = False
          for key, value in input.items():
              if isinstance(value, (datapoints.BoundingBox, datapoints.Mask)):
                  # AA transforms don't support bounding boxes or masks
                  continue
              elif check_type(value, (datapoints.Image, datapoints.Video, is_simple_tensor, PIL.Image.Image)):
                  if image_or_video_found:
                      # AA transforms only support a single image or video
                      continue
                  image_or_video_found = True
              adapted_input[key] = value
          return adapted_input

    • I'm currently testing all 4 policies, which might be unnecessary.
  • AutoAugmentDetection
    • There are unnecessary type conversions in solarize_add and _apply_image_or_video_and_bboxes_transform.
    • _apply_image_or_video_transform simply calls _apply_transform. Could we merge them?
    • In _apply_transform, I have an isinstance(inpt, datapoints.BoundingBox) condition in the middle, which might make the code inconsistent with the other conditions and difficult to understand (see comments for details).
    • Need to check that the usage of wrap_like is correct (a short note on its semantics follows this list):
      image.wrap_like(image, result)
      chosen_bbox = bboxes.wrap_like(bboxes, bboxes[random_index].unsqueeze(0))
    • flat_inputs_with_spec records 2 idxs in AAdet, which differs from the AA code.
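
For reviewers unfamiliar with wrap_like: it re-attaches datapoint metadata (format, spatial size) to a plain tensor result. A short illustration of the second call above, assuming the prototype datapoints API of the time:

    # `bboxes` is a datapoints.BoundingBox; plain tensor ops on its data lose
    # the metadata, so wrap_like(reference, data) rewraps `data` with the
    # reference's format and spatial size.
    chosen_bbox = bboxes.wrap_like(bboxes, bboxes[random_index].unsqueeze(0))
    # `chosen_bbox` is again a BoundingBox with bboxes.format preserved, so
    # downstream bbox kernels keep working on it.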

ain-soph (Contributor, Author) commented May 8, 2023

@vfdev-5 @datumbox Any feedback from the maintainers? I think all the code work is finished and this PR is ready for review. Please refer to #6609 (comment) for a summary of the work.

vfdev-5 (Collaborator) commented May 9, 2023

@ain-soph thanks for the ping, and sorry for the delay. I'll check the code this week and see how we can finally get it done.

vfdev-5 (Collaborator) commented May 16, 2023

_apply_image_or_video_transform simply calls _apply_transform. Could we merge them?

OK, let's remove _apply_image_or_video_transform and keep only _apply_transform. _apply_image_or_video_transform is a sort of alias indicating that it works only on images/videos.

In _apply_transform, I have an isinstance(inpt, datapoints.BoundingBox) condition in the middle, which might make the code inconsistent with the other conditions and difficult to understand (see comments for details).

Commented in the code. Let's remove all AADet-related stuff from AABase.

flat_inputs_with_spec records 2 idxs in AAdet, which differs from the AA code.

This is probably ok.

Need to check that the usage of wrap_like is correct.

Seems ok.

@ain-soph thanks for the work on this transformation! It is rather nontrivial, especially since we ultimately need to train a model with this transform to see the real benefit of using it.

Let's fix the reported bugs, merge it, and continue with follow-up PRs.

vfdev-5 (Collaborator) left a comment

LGTM, thanks a lot for the PR @ain-soph, and apologies for such a long review.

I pushed the main class AutoAugmentDetection as _AutoAugmentDetection temporarily, until we validate everything and write proper non-regression tests.

I'll comment on the next steps in this PR.

@@ -621,3 +678,309 @@ def forward(self, *inputs: Any) -> Any:
            mix = F.to_image_pil(mix)

        return self._unflatten_and_insert_image_or_video(flat_inputs_with_spec, mix)


class _AutoAugmentDetectionBase(_AutoAugmentBase):
Member:

What's the reason for having _AutoAugmentDetectionBase? It seems to be used by only one child class.

Collaborator:

Later we could simply add other AA techniques for detection.

Member:

Do we plan to?

Collaborator:

There are no clear plans at the moment. Do you want to simplify it and re-add this later?

ain-soph (Contributor, Author)

@vfdev-5 @NicolasHug Feel free to ping me when the maintainers have a plan for the follow-up work. I can continue working on this.


Successfully merging this pull request may close this issue: Implement AutoAugment for Detection.
5 participants