
[Feature proposal] Allow processing multiple images with transforms.Compose #1169

Noiredd opened this issue Jul 25, 2019 · 4 comments
Noiredd commented Jul 25, 2019

Proposal

Following discussions started in #9, #230, #240, and most recently in #610, I would like to propose the following change to transforms.Compose (and the transform classes) that would allow easy processing of multiple images with the same parameters using existing infrastructure. I think this would be very useful for segmentation tasks, where both the image and label need to be transformed exactly the same way.

Details

Currently the problem is that each transform, when called, implicitly generates randomized parameters (if it is a random transform) right before computing the transformation. In my opinion, it doesn't have to be so - parameter generation (get_params) is already separated from the actual image operation (which relies on the functional backend). My idea comes in two parts: first, completely decouple parameter generation from transformation; then allow Compose to generate parameters once and apply transformations multiple times.

Step 1, on the example of RandomResizedCrop:

  1. Add a generate_params method that exposes the existing get_params without requiring the caller to pass transform-specific arguments. This method would look the same for every transform that needs random parameters; which arguments it passes to get_params remains implementation-dependent.
def generate_params(self, image):
    return self.get_params(image, self.scale, self.ratio)
  2. Allow __call__ to optionally accept a tuple of pre-generated params:
def __call__(self, img, params=None):
    if params is not None:
        i, j, h, w = params
    else:
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
    return F.resized_crop(img, i, j, h, w, self.size, self.interpolation)

Step 2 is enabling this functionality in Compose by changing __call__ to accept iterables. Alternatively, we could subclass it entirely, which I will do in this example:

class MultiCompose(Compose):
    def __call__(self, imgs):
        for t in self.transforms:
            try:
                params = t.generate_params(imgs[0])
            except AttributeError:
                params = None
            imgs = tuple(t(img, params) for img in imgs)
        return imgs

Subclassing offers some advantages; for example, interpolation methods could be bound to iterable indices at __init__, so we could interpolate the first item bilinearly and the second with nearest-neighbour (ideal for segmentation).
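That per-index binding could be sketched like this. This is an illustrative, dependency-free version (the base class is stubbed out rather than torchvision's Compose, and generate_params and the interpolations argument are hypothetical names from this proposal):

```python
import copy

class MultiCompose:
    """Sketch only (not torchvision API): apply every transform to all
    inputs with shared random parameters, optionally overriding the
    interpolation mode per input index."""

    def __init__(self, transforms, interpolations=None):
        self.transforms = transforms
        # e.g. (Image.BILINEAR, Image.NEAREST) to treat input 0 as the
        # image and input 1 as the segmentation mask
        self.interpolations = interpolations

    def __call__(self, imgs):
        for t in self.transforms:
            try:
                # hypothetical method from Step 1; the first input serves
                # as the reference for parameter generation
                params = t.generate_params(imgs[0])
            except AttributeError:
                params = None  # static transform: nothing to share
            out = []
            for i, img in enumerate(imgs):
                t_i = t
                if self.interpolations and hasattr(t, 'interpolation'):
                    # shallow-copy so the stored transform is not mutated
                    t_i = copy.copy(t)
                    t_i.interpolation = self.interpolations[i]
                out.append(t_i(img, params) if params is not None else t_i(img))
            imgs = tuple(out)
        return imgs
```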

Alternative approach

Instead of doing try/except in the Compose subclass, all transforms could be changed to inherit from a new BaseTransform abstract class, which could define generate_params as a trivial function returning None. Then we could just do:

class MultiCompose(Compose):
    def __call__(self, imgs):
        for t in self.transforms:
            params = t.generate_params(imgs[0])
            imgs = tuple(t(img, params) for img in imgs)
        return imgs

because static transforms like Pad would simply return None, while any random transforms would need to define generate_params accordingly.
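That base class could look roughly like this. The names here (BaseTransform, and the toy Pad and HorizontalFlip) are illustrative stand-ins, not torchvision classes, and they assume the generate_params/__call__ protocol proposed above:

```python
import random

class BaseTransform:
    """Hypothetical abstract base: static transforms inherit the trivial
    generate_params, so Compose can call it unconditionally."""

    def generate_params(self, img):
        return None  # no random parameters by default

class Pad(BaseTransform):
    """Toy static transform: ignores params entirely (lists stand in
    for images here)."""
    def __init__(self, value):
        self.value = value

    def __call__(self, seq, params=None):
        return [self.value] + list(seq) + [self.value]

class HorizontalFlip(BaseTransform):
    """Toy random transform: overrides generate_params with its coin
    flip, so the same decision can be reused across inputs."""
    def __init__(self, p=0.5):
        self.p = p

    def generate_params(self, img):
        return random.random() < self.p

    def __call__(self, seq, params=None):
        if params is None:
            params = self.generate_params(seq)
        return list(reversed(seq)) if params else list(seq)
```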

Yes, I do realize that this requires a slight refactoring of some existing transforms, e.g. RandomHorizontalFlip.

Usage

The user could subclass Dataset to yield (image, label) tuples. This change would allow them to apply custom preprocessing/augmentation transforms separately, instead of hard-coding them in the dataset implementation using the functional backend. It would look roughly like this:

data_transform = transforms.MultiCompose([
    transforms.RandomCrop(256),
    transforms.ToTensor()
])
data_source = datasets.SegmentationDataset( # user class
    root='/path/to/data/',
    transform=data_transform
)
loader = torch.utils.data.DataLoader(data_source, batch_size=4, num_workers=8)

I think this would be a significantly more convenient way of doing this.

Let me know if you think this is worth pursuing. I will have some free time next week, so if this would be useful and has a chance of being merged, I'd happily implement it myself. If you see any potential pitfalls or backwards-compatibility-breaking caveats, please tell me as well.


Addenda

Later I found PR #611, but it seems to have been abandoned by now, having encountered some issues that I think my plan of attack can overcome.

Part of the problem here stems from the fact that the get_params methods, since their introduction in #311, do not share an interface between classes. Instead, getting params for each transform is a completely different call. This feels anti-OOP and counter-intuitive to me; are there any reasons why it was made this way? @alykhantejani?

cc @vfdev-5

@Jonas1312

Image segmentation is a fairly common application nowadays.

For now I'm using this https://github.com/Jonas1312/pytorch-segmentation-dataset but it would be nice if torchvision could support data/mask augmentation natively.


fmassa commented Sep 10, 2019

Hi @Noiredd

Thanks for the proposal, the PR in #1315 and the discussion, and sorry for not getting back to you before.

I believe the problem mentioned in #611 (comment) is still valid here.

For example, we don't want to apply bilinear interpolation for the ground-truth segmentation masks, so this should be handled differently and that's not covered in this PR. Same thing for rotations, scalings, etc.

Another example: we add a ColorAugmentation transform in the pipeline. This will now augment (and mess with) the segmentation masks, which is not at all what we want.

I think this is a fundamental problem with extending Compose to handle multiple images, and users will hit those corner cases immediately.

This is the reason why we have been advocating for using the functional interface (e.g., in #610).

In the same way that PyTorch doesn't support nn.Sequential with inputs more complex than a single Tensor (see pytorch/pytorch#9979 for example), I don't think we should be extending Compose (which is essentially an nn.Sequential under the hood) like this.

What I do think we should be doing instead is to more broadly use the functional interface, and use nn.Module for the transforms instead.
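The functional pattern referred to here samples the random parameters once and then applies them to both inputs by hand. A dependency-free sketch of that idea, where paired_random_crop is an illustrative helper name and nested lists stand in for PIL images (with torchvision one would use transforms.RandomCrop.get_params plus transforms.functional.crop):

```python
import random

def paired_random_crop(image, mask, th, tw):
    """Sample a (th x tw) crop window once, then apply the identical
    crop to both the image and the mask."""
    h, w = len(image), len(image[0])
    i = random.randint(0, h - th)  # top-left corner, sampled once
    j = random.randint(0, w - tw)
    crop = lambda img: [row[j:j + tw] for row in img[i:i + th]]
    return crop(image), crop(mask)
```

Because the parameters are drawn explicitly, image-only steps such as color jitter can simply skip the mask, and the mask can be resampled with nearest-neighbour where interpolation is involved.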

Thoughts?


Noiredd commented Sep 10, 2019

Just as you were writing that comment, I was working on a solution to this very issue :)

My idea: there are basically three cases with augmentations for segmentation:

  • the transform works the same for image and mask (e.g. RandomCrop),
  • transform operates only on the image, leaving mask intact (e.g. Grayscale),
  • transform uses interpolation (e.g. RandomRotation).

The new version of SegmentationCompose handles all three cases by recognizing which transforms are added to the pipeline. In the first case it behaves just like MultiCompose, plain and simple. For (2) and (3) I propose the following additions:

  • an "exceptions register" is kept in the class to recognize transforms that need special treatment,
  • separate pipelines are introduced, one for images, the other for labels.

If an image-only transform (case 2) is detected, in the label-specific pipeline it is replaced with a new NullTransform. In case (3), the transform object is deeply copied and its interpolation (and resampling) attributes changed to Image.NEAREST so it doesn't mess with the labels.
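A minimal sketch of that mechanism (all names here are hypothetical illustrations, not the actual code in #1315): the "exceptions register" maps transform types to their label-pipeline treatment, image-only transforms become a NullTransform, and interpolating transforms are deep-copied with their interpolation forced to nearest:

```python
import copy

class NullTransform:
    """Identity placeholder used in the label pipeline for image-only
    transforms such as Grayscale (case 2)."""
    def __call__(self, img):
        return img

# hypothetical "exceptions register": transform types that need special
# treatment when building the label pipeline
IMAGE_ONLY = set()      # case (2): skipped for labels
INTERPOLATING = set()   # case (3): forced to nearest-neighbour for labels

def build_label_pipeline(transforms, nearest='nearest'):
    """Derive the label-specific pipeline from the image pipeline."""
    label_pipeline = []
    for t in transforms:
        if type(t) in IMAGE_ONLY:
            label_pipeline.append(NullTransform())
        elif type(t) in INTERPOLATING:
            t = copy.deepcopy(t)       # don't mutate the image pipeline
            t.interpolation = nearest  # e.g. Image.NEAREST with PIL
            label_pipeline.append(t)
        else:
            label_pipeline.append(t)   # case (1): shared as-is
    return label_pipeline
```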

Please see the newest commits in #1315, let me know what you think of the approach.

I do realize this adds a little overhead to creating new augmentations (as one has to remember to register them to SegmentationCompose), but in my opinion it is worth it. Personally, I've spent this whole year building augmentation pipelines from scratch over and over using the functional interface. 95% of the time I just want to use the existing blocks - quickly and efficiently change configurations without having to rebuild the whole thing.


fmassa commented Sep 10, 2019

@Noiredd I've replied in the PR now, let's maybe keep the discussion there now? #1315 (review)
