
transforms: Invert #547

Closed

Conversation

sebastianberns

Hi!

I had written a transform that inverts grayscale images from a custom dataset for my own use.
Since I believe this might be useful to others, I wanted to share this extended version, which accepts PIL images of modes "L", "LA", "RGB", and "RGBA".

I have also written a unit test, which is included in the PR.
A simple demo is available here.
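For reference, a minimal sketch of what such a transform could look like (illustrative only; the actual implementation is in the PR and the linked repository, and the class name here is an assumption):

```python
from PIL import Image, ImageChops


class Invert:
    """Invert a PIL image of mode L, LA, RGB, or RGBA.

    Color bands are inverted (pixel -> 255 - pixel); any alpha
    channel is left untouched. Sketch of the transform discussed
    in this PR, not the merged implementation.
    """

    def __call__(self, img):
        if img.mode not in ("L", "LA", "RGB", "RGBA"):
            raise TypeError("Unsupported image mode: %s" % img.mode)
        if img.mode in ("LA", "RGBA"):
            # Split off the alpha band, invert the color bands,
            # then merge alpha back in unchanged.
            *bands, alpha = img.split()
            inverted = [ImageChops.invert(b) for b in bands]
            return Image.merge(img.mode, inverted + [alpha])
        return ImageChops.invert(img)
```

A transform written this way can be composed with the usual `torchvision.transforms.Compose` pipeline before `ToTensor`.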

@vfdev-5
Collaborator

vfdev-5 commented Jul 16, 2018

@sebastianberns looks good, but could you please provide more background on the benefits of inversion applied to RGB images?

@sebastianberns
Author

In any case where the information of interest in an image sits on a white background, this function is helpful for data preprocessing. We want a black background because a convolution by default pads the input with zeros on all sides.
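To illustrate the zero-padding argument above (a hypothetical sketch, not part of the PR): the default padding value of a convolution is zero, so a black background (value 0) continues seamlessly into the padded border, while a white background (value 1) meets the padding at an artificial hard edge.

```python
import numpy as np

# A tiny 2x2 "image" region on a white vs. a black background
# (values normalized to [0, 1]).
white_bg = np.ones((2, 2))   # white background, pixel value 1.0
black_bg = np.zeros((2, 2))  # black background, pixel value 0.0

# Conv layers pad with zeros by default (e.g. nn.Conv2d with padding=1).
padded_white = np.pad(white_bg, 1)  # zeros meet white: sharp artificial edge
padded_black = np.pad(black_bg, 1)  # zeros meet black: seamless border
```

After padding, `padded_white` has a 0/1 discontinuity at its border that the filters will respond to, whereas `padded_black` is uniformly zero.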

While this is obvious for grayscale images (MNIST), the same holds for any number of channels. (Thinking about it now, the invert function could be rewritten to simply invert all channels except the alpha channel.)
Imagine Fashion-MNIST in color, or image renders from ShapeNet (although there it might be trivial to place an object on a black instead of a white background).

That being said, I personally use the invert function to manipulate the background rather than the foreground.

@vfdev-5
Collaborator

vfdev-5 commented Jul 17, 2018

@sebastianberns sorry for the delayed answer. I agree that this could make sense for gray-level objects where the background is homogeneous and white. I still have doubts about the inversion of RGB images. Take your demo as an example, where a frog is inverted. In such cases the network will see green and brownish frogs as well as blue and magenta ones, and it will probably learn only geometric features rather than color features to distinguish frogs.

@sebastianberns
Author

sebastianberns commented Jul 18, 2018

Please, take as much time as you need for reviews and replies.

Also, please understand this PR as my way of sharing something I found useful. Ultimately it is the maintainers’ decision whether or not to add a feature. I’m not familiar with the corresponding policies.

My demo is an illustration of the feature’s capability, not an exposition of its purpose.

Your doubts about inverting color images with full bleed (that is, a photo that covers the whole image with no uniform margin) are reasonable, of course; inverting such an image is not necessary, and that is not the use case I am describing.

Since you accept this feature’s usefulness for grayscale images without question, let me insist that inversion is useful because of a uniform (usually white) background, independent of the number of channels (1, L, RGB, LAB, …).

Even if the image of the frog you reference were black and white, inverting it would still not make a difference. But imagine the same frog sitting on a white background instead of the forest floor (similar to the data in Fashion-MNIST and ShapeNet that I mentioned). Then, I argue, you’d want to invert it.

If you were to reject color inversion, I can offer to rewrite the transform to allow only single-channel images.

@sebastianberns
Author

In the meantime, I have set up a separate repository to save this custom transform for future reference.
https://github.com/sebastianberns/torchvision.transforms.invert

@fmassa
Member

fmassa commented Jul 30, 2018

I'm not convinced about this transformation.
Indeed, in RGB images, inverting the colors changes the color distribution and makes a pre-trained CNN completely fail.

If the main use case is to invert a white background into a black one, this would be better implemented as a user-defined transform for a particular application; I'm not 100% sure it's general enough to be merged into torchvision.

Thanks for the PR though!

@fmassa fmassa closed this Jul 30, 2018