updated transforms.ToPILImage, see #105 (#122)

Merged
merged 6 commits into pytorch:master on Mar 23, 2017

Conversation

@bodokaiser (Contributor) commented Mar 23, 2017

PR for #105. Changes should be backward compatible. Types are inferred from PIL.Image.mode when passing single-channel images. See PIL modes here.

How does ToPILImage handle types?

1-channel:
* FloatTensor -> ByteTensor -> 'L' Image (8-bit black and white)
* ShortTensor -> 'I;16' Image
* IntTensor -> 'I' Image
* analogous for the respective numpy types (exception: np.float32 becomes an 'F' Image)
* other types (e.g. signed ints) -> error

3-channel:
* same as above, with the addition that images with more than 8 bits of precision raise an error (PIL.Image cannot represent more than 8 bits per color channel).

How does ToTensor handle types?

* np.ndarrays -> FloatTensor of range [0, 1]
* 'I;16' Image -> ShortTensor
* 'I' Image -> IntTensor
* everything else -> ByteTensor
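
For illustration, here is a minimal sketch (not part of the PR diff) exercising the mapping above; the expected modes in the comments are the ones described in this PR:

```python
import numpy as np
import torch
from torchvision import transforms

to_pil = transforms.ToPILImage()

# 1-channel inputs: the PIL mode is inferred from the tensor/array type.
print(to_pil(torch.rand(1, 4, 4)).mode)                 # FloatTensor -> 'L'
print(to_pil(torch.ShortTensor(1, 4, 4).zero_()).mode)  # ShortTensor -> 'I;16'
print(to_pil(torch.IntTensor(1, 4, 4).zero_()).mode)    # IntTensor   -> 'I'
print(to_pil(np.zeros((4, 4, 1), np.float32)).mode)     # np.float32  -> 'F'
```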

@bodokaiser (Contributor, Author)

I want to note that the way I handle single-channel images with precision > 8 bits breaks with the convention of scaling tensors to [0, 1] and images to [0, 255].
In my opinion this is a necessary change, because otherwise we would lose precision and/or need to specify types on every transform call. Nevertheless, I would suggest introducing a breaking API change with the next major release that stops scaling images to [0, 255] and tensors to [0, 1]. This is something the user can easily implement with transforms.Lambda or transforms.Normalize, as sketched below.
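
A sketch of that user-side opt-in, assuming a ToTensor that returns an unscaled ByteTensor (the proposed behavior, not what the released ToTensor does today):

```python
from torchvision import transforms

# Restore the old behaviour by hand: convert to float and scale to [0, 1].
# Assumes ToTensor no longer divides by 255 itself.
to_unit_range = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda img: img.float().div(255)),
])
```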


@fmassa (Member) commented Mar 23, 2017

I agree with you wrt keeping the original types and range of the image.
We don't always want images in the [0, 1] range (for example, when the target is an image), but we might still want to apply transforms to it and have it converted to a tensor. Currently, we need a transforms.Lambda for that, to avoid the division by 255.
But the thing is, for most users dividing by 255 is fine, as they don't have image targets.
Do you think it is a reasonable assumption that all the images in a dataset will have the same number of bytes per pixel? If yes, we could add a constructor argument for an optional normalization factor.

@bodokaiser (Contributor, Author)

👍 for the idea with the constructor. I would use nfactor=255 as the default to keep BC. Should we then disable normalization on nfactor=0 or nfactor=None?
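
A minimal sketch of the idea (hypothetical; this nfactor constructor never landed in torchvision):

```python
import torch

class ToTensor(object):
    """Hypothetical sketch of the discussed API, not torchvision's actual code."""

    def __init__(self, nfactor=255):
        # nfactor=255 keeps backward compatibility with the current
        # divide-by-255 behaviour; nfactor=None disables scaling and
        # keeps the tensor in its native type.
        self.nfactor = nfactor

    def __call__(self, pic):
        # pic: H x W x C numpy array (PIL.Image handling omitted for brevity).
        img = torch.from_numpy(pic.transpose(2, 0, 1).copy())
        if self.nfactor is not None:
            img = img.float().div(self.nfactor)
        return img
```

With this, ToTensor() would behave exactly as today, while ToTensor(nfactor=None) would hand back the unscaled tensor in its native type.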

@fmassa (Member) commented Mar 23, 2017

Yeah, maybe let the normalization factor default to 255, and only convert to float if the normalization factor is not None?
This means that if we convert to float, we also need to pass an "unnormalization" factor to ToPILImage, and maybe pass the target type as a second argument?

Also, maybe a more descriptive name, such as normalization_factor? Not sure on this one.

@bodokaiser (Contributor, Author)

> Yeah, maybe let the normalization factor default to 255, and only convert to float if the normalization factor is not None?

Sounds reasonable.

> This means that if we convert to float, we also need to pass an "unnormalization" factor to ToPILImage, and maybe pass the target type as a second argument?

I don't think we need to pass the type around. If normalization is disabled, we can infer the type as implemented in this PR. If normalization is applied, we simply expect a ByteTensor.
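
For illustration, a round trip with 16-bit data under these rules (a sketch; the modes and tensor types are the ones described in this PR):

```python
import numpy as np
from torchvision import transforms

pic = transforms.ToPILImage()(np.zeros((4, 4, 1), np.int16))  # -> 'I;16' image
img = transforms.ToTensor()(pic)  # -> ShortTensor; values are not rescaled
```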

@bodokaiser (Contributor, Author)

@soumith @apaszke any additional comments / thoughts?

@soumith merged commit 510f095 into pytorch:master on Mar 23, 2017
@soumith (Member) commented Mar 23, 2017

looks pretty good, thank you!
