Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect SampleFormat and BitsPerSample #5850

Closed
XrioBtw opened this issue Nov 24, 2021 · 7 comments · Fixed by #7049
Closed

Incorrect SampleFormat and BitsPerSample #5850

XrioBtw opened this issue Nov 24, 2021 · 7 comments · Fixed by #7049
Labels

Comments

@XrioBtw
Copy link

XrioBtw commented Nov 24, 2021

Images with signed integer data are all incorrectly assumed to be int32. uint32 and float64 are also incorrectly determined.

import numpy as np
from PIL import Image
from tempfile import TemporaryFile

for dt in [np.bool,
           np.uint8, np.uint16, np.uint32,
           np.int8,  np.int16,  np.int32,
           np.float32, np.float64]:
    with TemporaryFile() as fp:
        before = np.array([[0, 1], [0, 1]], dtype = dt)
        img = Image.fromarray(before)
        img.save(fp, format = "TIFF")
        img = Image.open(fp)
        after = np.asarray(img)
        print("{0: >7} -> {1: >7}".format(str(before.dtype), str(after.dtype)))

gives

   bool ->    bool
  uint8 ->   uint8
 uint16 ->  uint16
 uint32 ->   int32
   int8 ->   int32
  int16 ->   int32
  int32 ->   int32
float32 -> float32
float64 -> float32
@radarhere
Copy link
Member

radarhere commented Nov 24, 2021

In Pillow, we translate input data into a few standard modes. So, for example, we don't have a mode for float64. Instead, we unpack a variety of floating point types into just "F" -

Pillow/src/libImaging/Unpack.c

Lines 1700 to 1721 in 0bd97c3

{"F", "F", 32, copy4},
{"F", "F;8", 8, unpackF8},
{"F", "F;8S", 8, unpackF8S},
{"F", "F;16", 16, unpackF16},
{"F", "F;16S", 16, unpackF16S},
{"F", "F;16B", 16, unpackF16B},
{"F", "F;16BS", 16, unpackF16BS},
{"F", "F;16N", 16, unpackF16N},
{"F", "F;16NS", 16, unpackF16NS},
{"F", "F;32", 32, unpackF32},
{"F", "F;32S", 32, unpackF32S},
{"F", "F;32B", 32, unpackF32B},
{"F", "F;32BS", 32, unpackF32BS},
{"F", "F;32N", 32, unpackF32N},
{"F", "F;32NS", 32, unpackF32NS},
{"F", "F;32F", 32, unpackF32F},
{"F", "F;32BF", 32, unpackF32BF},
{"F", "F;32NF", 32, unpackF32NF},
#ifdef FLOAT64
{"F", "F;64F", 64, unpackF64F},
{"F", "F;64BF", 64, unpackF64BF},
{"F", "F;64NF", 64, unpackF64NF},

See also https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes.

It would be rather big change to abandon this idea in order to ensure that all of NumPy's modes could be roundtripped through Pillow.

@kmilos
Copy link
Contributor

kmilos commented Nov 25, 2021

I guess the only potential "bug" here is the uint32 -> int32 conversion, there should be no reason not to preserve this?

@radarhere
Copy link
Member

@kmilos are you saying that all "I" modes should go back to NumPy as uint32? So in the context of this issue, that would also mean int8, int16 and int32 would also become uint32.

@kmilos
Copy link
Contributor

kmilos commented Nov 25, 2021

are you saying that all "I" modes should go back to NumPy as uint32?

No, just that there is a potential data loss/(or wraparound) in just uint32->int32 specifically, i.e. Pillow might be missing something like an "U" mode or "I;32U" modifier, or another way to indicate to Numpy how to cast internal int32 back.

@radarhere
Copy link
Member

radarhere commented Mar 29, 2023

The "I" mode has a range of int32, so it makes sense to me for that to become int32 when converted to NumPy.

You specifically called out uint32 -> int32, but I think the same argument would apply to float64 -> float32. There might indeed be data loss - but in the conversion from NumPy to Pillow in the first place, because the original NumPy types have a higher maximum than the Pillow modes.

Pillow mostly doesn't retain the information about the rawmode that an image had. Once the information is loaded, it might be transformed. A user might decide to scale the values down, given that int32 has a lower maximum value than uint32. If the user later converted that back to NumPy, they might be surprised that Pillow had remembered the original format of the data. I don't think we should obscure the fact that the range of the values decreased in this process.

Also, just to be clear - this issue has nothing to do with the TIFF format, but just concerns the interface between NumPy and Pillow.

@radarhere radarhere changed the title Incorrect SampleFormat and BitsPerSample for TIFF Incorrect SampleFormat and BitsPerSample Mar 29, 2023
@kmilos
Copy link
Contributor

kmilos commented Mar 29, 2023

You specifically called out uint32 -> int32, but I think the same argument would apply to float64 -> float32.

It's not 100% the same - the former has to deal w/ sign and wraparound, the latter doesn't (only precision and range).

In any case, yes, the NumPy <-> Pillow interface should be transparently documented with all of its quirks and gotchas.

@radarhere
Copy link
Member

I've created PR #7049 to resolve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants