Three TIFF tags cause display issues in GIS software #5838

Closed
XrioBtw opened this issue Nov 17, 2021 · 9 comments · Fixed by #5839

@XrioBtw

XrioBtw commented Nov 17, 2021

When I run my GeoTIFF through Pillow while trying to preserve the TIFF info, I end up with 3 additional TIFF tags in the output (StripOffsets, RowsPerStrip, StripByteCounts). The resulting output GeoTIFF is displayed incorrectly in the GIS software ESRI ArcMap (all cell values are now NoData), but is displayed correctly with Pillow or Matplotlib. The input GeoTIFF image was produced with ESRI ArcMap and does not have the 3 extra TIFF tags.

This could of course be an issue with ESRI ArcMap, but I believe the 3 Pillow generated TIFF tags might be wrong. They show up as:

StripOffsets = (810,)
RowsPerStrip = (40,)
StripByteCounts = (6400,)

Input GeoTIFF:
dem_input.zip

from PIL import Image

EXTRA_TIFF_TAGS = {"StripOffsets" : 273, "RowsPerStrip" : 278, "StripByteCounts" : 279}

image_input = Image.open("dem_input.tif")
print([TAG in image_input.tag._tagdata for TAG in EXTRA_TIFF_TAGS.values()])   # [False, False, False]

image_input.save("dem_output.tif", tiffinfo = image_input.tag)

image_output = Image.open("dem_output.tif")
print([TAG in image_output.tag._tagdata for TAG in EXTRA_TIFF_TAGS.values()])   # [True, True, True]

print([image_output.tag[TAG] for TAG in EXTRA_TIFF_TAGS.values()])   # [(810,), (40,), (6400,)]

@wiredfool
Member

I suspect that you're much more likely to be losing the geo metadata or converting data types than seeing those tags cause an error.

I'd recommend checking the metadata with gdalinfo. And generally, depending on what you're doing, gdal (command line/python) or rasterio (better python bindings to gdal) will probably be better for interfacing with rasters.

radarhere added the TIFF label Nov 17, 2021
@radarhere
Member

radarhere commented Nov 17, 2021

When I use gdalinfo, the results are identical on the input and the output images.

What Pillow version are you using?

@XrioBtw
Author

XrioBtw commented Nov 18, 2021

I am using version 8.4.0:

>>> import PIL
>>> PIL.__version__
'8.4.0'

I can see that there is quite a large size difference between my input DEM and output DEM, so something is happening (66975 bytes vs 7210 bytes). It just seems a bit weird to me that the output has not lost any tags and has gained three new ones, but is way smaller in size. Here is the output produced by the earlier code:
dem_output.zip

I am aware that gdal is better suited for GeoTIFF, but after some testing of different rasters I was surprised to see that I was able to retain almost all TIFF tags to preserve a working GeoTIFF raster after using Pillow. For me personally it would be really convenient for my preferred workflow if all TIFF tags were correctly retained, but I guess that is far from the focus of Pillow?

@radarhere
Member

Part of the problem is that StripOffsets, RowsPerStrip and StripByteCounts are not data that just happens to also be in the same file as the image - they are instructions for how to read the image.

https://www.awaresystems.be/imaging/tiff/tifftags/rowsperstrip.html

TIFF image data can be organized into strips for faster random access and efficient I/O buffering.

Pillow would be using strips with the intention to create a better image.

You might suggest that Pillow shouldn't write images in strips if the supplied tags don't mention it. But what if a user had manually created a list of a few TIFF tags they wanted saved, and hadn't even thought about strips? We would then be punishing them by saving a less-optimal image.

@kmilos
Contributor

kmilos commented Nov 18, 2021

These three tags are actually mandatory for strip-based TIFF-compliant images, even if you have just the one strip.

Your original image is tile-based, and has these mandatory tags instead:

TileWidth                       : 128
TileLength                      : 128
TileOffsets                     : 674
TileByteCounts                  : 65536

Normally TIFF-compliant readers should be able to deal with both tile-based and strip-based files.
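
To make the strip/tile distinction concrete, here is a small sketch that classifies a set of tags by the layout they declare. It is only an illustration, not how Pillow or any reader decides: `describe_layout` is a hypothetical helper, the tag numbers are the standard TIFF 6.0 ones, and the sample dictionaries contain only the layout tags quoted in this thread (stored as tuples, which is an assumption about representation).

```python
# Standard TIFF 6.0 tag numbers for the two image data layouts.
STRIP_TAGS = {273: "StripOffsets", 278: "RowsPerStrip", 279: "StripByteCounts"}
TILE_TAGS = {322: "TileWidth", 323: "TileLength", 324: "TileOffsets", 325: "TileByteCounts"}


def describe_layout(tags):
    """Classify a mapping of tag number -> value by the layout tags it carries.

    Hypothetical helper for illustration only. It accepts any mapping keyed by
    tag number, for example a plain dict.
    """
    has_strips = any(tag in tags for tag in STRIP_TAGS)
    has_tiles = any(tag in tags for tag in TILE_TAGS)
    if has_strips and has_tiles:
        return "mixed"  # both sets present: ambiguous for a reader
    if has_tiles:
        return "tiled"
    if has_strips:
        return "stripped"
    return "unknown"


# Layout tags of the original file, as listed above.
original = {322: (128,), 323: (128,), 324: (674,), 325: (65536,)}
# A file carrying only the three strip tags from the first comment.
strip_only = {273: (810,), 278: (40,), 279: (6400,)}

print(describe_layout(original))    # tiled
print(describe_layout(strip_only))  # stripped
```

With Pillow installed, the same helper should also accept `dict(image.tag_v2)` for an opened TIFF, since that object is a mapping keyed by tag number.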

One other thing: your code keeps the original tile-based tags, which should be removed when writing as strip-based, try that. The way it is, you end up with an "invalid" TIFF file.
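
A minimal sketch of that suggestion, for Pillow versions that do not drop the tile tags on their own: copy every tag except the four tile-layout ones into a fresh tag directory and pass that as `tiffinfo`. The helper names `without_tile_tags` and `save_as_strips` are hypothetical, and the Pillow part has not been run against `dem_input.tif`, so treat it as a starting point rather than a verified fix.

```python
# Tile-layout tag numbers from the TIFF 6.0 specification.
TILE_TAGS = {
    322: "TileWidth",
    323: "TileLength",
    324: "TileOffsets",
    325: "TileByteCounts",
}


def without_tile_tags(tags):
    """Return a plain dict copy of `tags` minus the tile-layout entries."""
    return {tag: value for tag, value in tags.items() if tag not in TILE_TAGS}


def save_as_strips(src_path, dst_path):
    """Re-save a tiled TIFF with Pillow, leaving the tile tags behind.

    Hypothetical helper; Pillow is imported lazily so the filtering function
    above can be used without it.
    """
    from PIL import Image, TiffImagePlugin

    with Image.open(src_path) as im:
        source = im.tag_v2
        ifd = TiffImagePlugin.ImageFileDirectory_v2()
        for tag, value in without_tile_tags(source).items():
            # Carry over the original field type (when known) before the value,
            # so Pillow does not have to guess it again.
            if tag in source.tagtype:
                ifd.tagtype[tag] = source.tagtype[tag]
            ifd[tag] = value
        im.save(dst_path, tiffinfo=ifd)


# The filtering step on its own, using the tile tags listed above plus one
# unrelated tag (256 is ImageWidth):
print(without_tile_tags({256: 40, 322: 128, 323: 128, 324: (674,), 325: (65536,)}))
# {256: 40}
```

Usage would be `save_as_strips("dem_input.tif", "dem_output.tif")`; whether ArcMap then displays the result correctly is not something this sketch can confirm.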

@XrioBtw
Author

XrioBtw commented Nov 18, 2021

"Pillow should set ROWSPERSTRIP to something more reasonable than im.size[1] or allow to specify the value. Photoshop fails when ROWSPERSTRIP is too large."

#2866 (comment)

Maybe this is of relevance?

@kmilos
Contributor

kmilos commented Nov 18, 2021

"Maybe this is of relevance?"

Nope, see above.

As for the size difference: tile-based writing always stores full tiles (128x128 in this case), padded with 0s, even for your 40x40 image. No padding is necessary for strip-based writing. (128x128) / (40x40) = 10.24, which is roughly the factor difference you're seeing.
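
The numbers quoted in this thread can be checked against each other with a few lines of arithmetic. This is only a sanity check on the reported values; the 4 bytes per sample is an assumption inferred from TileByteCounts (65536 = 128 x 128 x 4), not something stated in the thread.

```python
# Values reported in this thread.
IMAGE_WIDTH = IMAGE_HEIGHT = 40      # image size mentioned above
TILE_WIDTH = TILE_LENGTH = 128       # TileWidth / TileLength of the input
TILE_BYTE_COUNT = 65536              # TileByteCounts of the input
STRIP_OFFSET = 810                   # StripOffsets of the output
ROWS_PER_STRIP = 40                  # RowsPerStrip of the output
STRIP_BYTE_COUNT = 6400              # StripByteCounts of the output
OUTPUT_FILE_SIZE = 7210              # size of dem_output.tif in bytes

# Assumption: bytes per sample inferred from the tile byte count.
bytes_per_sample = TILE_BYTE_COUNT // (TILE_WIDTH * TILE_LENGTH)

# One padded tile versus one unpadded strip covering the whole image.
tile_bytes = TILE_WIDTH * TILE_LENGTH * bytes_per_sample
strip_bytes = IMAGE_WIDTH * ROWS_PER_STRIP * bytes_per_sample
ratio = tile_bytes / strip_bytes

# The single strip covers every row and ends exactly at the end of the file.
covers_all_rows = ROWS_PER_STRIP == IMAGE_HEIGHT
strip_ends_at_eof = STRIP_OFFSET + STRIP_BYTE_COUNT == OUTPUT_FILE_SIZE

print(bytes_per_sample)                  # 4
print(tile_bytes, strip_bytes, ratio)    # 65536 6400 10.24
print(strip_bytes == STRIP_BYTE_COUNT)   # True
print(covers_all_rows, strip_ends_at_eof)  # True True
```

So the three strip tags are at least internally consistent with the output file: one 40-row strip of 6400 bytes starting at offset 810 accounts for the full 7210 bytes.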

@kmilos
Contributor

kmilos commented Nov 18, 2021

Pillow should probably include a mechanism for automatic removal of tile-based tags if writing as strips.

@radarhere
Member

I've created PR #5839 to do that, resolving this.
