GDAL GetMetadataItem('BLOCK_SIZE_%d_%d' % (c, r), 'TIFF') #1077

Closed
ruitaodong opened this issue Jun 20, 2017 · 14 comments

Comments

@ruitaodong

ruitaodong commented Jun 20, 2017

I have been using GDAL/QGIS (a carry-over from my remote sensing days) for deep learning in digital pathology. We primarily deal with JPEG tiles in TIFF pyramid images. I have since found rasterio, which is much more convenient. However, one feature I need is access to the compressed JPEG length of each block (as reported by tiffinfo -s), which is a good indicator of empty or nearly empty space. In GDAL, it is

GetRasterBand(1).GetMetadataItem('BLOCK_SIZE_%d_%d' % (c, r), 'TIFF')

However, it doesn't show up in rasterio's tags(), tags(1), or tags(ns='TIFF').

I'm using rasterio version 0.31.0.
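As a rough illustration of how those per-block byte counts could flag empty or nearly empty tiles, here is a sketch that assumes the sizes have already been collected into a dict keyed by (column, row). The cutoff (a fraction of the median block size) and the helper name `nearly_empty_blocks` are hypothetical, and apart from 4135 the sample sizes are made up.

```python
from statistics import median


def nearly_empty_blocks(block_sizes, fraction=0.1):
    """Return the (col, row) keys whose compressed size is unusually small.

    block_sizes: dict mapping (col, row) -> compressed block size in bytes.
    fraction: hypothetical cutoff, relative to the median block size.
    """
    if not block_sizes:
        return []
    cutoff = fraction * median(block_sizes.values())
    # Blocks that compress to far fewer bytes than a typical block are
    # likely to contain little detail (e.g. empty background).
    return sorted(key for key, size in block_sizes.items() if size < cutoff)


sizes = {(0, 0): 4135, (1, 0): 300, (0, 1): 5000, (1, 1): 4800}
print(nearly_empty_blocks(sizes))  # [(1, 0)]
```

A relative cutoff is used because absolute JPEG sizes depend on tile size and quality settings, which vary between slides.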

@sgillies
Member

@ruitaodong Could you try tags(1, ns='TIFF')? If that doesn't work could you link me to a small JPEG-in-TIFF image I can use to debug?

Rasterio's tags() method has changed since 0.31 (we're at 1.0a9 now, getting close to 1.0), so an upgrade is also worth trying.

@ruitaodong
Author

Sorry, I closed this by mistake.
Sample image: 1M01.zip

rasterio.open('1M01.tif').tags(1, 'TIFF')
# returns {}

What's interesting is that with GDAL:

from osgeo import gdal
slide = gdal.Open('1M01.tif')
slide.GetRasterBand(1).GetMetadataItem('BLOCK_SIZE_%d_%d' % (0, 0), 'TIFF')
# returns '4135', but
slide.GetRasterBand(1).GetMetadata_Dict('TIFF')
# returns {}
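Until tags() exposes these items, the per-item call above can be looped over the whole block grid. A minimal sketch, assuming `band` is an `osgeo.gdal.Band` (any object with `GetBlockSize()`, `XSize`, `YSize` and `GetMetadataItem(name, domain)` works) and that the item names follow the `BLOCK_SIZE_<column>_<row>` pattern used above. `tiff_block_sizes` is a hypothetical helper name.

```python
def tiff_block_sizes(band):
    """Collect BLOCK_SIZE_<col>_<row> values from a band's TIFF metadata domain.

    Returns a dict mapping (col, row) -> compressed block size in bytes.
    """
    block_x, block_y = band.GetBlockSize()
    # Ceiling division: partial blocks on the right/bottom edges still count.
    n_cols = (band.XSize + block_x - 1) // block_x
    n_rows = (band.YSize + block_y - 1) // block_y
    sizes = {}
    for r in range(n_rows):
        for c in range(n_cols):
            value = band.GetMetadataItem('BLOCK_SIZE_%d_%d' % (c, r), 'TIFF')
            # GDAL may return None (e.g. GDAL < 2.0, corrupted files or
            # unexpected indices), so only keep usable values.
            if value is not None:
                sizes[(c, r)] = int(value)
    return sizes
```

With GDAL installed, something like `tiff_block_sizes(gdal.Open('1M01.tif').GetRasterBand(1))` should return a dict keyed by `(column, row)`.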

@sgillies
Member

@ruitaodong that is interesting. Rasterio is using the same GDAL function as GetMetadata_Dict(). This may be a GDAL bug.

@sgillies
Member

Upstream bug report: https://trac.osgeo.org/gdal/ticket/6961. As a workaround, we could call GetMetadataItem() to get TIFF metadata when ns='TIFF'.

@ruitaodong
Author

@sgillies, good catch. Thanks for looking into it and thanks for rasterio.

@sgillies
Member

@ruitaodong Even Rouault pointed out in the GDAL tracker that dumping all the block sizes (possibly hundreds or thousands) into a dict might not be user-friendly. I agree with that. I feel like we should be able to do better than this:

slide.GetRasterBand(1).GetMetadataItem('BLOCK_SIZE_%d_%d' % (0, 0), 'TIFF')

Perhaps a new dataset method like

def block(self, bidx, i, j)

could return something like

{'window': Window(...), 'size': 8*256*256}

Such a method would also serve users who want to get the window (indexes) of a single block.
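A minimal sketch of the window half of such a method, written as a standalone function rather than a dataset method. It assumes `i` is the block row and `j` the block column, that blocks form a regular grid of `block_height` x `block_width`, and that edge blocks are clipped to the raster bounds. `Window` here is a stand-in namedtuple using rasterio's (col_off, row_off, width, height) field order, not rasterio's own class, and `block_window` is a hypothetical name.

```python
from collections import namedtuple

# Stand-in for rasterio.windows.Window; the field order mirrors rasterio's.
Window = namedtuple('Window', ['col_off', 'row_off', 'width', 'height'])


def block_window(i, j, block_height, block_width, height, width):
    """Window of block (i, j), where i is the block row and j the block column."""
    row_off = i * block_height
    col_off = j * block_width
    if not (0 <= row_off < height and 0 <= col_off < width):
        raise IndexError('block (%d, %d) is outside the raster' % (i, j))
    # Blocks on the right and bottom edges can extend past the raster,
    # so clip them to the dataset's dimensions.
    return Window(col_off, row_off,
                  min(block_width, width - col_off),
                  min(block_height, height - row_off))


print(block_window(1, 2, 256, 256, 300, 600))
# Window(col_off=512, row_off=256, width=88, height=44)
```

Raising on out-of-range indices, rather than returning an empty window, mirrors the point that wrong indices should not silently produce a result.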

sgillies added this to To Do in Rasterio 1.0.0 alpha Jul 12, 2017
sgillies added this to the 1.0a10 milestone Jul 12, 2017
@ruitaodong
Author

I think that makes sense. I know that my use case is very specific. However, it is a little odd, since BLOCK_SIZE is not actually per band: it is the compressed JPEG size (minus the Huffman table size) of the whole tile (YCbCr in my case).

@rouault
Contributor

rouault commented Jul 13, 2017

In the general case, BLOCK_SIZE is a per-band property. It depends on the value of the TIFF PlanarConfig tag (the INTERLEAVE metadata item in GDAL). If PlanarConfig=Separate (INTERLEAVE=BAND), there is indeed one block for each band. If PlanarConfig=Contig (INTERLEAVE=PIXEL), the block is shared by all bands. So in the general case, it is safer to ask at the band level.
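To make the PlanarConfig distinction concrete, here is a small sketch that maps a (band, block) request to the physical TIFF block it reads. It assumes GDAL's 1-based band numbers and the two INTERLEAVE values described above (in GDAL the value can be read with `ds.GetMetadataItem('INTERLEAVE', 'IMAGE_STRUCTURE')`). `physical_block` is a hypothetical helper: with PIXEL interleaving every band resolves to the same key, so querying band 1 is enough, while with BAND interleaving each band has its own block and size.

```python
def physical_block(interleave, bidx, col, row):
    """Return a hashable key for the TIFF block holding band `bidx` at (col, row).

    interleave: 'PIXEL' (PlanarConfig=Contig) or 'BAND' (PlanarConfig=Separate).
    Equal keys mean the requests read the same physical block.
    """
    interleave = interleave.upper()
    if interleave == 'PIXEL':
        # Contiguous: one block stores all bands, so the band is irrelevant.
        return (col, row)
    if interleave == 'BAND':
        # Separate: each band stores its own block.
        return (bidx, col, row)
    raise ValueError('unexpected INTERLEAVE value: %r' % interleave)


keys = {physical_block('PIXEL', b, 0, 0) for b in (1, 2, 3)}
print(len(keys))  # 1
```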

sgillies pushed a commit that referenced this issue Jul 13, 2017
block() returns the window and size in bytes of a TIFF block. The
size is undefined for other formats and will be None.

Resolves #1077
@sgillies
Member

@ruitaodong I've made an attempt at the new feature in #1085, if you want to take a look.

@sgillies
Member

Hmm. This code is crashing the Travis CI servers for GDAL versions < 2. I must have overlooked a version requirement.

@rouault
Contributor

rouault commented Jul 13, 2017

This is definitely a new feature in GDAL 2.0. But in any case, you should be ready to accept NULL as a return value. That might happen with corrupted TIFFs or if wrong indices are provided.
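A sketch of the two guards discussed here, kept free of any GDAL import so it can be tested on its own. The first compares a GDAL version number string (such as the value returned by `gdal.VersionInfo('VERSION_NUM')`, e.g. '2010000' for 2.1.0) against 2.0. The second turns a possibly-NULL metadata value (None in Python) into an int or None. Both function names are hypothetical.

```python
GDAL_2_0 = 2000000  # GDAL_COMPUTE_VERSION(2, 0, 0) = 2 * 1000000


def gdal_has_block_size(version_num):
    """True if a GDAL VERSION_NUM string is at least 2.0.0."""
    return int(version_num) >= GDAL_2_0


def parse_block_size(value):
    """Convert a BLOCK_SIZE_<x>_<y> metadata value to an int, or None.

    GDAL can return NULL for wrong indices or corrupted TIFFs; treat that,
    and any non-numeric string, as "unknown".
    """
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None
```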

@ruitaodong
Author

I tried both fed3bca and master (Ubuntu 16.04 with GDAL 2.1.0), but I got:

    import rasterio
  File "/home/rdong/.local/lib/python2.7/site-packages/rasterio-1.0a10-py2.7-linux-x86_64.egg/rasterio/__init__.py", line 16, in <module>
    from rasterio._base import gdal_version
  File "rasterio/_base.pyx", line 31, in init rasterio._base (rasterio/_base.c:22443)
  File "/home/rdong/.local/lib/python2.7/site-packages/rasterio-1.0a10-py2.7-linux-x86_64.egg/rasterio/windows.py", line 410, in <module>
    @attr.s(slots=True)
TypeError: attributes() got an unexpected keyword argument 'slots'

@sgillies
Member

sgillies commented Jul 14, 2017

@ruitaodong I suspect that you need to upgrade attrs. I've modified my branch to require attrs >= 16.0.0, the version where slots were introduced. I've also guarded against NULL results and am pleased with how this feature turned out. Look out for my switch to a named tuple (with window and size attributes) for the return value of block().
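A rough sketch of what a named-tuple return value for block() might look like. The `BlockInfo` name, the plain-tuple window and the `fetch_size` callable are hypothetical illustrations, not the code in the branch. The size is None whenever no byte count is available, matching the commit message's note that it is undefined for formats other than TIFF.

```python
from collections import namedtuple

# Hypothetical return type: a window plus the block's size in bytes (or None).
BlockInfo = namedtuple('BlockInfo', ['window', 'size'])


def make_block_info(window, driver, fetch_size):
    """Build a BlockInfo, looking up the size only for GeoTIFF datasets.

    fetch_size: zero-argument callable returning the raw metadata string
    or None, e.g. a wrapper around GDAL's GetMetadataItem.
    """
    size = None
    if driver == 'GTiff':
        raw = fetch_size()
        # Guard against NULL (None) results from GDAL.
        size = int(raw) if raw is not None else None
    return BlockInfo(window=window, size=size)


info = make_block_info((0, 0, 256, 256), 'GTiff', lambda: '4135')
print(info.size)  # 4135
```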

sgillies moved this from To Do to Doing in Rasterio 1.0.0 alpha Jul 14, 2017
@ruitaodong
Author

@sgillies I finally tested fed3bca (in a Docker container) and block() worked as expected. Thanks!


3 participants