Iterate over blocks #6

AsgerPetersen · 2013-11-25T17:21:30Z

It would be a very cool feature if rasterio could transparently handle iteration over blocks. Almost all my projects involve raster files that are too large to load into memory.

It could maybe be something like:

with rasterio.open('input.tif') as src:
  kwargs = src.meta
  with rasterio.open('output.tif', 'w', **kwargs) as dst:
    # Iterate over raster blocks from band 1
    for block, data in src.band_blocks(1):
      # Block holds xoffset, yoffset, width, height
      data = data + 1
      dst.write_band(1, data, block)

sgillies · 2013-11-26T03:36:59Z

Using ReadBlock and WriteBlock as in http://www.gdal.org/classGDALRasterBand.html#a09e1d83971ddff0b43deffd54ef25eef?

A major concern for me is how RasterIO and Block access can't be mixed, explained in http://osgeo-org.1560.x6.nabble.com/gdal-dev-Best-practices-for-concurrent-writes-with-GDAL-was-RasterIO-in-paralel-td3746104.html#a3746106, and how to protect programmers from mixing them.

AsgerPetersen · 2013-11-26T07:59:40Z

I was thinking more along the lines of using windowed read/write and just having a convenient way to align the processing chunks with the block size of the source raster.

This is not as fast as using block indexes directly, but it is significantly faster than making badly aligned reads, and it should prevent mistakes as you mention.

Moreover, this approach will always work regardless of differences in source and destination layout (for instance going from a tiled source to a scanline destination). At worst it will perform about as badly as choosing your own chunks.
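To make the alignment argument concrete, here is a rough sketch that counts how many native blocks a window overlaps. The function name `blocks_touched` and the tuple layouts are illustrative assumptions, not part of rasterio or GDAL; the point is only that a badly aligned window forces the driver to fetch more blocks than an aligned one of the same size.

```python
def blocks_touched(window, block_shape):
    """Count the native blocks that a window overlaps.

    window is (xoffset, yoffset, width, height) in pixels and
    block_shape is (block_width, block_height). Hypothetical helper,
    not a rasterio function.
    """
    xoff, yoff, width, height = window
    bw, bh = block_shape
    # Index of the first and last block column/row covered by the window.
    first_col, last_col = xoff // bw, (xoff + width - 1) // bw
    first_row, last_row = yoff // bh, (yoff + height - 1) // bh
    return (last_col - first_col + 1) * (last_row - first_row + 1)


# Same window size, 256x256 tiles: aligned vs. shifted by half a tile.
aligned = blocks_touched((0, 0, 256, 256), (256, 256))
shifted = blocks_touched((128, 128, 256, 256), (256, 256))

# A tile-shaped window against a scanline layout (one row per block).
scanline = blocks_touched((0, 0, 256, 256), (1024, 1))
```

Under these assumptions the shifted window overlaps four tiles where the aligned one overlaps a single tile, and the tile-shaped window against a scanline layout overlaps one block per row.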

AsgerPetersen · 2013-11-26T09:25:36Z

Basically I want to be able to process my rasters in chunks which should match the block layout from the source raster. Preferably without having to know too much about the "block" concept from GDAL.

sgillies · 2013-11-26T15:13:04Z

Don't take this the wrong way, anyone, but I'm going to prune away off-topic or tangential comments in this repo.

sgillies · 2013-11-26T15:35:38Z

Thanks, @AsgerPetersen, I think I get it.

In your example above, I see block reads and a windowed write. Is that deliberate? It could be a good usage pattern: block writing is tricky and, as you said, requires an understanding of GDAL internals, while block reading is more forgiving: just iterate over blocks as the driver gives them to you.

AsgerPetersen · 2013-11-27T20:14:55Z

Yes, I think windowed write is the way to go. I am a little unsure whether it could maybe be a good idea to read the data out using windowed reads also.

With something like these methods (completely untested) exposed on RasterReader:

from math import ceil

@property
def block_shapes(self):
        """Returns an ordered list of (xsize, ysize) block shapes for all bands"""
        cdef void *hband = NULL
        cdef int xsize, ysize
        if not self._block_shapes:
            if not self._hds:
                raise ValueError("Can't read closed raster file")
            for i in range(self._count):
                hband = _gdal.GDALGetRasterBand(self._hds, i+1)
                # GDALGetBlockSize writes through its pointer arguments
                _gdal.GDALGetBlockSize(hband, &xsize, &ysize)
                self._block_shapes.append( (xsize, ysize) )
        return self._block_shapes

def get_chunks(self, bandix, chunk_shape = None):
        # block_shape is (xsize, ysize): index 0 runs along x, index 1 along y
        block_shape = chunk_shape or self.block_shapes[bandix - 1]
        cols = int(ceil(float(self.width) / block_shape[0]))
        rows = int(ceil(float(self.height) / block_shape[1]))
        for c in range(cols):
            xoffset = c * block_shape[0]
            width = min(block_shape[0], self.width - xoffset)
            for r in range(rows):
                yoffset = r * block_shape[1]
                height = min(block_shape[1], self.height - yoffset)
                yield (xoffset, yoffset, width, height)

Which would then be used like this:

with rasterio.open('source.tif') as src:
    kwargs = src.meta
    with rasterio.open('destination.tif', 'w', **kwargs) as dst:
        for window in src.get_chunks( 1 ):
            data = src.read_band( 1, window)
            dst.write_band( 1, data, window)

If one wants to process larger chunks, say 2x2 blocks at a time:

with rasterio.open('source.tif') as src:
    kwargs = src.meta
    with rasterio.open('destination.tif', 'w', **kwargs) as dst:
        chunksize = [2 * x for x in src.block_shapes[0]]  # block shape of band 1
        for window in src.get_chunks( 1, chunksize ):
            data = src.read_band( 1, window)
            dst.write_band(1, data, window)

sgillies · 2013-12-09T23:22:08Z

That interface looks almost exactly right to me. For a start, let's just have the native blocks as (offsetx, offsety, width, height) tuples from a src.blocks iterator. I'd rather we aggregate them (for the 2x2 case) using an itertools-ish grouping approach.

for double_blocks in double_the_blocks(src.blocks(1)):
    ...

But if this won't work a chunksize kwarg would be okay. How about window as a kwarg?

    for block in src.blocks(1):
        data = src.read_band(1, window=block)
        dst.write_band(1, data, window=block)
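For the grouping idea, one possible shape for `double_the_blocks` is sketched below. It is only an illustration: the signature (taking the native block shape and a `factor` argument) is an assumption, it works on plain `(offsetx, offsety, width, height)` tuples, and it buffers all blocks before yielding rather than being fully lazy, because row-major iteration does not deliver the four members of a 2x2 group consecutively.

```python
def double_the_blocks(blocks, block_shape, factor=2):
    """Merge native block windows into factor x factor super-blocks.

    blocks yields (offsetx, offsety, width, height) tuples and
    block_shape is the native (block_width, block_height).
    Hypothetical helper; not part of rasterio.
    """
    bw, bh = block_shape
    groups = {}  # insertion-ordered in Python 3.7+
    for xoff, yoff, width, height in blocks:
        # Every block belongs to exactly one super-block cell.
        key = (yoff // (bh * factor), xoff // (bw * factor))
        x0, y0, x1, y1 = groups.get(
            key, (xoff, yoff, xoff + width, yoff + height))
        # Grow the bounding box of the cell to include this block.
        groups[key] = (min(x0, xoff), min(y0, yoff),
                       max(x1, xoff + width), max(y1, yoff + height))
    for x0, y0, x1, y1 in groups.values():
        yield (x0, y0, x1 - x0, y1 - y0)


# A 10x10 raster with 4x4 native blocks, listed in row-major order.
native = [(x, y, min(4, 10 - x), min(4, 10 - y))
          for y in range(0, 10, 4) for x in range(0, 10, 4)]
merged = list(double_the_blocks(native, (4, 4)))
```

In this toy case the nine native blocks collapse into four windows, with the edge windows clipped to the raster extent.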

See #6.

AsgerPetersen · 2013-12-10T08:25:56Z

I think your approach with the "itertools-ish grouping" gives a cleaner and more understandable interface.

The block concept is difficult for people who are not raster aficionados, and having a "chunksize" param on a "blocks" method will only make things more confusing.

sgillies · 2013-12-10T17:04:11Z

Great. While we're discussing blocks, I've got a half-formed idea that a function returning the block window for a given pixel coordinate would be helpful. Yes? No?
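Such a lookup could be as small as the sketch below. The name `block_window_for_pixel` and its arguments are assumptions made for illustration, not an existing rasterio function; it snaps a pixel coordinate down to the origin of its containing block and clips the block at the raster edge.

```python
def block_window_for_pixel(col, row, raster_size, block_shape):
    """Return the (offsetx, offsety, width, height) block containing a pixel.

    raster_size is (width, height) and block_shape is
    (block_width, block_height). Hypothetical helper for illustration.
    """
    width, height = raster_size
    bw, bh = block_shape
    if not (0 <= col < width and 0 <= row < height):
        raise IndexError("pixel is outside the raster")
    # Snap the pixel down to the origin of its block.
    xoff = (col // bw) * bw
    yoff = (row // bh) * bh
    # Blocks on the right and bottom edges may be partial.
    return (xoff, yoff, min(bw, width - xoff), min(bh, height - yoff))


interior = block_window_for_pixel(300, 10, (1000, 600), (256, 256))
corner = block_window_for_pixel(999, 599, (1000, 600), (256, 256))
```

The returned tuple has the same layout as the block tuples discussed above, so it could be passed straight to a windowed read.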

This is done by extending the existing read_band() and write_band() methods. The windows that can be read and written most efficiently are the ones that come from the new block_windows property. Closes #6.


sgillies · 2014-02-14T03:05:13Z

@AsgerPetersen Here's a happy customer https://twitter.com/bryanluman/status/434056951639470080 and his code: https://gist.github.com/bryanluman/8983536.

AsgerPetersen mentioned this issue Nov 25, 2013

Allow windowed reads/writes #7

Closed

sgillies added a commit that referenced this issue Dec 10, 2013

Test of access to a dataset's blocks.

b1149ce

See #6.

sgillies added a commit that referenced this issue Dec 10, 2013

Block shapes and a block window iterator.

7239f9e

See #6.

sgillies closed this as completed in fb8af33 Dec 10, 2013

RutgerK mentioned this issue Dec 13, 2013

Directly support multiple inputs and outputs #14

Closed

28raining mentioned this issue Mar 31, 2023

Using rasterio in Docker on a mac fails with "No such file or directory: 'gdal-config' #2801

Closed
