Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterate over blocks #6

Closed
AsgerPetersen opened this issue Nov 25, 2013 · 10 comments
Closed

Iterate over blocks #6

AsgerPetersen opened this issue Nov 25, 2013 · 10 comments

Comments

@AsgerPetersen
Copy link
Contributor

It would be a very cool feature if rasterio could transparently handle iteration over blocks. Almost all my project involves raster files which are too large to load into memory.

It could maybe be something like:

with rasterio.open('input.tif') as src:
  kwargs = src.meta
  with rasterio.open('output.tif', 'w', **kwargs) as dst:
    # Iterate over raster blocks from band 1
    for block, data in src.band_blocks(1):
      # Block holds xoffset, yoffset, width, height
      data = data + 1
      dst.write_band(1, data, block)
@sgillies
Copy link
Member

Using ReadBlock and WriteBlock as in http://www.gdal.org/classGDALRasterBand.html#a09e1d83971ddff0b43deffd54ef25eef?

A major concern for me is how RasterIO and Block access can't be mixed, explained in http://osgeo-org.1560.x6.nabble.com/gdal-dev-Best-practices-for-concurrent-writes-with-GDAL-was-RasterIO-in-paralel-td3746104.html#a3746106, and how to protect programmers from mixing them.

@AsgerPetersen
Copy link
Contributor Author

I was thinking more along the lines of using windowed read/write and just have a convenient way to align the processing chunks with the block size of the source raster.

This is not as fast as using block indexes directly, but it is significantly faster than making badly aligned reads, and it should prevent mistakes as you mention.

Moreover this approach will always work regardless of differences in source and destination layout (for instance going from a tiled source to a scanline destination). At worst it will perform more or less as bad as choosing you own chunks.

@AsgerPetersen
Copy link
Contributor Author

Basically I want to be able to process my rasters in chunks which should match the block layout from the source raster. Preferably without having to know too much about the "block" concept from GDAL.

@sgillies
Copy link
Member

Don't take this the wrong way, anyone, but I'm going to prune away off-topic or tangential comments in this repo.

@sgillies
Copy link
Member

Thanks, @AsgerPetersen, I think I get it.

In your example above, I see block reads and a windowed write. Is that deliberate? It could be a good usage pattern: block writing is tricky and as you said, requires understanding of GDAL internals, while block reading is more forgiving, just iterate over blocks as the driver gives them to you.

@AsgerPetersen
Copy link
Contributor Author

Yes, I think windowed write is the way to go. I am a little unsure whether it could maybe be a good idea to read the data out using windowed reads also.

With something like these methods (completely untested) exposed on RasterReader:

def block_shapes(self):
        """Returns an ordered list of block shapes for all bands"""
        cdef void *hband = NULL
        cdef int *xsize, *ysize
        if not self._block_shapes:
            if not self._hds:
                raise ValueError("Can't read closed raster file")
            for i in range(self._count):
                hband = _gdal.GDALGetRasterBand(self._hds, i+1)
                _gdal.GDALGetBlockSize(hband, xsize, ysize)
                self._block_shapes.append( (xsize, ysize) )
        return self._block_shapes

def get_chunks(self, bandix, chunk_shape = None):
        block_shape = chunk_shape or self.block_shapes[bandix - 1]
        cols = int(ceil(float(self.width) / block_shape[0]))
        rows = int(ceil(float(self.height / block_shape[1]))
        for c in xrange(cols):
            yoffset = c * block_shape[0]
            height = min(block_shape[0], self.height - yoffset)
            r in xrange(rows):
                xoffset = r * block_shape[1]
                width = min(block_shape[1], self.width - xoffset)
                yield (xoffset, yoffset, width, height)

Which would then be used like this

with rasterio.open('source.tif') as src:
    kwargs = src.meta
    with rasterio.open('destination.tif', 'w', **kwargs) as dst:
        for window in src.get_chunks( 1 ):
            data = src.read_band( 1, window)
            dst.write_band( 1, data, window)

If one want larger to process 2x2 blocks at a time:

with rasterio.open('source.tif') as src:
    kwargs = src.meta
    with rasterio.open('destination.tif', 'w', **kwargs) as dst:
        chunksize = map(lambda x: x*2, src.block_shapes[bandix - 1])
        for window in src.get_chunks( 1, chunksize ):
            data = src.read_band( 1, window)
            dst.write_band(1, data, window)

@sgillies
Copy link
Member

sgillies commented Dec 9, 2013

That interface looks almost exactly right to me. For a start, let's just have the native blocks as (offsetx, offsety, width, height) tuples from a src.blocks iterator. I'd rather we aggregate them (for the 2x2 case) using an itertools-ish grouping approach.

for double_blocks in double_the_blocks(src.blocks(1)):
    ...

But if this won't work a chunksize kwarg would be okay. How about window as a kwarg?

    for block in src.blocks(1):
        data = src.read_band(1, window=block)
        dst.write_band(1, data, window=block)

sgillies added a commit that referenced this issue Dec 10, 2013
sgillies added a commit that referenced this issue Dec 10, 2013
@AsgerPetersen
Copy link
Contributor Author

I think your approach with the "itertool-ish grouping" gives a cleaner and more understandable interface.

The block concept is difficult for people who are not raster aficionados, and having a "chunksize" param on a "blocks" method will only make things more confusing.

@sgillies
Copy link
Member

Great. While we're discussing blocks, I've got a half-formed idea that a function returning the block window for a given pixel coordinate would be helpful. Yes? No?

sgillies pushed a commit that referenced this issue Dec 10, 2013
This is done by extending the existing read_band() and write_band()
methods. The windows that can be read and written most efficiently
are the ones that come from the new block_windows property.

Closes #6.
sgillies pushed a commit that referenced this issue Dec 10, 2013
This is done by extending the existing read_band() and write_band()
methods. The windows that can be read and written most efficiently
are the 7nes that come from the new block_windows property.

Closes #6.
@sgillies
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants