New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iterate over blocks #6
Comments
Using ReadBlock and WriteBlock as in http://www.gdal.org/classGDALRasterBand.html#a09e1d83971ddff0b43deffd54ef25eef? A major concern for me is how RasterIO and Block access can't be mixed, explained in http://osgeo-org.1560.x6.nabble.com/gdal-dev-Best-practices-for-concurrent-writes-with-GDAL-was-RasterIO-in-paralel-td3746104.html#a3746106, and how to protect programmers from mixing them. |
I was thinking more along the lines of using windowed read/write and just have a convenient way to align the processing chunks with the block size of the source raster. This is not as fast as using block indexes directly, but it is significantly faster than making badly aligned reads, and it should prevent mistakes as you mention. Moreover this approach will always work regardless of differences in source and destination layout (for instance going from a tiled source to a scanline destination). At worst it will perform more or less as bad as choosing you own chunks. |
Basically I want to be able to process my rasters in chunks which should match the block layout from the source raster. Preferably without having to know too much about the "block" concept from GDAL. |
Don't take this the wrong way, anyone, but I'm going to prune away off-topic or tangential comments in this repo. |
Thanks, @AsgerPetersen, I think I get it. In your example above, I see block reads and a windowed write. Is that deliberate? It could be a good usage pattern: block writing is tricky and as you said, requires understanding of GDAL internals, while block reading is more forgiving, just iterate over blocks as the driver gives them to you. |
Yes, I think windowed write is the way to go. I am a little unsure whether it could maybe be a good idea to read the data out using windowed reads also. With something like these methods (completely untested) exposed on RasterReader: def block_shapes(self):
"""Returns an ordered list of block shapes for all bands"""
cdef void *hband = NULL
cdef int *xsize, *ysize
if not self._block_shapes:
if not self._hds:
raise ValueError("Can't read closed raster file")
for i in range(self._count):
hband = _gdal.GDALGetRasterBand(self._hds, i+1)
_gdal.GDALGetBlockSize(hband, xsize, ysize)
self._block_shapes.append( (xsize, ysize) )
return self._block_shapes
def get_chunks(self, bandix, chunk_shape = None):
block_shape = chunk_shape or self.block_shapes[bandix - 1]
cols = int(ceil(float(self.width) / block_shape[0]))
rows = int(ceil(float(self.height / block_shape[1]))
for c in xrange(cols):
yoffset = c * block_shape[0]
height = min(block_shape[0], self.height - yoffset)
r in xrange(rows):
xoffset = r * block_shape[1]
width = min(block_shape[1], self.width - xoffset)
yield (xoffset, yoffset, width, height) Which would then be used like this with rasterio.open('source.tif') as src:
kwargs = src.meta
with rasterio.open('destination.tif', 'w', **kwargs) as dst:
for window in src.get_chunks( 1 ):
data = src.read_band( 1, window)
dst.write_band( 1, data, window) If one want larger to process 2x2 blocks at a time: with rasterio.open('source.tif') as src:
kwargs = src.meta
with rasterio.open('destination.tif', 'w', **kwargs) as dst:
chunksize = map(lambda x: x*2, src.block_shapes[bandix - 1])
for window in src.get_chunks( 1, chunksize ):
data = src.read_band( 1, window)
dst.write_band(1, data, window) |
That interface looks almost exactly right to me. For a start, let's just have the native blocks as (offsetx, offsety, width, height) tuples from a src.blocks iterator. I'd rather we aggregate them (for the 2x2 case) using an itertools-ish grouping approach.
But if this won't work a chunksize kwarg would be okay. How about window as a kwarg?
|
I think your approach with the "itertool-ish grouping" gives a cleaner and more understandable interface. The block concept is difficult for people who are not raster aficionados, and having a "chunksize" param on a "blocks" method will only make things more confusing. |
Great. While we're discussing blocks, I've got a half-formed idea that a function returning the block window for a given pixel coordinate would be helpful. Yes? No? |
This is done by extending the existing read_band() and write_band() methods. The windows that can be read and written most efficiently are the ones that come from the new block_windows property. Closes #6.
This is done by extending the existing read_band() and write_band() methods. The windows that can be read and written most efficiently are the 7nes that come from the new block_windows property. Closes #6.
@AsgerPetersen Here's a happy customer https://twitter.com/bryanluman/status/434056951639470080 and his code: https://gist.github.com/bryanluman/8983536. |
It would be a very cool feature if rasterio could transparently handle iteration over blocks. Almost all my project involves raster files which are too large to load into memory.
It could maybe be something like:
The text was updated successfully, but these errors were encountered: