Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of rio-merge #507

Open
sgillies opened this issue Oct 26, 2015 · 4 comments
Open

Improve performance of rio-merge #507

sgillies opened this issue Oct 26, 2015 · 4 comments

Comments

@sgillies
Copy link
Member

As it says at https://github.com/mapbox/rasterio/blob/master/rasterio/tools/merge.py#L123-L125, the current approach uses the maximum amount of memory to solve the problem. We could trade more I/O for reduced memory by operating on windows of the dataset.

@sgillies sgillies added this to the pre 1.0 milestone Oct 26, 2015
@perrygeo
Copy link
Contributor

There's always going to be a tradeoffs between the in-memory and on-disk approaches. It would be great to make this configurable so the user could make the decision based on their knowledge of the problem (available memory, file formats, disk speed, etc). As a documentation task, we could develop a narrative explaining when each would be appropriate.

The other issue is that, even if we window the src reads, we'd still have to store dest in memory which by definition will have the biggest memory footprint. We could modify the function to write to a destination file on every iteration and not return the array. This would change the function signature a bit; maybe we need a separate function merge_disk?

@perrygeo perrygeo modified the milestones: pre 1.0, post 1.0 Mar 7, 2016
@perrygeo
Copy link
Contributor

perrygeo commented Mar 8, 2016

Here's an implementation of rio-merge that implements the windowed-disk approach: https://github.com/mapbox/rio-merge-rgba

It's optimized for internally tiled RGBA rasters. The README explains how this differs from the current rio merge, why it was implemented as a separate plugin and the challenges to applying the same approach to rio merge.

TLDR; the use of masked reads (read(..., masked=True)) is not IO efficient and since, IO is the limiting factor in the performance of this windowed-disk approach, we can instead rely on the alpha band as the source of nodata for a huge performance boost. Of course that makes it less general and somewhat distinct from rio-merge.

@wckoeppen
Copy link

wckoeppen commented May 3, 2019

Just curious, any movement on the windowed reads into merge from https://github.com/mapbox/rasterio/blob/c2df12979a5e07f96f108b0be8329e79fe950532/rasterio/merge.py#L142-L146 ?

We're currently attempting to use rasterio.merge within dask, but the input files are compressed tiffs that can reach 7 Gb in memory (each), whereas our output window is relatively small.

@sgillies
Copy link
Member Author

sgillies commented May 3, 2019

@wckoeppen no movement. I would have closed this issue if it were resolved or you would see other issues pointing here if there was any discussion.

@snowman2 snowman2 modified the milestones: post 1.0, post-1.3 Nov 18, 2022
@sgillies sgillies removed this from the post-1.3 milestone Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants