
Improving Space Efficiency for Zonal Analysis #17

Closed · ffrosch opened this issue Aug 12, 2021 · 2 comments


ffrosch commented Aug 12, 2021

Description

Idea

Space efficiency could be improved by reducing the size of landscape arrays, either with sparse arrays (compression) or by using references instead of copies (duplication avoidance).

Reason

Improving the space efficiency of landscape arrays would greatly improve the usability of the library for zonal analyses of large datasets with many regions.

E.g., I am using pylandstats on a large dataset with about 1100 regions and a raster of size 2600x2400. Zonal analysis creates a copy of the raster for each region, which leads to high memory consumption (about 12 GB in my case, and that only after some manual improvements such as choosing the smallest possible dtype).
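A rough back-of-the-envelope check, assuming one full-raster copy per region and the numbers above (actual usage also depends on dtype choice and per-object overhead):

```python
# Approximate memory cost of one full-raster copy per region.
rows, cols, n_regions = 2400, 2600, 1100
bytes_per_cell = 2  # e.g. int16 after choosing the smallest possible dtype
total_gib = rows * cols * n_regions * bytes_per_cell / 1024**3
print(f"~{total_gib:.1f} GiB")  # ~12.8 GiB, in line with the ~12 GB observed
```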

Possible improvements

  1. Using zonal analysis with a large set of regions means that each landscape has only a small fraction of non-null values. It should be possible to implement the landscape arrays as sparse arrays with the Python library sparse without many adjustments to the rest of the code (see the sketch after this list). I haven't tested it yet, though, and would appreciate an assessment of the feasibility and of possible problems with other parts of the code.
  2. Would it be possible to not copy the landscape for each region and instead keep a reference to the original landscape together with the mask for the region? On request, the array for the region could be computed on the fly from the mask and the original landscape and discarded after the computation.
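A rough, untested sketch of both ideas (array and variable names are illustrative only, not pylandstats internals), using the pydata sparse library for idea 1:

```python
import numpy as np
import sparse  # pydata/sparse

# Illustrative stand-ins for a landscape raster and a single region mask.
landscape_arr = np.random.randint(1, 5, size=(2400, 2600), dtype=np.int16)
region_mask = np.zeros(landscape_arr.shape, dtype=bool)
region_mask[100:200, 100:200] = True  # one small region of a large raster
nodata = 0

# Idea 1: store the per-region landscape as a sparse COO array, so the
# overwhelmingly null cells outside the region cost (almost) no memory.
region_dense = np.where(region_mask, landscape_arr, nodata)
region_sparse = sparse.COO.from_numpy(region_dense, fill_value=nodata)
print(region_dense.nbytes, region_sparse.nbytes)  # the sparse copy is far smaller

# Idea 2: keep a single shared landscape plus one boolean mask per region and
# materialize the masked array only on request, discarding it afterwards.
def region_array(landscape, mask, nodata=0):
    return np.where(mask, landscape, nodata)
```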

Let me know what you think about these suggestions; I might be able to submit a pull request if the change seems reasonable.

martibosch (Owner) commented

Hello @ffrosch,

sorry for the huge delay in my response; I have not had time to work on pylandstats for a while. Thank you for sharing your ideas. This was indeed a design error on my end that made using pylandstats for zonal analysis with a large number of zones practically impossible.

This should be fixed in v3.0.0rc0, where zones are defined by vector geometries (a geoseries) and the zone landscapes are instantiated for the zone bounds only (using rasterio's mask.mask with crop=True). The change is implemented in the ZonalAnalysis class, but its child classes (BufferAnalysis, ZonalGridAnalysis, and even the spatiotemporal implementations) operate accordingly.
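For illustration, this is roughly what the cropping approach looks like with rasterio's mask.mask (file names and geometries are placeholders, not pylandstats internals):

```python
import geopandas as gpd
import rasterio
import rasterio.mask

zones = gpd.read_file("zones.gpkg")  # placeholder vector file with zone geometries

with rasterio.open("landscape.tif") as src:
    # With crop=True only the window covering the zone's bounds is kept,
    # instead of a full-raster copy per zone.
    zone_arr, zone_transform = rasterio.mask.mask(
        src, [zones.geometry.iloc[0]], crop=True
    )
```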

You can install v3.0.0rc0 from PyPI or conda-forge and try whether it now works for your use case. Note that the API has changed a little: the new signature takes a positional zones argument right after the landscape file path (see https://pylandstats.readthedocs.io/en/latest/zonal.html). You can find an overview in the example notebook at https://github.com/martibosch/pylandstats-notebooks/blob/main/notebooks/03-zonal-analysis.ipynb.
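A minimal usage sketch of that signature (file and zone names are placeholders, and the metrics call is only indicative; see the linked docs and notebook for the authoritative API):

```python
import geopandas as gpd
import pylandstats as pls

zones = gpd.read_file("zones.gpkg")  # placeholder zone geometries

# The zones argument is now passed positionally, right after the landscape file path.
za = pls.ZonalAnalysis("landscape.tif", zones)
class_metrics_df = za.compute_class_metrics_df(metrics=["proportion_of_landscape"])
```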

I hope this addresses your remarks, and sorry again for the delay. Feel free to reopen if needed. Best,
Martí

ffrosch (Author) commented Jan 16, 2024

Hello @martibosch,

thank you for coming back to this issue and taking care of it. That's awesome and should be a huge performance boost!

At the moment I don't have projects where I need the library, but I'm looking forward to trying it out in the future :-)

Best,
Florian
