Idea
Space efficiency could be improved by shrinking landscape arrays, either by storing them as sparse arrays (compression) or by using references to the original raster instead of copies (duplication avoidance).
Reason
Improving the space efficiency of landscape arrays would make the library far more usable for zonal analyses of large datasets with many regions.
For example, I am using pylandstats on a large dataset with about 1100 regions and a raster of size 2600x2400. Zonal analysis creates a copy of the raster for each region, which leads to high memory consumption (about 12 GB in my case, and that is already after some manual optimizations such as choosing the smallest possible dtype).
Possible improvements
When zonal analysis is run with a large set of regions, each landscape contains only a small fraction of non-null values. It should be possible to implement the landscape arrays as sparse arrays with the Python library sparse without many adjustments to the rest of the code. I have not tested this yet, though, and would appreciate an assessment of the feasibility and of possible problems with other parts of the code.
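To make the potential saving concrete, here is a minimal numpy-only sketch of the coordinate (COO) layout that libraries such as sparse use internally: only the non-null cells and their indices are stored. The helpers `to_coo`/`from_coo` are hypothetical illustrations, not pylandstats or sparse API.

```python
import numpy as np

def to_coo(arr, nodata=0):
    """Store only the non-nodata cells of a 2D array in coordinate (COO) form."""
    rows, cols = np.nonzero(arr != nodata)
    return rows, cols, arr[rows, cols], arr.shape

def from_coo(rows, cols, values, shape, nodata=0):
    """Reconstruct the dense array from its COO representation."""
    dense = np.full(shape, nodata, dtype=values.dtype)
    dense[rows, cols] = values
    return dense

# A region landscape at the raster size from the example above, where only a
# small patch of cells is non-null (the typical zonal-analysis situation).
landscape = np.zeros((2600, 2400), dtype=np.uint8)
landscape[100:120, 200:230] = 3

rows, cols, values, shape = to_coo(landscape)
dense_bytes = landscape.nbytes
coo_bytes = rows.nbytes + cols.nbytes + values.nbytes  # far smaller here
```

For a region covering a small fraction of the raster, the COO form is orders of magnitude smaller than the dense copy; the trade-off is that per-cell access and the moving-window computations in the rest of the code would need to work on (or densify from) this representation.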
Would it be possible not to copy the landscape for each region, and instead keep a reference to the original landscape together with the region's mask? The array for a region could then be computed on the fly from the mask and the original landscape, and discarded after the computation.
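The reference-plus-mask idea could look roughly like this; `RegionView` is a hypothetical sketch, not an existing pylandstats class. Each region stores only a reference to the shared raster and a boolean mask, and the per-region array is materialized only on demand.

```python
import numpy as np

class RegionView:
    """Hypothetical sketch: a region holds a reference to the shared raster
    plus a boolean mask, instead of a full per-region copy."""

    def __init__(self, raster, mask, nodata=0):
        self.raster = raster  # reference to the original landscape, not a copy
        self.mask = mask      # boolean mask selecting the region's cells
        self.nodata = nodata

    def to_array(self):
        # Materialized on the fly; the caller can discard it after use.
        return np.where(self.mask, self.raster, self.nodata)

# Tiny demonstration raster with one region covering the middle row.
raster = np.arange(12, dtype=np.uint8).reshape(3, 4)
mask = np.zeros((3, 4), dtype=bool)
mask[1, :] = True

view = RegionView(raster, mask)
arr = view.to_array()  # dense array exists only for the computation
```

With n regions this keeps one shared raster plus n boolean masks in memory, instead of n full-size copies; the cost is recomputing the dense array whenever a metric needs it.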
Let me know what you think about the suggestion, I might be able to submit a pull request if this change seems reasonable.
Sorry for the huge delay in my response; I have not had time to work on pylandstats for a while. Thank you for sharing your ideas. This was indeed a design error on my end, which made using pylandstats for zonal analysis with a large number of zones practically impossible.
This should be fixed in v3.0.0rc0, where zones are defined by vector geometries (a geoseries) and the zone landscapes are instantiated for the zone bounds only (rasterio's mask.mask with crop=True). The change is implemented in the ZonalAnalysis class, but its child classes (BufferAnalysis, ZonalGridAnalysis and even their spatiotemporal counterparts) operate accordingly as well.