Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add groupby arg to gen_zonal_stats #134

Closed

Conversation

brendancol
Copy link

@brendancol brendancol commented Oct 10, 2016

This PR adds a groupby argument to gen_zonal_stats inspired by the ArcGIS gp.ZonalStatistics zone_field argument. The effect is that stats are computed by group, instead of by each individual feature.

The groupby arg takes a string or a function. If the value is a string, it is assumed to be a field present in each feature properties. If the value is a function, the function's return value will be the groupby key.

Caveats:

  • The major caveat of this feature is that groups containing overlapping geometry will yield unexpected results as all features are rasterized together...not individually. There are currently no warnings given for this use-case.
  • zone_arrays will be larger as the unioned bounds of the grouped features will make potentially large intermediate rasters
  • All zones will need to be able to fit into memory

Also included in this PR is an example shapefile for use with testing.

@coveralls
Copy link

coveralls commented Oct 10, 2016

Coverage Status

Coverage decreased (-0.4%) to 97.373% when pulling bf44d66 on brendancol:brendancol/add-zone-groupby into 6bdb524 on perrygeo:master.

@coveralls
Copy link

coveralls commented Oct 10, 2016

Coverage Status

Coverage decreased (-0.03%) to 97.732% when pulling d48bd3e on brendancol:brendancol/add-zone-groupby into 6bdb524 on perrygeo:master.

@coveralls
Copy link

coveralls commented Oct 10, 2016

Coverage Status

Coverage increased (+0.2%) to 97.921% when pulling c190e63 on brendancol:brendancol/add-zone-groupby into 6bdb524 on perrygeo:master.

@coveralls
Copy link

coveralls commented Oct 10, 2016

Coverage Status

Coverage increased (+0.2%) to 97.921% when pulling acb2871 on brendancol:brendancol/add-zone-groupby into 6bdb524 on perrygeo:master.

@brendancol
Copy link
Author

@perrygeo hey any thoughts on this one?

@sgoodm
Copy link
Contributor

sgoodm commented Oct 18, 2016

I think this might be better left for the user to handle before running rasterstats. maybe add a demo / sample code to the wiki that covers a preprocessing step with similar functionality?

@brendancol
Copy link
Author

brendancol commented Oct 19, 2016

@sgoodm I like this feature because it allows for data exploration without duplicating source data on disk or opening a desktop GIS package.

Consider how useful this groupby would be when computing stats for US Census Blocks. With just one set of zones, you could aggregate a raster to national, regional, state, county, group, or block level without having to duplicate the source data.

@perrygeo
Copy link
Owner

perrygeo commented Oct 19, 2016

@brendancol Sorry for the delay on my end, haven't had the time to dig into code outside of work lately!

I think this functionality is useful no doubt. I've done a similar type of analysis but it seems like something that could be decoupled from zonal_stats. What do you think about refactoring this into a separate function?

There are big benefits to decoupling any feature/geometry manipulation from zonal stats:

  • There are sure to be application-specific concerns with exactly how the grouping is accomplished (property names, dissolving geometries, etc). I don't want to make those decisions for the user.
  • Putting this processing step inside of rasterstats adds complexity to the core library and changes the semantics of several internal variables.
  • Grouping features by property is a really useful feature by itself and shouldn't be locked away inside rasterstats.

More concretely, I'd be more in favor of something like

zonal_stats(group_by_property(features, 'property'), raster)

where group_by_property would yield valid geojson features with geometry collections (or dissolved multigeometries) and aggregated properties. The foundations of such a function are already here, we just need to detangle it. No changes to zonal_stats should be required in that case plus you could use the grouping functionality everywhere else.

@brendancol
Copy link
Author

brendancol commented Oct 19, 2016

@perrygeo @sgoodm This sounds good. I think it would be good to close this PR and I can submit a follow up PR with group_by_property. Does rasterstats/utils.py sounds like an appropriate place for that? Or maybe a new module call rasterstats/reductions.py?

@perrygeo
Copy link
Owner

@brendancol The more I think about it, it could easily be a separate project when you consider some of the complexities that arise with aggregate functions (http://postgis.net/docs/manual-1.4/ch08.html#PostGIS_Aggregate_Functions). But I'm happy to start it out in this repo and spin it off later if we need to. How about rasterstats/aggregation.py to use the SQL terminology.

@brendancol
Copy link
Author

@perrygeo @sgoodm sounds good to me. I'm going to close this PR and submit a new one with rasterstats/aggregations.py

@brendancol brendancol closed this Oct 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants