Support other types of summaries #714

Closed
krlmlr opened this issue Apr 19, 2018 · 11 comments

Comments

@krlmlr
Contributor

krlmlr commented Apr 19, 2018

Are union and combine the only types of summary ever required? How about e.g. pairwise intersections, stored in a multipolygon? Or combinations of several single polygons into a multipolygon? Perhaps the do_union argument to summarise.sf() should also accept a summary function? Alternatively, how about permitting custom summaries: if the result of summarise.data.frame() already contains an sfc, don't summarize the geometry column.

Related to this, I noticed that st_as_sfc.list() doesn't work for lists of "sfc" objects, which might occur when doing a manual summary. (It returns NULL invisibly; I believe this is a mistake.) I haven't found a straightforward way to convert a list of "sfc" objects to a single "sfc" other than . %>% purrr::map(1) %>% st_as_sfc(), and this loses the geometry metadata, e.g. the coordinate system.

@edzer
Member

edzer commented Apr 20, 2018

We union or combine because that gives the geometry of the features over whose attributes (non-geometry feature properties) summarise acted. Could you give a concrete use case for which one would want something else?

I committed a change that accepts lists of sfg objects (geometries), like an sfc that lost its class. You can combine a list of sfc objects with do.call(c, lst):

> do.call(c, list(st_sfc(st_point(0:1), st_point(1:2), crs=4326), st_sfc(st_point(4:5))))
Geometry set for 3 features 
geometry type:  POINT
dimension:      XY
bbox:           xmin: 0 ymin: 1 xmax: 4 ymax: 5
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
POINT (0 1)
POINT (1 2)
POINT (4 5)
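
As a minimal sketch of the same idea applied to the metadata concern raised earlier, the snippet below combines a list of sfc objects and checks that the coordinate reference system survives. The list `lst` and its points are made up for illustration, and both elements are given the same CRS on the assumption that this is the safest input for `c()` across sf versions.

```r
library(sf)

# Illustrative input: a list of sfc objects that share one CRS
lst <- list(
  st_sfc(st_point(c(0, 1)), st_point(c(1, 2)), crs = 4326),
  st_sfc(st_point(c(4, 5)), crs = 4326)
)

# c() has a method for sfc objects; do.call() applies it to the whole list
combined <- do.call(c, lst)

# The result is one sfc holding all three points, still tagged with EPSG:4326
length(combined)
st_crs(combined) == st_crs(4326)
```

Unlike the `purrr::map(1)` workaround, this keeps every geometry of each element, not just the first.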

@edzer edzer closed this as completed May 7, 2018
@krlmlr
Contributor Author

krlmlr commented May 7, 2018

Perhaps the area covered by more than one geometry might be useful when attempting to repair slivers. Or a convex hull or bounding box, without materializing the union first? I agree that union/combine are by far the most useful operations, but maybe not the only ones?

@edzer
Member

edzer commented May 7, 2018

As an alternative to do_union = TRUE, we could use an argument that takes a function returning a geometry, like union = st_union, in both sf::summarise.sf and sf::aggregate.sf.

@krlmlr
Contributor Author

krlmlr commented May 7, 2018

What if we looked at the results of the summary and held off doing union/combine if the result already contains a geometry?

@krlmlr
Contributor Author

krlmlr commented May 7, 2018

Explicitly creating a geometry column in summarise() would mean we don't do anything ourselves.

@edzer
Member

edzer commented May 7, 2018

Who is "we"? sf:::summarise.sf calls NextMethod(), which is a call to the dplyr method, and that returns an object without geometry afaics.

@krlmlr
Contributor Author

krlmlr commented May 7, 2018

I'm thinking about supporting the case where the user manually creates a geometry column. That would be returned by dplyr:::summarise.tbl_df() and processed later by sf.

@edzer
Member

edzer commented May 7, 2018

Ah, yes, that makes sense. Could you provide an example of how this can be done in dplyr::summarise?

@krlmlr
Contributor Author

krlmlr commented May 7, 2018

Not sure if this is what you're asking about, but I would expect the output of the two examples to be identical. I think it's more about summarise.sf() detecting whether the result of NextMethod() already has a geometry column.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.1, proj.4 4.9.3

df <- tibble(a = 1:3, pt = st_sfc(st_point(1:2)))
df %>%
  summarize(a = mean(a), ptx = st_union(pt)) %>%
  st_sf()
#> Simple feature collection with 1 feature and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 2 xmax: 1 ymax: 2
#> epsg (SRID):    NA
#> proj4string:    NA
#> # A tibble: 1 x 2
#>       a     ptx
#>   <dbl> <POINT>
#> 1     2   (1 2)
df %>%
  st_sf() %>%
  summarize(a = mean(a), ptx = st_union(pt))
#> Simple feature collection with 1 feature and 1 field
#> Active geometry column: pt
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 2 xmax: 1 ymax: 2
#> epsg (SRID):    NA
#> proj4string:    NA
#> # A tibble: 1 x 3
#>       a     ptx      pt
#>   <dbl> <POINT> <POINT>
#> 1     2   (1 2)   (1 2)

Created on 2018-05-07 by the reprex package (v0.2.0).

edzer added a commit that referenced this issue May 7, 2018
@edzer
Member

edzer commented May 7, 2018

Like that?

@krlmlr
Contributor Author

krlmlr commented May 7, 2018

Looks good, thanks for supporting arbitrary aggregations!
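
A minimal sketch of what such an arbitrary aggregation might look like, assuming an sf version that includes the change referenced above (summarise() leaves the geometry alone when the summary already produces an sfc column). The data, the grouping column `g`, and the choice of a convex hull are illustrative only and do not come from the thread's commits.

```r
library(sf)
library(dplyr)

# Illustrative data: two groups of three non-collinear points each
pts <- st_sf(
  g = c("a", "a", "a", "b", "b", "b"),
  geometry = st_sfc(
    st_point(c(0, 0)), st_point(c(1, 0)), st_point(c(0, 1)),
    st_point(c(2, 2)), st_point(c(3, 2)), st_point(c(2, 3))
  )
)

# Build the geometry explicitly inside summarise(): combine each group's
# points into one MULTIPOINT, then take its convex hull. Because the
# summary already yields an sfc column, no union/combine is applied on top.
hulls <- pts %>%
  group_by(g) %>%
  summarise(n = n(), geometry = st_convex_hull(st_combine(geometry)))

st_geometry_type(hulls)
```

Writing the geometry expression by hand keeps the default union/combine behaviour for everyone else while letting a specific summary opt out of it.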

edzer added a commit that referenced this issue May 8, 2018
edzer added a commit that referenced this issue May 8, 2018
edzer added a commit that referenced this issue May 9, 2018
* use dplyr::group_indices() instead of undocumented indices attributes
* emit warnings when agr is not constant for st_centroid.sf and st_point_on_surface.sf
* select.sf inherits objects agr value(s)
* update tests and test outputs