Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Article suggestion: demonstrate multi-step geometry manipulation #578

Closed
tiernanmartin opened this issue Dec 1, 2017 · 7 comments
Closed

Comments

@tiernanmartin
Copy link

tiernanmartin commented Dec 1, 2017

I recently found myself scratching my head over how to do a pretty common GIS procedure using the sf package: join one set of multipolygon data to another based on the highest amount of spatial intersection.

A quick Stackoverflow search turned up a postgis solution, but reverse engineering it with sf took me quite a bit of time.

That made me wish that the sf documentation had an example showing how to apply a sequence of steps that combine verbs from sf and other tidyverse packages like dplyr, tidyr, and purrr.

After my project I threw together a reprex to illustrate the approach that I settled on - my hope is that it sparks a discussion that then informs a new section of the geometry manipulation article (or perhaps a separate article entirely).

Here's an image (reprex below) that uses the nc dataset to illustrate the result I was looking for:

Update:
Here's my approach wrapped up in a function:

st_intersects_most <- function(x, y){ 
  ints <- st_intersects(x,y)
  
  tibble(
    ROW_ID = imap(ints, ~rep(.y, times = length(.x))) %>% flatten_chr(),
    INTERSECT_DF = flatten_int(ints) %>% map(~y[.x,] %>% st_drop_geometry), # note: see the reprex for st_drop_geometry's definition
    INTERSECT_AREA = st_intersection(x,y) %>% st_area %>% map_dbl(1)
  ) %>%  
    unnest %>%  
    mutate(geometry = st_geometry(x[ROW_ID,])) %>% 
    st_as_sf
}

It requires the user to follow up with group_by %>% top_n(1, INTERSECT_AREA) %>% ungroup - that could probably be improved.

note: the reprex below has been updated to show the above function in action

Reprex + Session Info
# SETUP ----
library(magrittr)
library(tidyverse)  
library(ggplot2)     # devtools::install_github('tidyverse/ggplot2) for `geom_sf()`
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3
library(scales) 
library(gridExtra) 

# LOAD DATA ----

nc_sf <- 
  st_read(system.file("shape/nc.shp", package="sf")) %>% 
  st_transform(2264) # NC state plane, US foot 

# CREATE GRID ----

grid_sf <- 
  tibble(geometry = st_make_grid(nc_sf)) %>% 
  st_sf %>% 
  bind_cols(crossing(X=1:10,Y=1:10)) %>% 
  mutate(CENTROID = map(geometry, st_centroid),
         COORDS = map(CENTROID, st_coordinates),
         COORDS_X = map_dbl(COORDS,1),
         COORDS_Y = map_dbl(COORDS,2),
         COORD_COMB = map2_dbl(COORDS_X,COORDS_Y,~ sum(abs(.x)*1e5,abs(.y)))) %>% 
  arrange(desc(COORD_COMB)) %>% 
  mutate(GRID_ID = purrr::cross2(LETTERS[1:10],str_pad(1:10,2,side = "left",pad = "0")) %>% map_chr(str_c, collapse = "")) %>% 
  as_tibble() %>% 
  st_as_sf() 

# SPATIAL JOIN FUNCTION ----

st_drop_geometry <- function (x) 
{
  if (inherits(x, "sf")) {
    x <- st_set_geometry(x, NULL)
    class(x) <- "data.frame"
    x <- as_tibble(x)
  }
  return(x)
}

st_intersects_most <- function(x, y){ 
  ints <- st_intersects(x,y)
  
  tibble(
    ROW_ID = imap(ints, ~rep(.y, times = length(.x))) %>% flatten_chr(),
    INTERSECT_DF = flatten_int(ints) %>% map(~y[.x,] %>% st_drop_geometry),
    INTERSECT_AREA = st_intersection(x,y) %>% st_area %>% map_dbl(1)
  ) %>%  
    unnest %>%  
    mutate(geometry = st_geometry(x[ROW_ID,])) %>% 
    st_as_sf
}

# SPATIAL JOIN ----

nc_grid_sf <- 
  st_intersects_most(nc_sf, grid_sf) %>% 
  group_by(ROW_ID) %>% 
  top_n(1, INTERSECT_AREA) %>% 
  ungroup %>% 
  select(-CENTROID,-COORDS,-COORDS_X,-COORDS_Y,-COORD_COMB) %>%   
  mutate(CENTROID = map(geometry, st_centroid),
         COORDS = map(CENTROID, st_coordinates),
         COORDS_X = map_dbl(COORDS,1),
         COORDS_Y = map_dbl(COORDS,2),
         COORD_COMB = map2_dbl(COORDS_X,COORDS_Y,~ sum(abs(.x)*1e5,abs(.y))))
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries


# PLOT RESULTS ----

# Create a discrete color scale for each grid rectangle 

scale_fill_grid <- function(...){
  
  cols <- rep(brewer_pal(type = "qual", palette = 3)(12),times = 9)[1:100]
  
  vals <- grid_sf$GRID_ID
  
  ggplot2::scale_fill_manual(..., 
                             values = setNames(cols,
                                               vals))
}


# Grid beside counties
g1 <- ggplot()
g1 <- g1 + geom_sf(data = grid_sf,  color = "black", fill = "transparent") 
g1 <- g1 + geom_text(data = grid_sf, mapping = aes(COORDS_X, COORDS_Y, label = GRID_ID))
g1 <- g1 + coord_sf(crs = st_crs(grid_sf), datum = NA)
g1 <- g1 + theme_void()
g1 <- g1 + theme(legend.position = "none")

g2 <- ggplot() 
g2 <- g2 + geom_sf(data = nc_sf, color = "white")  
g2 <- g2 + coord_sf(crs = st_crs(nc_sf), datum = NA)
g2 <- g2 + theme_void()
g2 <- g2 + theme(legend.position = "none")

# Grid over counties

g3 <- ggplot()
g3 <- g3 + geom_sf(data = grid_sf, mapping = aes(fill = GRID_ID), alpha = .5, color = "transparent")
g3 <- g3 + geom_sf(data = nc_sf, color = "black",  fill = "transparent")
g3 <- g3 + scale_fill_grid()
g3 <- g3 + geom_text(data = grid_sf, mapping = aes(COORDS_X, COORDS_Y, label = GRID_ID))
g3 <- g3 + coord_sf(crs = st_crs(grid_sf), datum = NA)
g3 <- g3 + theme_void()
g3 <- g3 + theme(legend.position = "none")

g4  <- ggplot()
g4  <- g4  + geom_sf(data = nc_grid_sf, aes(fill = GRID_ID), alpha = .5, color = "white")
g4  <- g4  + scale_fill_grid() 
g4  <- g4  + geom_text(data = nc_grid_sf, mapping = aes(COORDS_X, COORDS_Y, label = GRID_ID), size = 2)
g4  <- g4  + geom_sf(data = grid_sf, fill = "transparent", color = "black")
g4  <- g4  + coord_sf(crs = st_crs(grid_sf), datum = NA)
g4  <- g4  + theme_void()
g4  <- g4  + theme(legend.position = "none")


grid.arrange(g2, g1, ncol=1)

grid.arrange(g3, g4, ncol=1)
devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.0 (2017-04-21)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/Los_Angeles         
##  date     2017-12-02
## Packages -----------------------------------------------------------------
##  package      * version    date       source                            
##  assertthat     0.2.0      2017-04-11 CRAN (R 3.4.2)                    
##  backports      1.1.0      2017-05-22 CRAN (R 3.4.0)                    
##  base         * 3.4.0      2017-04-21 local                             
##  bindr          0.1        2016-11-13 CRAN (R 3.4.2)                    
##  bindrcpp     * 0.2        2017-06-17 CRAN (R 3.4.2)                    
##  bitops         1.0-6      2013-08-17 CRAN (R 3.4.0)                    
##  broom          0.4.2      2017-02-13 CRAN (R 3.4.0)                    
##  cellranger     1.1.0      2016-07-27 CRAN (R 3.4.2)                    
##  class          7.3-14     2015-08-30 CRAN (R 3.4.0)                    
##  classInt       0.1-24     2017-04-16 CRAN (R 3.4.2)                    
##  cli            1.0.0      2017-11-05 CRAN (R 3.4.2)                    
##  colorspace     1.3-2      2016-12-14 CRAN (R 3.4.2)                    
##  compiler       3.4.0      2017-04-21 local                             
##  crayon         1.3.4      2017-10-30 Github (r-lib/crayon@b5221ab)     
##  datasets     * 3.4.0      2017-04-21 local                             
##  DBI            0.7        2017-06-18 CRAN (R 3.4.2)                    
##  devtools       1.13.2     2017-06-02 CRAN (R 3.4.0)                    
##  digest         0.6.12     2017-01-27 CRAN (R 3.4.0)                    
##  dplyr        * 0.7.4      2017-09-28 CRAN (R 3.4.2)                    
##  e1071          1.6-8      2017-02-02 CRAN (R 3.4.2)                    
##  evaluate       0.10       2016-10-11 CRAN (R 3.4.0)                    
##  forcats      * 0.2.0      2017-01-23 CRAN (R 3.4.2)                    
##  foreign        0.8-67     2016-09-13 CRAN (R 3.4.0)                    
##  ggplot2      * 2.2.1.9000 2017-12-02 Github (tidyverse/ggplot2@7b5c185)
##  glue           1.1.1.9000 2017-10-18 Github (tidyverse/glue@808054b)   
##  graphics     * 3.4.0      2017-04-21 local                             
##  grDevices    * 3.4.0      2017-04-21 local                             
##  grid           3.4.0      2017-04-21 local                             
##  gridExtra    * 2.3        2017-09-09 CRAN (R 3.4.2)                    
##  gtable         0.2.0      2016-02-26 CRAN (R 3.4.2)                    
##  haven          1.1.0      2017-07-09 CRAN (R 3.4.2)                    
##  hms            0.3        2016-11-22 CRAN (R 3.4.2)                    
##  htmltools      0.3.6      2017-04-28 CRAN (R 3.4.0)                    
##  httr           1.3.1      2017-08-20 CRAN (R 3.4.2)                    
##  jsonlite       1.5        2017-06-01 CRAN (R 3.4.0)                    
##  knitr          1.16       2017-05-18 CRAN (R 3.4.0)                    
##  lattice        0.20-35    2017-03-25 CRAN (R 3.4.0)                    
##  lazyeval       0.2.1      2017-10-29 CRAN (R 3.4.2)                    
##  lubridate      1.7.1      2017-11-03 CRAN (R 3.4.2)                    
##  magrittr     * 1.5        2014-11-22 CRAN (R 3.4.0)                    
##  memoise        1.1.0      2017-04-21 CRAN (R 3.4.0)                    
##  methods      * 3.4.0      2017-04-21 local                             
##  mnormt         1.5-5      2016-10-15 CRAN (R 3.4.1)                    
##  modelr         0.1.1      2017-07-24 CRAN (R 3.4.2)                    
##  munsell        0.4.3      2016-02-13 CRAN (R 3.4.2)                    
##  nlme           3.1-131    2017-02-06 CRAN (R 3.4.0)                    
##  parallel       3.4.0      2017-04-21 local                             
##  pkgconfig      2.0.1      2017-03-21 CRAN (R 3.4.2)                    
##  plyr           1.8.4      2016-06-08 CRAN (R 3.4.2)                    
##  psych          1.7.8      2017-09-09 CRAN (R 3.4.2)                    
##  purrr        * 0.2.4.9000 2017-10-24 Github (tidyverse/purrr@fc8bea1)  
##  R6             2.2.2      2017-06-17 CRAN (R 3.4.0)                    
##  RColorBrewer   1.1-2      2014-12-07 CRAN (R 3.4.1)                    
##  Rcpp           0.12.14    2017-11-23 CRAN (R 3.4.2)                    
##  RCurl          1.95-4.8   2016-03-01 CRAN (R 3.4.1)                    
##  readr        * 1.1.1      2017-05-16 CRAN (R 3.4.2)                    
##  readxl         1.0.0      2017-04-18 CRAN (R 3.4.2)                    
##  reshape2       1.4.2      2016-10-22 CRAN (R 3.4.2)                    
##  rlang          0.1.4      2017-11-05 CRAN (R 3.4.2)                    
##  rmarkdown      1.8        2017-11-17 CRAN (R 3.4.2)                    
##  rprojroot      1.2        2017-01-16 CRAN (R 3.4.0)                    
##  rvest          0.3.2      2016-06-17 CRAN (R 3.4.2)                    
##  scales       * 0.5.0.9000 2017-12-02 Github (hadley/scales@d767915)    
##  sf           * 0.5-5      2017-10-31 CRAN (R 3.4.0)                    
##  stats        * 3.4.0      2017-04-21 local                             
##  stringi        1.1.5      2017-04-07 CRAN (R 3.4.0)                    
##  stringr      * 1.2.0      2017-02-18 CRAN (R 3.4.0)                    
##  tibble       * 1.3.4      2017-08-22 CRAN (R 3.4.2)                    
##  tidyr        * 0.7.2      2017-10-16 CRAN (R 3.4.2)                    
##  tidyverse    * 1.2.1      2017-11-14 CRAN (R 3.4.2)                    
##  tools          3.4.0      2017-04-21 local                             
##  udunits2       0.13       2016-11-17 CRAN (R 3.4.1)                    
##  units          0.4-6      2017-08-27 CRAN (R 3.4.2)                    
##  utils        * 3.4.0      2017-04-21 local                             
##  withr          2.1.0.9000 2017-12-02 Github (jimhester/withr@fe81c00)  
##  XML            3.98-1.9   2017-06-19 CRAN (R 3.4.1)                    
##  xml2           1.1.1      2017-01-24 CRAN (R 3.4.2)                    
##  yaml           2.1.14     2016-11-12 CRAN (R 3.4.0)
@edzer
Copy link
Member

edzer commented Dec 2, 2017

That's a nice article, and IMO a candidate to be wrapped into something simpler: a new function or an option under an existing one. Does this activity have a name in GIS or spatial database world?

I wrote sf::st_interpolate_aw for area-weighted interpolation, which is what you'd do when the labels are continuous variables. For me, intuitively I would search for such functionality under st_join.

@tiernanmartin
Copy link
Author

I revised the example to show how this might work as a function instead of a series of mutate steps.

I'm not sure if this activity has a formal name in GIS. I've been calling it an "area-weighted spatial join" but others have different names for it:

@mbacou
Copy link

mbacou commented Dec 3, 2017

Sometimes called majority fill or maximum area overlay -- but I believe a standard (faster?) approach is to rasterize the base layer to an appropriate resolution and then use raster::extract() to return the mode/median raster value in each polygon.

@edzer edzer closed this as completed in f16249d Dec 3, 2017
@edzer
Copy link
Member

edzer commented Dec 3, 2017

@tiernanmartin something went wrong, it seems, e.g. when assigning F08 and H04, in your plot. I now integrated a somewhat different approach in st_join, honoring the rest of st_join's conventions wrt renaming duplicate columns, and left option. Output from ?st_join below:
x

@tiernanmartin
Copy link
Author

@mbacou "maximum area overlay" sounds good to me and thanks for the the tip about using raster::extract()

@edzer You're right - the function version above contains a mistake. I'll see if I can sleuth it out...

I played around with st_join(..., largest = TRUE) and I'm really pleased with the results. The example in ?st_join is quite clear. Thanks for the quick turn-around with f16249d and I'll let you know if I run into any unexpected behavior.

I haven't tested it with large datasets yet, but I'm wondering if this may be another instance where st_intersection would benefit from vectorization (#575)?

@anaavu
Copy link

anaavu commented Jun 5, 2020

"largest" parameter as is now in st_join would also add a lot to raster::extract(), as an additional option for "fun" https://www.rdocumentation.org/packages/raster/versions/3.1-5/topics/extract

@edzer
Copy link
Member

edzer commented Jun 6, 2020

But that is an issue for raster, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants