Joins between sf objects and data.frames #372

Closed
Nowosad opened this issue Jun 2, 2017 · 3 comments

@Nowosad
Contributor

Nowosad commented Jun 2, 2017

Lately I've been joining sf objects with data.frames, and I came across two problematic groups of joins:

  1. When a row exists in the data.frame but not in the sf object, the join adds a new row with an empty GEOMETRYCOLLECTION geometry.

  2. When the data.frame is the main (first) object in the join, the result has a geometry column but does not have the sf class.

My idea is:

  • in the first group, a new row should preserve the geometry type of the sf object, in this example an empty MULTIPOLYGON
  • in the second group, the geom column should be removed

I'm not sure if these ideas are the best. What do you think? @edzer @hadley @Robinlovelace

library(tidyverse)
library(sf)

sf_obj = st_read(system.file("shape/nc.shp", package="sf")) %>% 
  filter(NAME %in% c("Ashe", "Surry")) %>% 
  select(NAME)

df_obj = data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

## 1st group --------------------------

# error: empty GEOMETRYCOLLECTION() added to geom
right_join1 = sf_obj %>% 
  right_join(df_obj, by = "NAME")
right_join1

# error: empty GEOMETRYCOLLECTION
full_join1 = sf_obj %>% 
  full_join(df_obj, by = "NAME") 
full_join1

## 2nd group ------------------------

# error: keeps geom col
left_join1 = df_obj %>% 
  left_join(sf_obj, by = "NAME")
left_join1

# error: unwanted geom column added
right_join2 = df_obj %>% 
  right_join(sf_obj, by = "NAME") 
right_join2

# error: geom column added
inner_join1 = df_obj %>% 
  inner_join(sf_obj, by = "NAME") 
inner_join1

# error: null geom
full_join2 = df_obj %>% 
  full_join(sf_obj, by = "NAME") 
full_join2
@edzer
Member

edzer commented Jun 2, 2017

Thanks; the first problem is not an error, but it is indeed annoying. I will look into it.

The second problem is not an sf issue; the data.frame methods for these joins are in dplyr.
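
For the second group, one possible workaround is to post-process the join result on the user side. The sketch below is only an illustration: it uses a small hypothetical sf object built from points (not the nc data above) and assumes that a data.frame-first join returns a plain data.frame carrying the geometry list column, as described in this thread. Behaviour may differ across dplyr and sf versions.

```r
library(sf)
library(dplyr)

# Hypothetical stand-ins for sf_obj and df_obj from the report above
sf_obj <- st_sf(
  NAME = c("Ashe", "Surry"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)
df_obj <- data.frame(NAME = c("Ashe", "Surry", "Rowan"), VALUE = c(1, 4, 6))

# data.frame is the first argument, so dplyr's data.frame method is dispatched
joined <- inner_join(df_obj, sf_obj, by = "NAME")

# Option 1: promote the result to an sf object
joined_sf <- st_sf(joined)

# Option 2: drop the geometry column and keep a plain data.frame
joined_df <- as.data.frame(joined)
joined_df$geometry <- NULL
```

Which option is appropriate depends on whether the geometries are still needed after the join.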

@Nowosad
Contributor Author

Nowosad commented Jun 2, 2017

Thank you @edzer. I tested it a little bit and it works great. I've also opened a new issue in the dplyr package - tidyverse/dplyr#2833

@edzer
Member

edzer commented Jun 27, 2017

Now,

> full_join2 %>% st_sf
Simple feature collection with 3 features and 2 fields (with 1 geometry empty)
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
   NAME VALUE                       geometry
1  Ashe     1 MULTIPOLYGON(((-81.47275543...
2 Surry     4 MULTIPOLYGON(((-80.45634460...
3 Rowan     6                 MULTIPOLYGON()

substitutes the NULL list column value returned by dplyr's join with the appropriate empty geometry.
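
The idea of that substitution can be sketched by hand as follows. This is not sf's internal implementation, only a minimal illustration under the assumption that the join leaves a `NULL` entry in a plain list of geometries; the list `geoms` and its triangle are made up for the example.

```r
library(sf)

# Hypothetical geometry list as a join might leave it: one real geometry, one NULL
triangle <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))
geoms <- list(
  st_multipolygon(list(list(triangle))),
  NULL
)

# Replace each NULL entry with an empty geometry of the matching type
is_null <- vapply(geoms, is.null, logical(1))
geoms[is_null] <- list(st_multipolygon())

# Build a geometry list column from the repaired list
sfc <- st_sfc(geoms)

# Flags which features have an empty geometry
empty_flags <- st_is_empty(sfc)
```

Wrapping the replacement in `list()` matters here: assigning `NULL` or a bare geometry into a list subset would delete or unpack elements instead of storing one geometry per row.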
