Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fallback rbind #1111

Merged
merged 26 commits into from
May 22, 2020
Merged

Fix fallback rbind #1111

merged 26 commits into from
May 22, 2020

Conversation

lionel-
Copy link
Member

@lionel- lionel- commented May 20, 2020

Branched from #1104.

  • Add a new common class fallback option in the coercion mechanism. The options are wrapped in a struct and a structured list() which can be passed back and forth between the R and C sides.

  • When this fallback is turned on, the common class suffix is collected by vec_ptype_common(), and stored in a sentinel ptype. This sentinel is not meant to be leaked to users. With the current design, the fallback should only be used through higher level operations like vec_cast_common() or vec_rbind(). We may be able to find a more general approach for 0.4.0.

  • vec_cast_common() leaves untouched the inputs which have a fallback.

  • vec_c() and vec_rbind() call base::c() if inputs have a fallback common class and implement a c() method. There are still several cases which are not handled well, and the rbind code needs to be refactored. We'll do this for 0.4.0 as part of the switch to recursive proxying and restoration.

  • The special-casing of sfc lists has been removed, and we fall back to c.sfc() instead. Fixes dplyr::bind_rows fails with XYZ geometry in dplyr 1.0 r-spatial/sf#1390.

Adapting vec_match() to use this fallback will be a simple matter of using a fallback-enabled variant of vec_cast_common().

vec_unchop() will be adapted to the new fallback in a different PR.

@lionel-
Copy link
Member Author

lionel- commented May 21, 2020

vec_cast_common() leaves untouched the inputs which have a fallback.

Adapting vec_match() to use this fallback will be a simple matter of using a fallback-enabled variant of vec_cast_common().

I don't know what I was thinking. It is trivially easy to show this can't work. The type is defined by both the class and the attributes and so the values of different objects are not in general comparable if they don't have exactly the same class and attributes. Simple example: the values of units::set_units(1:2, "m/s") and units::set_units(3L, "km/s") are not comparable without coercion, even though they have the same class and the same attribute names.

I think we'll need something like this:

library(rlang)
library(purrr)

# Obtain ptypes with `x[0]` and concatenate them to find the common ptype
base_ptype_common <- function(...) {
  # Would first check for common class and throw incompatible type
  # error otherwise
  ptypes <- map(list2(...), `[`, 0L)
  exec("c", !!!ptypes)
}

# Concatenate inputs with ptype in first location. This causes a
# coercion of the inputs to a common type. Slice the result and return
# list of coerced vectors of the same sizes as the inputs.
base_cast_common <- function(..., .ptype = NULL) {
  dots <- list2(...)

  if (is.null(.ptype)) {
    .ptype <- base_ptype_common(!!!dots)
  }
  common <- exec("c", .ptype, !!!dots)

  sizes <- lengths(dots)
  locs <- reduce(sizes, .init = list(0), function(output, input) {
    n <- last(last(output))
    c(output, list(seq(n + 1, n + input)))
  })
  locs <- locs[-1]

  map(locs, function(loc) common[loc])
}
last <- function(x) {
  x[[length(x)]]
}

Works well with units:

(x <- units::set_units(1:2, "m/s"))
#> Units: [m/s]
#> [1] 1 2
(y <- units::set_units(3, "km/s"))
#> 3 [km/s]

base_ptype_common(x, y)
#>  [m/s]

base_cast_common(x, y)
#> [[1]]
#> Units: [m/s]
#> [1] 1 2
#>
#> [[2]]
#> 3000 [m/s]

For sf this is a bit funny though, because [ isn't type-stable. The result is downcasted to a more specific class if possible. The common type of different sfc subclasses is normally an abstract sfc_GEOMETRY, :

sfc_x <- sf::st_sfc(sf::st_point())
sfc_y <- sf::st_sfc(sf::st_multipoint(), sf::st_multipoint())

base_ptype_common(sfc_x, sfc_y)
#> Geometry set for 0 features
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:            NA

class(base_ptype_common(sfc_x, sfc_y))
#> [1] "sfc_GEOMETRY" "sfc"

However the inputs are being coerced back and forth and end up unchanged:

base_cast_common(sfc_x, sfc_y)
#> [[1]]
#> Geometry set for 1 feature  (with 1 geometry empty)
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:            NA
#> POINT EMPTY
#>
#> [[2]]
#> Geometry set for 2 features  (with 2 geometries empty)
#> geometry type:  MULTIPOINT
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:            NA
#> MULTIPOINT EMPTY
#> MULTIPOINT EMPTY

At this point it's unclear that we can find a general fallback for vec-cast-common. Maybe we should first scrape CRAN packages ordered by popularity for implementations of [ and c(), and evaluate the generality of the approach drafted above. This will have to wait after dplyr 1.0.

This shouldn't affect much the current fallback to base::c() implemented in this PR, so it is still good to go.

R/type-data-frame.R Show resolved Hide resolved
R/utils.R Show resolved Hide resolved
man/faq/developer/reference-compatibility.Rmd Outdated Show resolved Hide resolved
src/cast.c Outdated Show resolved Hide resolved
src/c.c Outdated Show resolved Hide resolved
src/type-data-frame.c Outdated Show resolved Hide resolved
src/type2.c Outdated Show resolved Hide resolved
tests/testthat/test-cast.R Outdated Show resolved Hide resolved
R/bind.R Outdated Show resolved Hide resolved
NEWS.md Show resolved Hide resolved
tests/testthat/error/test-bind.txt Show resolved Hide resolved
tests/testthat/test-bind.R Show resolved Hide resolved
tests/testthat/test-bind.R Show resolved Hide resolved
@lionel- lionel- changed the base branch from doc-primitives to master May 22, 2020 13:39
lionel- and others added 3 commits May 22, 2020 17:00
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
Co-authored-by: Davis Vaughan <davis@rstudio.com>
@lionel- lionel- merged commit 28e537c into r-lib:master May 22, 2020
@lionel- lionel- deleted the fix-fallback-rbind branch May 22, 2020 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dplyr::bind_rows fails with XYZ geometry in dplyr 1.0
3 participants