Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vec_cast() a directional coercion generic #606

Closed
lionel- opened this issue Oct 2, 2019 · 11 comments · Fixed by #965
Closed

Make vec_cast() a directional coercion generic #606

lionel- opened this issue Oct 2, 2019 · 11 comments · Fixed by #965
Milestone

Comments

@lionel-
Copy link
Member

lionel- commented Oct 2, 2019

There are two issues with the current casting rules in vctrs:

  • They make vctrs complicated. Users and class implementors need to know about two diagrams for coercions and casts.

  • The casting rules do not seem to be the right default most of the time. For instance input validation can't use vec_cast() because it is too flexible.

To fix this, we could make vec_cast() a directional coercion.

  • A non-directional coercion converts an object to the common type, which is always the larger type (i.e. integer is coerced to double).

  • A directional coercion converts an object to the type supplied by the caller, but only if there exists a common type. A directional coercion is no longer guaranteed to convert to the larger type. If the cast is lossy, this is an error by default, unless the code is wrapped in vctrs::allow_lossy_cast().

Essentially, we'd add the rule that vec_cast() methods can only be implemented for coercible types, i.e. types that have a common type specified by vec_ptype2() methods.

Cast methods to remove

It is unclear yet how much of a breaking change it would be to remove these methods, we might need a deprecation strategy:

  • character casts
  • list casts
  • double <-> date

Where should this conversion logic go? Either we move it to separate single-dispatch generics like vec_as_double(), or we add a new double-dispatch generic in vctrs. @DavisVaughan suggested the name vec_force().

  • Even if we have a double-dispatch generic, it is nice to have the explicit is_ and as_ functions for a class, so we might end up with separate wrappers anyway.

  • The separate generics have the advantage of requiring only a single dispatch, which makes things faster for users and simpler for class authors.

  • At a syntax level, having a single double-dispatch generic seems to give a false sense of principled genericity, whereas most of the implemented behaviour is ad hoc.

  • On the other hand, vec_force() could be useful as part of a deprecation plan.

@DavisVaughan
Copy link
Member

DavisVaughan commented Oct 2, 2019

Just in terms of terminology, does this make:

  • vec_cast_common() non-directional
  • vec_cast_common(.to = <ptype>) directional
  • vec_cast(x, to = <ptype>) directional

@lionel-
Copy link
Member Author

lionel- commented Oct 2, 2019

Yes exactly. Also vapply() in base R is an example of prototype-based directional coercion.

@DavisVaughan
Copy link
Member

I just wanted to clarify something that I think is an important point here, and while I think @lionel- understands this well, it wasn't clear to me until now.

I'm going to call this directional coercion generic vec_coerce() for the moment, rather than vec_cast().

As a directional coercion generic:

  1. If there is a common type you are always allowed to coerce up into the larger type. Ex. integer() -> numeric()

  2. What I didn't get before is that if there is a common type you can also coerce down into the smaller type, if it is not lossy. Ex. you can do numeric() -> integer() if it is not lossy. But not character() -> integer() because there is no common type, even if it is not lossy.

This would make vec_coerce() super helpful as an input validator for integerish inputs. It would allow 1L and 1 and TRUE as inputs, but not "1" as vec_cast() currently allows.

The only way to do character() -> integer() or integer() -> character() would be through vec_force(), and I think that feels very correct.

Some concrete examples of my thinking are below:

library(vctrs)

vec_coerce <- vctrs:::vec_coercible_cast
vec_force <- vec_cast

# common type is numeric()
vec_ptype2(numeric(), integer())
#> numeric(0)

# meaning that of course we can do `integer -> numeric`
vec_coerce(integer(), numeric())
#> numeric(0)

# BUT it also means we are allowed to do `numeric -> integer`
# as a coercion as long as it is not lossy
vec_coerce(numeric(), integer())
#> integer(0)

vec_coerce(1.5, integer())
#> Error: Lossy cast from `x` <double> to `to` <integer>.
#> * Locations: 1


# Compare that with character() and integer(), which has no common type
vec_ptype2(character(), integer())
#> Error: No common type for `x` <character> and `y` <integer>.

# Meaning we can't even do this because there is no common type
vec_coerce(1L, character())
#> Error: No common type for `x` <integer> and `to` <character>.

# Or this, even though it isn't lossy
vec_coerce("1", integer())
#> Error: No common type for `x` <character> and `to` <integer>.

# To do this, you'd have to use `vec_force()`
vec_force(1L, character())
#> [1] "1"

vec_force("1", integer())
#> [1] 1

# Which would work as long as it is not lossy
vec_force("1.5", integer())
#> Error: Lossy cast from `x` <character> to `to` <integer>.
#> * Locations: 1

Created on 2019-11-21 by the reprex package (v0.3.0.9000)

@lionel-
Copy link
Member Author

lionel- commented Nov 25, 2019

Maybe a good terminology would be:

  • Coercion for directional conversion to a specified supertype or subtype. Lossy coercions to subtypes are an error.

  • Promotion for non-directional conversion to an unspecified type. In vctrs this happens most often with variadic combinations. A promotion is always to the super type, for safety.

    There is a parallel with C or C++ promotions which are implicit coercions applied to temporary objects. The temporary objects are eventually coerced unless the recipient variable is typed with auto (which is usually what is happening with our dynamic typing system, unless the caller explicitly coerces after combination or provides a ptype).

@hadley
Copy link
Member

hadley commented Nov 25, 2019

Just a note, but if we use promotion, we'll need to be careful that it matches the direction in our diagrams.

@krlmlr
Copy link
Member

krlmlr commented Jan 2, 2020

Does this resolve the following:

vctrs::vec_cast("a", NA)
#> Error: Lossy cast from `x` <character> to `to` <logical>.
#> * Locations: 1
vctrs::`vec_slice<-`(NA, 1, "a")
#> Error: Lossy cast from `value` <character> to `x` <logical>.
#> * Locations: 1

Created on 2020-01-02 by the reprex package (v0.3.0)

@lionel-
Copy link
Member Author

lionel- commented Jan 3, 2020

@krlmlr This looks like correct behaviour. vec-assign already uses directional coercion.

@krlmlr
Copy link
Member

krlmlr commented Jan 3, 2020

Do we treat a vector of NA as unspecified?

@krlmlr
Copy link
Member

krlmlr commented Jan 3, 2020

See tidyverse/tibble#689 for context: Do we support creating an NA vector (=a column filled with NA) and then filling it up with vec_slice<-()?

@lionel-
Copy link
Member Author

lionel- commented Jan 3, 2020

We could make that exception, but I'm not sure that we should. Perhaps better wait and gain more experience with this restriction enabled, as we switch to vctrs in other packages?

@lionel-
Copy link
Member Author

lionel- commented Feb 12, 2020

tidyr users have started to use the unrestricted conversion semantics of vec_cast() tidyverse/tidyr#842

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants