Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with factors that have additional classes #1888

Closed
SebKrantz opened this issue Oct 26, 2023 · 13 comments
Closed

Error with factors that have additional classes #1888

SebKrantz opened this issue Oct 26, 2023 · 13 comments

Comments

@SebKrantz
Copy link

SebKrantz commented Oct 26, 2023

I'm getting a serious error using ggplot together with collapse, in particular, collapse optimizes computations on factors by giving them an additional class "na.included" if they are known to not contain any NA's. This class avoids a missing value check when using factors for grouping, joins etc. see https://sebkrantz.github.io/collapse/reference/qF.html. In the example below, I melt a data frame using collapse::pivot which adds the "na.included" class to the generated 'variable' factor i.e. the class is c("factor", "na.included"). This produces an error in ggplot2 caused by vctrs. Thanks for considering this issue.

library(collapse)
#> collapse 2.0.3, see ?`collapse-package` or ?`collapse-documentation`
#> 
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#> 
#>     D
library(ggplot2)

mtcars |>
  pivot() |>
  ggplot(aes(y = value)) +
      geom_histogram() +
      facet_wrap( ~ variable)
#> Error in `stop_native_implementation()`:
#> ! `vec_ptype2.factor.factor()` is implemented at C level.
#>   This R function is purely indicative and should never be called.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex
#>   (<https://tidyverse.org/help/>) and the full backtrace.
#> Backtrace:
#>      ▆
#>   1. ├─base::tryCatch(...)
#>   2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   3. │   ├─base (local) tryCatchOne(...)
#>   4. │   │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   5. │   └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>   6. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   7. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   8. ├─base::withCallingHandlers(...)
#>   9. ├─base::saveRDS(...)
#>  10. ├─base::do.call(...)
#>  11. ├─base (local) `<fn>`(...)
#>  12. └─global `<fn>`(input = base::quote("neat-zebra_reprex.R"))
#>  13.   └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#>  14.     └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#>  15.       └─knitr:::process_file(text, output)
#>  16.         ├─base::withCallingHandlers(...)
#>  17.         ├─base::withCallingHandlers(...)
#>  18.         ├─knitr:::process_group(group)
#>  19.         └─knitr:::process_group.block(group)
#>  20.           └─knitr:::call_block(x)
#>  21.             └─knitr:::block_exec(params)
#>  22.               └─knitr:::eng_r(options)
#>  23.                 ├─knitr:::in_input_dir(...)
#>  24.                 │ └─knitr:::in_dir(input_dir(), expr)
#>  25.                 └─knitr (local) evaluate(...)
#>  26.                   └─evaluate::evaluate(...)
#>  27.                     └─evaluate:::evaluate_call(...)
#>  28.                       ├─evaluate (local) handle(...)
#>  29.                       │ └─base::try(f, silent = TRUE)
#>  30.                       │   └─base::tryCatch(...)
#>  31.                       │     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  32.                       │       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  33.                       │         └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  34.                       ├─base::withCallingHandlers(...)
#>  35.                       ├─base::withVisible(value_fun(ev$value, ev$visible))
#>  36.                       └─knitr (local) value_fun(ev$value, ev$visible)
#>  37.                         └─knitr (local) fun(x, options = options)
#>  38.                           ├─base::withVisible(knit_print(x, ...))
#>  39.                           ├─knitr::knit_print(x, ...)
#>  40.                           └─knitr:::knit_print.default(x, ...)
#>  41.                             └─evaluate (local) normal_print(x)
#>  42.                               ├─base::print(x)
#>  43.                               └─ggplot2:::print.ggplot(x)
#>  44.                                 ├─ggplot2::ggplot_build(x)
#>  45.                                 └─ggplot2:::ggplot_build.ggplot(x)
#>  46.                                   └─layout$setup(data, plot$data, plot$plot_env)
#>  47.                                     └─ggplot2 (local) setup(..., self = self)
#>  48.                                       └─self$facet$compute_layout(data, self$facet_params)
#>  49.                                         └─ggplot2 (local) compute_layout(..., self = self)
#>  50.                                           └─ggplot2::combine_vars(data, params$plot_env, vars, drop = params$drop)
#>  51.                                             ├─ggplot2:::unique0(vec_rbind0(!!!values[has_all]))
#>  52.                                             └─ggplot2:::vec_rbind0(!!!values[has_all])
#>  53.                                               ├─ggplot2:::with_ordered_restart(...)
#>  54.                                               │ └─base::withCallingHandlers(...)
#>  55.                                               └─vctrs::vec_rbind(..., .error_call = .error_call)
#>  56.                                                 └─vctrs (local) `<fn>`()
#>  57.                                                   └─vctrs:::vec_ptype2.factor.factor(...)
#>  58.                                                     └─vctrs:::stop_native_implementation("vec_ptype2.factor.factor")
#>  59.                                                       └─cli::cli_abort(...)
#>  60.                                                         └─rlang::abort(...)

# The problem is the additional "na.included" class
mtcars |> pivot() |> vclasses()
#>             variable                value 
#> "factor na.included"            "numeric"

Created on 2023-10-26 with reprex v2.0.2

@lionel-
Copy link
Member

lionel- commented Oct 26, 2023

We don't currently support superclassing classes like that.

Would you consider using c("my_package_my_class", "factor") instead?

In this example I'm including a package prefix "my_package_", this is good practice because S3 classes are shared globally across the ecosystem.

@SebKrantz
Copy link
Author

Ok, the primary class is meant to be a factor, so I don't see the point of adding a class in front which is more likely to cause problems, especially if the only reason for doing so has to do with some restrictive C-level routine in vctrs. I will also not rename the class as that would be considered an API breaking change to collapse.

In general I don't think the order of classes should matter except for method dispatch. If vctrs has an internal method for factors it should dispatch on the factor class regardless of which other classes the object inherits.

We have seen similar issues in base R where R Core added the "matrix" class to arrays and it caused havoc for a lot of developers that did illicit checks on the length of the class vector.

@lionel-
Copy link
Member

lionel- commented Oct 26, 2023

Ok, the primary class is meant to be a factor, so I don't see the point of adding a class in front which is more likely to cause problems

On the contrary, since you're subclassing factor your class should come before your superclasses in the class vector. This is standard practice.

In general I don't think the order of classes should matter except for method dispatch. If vctrs has an internal method for factors it should dispatch on the factor class regardless of which other classes the object inherits.

Sorry we don't currently have plans to change this.

@SebKrantz
Copy link
Author

Ok, I will reshuffle the class, in the hope that it doesn't break anything. I still think vctrs is being too restrictive here though. There is, to my knowledge, nothing in R that warrants programs to fail because the order of classes is non-standard. Any user code that manipulates the class attribute could induce such artifacts.

@lionel-
Copy link
Member

lionel- commented Oct 26, 2023

Thanks, please let me know if this causes trouble elsewhere.

The problem in vctrs is that we don't do normal S3 dispatch, we do a limited form of double dispatch where inheritance is explicitly disabled. So superclassing is incompatible with this approach as we only dispatch to the leftmost classes.

You will probably have to implement some coercion methods for compatibility with vctrs/tidyverse, to allow your class to be coercible to factor without loss of information, see https://vctrs.r-lib.org/reference/howto-faq-coercion.html. We made the choice of being restrictive in case of class mismatches, though we'd like to be a bit more lenient in the future.

@SebKrantz
Copy link
Author

Ahh ok, I see this will cause errors probably everywhere:

library(vctrs)
#> Warning: package 'vctrs' was built under R version 4.3.1
f <- iris$Species
class(f) <- c("na.included", "factor")
vec_ptype2(f, iris$Sepal.Length)
#> Error:
#> ! Can't combine `f` <na.included> and `iris$Sepal.Length` <double>.
#> Backtrace:
#>      ▆
#>   1. ├─vctrs::vec_ptype2(f, iris$Sepal.Length)
#>   2. └─vctrs (local) `<fn>`()
#>   3.   └─vctrs::vec_default_ptype2(...)
#>   4.     ├─base::withRestarts(...)
#>   5.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
#>   6.     │   └─base (local) doWithOneRestart(return(expr), restart)
#>   7.     └─vctrs::stop_incompatible_type(...)
#>   8.       └─vctrs:::stop_incompatible(...)
#>   9.         └─vctrs:::stop_vctrs(...)
#>  10.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

Created on 2023-10-26 with reprex v2.0.2

I definitely want to avoid opening a pandora's box that will force collapse to move along with tidyverse developments. So in this case I think sticking to my superclassing approach is preferable. Can the issue also be solved in a different way?

@lionel-
Copy link
Member

lionel- commented Oct 26, 2023

Isn't the error above expected? You'd only want your class to be coercible to factor and character IIUC?

@SebKrantz
Copy link
Author

Same with vec_ptype2(f, as.character(iris$Sepal.Length)). I guess the primary reason for me to add "na.included" after the "factor" class was precisely because I wanted it to be ignored by basically all other software, which would be the case here, but not if I make the adjustement to put the class before "factor". I now believe changing the class order could cause more issues with the tidyverse than anticipated. So my question is there a method such as vec_cast.na.included that I could define to make vctrs compatible with my superclassing approach. That to me seems a cleaner solution.

@SebKrantz
Copy link
Author

I see if I export vec_ptype2.factor.factor <- function(x, y, ...) x, this solves the problem with the present version of collapse. Could this cause any other issues? or is this function expected to do anything else?

@lionel-
Copy link
Member

lionel- commented Oct 26, 2023

I see if I export vec_ptype2.factor.factor <- function(x, y, ...) x, this solves the problem

😱 I would not do that. That will not solve your issue anyway since the dispatch mechanism will not look into the search (except the global env).

I now believe changing the class order could cause more issues with the tidyverse than anticipated

I'd be surprised and I don't think there has been evidence of that in this thread. The error you're seeing is expected behaviour in our stack.

Same with vec_ptype2(f, as.character(iris$Sepal.Length)).

Yes we require explicit coercion behaviour, which must be done by implementing the ptype2 and cast coercion methods for all relevant classes (see linked FAQ). I'm sorry that causes you more work. The main issue with this approach is that it doesn't scale for classes like I() that are essentially mixins for a variable number of classes. Hopefully that's not your case?

@SebKrantz
Copy link
Author

That's a lot of coding expected from my side and I don't see that it solves the issue once and for all, because you could introduce new classes that require coercion to/from factor, and then I would be required to implement the casting methods for that. I'll think about it further, but I think this is not a good option.

@lionel-
Copy link
Member

lionel- commented Oct 26, 2023

Thanks for your consideration.

@lionel- lionel- closed this as completed Oct 26, 2023
@SebKrantz
Copy link
Author

Just a note to anyone reading this thread (referenced in the collapse documentation): the solution proposed by lionel is not feasible because collapse does not depend on vctrs and thus cannot register any vctrs methods. It is also my intention for the "na.included" class to be ignored by other software, because its sole purpose is internal performance optimization at the C-level for factors known to not have any missing values. Thus it should come after the "factor" class and not be targeted by method dispatch. To avoid errors with the current version vctrs, users can either define a function

vec_ptype2.factor.factor <- function(x, y, ...) x

in their global namespace, or reclass factors such as set_class(my_fac, "factor") (from magrittr), for example:

library(fastverse)
fastverse_extend(ggplot2)

iris %>%
  pivot("Species") %>%
  ggplot(aes(x = value, fill = Species)) +
    geom_histogram() + 
    facet_wrap( ~ set_class(variable, "factor"))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants