Switch from extendr to savvy #1126
Not familiar with |
@etiennebacher It would be quicker to look at a real example. For example, implementing the Arrow C stream interface is much easier than with extendr.

use arrow::ffi_stream::{ArrowArrayStreamReader, FFI_ArrowArrayStream};

let stream_reader = unsafe {
    let stream = savvy::ExternalPointerSexp::try_from(stream_ptr)?
        .cast_mut_unchecked::<FFI_ArrowArrayStream>();
    ArrowArrayStreamReader::from_raw(stream).map_err(|e| e.to_string())?
};

It would also let us follow more general Rust rules for file structure, such as the ability to split impl blocks into multiple files. (extendr/extendr#739) |
Currently, the file structure on the Rust side is quite different from that of py-polars and uses a uniquely defined error type, so functions have to be written according to completely different rules than in general packages using extendr. Another thing I am thinking is that, just as Python Polars has, for example, the DataFrame class (defined in Python) and the (internal) PyDataFrame class (defined in Rust), it would be better to have a dual structure for each class in R Polars. |
@etiennebacher Although only basic type conversion from R to Polars has been implemented so far, I have created a polars binding using savvy. Note that the directory structure matches that of py-polars. Matching py-polars will make it easier to follow upstream and to contribute. The current implementation of r-polars is too complex. |
Thanks @eitsupi, that looks great and much easier to follow and understand than the current Rust implementation (so far), thanks! I won't have much time (or skill anyway) to explore this in the next few days but I have a few questions coming to my mind (all of this supposing we move to
100% agree |
Thanks for checking that.
Basically, yes, but the type of input is different. For example, in the following, you will see that the integer and double vectors are converted to Series in separate functions (I tried to see if this could be handled in a single function, but it seemed impossible. But I think separating the functions makes the correspondence between one S3 method and one Rust function clearer and easier to understand).
As long as the Arrow C interface is used, there should be no problem (currently the C interface is intended to be used via nanoarrow, see #1138)
I don't think a phased transition is possible. Almost everything needs to be rewritten in a huge PR / switch the branch.
This is both an advantage and a disadvantage, but the error structure is completely different.
This seems to be related to the Rust code to bypass R's single-threaded limitation to execute R code? I don't think there is anything extendr-specific that can be translated, but I don't know if this is the right implementation to begin with (like the segmentation fault we recently got on Linux). |
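The correspondence described above, one S3 method per R vector type calling one Rust function, could look roughly like the following on the Rust side. This is only a sketch: `Column`, `new_i32` and `new_f64` are hypothetical names, the enum stands in for a real Polars `Series`, and the savvy glue code is left out entirely. The only R-specific fact it relies on is that R stores `NA_integer_` as the smallest 32-bit integer. The double version is deliberately simplified and does not try to tell `NA_real_` apart from `NaN`.

```rust
/// Hypothetical stand-in for a Polars Series (not the real type).
#[derive(Debug, PartialEq)]
pub enum Column {
    Int32 { name: String, values: Vec<Option<i32>> },
    Float64 { name: String, values: Vec<f64> },
}

/// R represents NA_integer_ as i32::MIN.
const NA_INTEGER: i32 = i32::MIN;

/// Would be called from a hypothetical S3 method for integer vectors.
pub fn new_i32(name: &str, values: &[i32]) -> Column {
    // Map R's integer NA sentinel to a missing value.
    let values = values
        .iter()
        .map(|&v| if v == NA_INTEGER { None } else { Some(v) })
        .collect();
    Column::Int32 { name: name.to_string(), values }
}

/// Would be called from a hypothetical S3 method for double vectors.
/// Simplification: values are copied as-is, with no NA handling.
pub fn new_f64(name: &str, values: &[f64]) -> Column {
    Column::Float64 { name: name.to_string(), values: values.to_vec() }
}
```

With this layout the branching on the input type happens in R's S3 dispatch, so each Rust function only has to deal with a single element type.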
r-polars/vignettes/differences-with-python.Rmd Lines 94 to 106 in 6eac27a
Compared to the above example, the error message would look like this:
> as_polars_series(geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))
Error in as_polars_series.default(geos::as_geos_geometry("LINESTRING (0 1, 3 9)")) :
Unsupported class: geos_geometry
> rlang::global_entrace()
> as_polars_series(geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))
Error in `as_polars_series.default()`:
! Unsupported class: geos_geometry
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `as_polars_series.default()`:
! Unsupported class: geos_geometry
---
Backtrace:
▆
1. ├─neopolars::as_polars_series(geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))
2. └─neopolars:::as_polars_series.default(geos::as_geos_geometry("LINESTRING (0 1, 3 9)")) at neo-r-polars/R/as_polars_series.R:3:3
> as_polars_df(list(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)")))
Error in `as_polars_series.default()`:
! Unsupported class: geos_geometry
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `as_polars_series.default()`:
! Unsupported class: geos_geometry
---
Backtrace:
▆
1. ├─neopolars::as_polars_df(list(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)")))
2. └─neopolars:::as_polars_df.list(list(x = geos::as_geos_geometry("LINESTRING (0 1, 3 9)"))) at neo-r-polars/R/as_polars_df.R:3:3
3. ├─neopolars:::wrap(PlRDataFrame$init(lapply(x, function(column) as_polars_series(column)$`_s`))) at neo-r-polars/R/as_polars_df.R:13:3
4. ├─PlRDataFrame$init(lapply(x, function(column) as_polars_series(column)$`_s`)) at neo-r-polars/R/series-series.R:4:3
5. │ └─neopolars:::.savvy_wrap_PlRDataFrame(...) at neo-r-polars/R/000-wrappers.R:58:3
6. └─base::lapply(x, function(column) as_polars_series(column)$`_s`) at neo-r-polars/R/000-wrappers.R:43:3
7. └─neopolars (local) FUN(X[[i]], ...)
8. ├─neopolars::as_polars_series(column) at neo-r-polars/R/as_polars_df.R:13:13
9. └─neopolars:::as_polars_series.default(column) at neo-r-polars/R/as_polars_series.R:3:3
(This is not an appropriate example since it seems unlikely that we can create an error-prone situation within Rust at the moment, but anyway, the Result type on the R side is not necessary.) |
I implemented
> as_polars_series(mtcars)$struct$unnest()
shape: (32, 11)
┌──────┬─────┬───────┬───────┬───┬─────┬─────┬──────┬──────┐
│ mpg ┆ cyl ┆ disp ┆ hp ┆ … ┆ vs ┆ am ┆ gear ┆ carb │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞══════╪═════╪═══════╪═══════╪═══╪═════╪═════╪══════╪══════╡
│ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0 ┆ 4.0 │
│ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 4.0 ┆ 4.0 │
│ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0 ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0 ┆ 1.0 │
│ 21.4 ┆ 6.0 ┆ 258.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 0.0 ┆ 3.0 ┆ 1.0 │
│ 18.7 ┆ 8.0 ┆ 360.0 ┆ 175.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 3.0 ┆ 2.0 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 30.4 ┆ 4.0 ┆ 95.1 ┆ 113.0 ┆ … ┆ 1.0 ┆ 1.0 ┆ 5.0 ┆ 2.0 │
│ 15.8 ┆ 8.0 ┆ 351.0 ┆ 264.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0 ┆ 4.0 │
│ 19.7 ┆ 6.0 ┆ 145.0 ┆ 175.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0 ┆ 6.0 │
│ 15.0 ┆ 8.0 ┆ 301.0 ┆ 335.0 ┆ … ┆ 0.0 ┆ 1.0 ┆ 5.0 ┆ 8.0 │
│ 21.4 ┆ 4.0 ┆ 121.0 ┆ 109.0 ┆ … ┆ 1.0 ┆ 1.0 ┆ 4.0 ┆ 2.0 │
└──────┴─────┴───────┴───────┴───┴─────┴─────┴──────┴──────┘
> as_polars_series(1)$struct$unnest()
Error: SchemaMismatch(ErrString("invalid series dtype: expected `Struct`, got `f64`"))
> rlang::global_entrace()
> as_polars_series(1)$struct$unnest()
Error:
! SchemaMismatch(ErrString("invalid series dtype: expected `Struct`, got `f64`"))
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error:
! SchemaMismatch(ErrString("invalid series dtype: expected `Struct`, got `f64`"))
---
Backtrace:
▆
1. └─as_polars_series(1)$struct$unnest()
2. └─neopolars:::series_struct_unnest(.self) at neo-r-polars/R/series-struct.R:5:19
3. ├─neopolars:::wrap(self$`_s`$struct_unnest()) at neo-r-polars/R/series-struct.R:12:3
4. └─self$`_s`$struct_unnest() at neo-r-polars/R/series-series.R:4:3
5. └─neopolars:::.savvy_wrap_PlRDataFrame(...) at neo-r-polars/R/000-wrappers.R:72:5 |
I took a look at the code in your neo-r-polars repo but didn't play a lot with it yet. I'm trying to see how this transition to

Motivation

I think it'd be nice to have a doc that summarizes why we want to move to
but maybe this could be implemented in
Also, I'm not confident I can adapt some more complex code (e.g.
This is not to say that this isn't worth it, but it's quite a massive rewrite so we should be clear why we do it.

List of changes

We could list the types of changes that would be necessary. You already started making a proof-of-concept so you know more about what is needed. Feel free to edit this list.
I suppose this means we need input checking on the R side. It shouldn't be too hard to implement if we use
How we do the transition

Supposing we move to

Summary

It would be nice if
Once all of this is guaranteed in this minimal PoC, then we can move on and add all the Expr/DataFrame/... stuff that requires a lot of tedious (but relatively easy) work. Until we have this, I'm not sure I'll spend much time on
What do you think about all this? |
Sidenote about errors: would it make sense to return a string with a specific "error" attribute? This would allow more customization of the error message in R. For instance, if we use
foo_unwrap <- function(x) {
error <- attributes(x)$error_type
if (is.null(error)) {
return(x)
}
err_type <- switch(
error,
"ColumnNotFound" = "Column not found",
"ComputeError" = "Operation failed"
)
err_type <- paste0(err_type, ":")
rlang::abort(
c(
err_type,
" " = x
)
)
}
# Function worked: no msg
foo_unwrap(1)
#> [1] 1

# Function failed
error_output <- c("a")
attr(error_output, "error_type") <- "ColumnNotFound"
foo_unwrap(error_output)
#> Error in `foo_unwrap()`:
#> ! Column not found:
#> a

# Function failed
error_output <- c("arithmetic on string and numeric not allowed, try an explicit cast first")
attr(error_output, "error_type") <- "ComputeError"
foo_unwrap(error_output)
#> Error in `foo_unwrap()`:
#> ! Operation failed:
#> arithmetic on string and numeric not allowed, try an explicit cast first |
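For the Rust half of this idea, a minimal sketch could look like the following, assuming the Rust side hands back a string payload plus an optional label that R would then store in the "error_type" attribute. `QueryError` and `tag_for_r` are hypothetical names made up for illustration. They are not part of savvy, extendr or Polars, and the actual attachment of the attribute to an R object is not shown.

```rust
/// Hypothetical error type; the variant names mirror the labels
/// used in the R `switch()` above.
#[derive(Debug)]
pub enum QueryError {
    ColumnNotFound(String),
    ComputeError(String),
}

impl QueryError {
    /// Label that R would read from the "error_type" attribute.
    pub fn error_type(&self) -> &'static str {
        match self {
            QueryError::ColumnNotFound(_) => "ColumnNotFound",
            QueryError::ComputeError(_) => "ComputeError",
        }
    }

    /// Human-readable detail carried by the error.
    pub fn message(&self) -> &str {
        match self {
            QueryError::ColumnNotFound(m) | QueryError::ComputeError(m) => m,
        }
    }
}

/// Returns (string payload, optional value for the "error_type" attribute).
/// `None` means the call succeeded and no attribute should be set.
pub fn tag_for_r(result: Result<String, QueryError>) -> (String, Option<&'static str>) {
    match result {
        Ok(value) => (value, None),
        Err(err) => (err.message().to_string(), Some(err.error_type())),
    }
}
```

Keeping the label separate from the message is what would let the R side choose its own wording per error kind, as `foo_unwrap()` does.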
Thanks for taking a look at this. I will write about motivation in the README once the minimum functionality (e.g. the examples in the README of this repository) is implemented.
Yes. So basically we need to implement the branching with S3 methods. I think that is clearer than the current implementation, which has a huge match expression on the Rust side. |
Thanks to @etiennebacher, error handling based on rlang has been implemented, and I explained Motivation and current status in the README. @sorhawell @vincentarelbundock @Sicheng-Pan @grantmcdermott Could you take a look at that? |
@eitsupi Thanks for the prototype, and it looks very promising. Probably we can implement Even if we stick to the current implementation, it seems possible to migrate without too much effort, if we disable the support for the usage of R functions in queries (just for this feature) |
I haven’t been able to test it in-depth, but my high-level feedback is that this all sounds good to me. Ensuring a healthy dev experience is something that we should be prioritising first and foremost. If the switch to savvy and the addition of the rlang dependency make life easier for you @eitsupi (and you @etiennebacher), then that’s all to the good. It’s infinitely preferable to make some dependency changes than overwhelm the primary maintainer(s). I’ll try to test neo-polars on some of my existing codebases when I get a sec. But if the test suite is already passing then I say go for it :-) |
Thanks all for your feedback. I have opened a new issue #1152 because this rewrite is not just a migration to savvy, but also a complete change of structure on the R side. |
I don't think this can be accomplished in the short term, but the package would be far easier to maintain if we eliminated the large amount of code written to return a Result type to R with extendr.
I also think we can simplify the source code in several other places.