Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor error handling of the repository #233

Merged
merged 27 commits into from
Jun 21, 2023
Merged

Refactor error handling of the repository #233

merged 27 commits into from
Jun 21, 2023

Conversation

Sicheng-Pan
Copy link
Collaborator

Why

Currently, most errors in r-polars are plain string error messages, and for the errors from other crates (e.g. pola-rs) we have to manually convert it into string. Sometimes we append extra string context to illustrate why there is an error. Such string errors could become hard to work with when there are multiple layers of results to unwrap, since we need to directly manipulate the error strings. We also lose the layered structure of the error in the process, and on the R side we only see a single error message.

How

The anyhow crate provides common error handling implementations that can simplify our workflow:

  • We can use anyhow::Error for all internal functions on the Rust side (e.g. robj_to_*). All errors implementing the std::error::Error trait (as well as Sync and Send traits) can be automatically converted to anyhow::Error with the ? operator. We can also use <error>.context(<msg>) to wrap the error with additional context. Currently the extendr_api::Error unimplements Sync and Send, so we cannot directly reuse those errors for now.
  • We need to implement our wrapper error (currently named Rerr) for the interops between R and Rust. An anyhow::Error can also be automatically wrapped to Rerr with the ? operator. We use the wrapper error at boundary functions (e.g. import_arrow_ipc) that connects R and Rust. We can also implement as many helper functions for the wrapper error as we like to and export it to the R side (e.g. dump the layers of contexts to R).
  • We can also implement helper methods on the R side to help process errors from Rust.

Progress

I have implemented a minimal version of this error handling system. The scan_arrow_ipc function on the R side can dump error messages from an anyhow::Error. Currently the import_arrow_ipc on the Rust side looks messy since we are using Strings as errors, which do not implement the std::error::Error trait by default. Hopefully this could look a lot cleaner after we refactored the internal functions.

man/nanoarrow.Rd Outdated Show resolved Hide resolved
@sorhawell
Copy link
Collaborator

Looks good !

How about classed errors? I guess robj_to! and all conversion functions should also support some classed errors . What would be a good error type internally for r-polars for e.g. try_f64_into_usize() -> Result<_, ?>?

We can try to make some conversion wrappers for the extendr error types to support send+sync.

rust-polars used to define their own Result type Result<T, E = PolarsError> however they renamed it to PolarsResult to avoid namespace collisions with all the other crates, that also liked to define what Result should mean, e.g. extendr Result<T, E= ExtendrError>. When working in namespaces with multiple custom Result-types for e.g. rust-polars, extendr and r-polars it is annoying to rename the Result to avoid conflict and remember what Result currently meeans. I prefer if we could name our new candidate Result-type based on anyhow to something like RResult or use plain

std::result::Result;
Result<T, Rerr> // not too verbose anyways

@sorhawell
Copy link
Collaborator

sorhawell commented Jun 6, 2023

this will add auto completion to Rerr methods. Although Rerr is not public, it is nice to develop with.

#' @title auto complete $-access into a polars object
#' @description called by the interactive R session internally
#' @param x Rerr
#' @param pattern code-stump as string to auto-complete
#' @export
#' @keywords internal
.DollarNames.Rerr = function(x, pattern = "") {
  get_method_usages(Rerr, pattern = pattern)
}

@sorhawell
Copy link
Collaborator

It seems Extendr errors are not send+sync because they can contain Robj which is only partially send+sync, see ParRobj (but that is playing with fire :) )

I think the Robj can be converted to a string representation, because they are not used for anything but for display anyways.

@Sicheng-Pan
Copy link
Collaborator Author

Looks good !

How about classed errors? I guess robj_to! and all conversion functions should also support some classed errors . What would be a good error type internally for r-polars for e.g. try_f64_into_usize() -> Result<_, ?>?

We can try to make some conversion wrappers for the extendr error types to support send+sync.

rust-polars used to define their own Result type Result<T, E = PolarsError> however they renamed it to PolarsResult to avoid namespace collisions with all the other crates, that also liked to define what Result should mean, e.g. extendr Result<T, E= ExtendrError>. When working in namespaces with multiple custom Result-types for e.g. rust-polars, extendr and r-polars it is annoying to rename the Result to avoid conflict and remember what Result currently meeans. I prefer if we could name our new candidate Result-type based on anyhow to something like RResult or use plain

std::result::Result;
Result<T, Rerr> // not too verbose anyways

For internal error type I would recommend anyhow::Error. If we would like to know the concrete type of error within anyhow::Error, we could try to downcast it to its original form in the implementation of Rerr.

I would also recommend using classed context (e.g. context enum) to achieve classed errors. The base error will be string error messages created from anyhow!() or errors from other libraries.

I've renamed our Result type to RResult for clarity.

@sorhawell
Copy link
Collaborator

sorhawell commented Jun 7, 2023

It appears anyhow is convenient but not to handy to match on Errors.

If we say the final goal of error handling is to produce classed R error conditions something like this perhaps

> structure(list(message="cannot divide by zero", call =NULL), class = c("ComputeError","rust_polars","condition"))

<ComputeError: cannot divide by zero>

Here a unit test can check easily what type of error was raised

how would we convert the anyhow errors into such a classed error?

... or could Rerr have a method which can return a char vec like class= above? That would be likely be fine also.

If using anyhow error shouldn't r-polars also use an enum as base error to store the class/type information?

@Sicheng-Pan
Copy link
Collaborator Author

It appears anyhow is convenient but not to handy to match on Errors.

If we say the final goal of error handling is to produce classed R error conditions something like this perhaps

> structure(list(message="cannot divide by zero", call =NULL), class = c("ComputeError","rust_polars","condition"))

<ComputeError: cannot divide by zero>

Here a unit test can check easily what type of error was raised

how would we convert the anyhow errors into such a classed error?

... or could Rerr have a method which can return a char vec like class= above? That would be likely be fine also.

If using anyhow error shouldn't r-polars also use an enum as base error to store the class/type information?

Well I think that could be achieved with the classed context. For example if you have a context enum with various error tags. For example, if we have the following enum:

enum ErrTag {
    ComputeError,
    Polars,
}

impl Display for ErrTag {
    ...
}

then we can write the following code:

err.context(ErrTag::ComputeError).context(ErrTag::Polars)

which will gives us the error with the desired context. Currently I exports the anyhow::Error::chain function for Rerr, and on the R side we can use

result$err$chain()

to retrieve the layers of contexts for the error. Although I have not implemented error tag yet, you can try it out with String contexts. For example:

import_arrow_ipc("wrong.file", NULL, TRUE, TRUE, NULL, 0L, TRUE)$err$chain()

This will produce a vector containing the error context and the error itself.

@sorhawell
Copy link
Collaborator

sorhawell commented Jun 8, 2023

https://www.lpalmieri.com/posts/error-handling-rust/

I was reading this blog on error basics and thiserror and anyhow. It struck me that anyhow also requires besides send+sync also static. That means we can only use static strings. Maybe that would mean a big rework of the help conversion messages, and no String-types used in a transition phase. Maybe it will also be difficult to represent the user input which was wrong.

example

> pl$lit("hey you")$str$split(" ")$to_r()
[[1]]
[1] "hey" "you"

> pl$lit("hey you")$str$split(42)$to_r()
Error: in str$split: the arg [by] is not a single string, but 42.0
when calling :
pl$lit("hey you")$str$split(42)

I have tinkered with representation function for extendr errors to make them send+sync.

I was thinking to provide also some simple test cases in that branch.

@Sicheng-Pan
Copy link
Collaborator Author

https://www.lpalmieri.com/posts/error-handling-rust/

I was reading this blog on error basics and thiserror and anyhow. It struck me that anyhow also requires besides send+sync also static. That means we can only use static strings. Maybe that would mean a big rework of the help conversion messages, and no String-types used in a transition phase. Maybe it will also be difficult to represent the user input which was wrong.

example

> pl$lit("hey you")$str$split(" ")$to_r()
[[1]]
[1] "hey" "you"

> pl$lit("hey you")$str$split(42)$to_r()
Error: in str$split: the arg [by] is not a single string, but 42.0
when calling :
pl$lit("hey you")$str$split(42)

I have tinkered with representation function for extendr errors to make them send+sync.

I was thinking to provide also some simple test cases in that branch.

Well I think the anyhow! macro could take String as inputs and creates the corresponding anyhow::Error.

Based on your representation, I have now implemented

impl From<extendr_api::Error> for Rerr {
    ...
}

so that we can reuse the extendr error enum for our purpose.

@Sicheng-Pan
Copy link
Collaborator Author

Sicheng-Pan commented Jun 11, 2023

@sorhawell I've made some major refactoring in the last commit. We are no longer using anyhow since I cannot figure out how to extract the contexts from anyhow::Error after putting them in. Instead, I created a custom Rerr for our purpose, which is basically a vector of contexts. I have also implemented the collect_handled function for LazyFrame based on this design:

pub fn collect_handled(&self) -> crate::rerr::RResult<crate::rdataframe::DataFrame> {
use crate::rerr::WithRctx;
handle_thread_r_requests(self.clone().0).when("calling $collect() on LazyFrame")
}

Hopefully this looks cleaner than the previous version.

And by the way I have implemented a test_rerr() function so that you can directly use it to test the new Rerr on the R side.

@sorhawell
Copy link
Collaborator

And by the way I have implemented a test_rerr() function so that you can directly use it to test the new Rerr on the R side.

sweet looking forward to try it out "tomorrow" :)

@sorhawell
Copy link
Collaborator

sorhawell commented Jun 13, 2023

I really like this pattern you have come up with. I think this could work well in all aspects we have talked of. Elegant.

I have tinkered with it and used in a conversion function robj_to_str. I like the variant TypeMissMatch but often the argument blaming and conversion error will not happen on the same scope level. I tried two new variants TypeMiss (of two strings) and BadArgument(of one string).

https://github.com/pola-rs/r-polars/blob/787c343b373757f71c5583302ac9b2de4ceb6090/src/rust/src/rlib.rs#LL266C1-L272C2

usecase, without robj_to! ... for now

use crate::rerr;
use crate::rerr::WithRctx;
#[extendr]
fn test_rerr_str(fishbone: Robj) -> rerr::RResult<String> {
    let s = crate::utils::test_robj_to_str(fishbone).blame_arg("fishbone");
    s.map(|s| s.to_string())
}

new variants

#[derive(Clone, Debug)]
pub enum Rctx {
    Extendr(String),
    Hint(String),
    Plain(String),
    Polars(String),
    When(String),
    TypeMismatch(String, String, String),
    TypeMiss(String, String),
    BadArgument(String),
}

#[derive(Clone, Debug)]
pub struct Rerr(Vec<Rctx>);
pub type RResult<T> = core::result::Result<T, Rerr>;

pub trait WithRctx<T> {
    fn ctx(self, rctx: Rctx) -> RResult<T>;
    fn hint<S: Into<String>>(self, msg: S) -> RResult<T>;
    fn when<S: Into<String>>(self, msg: S) -> RResult<T>;
    fn blame_arg<S: Into<String>>(self, msg: S) -> RResult<T>;
}

impl<T: Clone, E: Into<Rerr>> WithRctx<T> for core::result::Result<T, E> {
    fn ctx(self, rctx: Rctx) -> RResult<T> {
        self.map_err(|e| {
            let mut rerr = e.into();
            rerr.0.push(rctx);
            rerr
        })
    }

    fn hint<S: Into<String>>(self, msg: S) -> RResult<T> {
        self.ctx(Rctx::Hint(msg.into()))
    }

    fn when<S: Into<String>>(self, msg: S) -> RResult<T> {
        self.ctx(Rctx::When(msg.into()))
    }

    fn blame_arg<S: Into<String>>(self, arg_name: S) -> RResult<T> {
        self.ctx(Rctx::BadArgument(arg_name.into()))
    }
}

use from R

> polars:::test_rerr_str(42)
$ok
NULL

$err
Error: The argument [fishbone] caused an error because : expected a [string] value, but got [42.0] instead

attr(,"class")
[1] "extendr_result"

src/rust/src/rerr.rs Outdated Show resolved Hide resolved
@Sicheng-Pan
Copy link
Collaborator Author

Sicheng-Pan commented Jun 14, 2023

I really like this pattern you have come up with. I think this could work well in all aspects we have talked of. Elegant.

I have tinkered with it and used in a conversion function robj_to_str. I like the variant TypeMissMatch but often the argument blaming and conversion error will not happen on the same scope level. I tried two new variants TypeMiss (of two strings) and BadArgument(of one string).

I broke the TypeMismatch context into two as you suggested. Also I used thiserror to make the Rctx looks nicer.

src/rust/src/rerr.rs Outdated Show resolved Hide resolved
@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review June 19, 2023 02:05
@Sicheng-Pan
Copy link
Collaborator Author

@sorhawell I've just updated the conflicting unit tests. Most of them are still using the old expect_grepl_error, while I created a new expect_rerr helper function for those that have a Rerr in their output.

@sorhawell
Copy link
Collaborator

sorhawell commented Jun 20, 2023

###R side:
error_trait.R to allow legacy string errors to coexist with Rerr
both Rerr (Rerr.R) and string errors (string_error.R) implements the "trait" which is a collection of S3 methods
error_conversion.R provides the new unwrap() and result() to use for any (two) error types

fix check warnings/errors
update tests

###rust-side:
pub struct Rerr(VecDeque<Rctx>) use VecDeque to allow pushing contexts first and last in chain. Add when_last used for calls ("When calling:").
expose all Rerr methods
add WhereIn method and context variant

Rerr should also be renamed as the class is too generic to not S3 method namespace collide with some other package, which happend in the past for DataType. A new name could be RPolarsErr or PolarsErr.

@sorhawell sorhawell changed the title [WIP] Refactor error handling of the repository Refactor error handling of the repository Jun 20, 2023
R/error_conversion.R Outdated Show resolved Hide resolved
@sorhawell
Copy link
Collaborator

@Sicheng-Pan what do you think of the added commit?

@Sicheng-Pan
Copy link
Collaborator Author

@sorhawell The new commits looks good to me. If we would like to rename Rerr to avoid ambiguity, I would prefer RPolarsErr .

Copy link
Collaborator

@sorhawell sorhawell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many thx @Sicheng-Pan I'm very impressed with the PR you made

@sorhawell
Copy link
Collaborator

ohh #247 will only be an issue when someone forks polars and run the actions as here.

@sorhawell sorhawell merged commit 71f3d05 into pola-rs:main Jun 21, 2023
8 checks passed
@sorhawell
Copy link
Collaborator

building an error on R side could look like this

> err = .pr$RPolarsErr$new()$bad_robj(42)$misvalued("is less than zero")$when("storing small numbers")
> print(err)
When storing small numbers: Expected a value that is less than zero: Got value [Rvalue: 42.0, Rsexp: Doubles, Rclass: ["numeric"]]
> err$contexts()
$When
[1] "storing small numbers"

$ValueOutOfScope
[1] "is less than zero"

$BadValue
[1] "Rvalue: 42.0, Rsexp: Doubles, Rclass: [\"numeric\"]"

This was referenced Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants