Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning error messages to R #278

Open
multimeric opened this issue Sep 3, 2021 · 59 comments
Open

Returning error messages to R #278

multimeric opened this issue Sep 3, 2021 · 59 comments

Comments

@multimeric
Copy link
Member

Is there an easy way to return strings from Rust that will become errors, warnings or messages in R? If I remember right, simply returning an Error will just panic since it just gets unwrapped.

@multimeric
Copy link
Member Author

Okay it looks like we have an existing wrapper which is throw_r_error:

pub fn throw_r_error<S: AsRef<str>>(s: S) {
let s = s.as_ref();
unsafe {
R_ERROR_BUF = Some(std::ffi::CString::new(s).unwrap());
libR_sys::Rf_error(R_ERROR_BUF.as_ref().unwrap().as_ptr());
unreachable!("Rf_error does not return");
};
}

I note there's also an Rf_warning which can be used the same way as Rf_error for warnings too. There's also R_ShowMessage, but neither of these have a nice Rust wrapper so we would have to manually convert the string into a C string as in the above snippet.

@andy-thomason
Copy link
Contributor

This sounds like something to look at.

Would you like to try a PR or would you like one of us to give it a try?

@multimeric
Copy link
Member Author

If it's as simple as implementing a simple Rust wrapper like this for each of those APIs I'm happy to do it.

By the way, we have this wrapper for returning errors, but afaik none of the TryFrom Robj implementations actually use this. So I'm guessing that means that errors in a Rust library will not return a user-friendly R error but will show a less friendly Rust error? Am I right?

@multimeric
Copy link
Member Author

Notes for future implementation:

@andy-thomason
Copy link
Contributor

We did look at handling Result types in wrappers. It may still do so.

@multimeric
Copy link
Member Author

The thing that worries me is this:

impl<T> From<Result<T>> for Robj
where
T: Into<Robj>,
{
fn from(res: Result<T>) -> Self {
// Force a panic on error.
res.unwrap().into()
}
}

So if our Rust function returns a result, and that result is an error, then it will just get unwrapped in Rust. I guess the error message will get printed to the terminal maybe (?) but I suspect this isn't very user friendly. Also will this cause existing Rust objects in R to break, since I assume the memory will be reclaimed when the library crashes?

@andy-thomason
Copy link
Contributor

This only works because we catch panics and convert them to R errors.
Panics should unwind the stack, dropping objects along the way.
We should check that we have a test for this, however.

A nicer way would be to have a R class "RustError" or similar
which wraps a string. We can use the Display trait to generate
the error message.

The jury has been out on this, but we could put it on a switch.

@multimeric
Copy link
Member Author

Huh, I didn't even know you could catch panics. Well yes I would find it infinitely cleaner to just go Rust Error → R Error and bypass the use of panic entirely. Is there some kind of way to vote on this issue?

@andy-thomason
Copy link
Contributor

andy-thomason commented Sep 27, 2021 via email

@multimeric
Copy link
Member Author

Okay so in summary what I'm hearing is that the correct way to raise an error using extendr is simply to return a Result which will be unwrapped and thereby converted to an R error. In the future this may change.

@clauswilke
Copy link
Member

To be clear: I don't think we'll change the part where you return a Result. How exactly the handling of that is implemented and communicated to R could change.

@multimeric
Copy link
Member Author

Sorry yes that's what I meant: that the unwrapping panic part may change but that returning a Result will stay. Which I think is ideal because that's the most idiomatic way to indicate an error in Rust.

@shrektan
Copy link
Contributor

shrektan commented Jan 25, 2022

Sorry guys, I don't follow.

Let's say we made a Rust function foo() -> Result<Robj> and it returns Err(Error::Other("xxx".to_string())).

Now rextendr will just unwrap the result leading to the case that the error message printed in R is ... foo panicked. with the real error message "xxx" being passed to the stderr, thus can't be captured by R, correct?

I can't understand the reason that we can't return the correct message to R because cpp11 and Rcpp already did this job, didn't they?

(To be clear, what I want is R calls a Rust function, which may throw Rust errors. The errors are not in R's stderr stream. Thus, it looks strange.
I mean what I'm trying to ask is not about calling an R function in Rust. I know this would be very difficult to handle as when an error happens, R will perform a long jump so Rust can't clean up the stack.)

Thanks.

@andy-thomason
Copy link
Contributor

In C++ you can call longjump and just ignore the mess this creates, but Rust is a little more
careful - longjump does not exist in Rust, nor should it.

We did discuss having an option to return an object with a "rust.error" class or similar.
This would be far cleaner than converting a panic to a R longjump.

Something like

res <- myrustfunc();
res$ok() gives the result or NULL
res$err() gives the error or NULL

This was not popular with the team and got pushed back.

@multimeric
Copy link
Member Author

I still don't think we've addressed my suggestion above which is to just call the R API's throw error function, because I believe unlike the other methods, it will behave how R users expect and be catchable using tryCatch. Returning an object on failure wouldn't be idiomatic to R, nor is just printing to the message to stdout which I think is what we're doing now.

@andy-thomason
Copy link
Contributor

We do have a function for throwing R errors. This is very dangerous, however
as it could leave objects on the stack in an undefined state.

In the wrapper, we could assume that objects have been dropped, but we would
need annotation to handle errors.

If the error supports Display (a fair assumption) it should be possible
to use to_string() to generate the error message.

#[extendr(throw_result_errors=true)]
fn i_throw_stuff(input: Doubles) -> Result<Doubles> {
   ...
}

@clauswilke
Copy link
Member

Returning an object on failure wouldn't be idiomatic to R,

Exactly, that's why I pushed back against it. Since R uses longjumps on error, I think the Rust code needs to somehow emulate this also. Would it help to write some C intermediary that wraps everything, so that the Rust code can fully shut down and return to C, and then the C code calls the longjump?

@yutannihilation
Copy link
Contributor

In the wrapper, we could assume that objects have been dropped, but we would need annotation to handle errors.

Using a wrapper sounds good to me (If I remember correctly, it was what @shrektan suggested on Discord). But, I don't get why we need annotation. Are there anything ambiguous without that annotation switch?

@multimeric
Copy link
Member Author

multimeric commented Jan 28, 2022

We do have a function for throwing R errors. This is very dangerous, however
as it could leave objects on the stack in an undefined state.

Sorry, to clarify I mean: add a panic hook (like we already have), but in that hook we call the Rf_error function with the error string. That way we get appropriate behaviour but also the stack has been unwound.

@yutannihilation
Copy link
Contributor

yutannihilation commented Jan 28, 2022

I was just trying in that direction. Unlike with normal errors, std::thread::Error doesn’t implement the Error trait, but it seems we can downcast it.

One disadvantage of this is probably we cannot add custom classes to the error, because panic!() can only take a bare string, but I think this is also an option.

pub fn handle_panic<F, R>(err_str: &str, f: F) -> R
where
    F: FnOnce() -> R,
    F: std::panic::UnwindSafe,
{
    // Surpress outputs from the default hook (c.f. https://stackoverflow.com/a/59211505)
    let prev_hook = std::panic::take_hook();
    std::panic::set_hook(Box::new(|_| {}));

    let result = match std::panic::catch_unwind(f) {
        Ok(res) => res,
        Err(e) => {
            let err_str = if let Some(e) = e.downcast_ref::<&str>() {
                e.as_ptr()
            } else {
                err_str.as_ptr()
            };
            unsafe {
                libR_sys::Rf_error(err_str as *const std::os::raw::c_char);
            }
            unreachable!("handle_panic unreachable")
        }
    };

    // Restore the previous hook
    std::panic::set_hook(prev_hook);

    result
}

This almost worked, but I don't know why I get this extra "src/lib.rs" string...

#[extendr]
fn hello_world() -> &'static str {
    panic!("Hello world!");
}
hello_world()
#> Error in hello_world() : Hello world!src\lib.rs

@multimeric
Copy link
Member Author

Just a guess, is it because the string pointer you're creating is to a string without a null terminator, so it's reading off the end of the string onto another one? Can you convert to a CString first?

@yutannihilation
Copy link
Contributor

Ah, it might be. Thanks.

@andy-thomason
Copy link
Contributor

It is worth noting that a CString will leak memory as nothing will deallocate it.

This may not be a problem for occasional errors - I'm pretty certain that R also will leak
memory on errors, this is a fact of longjump.

@yutannihilation
Copy link
Contributor

yutannihilation commented Jan 28, 2022

Oh, what I saw was the opposite. If I convert it to a CString, no error message is shown probably because the memory is already deallocated? But, I don't investigate it further. (this was my mistake: #278 (comment)) I think we might be able to use R_alloc() here somehow.

Anyway, I agree I was too naive about allocation during handling panic. I also notice I cannot pass a non-constant string to panic!().

@shrektan
Copy link
Contributor

I did some simple investigation but don't have time to look further yet.

The cargo package does this by using a macro to wrap the Rust code and unwind the stack when panicked, then converting the message into an R error. @yutannihilation You may find the following code helpful.

https://github.com/dbdahl/cargo-framework/blob/b7fa0a8eb7f9c8f9249e3c20b6e930607b8d8de2/cargo/inst/template/src/rustlib/roxido_macro/src/lib.rs#L143-L176

@dfalbel
Copy link
Contributor

dfalbel commented Jan 28, 2022

We did discuss having an option to return an object with a "rust.error" class or similar. This would be far cleaner than converting a panic to a R longjump.

Something like

res <- myrustfunc();
res$ok() gives the result or NULL
res$err() gives the error or NULL

Would it be possible to this, but let the autogenerated R wrappers raise the error?
For example, currently the R wrapper is something like:

RModelsBpe$read_file <- function(vocab, merges) .Call(wrap__RModelsBpe__read_file, vocab, merges)

It could be something like:

RModelsBpe$read_file <- function(vocab, merges)  {
  res <- Call(wrap__RModelsBpe__read_file, vocab, merges)
  if (!is.null(res$err())) stop(res$err)
  res$ok()
}

This adds an overhead for every function call, we could provide a way for users to disable this when they are sure no error is going to be raised.

@yutannihilation
Copy link
Contributor

I think it's also a possible solution. So, we have plenty of options... (not sure if they all are really feasible, though) I have no idea how we can decide.

@shrektan

This comment has been minimized.

@multimeric
Copy link
Member Author

I thought std::panic::set_hook was global, so that we can catch hooks thrown from any crates we've loaded?

@yutannihilation The other dimension is how we return it to R, which is an orthogonal question to where we handle it. We've had suggestions of :

  • Return an error object to R which points to Rust memory
  • Use Rf_error to throw a string
  • Just print to stderr and return missing

@yutannihilation
Copy link
Contributor

So, all other alternatives only solve part of the problem, i.e., the rust panics that package author writes.

No, I don't think so. We do want to capture all sorts of panics, but it's to protect the user's R session from crash, not to handle it nicely. What we are discussing here is how to handle Error, right?

@yutannihilation
Copy link
Contributor

After some thinking, I'd vote the R wrapper. The Rust wrapper can simply return an error condition object. Here's my attempt (I just tweaked the handle_panic() and didn't modify the Rust wrappers yet):

master...yutannihilation:poc/raise-error-in-R-wrapper

While this can be also possible in C code, but I think it's easier to throw this nicer with metadata on R's code.

@clauswilke
Copy link
Member

After some thinking, I'd vote the R wrapper.

That's a lot of performance overhead though if every function that dispatches to Rust has this R-side error handling.

@yutannihilation
Copy link
Contributor

Yeah, it seems if (inherits(x, "rust_error")) takes around 0.5ms on my local, so I agree the performance overhead might be a concern, depending on the use case.

@Ilia-Kosenkov
Copy link
Member

I would like again to point out that handling errors on the R side can be an opt-in, determined by the return type. If it is a Result<T, E>, we take extra steps and convert it into list(res = T, err = E), and then perform a check in R wrapper, comparing result$err with NULL. I actually tried to simulate this behavior and see what is the magnitude of the impact. For some reason, it is not constant but scales with the load. Perhaps there is something I am missing.
I am simulating Result<T, E> by explicitly returning a list. Perhaps this can be sped up.
It seems that the order of if branches is important. It looks like it is faster to check for res$res and immediately return it in if branch rather than check for res$err and return result in else branch. The hot path should go under the first if.

Benchmark results
rextendr::rust_source(
    code = "
    use extendr_api::prelude::*;
    #[extendr]
    fn get_with_result_impl(x : Doubles, y : Doubles) -> List {
        let res : Rfloat =
            x.iter().zip(y.iter())
            .map(|(xx, yy)| xx + yy)
            .sum();
        list!(res = res)
    }
    #[extendr]
    fn get_without_result(x : Doubles, y : Doubles) -> Rfloat {
        x.iter().zip(y.iter())
            .map(|(xx, yy)| xx + yy)
            .sum()
    }
    ",
    patch.crates_io = list(
        `extendr-api` = list(git = "https://github.com/extendr/extendr"),
        `libR-sys` = list(git = "https://github.com/extendr/libR-sys")
    )
)
#> i build directory: 'C:/Users/[redacted]/AppData/Local/Temp/RtmpQdCHWc/file27cc48be2316'
#> v Writing 'C:/Users/[redacted]/AppData/Local/Temp/RtmpQdCHWc/file27cc48be2316/target/extendr_wrappers.R'.

get_with_result <- function(x, y) {
    res <- .Call("wrap__get_with_result_impl", x, y, PACKAGE = "rextendr1")
    if (!is.null(res$err)) {
        stop(res$err)
    }
    res$res
}

get_with_result_rev <- function(x, y) {
    res <- .Call("wrap__get_with_result_impl", x, y, PACKAGE = "rextendr1")
    if (!is.null(res$res)) {
        return(res$res)
    }
    stop(res$err)
}

test_list_access <- function(x) {
    if (!is.null(x$err)) {
        stop(x$err)
    }
    x$res
}

test_return_result <- function(x) x

bench::press(
    n =  c(1, 1e2, 1e4, 1e6, 1e7),
    {
        x <- rnorm(n)
        y <- rnorm(n)
        reference <- sum(x + y)
        dummy <- list(res = reference)
        bench::mark(
            test_return_result(reference),
            test_list_access(dummy),
            test_list_access(list(result = reference)),
            get_without_result(x, y),
            get_with_result(x, y),
            get_with_result_rev(x, y),
        )
    }
) |> print(n = 100)
#> Running with:
#>          n
#> 1        1
#> 2      100
#> 3    10000
#> 4  1000000
#> 5 10000000
#> # A tibble: 30 x 14
#>    expression                                        n     min  median `itr/sec`
#>    <bch:expr>                                    <dbl> <bch:t> <bch:t>     <dbl>
#>  1 test_return_result(reference)                     1   100ns   200ns    4.33e6
#>  2 test_list_access(dummy)                           1   200ns   300ns    2.82e6
#>  3 test_list_access(list(result = reference))        1   400ns   500ns    1.87e6
#>  4 get_without_result(x, y)                          1   5.1us   5.5us    1.78e5
#>  5 get_with_result(x, y)                             1  10.1us  10.8us    8.93e4
#>  6 get_with_result_rev(x, y)                         1   9.9us  10.8us    9.13e4
#>  7 test_return_result(reference)                   100   100ns   200ns    5.16e6
#>  8 test_list_access(dummy)                         100   200ns   300ns    2.73e6
#>  9 test_list_access(list(result = reference))      100   400ns   500ns    1.93e6
#> 10 get_without_result(x, y)                        100  13.2us  13.9us    7.13e4
#> 11 get_with_result(x, y)                           100  18.2us  19.6us    5.06e4
#> 12 get_with_result_rev(x, y)                       100  18.1us  19.2us    5.17e4
#> 13 test_return_result(reference)                 10000   100ns   200ns    5.03e6
#> 14 test_list_access(dummy)                       10000   200ns   300ns    2.83e6
#> 15 test_list_access(list(result = reference))    10000   400ns   500ns    1.88e6
#> 16 get_without_result(x, y)                      10000 737.5us 757.5us    1.31e3
#> 17 get_with_result(x, y)                         10000 747.2us 769.5us    1.30e3
#> 18 get_with_result_rev(x, y)                     10000 740.8us 762.5us    1.31e3
#> 19 test_return_result(reference)               1000000   100ns   200ns    5.27e6
#> 20 test_list_access(dummy)                     1000000   200ns   300ns    2.75e6
#> 21 test_list_access(list(result = reference))  1000000   400ns   500ns    1.96e6
#> 22 get_without_result(x, y)                    1000000  74.6ms    75ms    1.33e1
#> 23 get_with_result(x, y)                       1000000  74.8ms    75ms    1.33e1
#> 24 get_with_result_rev(x, y)                   1000000  74.9ms    75ms    1.33e1
#> 25 test_return_result(reference)              10000000   100ns   200ns    5.36e6
#> 26 test_list_access(dummy)                    10000000   200ns   300ns    2.81e6
#> 27 test_list_access(list(result = reference)) 10000000   400ns   500ns    1.99e6
#> 28 get_without_result(x, y)                   10000000 760.4ms 760.4ms    1.32e0
#> 29 get_with_result(x, y)                      10000000 753.3ms 753.3ms    1.33e0
#> 30 get_with_result_rev(x, y)                  10000000 748.7ms 748.7ms    1.34e0
#> # ... with 9 more variables: mem_alloc <bch:byt>, gc/sec <dbl>, n_itr <int>,
#> #   n_gc <dbl>, total_time <bch:tm>, result <list>, memory <list>, time <list>,
#> #   gc <list>

Created on 2022-01-31 by the reprex package (v2.0.1)

@Ilia-Kosenkov
Copy link
Member

UPD: An interesting observation. It seems the fastest way (based on the number of iterations instead of measured time/ops) is to check res$err for NULL and return res$res in if.

Benchmark results
stop_in_if <- function(res) {
    if (!is.null(res$err)) {
        stop(res$err)
    }
    res$res
}

return_in_if <- function(res) {
    if (!is.null(res$res)) {
        return(res$res)
    }
    stop(res$err)
}

cache_res <- function(res) {
    tmp <- res$res
    if (!is.null(tmp)) {
        return(tmp)
    }
    stop(res$err)
}

check_err_and_return <- function(res) {
    if (is.null(res$err)) {
        return(res$res)
    }
    stop(res$err)
}


x <- rnorm(1e6)
y <- rnorm(1e6)
reference <- sum(x + y)
dummy <- list(res = reference)
bench::mark(
    stop_in_if(dummy),
    return_in_if(dummy),
    cache_res(dummy),
    check_err_and_return(dummy),
    max_iterations = 1e8
) |> print(n = 100)
#> # A tibble: 4 x 13
#>   expression                     min  median `itr/sec` mem_alloc `gc/sec`  n_itr
#>   <bch:expr>                  <bch:> <bch:t>     <dbl> <bch:byt>    <dbl>  <int>
#> 1 stop_in_if(dummy)            200ns   300ns  3083301.        0B     28.2 1.09e6
#> 2 return_in_if(dummy)          200ns   300ns  2970877.        0B     29.3 1.11e6
#> 3 cache_res(dummy)             200ns   300ns  2780485.        0B     31.6 1.06e6
#> 4 check_err_and_return(dummy)  200ns   300ns  3313410.        0B     27.4 1.21e6
#> # ... with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>,
#> #   memory <list>, time <list>, gc <list>

Created on 2022-01-31 by the reprex package (v2.0.1)

@multimeric
Copy link
Member Author

I feel the need to reinforce that returning a Result-like object into R is incredibly non-idiomatic. It works great in Rust, because the language is built around enums and strong typing. However the average R user of a library that returns a list(res, err) object would be initially confused and then annoyed that they have to check each output with an if statement. This is true even for recoverable errors, because R makes no distinction between an error and a panic like Rust does.

Further, all of the condition-handling suggested by Hadley that is common in R would become irrelevant as well, because we would not be throwing errors at all. Users will expect tryCatch to work and be very surprised when it's not that simple.

I realise that a Rust package author could solve this in their R wrapper layer, but I feel that it's better to just produce idiomatic R in the first place, so that an extra layer is not essential.

@yutannihilation
Copy link
Contributor

However the average R user of a library that returns a list(res, err)

Just to be clear, if you are talking about the interface, the wrapper never expose it to the user. It just returns res (or stop()s with err). The wrapper is not what a Rust package author need to solve manually, but is provided automatically by extendr.

@multimeric
Copy link
Member Author

multimeric commented Jan 31, 2022

Right, so the Rust → R interface returns a list(res, err) and then the auto-generated R wrappers convert this into a stop() call? That's better, but I still don't know why we need that intermediate step. Why can't we call stop() directly from Rust?

@shrektan
Copy link
Contributor

shrektan commented Jan 31, 2022

I'll summarise (again 😄) the discussion so far. Please correct me if I'm wrong:

  1. The panic!() in Rust should be captured and handled carefully.
    • "Carefully" means the stack and destructor in Rust runtime is cleaned and called.
    • It also means the error (with clear message) should be redirected to R.
    • "extendr" already handles the panic (Returning error messages to R #278 (comment) , thanks @andy-thomason) and unwind the stacks, but the message is not redirected.
  2. So "extendr" should:
    • suppress the message stream in Rust (as @multimeric and @yutannihilation pointed out, we can use std::panic::set_hook()) .
    • trigger R errors with correct message.
  3. The discussion now is about "how should R error be trigged", there're three opinions:
    • Use Rf_error() or Rf_errorcall() to trigger the error in Rust runtime. It means, "extendr" would throw error in Rust code. It will probably have to leak very small amount of memory of the error message, as the char can't be deallocated due to longjmp nature of R errors.
    • "extendr" returns an R object to R's runtime, even when panic occurs. rextendr generates R wrappers in the function call to decide whether a result be returned or an error be thrown. And @Ilia-Kosenkov and @yutannihilation are trying to find the minimal overhead way of handling this on R side.
    • The last opinion from @clauswilke is "extendr" returns an R object to R's C runtime. The only difference from the 2nd opinion is that the "wrapper" around handling the returned R object is happend in C code thus probably less overhead.

@shrektan
Copy link
Contributor

Right, so the Rust → R interface returns a list(res, err) and then the auto-generated R wrappers convert this into a stop() call? That's better, but I still don't know why we need that intermediate step. Why can't we call stop() directly from Rust?

I'll raise the same question, if it's not about avoiding the "very little memory leak". Adding a "condition class" to the error message, from my personally experience, is not useful. I think Rcpp and cpp11 call the plain R error message, too.

@multimeric
Copy link
Member Author

multimeric commented Jan 31, 2022

Can we not just allocate a string vector in R memory and then strcpy the Rust string into the vector, unprotect it so that the GC can handle it, and then let Rust free the Rust string memory?

Some Rust equivalent of:

SEXP err = PROTECT(allocVector(STRSXP, 1));
SET_STRING_ELT(err, 0, mkChar(our_error_message) );
UNPROTECT(1);

@yutannihilation
Copy link
Contributor

Thanks for the core question. I too thought this is a problem of memory leak, which is not considered as "unsafe" in Rust, but it seems the problem is the undefined behavior on cross-language unwinding.

Currently, when foreign (non-Rust) code invokes Rust code, the behavior of a panic!() unwind that "escapes" from Rust is explicitly undefined. Similarly, when Rust calls a foreign function that unwinds, the behavior once the unwind operation encounters Rust frames is undefined.
(https://blog.rust-lang.org/inside-rust/2020/02/27/ffi-unwind-design-meeting.html)

Sorry, I think I'm running out of time here as my current focus is graphics-related things... I hope Andy will give some vision on this eventually!

@sorhawell
Copy link
Contributor

sorhawell commented Aug 23, 2022

EDIT: Yikes. Little testing showed this to not be a good idea. Anything not dropped before was leaked when calling throw_r_error()

great thread :) just to clarify?

For an average extendr-user who just barely can wait for the new cool error handling :) , how much would one leak by doing the following until then?

pub fn r_unwrap<T, E>(x: Result<T, E>) -> T
where
    T: std::fmt::Debug,
    E: std::fmt::Debug + std::fmt::Display,
{
    x.map_err(|err| extendr_api::throw_r_error(err.to_string()))
        .unwrap()
}

A concrete example could be some method which would return a Result, but the current unwrap-panic! is not desirable. Instead here the final Result is unwrapped with r_unwrap()

image

Is it only one C-string or could it potentially be much more memory?

@sorhawell
Copy link
Contributor

sorhawell commented Aug 25, 2022

I suggest the following pattern (git repo) to use opt-in list(ok = T, err = E)

pros

  • opt-in, works today (e.g. with 0.3.1)
  • use one generic function to wrap any* extendr exportable result
  • end-Ruser do not have to know about result-unwrap
  • full control of error handling on R side
  • no memory leakage

cons

  • any exported function call to rust must be wrapped in function to resolve the returned result
  • slightly increased overhead
  • the generic do not support ? error bubbling. (Lower level invoked functions can still use ? as normal)

*assuming error implements Display. The bound could perhaps be relaxed to something that implements Display and/or IntoRobj via a some super trait.

Rust code

use extendr_api::prelude::{extendr, extendr_module, list, GetSexp, IntoRobj}; // or just use extendr_api::prelude::*;
use extendr_api::NULL;

//convert any extendr exportalbe Result into either list(ok=ok_value,err=NULL) or list(ok=NULL,err=err_string)
//... where error implements Display. This error traitbound could be resolved on R side instead.
pub fn r_result_list<T, E>(x: std::result::Result<T, E>) -> list::List
where
   T: std::fmt::Debug + IntoRobj,
   E: std::fmt::Debug + std::fmt::Display,
{
   if x.is_ok() {
       list!(ok = x.unwrap().into_robj(), err = NULL)
   } else {
       list!(ok = NULL, err = x.unwrap_err().to_string())
   }
}

//some function using the r_result_list generic function to export
#[extendr]
fn rust_hello_result(is_ok: bool) -> list::List {
   //some silly logic to emulate Result
   let result = if is_ok {
       Ok(42i32)
   } else {
       Err("rust world error: is_ok must be set to true")
   };

   //instead of returning Result<T,E> then wrap like this and return list::List
   r_result_list(result)
}

extendr_module! {
   mod helloresult;
   fn rust_hello_result;
}

R code

#' rust like unwrapping of result. Useful to keep error handling on the R side.
#'
#' @param result a list with two elements named 'ok' and 'err'. One and only one of them must NULL or missing.
#' @param call a context call for what triggered the error, passed to abort(footer=). abort(call=) will annoyingly not display anonymous funcitons.
#' @param ... any other param to rlang::abort
#' 
#' @importFrom rlang abort
#'
#' @return eiter the ok value or no return but throw error by the err string
#' @export
#'
#' @examples unwrap(extendr_result:::rust_hello_result(TRUE))
unwrap = function(result,call=sys.call(1L),...) {
  
  #if not a result
  if(!is.list(result)) {
    abort("internal error: cannot unwrap non result",.internal = TRUE)
  }
  
  #if result is ok
  if(!is.null(result$ok) && is.null(result$err)) {
    return(result$ok)
  }
  
  #if result is error
  if( is.null(result$ok) && !is.null(result$err)) {
    return(abort(
      result$err,
      call = NULL,
      footer = paste(
        "when calling:\n",
        paste(capture.output(print(call)),collapse="\n")
      ),
      ...
    ))
  }
  
  #if not ok XOR error, then roll over
  abort("internal error: result object corrupted",.internal = TRUE)
}


#' example user function
#'
#' @param ok bool TRUE or FALSE, should result be ok (or and error)
#'
#' @return r_result_list
#' @export
#' 
#' @importFrom rlang is_bool
#'
#' @examples stopifnot(r_user_function(is_ok=TRUE)==42L)
r_user_function = function(is_ok) {
  if(!is_bool(is_ok)) stop("ok arg must be either scalar TRUE or FALSE")
  result = rust_hello_result(is_ok)
  unwrap(result)
}

@Ilia-Kosenkov
Copy link
Member

We had this discussion multiple times, and there is no consensus. There are arguments in favor of different implementations.
I guess this can be a default behavior if a return type is some Result<RObj, Error>. The problem is, if the return type is a list, it is always an extra allocation.
One solution could is that on error result, we return a condition, e.g. created with simpleError() (we do not care what to return from Rust since it is always an RObj). On R side we check if the thing returned from R is a correct Rust-created condition, we throw it as error with say stop(), otherwise we return it as a valid output from Rust.

Again, many ways to solve this problem, feel free to make an attempt at PR.

@sorhawell
Copy link
Contributor

sorhawell commented Aug 26, 2022

Yeah many ways. I think of it as an iron triangle of Speed, Safety and 'Good' error handling. Unless the C-wizards :) come up with a really nice have-it-all solution, it will be a compromise.

The current default behavior, "unwrap->panic->unwind" favors Speed+Safety and might actually be a fine default behavior.

I suggest to add 2 or 3 handy generic functions and a explanation in the learning materials how the extendr_api-user can opt-in to other mixes of speed-safety-goodError.

list(ok,err) favors Safety+GoodError
As newbie, I try to understand the source code of list! it invokes List::from_values_and_names which finally uses make_vector and SET_VECTOR_ELT it seems to not use cloning but only moving. I'm I wrong on that?. In that case it is just the allocation of a two element list object and might not be too bad.

your mentioned simpleCondition-option has Safety+GoodError and retains speed on the happy path by avoiding the list allocation.

The generic function would look like this then

pub fn r_result_condition<T, E>(x: std::result::Result<T, E>) -> Robj
where
    T: std::fmt::Debug + IntoRobj,
    E: std::fmt::Debug + std::fmt::Display,
{
    if x.is_ok() {
        x.unwrap().into_robj()
    } else {
        let err_call_string = format!("simpleError(message = '{}')", x.unwrap_err().to_string());
        let robj = extendr_api::eval_string(&err_call_string).unwrap();
        robj
    }
}

usage in rust could look like

#[extendr]
fn rust_hellocondition(is_ok: bool) -> Robj {
    //some silly logic to emulate Result
    let result = if is_ok {
        Ok(42i32)
    } else {
        Err("rust world error: is_ok must be set to true")
    };

    //instead of returning Result<T,E> then wrap like this and return ok Robj or R simpleError condition
    r_result_condition(result)
}

and on R side the extendr_api-user would write a function like

check_condition = function(value_or_condition) {
  
  #if ok, just return value as is 
  if(!rlang::is_error(value_or_condition)) return(value_or_condition)
  
  #else throw error condition
  abort(
    value_or_condition$message
  )

}

I will try it out for some weeks and make a PR if I still like it :)

@Ilia-Kosenkov
Copy link
Member

The end goal would be to update generates code as well. If a rust FN returns a result of supported shape, then correctly process it and send to R. And on the R side, generate wrapper that not only makes a call into native lib, but also throws an error if it recognizes it is indeed an error.

Then, from users perspective, it will be super smooth.

@sorhawell
Copy link
Contributor

sorhawell commented Aug 26, 2022

How about extending the #[extendr] macro to take an optional argument which decides the behavior. The default behavior can be "smooth for end-users who just wanna try writing a few functions". "Need for speed"-users and "error control"-people can just opt for other behaviors.

//Default, intuitively like any other R function, return value or raise error
// extendr-macro will use above generic r_result_condition + modify extendr-wrappers.R to always check for condition for this specific function
#[extendr(err_condition)]
pub fn rust_result_condition(is_ok: bool) -> Result<i32, &str> {
  todo!("some code returning a Result")
}

// unwrap-panic, good for speed and simplicity
#[extendr(err_panic)]
pub fn rust_result_panic(is_ok: bool) -> Result<i32, &str> {
  todo!("some code returning a Result")
}

// always return list(ok, err), good for explicit error handling where the rust function is not exposed directly to end-user anyways.
// R developer must manually unwrap err on R side, but do not have to use tryCatch, more like return from purrr::safely
#[extendr(err_result)]
pub fn rust_result_list(is_ok: bool) -> Result<i32, &str> {
  todo!("some code returning a Result")
}

@Ilia-Kosenkov
Copy link
Member

I am not sure the flag is needed, you can communicate it with a return type. If it is something like T : impl IntoRobj, you panic, if however it is a Result<T : impl IntoRobj, Error> (or whatever the correct syntax is, you get the idea), you unwrap it in R and throw there.
If you are not part of the discord community, I strongly suggest you join -- it is much easier to discuss these things there first, in an informal way.

@yutannihilation
Copy link
Contributor

yutannihilation commented Feb 12, 2023

Let me share my proof-of-concept implementation using tagged pointer, which is in the direction of #278 (comment). I have no intention (and bandwidth) to promote this, at least at the moment, but hope this helps the discussion here somehow.

Rust code: https://github.com/yutannihilation/unextendr/blob/ec88a43411a54a720b4071b7c0adbfe597fda18e/src/rust/src/lib.rs#L33-L77 (updated after Ilia's comment)
C code: https://github.com/yutannihilation/unextendr/blob/d757eb998a8bbc686c38f0b7c1a80b967893cf4d/src/init.c#L4-L15

The main advantage is that the additional cost is only one bitwise operation, but I'm not sure how portable this is...

@Ilia-Kosenkov
Copy link
Member

What do you achieve with flag_sexp(res, false) upon success? It should do nothing (you execute bitwise x | 0)

@yutannihilation
Copy link
Contributor

Aw, it was my mistake, sorry. The intention was to ensure the error flag is unset (so I should write it as x & !1), but thinking this again, it's unnecessary because probably unaligned pointer address cannot be a valid SEXP so it should have failed before this point. In short, it does nothing, but it doesn't matter.

@yutannihilation
Copy link
Contributor

My toy framework now reached a usable state. To discuss about error handling, it might be a good example to play with. I have no idea how I can fit the implementation into extendr, but feedback is welcome.

How to create a package: https://github.com/yutannihilation/savvy#getting-started
Full documentation (WIP): https://docs.rs/savvy/latest/savvy/

#[savvy]
fn raise_error() -> savvy::Result<savvy::SEXP> {
    Err(savvy::Error::new("This is my custom error"))
}
raise_error()
#> Error: This is my custom error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants