Returning error messages to R #278

multimeric · 2021-09-03T16:25:14Z

Is there an easy way to return strings from Rust that will become errors, warnings or messages in R? If I remember right, simply returning an Error will just panic since it just gets unwrapped.

multimeric · 2021-09-04T02:26:11Z

Okay it looks like we have an existing wrapper which is throw_r_error:

extendr/extendr-api/src/thread_safety.rs

Lines 91 to 98 in 2103979

    
           pub fn throw_r_error<S: AsRef<str>>(s: S) { 
        
               let s = s.as_ref(); 
        
               unsafe { 
        
                   R_ERROR_BUF = Some(std::ffi::CString::new(s).unwrap()); 
        
                   libR_sys::Rf_error(R_ERROR_BUF.as_ref().unwrap().as_ptr()); 
        
                   unreachable!("Rf_error does not return"); 
        
               }; 
        
           }

I note there's also an Rf_warning which can be used the same way as Rf_error for warnings too. There's also R_ShowMessage, but neither of these have a nice Rust wrapper so we would have to manually convert the string into a C string as in the above snippet.

andy-thomason · 2021-09-04T21:54:24Z

This sounds like something to look at.

Would you like to try a PR or would you like one of us to give it a try?

multimeric · 2021-09-05T01:32:11Z

If it's as simple as implementing a simple Rust wrapper like this for each of those APIs I'm happy to do it.

By the way, we have this wrapper for returning errors, but afaik none of the TryFrom Robj implementations actually use this. So I'm guessing that means that errors in a Rust library will not return a user-friendly R error but will show a less friendly Rust error? Am I right?

multimeric · 2021-09-06T01:17:32Z

Notes for future implementation:

R_ShowMessage is somewhat explained here: https://cran.r-project.org/doc/manuals/R-exts.html#Setting-R-callbacks
Rf_warning and Rf_error are explained here: https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Warnings-and-errors

andy-thomason · 2021-09-14T10:05:56Z

We did look at handling Result types in wrappers. It may still do so.

multimeric · 2021-09-18T02:15:38Z

The thing that worries me is this:

extendr/extendr-api/src/robj/into_robj.rs

Lines 37 to 45 in 7e9cac3

    
           impl<T> From<Result<T>> for Robj 
        
           where 
        
               T: Into<Robj>, 
        
           { 
        
               fn from(res: Result<T>) -> Self { 
        
                   // Force a panic on error. 
        
                   res.unwrap().into() 
        
               } 
        
           }

So if our Rust function returns a result, and that result is an error, then it will just get unwrapped in Rust. I guess the error message will get printed to the terminal maybe (?) but I suspect this isn't very user friendly. Also will this cause existing Rust objects in R to break, since I assume the memory will be reclaimed when the library crashes?

andy-thomason · 2021-09-26T22:54:26Z

This only works because we catch panics and convert them to R errors.
Panics should unwind the stack, dropping objects along the way.
We should check that we have a test for this, however.

A nicer way would be to have a R class "RustError" or similar
which wraps a string. We can use the Display trait to generate
the error message.

The jury has been out on this, but we could put it on a switch.

multimeric · 2021-09-26T23:33:41Z

Huh, I didn't even know you could catch panics. Well yes I would find it infinitely cleaner to just go Rust Error → R Error and bypass the use of panic entirely. Is there some kind of way to vote on this issue?

andy-thomason · 2021-09-27T07:54:19Z

The magic is here. https://github.com/extendr/extendr/blob/1bdd03f08c14a781ba7b3d0ae628b10cf2425b95/extendr-macros/src/wrappers.rs#L106 *handle_panic* uses *catch_unwind* to do the conversion. In theory, the unwind should free any allocated variables, especially the formal arguments which are reference counted *Robj*s. You are probably not alone with the "straight to R error" but it is very difficult to catch R errors without the danger of memory leaks. *make_funtion_wrappers* has and *options* parameter that can be driven from the *#[extendr(param=value)]* attribute. This would allow some customisation.

…

On Mon, Sep 27, 2021 at 12:33 AM Michael Milton ***@***.***> wrote: Huh, I didn't even know you *could* catch panics. Well yes I would find it infinitely cleaner to just go Rust Error → R Error and bypass the use of panic entirely. Is there some kind of way to vote on this issue? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#278 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAL36XGD2VKXCJAKKFNFPE3UD6UV7ANCNFSM5DMCPUAQ> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

multimeric · 2021-10-02T00:43:08Z

Okay so in summary what I'm hearing is that the correct way to raise an error using extendr is simply to return a Result which will be unwrapped and thereby converted to an R error. In the future this may change.

clauswilke · 2021-10-02T01:02:30Z

To be clear: I don't think we'll change the part where you return a Result. How exactly the handling of that is implemented and communicated to R could change.

multimeric · 2021-10-02T01:08:50Z

Sorry yes that's what I meant: that the unwrapping panic part may change but that returning a Result will stay. Which I think is ideal because that's the most idiomatic way to indicate an error in Rust.

shrektan · 2022-01-25T16:38:38Z

Sorry guys, I don't follow.

Let's say we made a Rust function foo() -> Result<Robj> and it returns Err(Error::Other("xxx".to_string())).

Now rextendr will just unwrap the result leading to the case that the error message printed in R is ... foo panicked. with the real error message "xxx" being passed to the stderr, thus can't be captured by R, correct?

I can't understand the reason that we can't return the correct message to R because cpp11 and Rcpp already did this job, didn't they?

(To be clear, what I want is R calls a Rust function, which may throw Rust errors. The errors are not in R's stderr stream. Thus, it looks strange.
I mean what I'm trying to ask is not about calling an R function in Rust. I know this would be very difficult to handle as when an error happens, R will perform a long jump so Rust can't clean up the stack.)

Thanks.

andy-thomason · 2022-01-27T15:57:12Z

In C++ you can call longjump and just ignore the mess this creates, but Rust is a little more
careful - longjump does not exist in Rust, nor should it.

We did discuss having an option to return an object with a "rust.error" class or similar.
This would be far cleaner than converting a panic to a R longjump.

Something like

res <- myrustfunc();
res$ok() gives the result or NULL
res$err() gives the error or NULL

This was not popular with the team and got pushed back.

multimeric · 2022-01-27T16:08:34Z

I still don't think we've addressed my suggestion above which is to just call the R API's throw error function, because I believe unlike the other methods, it will behave how R users expect and be catchable using tryCatch. Returning an object on failure wouldn't be idiomatic to R, nor is just printing to the message to stdout which I think is what we're doing now.

andy-thomason · 2022-01-27T16:27:06Z

We do have a function for throwing R errors. This is very dangerous, however
as it could leave objects on the stack in an undefined state.

In the wrapper, we could assume that objects have been dropped, but we would
need annotation to handle errors.

If the error supports Display (a fair assumption) it should be possible
to use to_string() to generate the error message.

#[extendr(throw_result_errors=true)]
fn i_throw_stuff(input: Doubles) -> Result<Doubles> {
   ...
}

clauswilke · 2022-01-27T16:41:48Z

Returning an object on failure wouldn't be idiomatic to R,

Exactly, that's why I pushed back against it. Since R uses longjumps on error, I think the Rust code needs to somehow emulate this also. Would it help to write some C intermediary that wraps everything, so that the Rust code can fully shut down and return to C, and then the C code calls the longjump?

yutannihilation · 2022-01-28T01:50:42Z

In the wrapper, we could assume that objects have been dropped, but we would need annotation to handle errors.

Using a wrapper sounds good to me (If I remember correctly, it was what @shrektan suggested on Discord). But, I don't get why we need annotation. Are there anything ambiguous without that annotation switch?

multimeric · 2022-01-28T01:55:52Z

We do have a function for throwing R errors. This is very dangerous, however
as it could leave objects on the stack in an undefined state.

Sorry, to clarify I mean: add a panic hook (like we already have), but in that hook we call the Rf_error function with the error string. That way we get appropriate behaviour but also the stack has been unwound.

yutannihilation · 2022-01-28T02:35:45Z

I was just trying in that direction. Unlike with normal errors, std::thread::Error doesn’t implement the Error trait, but it seems we can downcast it.

One disadvantage of this is probably we cannot add custom classes to the error, because panic!() can only take a bare string, but I think this is also an option.

pub fn handle_panic<F, R>(err_str: &str, f: F) -> R
where
    F: FnOnce() -> R,
    F: std::panic::UnwindSafe,
{
    // Surpress outputs from the default hook (c.f. https://stackoverflow.com/a/59211505)
    let prev_hook = std::panic::take_hook();
    std::panic::set_hook(Box::new(|_| {}));

    let result = match std::panic::catch_unwind(f) {
        Ok(res) => res,
        Err(e) => {
            let err_str = if let Some(e) = e.downcast_ref::<&str>() {
                e.as_ptr()
            } else {
                err_str.as_ptr()
            };
            unsafe {
                libR_sys::Rf_error(err_str as *const std::os::raw::c_char);
            }
            unreachable!("handle_panic unreachable")
        }
    };

    // Restore the previous hook
    std::panic::set_hook(prev_hook);

    result
}

This almost worked, but I don't know why I get this extra "src/lib.rs" string...

#[extendr]
fn hello_world() -> &'static str {
    panic!("Hello world!");
}

hello_world()
#> Error in hello_world() : Hello world!src\lib.rs

multimeric · 2022-01-28T02:49:18Z

Just a guess, is it because the string pointer you're creating is to a string without a null terminator, so it's reading off the end of the string onto another one? Can you convert to a CString first?

yutannihilation · 2022-01-28T02:59:34Z

Ah, it might be. Thanks.

andy-thomason · 2022-01-28T12:00:08Z

It is worth noting that a CString will leak memory as nothing will deallocate it.

This may not be a problem for occasional errors - I'm pretty certain that R also will leak
memory on errors, this is a fact of longjump.

yutannihilation · 2022-01-28T12:22:15Z

~~Oh, what I saw was the opposite. If I convert it to a CString, no error message is shown probably because the memory is already deallocated? But, I don't investigate it further.~~ (this was my mistake: #278 (comment)) I think we might be able to use R_alloc() here somehow.

Anyway, I agree I was too naive about allocation during handling panic. I also notice I cannot pass a non-constant string to panic!().

shrektan · 2022-01-28T16:20:32Z

I did some simple investigation but don't have time to look further yet.

The cargo package does this by using a macro to wrap the Rust code and unwind the stack when panicked, then converting the message into an R error. @yutannihilation You may find the following code helpful.

https://github.com/dbdahl/cargo-framework/blob/b7fa0a8eb7f9c8f9249e3c20b6e930607b8d8de2/cargo/inst/template/src/rustlib/roxido_macro/src/lib.rs#L143-L176

dfalbel · 2022-01-28T16:47:04Z

We did discuss having an option to return an object with a "rust.error" class or similar. This would be far cleaner than converting a panic to a R longjump.

Something like
res <- myrustfunc();
res$ok() gives the result or NULL
res$err() gives the error or NULL

Would it be possible to this, but let the autogenerated R wrappers raise the error?
For example, currently the R wrapper is something like:

RModelsBpe$read_file <- function(vocab, merges) .Call(wrap__RModelsBpe__read_file, vocab, merges)

It could be something like:

RModelsBpe$read_file <- function(vocab, merges)  {
  res <- Call(wrap__RModelsBpe__read_file, vocab, merges)
  if (!is.null(res$err())) stop(res$err)
  res$ok()
}

This adds an overhead for every function call, we could provide a way for users to disable this when they are sure no error is going to be raised.

yutannihilation · 2022-01-29T13:51:20Z

I think it's also a possible solution. So, we have plenty of options... (not sure if they all are really feasible, though) I have no idea how we can decide.

throw an error in the R wrapper (@dfalbel, Returning error messages to R #278 (comment))
throw an error in the C wrapper (@clauswilke, Returning error messages to R #278 (comment))
throw an error in the Rust code in handle_panic (@multimeric, Returning error messages to R #278 (comment))
throw an error in the Rust code in the Rust wrapper generated by macro (@shrektan, Returning error messages to R #278 (comment))

multimeric · 2022-01-29T14:28:42Z

I thought std::panic::set_hook was global, so that we can catch hooks thrown from any crates we've loaded?

@yutannihilation The other dimension is how we return it to R, which is an orthogonal question to where we handle it. We've had suggestions of :

Return an error object to R which points to Rust memory
Use Rf_error to throw a string
Just print to stderr and return missing

yutannihilation · 2022-01-29T15:07:18Z

So, all other alternatives only solve part of the problem, i.e., the rust panics that package author writes.

No, I don't think so. We do want to capture all sorts of panics, but it's to protect the user's R session from crash, not to handle it nicely. What we are discussing here is how to handle Error, right?

yutannihilation · 2022-01-31T01:32:12Z

After some thinking, I'd vote the R wrapper. The Rust wrapper can simply return an error condition object. Here's my attempt (I just tweaked the handle_panic() and didn't modify the Rust wrappers yet):

master...yutannihilation:poc/raise-error-in-R-wrapper

While this can be also possible in C code, but I think it's easier to throw this nicer with metadata on R's code.

clauswilke · 2022-01-31T04:45:45Z

After some thinking, I'd vote the R wrapper.

That's a lot of performance overhead though if every function that dispatches to Rust has this R-side error handling.

yutannihilation · 2022-01-31T05:24:07Z

Yeah, it seems if (inherits(x, "rust_error")) takes around 0.5ms on my local, so I agree the performance overhead might be a concern, depending on the use case.

Ilia-Kosenkov · 2022-01-31T08:54:34Z

I would like again to point out that handling errors on the R side can be an opt-in, determined by the return type. If it is a Result<T, E>, we take extra steps and convert it into list(res = T, err = E), and then perform a check in R wrapper, comparing result$err with NULL. I actually tried to simulate this behavior and see what is the magnitude of the impact. For some reason, it is not constant but scales with the load. Perhaps there is something I am missing.
I am simulating Result<T, E> by explicitly returning a list. Perhaps this can be sped up.
It seems that the order of if branches is important. It looks like it is faster to check for res$res and immediately return it in if branch rather than check for res$err and return result in else branch. The hot path should go under the first if.

Benchmark results

rextendr::rust_source(
    code = "
    use extendr_api::prelude::*;
    #[extendr]
    fn get_with_result_impl(x : Doubles, y : Doubles) -> List {
        let res : Rfloat =
            x.iter().zip(y.iter())
            .map(|(xx, yy)| xx + yy)
            .sum();
        list!(res = res)
    }
    #[extendr]
    fn get_without_result(x : Doubles, y : Doubles) -> Rfloat {
        x.iter().zip(y.iter())
            .map(|(xx, yy)| xx + yy)
            .sum()
    }
    ",
    patch.crates_io = list(
        `extendr-api` = list(git = "https://github.com/extendr/extendr"),
        `libR-sys` = list(git = "https://github.com/extendr/libR-sys")
    )
)
#> i build directory: 'C:/Users/[redacted]/AppData/Local/Temp/RtmpQdCHWc/file27cc48be2316'
#> v Writing 'C:/Users/[redacted]/AppData/Local/Temp/RtmpQdCHWc/file27cc48be2316/target/extendr_wrappers.R'.

get_with_result <- function(x, y) {
    res <- .Call("wrap__get_with_result_impl", x, y, PACKAGE = "rextendr1")
    if (!is.null(res$err)) {
        stop(res$err)
    }
    res$res
}

get_with_result_rev <- function(x, y) {
    res <- .Call("wrap__get_with_result_impl", x, y, PACKAGE = "rextendr1")
    if (!is.null(res$res)) {
        return(res$res)
    }
    stop(res$err)
}

test_list_access <- function(x) {
    if (!is.null(x$err)) {
        stop(x$err)
    }
    x$res
}

test_return_result <- function(x) x

bench::press(
    n =  c(1, 1e2, 1e4, 1e6, 1e7),
    {
        x <- rnorm(n)
        y <- rnorm(n)
        reference <- sum(x + y)
        dummy <- list(res = reference)
        bench::mark(
            test_return_result(reference),
            test_list_access(dummy),
            test_list_access(list(result = reference)),
            get_without_result(x, y),
            get_with_result(x, y),
            get_with_result_rev(x, y),
        )
    }
) |> print(n = 100)
#> Running with:
#>          n
#> 1        1
#> 2      100
#> 3    10000
#> 4  1000000
#> 5 10000000
#> # A tibble: 30 x 14
#>    expression                                        n     min  median `itr/sec`
#>    <bch:expr>                                    <dbl> <bch:t> <bch:t>     <dbl>
#>  1 test_return_result(reference)                     1   100ns   200ns    4.33e6
#>  2 test_list_access(dummy)                           1   200ns   300ns    2.82e6
#>  3 test_list_access(list(result = reference))        1   400ns   500ns    1.87e6
#>  4 get_without_result(x, y)                          1   5.1us   5.5us    1.78e5
#>  5 get_with_result(x, y)                             1  10.1us  10.8us    8.93e4
#>  6 get_with_result_rev(x, y)                         1   9.9us  10.8us    9.13e4
#>  7 test_return_result(reference)                   100   100ns   200ns    5.16e6
#>  8 test_list_access(dummy)                         100   200ns   300ns    2.73e6
#>  9 test_list_access(list(result = reference))      100   400ns   500ns    1.93e6
#> 10 get_without_result(x, y)                        100  13.2us  13.9us    7.13e4
#> 11 get_with_result(x, y)                           100  18.2us  19.6us    5.06e4
#> 12 get_with_result_rev(x, y)                       100  18.1us  19.2us    5.17e4
#> 13 test_return_result(reference)                 10000   100ns   200ns    5.03e6
#> 14 test_list_access(dummy)                       10000   200ns   300ns    2.83e6
#> 15 test_list_access(list(result = reference))    10000   400ns   500ns    1.88e6
#> 16 get_without_result(x, y)                      10000 737.5us 757.5us    1.31e3
#> 17 get_with_result(x, y)                         10000 747.2us 769.5us    1.30e3
#> 18 get_with_result_rev(x, y)                     10000 740.8us 762.5us    1.31e3
#> 19 test_return_result(reference)               1000000   100ns   200ns    5.27e6
#> 20 test_list_access(dummy)                     1000000   200ns   300ns    2.75e6
#> 21 test_list_access(list(result = reference))  1000000   400ns   500ns    1.96e6
#> 22 get_without_result(x, y)                    1000000  74.6ms    75ms    1.33e1
#> 23 get_with_result(x, y)                       1000000  74.8ms    75ms    1.33e1
#> 24 get_with_result_rev(x, y)                   1000000  74.9ms    75ms    1.33e1
#> 25 test_return_result(reference)              10000000   100ns   200ns    5.36e6
#> 26 test_list_access(dummy)                    10000000   200ns   300ns    2.81e6
#> 27 test_list_access(list(result = reference)) 10000000   400ns   500ns    1.99e6
#> 28 get_without_result(x, y)                   10000000 760.4ms 760.4ms    1.32e0
#> 29 get_with_result(x, y)                      10000000 753.3ms 753.3ms    1.33e0
#> 30 get_with_result_rev(x, y)                  10000000 748.7ms 748.7ms    1.34e0
#> # ... with 9 more variables: mem_alloc <bch:byt>, gc/sec <dbl>, n_itr <int>,
#> #   n_gc <dbl>, total_time <bch:tm>, result <list>, memory <list>, time <list>,
#> #   gc <list>

^{Created on 2022-01-31 by the reprex package (v2.0.1)}

Ilia-Kosenkov · 2022-01-31T09:09:42Z

UPD: An interesting observation. It seems the fastest way (based on the number of iterations instead of measured time/ops) is to check res$err for NULL and return res$res in if.

Benchmark results

stop_in_if <- function(res) {
    if (!is.null(res$err)) {
        stop(res$err)
    }
    res$res
}

return_in_if <- function(res) {
    if (!is.null(res$res)) {
        return(res$res)
    }
    stop(res$err)
}

cache_res <- function(res) {
    tmp <- res$res
    if (!is.null(tmp)) {
        return(tmp)
    }
    stop(res$err)
}

check_err_and_return <- function(res) {
    if (is.null(res$err)) {
        return(res$res)
    }
    stop(res$err)
}


x <- rnorm(1e6)
y <- rnorm(1e6)
reference <- sum(x + y)
dummy <- list(res = reference)
bench::mark(
    stop_in_if(dummy),
    return_in_if(dummy),
    cache_res(dummy),
    check_err_and_return(dummy),
    max_iterations = 1e8
) |> print(n = 100)
#> # A tibble: 4 x 13
#>   expression                     min  median `itr/sec` mem_alloc `gc/sec`  n_itr
#>   <bch:expr>                  <bch:> <bch:t>     <dbl> <bch:byt>    <dbl>  <int>
#> 1 stop_in_if(dummy)            200ns   300ns  3083301.        0B     28.2 1.09e6
#> 2 return_in_if(dummy)          200ns   300ns  2970877.        0B     29.3 1.11e6
#> 3 cache_res(dummy)             200ns   300ns  2780485.        0B     31.6 1.06e6
#> 4 check_err_and_return(dummy)  200ns   300ns  3313410.        0B     27.4 1.21e6
#> # ... with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>,
#> #   memory <list>, time <list>, gc <list>

^{Created on 2022-01-31 by the reprex package (v2.0.1)}

multimeric · 2022-01-31T09:34:11Z

I feel the need to reinforce that returning a Result-like object into R is incredibly non-idiomatic. It works great in Rust, because the language is built around enums and strong typing. However the average R user of a library that returns a list(res, err) object would be initially confused and then annoyed that they have to check each output with an if statement. This is true even for recoverable errors, because R makes no distinction between an error and a panic like Rust does.

Further, all of the condition-handling suggested by Hadley that is common in R would become irrelevant as well, because we would not be throwing errors at all. Users will expect tryCatch to work and be very surprised when it's not that simple.

I realise that a Rust package author could solve this in their R wrapper layer, but I feel that it's better to just produce idiomatic R in the first place, so that an extra layer is not essential.

yutannihilation · 2022-01-31T09:59:14Z

However the average R user of a library that returns a list(res, err)

Just to be clear, if you are talking about the interface, the wrapper never expose it to the user. It just returns res (or stop()s with err). The wrapper is not what a Rust package author need to solve manually, but is provided automatically by extendr.

multimeric · 2022-01-31T10:02:53Z

Right, so the Rust → R interface returns a list(res, err) and then the auto-generated R wrappers convert this into a stop() call? That's better, but I still don't know why we need that intermediate step. Why can't we call stop() directly from Rust?

shrektan · 2022-01-31T10:26:34Z

I'll summarise (again 😄) the discussion so far. Please correct me if I'm wrong:

The panic!() in Rust should be captured and handled carefully.
- "Carefully" means the stack and destructor in Rust runtime is cleaned and called.
- It also means the error (with clear message) should be redirected to R.
- "extendr" already handles the panic (Returning error messages to R #278 (comment) , thanks @andy-thomason) and unwind the stacks, but the message is not redirected.
So "extendr" should:
- suppress the message stream in Rust (as @multimeric and @yutannihilation pointed out, we can use std::panic::set_hook()) .
- trigger R errors with correct message.
The discussion now is about "how should R error be trigged", there're three opinions:
- Use Rf_error() or Rf_errorcall() to trigger the error in Rust runtime. It means, "extendr" would throw error in Rust code. It will probably have to leak very small amount of memory of the error message, as the char can't be deallocated due to longjmp nature of R errors.
- "extendr" returns an R object to R's runtime, even when panic occurs. rextendr generates R wrappers in the function call to decide whether a result be returned or an error be thrown. And @Ilia-Kosenkov and @yutannihilation are trying to find the minimal overhead way of handling this on R side.
- The last opinion from @clauswilke is "extendr" returns an R object to R's C runtime. The only difference from the 2nd opinion is that the "wrapper" around handling the returned R object is happend in C code thus probably less overhead.

shrektan · 2022-01-31T10:34:07Z

Right, so the Rust → R interface returns a list(res, err) and then the auto-generated R wrappers convert this into a stop() call? That's better, but I still don't know why we need that intermediate step. Why can't we call stop() directly from Rust?

I'll raise the same question, if it's not about avoiding the "very little memory leak". Adding a "condition class" to the error message, from my personally experience, is not useful. I think Rcpp and cpp11 call the plain R error message, too.

multimeric · 2022-01-31T10:47:25Z

Can we not just allocate a string vector in R memory and then strcpy the Rust string into the vector, unprotect it so that the GC can handle it, and then let Rust free the Rust string memory?

Some Rust equivalent of:

SEXP err = PROTECT(allocVector(STRSXP, 1));
SET_STRING_ELT(err, 0, mkChar(our_error_message) );
UNPROTECT(1);

yutannihilation · 2022-01-31T11:34:19Z

Thanks for the core question. I too thought this is a problem of memory leak, which is not considered as "unsafe" in Rust, but it seems the problem is the undefined behavior on cross-language unwinding.

Currently, when foreign (non-Rust) code invokes Rust code, the behavior of a panic!() unwind that "escapes" from Rust is explicitly undefined. Similarly, when Rust calls a foreign function that unwinds, the behavior once the unwind operation encounters Rust frames is undefined.
(https://blog.rust-lang.org/inside-rust/2020/02/27/ffi-unwind-design-meeting.html)

Sorry, I think I'm running out of time here as my current focus is graphics-related things... I hope Andy will give some vision on this eventually!

sorhawell · 2022-08-23T08:30:49Z

EDIT: Yikes. Little testing showed this to not be a good idea. Anything not dropped before was leaked when calling throw_r_error()

great thread :) just to clarify?

For an average extendr-user who just barely can wait for the new cool error handling :) , how much would one leak by doing the following until then?

pub fn r_unwrap<T, E>(x: Result<T, E>) -> T
where
    T: std::fmt::Debug,
    E: std::fmt::Debug + std::fmt::Display,
{
    x.map_err(|err| extendr_api::throw_r_error(err.to_string()))
        .unwrap()
}

A concrete example could be some method which would return a Result, but the current unwrap-panic! is not desirable. Instead here the final Result is unwrapped with r_unwrap()

Is it only one C-string or could it potentially be much more memory?

sorhawell · 2022-08-25T08:21:34Z

I suggest the following pattern (git repo) to use opt-in list(ok = T, err = E)

pros

opt-in, works today (e.g. with 0.3.1)
use one generic function to wrap any* extendr exportable result
end-Ruser do not have to know about result-unwrap
full control of error handling on R side
no memory leakage

cons

any exported function call to rust must be wrapped in function to resolve the returned result
slightly increased overhead
the generic do not support ? error bubbling. (Lower level invoked functions can still use ? as normal)

*assuming error implements Display. The bound could perhaps be relaxed to something that implements Display and/or IntoRobj via a some super trait.

Rust code

use extendr_api::prelude::{extendr, extendr_module, list, GetSexp, IntoRobj}; // or just use extendr_api::prelude::*;
use extendr_api::NULL;

//convert any extendr exportalbe Result into either list(ok=ok_value,err=NULL) or list(ok=NULL,err=err_string)
//... where error implements Display. This error traitbound could be resolved on R side instead.
pub fn r_result_list<T, E>(x: std::result::Result<T, E>) -> list::List
where
   T: std::fmt::Debug + IntoRobj,
   E: std::fmt::Debug + std::fmt::Display,
{
   if x.is_ok() {
       list!(ok = x.unwrap().into_robj(), err = NULL)
   } else {
       list!(ok = NULL, err = x.unwrap_err().to_string())
   }
}

//some function using the r_result_list generic function to export
#[extendr]
fn rust_hello_result(is_ok: bool) -> list::List {
   //some silly logic to emulate Result
   let result = if is_ok {
       Ok(42i32)
   } else {
       Err("rust world error: is_ok must be set to true")
   };

   //instead of returning Result<T,E> then wrap like this and return list::List
   r_result_list(result)
}

extendr_module! {
   mod helloresult;
   fn rust_hello_result;
}

R code

#' rust like unwrapping of result. Useful to keep error handling on the R side.
#'
#' @param result a list with two elements named 'ok' and 'err'. One and only one of them must NULL or missing.
#' @param call a context call for what triggered the error, passed to abort(footer=). abort(call=) will annoyingly not display anonymous funcitons.
#' @param ... any other param to rlang::abort
#' 
#' @importFrom rlang abort
#'
#' @return eiter the ok value or no return but throw error by the err string
#' @export
#'
#' @examples unwrap(extendr_result:::rust_hello_result(TRUE))
unwrap = function(result,call=sys.call(1L),...) {
  
  #if not a result
  if(!is.list(result)) {
    abort("internal error: cannot unwrap non result",.internal = TRUE)
  }
  
  #if result is ok
  if(!is.null(result$ok) && is.null(result$err)) {
    return(result$ok)
  }
  
  #if result is error
  if( is.null(result$ok) && !is.null(result$err)) {
    return(abort(
      result$err,
      call = NULL,
      footer = paste(
        "when calling:\n",
        paste(capture.output(print(call)),collapse="\n")
      ),
      ...
    ))
  }
  
  #if not ok XOR error, then roll over
  abort("internal error: result object corrupted",.internal = TRUE)
}


#' example user function
#'
#' @param ok bool TRUE or FALSE, should result be ok (or and error)
#'
#' @return r_result_list
#' @export
#' 
#' @importFrom rlang is_bool
#'
#' @examples stopifnot(r_user_function(is_ok=TRUE)==42L)
r_user_function = function(is_ok) {
  if(!is_bool(is_ok)) stop("ok arg must be either scalar TRUE or FALSE")
  result = rust_hello_result(is_ok)
  unwrap(result)
}

Ilia-Kosenkov · 2022-08-25T16:38:11Z

We had this discussion multiple times, and there is no consensus. There are arguments in favor of different implementations.
I guess this can be a default behavior if a return type is some Result<RObj, Error>. The problem is, if the return type is a list, it is always an extra allocation.
One solution could is that on error result, we return a condition, e.g. created with simpleError() (we do not care what to return from Rust since it is always an RObj). On R side we check if the thing returned from R is a correct Rust-created condition, we throw it as error with say stop(), otherwise we return it as a valid output from Rust.

Again, many ways to solve this problem, feel free to make an attempt at PR.

sorhawell · 2022-08-26T08:39:49Z

Yeah many ways. I think of it as an iron triangle of Speed, Safety and 'Good' error handling. Unless the C-wizards :) come up with a really nice have-it-all solution, it will be a compromise.

The current default behavior, "unwrap->panic->unwind" favors Speed+Safety and might actually be a fine default behavior.

I suggest to add 2 or 3 handy generic functions and a explanation in the learning materials how the extendr_api-user can opt-in to other mixes of speed-safety-goodError.

list(ok,err) favors Safety+GoodError
As newbie, I try to understand the source code of list! it invokes List::from_values_and_names which finally uses make_vector and SET_VECTOR_ELT it seems to not use cloning but only moving. I'm I wrong on that?. In that case it is just the allocation of a two element list object and might not be too bad.

your mentioned simpleCondition-option has Safety+GoodError and retains speed on the happy path by avoiding the list allocation.

The generic function would look like this then

pub fn r_result_condition<T, E>(x: std::result::Result<T, E>) -> Robj
where
    T: std::fmt::Debug + IntoRobj,
    E: std::fmt::Debug + std::fmt::Display,
{
    if x.is_ok() {
        x.unwrap().into_robj()
    } else {
        let err_call_string = format!("simpleError(message = '{}')", x.unwrap_err().to_string());
        let robj = extendr_api::eval_string(&err_call_string).unwrap();
        robj
    }
}

usage in rust could look like

#[extendr]
fn rust_hellocondition(is_ok: bool) -> Robj {
    //some silly logic to emulate Result
    let result = if is_ok {
        Ok(42i32)
    } else {
        Err("rust world error: is_ok must be set to true")
    };

    //instead of returning Result<T,E> then wrap like this and return ok Robj or R simpleError condition
    r_result_condition(result)
}

and on R side the extendr_api-user would write a function like

check_condition = function(value_or_condition) {
  
  #if ok, just return value as is 
  if(!rlang::is_error(value_or_condition)) return(value_or_condition)
  
  #else throw error condition
  abort(
    value_or_condition$message
  )

}

I will try it out for some weeks and make a PR if I still like it :)

Ilia-Kosenkov · 2022-08-26T09:15:13Z

The end goal would be to update generates code as well. If a rust FN returns a result of supported shape, then correctly process it and send to R. And on the R side, generate wrapper that not only makes a call into native lib, but also throws an error if it recognizes it is indeed an error.

Then, from users perspective, it will be super smooth.

sorhawell · 2022-08-26T10:58:40Z

How about extending the #[extendr] macro to take an optional argument which decides the behavior. The default behavior can be "smooth for end-users who just wanna try writing a few functions". "Need for speed"-users and "error control"-people can just opt for other behaviors.

//Default, intuitively like any other R function, return value or raise error
// extendr-macro will use above generic r_result_condition + modify extendr-wrappers.R to always check for condition for this specific function
#[extendr(err_condition)]
pub fn rust_result_condition(is_ok: bool) -> Result<i32, &str> {
  todo!("some code returning a Result")
}

// unwrap-panic, good for speed and simplicity
#[extendr(err_panic)]
pub fn rust_result_panic(is_ok: bool) -> Result<i32, &str> {
  todo!("some code returning a Result")
}

// always return list(ok, err), good for explicit error handling where the rust function is not exposed directly to end-user anyways.
// R developer must manually unwrap err on R side, but do not have to use tryCatch, more like return from purrr::safely
#[extendr(err_result)]
pub fn rust_result_list(is_ok: bool) -> Result<i32, &str> {
  todo!("some code returning a Result")
}

Ilia-Kosenkov · 2022-08-26T18:15:57Z

I am not sure the flag is needed, you can communicate it with a return type. If it is something like T : impl IntoRobj, you panic, if however it is a Result<T : impl IntoRobj, Error> (or whatever the correct syntax is, you get the idea), you unwrap it in R and throw there.
If you are not part of the discord community, I strongly suggest you join -- it is much easier to discuss these things there first, in an informal way.

yutannihilation · 2023-02-12T10:26:30Z

Let me share my proof-of-concept implementation using tagged pointer, which is in the direction of #278 (comment). I have no intention (and bandwidth) to promote this, at least at the moment, but hope this helps the discussion here somehow.

Rust code: https://github.com/yutannihilation/unextendr/blob/ec88a43411a54a720b4071b7c0adbfe597fda18e/src/rust/src/lib.rs#L33-L77 (updated after Ilia's comment)
C code: https://github.com/yutannihilation/unextendr/blob/d757eb998a8bbc686c38f0b7c1a80b967893cf4d/src/init.c#L4-L15

The main advantage is that the additional cost is only one bitwise operation, but I'm not sure how portable this is...

Ilia-Kosenkov · 2023-02-12T16:53:08Z

What do you achieve with flag_sexp(res, false) upon success? It should do nothing (you execute bitwise x | 0)

yutannihilation · 2023-02-13T00:12:31Z

Aw, it was my mistake, sorry. The intention was to ensure the error flag is unset (so I should write it as x & !1), but thinking this again, it's unnecessary because probably unaligned pointer address cannot be a valid SEXP so it should have failed before this point. In short, it does nothing, but it doesn't matter.

yutannihilation · 2023-09-25T16:17:52Z

My toy framework now reached a usable state. To discuss about error handling, it might be a good example to play with. I have no idea how I can fit the implementation into extendr, but feedback is welcome.

How to create a package: https://github.com/yutannihilation/savvy#getting-started
Full documentation (WIP): https://docs.rs/savvy/latest/savvy/

#[savvy]
fn raise_error() -> savvy::Result<savvy::SEXP> {
    Err(savvy::Error::new("This is my custom error"))
}

raise_error()
#> Error: This is my custom error

This comment has been minimized.

Sign in to view

sstadick mentioned this issue Apr 26, 2022

R session aborts when Result::Err is returned from Rust or a panic! occurs #400

Open

multimeric mentioned this issue May 28, 2022

Requirements for 1.0.0 #411

Open

sorhawell mentioned this issue Jan 20, 2023

extendr_api::throw_r_error could leak a bit of memory PRQL/prqlc-r#46

Closed

sorhawell mentioned this issue May 4, 2023

implicit function param conversions leaks memory #549

Closed

yutannihilation mentioned this issue May 7, 2023

[PoC] Drop Error before throw_r_error() #551

Closed

yutannihilation mentioned this issue Nov 1, 2023

Fix compilation with features result_list and result_condition #652

Closed

yutannihilation mentioned this issue Apr 7, 2024

panic in other rust's crates yutannihilation/savvy#163

Closed

Returning error messages to R #278

Returning error messages to R #278

Comments

multimeric commented Sep 3, 2021

multimeric commented Sep 4, 2021

andy-thomason commented Sep 4, 2021

multimeric commented Sep 5, 2021

multimeric commented Sep 6, 2021

andy-thomason commented Sep 14, 2021

multimeric commented Sep 18, 2021

andy-thomason commented Sep 26, 2021

multimeric commented Sep 26, 2021

andy-thomason commented Sep 27, 2021 via email

multimeric commented Oct 2, 2021

clauswilke commented Oct 2, 2021

multimeric commented Oct 2, 2021

shrektan commented Jan 25, 2022 • edited

andy-thomason commented Jan 27, 2022

multimeric commented Jan 27, 2022

andy-thomason commented Jan 27, 2022

clauswilke commented Jan 27, 2022

yutannihilation commented Jan 28, 2022

multimeric commented Jan 28, 2022 • edited

yutannihilation commented Jan 28, 2022 • edited

multimeric commented Jan 28, 2022

yutannihilation commented Jan 28, 2022

andy-thomason commented Jan 28, 2022

yutannihilation commented Jan 28, 2022 • edited

shrektan commented Jan 28, 2022

dfalbel commented Jan 28, 2022

yutannihilation commented Jan 29, 2022

This comment has been minimized.

multimeric commented Jan 29, 2022

yutannihilation commented Jan 29, 2022

yutannihilation commented Jan 31, 2022

clauswilke commented Jan 31, 2022

yutannihilation commented Jan 31, 2022

Ilia-Kosenkov commented Jan 31, 2022

Ilia-Kosenkov commented Jan 31, 2022

multimeric commented Jan 31, 2022

yutannihilation commented Jan 31, 2022

multimeric commented Jan 31, 2022 • edited

shrektan commented Jan 31, 2022 • edited

shrektan commented Jan 31, 2022

multimeric commented Jan 31, 2022 • edited

yutannihilation commented Jan 31, 2022

sorhawell commented Aug 23, 2022 • edited

sorhawell commented Aug 25, 2022 • edited

Ilia-Kosenkov commented Aug 25, 2022

sorhawell commented Aug 26, 2022 • edited

Ilia-Kosenkov commented Aug 26, 2022

sorhawell commented Aug 26, 2022 • edited

Ilia-Kosenkov commented Aug 26, 2022

yutannihilation commented Feb 12, 2023 • edited

Ilia-Kosenkov commented Feb 12, 2023

yutannihilation commented Feb 13, 2023

yutannihilation commented Sep 25, 2023

shrektan commented Jan 25, 2022 •

edited

multimeric commented Jan 28, 2022 •

edited

yutannihilation commented Jan 28, 2022 •

edited

yutannihilation commented Jan 28, 2022 •

edited

multimeric commented Jan 31, 2022 •

edited

shrektan commented Jan 31, 2022 •

edited

multimeric commented Jan 31, 2022 •

edited

sorhawell commented Aug 23, 2022 •

edited

sorhawell commented Aug 25, 2022 •

edited

sorhawell commented Aug 26, 2022 •

edited

sorhawell commented Aug 26, 2022 •

edited

yutannihilation commented Feb 12, 2023 •

edited