Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can extendr give up embedded usage? #583

Open
yutannihilation opened this issue Jul 12, 2023 · 13 comments
Open

Can extendr give up embedded usage? #583

yutannihilation opened this issue Jul 12, 2023 · 13 comments

Comments

@yutannihilation
Copy link
Contributor

Seriously, I believe it makes things super complicated for extendr to support both R package use and embedded use. Does it really worth to support the embedded usage?

The recent trouble about the path to libR.so is the case; libR-sys sets rustc-link-search, but this isn't used for R packages because linking against libR happens AFTER cargo build, iiuc. libR-sys's build.rs can be much simpler if it gives up the embedded usage.

This tricky implementation of NA for &str is also the case. See #483 (comment) for the details. In short, it handles the problems that can occur only when multiple R sessions exist, which should never happen as long as extendr is used in R packages.

Placing single_threaded() here and there is also for this reason. I believe most of them are unnecessary except when it's used in some embedded usage. As long as it's called from R session, most of the things should be synchronous.

But, to be fair, giving up embedded usage means you cannot have extendr_engine(), which means you cannot have test! macro, which means you cannot do any unit testing at all. You'll always need a real R session and real R package. So, I guess no one will agree with this idea, but let me express my frustration here.

@andy-thomason
Copy link
Contributor

andy-thomason commented Jul 12, 2023

Yes. The whole raison d'etre of the embedded mode is for testing.

I had thought about making R libraries available on crates.io, too.

We could provide a config to no op single_threaded() when run from an R
package. That would improve performance.

@andy-thomason
Copy link
Contributor

This would also make Robj reference counting much cheaper.

@CGMossa
Copy link
Member

CGMossa commented Jul 12, 2023

I'm somewhat against this, and I'd like to elaborate soon on it.

In the mean time, please don't do this before I've given my opinion on this.

@multimeric
Copy link
Member

But has anyone answered the question of how we would do testing without embedded R?

@CGMossa
Copy link
Member

CGMossa commented Jul 12, 2023 via email

@yutannihilation
Copy link
Contributor Author

The whole raison d'etre of the embedded mode is for testing.

Oh, really? I didn't know this. Then, if only we can find some alternative way of testing, we might not need this. e.g. I was wondering if it's possible to inject fake libR for unit testing.

In the mean time, please don't do this

Don't worry, this would require enormous amount of efforts. No one can do this overnight :)

But has anyone answered the question of how we would do testing without embedded R?

I already answered it. You'll always need a real R session and real R package for testing. I mean, "testing" is possible, but unit test is not.

@yutannihilation
Copy link
Contributor Author

Anyway, I don't have energy to make this a reality. I'll keep experimenting with my toy crate in the hope I can reflect the achievements to extendr someday. Thank you all for your quick responses.

At the moment, stopping relying on panic!() is my priority.

@CGMossa
Copy link
Member

CGMossa commented Aug 27, 2023

I've come around to this idea, and in fact I think we should have an evalr package solely devoted to having
R-powered Rust crates / programs; I've experimented with it, and basically, the evalr crate should inherit the
types exposed by libR-sys, to avoid type issues with Rust, but otherwise it works as intended.

Users that want to use R in their rust crates will have many, many more needs than what extendr-api should cover.

(I failed to find this issue again, but @yutannihilation just linked it again, thanks!)

@yutannihilation
Copy link
Contributor Author

Just in case you don't realize (or forget) how horrible thing this is, let me highlight the NA character problem for an example.

The C string underlying R_NaString is just a "NA", which is allocated during the initialization of an R session.

https://github.com/wch/r-source/blob/ad9c596f97904f330c03cfaed9f8ff21200ed9de/src/main/names.c#L1221

The pointer address of this "NA" shouldn't change during the session. So, in theory, NA-check of &str can be done by comparing the pointer address against the "NA" directly.

However, in extendr, it's not possible because the pointer of the "NA" randomly disappears, which causes STATUS_ACCESS_VIOLATION! Can you believe? Something that cannot happen happens here. I'm almost sure the cause is that cargo test concurrently runs with multiple R sessions, considering --test-threads=1 made the error disappear.

You might argue that extendr anyway needs to be concurrency-proof, but I bet this isn't a concurrency that happens on a real R session. It's probably that multiple R sessions created by extendr-engine weirdly share some state. I haven't heard running multiple R process causes such an error.

I feel it's insane to have such a cryptic code below just to avoid the error that can only happen in embedded usage.

// To make sure this "NA" is allocated at a different place than any other "NA"
// strings (so that it can be used as a sentinel value), we allocate it by
// ourselves.
static EXTENDR_NA_STRING: Lazy<&'static str> = Lazy::new(|| unsafe {
// Layout::array() can fail when the size exceeds `isize::MAX`, but we
// only need 2 here, so it's safe to unwrap().
let layout = Layout::array::<u8>(2).unwrap();
// We allocate and never free it because we need this pointer to be
// alive until the program ends.
let ptr = alloc::alloc(layout);
let v: &mut [u8] = std::slice::from_raw_parts_mut(ptr, 2);
v[0] = b'N';
v[1] = b'A';
std::str::from_utf8_unchecked(v)
});

@CGMossa
Copy link
Member

CGMossa commented Oct 8, 2023

Oh I have forgotten about this. It must have been on my mind, because I made #624, which basically checks if R's NULL is available as a non C NULL pointer, to indicate if R was starting.

I feel like you're posting a lot of poignant information and issue, and I'm not arguing against it really.

I would love it if you made an issue about keeping the concurrency, or getting rid of it.
As I know understand it, it would benefit everyone, if we focused our effort on ExternalPtr/ALTREP and make
that a seamless thing between R & Rust, in which we then may provide concurrency, without having to deal
with R's C-API and its limitations.

Don't you think it makes a little bit of sense to try and attack these problems anyways? I've already made a lot of progress on this particular issue, and a few others. I am very positive that we together could get through this and many other things to be fair..

If you have ideas for the concurrency tests, please throw them my way. I will support it if you'd put forth, that extendr should not support concurrency.

@yutannihilation
Copy link
Contributor Author

I would love it if you made an issue about keeping the concurrency, or getting rid of it.

Why? I think you should file an issue about starting to support the concurrency, or not1. I don't think it's the consensus that concurrency is in the current scope of extendr. Am I wrong? If so, it probably means I should fade out from the extendr project because it seems I'm completely lost in the discussion.

Also, I'm not proposing dropping the (non-existent) support of concurrency here. I'm just expressing my feeling that I'm super tired of all the problems that embedded usage have been causing.

Footnotes

  1. I actually filed such an issue before, but I think I'm not the right person as I've changed my mind since then.

@CGMossa
Copy link
Member

CGMossa commented Oct 9, 2023

Dear @yutannihilation,

There are zero discussions being had without you. This "stuff" comes from me
trying desperately to implement #626, #608, and then me encountering these things,
having learned more about Rust and programming to deal with it.

I didn't put a reply in your original issue, because I am not that articulate, or knowledgeable.
But you are a few steps ahead of atleast me. So I suggested that you start discussion (again! Sorry I forgot about that issue, #298),
and I did say, that I would agree towards your view of it. Especially now. I have a brilliant PR #635, but it breaks CI,
sometimes, and sometimes not. It is correct approach to the single_threaded, but any extendr-API decisions we make
will affect it. That burden, that you've carried as well, is not worth it. Especially since we are not really given concurrent
extendr-api, we are just making it concurrency-safe (not a real term ☹️).

I think your example can be fixed. I can propose a fix for it. It doesn't mean I don't support that we go away from this,
it just means I want to learn, understand, and have empirical evidence of an honest attempt at solving this.
On paper, it doesn't sound hard.

But I will start the discussion! You already gave me a few points. I'll maybe suggest a PR to deal with the absence of symbols
when in extendr-scope (I'd basically use the ctor package, and maybe make a multi threads test, or something like that).

A side note: Thanks for your extensive comment on #646, I didn't know.. most of what you wrote 😅.

@yutannihilation
Copy link
Contributor Author

There are zero discussions being had without you.

I'm fine either with or without me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants