Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ndarray improvements #275

Merged
merged 15 commits into from
Sep 3, 2021
Merged

Conversation

multimeric
Copy link
Member

@multimeric multimeric commented Aug 31, 2021

Changes:

  • Bump ndarray. This is needed because the current 0.13.x version that we are using is not compatible with projects using the more modern 0.15.x versions, and interop between arrays from different versions of the library will fail in confusing ways
  • Implement From<ArrayBase<S, D>> for Robj ie conversion from any ndarray to an Robj, with tests

Outstanding questions:

  • How can we actually compare R objects in Rust? My tests are failing I assume because the pointers are not the same, even though the string representation seems identical
  • Why do the arrays created using R! not have any classes set when I a.get_attrib(class_symbol())?

@clauswilke
Copy link
Member

You'll probably have to test for equality of each individual element, just like the other tests in the same source file.

@multimeric
Copy link
Member Author

Oh is that why all the other tests did that. Is there any discussion about implementing a PartialEq for Robj against Robj that compares by value?

@yutannihilation
Copy link
Contributor

Is there any discussion about implementing a PartialEq for Robj against Robj that compares by value?

I can't find any. My guess is that Robj is basically a pointer to a data owned by R, so the comparison is done in pointer-level. But I agree it's handy if we can compare by value somehow.

@multimeric
Copy link
Member Author

I can't find any. My guess is that Robj is basically a pointer to a data owned by R, so the comparison is done in pointer-level. But I agree it's handy if we can compare by value somehow.

Well I'll open an issue for it. Surely the R API has a way to compare by value? Or if not we can just write comparison functions like I have done here for vectors/arrays

@yutannihilation
Copy link
Contributor

Thanks, I think we need to write it by ourselves.

@multimeric
Copy link
Member Author

Turns out I'm wrong, actually. We're using the R_compute_identical API function in our implementation:

impl PartialEq<Robj> for Robj {
fn eq(&self, rhs: &Robj) -> bool {
unsafe {
if self.get() == rhs.get() {
return true;
}
// see https://github.com/hadley/r-internals/blob/master/misc.md
R_compute_identical(self.get(), rhs.get(), 16) != 0
}
}
}

And this function quite neatly compares all aspects of any Robj by value:
https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/main/identical.c#L86

I think my code was not working with this function only because the attribute lists were different (which I still don't understand).

@yutannihilation
Copy link
Contributor

Oh, I didn't notice it, sorry.

extendr-api/src/robj_ndarray.rs Outdated Show resolved Hide resolved
@yutannihilation
Copy link
Contributor

@andy-thomason @clauswilke @Ilia-Kosenkov
Though I'm not familiar with the implementation around ndarray, this pull request basically looks good to me. We might need to tweak some implementation in association with the ongoing changes on conversions, but I expect this won't be a burden as the implementation is simple. Any major concerns?

@clauswilke
Copy link
Member

I agree.

@multimeric
Copy link
Member Author

multimeric commented Sep 1, 2021

I just pushed a refactor, it should be much cleaner now.

Other than what is being discussed in the review, I'd appreciate review on:

  • The way that I'm cloning the data before conversion to R. Is that correct?
  • The way that I'm handling memory layouts. It occurred to me that since ndarray supports both C and F layouts, while R supports only F (I think?), there needed to be some kind of handling. I've since implemented this with a test, but it's doing my head in and I would appreciate someone double checking that I've done this correctly
  • Do we approve of using as i32 to convert between usize and i32? I gather that it truncates so that if we have a dimension that's too long to represent as an i32 then it will just fail silently and instead return the maximum i32 value.

@Ilia-Kosenkov
Copy link
Member

@multimeric,
I enabled CI for you so you can test your changes en masse.

@Ilia-Kosenkov
Copy link
Member

@multimeric

The way that I'm cloning the data before conversion to R. Is that correct?

I can answer this. We have recently devised some ground rules on how to handle the marshaling of R objects into Rust, starting with R vectors (not arrays of dim 1 though). As a result, the way we handle arrays will likely change in the future, but for now, I believe, any reasonably working solution will do fine.

Do we approve of using as i32 to convert between usize and i32? I gather that it truncates so that if we have a dimension that's too long to represent as an i32 then it will just fail silently and instead return the maximum i32 value.

I am not really familiar with ndarray, but I assume you are talking about converting array dimensions to i32 used to set dim.
I do not like such unchecked casts, if a Rust array is 'too large', I doubt R can handle it correctly anyway, so overflowing casts should just fail.
I suggest to at least leave a comment in the code mentioning this issue.

@multimeric
Copy link
Member Author

multimeric commented Sep 1, 2021

I am not really familiar with ndarray, but I assume you are talking about converting array dimensions to i32 used to set dim.
I do not like such unchecked casts, if a Rust array is 'too large', I doubt R can handle it correctly anyway, so overflowing casts should just fail.

Okay, I've changed this such that it will panic in this (rare) scenario. I could also change it to return an error, but then we would have to implement TryFrom instead of From, but I don't think any other conversions do this?

@clauswilke
Copy link
Member

Okay, I've changed this such that it will panic in this (rare) scenario. I could also change it to return an error, but then we would have to implement TryFrom instead of From, but I don't think any other conversions do this?

We're moving to TryFrom throughout to exactly avoid these kinds of issues. We're just not there yet, you caught us in the middle of this transition. I don't like random panics or unexpected data corruption, even if extremely rare.

@multimeric
Copy link
Member Author

If there's another branch I should target in order to use TryFrom, I happily will.

@clauswilke
Copy link
Member

Therefore, I would suggest moving to TryFrom now rather than leaving the code half-finished and hoping somebody will fix it later. It'll cause 100x more pain later on, when people are actually using the API and experience breaking changes.

@clauswilke
Copy link
Member

No, development is happening in master.

@multimeric
Copy link
Member Author

multimeric commented Sep 1, 2021

Okay. Do we have some kind of error scheme I should be following? Just strings? anyhow? thiserror?

@Ilia-Kosenkov
Copy link
Member

Okay. Do we have some kind of error scheme I should be following? Just strings?

I think this is the one
https://github.com/extendr/extendr/blob/master/extendr-api/src/error.rs

@multimeric
Copy link
Member Author

Okay. Do I just add new errors to that enum? Because the kind of errors I expect are like "axis too long", and whatever kind of error set_attrib returns.

@clauswilke
Copy link
Member

Maybe just use Other with a string? Since this is going to be an extremely rare and obscure error, I'm not sure it needs its own type.

@Ilia-Kosenkov
Copy link
Member

Other with a string is probably preferable because this may be reworked in the nearest future (not entirely, of course), and other Error types may emerge. I am not entirely sure about the usage of the word axis though, perhaps array dimension?
Coming from R, I'd expect to receive "axis too long" when making plots, not when dealing with arrays.

@multimeric
Copy link
Member Author

Done.

@Ilia-Kosenkov
Copy link
Member

Ilia-Kosenkov commented Sep 1, 2021

LGTM, but I'd like to suggest to test one other edge case -- 'empty' arrays. As far as I can see, we have no such tests.
I propose something like matrix(integer(), nrow = 10), which actually produces an empty matrix.
I believe it is a valid use case as R matrix can be sliced into such 'empty' object.

PS: I was under the impression the CI will continue working after I approved it once, but it seems it requires manual triggering for now.

EDIT01: to clarify, an empty R matrix (rank 2 array) is a valid input, so it would be nice to get an idea of what will happen if such an input is received and whether it is (ideally) roundtripable.

EDIT02: array(integer(), c(0, 1, 2, 3)) is also a valid thing.

@clauswilke
Copy link
Member

I believe CI needs to be approved manually until somebody has made at least one accepted contribution to a repo: https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks

It doesn't say so in the github docs, but I believe this is primarily to prevent cryptocurrency mining via github actions.

@multimeric
Copy link
Member Author

LGTM, but I'd like to suggest to test one other edge case -- 'empty' arrays.

Done.

@multimeric
Copy link
Member Author

Only the MacOS test is failing but I don't think it's due to my PR?

@clauswilke
Copy link
Member

I'm re-running the tests. Not clear to me why OSX is failing.

@clauswilke
Copy link
Member

The OSX failure has persisted for a while now. Maybe @yutannihilation knows what's going on?

@Ilia-Kosenkov
Copy link
Member

Only the MacOS test is failing but I don't think it's due to my PR?

CRAN sometimes incorrectly distributes R binaries/sources, especially devel ones. In our case, it fails at the R installation step with an HTTP 404 error while trying to download the newest version, which is pretty much self-explanatory :)
We can ignore this failure for now as there seem to be no other issues on other platforms, including macOS.

Copy link
Member

@Ilia-Kosenkov Ilia-Kosenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Few suggestions:

  • First, I'd recommend to update the CHANGELOG.md with a one-line entry mentioning the support of TryFrom for arrays. You not only implemented new features, but also bumped the version of an (optional) dependency, so I'd say it is worth writing down.
  • Second, if you feel like doing some R coding, it could be nice to have some array tests in tests/extendrtests. This package showcases extendr features (and also tests them), so having, say, a Rust function that accepts two matrices, multiplies them, and then returns the product back to R would be nice. This is a topic for a separate PR and can be implemented by someone else.

@yutannihilation
Copy link
Contributor

The OSX failure has persisted for a while now. Maybe @yutannihilation knows what's going on?

No idea, but let's ignore.

Error: Failed to get R devel: Failed to get R devel: Failed to download version devel: Error: Unexpected HTTP response: 404

@multimeric
Copy link
Member Author

First, I'd recommend to update the CHANGELOG.md

Done.

I also got some helpful comments from one of the ndarray devs here: rust-ndarray/ndarray#1060 which I used to inform a slight optimisation.

Second, if you feel like doing some R coding, it could be nice to have some array tests in tests/extendrtests. This package showcases extendr features (and also tests them), so having, say, a Rust function that accepts two matrices, multiplies them, and then returns the product back to R would be nice. This is a topic for a separate PR and can be implemented by someone else.

Sounds good, but yes I would rather leave this to another PR.

@Ilia-Kosenkov
Copy link
Member

@multimeric,
Seems all is good, thank you for your contribution. I am merging now.

@Ilia-Kosenkov Ilia-Kosenkov merged commit 6935c7d into extendr:master Sep 3, 2021
@multimeric
Copy link
Member Author

Hooray! Thanks for the merge. I look forward to the next major release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants