Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rarray rebuild #610

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Rarray rebuild #610

wants to merge 8 commits into from

Conversation

daniellga
Copy link

I have been using RArray for some time so I made changes that I think will make its use easier for an inexperienced user.
So this PR is an RArray rebuild, making it easier to create an RArray and also removing RColumn and RMatrix3D, which I don't think make that much sense.
collect_rarray now also accepts dynamic arguments like Vec.

@CGMossa
Copy link
Member

CGMossa commented Aug 31, 2023

Yes! I support this fully. I also hope it has multiple dimension support.
I'll definitely review this thoroughly.

@CGMossa
Copy link
Member

CGMossa commented Sep 5, 2023

Can you request reviews from me / us when you're ready?

@daniellga
Copy link
Author

Can you request reviews from me / us when you're ready?

It seems ready for me, but I can't (or don't know how to) ask for you to be a reviewer...

I tried to do as told here but it seems I am not able to do it.
https://stackoverflow.com/a/41028889/11587489

@Ilia-Kosenkov
Copy link
Member

Maybe there is some issue with access that you cannot ask for reviews?
Anyway, looks impressive, I need time to go through these things.

What about unit tests? Did you manage to make it work with old ones or do we have no tests for RMatrix? If it is the latter, could you add some tests, especially for creation/type conversion?

@daniellga
Copy link
Author

Maybe there is some issue with access that you cannot ask for reviews? Anyway, looks impressive, I need time to go through these things.

What about unit tests? Did you manage to make it work with old ones or do we have no tests for RMatrix? If it is the latter, could you add some tests, especially for creation/type conversion?

I added some tests but I don't know if it covers type conversions... Could you be more specific on what these should be?
I think these tests cover nearly everything the old ones did.

let prod = dims_ref.iter().product::<usize>();
if prod != robj.len() {
return Err(Error::Other(format!(
"The vector length ({}) does not match the length implied by the dimensions ({})",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this error twice in the codebase, I believe it deserves its own error type with perhaps two parameters and this string moved to impl Debug impl Display.

Copy link
Author

@daniellga daniellga Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to implement a "more generic" ValuesMismatch error where it accepts 2 values that implement the Debug trait + a String but the PartialEq derive for Error blocked it... I think it would be better to create an issue in order to implement it another time...
I implemented a DimensionMismatch error anyway, but I ended up not too happy about it since it is too specialized...

// Create 2D RArray.
let slice = [1,2,3,4,5,6];
let dim = [2,3];
let array = RArray::from_slice(slice, dim).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other construction methods besides from_slice()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also from_slice_unchecked. Since creating a new RArray creates a copy of the inner data, these were the only 2 methods I thought were needed. Note that you can use from_slice() directly on a Vec or on a rust array in this case.
I made RArray::new() available only to be used internally, since it doesn't add the dim attribute for the Robj.
Let me know if you are thinking of another type of constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to create one without copying data? That would be quite nice. Additionally, it would be handy to be able to determine the col/row major order or at least provide a closure similar to the matrix method.

Copy link
Author

@daniellga daniellga Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is possible to create an Robj without copying data.

Is it ok for you if I add an example on how to build row major matrices? Or is the RMatrix constructor using a closure really desirable? I removed it because think the data manipulation should be left for the user to figure out and also because it was a somewhat complicated function to understand how to use...

Maybe @CGMossa and @Ilia-Kosenkov can give their opinion on this.

    #[test]
    fn test_collect_rarray_matrix_row_major() {
        test! {
            // Test building an RMatrix in row major order.
            let v: Vec<i32> = (1i32..=12).collect();
            let nrow = 4_usize;
            let ncol = 3_usize;
            let rmat = (0..ncol)
                .flat_map(|i| {
                    v.iter()
                        .skip(i)
                        .step_by(ncol)
                        .cloned()
                })
                .collect_rarray([nrow, ncol]);
            assert!(rmat.is_ok());
            assert_eq!(Robj::from(rmat), R!("matrix(1:12, nrow=4, byrow = TRUE)").unwrap());
        }
    }

Using the transpose crate would probably be way more efficient than what I did in the test above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that the closure is very important. I can find 8 examples where I'm using it in existing code.
https://github.com/search?q=owner%3AJosiahParry%20new_matrix&type=code

It's very helpful in that I often iterate over rows and store a Vec<T> where T is some structure that is on a per-row basis. Presently I am writing code that creates a matrix from Vec<[f64;2]> with the closure |r, c| x[r][c].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closure is definitely useful, but I find it very confusing. That's why I removed it in the first place. The changes I made are very opinionated as I think these transformations made by the closure should be done outside extendr. When creating the array, I think the user should already have the Vec (or slice) already ready to be converted to an RArray.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires a whole lot of mental gymnastics to understand how to create a flattened vector in the appropriate shape for R's column-major matrices. Not many people will be able to figure that out--myself included. QOL >>> imo

@CGMossa
Copy link
Member

CGMossa commented Oct 31, 2023

Hello @daniellga. I am honestly very sorry that we haven't resolved this. I want to solve it within the next two weeks. If you have availability to get it over the finish line, please tell me.

@daniellga
Copy link
Author

daniellga commented Oct 31, 2023

@CGMossa and @JosiahParry I don't have time at all at the moment... Family stuff. Feel free to modify it if you think changes are worth. I think you are allowed to do it, right? (not very used to github)
Anyway I will try to keep track of this PR whenever I can.

@JosiahParry
Copy link
Contributor

I still would like to improve this experience here. I continually find myself needing to make matrixes by row. Here is a current use case and makes it harder to achieve by removal of the closure approach which, I must admit is clunky, but better than nothing.

Given a vector of a struct with a const usize parameter, Vec<EsriCoord<N>>, I need to make a matrix with dimension (x.len(), N). Constructing a matrix right now is so awkward. This PR gets some of the way there but intentional excludes this use case.

pub struct EsriCoord<const N: usize>(pub [f64; N]);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants