Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rmatrix docs #466

Merged
merged 11 commits into from
Feb 10, 2023
Merged

Rmatrix docs #466

merged 11 commits into from
Feb 10, 2023

Conversation

multimeric
Copy link
Member

  • Document RArray::new_matrix() (closes Document how to create a new matrix #463)
  • Add a new CollectRMatrix trait that can be used to collect arbitrary iterables into an R matrix
  • Includes tests for this new trait

@multimeric
Copy link
Member Author

multimeric commented Jan 29, 2023

I removed the proposed by_row parameter because I thought it made for an ugly API, it encouraged using row-major iterables when we want to discourage this for performance reasons, and because there is no simple way to implement this until array_chunks is stabilised. My feeling is that people who want this functionality should use ndarray instead, which has a nice transpose function, or they can just use R's t() function. However, I'm open to being convinced.

@daniellga
Copy link

daniellga commented Jan 29, 2023

@multimeric Hi! What happens here if you call collect_rmatrix when vector.len() is not a multiple of the nrow argument given as input?

@multimeric
Copy link
Member Author

Good point. That would surely fail, but possibly on the R side. Maybe I should just make this function fallible and have it return an Error if the user providers an invalid nrow?

@daniellga
Copy link

I am not certain that would fail on the R side, unfortunatelly I can't run any tests now. I don't know how other people do, but I only create these extendr types right before returning them to R, so, in the way I use it, I don't expect to run into any trouble before R would detect the problem.
Anyway, it is probably better to make it clear in docs that the function is not fallible (or fallible) if vector.len() is not a multiple of nrows (thus it's the caller's responsibility to make the check) or something like that.

@JosiahParry
Copy link
Contributor

In relation to "... it encouraged using row-major iterables when we want to discourage this for performance reasons..." perhaps you can indicate in the docs that by row is slower performance. R, numpy, and ndarray support insertion rowwise—i think having a counterpart would be appreciated.

Re "...should use ndarray instead": perhaps this is the best answer. If it is the documentation for using ndarray + extendr should be improved. I have not been able to find any, personally. For example I had to write my own conversion function from RMatrix -> Array2.

@daniellga
Copy link

Hi @JosiahParry! Would you mind giving examples of those functionalities on ndarray? I was looking through the docs but I don't see how you create a new column major array row by row or a new row major array column by column. I see row major arrays being populated by row or column major array being populated by column. Remember R doesn't support row major matrices.
I think @Ilia-Kosenkov is already improving ndarray conversions and docs in another PR. Any help on the docs for these conversions is very appreciated.

@JosiahParry
Copy link
Contributor

@daniellga Again, i'm referring to insertion into a matrix not internal storage. You can insert rowwise into a column major matrix in R.

ndarray is row-major so the inverse method is provided using the [ShapeBuilder](https://docs.rs/ndarray/0.13.1/ndarray/struct.ArrayBase.html#constructor-methods-for-n-dimensional-arrays for the ) trait's .f() method. Further see the wonderful examples at https://docs.rs/ndarray/latest/ndarray/doc/ndarray_for_numpy_users/index.html#array-creation

@JosiahParry
Copy link
Contributor

Perhaps the .f() method changes the memory layout but for someone like me who isn't concerned about the memory layout but rather ease of use being able to insert data in either way is a lovely convenience.

@Ilia-Kosenkov
Copy link
Member

If we use R's dim symbol, it will likely fail. E.g. you can try something like this `dim<-`(1:16, c(2, 3)) and see for yourself. Because of this, I'd prefer to have it return a Result rather than call into R and fail from there, which is a much more expensive transition.

@multimeric
Copy link
Member Author

ndarray is row-major so the inverse method is provided using the [ShapeBuilder](https://docs.rs/ndarray/0.13.1/ndarray/struct.ArrayBase.html#constructor-methods-for-n-dimensional-arrays for the ) trait's .f() method. Further see the wonderful examples at https://docs.rs/ndarray/latest/ndarray/doc/ndarray_for_numpy_users/index.html#array-creation

My understanding is that ndarray can actually store the array in-memory as column major, as long as you use the f() at construction time. Which is ideal for conversion back to R.

@CGMossa
Copy link
Member

CGMossa commented Feb 8, 2023

I must apologise. Since I'm a contributor, I was able to push a commit to your pr. I had hoped it would be suggestions. But it just took it whole sale.
Please feel free to overwrite them. Or downright revert the commit.
I'll not do that again.

@multimeric
Copy link
Member Author

multimeric commented Feb 8, 2023

TODO:

  • Convert into arbitrary RArray using collect_array
  • Move the trait into RobjItertools instead of using own trait

@CGMossa
Copy link
Member

CGMossa commented Feb 8, 2023 via email

@CGMossa
Copy link
Member

CGMossa commented Feb 9, 2023

Suggested edit:

diff --git a/extendr-api/src/robj/into_robj.rs b/extendr-api/src/robj/into_robj.rs
index 1d084cb5..bb1fe15c 100644
--- a/extendr-api/src/robj/into_robj.rs
+++ b/extendr-api/src/robj/into_robj.rs
@@ -478,8 +478,8 @@ pub trait RobjItertools: Iterator {
         vec.into_iter().collect_robj()
     }
 
-    /// Collects an iterable into an RArray.
-    /// The iterable must yield items column by column (aka Fortan order)
+    /// Collects an iterable into an `RArray`.
+    /// The iterable must yield items column by column (aka Fortran order)
     ///
     /// # Arguments
     ///
diff --git a/extendr-api/src/wrapper/matrix.rs b/extendr-api/src/wrapper/matrix.rs
index 92fc4449..95dd6230 100644
--- a/extendr-api/src/wrapper/matrix.rs
+++ b/extendr-api/src/wrapper/matrix.rs
@@ -134,8 +134,8 @@ where
     /// * `f` - a function that will be called for each entry of the matrix in order to populate it with values.
     ///     It must return a scalar value that can be converted to an R scalar, such as `u32`, `f64`, i.e. see [ToVectorValue].
     ///     It accepts two arguments:
-    ///     * `r` (usize) - the current row of the entry we are creating
-    ///     * `c` (usize) - the current column of the entry we are creating
+    ///     * `r` - the current row of the entry we are creating
+    ///     * `c` - the current column of the entry we are creating
     pub fn new_matrix<F: Clone + FnMut(usize, usize) -> T>(
         nrows: usize,
         ncols: usize,
@@ -148,7 +148,7 @@ where
             })
             .collect_robj();
         let dim = [nrows, ncols];
-        let mut robj = robj.set_attrib(wrapper::symbol::dim_symbol(), dim).unwrap();
+        let mut robj = robj.set_attrib(wrapper::symbol::dim_symbol(), dim)?;
         let data = robj.as_typed_slice_mut().unwrap().as_mut_ptr();
         RArray::from_parts(robj, data, dim)
     }

@JosiahParry
Copy link
Contributor

Why remove (usize)? I feel like removing that only adds ambiguity. Having it only helps and doesnt hurt.

@CGMossa
Copy link
Member

CGMossa commented Feb 9, 2023 via email

Copy link
Member

@CGMossa CGMossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. You can add my changes if you like.

@multimeric
Copy link
Member Author

The function types are visible in the function signature, but they're not super clear:
image

@multimeric
Copy link
Member Author

I can't use this, because set_attrib returns an option, not a result:

+        let mut robj = robj.set_attrib(wrapper::symbol::dim_symbol(), dim)?;

@CGMossa
Copy link
Member

CGMossa commented Feb 10, 2023

Sorry about that. You can use ok_or_else||Error::Other(format!("some message"))), but it doesn't matter that much honestly.

Also I think r should be nrow and c be ncol now that I see the documentation.

I think the documentation should follow Rust documentation etiquette, and not some other way...
But if it is important, we can revisit it in another PR. I won't be pushing this further.

@multimeric
Copy link
Member Author

nrow is used in R but nrows in ndarray. It's tricky.

@CGMossa
Copy link
Member

CGMossa commented Feb 10, 2023

That's not what I meant. In the docstring the argument name should match the function signature.

@multimeric
Copy link
Member Author

But r is the current row, in the iteration, not the number of rows.

@CGMossa
Copy link
Member

CGMossa commented Feb 10, 2023

Ah.. The documentation is for the closure. Damn.. Yeah you're absolutely right.
Keep the usize as well. You were right about all of it. My bad.

@multimeric
Copy link
Member Author

I don't mind removing the usize. Technically it's in the function signature just like for nrow. It's just in a slightly odd place.

@multimeric multimeric merged commit 9dd0b14 into master Feb 10, 2023
@CGMossa CGMossa deleted the rmatrix-docs branch April 11, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document how to create a new matrix
5 participants