Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback on Ndarray/Ndarray-linalg (from Reddit) #649

Open
LukeMathWalker opened this issue Jun 23, 2019 · 5 comments
Open

Feedback on Ndarray/Ndarray-linalg (from Reddit) #649

LukeMathWalker opened this issue Jun 23, 2019 · 5 comments

Comments

@LukeMathWalker
Copy link
Member

Some interesting feedback on using ndarray and ndarray-linalg to port a numerical routine from Python to Rust: Reddit thread

Actionable highlights:

  • NumPy to ndarray documentation

The ndarray for numpy users document was a tremendous help. I really appreciate it and all the work the developers have put into making it.
I'd love to see the numpy-to-ndarray docs expanded. I know that the developers of ndarray are trying to make most function calls as general as possible, but I might request that some more of these general functions get added to the ndarray for numpy users document (either how to implement them or where to go for the implementation). For instance: the max/min and argmax/argmin functions.

  • Crate organization

On the topic of max/argmax functions, I implemented my own but I just saw that ndarray-stats has them. I am a bit confused why/how the ndarray and the semi-associated subcrates are being partitioned. For instance, the other stats functions in ndarray-stats seem pretty high level and make sense. However, the "max" function seems like something that should either be implemented in the ndarray crate itself (like the "sum" function/method has been) or "max" should be part of another "ndarray-lowlevelmath" crate.

  • Array indexing

Not being able to slice/index ndarrays with other ndarrays was a bit of a pain point. It's mentioned that you can't do that with ndarray on the ndarray for numpy users document. Are there plans to implement indexing by array?

  • Best practice wrt views/references/etc.

I'm not sure how / when to use views of arrays and when to use references. This is likely my lack of experience with Rust, but I don't know what the pros and cons of each are or when I should write a function that takes an array view versus a reference to an array. I haven't seen any tutorial information on this (so if anyone could point me to a document I'd be super appreciative).

Sweet closure 😛

My project that uses Python will really be improved greatly by switching to Rust and I'm thrilled that so much work has been done on the ndarray and associated crates in the last year that is enabling my work!

@LukeMathWalker
Copy link
Member Author

I have been reviewing our Ndarray for NumPy user doc and I think we could make it a little bit more comprehensive now that ndarray-linalg is part of the organization and more functionality gaps have been covered by ndarray-stats.
This should also improve ease of discovery (is there a max function? where?) - we could as well add some references to ndarray-linalg, ndarray-stats and ndarray-rand (and the respective scopes) in the homepage of ndarray docs, to make them more visible.

Happy to work on this and open a PR if it's desirable.

I was planning to address best practices when it comes to views and references in the series I have been putting together, but my free time has been quite limited recently. I should be able to get back on it in 2 or 3 weeks.

@SuperFluffy
Copy link
Contributor

SuperFluffy commented Jun 23, 2019

Before we advertise ndarray-linalg more prominently, I think we should discuss its design. I have been looking into using it for my work, but have decided to use my own implementations. While ndarray-linalg works great for one-off calculations, at the moment it's not very useful for tight/hot loops. I will use its qr code here as an example:

The first thing to note is its repeated use of ArrayBase::uninitialized. This was probably done for performance, but it should probably be benchmarked if in the real world this has any real effects on performance. People repeatedly expressed on reddit that unsafe code makes them question the maturity of libraries. Fortunately, this is an easy fix.

The main issue with the qr code is the large amount of allocations. A typical call does the following:

  • QR::qr allocates via ArrayBase::to_owned;
  • QRInto::qr_into allocates twice via take_slice and take_slice_upper;
  • QR_::qr allocates once via a Vec inside it's body;
  • QR_::householder allocates once via Vec.

Given that a typical use case in scientific calculations is to repeatedly perform QR factorization on a matrix of the same dimensions, it would be great to have a struct QR that contains all pre-allocated data structures and comes equipped with a method compute taking a matrix A and storing the result inside it. The factors Q and R can then be accessed by reference and cloned, if needed.

One possible design is what I did with my library for Gram Schmidt QR factorization. As an example, struct Classical implements the GramSchmidt trait. After construction using the trait methods from_matrix or from_shape, an array of the same dimensions can be repeatedly factorized using compute.

For one-off calculations, one can have convenience wrappers like this one here.

I did the same thing when implementing Al-Mohy and Higham's matrix exponentiation algorithm here: https://github.com/SuperFluffy/rust-expm/blob/master/src/lib.rs#L332

@LukeMathWalker
Copy link
Member Author

I haven't had a chance yet to take a deeper look at the internal of ndarray-linalg, but, as a matter of principle, I don't think we should "hide" it until it's perfect and polished.
On the contrary, we should make it more visible while providing an easy reference to all its shortcomings (at least the ones we know of), thus making it easier for people to contribute and improve on what it's currently available.

Could we get something similar to #597 for ndarray-linalg @SuperFluffy @termoshtt?

@termoshtt
Copy link
Member

The main issue with the qr code is the large amount of allocations

I agree that ndarray-linalg should have an interface which minimize memory allocation, but it can coexist with current luxury interface as LUFactorize does.

Could we get something similar to #597 for ndarray-linalg

This will be great, but I can get little time for working with it. I hope another main maintainer handles this if possible...

Crate organization

I've created ndarray-linalg crate around the comment #178 (comment) basically for my studies. ndarray crate itself is still not enough for general usage, and we need associated sub-crate. We have two options:

  • Merge these sub-crates into ndarray itself
    • simplify the development
  • Re-split existing functionalities into sub-crates
    • e.g. ndarray-rand and ndarray_linalg::generate::random has similar features

Monolithic ndarray crate is useful for users, and it will be also useful for developers because they have no need to consider which crate should implement some function. However, it will be harder for maintainer.

@LukeMathWalker
Copy link
Member Author

LukeMathWalker commented Jul 4, 2019

Could we get something similar to #597 for ndarray-linalg

This will be great, but I can get little time for working with it. I hope another main maintainer handles this if possible...

I am not super familiar with the ndarray-linalg yet, so it would be beneficial if you could do a "brain dump" of current issues, future directions, etc.
I am happy to benchmark ndarray-linalg against NumPy to see what we are missing from a functionality point of view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants