Provide element-wise math functions for floats #1042

Merged: 8 commits, Dec 6, 2023

Conversation

KmolYuan
Contributor

@KmolYuan KmolYuan commented Jul 13, 2021

Closes #992.

Adds a new module, impl_float_maths, gated behind the std feature.

If anything you'd like is missing, or something doesn't belong here, suggestions are welcome!

Integer functions probably belong in a separate PR.

Three kinds of operators are implemented with three local macros:

  • boolean_op: Functions that return a boolean array; the matching *_any methods combine the results into a single bool.
  • unary_op: Unary functions.
  • binary_op: Binary functions.
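As a rough illustration of the macro approach described above, the sketch below generates element-wise methods for a hypothetical `Arr` newtype over `Vec<f64>`. The names `Arr`, `unary_ops`, and `boolean_ops` are assumptions made for this example; they are not the PR's macros, and the real implementation works on ndarray's array types rather than a plain vector.

```rust
/// Hypothetical stand-in for an array type; not ndarray's `ArrayBase`.
#[derive(Debug, PartialEq)]
struct Arr(Vec<f64>);

/// Generates one element-wise method per listed `f64` method name.
macro_rules! unary_ops {
    ($($name:ident),* $(,)?) => {
        impl Arr {
            $(
                #[must_use]
                fn $name(&self) -> Arr {
                    Arr(self.0.iter().map(|x| x.$name()).collect())
                }
            )*
        }
    };
}

/// Generates a boolean element-wise method plus its "any" reduction.
macro_rules! boolean_ops {
    ($($name:ident => $any:ident),* $(,)?) => {
        impl Arr {
            $(
                fn $name(&self) -> Vec<bool> {
                    self.0.iter().map(|x| x.$name()).collect()
                }
                fn $any(&self) -> bool {
                    self.0.iter().any(|x| x.$name())
                }
            )*
        }
    };
}

unary_ops!(floor, ceil, sqrt, abs);
boolean_ops!(is_nan => is_nan_any, is_infinite => is_infinite_any);
```

With these definitions, `Arr(vec![1.5, 4.0]).floor()` yields a new `Arr` and leaves the original untouched, while `is_nan_any()` short-circuits on the first NaN it finds.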

New functions I made:

  • is_nan_any: An "any" reduction over is_nan.
  • is_infinite_any: An "any" reduction over is_infinite.
  • square: A shortcut for x*x.
  • clip: Same as NumPy's clip, implemented with Float::max and Float::min. This function has a doctest, copied from NumPy.

Ported functions: (here is the Float trait)

  • is_nan
  • is_infinite
  • floor
  • ceil
  • round
  • trunc
  • fract
  • abs
  • signum
  • recip
  • powi
  • powf
  • sqrt
  • exp
  • exp2
  • ln
  • log
  • log2
  • log10
  • abs_sub
  • cbrt
  • sin
  • cos
  • tan
  • to_degrees
  • to_radians

@bluss
Member

bluss commented Jul 19, 2021

Looks to be in the right direction, that's nice - I'll be a bit unreachable during the summer, but back in a bit


@multimeric multimeric left a comment

This looks like a great PR, as the lack of these operations has been a major pain point for me trying to implement mathematical algorithms.

It's unfortunate the maintainers haven't had a chance to look at this, ideally there would be a few more people able to review PRs. In any case I'm giving some feedback in the hope that it might speed up the ultimate review process once it happens.

Some miscellaneous thoughts:

  • Could you actually implement the Float trait for an ndarray? You seem to have provided implementations for most if not all of the required methods.
  • I assume we would need (or at least, should have) some tests for all of the new methods, and not just clip
  • Although these are not part of the Float trait, do you think there is room for greater_than, less_than, and the rest of the family for a scalar operand? ie array![1, 2].greater_than(1) -> array![false, true]
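The scalar comparison suggested in the last point could look roughly like the sketch below. It uses a plain slice instead of an ndarray array, and the free-function form is an assumption made only to keep the example self-contained; `greater_than` is the name proposed in the comment, not an existing ndarray method.

```rust
/// Element-wise `x > scalar`, returning one bool per element.
/// Works on a plain slice here; an ndarray version would return an array of bools.
fn greater_than(data: &[i32], scalar: i32) -> Vec<bool> {
    data.iter().map(|&x| x > scalar).collect()
}
```

Called as `greater_than(&[1, 2], 1)`, this gives `[false, true]`, matching the example in the comment.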

@@ -0,0 +1,157 @@
//! Element-wise methods for ndarray

All the other code files seem to have the Apache license and copyright statement at the top. I suppose one of those is needed here.

@KmolYuan
Contributor Author

Could you actually implement the Float trait for an ndarray? You seem to have provided implementations for most if not all of the required methods.

The main reason is that these methods require importing the Float trait; otherwise they can't be used. The Float trait still has some requirements of its own, such as the Num trait, which I don't think are easy to satisfy.

Although these are not part of the Float trait, do you think there is room for greater_than, less_than, and the rest of the family for a scalar operand? ie array![1, 2].greater_than(1) -> array![false, true]

I'm willing to add more useful methods as you mentioned, if they are compatible with the requirements.

@multimeric

The Float trait still has some requirements of its own, such as the Num trait, which I don't think are easy to satisfy.

Ah yes you're right, implementing Float requires implementing Num, which requires from_str_radix(str: &str, radix: u32) -> Result<Self, Self::FromStrRadixErr> which doesn't make sense for an array. It also has to implement NumCast which doesn't make sense either.

I'm willing to add more useful methods as you mentioned, if they are compatible with the requirements.

What requirements are you talking about?

@KmolYuan
Contributor Author

What requirements are you talking about?

Oh, I mean the functions should be implemented with the Float trait.

@multimeric

multimeric commented Aug 19, 2021

Looks like that won't be a problem if we don't bother with actually implementing Float.

@KmolYuan
Contributor Author

Additionally, I think we should add a #[must_use] attribute, as the standard library does.

#[must_use = "method returns a new array and does not mutate the original value"]
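For context, this is roughly how the attribute would sit on a method. The `Arr` newtype and its `square` method are hypothetical stand-ins for the array type and the PR's methods; only the attribute text is taken from the comment above.

```rust
/// Hypothetical stand-in for an array type.
struct Arr(Vec<f64>);

impl Arr {
    /// Returns a new value holding the square of each element.
    #[must_use = "method returns a new array and does not mutate the original value"]
    fn square(&self) -> Arr {
        Arr(self.0.iter().map(|x| x * x).collect())
    }
}
```

With the attribute in place, a bare `a.square();` statement triggers the `unused_must_use` lint, which helps catch code that expects the call to modify `a` in place.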

Member

@jturner314 jturner314 left a comment

This mostly looks good to me. I added a few comments.

Long-term, I think we should make these methods lazy instead of eager, but that will require a decent amount of additional infrastructure. There seems to be a moderate demand for these methods today, so it's probably worth including the eager versions now and changing them to lazy in a later breaking change.

/// assert_eq!(a.clip(8., 1.), array![1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]);
/// assert_eq!(a.clip(3., 6.), array![3., 3., 3., 3., 4., 5., 6., 6., 6., 6.]);
/// ```
pub fn clip(&self, min: A, max: A) -> Array<A, D> {
Member

I'd hold off including this until num-traits adds a clamp method to Float. In particular, the behavior should be the same as std's clamp methods (on f32, f64, and Ord) in the presence of NaNs and the min > max case.
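The difference in edge-case behavior can be seen on plain `f64` values. The sketch below assumes that `Float::max` and `Float::min` behave like the std `f64::max` and `f64::min` methods for `f64`; `clip_via_min_max` and `checked_clamp` are hypothetical helper names for this example.

```rust
/// Clip built from `max` then `min`, mirroring the approach described for the PR's `clip`.
/// A NaN input is replaced by `min`, because `f64::max` returns the non-NaN argument.
fn clip_via_min_max(x: f64, min: f64, max: f64) -> f64 {
    x.max(min).min(max)
}

/// Wrapper around std's `f64::clamp` that reports the cases where `clamp` would panic
/// (`min > max`, or a NaN bound) as `None` instead.
fn checked_clamp(x: f64, min: f64, max: f64) -> Option<f64> {
    if min <= max {
        Some(x.clamp(min, max))
    } else {
        None
    }
}
```

So `clip_via_min_max(f64::NAN, 3.0, 6.0)` returns `3.0`, whereas `f64::NAN.clamp(3.0, 6.0)` stays NaN. With swapped bounds, the min/max version quietly returns `max` (as in the `a.clip(8., 1.)` doctest line), while std's `clamp` panics.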

Member

Performance is also an issue and the current implementation would probably need tweaking

@bluss
Member

bluss commented Nov 1, 2021

I haven't added my general perspective, I think.

We have avoided these methods in the past for two reasons:

  • Focus on general ndarray features, not specific ones. mapv covers all of these. But I agree that these are useful, if we have contributors that want to maintain them
  • Focus on performance. These methods are not efficient, because they create new arrays - allocation and copying - for all operations. But they are convenient, and a lot of times the user doesn't need to care. But it's something to be aware of and keep in mind. We need to explain this to users as well, and we try in various places in the documentation. We provide things like Zip that allows the user to modify and transform elements in-place.
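The allocation trade-off in the second point can be illustrated with plain slices. This is only a sketch: `exp_alloc` and `exp_in_place` are hypothetical names, and in ndarray the equivalent tools would be the mapv-style methods and Zip mentioned above.

```rust
/// Eager, allocating version: builds and returns a new buffer, like the PR's methods.
fn exp_alloc(data: &[f64]) -> Vec<f64> {
    data.iter().map(|x| x.exp()).collect()
}

/// In-place version: overwrites the existing buffer, so no allocation is needed.
fn exp_in_place(data: &mut [f64]) {
    for x in data.iter_mut() {
        *x = x.exp();
    }
}
```

Both compute the same values; the first is more convenient to chain, the second avoids the extra allocation and copy.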

fn powi(i32)

/// Float power of each element.
fn powf(A)
Member

Food for thought: This method is "powf(array, scalar)". We need a separate name for powf(array, array). What should the name of that method be? I don't mean that we should have that in this PR, but sometime down the line.

@multimeric

These methods are not efficient, because they create new arrays - allocation and copying - for all operations. But they are convenient, and a lot of times the user doesn't need to care.

Could we not solve this by having in-place versions of all these methods? It would also force users to decide which one they need in each case instead of always picking the simpler code. Should be fairly simple to add by updating the macros.

@bluss
Member

bluss commented Nov 2, 2021

In-place versions are just a band-aid, not a solution. An actual solution does most of the needed operations in one pass, not one by one (how would you handle chained operations?). I'd avoid in-place methods for now. 🙂

@multimeric

Is this to do with loading the elements into CPU registers or something? Because otherwise I can't see why processing the whole array in one pass and then again in a second pass is any worse than applying the same transformations to each element sequentially. I would have thought that avoiding a new array allocation would dwarf this optimisation.

In any case if that would be a significant optimisation could we do some kind of chain pattern like:

arr.chain().powi(2).exp().eval()
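One way such a chain could be sketched is shown below. Everything here is hypothetical: `Lazy`, its constructor, and the boxed-closure representation are assumptions for illustration (boxed closures add dynamic-dispatch overhead that a real design would likely avoid), and the example works on a slice rather than an ndarray array.

```rust
/// Hypothetical lazy chain: records operations and applies them all in `eval`.
struct Lazy<'a> {
    data: &'a [f64],
    ops: Vec<Box<dyn Fn(f64) -> f64>>,
}

impl<'a> Lazy<'a> {
    fn new(data: &'a [f64]) -> Self {
        Lazy { data, ops: Vec::new() }
    }

    /// Records an integer power; nothing is computed yet.
    fn powi(mut self, n: i32) -> Self {
        self.ops.push(Box::new(move |x| x.powi(n)));
        self
    }

    /// Records an exponential; nothing is computed yet.
    fn exp(mut self) -> Self {
        self.ops.push(Box::new(f64::exp));
        self
    }

    /// Applies every recorded operation to each element in a single pass,
    /// allocating only the final result.
    fn eval(self) -> Vec<f64> {
        self.data
            .iter()
            .map(|&x| self.ops.iter().fold(x, |acc, op| op(acc)))
            .collect()
    }
}
```

Under these assumptions, `Lazy::new(&data).powi(2).exp().eval()` reads much like the chain proposed above and touches each element once.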

@adamreichold
Collaborator

adamreichold commented Nov 2, 2021

Is this to do with loading the elements into CPU registers or something?

With contemporary systems, moving the data from memory into caches and registers often dominates the runtime of numerical algorithms depending on how many arithmetic operations they perform relative to each memory access. Fusing operations from multiple passes into a single pass increases this ratio and thereby the likelihood of efficient CPU utilisation.
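A small sketch of the difference, using plain slices and hypothetical function names. The two-pass version reads and writes the whole buffer twice and needs an intermediate allocation; the fused version performs both arithmetic operations for each element while it is already loaded.

```rust
/// Two passes: the intermediate `squared` buffer is written out and read back.
fn two_pass(data: &[f64]) -> Vec<f64> {
    let squared: Vec<f64> = data.iter().map(|x| x.powi(2)).collect();
    squared.iter().map(|x| x.exp()).collect()
}

/// One fused pass: each element is loaded once and both operations are applied to it.
fn fused(data: &[f64]) -> Vec<f64> {
    data.iter().map(|x| x.powi(2).exp()).collect()
}
```

Both functions return identical results; for arrays much larger than the cache, the fused form does the same arithmetic with roughly half the memory traffic.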

@bluss
Member

bluss commented Nov 3, 2021

@multimeric Good that we got to explain processor caches. This will help a lot when working with Rust.

Your point about eval is close to what I'm thinking of too. Not in particular that we do that here, but that we lay the groundwork for a "language", where you can call .exp() etc on arrays.

Then a shim/layer can be inserted that has the same or sameish interface but lazy evaluation, like in your sketch! Both the in-place instead of allocating and batching operations instead of one per loop ideas are important here. There's some prior art around this. I don't know eigen (C++) very well but I think they do something like this. And the very old Rust experiment algebloat also tried this.

@bluss
Member

bluss commented Nov 3, 2021

It would be exciting if

  1. We traitified ndarray - not so many specific "Array", "ArrayView" in the signatures; instead just an impl NdArray<Elem=f64> or something would be enough.
  2. A lazy expression could implement the same trait, i.e. you could pass array.lazy().exp() in place of an array.

I don't know what kinds of techniques can be used in Rust to make this possible. We also don't want to make the type system situation too hard to understand, unfortunately.

@multimeric

Okay I'll move the discussion of this lazy eval into a new issue for now so we can get this PR merged.

@bluss
Member

bluss commented Nov 8, 2021

I wanted to say, if you can, please use rebase and not merges when updating the PR branch, we don't want merges crossing. There are of course exceptions to every rule, but in general this is what we want to do. I've force-pushed the branch with a rebase, since it was easy to do. (This is possible when maintainers are allowed to edit.) Feel free to squash together commits to clean up history if you'd like.

@bluss
Member

bluss commented Nov 12, 2021

In the spirit of issue #415 we might prefer to expose these methods with a bound like A: 'static + Float. 'static is for the moment the key to ad-hoc type specialization (this is how we do it for matmul), i.e. for the cases where we need to explicitly special-case f32, f64, etc.
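The reason `'static` matters is that `std::any::TypeId` is only available for `'static` types, and comparing type IDs is one way to pick a specialized code path inside generic code. The sketch below shows the idea; `same_type` and `kernel_for` are names chosen for this example, and the string results stand in for real specialized kernels.

```rust
use std::any::TypeId;

/// Returns true when `A` and `B` are the same concrete type.
/// `TypeId::of` requires `'static`, hence the bound.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

/// Picks a code path per element type; the strings stand in for real kernels.
fn kernel_for<A: 'static>() -> &'static str {
    if same_type::<A, f32>() {
        "f32 fast path"
    } else if same_type::<A, f64>() {
        "f64 fast path"
    } else {
        "generic fallback"
    }
}
```

Because the type parameters are known at monomorphization time, the compiler can usually fold these comparisons away, so the dispatch tends to cost nothing at run time.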

@ethanhs
Contributor

ethanhs commented Nov 16, 2021

Just to add this also solves #1047 I think.

@bluss
Member

bluss commented Dec 7, 2021

fwiw, no rush, this PR is waiting on updates from the author

@killme2008

Any update on this PR?

@fgsch

fgsch commented Sep 21, 2022

This would be a very useful addition.

Is there any more work that needs to happen so this can be merged?

@danjenson

Looking forward to this...

@AnotherCoolDude

What's the status of this PR? Would be great to have all those methods implemented...

@nilgoyette
Collaborator

Same as last time: "this PR is waiting on updates from the author"

Since this MR has been discussed and reviewed, I wouldn't mind merging it if @KmolYuan updates this MR.

@KmolYuan
Contributor Author

Same as last time: "this PR is waiting on updates from the author"

Since this MR has been discussed and reviewed, I wouldn't mind merging it if @KmolYuan updates this MR.

Sure, if the API & docs are accepted, I agree to merge this PR.

@nilgoyette
Collaborator

@KmolYuan What about the clamp function? Shouldn't we use the one from std?

clamp has been stable since 1.50, and the minimum Rust version in this project is 1.51.

@KmolYuan
Contributor Author

@KmolYuan What about the clamp function? Shouldn't we use the one from std?

clamp has been stable since 1.50, and the minimum Rust version in this project is 1.51.

Thanks, I'll use num_traits::clamp for generic floating number types.

@nilgoyette
Collaborator

@adamreichold Any idea why the CI tests fail? You repaired them in the last merged MR, so I would have thought that the tests would still be ok. It's still using 1.51 and the errors are not related to this MR.

@adamreichold
Collaborator

@adamreichold Any idea why the CI tests fail? You repaired them in the last merged MR, so I would have thought that the tests would still be ok. It's still using 1.51 and the errors are not related to this MR.

These errors are due to new lints introduced into rustc itself since then. It does check pub use in addition to use now and checks against collision with possibly upcoming but still unstable names. I think we have to fix the former and suppress the latter for now, c.f. #1337

@nilgoyette
Collaborator

Thank you @KmolYuan, and sorry for the delay.

@nilgoyette nilgoyette merged commit fa57078 into rust-ndarray:master Dec 6, 2023
0 of 7 checks passed
@AnotherCoolDude

I am sorry to reopen this discussion. For some reason, my project doesn't pick up the merged changes. I would expect a version bump that tells cargo to update the package. I could probably refer to the commit specifically, but that doesn't feel right.
Is that a mistake on my side?

@adamreichold
Collaborator

I assume you are using a Git dependency against our default master branch here? If so, running cargo update should suffice to use the latest version. (Even Git dependencies are locked via Cargo.lock. There is no updated version available on crates.io yet.)

@AnotherCoolDude

I did { version = "0.15.6", features = ["serde"] }, which doesn't pick up the changes. When I change that to { git = "https://github.com/rust-ndarray/ndarray", features = ["serde"] } and hit cargo update, I still don't receive the latest changes, and it creates conflicts with ndarray_rand in addition to that (cargo.lock shows two entries for ndarray at that point, maybe that's the reason).

If there isn't an easy fix for that, I'll implement the changes myself. But that won't be as sophisticated, and I'd like to make use of the PR. But I also don't want to waste your time with my issues (which are not related to ndarray specifically).

@adamreichold
Collaborator

I think you want to use something like

[patch.crates-io]
ndarray = { git = "https://github.com/rust-ndarray/ndarray" }

to manage the indirect dependencies, c.f. https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html
