-
Notifications
You must be signed in to change notification settings - Fork 28
Weighted mean #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weighted mean #51
Conversation
I think we should mimic the numpy behavior here and not assume the weights are pre-normalized. I think we should divide by the weight sum inside the |
I'd say it makes sense for them to go there, given that it's where
It makes perfect sense.
I don't feel too strongly in either direction - if you feel like doing it, all the better 👍
I'd say that it's standard for integration tests to go outside the crate, yes. On the other side, it's sometimes nice to have tests next to the code they refer to - we haven't been very consistent across the crate. It would probably makes sense to migrate them to the Overall, this looks good to me, thanks for working on it! I suggested one area of improvement around testing that would increase the robustness of our current check - let me know what you think about it @nilgoyette. |
I am not sure - I am concerned about introducing rounding errors doing a normalization step that might or might not be necessary. |
I see your point and I tend to agree. However, my concern is that the
Thoughts? |
I think that makes perfect sense 👍 |
src/summary_statistics/mod.rs
Outdated
|
||
/// Returns the [`arithmetic weighted mean`] x̅ along `axis`. Assumes that the weights are | ||
/// already normalized. | ||
/// Like `weighted_mean`, but assumes that the `weights` are normalized. In that case, this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just describe what a weighted sum is and then point to the equivalence with weighted_mean if the weights are normalised.
@@ -28,13 +28,17 @@ where | |||
where | |||
A: Clone + FromPrimitive + Add<Output = A> + Div<Output = A> + Zero; | |||
|
|||
/// Returns the [`arithmetic weighted mean`] x̅ of all elements in the array. Assumes that the | |||
/// weights are already normalized. | |||
/// Returns the [`arithmetic weighted mean`] x̅ of all elements in the array. Use `weighted_sum` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the weights sum to 0
? I assume the function panics:
- we should add it to the docs;
- we should add a test for it.
Same story for the _axis
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I forgot about this. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, it panics on integer and it returns a NaN on floating point, which is quite normal. Do we want to panic on floating point numbers or we let the users learn the hard facts of IEEE754?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would let them know the hard facts, unfortunately 😂
We could use something along the same lines of what we say for mean_axis
in ndarray
:
Panics if axis is out of bounds or if the length of the axis is zero and division by zero panics for type A.
Which makes me think that we need to add a test and a note about zero-length axis on both weighted_mean_axis
and weighted_sum_axis
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand this part. How can "the length of the axis" be zero? Are we talking about a Array0
(single number)? Surely the len of an Array0
is 1 and not 0? Or it's considered a data point and it has no length?
Also, reading ndarray
's version, it seems like it's saying "Panics if x OR (y AND z)" Should it be all "or"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can have an Array2
with shape [0, n]
, where n
is strictly greater than 0.
"Panics if x OR (y AND z)"
That is indeed correct - if the axis lenght is 0
, but division by zero doesn't panic for A
(e.g. floats), then the method does not panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to disturb you again, but I'm lost here.
- Those zero-length dimensions don't make sense to me. It looks like they are uselessly complex empty arrays. Do you have more information on them? Because I can't code something for a "feature" that I don't understand.
- I removed the
* MultiInputError::EmptyInput if self is empty
error onweighted_mean_axis
andweighted_sum_axis
because I don't understand why they should return an error when the array is empty. It does seem to make sense forweighted_mean_axis
because we divide all lanes by weights_sum, but there's no lane to divide anyway. - I can't respect
Panics if axis is out of bounds or if the length of the axis is zero and division by zero panics for type A.
because my functions usemap_axis
, which panics with an error not defined inmap_axis
documentation.
let a = Array2::<f32>::zeros((0, 20));
a.mean_axis(Axis(0)); // pass
a.weighted_mean_axis(Axis(0), &Array1::zeros(0)).unwrap(); // fail
panicked at 'collapse_axis: Index 0 must be less than axis length 0 for array with shape [0, 20]
Of course I could try to use something else (there are other ways to code weighted_sum_axis
), but I don't see why I should, mostly because of 1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issue at all, happy to reach consensus on these changes 😄
- Those zero-length dimensions don't make sense to me. It looks like they are uselessly complex empty arrays. Do you have more information on them? Because I can't code something for a "feature" that I don't understand.
I wouldn't indeed consider them a feature - they are edge cases (empty arrays with associated non-trivial dimension-related information), but given that we are writing a library we need to take care of them (or at least document what happens if you pass them in).
Would it possible to forbid them? It's a possibility, but as long as ndarray
's codebase allows them, we need to deal with them.
- I removed the
* MultiInputError::EmptyInput if self is empty
error onweighted_mean_axis
andweighted_sum_axis
because I don't understand why they should return an error when the array is empty. It does seem to make sense forweighted_mean_axis
because we divide all lanes by weights_sum, but there's no lane to divide anyway.
I agree on weighted_sum_axis
/ weighted_sum
- there is a very good default value to return (A::zero()
) and the behaviour is consistent with sum
in ndarray
.
For the mean though, it gets trickier. What is the mean of an empty array of integers? 0
? If would argue that there is no sound answer to that question, hence it makes sense to return None
. Given that those functions can fail in multiple ways (mismatched shapes), it's more ergonomic to return it as an error variant than a Result<Option<A>, ErrType>
(see mean
in ndarray
).
- I can't respect
Panics if axis is out of bounds or if the length of the axis is zero and division by zero panics for type A.
because my functions usemap_axis
, which panics with an error not defined inmap_axis
documentation.let a = Array2::<f32>::zeros((0, 20)); a.mean_axis(Axis(0)); // pass a.weighted_mean_axis(Axis(0), &Array1::zeros(0)).unwrap(); // fail panicked at 'collapse_axis: Index 0 must be less than axis length 0 for array with shape [0, 20]
Of course I could try to use something else (there are other ways to code
weighted_sum_axis
), but I don't see why I should, mostly because of 1).
This was a bug in ndarray:0.12.1
(indeed when dealing with 0
-length axes 😛). It has been fixed in 0.13
- as soon as the CI is finished in #52 I will merge it into master
, so that you can get the fix in this branch as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#52 has been merged - I'd say that we could this PR merged and then cut a release 👍
I think it's better now. I had to remove Can you please check the two first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good 👍
Thanks for working on this @nilgoyette 🙏
Here's a first version of
weighted_mean
andweighted_mean_axis
.Disclaimers:
weighted_mean
and friends should go. Issummary_statistics
ok?return_err_if_empty
andreturn_err_unless_same_shape
because ther are useful elsewhere.weighted_mean_axis
, to avoid(2 conditions + 1 unwrap) x nb_lanes
. Maybe I could create an inner function calledinner_weighted_mean
or something, then call it in both functions?Questions:
summary_statistics
tests (and others) not in/tests/*
? I thought that the public API was supposed to be tested outside the crate. Is this not a "standard"?