-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: stats.resampling: automatically detect whether statistic is vectorized #16651
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matt, I am +1 here. Seems sensible as it's standard to use axis
for that and I am not aware of other use/conventions.
scipy/stats/tests/test_resampling.py
Outdated
@@ -638,6 +665,35 @@ def statistic(x, axis): | |||
assert_allclose(res.statistic, expected.statistic) | |||
assert_allclose(res.pvalue, expected.pvalue, atol=self.atol) | |||
|
|||
def test_vectorized(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tests are very similar. Did you try to parametrize or even have a if
for options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to mention it - yes, I did try to write one test for all three functions (you can see how I used fun=...
), but it got unwieldy. It would have been fewer lines, but more complicated than I thought it was worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok because I would have thought that just having fun, options, rvs
as param should be enough.
If you still prefer to keep 3 tests, fine. Just maybe only define once the 2 functions statistic
. Also options
does not need to be a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would not, if you'd permit me to use the global random state or a simple seed. In fact, if I could do that, I'd be happy to try to combine all the tests again! iIRC the main difficulty in combining them was because I have to worry about passing identical Generators to multiple calls of the function - but the functions have different needs.
But fat chance of that! : P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I can combine the tests if we don't check that the results match - only that the assertions don't fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the generator part. But fine with me not to check the results as we have other tests for all that.
I would suggest we don't use vectorization as a feature name in the future because it has a different meaning. This also happened in the recent QR related pull request and batch processing is not vectorization per se. I am not opposed to it but just saying. |
Do you mean that you would prefer to make a distinction between how the processor performs the computation (e.g. SIMD is "vectorized") and how the user writes the code (operating on a whole array at once is not "vectorized")? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me @mdhaber!
@@ -265,10 +269,14 @@ def bootstrap(data, statistic, *, n_resamples=9999, batch=None, | |||
`statistic`. Memory usage is O(`batch`*``n``), where ``n`` is the | |||
sample size. Default is ``None``, in which case ``batch = n_resamples`` | |||
(or ``batch = max(n_resamples, n)`` for ``method='BCa'``). | |||
vectorized : bool, default: ``True`` | |||
vectorized : bool, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I didn't actually change any defaults to None
. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure about this. But it makes sense and should be more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can manage this without a deprecation warning:
- If the user was passing either
vectorized=True
orvectorized=False
explicitly, nothing changes. - If the user was relying on the
bootstrap
defaultvectorized=True
, then (if their code is working) theirstatistic
must already haveaxis
and be vectorized. - If the user was relying on the
monte_carlo_test
orpermutation_test
defaultvectorized=False
, then nothing changes unless theirstatistic
1) has a keyword argument calledaxis
and 2) is not actually vectorized. I wouldn't expect many cases of this, and rather than silently producing the wrong result, they'll get an error because the results of the statistic will not be the right shape (unless they've done something really wonky that I'm having trouble predicting). We'll also put this in the release notes so that users can watch out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agreed it should be transparent for users because we expected axis
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright LGTM, let's move forward. Thanks Matt. CI failure is unrelated.
Reference issue
raphaelvallat/pingouin#189
What does this implement/fix?
@raphaelvallat commented:
The answer is yes, and this PR implements it.
Additional information
Please review when you have a moment, @raphaelallat!