ENH: stats.resampling: automatically detect whether statistic is vectorized #16651

mdhaber · 2022-07-20T02:40:02Z

Reference issue

What does this implement/fix?

I do love the vectorized argument in scipy.stats.bootstrap, but I think for the (many) Python users that are not familiar with vectorization, this can be a little difficult to understand and use (it took me a few trials to understand how to configure the axes, etc.). Do you think that there might be a way to automatically detect if a function is vectorizable? Like if there is an axis keyword argument, like in most numpy/scipy functions?

The answer is yes, and this PR implements it.
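
For readers skimming the thread, the detection idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the helper names `has_axis_parameter` and `resolve_vectorized` and the two stub statistics are hypothetical, and it assumes "vectorized" is inferred purely from the presence of a parameter named `axis`.

```python
import inspect


def has_axis_parameter(statistic):
    """Return True if `statistic` declares a parameter named `axis`."""
    try:
        params = inspect.signature(statistic).parameters
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature;
        # in this sketch they are treated as not vectorized.
        return False
    return "axis" in params


def resolve_vectorized(statistic, vectorized=None):
    """Hypothetical helper: an explicit True/False wins, None means auto-detect."""
    if vectorized is None:
        return has_axis_parameter(statistic)
    return bool(vectorized)


# Stub statistics for illustration only; the bodies are irrelevant here.
def stat_with_axis(x, axis=-1):
    return x


def stat_plain(x):
    return x


print(resolve_vectorized(stat_with_axis))  # auto-detected from the signature
print(resolve_vectorized(stat_plain))      # no `axis` parameter found
```

Note that a statistic accepting only `**kwargs` has no parameter literally named `axis`, so this sketch would treat it as not vectorized unless the caller passes `vectorized=True` explicitly.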

Additional information

Please review when you have a moment, @raphaelvallat!

scipy/stats/tests/test_resampling.py

tupui

Thanks Matt, I am +1 here. Seems sensible, as it's standard to use axis for that and I am not aware of other uses/conventions.

tupui · 2022-07-20T07:10:10Z

scipy/stats/tests/test_resampling.py

@@ -638,6 +665,35 @@ def statistic(x, axis):
        assert_allclose(res.statistic, expected.statistic)
        assert_allclose(res.pvalue, expected.pvalue, atol=self.atol)

+    def test_vectorized(self):


All tests are very similar. Did you try to parametrize, or even use an if for options?

I was going to mention it - yes, I did try to write one test for all three functions (you can see how I used fun=...), but it got unwieldy. It would have been fewer lines, but more complicated than I thought it was worth.

ok because I would have thought that just having fun, options, rvs as param should be enough.

If you still prefer to keep 3 tests, fine. Just maybe define the two statistic functions only once. Also, options does not need to be a function.

It would not, if you'd permit me to use the global random state or a simple seed. In fact, if I could do that, I'd be happy to try to combine all the tests again! IIRC the main difficulty in combining them was that I have to worry about passing identical Generators to multiple calls of the function, but the functions have different needs.

But fat chance of that! :P

Alternatively, I can combine the tests if we don't check that the results match - only that the assertions don't fail.

Not sure about the generator part. But fine with me not to check the results as we have other tests for all that.

ilayn · 2022-07-20T13:20:23Z

I would suggest we don't use vectorization as a feature name in the future because it has a different meaning. This also happened in the recent QR related pull request and batch processing is not vectorization per se. I am not opposed to it but just saying.

mdhaber · 2022-07-20T14:20:01Z

batch processing is not vectorization per se

Do you mean that you would prefer to make a distinction between how the processor performs the computation (e.g. SIMD is "vectorized") and how the user writes the code (operating on a whole array at once is not "vectorized")?

raphaelvallat

Looks good to me @mdhaber!

mdhaber · 2022-07-20T15:31:53Z

scipy/stats/_resampling.py

@@ -265,10 +269,14 @@ def bootstrap(data, statistic, *, n_resamples=9999, batch=None,
        `statistic`. Memory usage is O(`batch`*``n``), where ``n`` is the
        sample size. Default is ``None``, in which case ``batch = n_resamples``
        (or ``batch = max(n_resamples, n)`` for ``method='BCa'``).
-    vectorized : bool, default: ``True``
+    vectorized : bool, optional


Oops, I didn't actually change any defaults to None. Will do.

I was not sure about this. But it makes sense and should be more robust.

I think we can manage this without a deprecation warning:

If the user was passing either vectorized=True or vectorized=False explicitly, nothing changes.

If the user was relying on the bootstrap default vectorized=True, then (if their code is working) their statistic must already have axis and be vectorized.

If the user was relying on the monte_carlo_test or permutation_test default vectorized=False, then nothing changes unless their statistic 1) has a keyword argument called axis and 2) is not actually vectorized. I wouldn't expect many cases of this, and rather than silently producing the wrong result, they'll get an error because the results of the statistic will not be the right shape (unless they've done something really wonky that I'm having trouble predicting). We'll also put this in the release notes so that users can watch out.
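
The three cases above can be checked mechanically with a small sketch. This is only an illustration of the argument, not SciPy code: `OLD_DEFAULTS` records the previous defaults as stated in this thread, while `effective_vectorized`, `behavior_changes`, and the stub statistics are hypothetical names, and detection is assumed to look only for a parameter named `axis`.

```python
import inspect

# Previous defaults for `vectorized`, as described in this thread.
OLD_DEFAULTS = {
    "bootstrap": True,
    "permutation_test": False,
    "monte_carlo_test": False,
}


def effective_vectorized(statistic, vectorized=None):
    """New behavior (assumed): None means 'detect from the signature'."""
    if vectorized is None:
        return "axis" in inspect.signature(statistic).parameters
    return vectorized


def behavior_changes(func_name, statistic, user_value=None):
    """Return True if the resolved `vectorized` value differs from before.

    `user_value=None` means the user relied on the default.
    """
    old = OLD_DEFAULTS[func_name] if user_value is None else user_value
    new = effective_vectorized(statistic, user_value)
    return old != new


# Stub statistics for illustration only.
def stat_with_axis(x, axis=-1):
    return x


def stat_plain(x):
    return x


# Explicit True/False: nothing changes.
print(behavior_changes("monte_carlo_test", stat_with_axis, user_value=False))
# bootstrap default with a statistic that accepts `axis`: nothing changes.
print(behavior_changes("bootstrap", stat_with_axis))
# permutation_test default with a statistic that has `axis`: now treated as vectorized.
print(behavior_changes("permutation_test", stat_with_axis))
```

Only the last call reports a change, which is the case flagged for the release notes. A bootstrap statistic without `axis` would also resolve differently, but per the reasoning above that code could not have been working under the old default.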

Yes, agreed. It should be transparent for users because we expected axis.

tupui

Alright LGTM, let's move forward. Thanks Matt. CI failure is unrelated.

ENH: stats.resampling: automatically detect whether statistic is vectorized

ed1a194

mdhaber added the scipy.stats and enhancement labels Jul 20, 2022

mdhaber mentioned this pull request Jul 20, 2022

Use scipy.stats.bootstrap in pingouin.compute_bootci? raphaelvallat/pingouin#189

Open

tupui reviewed Jul 20, 2022

tupui added this to the 1.10.0 milestone Jul 20, 2022

raphaelvallat approved these changes Jul 20, 2022

mdhaber added 2 commits July 20, 2022 15:12

TST: stats.resampling: combine tests

e2667a1

ENH: stats.resampling: change default vectorized=None

10f74f7

tupui approved these changes Jul 20, 2022

tupui merged commit 7ab63a5 into scipy:main Jul 20, 2022
