feature request: make scipy.stats.pearsonr accept 2-D arrays #9307
Comments
I know this is probably no longer relevant for you, but maybe it could help someone else.
I still do such tests. Thank you!
At some point @swallan or @DominicChm might be vectorizing …
@Phillip-M-Feldman would gh-13312 close this issue, from your perspective? It allows pearsonr to process nd-arrays, but it is just for convenience, not speed.
This sounds fine. (In reply to Matt Haberland, Dec 30, 2020.)
I don't think gh-13312 would address this as requested, actually. I think the request is for it to work more like …
On the other hand, if the decorator in gh-13312 were applied to this function, I think the user could achieve the desired behavior by passing in arrays of shapes … I won't add …
@tirthasheshpatel @tupui @Kai-Striega thought you might be interested in this one as we try to make the …

```python
import numpy as np
from scipy.stats import mannwhitneyu, spearmanr

x = np.random.rand(3, 10); y = np.random.rand(3, 10)
mannwhitneyu(x, y, axis=1).statistic.shape  # (3,)
spearmanr(x, y, axis=1)[0].shape  # (6, 6)
```

It does pairwise comparisons between every row of …

If we hope to make the …

```python
from scipy.stats import pearsonr

pearsonr(x, y, axis=1)[0].shape  # we would want (3,), not (6, 6)
```

If the user wants to do pairwise comparisons, then they can do:

```python
z = np.concatenate([x, y], axis=0)
pearsonr(z[:, np.newaxis, :], z[np.newaxis, :, :], axis=2)[0].shape
```

I'd suggest that we deprecate the 2-D behavior and then, after the deprecation cycle, replace it.
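As a rough illustration of the proposed semantics, the sketch below assumes `pearsonr(x, y, axis=1)` should pair row `x[i]` with row `y[i]` and return one coefficient per row. `rowwise_pearson` is a hypothetical helper written in plain NumPy, not a SciPy function, and it computes only the coefficient, not the p-value.

```python
import numpy as np
from scipy.stats import pearsonr


def rowwise_pearson(x, y, axis=-1):
    """Hypothetical helper: Pearson r between x and y along `axis`.

    x and y must have the same shape; the result drops `axis`.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each slice along the reduction axis.
    xm = x - x.mean(axis=axis, keepdims=True)
    ym = y - y.mean(axis=axis, keepdims=True)
    # r = sum(xm * ym) / sqrt(sum(xm**2) * sum(ym**2)), one value per slice.
    num = (xm * ym).sum(axis=axis)
    den = np.sqrt((xm ** 2).sum(axis=axis) * (ym ** 2).sum(axis=axis))
    return num / den


rng = np.random.default_rng(0)
x = rng.random((3, 10))
y = rng.random((3, 10))
r = rowwise_pearson(x, y, axis=1)
print(r.shape)  # (3,)
# Same values as calling pearsonr on each pair of rows separately.
print(np.allclose(r, [pearsonr(x[i], y[i])[0] for i in range(3)]))  # True
```

Reducing along `axis` this way follows the same convention as `mannwhitneyu(x, y, axis=1)`, while the full pairwise matrix remains reachable through the broadcasting trick with `z`.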
@tirthasheshpatel @tupui @Kai-Striega what do you think? Shall we get the deprecation in before 1.9.0 branches so we can vectorize these consistently?
I'm +1 on this, but I won't have the bandwidth to implement it.
No problem. I'll do it!
Which other reduction functions in stats do you think should behave that way? Would …
kendalltau would be the third correlation measure (pandas treats those three the same, AFAIR). However, kendalltau is not "naturally" vectorized. For statsmodels, I have an old PR with a few more outlier-robust cov/corr functions. It also depends a bit on the common use case: cov, corr and their variants are very often (or mostly) multivariate measures that return the corresponding matrix. Hypothesis-test functions are mainly for one-, two- or k-sample cases, e.g. a possibly vectorized (stratified) two-sample case.
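A small sketch of that contrast, using simulated data as an assumption: `spearmanr` already treats a 2-D input as multivariate (one column per variable) and returns a correlation matrix, while `kendalltau`, according to its documentation, flattens multi-dimensional input to 1-D, so a Kendall matrix needs an explicit loop over column pairs. (pandas exposes all three through `DataFrame.corr(method=...)`, each returning a matrix.)

```python
import itertools

import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(1)
data = rng.standard_normal((30, 3))  # 30 observations of 3 variables (columns)

# spearmanr treats a 2-D input as multivariate and returns a matrix.
rho = spearmanr(data)[0]
print(rho.shape)  # (3, 3)

# kendalltau works on 1-D inputs, so build the matrix pair by pair.
k = data.shape[1]
tau = np.eye(k)
for i, j in itertools.combinations(range(k), 2):
    tau[i, j] = tau[j, i] = kendalltau(data[:, i], data[:, j])[0]
print(tau.shape)  # (3, 3)
```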
Another distinction: most of the two-sample tests assume independent samples, so there is no underlying multivariate structure; they just compare different independent samples. (The correlation in tukeyhsd comes from the contrasts of the comparisons, not from correlation in the underlying data.) In multivariate cases like MANOVA, repeated-measures ANOVA, or general correlated samples, we are back to having to handle the correlation across samples jointly. Example: …
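To illustrate the distinction with simulated data (the four strata and the sample sizes below are assumptions, not from the thread): a vectorized independent two-sample test runs one test per stratum, whereas a paired test such as `ttest_rel` works on the within-pair differences, so the correlation between the two samples is handled jointly.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel

rng = np.random.default_rng(2)
# 4 strata (rows), each with 20 observations per sample (columns).
a = rng.normal(size=(4, 20))
# b is built from a, so the two samples are correlated pair by pair.
b = a + rng.normal(loc=0.5, scale=0.5, size=(4, 20))

# Independent two-sample case, vectorized: one test per stratum.
res_ind = ttest_ind(a, b, axis=1)
print(res_ind.statistic.shape)  # (4,)

# Paired case: also one test per stratum, but based on a - b.
res_rel = ttest_rel(a, b, axis=1)
print(res_rel.statistic.shape)  # (4,)

# The paired test is the one-sample t-test on the differences.
diff_test = ttest_1samp(a - b, 0, axis=1)
print(np.allclose(res_rel.statistic, diff_test.statistic))  # True
```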
After another coffee break: I guess what I'm saying boils down to whether a function is mainly for multivariate analysis or for the analysis of k (= 1, 2, or more) independent samples. IMO, corr, cov, pearsonr, kendalltau and spearmanr are in the multivariate camp; the bivariate off-diagonal element is just a special case.
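Seen that way, the bivariate result is one entry of a correlation matrix. A quick check with simulated data (chosen only for illustration): the off-diagonal element of `np.corrcoef` matches the coefficient returned by `pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
x = rng.standard_normal(25)
y = 0.6 * x + rng.standard_normal(25)

# Multivariate view: the full 2 x 2 correlation matrix of (x, y).
R = np.corrcoef(x, y)
# Bivariate view: pearsonr returns just the off-diagonal element.
r = pearsonr(x, y)[0]
print(np.isclose(R[0, 1], r))  # True
```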
I used … pointbiserialr is redundant and exists just for name recognition. I never looked at weightedtau; vectorizing it to a correlation matrix might not gain anything computationally compared to a loop, and I have no idea about use cases. linregress is just used as a single linear regression; there is nothing multivariate about it. If vectorization is available, then it can run in parallel over separate cases. I don't know how 2-D somersd works; my scipy version is too old.
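The redundancy can be checked numerically. In this sketch (simulated data, chosen only for illustration) `pointbiserialr` gives the same coefficient and p-value as `pearsonr` on the same binary/continuous pair, and the `rvalue` of a single `linregress` fit is that same coefficient.

```python
import numpy as np
from scipy.stats import linregress, pearsonr, pointbiserialr

rng = np.random.default_rng(4)
group = rng.integers(0, 2, size=40)      # binary variable (0/1)
value = group + rng.standard_normal(40)  # continuous variable

# Point-biserial correlation is Pearson's r with one binary variable.
pb = pointbiserialr(group, value)
pr = pearsonr(group, value)
print(np.isclose(pb[0], pr[0]), np.isclose(pb[1], pr[1]))  # True True

# linregress is a single simple regression; its rvalue is the same r.
fit = linregress(group, value)
print(np.isclose(fit.rvalue, pr[0]))  # True
```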
(After another break) I would put somersd in a similar category as the correlation measures, as a measure of ordinal association. But that's purely theoretical; I've never seen a use case like that, and I don't know if there are interesting follow-up analyses that would use it.
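On how 2-D `somersd` works: assuming a SciPy version recent enough to include both `somersd` and `scipy.stats.contingency.crosstab`, passing two 1-D arrays of rankings should give the same result as passing their contingency table, because a 2-D input is interpreted as that table. A minimal check with simulated ordinal data (an assumption for illustration):

```python
import numpy as np
from scipy.stats import somersd
from scipy.stats.contingency import crosstab

rng = np.random.default_rng(5)
x = rng.integers(1, 4, size=50)                      # ordinal levels 1..3
y = np.clip(x + rng.integers(-1, 2, size=50), 1, 3)  # related ordinal levels

# 1-D form: two arrays of rankings (x is the row/independent variable).
res_1d = somersd(x, y)
# 2-D form: the same data summarized as a contingency table.
_, table = crosstab(x, y)
res_2d = somersd(table)
print(res_1d.statistic, res_2d.statistic)
```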
I'd love to see scipy.stats.pearsonr modified to allow 2-D arrays to be passed, with one column per variable and one row per observation. The output would be a pair of 2-D arrays, one for the Pearson correlation coefficient and one for the p-value. Currently, obtaining something like this requires a pair of nested loops, with scipy.stats.pearsonr being called once for each pair of variables.
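A minimal sketch of what is being asked for, under stated assumptions: the data has one row per observation and one column per variable, with no missing values and no constant columns. `pearsonr_loop` mirrors the nested-loop workaround described above, and `pearsonr_matrix` is a hypothetical vectorized helper (not part of SciPy) that takes the coefficients from `np.corrcoef` and the two-sided p-values from the exact null distribution SciPy documents for `pearsonr`: a beta distribution on [-1, 1] with both shape parameters equal to n/2 - 1.

```python
import numpy as np
from scipy import stats


def pearsonr_loop(data):
    """Nested-loop workaround: call stats.pearsonr once per pair of columns."""
    n_vars = data.shape[1]
    r = np.ones((n_vars, n_vars))
    p = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r_ij, p_ij = stats.pearsonr(data[:, i], data[:, j])
            r[i, j] = r[j, i] = r_ij
            p[i, j] = p[j, i] = p_ij
    return r, p


def pearsonr_matrix(data):
    """Hypothetical helper: all pairwise r and two-sided p-values at once."""
    n_obs = data.shape[0]
    # Correlation matrix of the columns (rowvar=False: columns are variables).
    r = np.corrcoef(data, rowvar=False)
    # Under H0, r ~ Beta(n/2 - 1, n/2 - 1) rescaled to [-1, 1];
    # the distribution is symmetric, so the two-sided p is 2 * cdf(-|r|).
    dist = stats.beta(n_obs / 2 - 1, n_obs / 2 - 1, loc=-1, scale=2)
    p = 2 * dist.cdf(-np.abs(r))
    return r, p


rng = np.random.default_rng(0)
data = rng.standard_normal((50, 4))
data[:, 1] += data[:, 0]  # make one pair of variables clearly correlated
r_loop, p_loop = pearsonr_loop(data)
r_vec, p_vec = pearsonr_matrix(data)
print(r_vec.shape)  # (4, 4)
print(np.allclose(r_loop, r_vec), np.allclose(p_loop, p_vec))  # True True
```

The vectorized version avoids one `pearsonr` call per pair; the trade-off is that it only reproduces the default two-sided p-value.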