Skipped correlation gives different results than the Pernet implementation in Matlab #164
Hi @adamnarai, I coded this function a long time ago, but from what I remember it is essentially an (optimized) translation of the Matlab code from the robust correlation toolbox by Cyril Pernet and Guillaume Rousselet. Furthermore, the output of the function is tested against the original toolbox (see pingouin/pingouin/tests/test_correlation.py, lines 22 to 25, at commit 91b1847).
That said, if you have any idea for a better implementation, and/or if you find inconsistent results with the robust corr toolbox, then I'm all in for reworking this function. Thanks!
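For readers who want to reproduce the comparison, here is a minimal sketch of how the skipped correlation under discussion is typically called in Pingouin; the data below are illustrative toy values, not the toolbox test case:

```python
import numpy as np
import pingouin as pg

# Toy bivariate data with one injected outlier (illustrative values only)
rng = np.random.RandomState(42)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(scale=0.5, size=30)
x[3] = 7  # artificial outlier

# Skipped correlation: bivariate outliers are detected (via the MCD estimator)
# and removed before the correlation is computed on the remaining points
stats = pg.corr(x, y, method='skipped')
print(stats)
```

The same data can then be fed to the Matlab robust correlation toolbox to compare the detected outliers and the resulting coefficient.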
Hi @raphaelvallat, This test yields the same result as the robust correlation toolbox; however, I can create different test data where that is not the case. For example, the data generated by replacing line 17 with `x[3], y[5] = 7, 2.6` yields different results in the two implementations. I would suggest replacing this line (pingouin/pingouin/correlation.py, line 82, at commit 91b1847)
with `dis[i, :] = np.linalg.norm(B.dot(B[i, :, None]) * B[i, :] / bot[i], axis=1)`. By applying this change, I found the results consistent with the robust correlation toolbox. Best,
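For context, here is a hedged, self-contained sketch of the kind of projection-distance computation this line appears to implement; the loop structure, helper name, and variable shapes are assumptions based on the robust correlation toolbox, not a copy of Pingouin's source:

```python
import numpy as np

def projection_distances(X, center):
    # Sketch of the outlier-detection step of a skipped correlation:
    # for each observation i, project all points onto the direction
    # from the robust center to point i and record the distances.
    B = X - center                    # centered data, shape (n, 2)
    bot = np.sum(B ** 2, axis=1)      # squared length of each direction
    n = X.shape[0]
    dis = np.zeros((n, n))
    for i in range(n):
        if bot[i] != 0:
            # B.dot(B[i, :, None]) gives the dot product of every point with
            # direction i; multiplying by B[i, :] / bot[i] yields the
            # orthogonal projections onto that direction, whose norms are
            # the distances used for outlier detection.
            dis[i, :] = np.linalg.norm(
                B.dot(B[i, :, None]) * B[i, :] / bot[i], axis=1)
    return dis
```

These distances are then compared to a robust cutoff (e.g., a boxplot-type rule) to flag bivariate outliers before the correlation is computed on the remaining points.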
Hi @adamnarai, Thanks for looking into that! You were right; I have now updated the line and the unit tests in this commit: ce7545d. I will release a new version of Pingouin in the next couple of weeks.
Hi @raphaelvallat, I still found some differences in the number of outliers between the two implementations when applied to real data (even 5 vs. 1 outliers in one case). As an example, using the following values in the unit test (line 18), `x[3], y[5] = 20, 3.5`, I get inconsistent outlier counts between the two implementations. Best,
Hi @adamnarai, Re-opening this issue for visibility. You're right; I just checked the output of MCD with the example you provided, and Matlab indeed returns different values. I think we could implement the following fix. Let me know what you think!
I just pushed a commit with these changes on the develop branch, if you want to try it out on real-world data: f2deb57
Hi @raphaelvallat, I think the normalization only affects the covariance, not the location, and something else is causing this difference. I will try to look into it more if my time allows. By using the … Best,
@adamnarai Ah, you're absolutely right, my bad. I've just tried with the following (inspired from sklearn):

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import fast_mcd

X = np.column_stack((x, y))
nrows, ncols = X.shape

# Raw MCD location/covariance, using a plain (unscaled) np.cov covariance
center, cov, support, dist = fast_mcd(
    X, cov_computation_method=lambda x: np.cov(x, rowvar=False))

# Consistency correction: rescale the squared Mahalanobis distances so that
# their median matches the median of the chi-square distribution
correction = np.median(dist) / chi2(X.shape[1]).isf(0.5)
dist /= correction

# Keep observations below the 97.5% chi-square cutoff (i.e. non-outliers)
# and compute the reweighted location
mask = dist < chi2(2).isf(0.025)
print(X[mask].mean(0))
```

But I still get the same center values as sklearn.MinCovDet, meaning that the normalization indeed has no effect on the location. I'm reverting the commit now. I've tried playing around with different … Best,
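To check that directly, a small comparison against sklearn's MinCovDet can be run. This is a sketch under the assumption that X is the same two-column array as above; the toy data here are placeholders:

```python
import numpy as np
from sklearn.covariance import MinCovDet, fast_mcd

rng = np.random.RandomState(42)
X = rng.normal(size=(30, 2))  # placeholder; substitute the real data

# Low-level helper with an unscaled np.cov covariance
center, cov, support, dist = fast_mcd(
    X, cov_computation_method=lambda x: np.cov(x, rowvar=False))

# High-level estimator, which applies the consistency correction and reweighting
mcd = MinCovDet(random_state=42).fit(X)

print("fast_mcd center:     ", center)
print("MinCovDet raw center:", mcd.raw_location_)
print("MinCovDet location:  ", mcd.location_)
```

If the locations agree, the covariance scaling is indeed not what drives the difference in detected outliers.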
I agree, at this point that's the best we can realistically do. For future reference: … Best,
Warning added in e56df01. This will be included in the next stable release of Pingouin. Thanks again for looking into this,
The warning has been included in the new stable version of Pingouin (v0.4.0). Please make sure to upgrade with … Thanks,
I think this line in correlation.py is functionally different compared to the Matlab implementation and the equation in Wilcox, R. (2004).