Add corrections for multiple testing in correlation matrix #40

Open
Cryaaa opened this issue May 3, 2023 · 2 comments

Comments

@Cryaaa

Cryaaa commented May 3, 2023

Hey @haesleinhuepf,
I was just reading @marabuuu's awesome blog post about feature extraction and saw that you suggest using the correlation matrix implemented here. I was recently using correlation matrices in my own project and was reminded that quite a few false-positive correlation hits can show up just by chance, because we are performing so many statistical tests in such a matrix. One could fine-tune the correlation matrix by correcting for falsely significant correlations using statistical methods such as FDR correction using Bootstrapping, or other corrections implemented in statistical analysis libraries. Just thought I'd suggest it here so I don't forget!

@haesleinhuepf
Owner

Hey @Cryaaa,

thanks for the input!

> we are performing so many statistical tests in such a matrix

Technically, the Pearson correlation coefficient does not involve any statistical test. It's a method of descriptive statistics.
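To make that distinction concrete: the coefficient itself can be computed purely descriptively, and a statistical test only enters once a p-value is attached to it. Here is a minimal sketch, assuming NumPy and SciPy are available; the synthetic data and variable names are made up for illustration and are not taken from this repository.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: y depends weakly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Descriptive part: the Pearson coefficient alone, no test involved.
r_descriptive = np.corrcoef(x, y)[0, 1]

# Inferential part: pearsonr also reports a p-value for the null
# hypothesis that the true correlation is zero.
result = stats.pearsonr(x, y)
r_tested, p_value = result[0], result[1]

print(f"r = {r_descriptive:.3f}, p = {p_value:.3g}")
```

Both calls give the same coefficient; the multiple-testing concern only applies once the p-values are used to decide which correlations count as significant.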

> FDR correction using Bootstrapping

Can you provide a link to this method? Is there by chance a Python implementation?

Thanks again!

Best,
Robert

@Cryaaa
Author

Cryaaa commented May 4, 2023

@haesleinhuepf,
Ahhhhh, I just checked my code and I was using the Spearman rank correlation, which might be different. I guess I was trying to replicate what other libraries in R do, where false discovery rate (FDR) corrections are usually built into the functions (they are mostly designed for gene expression data, so maybe it's a bit more crucial there). Anyway: here is a function which has a few multiple-testing corrections (all you need are the p_values unraveled).
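As a rough illustration of what "p_values unraveled" could look like in practice, here is a sketch that takes the upper triangle of a Spearman p-value matrix as a flat vector and adjusts it. It assumes NumPy and SciPy; `benjamini_hochberg` is a hand-written helper for illustration (not the function linked above), and the random `features` table is a stand-in for real measurements.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values for a 1-D array."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


# Hypothetical feature table: 100 objects, 6 uncorrelated features.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 6))

# With more than two columns, spearmanr returns square matrices.
rho, p_matrix = stats.spearmanr(features)

# "Unravel" the p-values: each feature pair counted once, no diagonal.
upper = np.triu_indices_from(p_matrix, k=1)
p_flat = p_matrix[upper]
p_adjusted = benjamini_hochberg(p_flat)
```

Only the upper triangle is used because the matrix is symmetric; counting every pair twice, or including the diagonal, would distort the number of tests that the correction accounts for.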

I think it might also be alright to leave the function as is for a simple first test, but I could imagine that a correlation matrix with non-significant results filtered out could be another option if the feature matrix gets really big. In this case FDR correction could make sense. I could try to code a function and test it, since I have the code (almost) ready somewhere in a notebook!
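One possible shape for such a filtered matrix is sketched below; it is not the notebook code mentioned above. `filter_correlations` is a hypothetical helper name, the `alpha` default is an assumption, and the Benjamini-Hochberg step-up rule is used as one example of an FDR procedure. Correlations that do not pass are set to NaN so they can be left blank when plotting.

```python
import numpy as np
from scipy import stats


def filter_correlations(corr, p_matrix, alpha=0.05):
    """Set correlations that fail a Benjamini-Hochberg test to NaN."""
    corr = np.asarray(corr, dtype=float)
    p_matrix = np.asarray(p_matrix, dtype=float)
    # Each feature pair is tested once: upper triangle, no diagonal.
    rows, cols = np.triu_indices_from(corr, k=1)
    p = p_matrix[rows, cols]
    m = p.size
    order = np.argsort(p)
    # Step-up rule: find the largest rank k with p_(k) <= k / m * alpha
    # and keep every pair up to and including that rank.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        keep[order[: k + 1]] = True
    filtered = corr.copy()
    # Blank out rejected pairs in both triangles to stay symmetric.
    filtered[rows[~keep], cols[~keep]] = np.nan
    filtered[cols[~keep], rows[~keep]] = np.nan
    return filtered


# Hypothetical data: feature 1 is a noisy copy of feature 0.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 5))
features[:, 1] = features[:, 0] + 0.3 * rng.normal(size=200)

rho, p_matrix = stats.spearmanr(features)
filtered = filter_correlations(rho, p_matrix, alpha=0.05)
```

The diagonal is left untouched, and the planted pair (0, 1) should survive the filter, while pairs of independent features will mostly be blanked out.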
