ash shrinkage applied to correlation matrix #130
Comments
Hi @dariober ! This is more or less the idea implemented in Kushal Dey's CorShrink package, which is described here: https://www.biorxiv.org/content/10.1101/368316v2. Please have a look and let us know your thoughts!
can i ask why it is a "matrix" of correlation coefficients? It suggests some additional structure that maybe could
@willwerscheid, @stephens999 thanks a lot guys it looks like I was going to reinvent the wheel and I feel a bit overwhelmed by options now!
Here's more detail: For each of N=27 patients I have gene expression for 32 biomarkers and 47 genes. For each combination of biomarker and gene I calculate the correlation coefficient. So the matrix has dimensions 32 x 47 and each cell is a correlation. All of this is measured in tissue "BM". In addition, for 15 of these 27 patients I have the same biomarkers and genes measured in tissue "PBMC". So I have another 32 x 47 matrix of correlation coefficients. I was thinking of keeping the data for the two tissues separate, but I'm sure there are better ways of handling it. As a side note: it appears the CorShrink package has been removed from CRAN and archived.
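To make the setup above concrete, here is a minimal sketch in plain Python of how such a biomarker-by-gene correlation matrix could be built and flattened. It is not the code used in this thread: the helper names `pearson` and `cross_correlation_matrix` are hypothetical, and the data are random numbers with the same dimensions as described (27 patients, 32 biomarkers, 47 genes).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation_matrix(biomarkers, genes):
    """Rows are biomarkers, columns are genes; each cell is one correlation."""
    return [[pearson(b, g) for g in genes] for b in biomarkers]

# Toy data (assumption): one list of per-patient values for each biomarker and gene.
random.seed(0)
n_patients, n_biomarkers, n_genes = 27, 32, 47
biomarkers = [[random.gauss(0, 1) for _ in range(n_patients)]
              for _ in range(n_biomarkers)]
genes = [[random.gauss(0, 1) for _ in range(n_patients)]
         for _ in range(n_genes)]

cor = cross_correlation_matrix(biomarkers, genes)  # 32 x 47 matrix
flat = [r for row in cor for r in row]             # vector of 32 * 47 = 1504 values
```

Flattening discards the row and column labels, which is why a method applied to `flat` treats all 1504 correlations as separate entries.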
ok, yes, i would start by applying the corshrink approach to each vector (formed from the matrix for each tissue) in turn. it could be interesting to look for patterns of "sharing" of effects across the biomarkers, but with data on only 47 genes that could be difficult to get reliable estimates... If the biomarkers and/or genes are highly correlated with one another you might need to be careful in interpreting results from ashr, as it assumes independence of the different measurements you give it.
Hello again - I experimented with the various options and here's what I got. This is considering only tissue "BM", roughly the same applies to PBMC. Applying My method of using raw correlations and standard error as se = (1-cor^2)/sqrt((N-2)) passed to I also tried
In both cases, the input is the Fisher-transformed correlations with standard errors as In the figure, each biomarker is coloured differently (so 32 colours!). Going by gut feeling, cor_mash_ed seems the most sensible to me since it accounts for the fact that there are 32 groups (biomarkers) and 47 conditions (genes), in contrast to CorShrink and "my" solution, which treat each of the 1504 correlations as a distinct entry. Am I right in this? cor_mash_canonical looks odd in that correlations for many biomarkers collapse to basically the same value (the horizontal stripes of the same colour). An awkward feature of using Finally, there is the question of whether to use the raw correlations and standard errors or the Fisher-converted correlations. Any thoughts are very welcome and thanks again!
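For readers unfamiliar with the Fisher conversion mentioned above, here is a minimal sketch in plain Python. The function names are hypothetical, and the standard error `1 / sqrt(n - 3)` is the usual textbook approximation for the transformed scale; the thread does not confirm that this exact expression was the one used.

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation: z = atanh(r)."""
    return math.atanh(r)

def fisher_z_se(n):
    """Approximate standard error of z for n samples (textbook value, assumed here)."""
    return 1 / math.sqrt(n - 3)

def inverse_fisher_z(z):
    """Map a value on the z scale back to a correlation: r = tanh(z)."""
    return math.tanh(z)

n = 27  # patients with tissue "BM"
for r in (0.0, 0.3, 0.6, 0.9):
    z = fisher_z(r)
    print(f"r = {r:.1f}  z = {z:.3f}  se(z) = {fisher_z_se(n):.3f}")
```

On the z scale the standard error depends only on the sample size, not on the correlation itself, so any shrunken estimate has to be mapped back with `inverse_fisher_z` before it is read as a correlation.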
interesting. I looked a bit more at the issue of Fisher-converted or raw correlations. For this reason I suspect the transformed approach is preferable. In any case it may explain why the results Applying mashr to the matrix X or its transpose X' are not making the same assumptions, so won't give the same answer. In your case I honestly worry about mashr because it really is best suited to "tall thin" matrices (ie many independent observations on a few correlated conditions). In typical applications the number of rows is many 1000s...
Thanks @stephens999 your explanation makes sense. Now, my next step would be to apply some clustering and heatmap plotting to the correlation matrix to see if there are patterns of co-variation between genes and biomarkers. It's an approach that I don't particularly endorse since it makes it very easy to overinterpret streaks originating from random noise, anyway... I was hoping to get more meaningful results after shrinkage but if all correlations collapse to zero there is nothing you can do. I'll have a think on how to proceed, if you have any suggestions, please post!
Hi- I'd like to ask your advice on the following situation.
I computed a 32x47 matrix of Pearson correlation coefficients and now I would like to apply shrinkage to these coefficients. If it matters, the histogram of coefficients is here:
For each coefficient I calculated the standard error using the formula se = (1-cor^2)/sqrt((N-2)).
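As a small illustration of the formula just quoted, the sketch below evaluates it in plain Python for a few correlation values. The function name `cor_se` is hypothetical, and `n = 27` is the sample size given elsewhere in this thread for tissue "BM".

```python
import math

def cor_se(r, n):
    """Standard error of a correlation r from n samples,
    using the formula from the post: (1 - r^2) / sqrt(n - 2)."""
    return (1 - r ** 2) / math.sqrt(n - 2)

n = 27
for r in (0.0, 0.3, 0.6, 0.9):
    print(f"r = {r:.1f}  se = {cor_se(r, n):.3f}")
```

With this formula the standard error shrinks as |r| approaches 1 (at r = 0 and n = 27 it is exactly 0.2), so correlations that are already large are treated as the most precisely measured.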
Is it sensible to apply ash to the vector of 1504 correlations? Is there anything obviously wrong? Any thoughts much appreciated!