Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

susie_rss gives different results using different R #132

Closed
Zepeng-Mu opened this issue Jun 30, 2021 · 7 comments
Closed

susie_rss gives different results using different R #132

Zepeng-Mu opened this issue Jun 30, 2021 · 7 comments

Comments

@Zepeng-Mu
Copy link

Hello,
I've been trying to run susie_rss using different R matrices.
The first version I used is the correlation matrix of genotype count from a reference panel. Part of which looks like:

            [,1]        [,2]        [,3]        [,4]        [,5]
[1,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000
[2,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000
[3,] -0.09664246 -0.09664246  1.00000000 -0.09664246 -0.09664246
[4,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000
[5,]  1.00000000  1.00000000 -0.09664246  1.00000000  1.00000000

In this way I did not need to scale or center the count matrix, because cor gives the same output. The result makes much sense to me:
Screen Shot 2021-06-30 at 13 37 20

I also saw in package documentation that the R option can be:

A p by p symmetric, positive semidefinite matrix.
It can be X'X, the covariance matrix X'X/(n-1), or a correlation matrix.
It should be estimated from the same samples used to compute bhat and shat.
Using an out-of-sample matrix may produce unreliable results.

So the second matrix I used is t(geno) %*% geno, where geno is the nxp matrix of counts that is centered and scaled. The matrix looks like this:

           1:37312926 1:37312963 1:37313050 1:37313060 1:37313086
1:37312926  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000
1:37312963  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000
1:37313050  -309.3525  -309.3525  3201.0000  -309.3525  -309.3525
1:37313060  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000
1:37313086  3201.0000  3201.0000  -309.3525  3201.0000  3201.0000

This matrix only differs from the first one by a factor of 3201, as I have 3202 individuals in the 1KG reference panel. So I expected the result from susie_rss would be the same as before. However, using this matrix I got no CS and PIP is flat.
Screen Shot 2021-06-30 at 13 36 58

I'm wondering what can lead to such huge difference in PIP when the R is very similar. Should I just use the correlation matrix that is directly calculated from genotype matrix?

Thanks so much!

@zouyuxin
Copy link
Member

In susie_rss documentation, R is a p by p correlation matrix, so you should use the correlation matrix. If you multiply R by 3201, the z scores should multiply by sqrt(3201) to get the same result.

@stephens999
Copy link
Contributor

stephens999 commented Jun 30, 2021 via email

@pcarbo
Copy link
Member

pcarbo commented Jun 30, 2021

@stephens999 It is in susie_ss.R.

@Zepeng-Mu
Copy link
Author

Hi, I'm using version "0.11.43" of susieR and found that documentation in ?susie_suff_stat

@zouyuxin
Copy link
Member

The documentation you quoted is for susie_suff_stat, which is different from susie_rss. If you are using susie_rss, please check ?susie_rss.

@stephens999
Copy link
Contributor

@zouyuxin so in susie_suff_stat I find the documentation of the R matrix confusing.
In the details it says R = (1/(n-1))X'X
But in the documentation for R it says it can also be X'X.
why is that?

@zouyuxin
Copy link
Member

zouyuxin commented Jul 1, 2021

In susie_suff_stat, the R is converted to a correlation matrix internally. I agree it is clearer to require R to be a correlation matrix in the documentation. I will update the documentation.

@gaow gaow closed this as completed Nov 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants