different sets in high LD #114
Comments
If r2 = 1, the two corresponding z-scores should be identical if they match the original data. I assume in your case they are a bit different, but how different are they? Just to double-check: 1) do the two variants in different credible sets both have a PIP close to 1, and does each corresponding CS contain only one variant? 2) have you used z_ld_weight in the susie_rss function call to try to account for small discrepancies between the LD reference and the z-scores?
I think what you did makes sense, but I'm trying to see if we can solve it with a less heuristic approach. Could you also double-check the estimated residual variance? We have seen in the past that residual variance estimates ended up quite small, as many CSs containing only one variant were generated for L = 10 (a large L). It would also help if you could share an example data set for us to look at (you can remove the variant IDs so we just get a matrix of numbers without knowing what the data are about).
@gaow Do you think this could be resolved with a better initialization?
@pcarbo I think a better (or at least alternative) initialization is what @chr1swallace has done, and I'd like to get some additional information and understand the problem better before deciding whether initialization is all we need to work this out.
Thanks for the quick reply (and for the great work)!
The data are public, so happy to share, but it takes a while to run. I
can produce the same result thinning to SNPs with smaller p values
(<0.1) - is that useful to you?
Re your questions, the sets are
s$sets$cs
$L2
[1] 1156
$L1
[1] 1159 1160 1161 1164 1165 1166 1167
lapply(s$sets$cs, function(i) z[i])
$L2
21:39092998
   11.67021
$L1
21:39093586 21:39093608 21:39093975 21:39094373 21:39094542 21:39094644 21:39094818
  -11.83871   -11.83957   -11.86096   -11.91935   -11.86631   -11.87701   -11.83957
LD[s$sets$cs[[2]],s$sets$cs[[1]]]
21:39093586 21:39093608 21:39093975 21:39094373 21:39094542 21:39094644 21:39094818
          1           1           1           1           1           1           1
So z scores are similar but not identical, while the (reference
population) r2=1.
I set z_ld_weight=1/503 because the ref pop has 503 individuals.
Perhaps this is just stupidly small and I should use a larger
reference population?
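For reference, the effect of z_ld_weight on a pair like the one above can be sketched in base R. This assumes the adjustment is the one described in the susie_rss documentation as I understand it, cov2cor((1-w)*R + w*tcrossprod(z)) with w = z_ld_weight. The helper name adjust_ld and the two-variant toy inputs are illustrative only and are not part of susieR.

```r
# Hypothetical helper (not a susieR function): blend the reference LD matrix
# with the outer product of the z-scores, then rescale to a correlation matrix.
# Assumed form of the adjustment: cov2cor((1 - w) * R + w * z z^T).
adjust_ld <- function(R, z, w) {
  stopifnot(w >= 0, w <= 1, length(z) == nrow(R), nrow(R) == ncol(R))
  cov2cor((1 - w) * R + w * tcrossprod(z))
}

# Toy input: two variants in perfect reference LD with opposite-sign z-scores,
# using one z-score from each credible set reported above.
R_ref <- matrix(1, nrow = 2, ncol = 2)
z_toy <- c(11.67021, -11.83871)

R_adj <- adjust_ld(R_ref, z_toy, w = 1 / 503)
print(R_adj)
```

With a weight this small, the off-diagonal entry moves away from 1 but stays positive, so the adjusted matrix still disagrees in sign with the z-scores.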
It's better to have the dataset without thinning. I'll take a look and thin the SNPs if it takes a long time to run.
I notice the z scores have opposite signs. Is it possible it is an allele switch issue or something like that?
That's a good spot, I'll investigate
http://chr1swallace.github.io
On 14 Dec 2020, at 17:22, stephens999 wrote:
> I notice the z scores have opposite signs. Is it possible it is an allele switch issue or something like that?
> Is the correlation between them in the reference correlation matrix (R) 1 or -1? [Just to be clear, note that the entries in the matrix R should be the correlations, not r2.]
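The check suggested here can be sketched in base R: for pairs of variants in strong reference LD, the sign of the product of their z-scores should match the sign of their correlation in R. The function name find_sign_conflicts, its thresholds, and the toy inputs are assumptions made for illustration and are not part of susieR.

```r
# Hypothetical helper: list variant pairs whose z-score signs contradict the
# sign of their reference correlation. R must hold signed correlations, not r2.
find_sign_conflicts <- function(z, R, r_min = 0.9, z_min = 2) {
  stopifnot(length(z) == nrow(R), nrow(R) == ncol(R))
  # Pairs (i < j) in strong LD, taken from the upper triangle only
  idx <- which(abs(R) >= r_min & upper.tri(R), arr.ind = TRUE)
  i <- unname(idx[, "row"])
  j <- unname(idx[, "col"])
  r <- R[idx]
  # Only consider pairs where both z-scores are clearly non-zero
  strong <- abs(z[i]) >= z_min & abs(z[j]) >= z_min
  conflict <- strong & (sign(z[i] * z[j]) != sign(r))
  data.frame(i = i, j = j, r = r, z_i = z[i], z_j = z[j])[conflict, , drop = FALSE]
}

# Toy input: variants 1-3 are in perfect positive reference LD, but variant 1
# has a z-score of opposite sign; variant 4 is weakly correlated with the rest.
R_toy <- matrix(0.1, nrow = 4, ncol = 4)
R_toy[1:3, 1:3] <- 1
diag(R_toy) <- 1
z_toy <- c(11.67, -11.84, -11.86, 0.5)

conflicts <- find_sign_conflicts(z_toy, R_toy)
print(conflicts)
```

Every flagged pair here involves variant 1, which is the pattern one would expect from a single misaligned allele.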
That fixes it - an ambiguous G/C SNP that I failed to properly align. Thank you so much, and apologies for not spotting that myself!
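A minimal base-R sketch for flagging strand-ambiguous SNPs (A/T and C/G pairs) before alignment follows. The function name is_strand_ambiguous and the example alleles are hypothetical, and single-base allele codes are assumed.

```r
# Hypothetical helper: TRUE where the two alleles are complementary bases
# (A/T or C/G), so a strand flip cannot be told apart from an allele swap
# using the allele codes alone.
is_strand_ambiguous <- function(a1, a2) {
  stopifnot(length(a1) == length(a2))
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

# Toy alleles: the first two SNPs are ambiguous, the last two are not.
a1 <- c("G", "A", "A", "C")
a2 <- c("C", "T", "G", "T")
ambiguous <- is_strand_ambiguous(a1, a2)
print(ambiguous)
```

SNPs flagged this way need extra evidence (for example allele frequencies) to align, or can be dropped before running the analysis.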
Occasionally, I get a run of susie_rss() that puts variants in different credible sets although they are in high LD (perhaps r2=1). When this happens, I can force those high-LD sets to merge by rerunning with L=length(s$sets$cs)-1, where s is the output of the first susie_rss() run. But is this sensible, or is there a better way?
NB I do not have access to the original genotype data, so I am using an LD matrix from a reference population rather than from the sample population. Whilst I think it is a good match, I realise this could cause issues.
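One way to detect the situation described above is to compare credible sets pairwise against the reference LD matrix. The sketch below is base R only. The function name high_ld_cs_pairs, the 0.95 threshold, and the toy inputs are assumptions, and cs is expected to be a named list of variant indices like s$sets$cs.

```r
# Hypothetical helper: report pairs of credible sets that contain at least one
# pair of variants with |r| >= r_min in the reference correlation matrix R.
high_ld_cs_pairs <- function(cs, R, r_min = 0.95) {
  empty <- data.frame(set_a = character(0), set_b = character(0),
                      max_abs_r = numeric(0))
  if (length(cs) < 2) return(empty)
  pairs <- combn(length(cs), 2)  # each column is one pair of set positions
  max_r <- apply(pairs, 2, function(p) {
    max(abs(R[cs[[p[1]]], cs[[p[2]]], drop = FALSE]))
  })
  out <- data.frame(set_a = names(cs)[pairs[1, ]],
                    set_b = names(cs)[pairs[2, ]],
                    max_abs_r = max_r)
  out[out$max_abs_r >= r_min, , drop = FALSE]
}

# Toy input: sets L2 and L1 share perfect LD; L3 is nearly independent of both.
R_toy <- matrix(0.1, nrow = 4, ncol = 4)
R_toy[1:3, 1:3] <- 1
diag(R_toy) <- 1
cs_toy <- list(L2 = 1L, L1 = c(2L, 3L), L3 = 4L)

flagged <- high_ld_cs_pairs(cs_toy, R_toy)
print(flagged)
```

Any pair reported here is a candidate for the checks discussed in the comments, such as comparing z-score signs and allele alignment, before resorting to a smaller L.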