
different sets in high LD #114

Closed
chr1swallace opened this issue Dec 14, 2020 · 8 comments

@chr1swallace
Contributor

Occasionally, I get a run of susie_rss() that puts variants in different credible sets even though they are in high LD (perhaps r2 = 1). When this happens, I can resolve the issue, i.e. force those high-LD sets to merge, by rerunning with L = length(s$sets$cs) - 1, where s is the output of the first susie_rss(). But is this sensible, or is there a better way?
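To make the workaround concrete, here is roughly what I am doing (z and R below are placeholders for my vector of z-scores and the reference LD correlation matrix):

    library(susieR)
    # first pass with the default number of effects
    s <- susie_rss(z, R, L = 10)
    # two of the resulting credible sets contain variants in near-perfect LD,
    # so rerun with one fewer effect to force them to merge
    s2 <- susie_rss(z, R, L = length(s$sets$cs) - 1)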

NB I do not have access to the original genotype data, so I am using an LD matrix from a reference population rather than the sample population. Whilst I think it is a good match, I realise this could cause issues.

@gaow
Member

gaow commented Dec 14, 2020

If r2 = 1, the two corresponding z-scores should be identical if they match the original data. I assume in your case they will be a bit different; but how different are they? Just to double-check, 1) are the two variants in different credible sets both have high PIP close to 1 and the corresponding CS only contain one variant? 2) have you used z_ld_weight in susie_rss function call to try accounting for small discrepancy between LD reference and z-scores?
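For example, something along these lines (i and j are placeholders for the indices of the two variants, and the z_ld_weight value is only illustrative):

    # 1) check the PIPs of the two variants and which CS each falls in
    s$pip[c(i, j)]
    s$sets$cs
    # 2) rerun with a small z_ld_weight to regularize the reference LD
    #    matrix toward the observed z-scores
    s2 <- susie_rss(z, R, L = 10, z_ld_weight = 0.1)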

I think what you did makes sense, but I'm trying to see if we can solve it with a less heuristic approach. Could you also double-check the estimated residual variance? We have seen in the past that the residual variance estimate ends up quite small when many single-variant CSs are generated for a large L (e.g. L = 10). It would also help if you could share example data for us to take a look at (you can remove the variant IDs so we just get a matrix of numbers without knowing what the data is about).
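For instance, with s the fitted susie_rss object:

    # estimated residual variance
    s$sigma2
    # how many variants each credible set contains
    sapply(s$sets$cs, length)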

@pcarbo
Member

pcarbo commented Dec 14, 2020

@gaow Do you think this could be resolved with a better initialization?

@gaow
Member

gaow commented Dec 14, 2020

@pcarbo I think a better (or at least alternative) initialization is what @chr1swallace has done, and I'd like to get some additional information and understand the problem better before deciding whether initialization is all we need to work this out.

@chr1swallace
Contributor Author

chr1swallace commented Dec 14, 2020 via email

@zouyuxin
Member

It's better to have the dataset without thinning. I'll take a look, and thin the SNPs myself if it takes too long to run.
Could you briefly describe the background of your data? How did you get the z-scores, and what is the reference panel?

@stephens999
Contributor

I notice the z-scores have opposite signs. Is it possible this is an allele-switch issue or something like that?
Is the correlation between them in the reference correlation matrix (R) 1 or -1? [Just to be clear, note that the entries in the matrix R should be the correlations, not r2.]
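For example (i and j stand for the indices of the two variants):

    # entry of the reference matrix: should be a signed correlation, not r^2
    R[i, j]
    # if R[i, j] is close to +1 but the z-scores have opposite signs
    # (or close to -1 with matching signs), suspect an allele flip
    sign(z[i]) == sign(z[j])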

@chr1swallace
Contributor Author

chr1swallace commented Dec 14, 2020 via email

@chr1swallace
Contributor Author

chr1swallace commented Dec 14, 2020 via email
