
different sets in high LD #114

Closed
chr1swallace opened this issue Dec 14, 2020 · 8 comments

@chr1swallace
Contributor

Occasionally, I get a run of susie_rss() that puts variants in different credible sets even though they are in high LD (perhaps r2 = 1). When this happens, I can resolve the issue, i.e. force those high-LD sets to merge, by rerunning with L = length(s$sets$cs) - 1, where s is the output of the first susie_rss(). But is this sensible, or is there a better way?
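To make the workaround concrete, here is roughly what I am doing (z and R below are placeholders for my vector of z-scores and the reference LD correlation matrix):

    library(susieR)
    # first pass with the default number of effects
    s <- susie_rss(z, R, L = 10)
    # two of the resulting credible sets contain variants in near-perfect LD,
    # so rerun with one fewer effect to force them to merge
    s2 <- susie_rss(z, R, L = length(s$sets$cs) - 1)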

NB I do not have access to the original genotype data, so I am using an LD matrix from a reference population rather than the sample population. Whilst I think it is a good match, I realise this could cause issues.

@gaow
Member

gaow commented Dec 14, 2020

If r2 = 1, the two corresponding z-scores should be identical if they match the original data. I assume in your case they will be a bit different; but how different are they? Just to double-check, 1) are the two variants in different credible sets both have high PIP close to 1 and the corresponding CS only contain one variant? 2) have you used z_ld_weight in susie_rss function call to try accounting for small discrepancy between LD reference and z-scores?
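For example, something along these lines (i and j are placeholders for the indices of the two variants, and the z_ld_weight value is only illustrative):

    # 1) check the PIPs of the two variants and which CS each falls in
    s$pip[c(i, j)]
    s$sets$cs
    # 2) rerun with a small z_ld_weight to regularize the reference LD
    #    matrix toward the observed z-scores
    s2 <- susie_rss(z, R, L = 10, z_ld_weight = 0.1)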

I think what you did makes sense, but I'm trying to see if we can solve it with a less heuristic approach. Could you also double-check the estimated residual variance? We have seen in the past that the residual variance estimate ends up quite small when many single-variant CSs are generated for a large L (e.g. L = 10). It would also help if you could share example data for us to take a look at (you can remove the variant IDs so we just get a matrix of numbers without knowing what the data is about).
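For instance, with s the fitted susie_rss object:

    # estimated residual variance
    s$sigma2
    # how many variants each credible set contains
    sapply(s$sets$cs, length)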

@pcarbo
Member

pcarbo commented Dec 14, 2020

@gaow Do you think this could be resolved with a better initialization?

@gaow
Member

gaow commented Dec 14, 2020

@pcarbo I think a better (or at least alternative) initialization is what @chr1swallace has done, and I'd like to get some additional information and understand the problem better before deciding whether initialization is all we need to work this out.

@chr1swallace
Contributor Author

chr1swallace commented Dec 14, 2020 via email

@zouyuxin
Member

It's better to have the dataset without thinning. I'll take a look, and thin the SNPs myself if it takes too long to run.
Could you briefly describe the background of your data? How did you get the z-scores, and what is the reference panel?

@stephens999
Contributor

I notice the z-scores have opposite signs. Is it possible this is an allele-switch issue or something like that?
Is the correlation between them in the reference correlation matrix (R) 1 or -1? [Just to be clear, note that the entries in the matrix R should be the correlations, not r2.]
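For example (i and j stand for the indices of the two variants):

    # entry of the reference matrix: should be a signed correlation, not r^2
    R[i, j]
    # if R[i, j] is close to +1 but the z-scores have opposite signs
    # (or close to -1 with matching signs), suspect an allele flip
    sign(z[i]) == sign(z[j])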

@chr1swallace
Contributor Author

chr1swallace commented Dec 14, 2020 via email

@chr1swallace
Contributor Author

chr1swallace commented Dec 14, 2020 via email
