CS with individual level data #131

Closed
CharliHarlow opened this issue Jun 30, 2021 · 12 comments

Comments

@CharliHarlow

Hi, I have been trying to run susie with individual-level data from UKB. I convert the genotypes into dosages and build a genotype matrix, then adjust for covariates as suggested before to create the final input files for susie. Every run, however, returns very large credible sets containing over 11,000 variants. I started with the default settings and then adjusted some of the options to see whether any of them would give more meaningful credible sets. Could you offer any advice or help with this?
Here is an example of the credible sets I am getting:

Command to run
fitted_adjusted <- susie(raw2, pheno_susie[,1],
L = 10,
estimate_residual_variance = TRUE,
estimate_prior_variance = FALSE,
scaled_prior_variance = 0.1,
min_abs_corr = 0.0,
verbose = TRUE)

Cred set output
image

$purity
   min.abs.corr mean.abs.corr median.abs.corr
L9 1.939965e-05    0.05201907      0.01831098
L7 1.774827e-05    0.05630721      0.01949726
L6 1.054947e-05    0.05143106      0.01714920
L3 1.034395e-05    0.05039611      0.01706981
L5 8.795920e-06    0.06292139      0.02208106
L2 6.810453e-06    0.05276931      0.01733549
L8 3.465249e-06    0.05655636      0.01968297
L1 9.722116e-07    0.04467518      0.01365854

$cs_index
[1] 9 7 6 3 5 2 8 1

$coverage
[1] 0.9500093 0.9500169 0.9500354 0.9500451 0.9500114 0.9500218 0.9500402 0.9500175

$requested_coverage
[1] 0.95
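
For context on the purity values above: susieR filters credible sets by their minimum absolute pairwise correlation (`min_abs_corr`, default 0.5), so `min_abs_corr = 0.0` switches that filter off and lets large, low-purity sets through. The following is a minimal sketch on simulated data, assuming the susieR package is installed; the matrix `X`, outcome `y`, and effect sizes are made up for illustration and are not taken from this issue.

```r
library(susieR)

set.seed(1)
n <- 500
p <- 200
# Simulated, independent "genotype" columns; purely illustrative.
X <- matrix(rnorm(n * p), n, p)
b <- rep(0, p)
b[c(10, 120)] <- 1          # two assumed causal variables
y <- drop(X %*% b + rnorm(n))

# Same prior settings as the call in the issue (fixed prior variance).
fit <- susie(X, y, L = 10,
             estimate_prior_variance = FALSE,
             scaled_prior_variance = 0.1)

# Re-extract credible sets from the same fit with and without the purity filter.
cs_unfiltered <- susie_get_cs(fit, X = X, coverage = 0.95, min_abs_corr = 0)
cs_filtered   <- susie_get_cs(fit, X = X, coverage = 0.95, min_abs_corr = 0.5)

# The filter can only drop sets, never add them.
length(cs_unfiltered$cs)
length(cs_filtered$cs)
# Sizes of the sets that survive the purity filter.
sapply(cs_filtered$cs, length)
```

Comparing the two counts shows how many of the reported sets are diffuse ones that the default purity threshold would have discarded.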

@gaow
Member

gaow commented Jun 30, 2021

@CharliStoneman To check on some basics, is there a reason you set scaled_prior_variance to 0.1? In this context it is interpreted as the proportion of variance explained per SNP, and 0.1 is very high for a GWAS. What if you let SuSiE estimate it instead of specifying it?
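
A minimal sketch of this suggestion on simulated data, assuming susieR is installed (the data and effect size are invented for illustration). `estimate_prior_variance = TRUE` is the `susie()` default, so leaving out both prior options lets each single effect's prior variance be estimated; the fitted values are stored in `fit$V`.

```r
library(susieR)

set.seed(2)
n <- 1000
p <- 300
X <- matrix(rnorm(n * p), n, p)   # simulated stand-in for a genotype matrix
b <- rep(0, p)
b[50] <- 0.3                      # one assumed causal variable, modest effect
y <- drop(X %*% b + rnorm(n))

# Prior variance fixed at 10% of var(y) per effect, as in the original call.
fit_fixed <- susie(X, y, L = 10,
                   estimate_prior_variance = FALSE,
                   scaled_prior_variance = 0.1)

# Prior variance estimated from the data (the susie() default).
fit_est <- susie(X, y, L = 10)

# Per-effect prior variances: constant when fixed, data-driven when estimated.
fit_fixed$V
fit_est$V
```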

@CharliHarlow
Author

Hi @gaow, no, there was no particular reason I set it to that; I was just using the settings applied in the vignette here: https://stephenslab.github.io/susieR/
I can try rerunning with no scaled prior variance set.

@stephens999
Contributor

stephens999 commented Jun 30, 2021 via email

@pcarbo
Member

pcarbo commented Jun 30, 2021

Note that if the number of individuals (rows) in your data matrix is much larger than the number of SNPs, it might be faster to use susie_suff_stat.
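
A minimal sketch of the sufficient-statistics route, assuming susieR is installed and using simulated data (nothing here comes from the issue). `X` and `y` are centered first so that the statistics correspond to a model with an intercept; the arguments are passed by name.

```r
library(susieR)

set.seed(3)
n <- 2000
p <- 100
X <- matrix(rnorm(n * p), n, p)   # simulated stand-in for a genotype matrix
b <- rep(0, p)
b[c(5, 60)] <- 0.5                # two assumed causal variables
y <- drop(X %*% b + rnorm(n))

# Center X and y so the sufficient statistics match a model with an intercept.
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

XtX <- crossprod(Xc)              # p x p; its size does not grow with n
Xty <- drop(crossprod(Xc, yc))
yty <- sum(yc^2)

fit_ss  <- susie_suff_stat(XtX = XtX, Xty = Xty, yty = yty, n = n, L = 10)
fit_ind <- susie(X, y, L = 10)

# Largest difference in PIPs between the two routes; expected to be small.
max(abs(fit_ss$pip - fit_ind$pip))
```

Once `XtX`, `Xty`, and `yty` are computed, the fitting cost depends on the number of SNPs rather than the number of individuals, which is where the speed-up comes from.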

@CharliHarlow
Author

I have re-run without the scaled_prior_variance option and with estimate_prior_variance = TRUE. We now get credsets=NULL, and the resulting plot looks like the following:
image

@stephens999
Contributor

stephens999 commented Jun 30, 2021 via email

@gaow
Member

gaow commented Jun 30, 2021

@stephens999 yes, I just updated the vignette. We did discuss the choice of priors in the vignette, but now that I read it again the ordering of the narrative was not very good. It is now improved, along with the R code updates.

@CharliHarlow
Author

Yes, several are significant. When running for the summ stats for the same trait we get the following PIP plot:
image

@gaow
Member

gaow commented Jul 1, 2021

> When running for the summ stats for the same trait we get the following PIP plot

@CharliStoneman are you referring to running susie_rss? What's your input for R (the LD matrix) -- how did you obtain that? The LD structure of this region looks complicated (the SNPs in purple are spread out yet very highly correlated).

@CharliHarlow
Author

We used susie_suff_stat for fine-mapping with the summary stats. The input was the betas and standard errors from the summary stats, from which we also calculated the z-scores, together with an LD matrix for the region we were looking at.
For the individual-level data, we generated a genotype matrix for SNPs within the region of interest, along with the phenotype values and the covariates, which we regressed out of both the genotype and the phenotype to generate the input for susie.
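
The covariate adjustment described here can be sketched in base R as a projection: residualize the phenotype and every genotype column on the covariates plus an intercept. This is an illustrative sketch only; the covariates `age` and `sex` and all simulated values are hypothetical, and it is not necessarily the exact procedure used in this pipeline.

```r
set.seed(4)
n <- 200
p <- 5
Z <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))   # hypothetical covariates
# Simulated dosages, made to depend slightly on a covariate.
X <- matrix(rbinom(n * p, 2, 0.3), n, p) + 0.1 * Z[, "age"]
y <- drop(X[, 1] + 2 * Z[, "age"] + rnorm(n))

# Project the intercept and covariates out of both y and every column of X.
Z1 <- cbind(1, Z)
qrZ <- qr(Z1)
y_adj <- qr.resid(qrZ, y)
X_adj <- qr.resid(qrZ, X)

# After adjustment, both are orthogonal to the covariates (up to rounding).
max(abs(crossprod(Z1, X_adj)))
max(abs(crossprod(Z1, y_adj)))
```

`X_adj` and `y_adj` would then be the inputs passed to `susie()`. Adjusting only the phenotype and not the genotypes would leave the two inputs inconsistent.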

@stephens999
Contributor

@CharliStoneman I think you will need to share code and data to get further feedback.
The results from full data and summary data usually agree closely when the LD matrix is computed from the in-sample genotypes, so something is probably wrong with the pipeline if you get different results.
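
One way to run this kind of consistency check is sketched below on simulated data. It assumes a recent susieR release in which `susie_rss()` accepts `n` and `estimate_residual_variance`; the data, effect size, and LD pattern are invented for illustration.

```r
library(susieR)

set.seed(5)
n <- 1000
p <- 100
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + 0.3 * rnorm(n)     # a correlated pair, to create some LD
b <- rep(0, p)
b[1] <- 0.4                           # one assumed causal variable
y <- drop(X %*% b + rnorm(n))

# Marginal (one-variable-at-a-time) regression z-scores.
z <- apply(X, 2, function(x) {
  coef(summary(lm(y ~ x)))["x", "t value"]
})

# In-sample LD: correlations computed from the same genotypes used for z.
R <- cor(X)

fit_ind <- susie(X, y, L = 10)
# Estimating the residual variance is reasonable here because R is in-sample.
fit_rss <- susie_rss(z = z, R = R, n = n, L = 10,
                     estimate_residual_variance = TRUE)

# Largest difference in PIPs between the two routes.
max(abs(fit_ind$pip - fit_rss$pip))
```

A large discrepancy in a check like this, when the LD matrix really is in-sample, points at a mismatch somewhere in the pipeline, such as different SNP ordering, allele coding, or sample subsets.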

@gaow
Member

gaow commented Nov 10, 2021

Closing this ticket due to a lack of follow-up. Discussions related to complications with using external LD reference are

@gaow closed this as completed Nov 10, 2021
4 participants