Specify covariates file and PCA #12

Open
llniu opened this issue Jun 22, 2022 · 6 comments

Comments

@llniu

llniu commented Jun 22, 2022

Hi Frank,

Thank you for creating this streamlined tool for performing GWAS analysis. Regarding covariates, if I want to correct for e.g. age, sex and population structure, how can I do this? Can I specify a covariate file AND subtract top PCs from PCA of the vcf file? Thanks a lot for your help.

Best,
Lili

@gaushi
Collaborator

gaushi commented Jun 29, 2022

Hi Lili,
If I understood your question correctly, you would like to add (1) age, (2) sex, and (3) population structure as covariates.
In that case, I would provide age and sex separately in a covariates file. For population structure, you can either let GEMMA's kinship matrix account for it or use the PCA-based correction, in which case you choose how many of the top PCs to include.
I hope that answers your question. @frankvogt: would you like to comment further?
Best,
Gautam

@frankvogt
Owner

Unfortunately, right now it is only possible to do one or the other: either add your covariates manually via the covariate file, or let vcf2gwas extract the PCs from the VCF file and use them as covariates (as shown in the Manual). Also note that using the PCA-based correction instead of GEMMA's kinship matrix only works when you are using the linear mixed model (see here).
Thanks for the suggestion though. In the future I will add functionality to automatically append PCs to a manually supplied covariate file.
Best,
Frank
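
Until that functionality exists, one possible workaround is to append the PCs to the covariate file yourself before running the analysis. The sketch below shows only the merge step. It assumes both tables are CSV with a header row and the sample ID in the first column; that layout, the function names, and the toy values are illustrative assumptions, so check the vcf2gwas Manual for the covariate file format it actually expects.

```python
import csv
import io


def read_csv_text(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def merge_covariates_with_pcs(covar_rows, pc_rows, n_pcs):
    """Join covariates with the top `n_pcs` PCs on sample ID.

    Assumed layout (hypothetical, not taken from the vcf2gwas docs):
    first row is a header, first column is the sample ID.
    """
    covar_header, *covar_body = covar_rows
    pc_header, *pc_body = pc_rows
    # Index PCs by sample ID so the two files need not share a row order.
    pcs_by_id = {row[0]: row[1:1 + n_pcs] for row in pc_body}

    merged = [covar_header + pc_header[1:1 + n_pcs]]
    for row in covar_body:
        sample_id = row[0]
        if sample_id not in pcs_by_id:
            raise KeyError(f"no PCs found for sample {sample_id!r}")
        merged.append(row + pcs_by_id[sample_id])
    return merged


# Toy inputs; the PC values are made up for illustration.
covar_text = "id,age,sex\ns1,34,0\ns2,51,1\ns3,47,1\n"
pc_text = (
    "id,PC1,PC2,PC3\n"
    "s3,0.12,-0.40,0.05\n"
    "s1,-0.31,0.22,0.10\n"
    "s2,0.19,0.18,-0.15\n"
)

merged = merge_covariates_with_pcs(
    read_csv_text(covar_text), read_csv_text(pc_text), n_pcs=2
)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

Joining on sample ID rather than on row position guards against the two files listing individuals in a different order, which would otherwise silently assign PCs to the wrong samples.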

@llniu
Author

llniu commented Jul 5, 2022

Hi Gautam and Frank,

Thank you very much for the answers. As I understand it, the kinship matrix implemented in GEMMA captures relatedness between individuals, which is different from population structure. Therefore, if one wants to correct for population structure on top of relatedness when using a linear mixed model, one can either let vcf2gwas calculate and subtract the top PCs or provide a covariate file containing the top PCs. In the first scenario, does vcf2gwas calculate the PCA from all SNPs in the genotype file, or only from independent SNPs on autosomal chromosomes? I'm new to population genetics, so pardon me if I ask stupid questions.

Best,
Lili

@gaushi
Collaborator

gaushi commented Jul 5, 2022

Hi Lili,
The kinship matrix is based on all SNPs together, so it does capture population structure. PCA will also use all genome-wide SNPs, but co-varying SNPs will dominate the first few PCs. As a result, population structure correction with PCs is not generic; it is somewhat more specific and depends on how many PCs you choose. You can provide the PCs in a covariate file, but be careful with the number of PCs you include. If your causal SNPs load on the first few PCs, their signal will be absorbed by the population structure correction and will no longer show up in the association analysis.
Best,
Gautam
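
To make the link between the kinship matrix and the PCs concrete, here is a small dependency-free sketch on toy data. It assumes the centered form of the relatedness matrix, K = X Xᵀ / p, where X is the column-centered genotype matrix (samples × SNPs) and p is the number of SNPs. The leading eigenvectors of that K are, up to scaling, the sample scores on the top PCs of the same genotypes. The genotypes and function names are made up for illustration, and this is not a description of how vcf2gwas computes its PCs internally.

```python
def center_columns(genotypes):
    """Subtract each SNP's mean dosage so every column has mean zero."""
    n = len(genotypes)
    means = [sum(col) / n for col in zip(*genotypes)]
    return [[g - m for g, m in zip(row, means)] for row in genotypes]


def kinship(centered):
    """Centered relatedness matrix K = X X^T / p (samples x samples)."""
    p = len(centered[0])
    return [
        [sum(a * b for a, b in zip(ri, rj)) / p for rj in centered]
        for ri in centered
    ]


def leading_eigenvector(matrix, n_iter=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    n = len(matrix)
    # Start from a unit basis vector. An all-ones start would fail here:
    # because the columns of X are centered, K maps it to zero.
    vec = [1.0] + [0.0] * (n - 1)
    for _ in range(n_iter):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


# Toy dosages (0/1/2): three samples from each of two made-up populations
# whose allele frequencies differ at most of the six SNPs.
genotypes = [
    [0, 0, 1, 2, 2, 0],  # population A
    [0, 1, 0, 2, 2, 1],  # population A
    [0, 0, 0, 2, 1, 0],  # population A
    [2, 2, 1, 0, 0, 1],  # population B
    [2, 1, 2, 0, 0, 0],  # population B
    [2, 2, 2, 0, 1, 1],  # population B
]

K = kinship(center_columns(genotypes))
pc1 = leading_eigenvector(K)

for label, score in zip("AAABBB", pc1):
    print(label, round(score, 3))
```

In this toy example the sign of the first component separates the two populations, which is the sense in which both the kinship matrix and the top PCs carry population structure. With real data you would use an established PCA or linear algebra tool rather than hand-written power iteration.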

@llniu
Author

llniu commented Jul 5, 2022

Hi Gautam,

Thanks a lot for the fast reply. Since the kinship matrix captures population structure, does that mean further adjusting for population structure by including the top N PCs is not needed?

Best,
Lili

@gaushi
Collaborator

gaushi commented Sep 1, 2022

Hi Lili,
Sorry for the late reply. The simple answer is that you can use the kinship matrix and not include PCs. For a more nuanced answer, please see this.
Best,
Gautam

3 participants