Guo-Bo Chen edited this page May 9, 2024 · 77 revisions

EigenGWAS conducts an unsupervised Fst scan for re-sequenced populations.

Citation 1: Chen, G.B., S.H. Lee, ZX Zhu, B Benyamin, MM Robinson, EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations, Heredity, 2016, 117:51-61.

Citation 2: Qi, GA, et al, EigenGWAS: An online visualizing and interactive application for detecting genomic signatures of natural selection, Mol Ecol Res, 2021, 21:1732-44. A new visualized implementation for EigenGWAS is available at www.EigenGWAS.com.

The latest EigenGWAS in GEAR can be downloaded HERE. Demo data can be downloaded from Dropbox or JianguoCloud.


Master command: eigengwas

Options

--bfile

Specify the genotype files in plink binary format.

--ev

Specify the number of top eigenvectors to use in the EigenGWAS analysis.

--inbred

If the population is inbred, such as Arabidopsis or rice, please switch this option on.

--thread-num

Specify the number of threads to run the analysis.

Examples

java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --out TSI_CEU
java -jar gear.jar eigengwas --bfile Arab --inbred --ev 2 --out arab 
java -jar gear.jar eigengwas --bfile Arab --inbred --ev 2 --thread-num 4 --out arab 

In addition, EigenGWAS supports data management options (the full list of data management options for GEAR can be found here). See the examples below.

Examples of EigenGWAS with data options for SNP selection

#chromosome selection
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --chr 1 4-6 --out TSI_CEU
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --not-chr 1 4-6 --out TSI_CEU

#allele frequency selection
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --maf 0.05 --max-maf 0.45 --out TSI_CEU
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --maf-range 0.05-0.3 0.4-0.5 --out TSI_CEU

#SNP extraction and exclusion
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --extract snp_list1.txt --out TSI_CEU
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --exclude snp_list2.txt --out TSI_CEU

Examples of EigenGWAS with data options for sample selection

#individual selection
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --keep ind_keep.txt --out TSI_CEU
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --remove ind_rem.txt --out TSI_CEU

#family selection
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --keep-fam fam_keep.txt --out TSI_CEU
java -jar gear.jar eigengwas --bfile TSI_CEU --ev 2 --remove-fam fam_rem.txt --out TSI_CEU

Individual and SNP selection can be combined.

Example combining SNP selection and sample selection

java -jar gear.jar eigengwas --bfile TSI_CEU --chr 1 4-6 --keep ind_keep.txt --out TSI_CEU

Note that the eigengwas command combines the three steps below into one. For more flexibility, you can run the three steps separately.

java -jar gear.jar grm --bfile TSI_CEU --out TSI_CEU 
java -jar gear.jar pca --grm TSI_CEU --ev 10 --out TSI_CEU
java -jar gear.jar egwas --bfile TSI_CEU --pheno TSI_CEU.eigenvec --mpheno 1 --out TSI_CEU

The first step generates the genetic relationship matrix for the samples (TSI_CEU.grm.gz, TSI_CEU.grm.id); the second step generates the top 10 eigenvalues and eigenvectors (TSI_CEU.eigenval and TSI_CEU.eigenvec); the third step fits a linear model for the selected eigenvector generated in the second step.
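The three steps can be sketched in plain Python to show what each one computes. This is a simplified illustration under stated assumptions, not GEAR's implementation: genotypes are coded as 0/1/2 allele counts, the relationship matrix is taken as ZZ^T/m over standardized SNPs, the leading eigenvector is found by power iteration instead of a full eigendecomposition, and all function names and the toy data are hypothetical.

```python
import math
import random


def standardize(genotypes):
    """Center and scale each SNP column (0/1/2 counts) to mean 0 and variance 1."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n)
        for i in range(n):
            # Monomorphic SNPs (sd == 0) contribute nothing to the matrix.
            z[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return z


def grm(z):
    """Step 1 (sketch): relationship matrix A = Z Z^T / m."""
    n, m = len(z), len(z[0])
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def top_eigenvector(a, iters=500, seed=0):
    """Step 2 (sketch): leading eigenvector of A by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in a]
    for _ in range(iters):
        w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def eigen_regression(eigvec, snp):
    """Step 3 (sketch): regress the eigenvector on one polymorphic SNP."""
    n = len(snp)
    mx, my = sum(snp) / n, sum(eigvec) / n
    sxx = sum((x - mx) ** 2 for x in snp)
    sxy = sum((x - mx) * (y - my) for x, y in zip(snp, eigvec))
    beta = sxy / sxx
    sse = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(snp, eigvec))
    se = math.sqrt(sse / (n - 2) / sxx)
    chi = (beta / se) ** 2
    p = math.erfc(math.sqrt(chi / 2.0))  # upper tail of chi-square, 1 df
    return beta, se, chi, p


def eigengwas_sketch(genotypes):
    """Run the three steps for the first eigenvector; one result per SNP."""
    eigvec = top_eigenvector(grm(standardize(genotypes)))
    m = len(genotypes[0])
    return [eigen_regression(eigvec, [row[j] for row in genotypes])
            for j in range(m)]


# Hypothetical toy data: 8 individuals x 4 SNPs. SNPs 0-2 separate the first
# four individuals from the last four; SNP 3 does not.
TOY = [
    [0, 0, 0, 0], [0, 0, 1, 2], [0, 1, 0, 1], [1, 0, 0, 1],
    [2, 2, 2, 1], [2, 2, 1, 0], [2, 1, 2, 2], [1, 2, 2, 1],
]

for j, (beta, se, chi, p) in enumerate(eigengwas_sketch(TOY)):
    print(f"SNP{j}: beta={beta:.4f} se={se:.4f} chi={chi:.3f} p={p:.4g}")
```

In this toy example the SNPs that track the population split receive much larger chi-square statistics than the SNP that does not, which is the signal EigenGWAS scans for. The sign of the eigenvector is arbitrary, so the sign of each effect is arbitrary too.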

The output file TSI_CEU.egwas contains the following fields:

| Field | Content |
| --- | --- |
| SNP | SNP ID |
| CHR | Chromosome ID |
| BP | Position |
| RefAllele | Reference allele |
| AltAllele | Alternative allele |
| Freq | Frequency of the reference allele |
| Beta | EigenGWAS effect estimate |
| SE | Standard error of the effect estimate |
| Chi | Chi-sq test statistic for the marker |
| P | p-value |
| PGC | p-value with GC correction |
| N1 | Number of individuals in subgroup 1 |
| Freq1 | Frequency of the reference allele in subgroup 1 |
| N2 | Number of individuals in subgroup 2 |
| Freq2 | Frequency of the reference allele in subgroup 2 |
| Fst | Fst between subgroup 1 and subgroup 2 |
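The relationship between the Chi, P and PGC fields can be illustrated with the standard genomic-control recipe: divide every chi-sq statistic by lambda_GC (the median observed statistic over the median of a 1-df chi-sq distribution, about 0.455) and recompute the p-value. The sketch below assumes this textbook recipe; whether GEAR computes PGC in exactly this way is not stated on this page, and the function names and input values are hypothetical.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def chi2_pvalue_1df(chi):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi / 2.0))


def gc_correct(chis):
    """Return (lambda_gc, GC-corrected p-values) for a list of chi-square values."""
    lambda_gc = statistics.median(chis) / CHI2_1DF_MEDIAN
    return lambda_gc, [chi2_pvalue_1df(c / lambda_gc) for c in chis]


# Made-up chi-square statistics standing in for the Chi column.
example_chis = [0.2, 0.9098728, 3.0, 25.0, 0.5]
lambda_gc, pgc = gc_correct(example_chis)
for chi, p_corrected in zip(example_chis, pgc):
    print(f"chi={chi:<10} P={chi2_pvalue_1df(chi):.4g} PGC={p_corrected:.4g}")
```

When lambda_GC is above 1, every corrected p-value is larger than its uncorrected counterpart, so only markers that stand out from the genome-wide background remain significant.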

EigenGWAS results for the HapMap CEU vs TSI cohorts (figure: CEU_TSI).

Top panel: p-values without GC correction. The peak on Chr 2 is LCT (the lactase persistence locus) and the peak on Chr 15 is HERC2.

Middle panel: p-values with GC correction (PGC), in other words adjusted for genetic drift.

Bottom panel: the correlation between the chi-sq test statistic and Fst for each marker.
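To make the bottom-panel comparison concrete, the sketch below computes a per-marker Fst from subgroup sizes and allele frequencies (the N1, Freq1, N2 and Freq2 fields) and correlates it with the chi-sq statistics. It uses the textbook definition Fst = Var(p) / (p_bar (1 - p_bar)) with sample-size weights, which may differ from GEAR's own estimator (for example by a constant factor). The function names and the example rows are hypothetical.

```python
import math


def fst_two_groups(n1, p1, n2, p2):
    """Textbook Wright's Fst for one biallelic marker in two subgroups."""
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    p_bar = w1 * p1 + w2 * p2
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # monomorphic overall: no differentiation to measure
    between = w1 * (p1 - p_bar) ** 2 + w2 * (p2 - p_bar) ** 2
    return between / (p_bar * (1.0 - p_bar))


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up rows: (Chi, N1, Freq1, N2, Freq2).
rows = [
    (0.3, 88, 0.41, 112, 0.43),
    (2.1, 88, 0.30, 112, 0.38),
    (9.8, 88, 0.22, 112, 0.41),
    (31.5, 88, 0.15, 112, 0.52),
]
chis = [r[0] for r in rows]
fsts = [fst_two_groups(r[1], r[2], r[3], r[4]) for r in rows]
print("Fst per marker:", [round(f, 4) for f in fsts])
print("corr(Chi, Fst):", round(pearson(chis, fsts), 4))
```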


EigenGWAS has been applied in studies such as those below:

Recent natural selection causes adaptive evolution of an avian polygenic trait, Science, 2017, 358:365-8

Signatures of negative selection in the genetic architecture of human complex traits, Nat Genet, 2018, 50:746-53

Heritage genetics for adaptation to marginal soils in barley, Trends in Plant Science, 2023, 28, 544-551

Bottom-up perspective – The role of roots and rhizosphere in climate change adaptation and mitigation in agroecosystems, Plant Soil

The STROMICS genome study: deep whole-genome sequencing and analysis of 10K Chinese patients with ischemic stroke reveal complex genetic and phenotypic interplay, Cell Discovery, 2023, 9:75

PigBiobank: a valuable resource for understanding genetic and biological mechanisms of diverse complex traits in pigs, Nucleic Acids Research, 2023, gkad1080

Genotype-by-environment interactions and local adaptation shape selection in the US National Chip Processing Trial, Theor App Genet, 2024, 137:99

Genomic wide association study and selective sweep analysis identify genes associated with improved yield under drought in Turkish winter wheat germplasm, Sci Reports, 2024, 14:8431

Genotype–environment associations reveal genes potentially linked to avian malaria infection in populations of an endemic island bird, Mol Ecology, 2024

Limitations and advantages of using metabolite-based genome-wide association studies: Focus on fruit quality traits, Plant Science, 2023, 333:111748

Genomic basis of selective breeding from the closest wild relative of large-fruited tomato, Horticulture Research, 2023, 10: uhad142

Rapid genetic adaptation to a novel ecosystem despite a large founder event, Mol Ecology, 2023, DOI: 10.1111/mec.17121

Landscape genomics reveals adaptive genetic differentiation driven by multiple environmental variables in naked barley on the Qinghai-Tibetan Plateau, Heredity, 2023, 131:316–326

Plant breeding highlights master genes in major regulatory pathways, Mol Plant, 2022, 15:391-2

Genetic diversity of North American popcorn germplasm and the effect of population structure on nicosulfuron response, Crop Science, 2023

GGoutlieR: an R package to identify and visualize unusual geo-genetic patterns of biological samples, bioRxiv, 2023

Leveraging GWAS for complex traits to detect signatures of natural selection in humans, Cur Opinion in Genet Develop, 2018, 53:9-14

A brief history and popularity of methods and tools used to estimate micro-evolutionary forces, Ecology & Evolution, 2021, 11:13723–43

Discovering Loci for Breeding Prospective and Phenology in Wheat Mediterranean Landraces by Environmental and eigenGWAS, Int J Mol Sci, 2023, 24:1700

GWAS Case Studies in Wheat, Genome-wide Association Studies, 2022, 341-351

Genome-wide association study of eigenvectors provides genetic insights into selective breeding for tomato metabolites, BMC Biology, 2022, 20:120

EigenGWAS: An online visualizing and interactive application for detecting genomic signatures of natural selection, Mol Ecol Res, 2021, 21:1732-44

A spectral theory for Wright’s inbreeding coefficients and related quantities, PLoS Genet, 2021, 17:e1009665

Genetic networks underlying salinity tolerance in wheat uncovered with genome-wide analyses and selective sweeps, Theor Appl Genet, 2022

Analysis of historical selection in winter wheat, Theor Appl Genet, 2022

Genetic diversity and selection signatures in synthetic-derived wheats and modern spring wheat, Front Plant Sci, 2022, 13:877496

Looking for local adaptation: convergent microevolution in Aleppo Pine (Pinus halepensis), Genes, 2019, 10:673

Genomic associations with bill length and disease reveal drift and selection across island bird populations, Evol Lett, 2018, 2:22-36

On the importance of time scales when studying adaptive evolution, Evol Lett, 2019, 3:240–247

Response to Perrier and Charmantier: On the importance of time scales when studying adaptive evolution, Evol Lett, 2019, 3:248–253

LEA 3: Factor models in population genetics and ecological genomics with R, Mol Ecol Res, 2021, 21:2738-48

Trends of genetic changes uncovered by Env- and Eigen-GWAS in wheat and barley, Theor Appl Genet, 2022, 135:667–678

Agronomic, physiological and genetic changes associated with evolution, migration and modern breeding in durum wheat, Front Plant Sci, 2021, 12:674470

Inferences of genetic architecture of bill morphology in house sparrow using a high-density SNP array point to a polygenic basis, Mol Ecology, 2018, 27:3498-514

Genomic variation, population history and within-archipelago adaptation between island bird populations, R Soc Open Sci, 2021, 8:201146

A sex-linked supergene controls sperm morphology and swimming speed in a songbird, Nature Ecology & Evolution, 2017, 1:1168-76

Social and spatial effects on genetic variation between foraging flocks in a wild bird population, Molecular Ecology, 2017, 26:5807-19

Identifying loci with breeding potential across temperate and tropical adaptation via EigenGWAS and EnvGWAS, Molecular Ecology, 2019, 28:3544-60

Identification of genes associated with litter size combining genomic approaches in Luzhong mutton sheep, Animal Genetics, 2021, 52:545-9

Labelling selective sweeps used in durum wheat breeding from a diverse and structured panel of landraces and cultivars, Biology, 2021, 10:258

Sun et al.’s study led to the underperformance of EigenGWAS, Heredity, 2019, 123:283–284

Application of partial least squares in exploring the genome selection signatures between populations, Heredity, 2019, 122:288–293

The sockeye salmon genome, transcriptome, and analyses identifying population defining regions of the genome, PLoS ONE, 2020, 15:e0240935

Whole-genome resequencing provides insights into the evolution and divergence of the native domestic yaks of the Qinghai–Tibet Plateau, BMC Evolutionary Biology, 2020, 20:137

Genome-wide analyses reveal footprints of divergent selection and popping-related traits in CIMMYT's maize inbred lines, Journal Experimental Botany, 2021, 74:1307-20

Whole‐genome SNP markers reveal conservation status, signatures of selection, and introgression in Chinese Laiwu pigs, Evolutionary Applications, 2021, 14:383-98

Variation among 532 genomes unveils the origin and evolutionary history of a global insect herbivore, Nature Communications, 2020, 11:2321

Identifying the unique characteristics of the Chinese indigenous pig breeds in the Yangtze River Delta region for precise conservation, BMC Genomics, 2021, 22:151

Whole genome variants across 57 pig breeds enable comprehensive identification of genetic signatures that underlie breed features, J Anim Sci Biotech, 2020, 11:115

Characterization of genetic diversity and genome-wide association mapping of three agronomic traits in Qingke barley (Hordeum Vulgare L.) in the Qinghai-Tibet Plateau, Front Genet, 2020, 11:638

Discovery of selection‐driven genetic differences of Duroc, Landrace, and Yorkshire pig breeds by EigenGWAS and Fst analyses, Animal Genetics, 2020, 51:531-40

Genome-wide variation patterns between landraces and cultivars uncover divergent selection during modern wheat breeding, Theor Appl Genetics, 2019, 132:2509–23

Genome-wide analyses reveal footprints of divergent selection and drought adaptive traits in synthetic-derived wheats, G3, 2019, 9:1957-73

Identifying genetic differences between Dongxiang blue-shelled and white leghorn chickens using sequencing data, G3, 2018, 8:469-76

Comparison of genetic diversity between Chinese and American soybean (Glycine max (L.)) accessions revealed by high-density SNPs, Front Plant Sci, 2017, 8:2014

Genome-wide association study for plant height and grain yield in rice under contrasting moisture regimes, Front Plant Sci, 2016, 7:1801

Genomic analysis reveals genes affecting distinct phenotypes among different Chinese and western pig breeds, Sci Reports, 2018, 8:13352

Using information of relatives in genomic prediction to apply effective stratified medicine, Sci Reports, 2017, 7:42091

Inference on the genetic basis of eye and skin color in an admixed population via Bayesian linear mixed models, Genet, 2017, 206:1113-25

Ancestors’ dietary patterns and environments could drive positive selection in genes involved in micronutrient metabolism—the case of cofactor transporters, Genes and Nutrition, 2017, 12:28

Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia, Am J Hum Genet, 2016, 98:456-72

Analysis of selection signatures on the Z chromosome of bidirectional selection broiler lines for the assessment of abdominal fat content, BMC Genomic Data, 2021, 22:18

Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure, PLoS ONE, 2018, 14: e0218306

Characterization of the genetic basis of local adaptation of wheat landraces from Iran and Pakistan using genome-wide association study, Plant Genome, 2021, e20096

The pink salmon genome: uncovering the genomic consequences of a strict two-year life-cycle, PLoS ONE, 2021, 16:e0255752


Return to GEAR Home
