Skip to content

Scripts for Wang et al. (2017) "A major locus controls local adaptation and adaptive life history variation in a perennial plant".


Notifications You must be signed in to change notification settings


Repository files navigation


Scripts for Wang et al. (2017) "A major locus controls local adaptation and adaptive life history variation in a perennial plant".

Documentation of Scripts

1. Sequencing quality checking, read mapping and post-mapping filtering - Use Picard to correct for artifacts of PCR duplication - Use Trimmomatic v0.30 and FastQC to do sequence quality checking - Use GATK to do read realignment around indels - Use BWA-MEM to do read mapping

2. SNP and genotype calling - Use GATK to do SNP calling - Use snpEff to annotate the SNPs - Use perl script to map each variant to only one of all possible gene isoforms

3. Relatedness, population structure and isolation-by-distance - Use smartpca program in EIGENSOFT to perform PCA analysis

    fst_matrix.R - create the matrix of Fst estimates

    Isolation_by_distance.R - create the matrix of geographic distance
    mantel.R - Mantel test - Use PLINK to generate Linkage-disequilibrium(LD)-trimmed SNP sets
    SwAsp.IBD.plot.R - plot Isolation-by-distance

4. Screening for SNPs associated with local adaptation

    env_PC.error_bar.R - Relationship between the environmental PC1 scores and the number of days with degree higher than 5 - Use GEMMA to do GWAS for budset - Use Asreml to estimate the genetic values of bud set
    LEA.K1.R LEA.K2.R LEA.K3.R - use a latent factor mixed-effect model (LFMM) implemented in the R package LEA to detect SNPs associated with first environmental PC, with the latent factors (K) from 1 to 3.
    LEA_zscore.94samples.R - Transform the z-scores from LFMM results to p-values

    PCAdapt.SwAsp94.R - PCAadapt test

    pcadapt_fst_cor.R - The relationship between PCadapt results and Fst values

    pca_corr_env.R - PCA analysis for the environment dat

    Figure2 - scripts for recreating Figure 2

        chr10.gemma.ld_Dprime.R - calculate LD and colorise points relative to tthe top SNP in PtFT2
        - shell script for manhattan plots
        FT2.gemma.ld_Dprime.R - plot associations and colorise according to LD for the PtFT2 SNPs
        legend.col.R - plot legend
        manhanttan_plot.R - manhattan plots (modified from
        manhanttan.3methods.R - plot manhattan plots and qqplots and combine for three types of analyses (PCAdapt, LFMM, GWAS)
        scaffold_likely_wrong.txt - scaffolds that are likely missasembled and therefore exluded from the plots

5. Genotype imputation - Create several missing genotypes for simulation

    createFile_imputate.R - Create imputation files with different level of missing values - Tor simulate by BEAGLE and calculate accuracy - Use BEAGLE to do genotype imputation and also define ancestal and derived allele based on the sequences of outgroup species of P.tremuloides and P.trichocarpa - Calculate the imputation accuracy by each sample and SNP

    impute_evaluate.R - Evaluate the imputation accuracy

6. Positive selection

    ABCinference.R - Script for performing Approximate Bayesian Computation (ABC) to jointly estimate s (the strength of selection on the beneficial mutation causing the sweep) and T (the time since the beneficial allele fixed). Script modified from original provided by Ormond et al (2016) at - Use ANGSD to estimate the genetic diversity in specific groups of populations caviar.z-score.R - Run CAVIAR on specific region of Chr10

    chr10_genome.sig.angsd_tP_tajD.group_plot.R - Compare the genetic diversity between chr10 region and genome-wide levels

    chr10_genome.sig.fst.group_plot.R - Compare Fst between chr10 region and genome-wide levels

    chr10_genome.sig.h12.group_plot.R - Compare H12 and H2/H1 between chr10 region and genome-wide levels

    chr10_genome.sig.sweepfinder2.group_plot.R - Compare CLR between chr10 region and genome-wide levels

    chr10_genome.sig.ihs_nsl.R - Compare iHS and nSL values between chr10 region and genome-wide levels colormap.plotting.R - Create EHH plot - Use vcftools, selscan and H12 test to perform a set of selection tests across the genome - Convert msms output to input format for SweepFinder2 LDheatmap.R - Use PLINK to calculate LD across SNPs - Simulate independent selective sweep events using the coalescent simulation program msms and analyse the results using SweepFinder2 to assess region affected by sweep - Use vcftools, selscan and H12 test to perform a set of selection tests on the specific region (~700kbp) on Chr10 - Run SweepFinder2 to detect selection signals