Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.


Scripts for Wang et al. (2017) "A major locus controls local adaptation and adaptive life history variation in a perennial plant".

Documentation of Scripts

1. Sequencing quality checking, read mapping and post-mapping filtering - Use Picard to correct for artifacts of PCR duplication - Use Trimmomatic v0.30 and FastQC to do sequence quality checking - Use GATK to do read realignment around indels - Use BWA-MEM to do read mapping

2. SNP and genotype calling - Use GATK to do SNP calling - Use snpEff to annotate the SNPs - Use perl script to map each variant to only one of all possible gene isoforms

3. Relatedness, population structure and isolation-by-distance - Use smartpca program in EIGENSOFT to perform PCA analysis

    fst_matrix.R - create the matrix of Fst estimates

    Isolation_by_distance.R - create the matrix of geographic distance
    mantel.R - Mantel test - Use PLINK to generate Linkage-disequilibrium(LD)-trimmed SNP sets
    SwAsp.IBD.plot.R - plot Isolation-by-distance

4. Screening for SNPs associated with local adaptation

    env_PC.error_bar.R - Relationship between the environmental PC1 scores and the number of days with degree higher than 5 - Use GEMMA to do GWAS for budset - Use Asreml to estimate the genetic values of bud set
    LEA.K1.R LEA.K2.R LEA.K3.R - use a latent factor mixed-effect model (LFMM) implemented in the R package LEA to detect SNPs associated with first environmental PC, with the latent factors (K) from 1 to 3.
    LEA_zscore.94samples.R - Transform the z-scores from LFMM results to p-values

    PCAdapt.SwAsp94.R - PCAadapt test

    pcadapt_fst_cor.R - The relationship between PCadapt results and Fst values

    pca_corr_env.R - PCA analysis for the environment dat

    Figure2 - scripts for recreating Figure 2

        chr10.gemma.ld_Dprime.R - calculate LD and colorise points relative to tthe top SNP in PtFT2
        - shell script for manhattan plots
        FT2.gemma.ld_Dprime.R - plot associations and colorise according to LD for the PtFT2 SNPs
        legend.col.R - plot legend
        manhanttan_plot.R - manhattan plots (modified from
        manhanttan.3methods.R - plot manhattan plots and qqplots and combine for three types of analyses (PCAdapt, LFMM, GWAS)
        scaffold_likely_wrong.txt - scaffolds that are likely missasembled and therefore exluded from the plots

5. Genotype imputation - Create several missing genotypes for simulation

    createFile_imputate.R - Create imputation files with different level of missing values - Tor simulate by BEAGLE and calculate accuracy - Use BEAGLE to do genotype imputation and also define ancestal and derived allele based on the sequences of outgroup species of P.tremuloides and P.trichocarpa - Calculate the imputation accuracy by each sample and SNP

    impute_evaluate.R - Evaluate the imputation accuracy

6. Positive selection

    ABCinference.R - Script for performing Approximate Bayesian Computation (ABC) to jointly estimate s (the strength of selection on the beneficial mutation causing the sweep) and T (the time since the beneficial allele fixed). Script modified from original provided by Ormond et al (2016) at - Use ANGSD to estimate the genetic diversity in specific groups of populations caviar.z-score.R - Run CAVIAR on specific region of Chr10

    chr10_genome.sig.angsd_tP_tajD.group_plot.R - Compare the genetic diversity between chr10 region and genome-wide levels

    chr10_genome.sig.fst.group_plot.R - Compare Fst between chr10 region and genome-wide levels

    chr10_genome.sig.h12.group_plot.R - Compare H12 and H2/H1 between chr10 region and genome-wide levels

    chr10_genome.sig.sweepfinder2.group_plot.R - Compare CLR between chr10 region and genome-wide levels

    chr10_genome.sig.ihs_nsl.R - Compare iHS and nSL values between chr10 region and genome-wide levels colormap.plotting.R - Create EHH plot - Use vcftools, selscan and H12 test to perform a set of selection tests across the genome - Convert msms output to input format for SweepFinder2 LDheatmap.R - Use PLINK to calculate LD across SNPs - Simulate independent selective sweep events using the coalescent simulation program msms and analyse the results using SweepFinder2 to assess region affected by sweep - Use vcftools, selscan and H12 test to perform a set of selection tests on the specific region (~700kbp) on Chr10 - Run SweepFinder2 to detect selection signals


Scripts for Wang et al. (2017) "A major locus controls local adaptation and adaptive life history variation in a perennial plant".




No packages published