PhotoperiodLocalAdaptation

Scripts for Wang et al. (2017) "A major locus controls local adaptation and adaptive life history variation in a perennial plant". https://www.biorxiv.org/content/early/2017/12/13/178921

Documentation of Scripts

1. Sequencing quality checking, read mapping and post-mapping filtering

    Picard_markduplicates_SwAsp.sh - Use Picard to correct for artifacts of PCR duplication

    QualityControl.sh - Use Trimmomatic v0.30 and FastQC to do sequence quality checking

    realign.sh - Use GATK to do read realignment around indels

    runBWA_mem_asp201.SwAsp.sh - Use BWA-MEM to do read mapping

2. SNP and genotype calling

    HaplotypeCaller_mem_tremula.sh - Use GATK to do SNP calling

    snpEff.gatk.sh - Use snpEff to annotate the SNPs

    vcf_to_maf.sh vcf2maf.pl - Use a Perl script to map each variant to a single isoform among all possible gene isoforms

3. Relatedness, population structure and isolation-by-distance

    eigen_pop_genetics.sh - Use the smartpca program in EIGENSOFT to perform principal component analysis (PCA)

    fst_matrix.R - create the matrix of Fst estimates

    Isolation_by_distance.R - create the matrix of geographic distance
    
    mantel.R - Mantel test

    plink_prunedLD.sh - Use PLINK to generate linkage disequilibrium (LD)-pruned SNP sets
    
    SwAsp.IBD.plot.R - plot Isolation-by-distance
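The Mantel test listed above asks whether two distance matrices (here, pairwise Fst and geographic distance) are correlated, using permutations of population labels to obtain a p-value. The following is a minimal Python sketch of that idea, not the code in mantel.R; the function names, the one-sided test, and the toy matrices are assumptions made for illustration only.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=1):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    observed = pearson(upper_triangle(dist_a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of dist_a together (relabel populations)
        rng.shuffle(order)
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return observed, (hits + 1) / (permutations + 1)


# Toy data (hypothetical): five populations along a transect, with Fst
# exactly proportional to geographic distance.
positions = [0.0, 1.0, 2.0, 4.0, 7.0]
geo = [[abs(a - b) for b in positions] for a in positions]
fst = [[0.01 * d for d in row] for row in geo]
r, p = mantel(geo, fst)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, so shuffling individual cells would give a misleading null distribution.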

4. Screening for SNPs associated with local adaptation

    env_PC.error_bar.R - Relationship between the environmental PC1 scores and the number of days with a temperature above 5 °C
   
    GEMMA.lmm.noPC.budset.sh - Use GEMMA to do GWAS for budset

    Ekebo_Savar_budset.as - Use ASReml to estimate genetic values for bud set
     
    LEA.K1.R LEA.K2.R LEA.K3.R - Use the latent factor mixed-effect model (LFMM) implemented in the R package LEA to detect SNPs associated with the first environmental PC, with the number of latent factors (K) set to 1, 2 and 3, respectively.
     
    LEA_zscore.94samples.R - Transform the z-scores from LFMM results to p-values


    PCAdapt.SwAsp94.R - PCAdapt test

    pcadapt_fst_cor.R - The relationship between PCAdapt results and Fst values

    pca_corr_env.R - PCA of the environmental data

    Figure2 - scripts for recreating Figure 2

        chr10.gemma.ld_Dprime.R - calculate LD and colorise points relative to the top SNP in PtFT2
        
        excute_manhanttan.sh - shell script for manhattan plots
        
        FT2.gemma.ld_Dprime.R - plot associations and colorise according to LD for the PtFT2 SNPs
        
        legend.col.R - plot legend
        
        manhanttan_plot.R - manhattan plots (modified from https://cran.r-project.org/web/packages/qqman/index.html)
        
        manhanttan.3methods.R - plot manhattan plots and qqplots and combine for three types of analyses (PCAdapt, LFMM, GWAS)
        
        scaffold_likely_wrong.txt - scaffolds that are likely misassembled and therefore excluded from the plots
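The z-score to p-value step listed for LEA_zscore.94samples.R can be illustrated with a recipe commonly used for LFMM output: square the z-scores, rescale them by a genomic inflation factor, and take the upper tail of a chi-square distribution with one degree of freedom. The sketch below is in Python using only the standard library and is not the authors' R script; whether that script applies this calibration is an assumption, as are the function names and the example z-scores.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom, obtained as
# the square of the 75th percentile of the standard normal distribution.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_1df_upper_tail(x):
    """P(X > x) for X ~ chi-square(1), which equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def zscores_to_pvalues(z_scores, calibrate=True):
    """Convert z-scores to p-values, optionally with genomic-control scaling.

    Returns (p_values, inflation_factor). With calibrate=False the inflation
    factor is fixed at 1, giving ordinary two-sided normal p-values.
    """
    chi2 = [z * z for z in z_scores]
    lam = median(chi2) / CHI2_1DF_MEDIAN if calibrate else 1.0
    return [chi2_1df_upper_tail(c / lam) for c in chi2], lam


# Hypothetical z-scores for five SNPs
z = [0.5, -1.0, 2.0, 3.0, -0.2]
p_values, lam = zscores_to_pvalues(z)
for score, p in zip(z, p_values):
    print(f"z = {score:+.2f}  adjusted p = {p:.4f}")
print(f"inflation factor = {lam:.3f}")
```

Dividing by the inflation factor changes the scale of the p-values but not the ranking of SNPs, so the same loci remain at the top of the list.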

5. Genotype imputation

    Beagle_comparison.sh - Create several missing genotypes for simulation

    createFile_imputate.R - Create imputation files with different levels of missing values

    Create_simu_missing_file.sh - Run the simulations with BEAGLE and calculate the imputation accuracy

    define_ancestral.SwAsp.pseudo_chr.sh - Use BEAGLE to do genotype imputation and also define ancestral and derived alleles based on the sequences of the outgroup species P. tremuloides and P. trichocarpa

    impute_accuracy_beagle_vcf.pl - Calculate the imputation accuracy by each sample and SNP

    impute_evaluate.R - Evaluate the imputation accuracy
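The evaluation described in this section masks known genotypes, imputes them, and then measures how often the imputed call matches the original, per sample and per SNP. The Python sketch below illustrates that calculation under simple assumptions; it is not the Perl or R code in the repository. The genotype coding (0/1/2 copies of the alternate allele), the function name, and the toy matrices are all hypothetical.

```python
def imputation_accuracy(truth, imputed, masked):
    """Concordance between true and imputed genotypes at masked positions.

    truth, imputed: lists of rows (one row per sample, one column per SNP),
                    genotypes coded as 0, 1 or 2 alternate-allele copies.
    masked:         iterable of (sample_index, snp_index) pairs that were set
                    to missing before imputation.
    Returns (per_sample, per_snp) lists of concordance rates; entries are
    None where no genotype was masked for that sample or SNP.
    """
    n_samples, n_snps = len(truth), len(truth[0])
    sample_counts = [[0, 0] for _ in range(n_samples)]  # [matches, total]
    snp_counts = [[0, 0] for _ in range(n_snps)]
    for s, v in masked:
        match = int(truth[s][v] == imputed[s][v])
        for counts in (sample_counts[s], snp_counts[v]):
            counts[0] += match
            counts[1] += 1

    def rate(counts):
        return counts[0] / counts[1] if counts[1] else None

    return [rate(c) for c in sample_counts], [rate(c) for c in snp_counts]


# Toy example: 3 samples x 3 SNPs, four genotypes masked
truth = [[0, 1, 2],
         [1, 1, 0],
         [2, 0, 0]]
imputed = [[0, 1, 2],
           [1, 2, 0],
           [2, 0, 1]]
masked = {(0, 0), (0, 2), (1, 1), (2, 2)}
per_sample, per_snp = imputation_accuracy(truth, imputed, masked)
print("per sample:", per_sample)
print("per SNP:   ", per_snp)
```

Only masked positions enter the calculation, because genotypes that were never hidden are trivially identical in both matrices and would inflate the accuracy.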

6. Positive selection

    ABCinference.R - Script for performing Approximate Bayesian Computation (ABC) to jointly estimate s (the strength of selection on the beneficial mutation causing the sweep) and T (the time since the beneficial allele fixed). Script modified from the original provided by Ormond et al. (2016) at http://jjensenlab.org/wp-content/uploads/2016/02/ABC_inference.zip
    
    angsd_SFS_SwAsp.all.sh - Use ANGSD to estimate the genetic diversity in specific groups of populations

    caviar.chr10.sh caviar.z-score.R - Run CAVIAR on a specific region of Chr10

    chr10_genome.sig.angsd_tP_tajD.group_plot.R - Compare the genetic diversity between chr10 region and genome-wide levels

    chr10_genome.sig.fst.group_plot.R - Compare Fst between chr10 region and genome-wide levels

    chr10_genome.sig.h12.group_plot.R - Compare H12 and H2/H1 between chr10 region and genome-wide levels

    chr10_genome.sig.sweepfinder2.group_plot.R - Compare CLR between chr10 region and genome-wide levels

    chr10_genome.sig.ihs_nsl.R - Compare iHS and nSL values between chr10 region and genome-wide levels

    ehh.plot.sh colormap.plotting.R - Create EHH plot

    genome_wide.summary.compare.sh - Use vcftools, selscan and H12 test to perform a set of selection tests across the genome

    ms2sf2.pl - Convert msms output to input format for SweepFinder2

    plink.ldheatmap.sh LDheatmap.R - Use PLINK to calculate LD across SNPs

    run_sweep_sim.sg - Simulate independent selective sweep events using the coalescent simulation program msms and analyse the results using SweepFinder2 to assess the region affected by the sweep

    select.chr10.summary.sh - Use vcftools, selscan and H12 test to perform a set of selection tests on a specific region (~700 kbp) on Chr10

    sweepfinder2.FT2paper.sh - Run SweepFinder2 to detect selection signals
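Several scripts in this section compare H12 and H2/H1 between the Chr10 region and the rest of the genome. These statistics are computed from haplotype frequencies within a window: H1 is the sum of squared haplotype frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the contribution of the most common haplotype. The Python sketch below shows the arithmetic for a single window; it is an illustration rather than the tool used in the repository, and the function name and example haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Compute H1, H12 and H2/H1 for the haplotypes observed in one window.

    haplotypes: list of hashable haplotype labels (e.g. strings of alleles),
                one entry per sampled chromosome.
    """
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    # (p1 + p2)^2 + sum_{i>2} p_i^2  ==  H1 + 2 * p1 * p2
    h12 = h1 + 2.0 * p1 * p2
    h2 = h1 - p1 * p1
    return {"H1": h1, "H12": h12, "H2/H1": h2 / h1}


# Toy window: ten chromosomes carrying three distinct haplotypes
window = ["AAGT"] * 5 + ["AAGC"] * 3 + ["TCGT"] * 2
stats = haplotype_homozygosity(window)
for name, value in stats.items():
    print(f"{name} = {value:.3f}")
```

A high H12 together with a low H2/H1 points towards a single dominant haplotype, whereas a high H12 with a higher H2/H1 indicates that several haplotypes are common in the window.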
