Almost all the scripts for the introgression AWT paper.
The scripts for the exome capture part (assembly, annotation, etc.) are split across:
- https://github.com/singhal/exomeCapture
- https://github.com/CGRL-QB3-UCBerkeley/denovoTargetCapturePopGen

Pipeline 1: https://github.com/CGRL-QB3-UCBerkeley/denovoTargetCapturePopGen
- 2-ScrubReads: used to scrub reads
- 3-GenerateAssemblies: used to assemble reads
- 4-FinalAssembly: used to assemble across assemblies

Pipeline 2: https://github.com/singhal/exomeCapture
- 7annotateContigs.pl: used to match assemblies to original targets
- 8initialSNPset.pl: used to map reads and call SNPs in transcriptome data
- 9clineSNPset.pl: used to map reads and call SNPs in clinal populations
- 10benchmarking.pl: used to get data for loci used in benchmarking
- 11clineAFfiles.pl: used to get allele frequency data from the cline SNP set

- divPoly.pl: calculate divergence & polymorphism for transcriptome data to identify highly-differentiated transcripts
- dnds.pl: calculate a (rough) version of dn/ds for transcriptome data to identify transcripts putatively under positive selection
- exomeCapture.pl: the script that actually does most of the work; identifies possible exons, filters for GC content etc.
- finalExons_species.pl: picks some exons and adds in the other targeted sequence (i.e., mtDNA, UTRs, etc)
- finalFile_makeShorter.pl: makes the exon capture array shorter because the original files were overshoots
- fst.pl: calculate a (rough) version of Fst for transcriptome data to identify highly-differentiated transcripts
- initialExonFile.pl: generated the exon list to be used for downstream selection
- utrExtract.pl: extracts UTR sequences to be used in benchtruthing
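
To give a feel for what a "rough" Fst looks like, here is a minimal Python sketch for a single biallelic site in two populations. It is an illustration only and is not taken from fst.pl: the function names (`heterozygosity`, `rough_fst`), the equal weighting of the two populations, and the use of the simple (Ht - Hs) / Ht form are all assumptions, and the actual script may use a different estimator.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def rough_fst(p1, p2):
    """Rough Fst = (Ht - Hs) / Ht for two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Returns None when the site is fixed for the same allele in both
    populations, because Fst is undefined there (Ht == 0).
    """
    # Hs: mean within-population expected heterozygosity
    hs = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    # Ht: expected heterozygosity of the combined (averaged) population
    p_bar = (p1 + p2) / 2.0
    ht = heterozygosity(p_bar)
    if ht == 0.0:
        return None
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only
    for p1, p2 in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
        print(p1, p2, rough_fst(p1, p2))
```

This ignores sample-size corrections, so it would be biased for small samples; it is meant only to show the shape of the calculation.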

- aliquotDNA.pl: figured out amounts of DNA to spike into tube to get pooled libraries
- averageExonMtCoverage.R: calculated average coverage across each population in each lineage-pair
- benchtruthing_af.R: determines how much variance there is in mtDNA SNPs
- cline_fitting.py: fits clines to variants; takes in allele frequencies at each variant in each population
- divergence_contig.py: estimates divergence & polymorphism for transcriptome data on a per contig (transcript) basis
- divergence_full.py: estimates divergence & polymorphism for transcriptome data on a per exon basis
- dnds.py: estimates dn/ds for each transcript using PAML
- fst_contig.py: estimates Fst for transcriptome data on a per contig (transcript) basis
- fst_full.py: estimates Fst for transcriptome data on a per exon basis
- makeLDfiles.pl: looks at how cline centers and widths change over genomic space -- allows Anolis to be used as a reference
- moransI_bootstrap.py: takes in the files generated by makeLDfiles.pl and generates Moran's I estimates as well as doing some bootstrapping
- pooled_discrepancy.R: runs the simple R simulations to see how pooling might affect our ability to infer allele frequency accurately
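
The idea behind the pooling simulations can be sketched in a few lines of Python. This is not a port of pooled_discrepancy.R (which is written in R): the function names, the uniform(0.5, 1.5) model for unequal DNA contributions, and the default parameters are all hypothetical choices made for illustration.

```python
import random


def simulate_pooled_estimate(true_freq, n_individuals, n_reads, rng):
    """Simulate one pooled-sequencing estimate of an allele frequency."""
    # Alternate-allele count (0, 1 or 2) for each diploid individual
    genotypes = [
        sum(rng.random() < true_freq for _ in range(2))
        for _ in range(n_individuals)
    ]
    # Assumption: each individual contributes an unequal amount of DNA,
    # drawn here from uniform(0.5, 1.5) purely for illustration
    weights = [rng.uniform(0.5, 1.5) for _ in range(n_individuals)]
    pool_freq = sum(w * g / 2.0 for w, g in zip(weights, genotypes)) / sum(weights)
    # Reads sample alleles from the pool at the pool's allele frequency
    alt_reads = sum(rng.random() < pool_freq for _ in range(n_reads))
    return alt_reads / n_reads


def mean_abs_error(true_freq, n_individuals, n_reads, n_reps=200, seed=1):
    """Average absolute gap between the pooled estimate and the true frequency."""
    rng = random.Random(seed)
    errors = [
        abs(simulate_pooled_estimate(true_freq, n_individuals, n_reads, rng) - true_freq)
        for _ in range(n_reps)
    ]
    return sum(errors) / n_reps


if __name__ == "__main__":
    # Hypothetical settings: 10 individuals per pool, varying read depth
    for depth in (10, 50, 200):
        print(depth, mean_abs_error(0.3, n_individuals=10, n_reads=depth))
```

Varying `n_individuals` and `n_reads` shows how pool size and read depth each limit how closely the pooled estimate can track the true frequency.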

- paper_figures.ipynb: includes Python code for most of the figures in the manuscript and some figures that didn't survive
- correlationCoverage.R: produces Figure S5 in Supplementary Information
- coverageAcrossContacts.R: produces Figure S6 in Supplementary Information
- cumulativeCoveragePlot.R: produces Figure S7 in Supplementary Information
- divpolyCoverage.R: produces Figure S8 in Supplementary Information
- gcContentCoverage.R: produces Figure S9 in Supplementary Information
- specificity.R: produces Figure S4 in Supplementary Information