The code in this repository accompanies the paper:
Marisa C.W. Lim, Ke Bi, Christopher C. Witt, Catherine H. Graham, Liliana M. Davalos
Note: this pipeline uses scripts written by UC Berkeley MVZ/CGRL scientists.
1. Exon capture probe design: Probes were designed in two ways, the first to capture previously identified candidate genes for high-altitude adaptation and the second to capture a 'random' set of exons across the entire hummingbird genome, using the Anna's hummingbird (Calypte anna) genome as the reference.
2. Clean raw reads and assemble: Remove low-quality reads, adapter sequences, PCR duplicates, and potential bacterial contaminant sequences; merge paired-end reads; and generate de novo assemblies to serve as species-specific references.
3. Read alignment: Map reads to the species-specific reference assemblies and evaluate the capture experiment using the % of reads retained after filtering but before alignment, length of mapped data, % of reads aligned to targeted regions (specificity), % of targeted regions covered by at least one read (sensitivity), average coverage, variation in coverage, and % of sites retained at multiple coverage depths.
4. SNP calling: Call variants and filter out loci that have too much missing data, lie within 10 bp of indels, are not biallelic, and/or show excessive heterozygosity.
- PCA
- Admixture
- Relatedness coefficient
- Gene flow estimates
- Between-population differentiation and divergence (Fst, Dxy)
- Isolation by distance (geodesic and least-cost distances)
- Within-population diversity
- Estimated Effective Migration Surfaces (EEMS)
- Test for natural selection with Latent Factor Mixed Models (LFMM)
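The capture-evaluation metrics listed in step 3 reduce to simple ratios once read counts and per-site depths are available. The Python sketch below is a hypothetical helper, not one of the repository's scripts: `capture_metrics`, its arguments, the default depth thresholds, and the use of the coefficient of variation as the measure of "variation in coverage" are all assumptions. It expects read counts and per-site depths to have been computed upstream from the alignments.

```python
from statistics import mean, pstdev


def capture_metrics(reads_mapped, reads_on_target, target_depths,
                    thresholds=(1, 5, 10, 20)):
    """Summarise a capture experiment (hypothetical helper).

    reads_mapped    -- reads aligned anywhere in the reference assembly
    reads_on_target -- subset of those reads overlapping a targeted region
    target_depths   -- one list of per-site read depths per targeted region
    thresholds      -- placeholder minimum depths for "% of sites retained"
    """
    # Specificity: percentage of aligned reads that land on targeted regions.
    specificity = 100 * reads_on_target / reads_mapped

    # Sensitivity: percentage of targets with at least one site covered by a read.
    covered = sum(1 for depths in target_depths if any(d >= 1 for d in depths))
    sensitivity = 100 * covered / len(target_depths)

    # Pool per-site depths across all targeted regions.
    sites = [d for depths in target_depths for d in depths]
    mean_depth = mean(sites)
    # "Variation in coverage", expressed here (an assumption) as a coefficient of variation.
    cv_depth = pstdev(sites) / mean_depth if mean_depth else 0.0

    # Percentage of targeted sites at or above each depth threshold.
    retained = {t: 100 * sum(d >= t for d in sites) / len(sites) for t in thresholds}

    return {"specificity": specificity, "sensitivity": sensitivity,
            "mean_depth": mean_depth, "cv_depth": cv_depth,
            "pct_sites_retained": retained}


if __name__ == "__main__":
    toy = capture_metrics(reads_mapped=1000, reads_on_target=800,
                          target_depths=[[0, 0, 0], [2, 4, 6], [10, 10, 10]])
    print(toy["specificity"], toy["sensitivity"], toy["pct_sites_retained"])
```

Keeping depths per target (rather than one flat list) is what lets sensitivity be computed per region while the coverage summaries are computed per site.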
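The four SNP filters in step 4 can be illustrated as a per-site keep/drop decision. This is a minimal sketch assuming a simplified in-memory genotype representation; `Site`, `near_indel`, and `keep_site` are hypothetical names, and in practice such filters are usually applied to VCF files with dedicated tools. The text does not give thresholds for missing data or heterozygosity, so `max_missing=0.5` and `max_het=0.75` are placeholders; only the 10 bp indel window comes from the pipeline description.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# A diploid genotype as a pair of allele indices (0 = reference); None = missing.
Genotype = Optional[Tuple[int, int]]


@dataclass
class Site:
    """Hypothetical minimal record for one variant site."""
    chrom: str
    pos: int
    alts: List[str]
    genotypes: List[Genotype]


def near_indel(site: Site, indels: Sequence[Tuple[str, int]], window: int = 10) -> bool:
    """True if any indel on the same contig lies within `window` bp of the site."""
    return any(chrom == site.chrom and abs(pos - site.pos) <= window
               for chrom, pos in indels)


def keep_site(site: Site, indels: Sequence[Tuple[str, int]],
              max_missing: float = 0.5, max_het: float = 0.75,
              window: int = 10) -> bool:
    """Apply the step-4 filters; thresholds other than `window` are placeholders."""
    called = [g for g in site.genotypes if g is not None]
    # 1. Too much missing data (also rejects sites with no calls at all).
    if not called or 1 - len(called) / len(site.genotypes) > max_missing:
        return False
    # 2. Within `window` bp of an indel.
    if near_indel(site, indels, window):
        return False
    # 3. Not biallelic: exactly one alternate allele is required.
    if len(site.alts) != 1:
        return False
    # 4. Excessive heterozygosity among called genotypes.
    het_fraction = sum(1 for a, b in called if a != b) / len(called)
    return het_fraction <= max_het


if __name__ == "__main__":
    indels = [("contig1", 105)]
    sites = [
        Site("contig1", 50, ["T"], [(0, 1), (0, 0), (1, 1), None]),          # passes
        Site("contig1", 100, ["G"], [(0, 1), (0, 0), (1, 1), (0, 0)]),       # 5 bp from indel
        Site("contig1", 200, ["A", "C"], [(0, 1), (0, 2), (1, 1), (0, 0)]),  # multiallelic
        Site("contig1", 300, ["C"], [(0, 1), (0, 1), (0, 1), (0, 1)]),       # all heterozygous
        Site("contig1", 400, ["A"], [None, None, None, (0, 1)]),             # 75% missing
    ]
    print([s.pos for s in sites if keep_site(s, indels)])  # [50]
```

Excess heterozygosity is used here as a simple observed fraction; it is often a sign of collapsed paralogs in the reference, and a Hardy-Weinberg-based test is a common alternative.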
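For the between-population statistics (Fst, Dxy), one common choice is Hudson's Fst estimator combined across SNPs as a ratio of averages (as formulated by Bhatia et al. 2013), alongside Dxy computed from allele frequencies. The sketch below assumes that formulation, which the text does not confirm; `hudson_fst` and `dxy` are hypothetical helpers, not functions from this repository.

```python
import math


def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst across SNPs, combined as a ratio of averages.

    p1, p2 -- alternate-allele frequencies per SNP in populations 1 and 2
    n1, n2 -- number of sampled chromosomes per SNP in each population (> 1)
    """
    num = den = 0.0
    for a, b, m, n in zip(p1, p2, n1, n2):
        # Squared frequency difference, corrected for finite sample sizes.
        num += (a - b) ** 2 - a * (1 - a) / (m - 1) - b * (1 - b) / (n - 1)
        # Expected between-population difference at this SNP.
        den += a * (1 - b) + b * (1 - a)
    # Undefined when the denominator is zero, e.g. both populations fixed
    # for the same allele at every SNP.
    return num / den if den > 0 else math.nan


def dxy(p1, p2, n_sites=None):
    """Average between-population nucleotide divergence.

    Divides by n_sites (all callable sites, invariant ones included) when given,
    otherwise by the number of SNPs supplied.
    """
    total = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return total / (n_sites if n_sites is not None else len(p1))


if __name__ == "__main__":
    p1, p2 = [0.0, 0.5, 1.0], [1.0, 0.5, 1.0]
    n = [20, 20, 20]
    print(hudson_fst(p1, p2, n, n))  # (1 - 0.5/19) / 1.5, about 0.649
    print(dxy(p1, p2), dxy(p1, p2, n_sites=100))
```

Dxy is normally reported per site over all callable positions, including invariant ones, which is why `n_sites` is exposed; dividing by the number of SNPs alone inflates it.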