Scripts used for tule elk de novo genome assembly and SNP calling

jessicalumian/tule-elk

Tule Elk de novo genome assembly and SNP calling

This repo contains scripts used for de novo tule elk genome assembly and SNP calling as described in Titus' blog post.

Software versions (also listed in hpcc.modules file):

  • FastQC v0.11.3
  • Trimmomatic v0.30
  • khmer v2.0
  • bwa v0.7.7.r441
  • SAMtools v1.2
  • MEGAHIT v1.0.5 (installed by cloning the github repo)

Assembly steps:

  1. Quality evaluation using FastQC (no script for this; just run fastqc *.fq.gz in the directory containing the read files)
  2. Trimming using Trimmomatic - interleave-to-before-asm.sh (great script name, I know)
  3. Interleaving reads using khmer - interleave-to-before-asm.sh
  4. Assembly using MEGAHIT - megahit-asm.sh (I initially used Velvet but found it too slow; those scripts are still in the repo)
  5. Assembly evaluation using QUAST - quast.sh (results are available for viewing here)
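
For orientation, here is a minimal dry-run sketch of how the five steps above could chain together. It is not a copy of the scripts in this repo: the sample prefix, file names, Trimmomatic jar path, adapter file, and trimming settings are all placeholder assumptions, and by default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch of the assembly steps (assumed file names and parameters).
set -eu

SAMPLE="${SAMPLE:-elk}"   # hypothetical sample prefix
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the tools

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

assembly_steps() {
    s="$1"
    # 1. Quality evaluation
    run fastqc "${s}_R1.fq.gz" "${s}_R2.fq.gz"
    # 2. Paired-end trimming (jar path, adapter file, and settings are placeholders)
    run java -jar trimmomatic-0.30.jar PE "${s}_R1.fq.gz" "${s}_R2.fq.gz" \
        "${s}_R1.paired.fq.gz" "${s}_R1.unpaired.fq.gz" \
        "${s}_R2.paired.fq.gz" "${s}_R2.unpaired.fq.gz" \
        ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36
    # 3. Interleave the surviving pairs with khmer
    run interleave-reads.py "${s}_R1.paired.fq.gz" "${s}_R2.paired.fq.gz" \
        -o "${s}.interleaved.fq"
    # 4. Assemble the interleaved reads with MEGAHIT
    run megahit --12 "${s}.interleaved.fq" -o "${s}_megahit"
    # 5. Evaluate the assembly with QUAST
    run quast.py "${s}_megahit/final.contigs.fa" -o "${s}_quast"
}

assembly_steps "$SAMPLE"
```

Check the printed commands against your own file layout before setting DRY_RUN=0; the actual settings used for the tule elk data are in interleave-to-before-asm.sh, megahit-asm.sh, and quast.sh.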

SNP calling and determining heterozygous sites:

  1. Using the assembly as the reference, map reads with bwa and SAMtools, then call polymorphic sites with freebayes - example for one elk: 1339-mapping-snp-calling.sh
  2. Optional: generate mapping stats - mapping-stats.sh
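
As a rough illustration of the mapping and SNP calling steps, the sketch below prints a plausible command sequence for one elk and includes a small awk helper that counts heterozygous genotype calls in a single-sample VCF. The file names, the assumption that reads are interleaved (bwa mem -p), and the heterozygosity count are illustrative guesses, not the contents of 1339-mapping-snp-calling.sh or mapping-stats.sh.

```shell
#!/bin/sh
# Dry-run sketch of mapping and SNP calling (assumed file names).
set -eu

DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the tools

run_sh() {
    # Print a command line; hand it to sh only when DRY_RUN=0.
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1"
    fi
}

map_and_call() {
    ref="$1"; reads="$2"; id="$3"
    run_sh "bwa index $ref"
    run_sh "samtools faidx $ref"
    # -p treats the input as interleaved paired-end reads (an assumption here)
    run_sh "bwa mem -p $ref $reads > $id.sam"
    run_sh "samtools view -b -o $id.bam $id.sam"
    run_sh "samtools sort -T $id.tmp -o $id.sorted.bam $id.bam"
    run_sh "samtools index $id.sorted.bam"
    run_sh "freebayes -f $ref $id.sorted.bam > $id.vcf"
    # Optional mapping stats
    run_sh "samtools flagstat $id.sorted.bam > $id.flagstat.txt"
}

count_het() {
    # Count records whose first sample has a diploid genotype with two
    # different alleles (e.g. 0/1 or 0|1). Assumes GT is the first FORMAT key.
    awk -F '\t' '!/^#/ {
        split($10, f, ":")
        n = split(f[1], a, "[/|]")
        if (n == 2 && a[1] != "." && a[1] != a[2]) het++
    } END { print het + 0 }' "$1"
}

map_and_call final.contigs.fa 1339.interleaved.fq 1339
```

Once a VCF exists, `count_het 1339.vcf` prints the number of heterozygous calls. This is only one simple way to tally them; filtering on quality or depth first would likely change the count.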

Because this was my first time mapping, I made a diagram for my own reference that might be helpful to others in a similar position:

[Image: mapping workflow diagram]

The tule elk (Cervus elaphus nannodes) is a California-endemic subspecies that underwent a major genetic bottleneck when its numbers were reduced to as few as 3 individuals in the 1870s (McCullough 1969; Meredith et al. 2007). Since then, the population has grown to an estimated 4,300 individuals, which currently occur in 22 distinct herds (Hobbs 2014). Despite these higher numbers, the historical loss of genetic diversity, combined with the increasing fragmentation of remaining habitat, poses a significant threat to the health and management of contemporary populations. As populations become increasingly fragmented by highways, reservoirs, and other forms of human development, the risk of genetic impacts associated with inbreeding intensifies. By some estimates, up to 44% of remaining genetic variation could be lost in small isolated herds in just a few generations (Williams et al. 2004). For this reason, the Draft Elk Conservation and Management Plan and the California Wildlife Action Plan prioritize research aimed at facilitating habitat connectivity, as well as stemming genetic diversity loss and habitat fragmentation (Hobbs 2014; CDFW 2015).

You can read more about this on Titus' blog until we get the paper written.
