Genotyping pipeline for short-read Illumina data generated from whole-genome sequencing

Scripts for processing raw reads into a set of high-quality genotypes, as in "Genomic signatures of extensive inbreeding in Isle Royale wolves, a population on the threshold of extinction" by Robinson et al. (2019). The pipeline is adapted from the GATK3 Best Practices.

Required software:

Other requirements:

Pipeline

1. FastqToSam

Convert raw read data to an unmapped BAM file.
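As a rough illustration of this step, the sketch below assembles a Picard FastqToSam command line for one paired-end sample. The jar location, file names, and read-group values are placeholders chosen for the example, not values taken from this repository's scripts.

```python
import shlex


def fastq_to_sam_cmd(fq1, fq2, out_bam, sample, read_group, library,
                     picard_jar="picard.jar"):
    """Build a Picard FastqToSam command as an argument list.

    All paths and read-group values are supplied by the caller; the
    defaults here are placeholders, not settings from this pipeline.
    """
    return [
        "java", "-jar", picard_jar, "FastqToSam",
        f"FASTQ={fq1}",                 # forward reads
        f"FASTQ2={fq2}",                # reverse reads
        f"OUTPUT={out_bam}",            # unmapped BAM to create
        f"READ_GROUP_NAME={read_group}",
        f"SAMPLE_NAME={sample}",
        f"LIBRARY_NAME={library}",
        "PLATFORM=illumina",
    ]


# Hypothetical sample and file names, for illustration only.
cmd = fastq_to_sam_cmd(
    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
    "sample1_unmapped.bam", "sample1", "sample1_rg1", "lib1",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java and Picard are installed.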

2. MarkIlluminaAdapters

Mark Illumina adapter sequences in the unmapped BAM file.

3. AlignCleanBam

Efficiently align reads to a reference genome.

4. MarkDuplicates

Mark (and optionally remove) duplicate reads.

5. RemoveBadReads (optional)

Remove reads that have low mapping quality or that do not align as a proper pair (as indicated by their SAM flags). This optional step reduces the size of the BAM file by eliminating reads that are not wanted in downstream processing.
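The filtering logic described here can be sketched as a simple predicate on a read's SAM flag and mapping quality. The flag bits are those defined by the SAM specification; the mapping-quality threshold of 30 is an assumed value for illustration and may differ from what the pipeline's scripts use.

```python
# Bits of the SAM FLAG field, as defined in the SAM specification.
PROPER_PAIR = 0x2   # each segment properly aligned according to the aligner
UNMAPPED = 0x4      # segment unmapped


def keep_read(flag, mapq, min_mapq=30):
    """Return True if a read should be retained.

    min_mapq=30 is an assumed threshold for this example only.
    """
    if flag & UNMAPPED:
        return False
    if not flag & PROPER_PAIR:
        return False
    return mapq >= min_mapq


# Flag 99 = paired, proper pair, mate reverse strand, first in pair.
print(keep_read(99, 60))
# Flag 97 is the same read without the proper-pair bit set.
print(keep_read(97, 60))
```

With samtools, a comparable filter is commonly written as `samtools view -b -f 2 -q 30 in.bam`, where `-f 2` requires the proper-pair bit and `-q` sets the minimum mapping quality.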

6. BaseQualityScoreRecalibration

Recalibrate base quality scores to reach convergence between reported and empirical base quality scores.
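To make "reported versus empirical" concrete, the sketch below computes an empirical Phred-scaled quality from mismatch counts and compares it with the reported score. It illustrates the idea only and is not GATK's implementation; the +1/+2 smoothing of the error rate is an assumption made here to avoid taking the logarithm of zero.

```python
import math


def phred(error_prob):
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_prob)


def empirical_quality(mismatches, observations):
    """Empirical quality for a group of bases sharing a reported score.

    The (+1, +2) smoothing is an assumption of this sketch.
    """
    return phred((mismatches + 1) / (observations + 2))


def recalibration_shift(reported_q, mismatches, observations):
    """Difference between empirical and reported quality.

    A shift near zero means the reported scores already match the
    observed error rate.
    """
    return empirical_quality(mismatches, observations) - reported_q


# Made-up counts: 9 mismatches among 9998 bases that were reported as Q35.
print(round(empirical_quality(9, 9998), 2))
print(round(recalibration_shift(35, 9, 9998), 2))
```

In this made-up example the bases were reported as more accurate than they turned out to be, so recalibration would lower their scores.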

7. HaplotypeCaller

Generate a gVCF file for each BAM file.

8. GenotypeGVCFs

Generate a VCF file from gVCF files.
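Steps 7 and 8 can be sketched as two GATK3 invocations: one HaplotypeCaller run per BAM file in gVCF mode, followed by a single joint GenotypeGVCFs run over all gVCFs. The jar path, reference name, and file names below are placeholders, and the scripts in this repository may pass additional options.

```python
def haplotype_caller_cmd(ref, bam, out_gvcf, gatk_jar="GenomeAnalysisTK.jar"):
    """GATK3 HaplotypeCaller in gVCF mode for a single BAM file."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-ERC", "GVCF",      # emit reference confidence (gVCF output)
        "-o", out_gvcf,
    ]


def genotype_gvcfs_cmd(ref, gvcfs, out_vcf, gatk_jar="GenomeAnalysisTK.jar"):
    """GATK3 GenotypeGVCFs over one or more per-sample gVCF files."""
    cmd = ["java", "-jar", gatk_jar, "-T", "GenotypeGVCFs", "-R", ref]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per input gVCF
    cmd += ["-o", out_vcf]
    return cmd


# Hypothetical sample names, for illustration only.
samples = ["sample1", "sample2"]
per_sample = [
    haplotype_caller_cmd("reference.fa", f"{s}.bam", f"{s}.g.vcf.gz")
    for s in samples
]
joint = genotype_gvcfs_cmd(
    "reference.fa", [f"{s}.g.vcf.gz" for s in samples], "joint.vcf.gz"
)
for cmd in per_sample + [joint]:
    print(" ".join(cmd))
```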

9. TrimAlternates and VariantAnnotator

Remove alleles from the VCF file that don't appear in any genotypes, and add desired annotations to the INFO field.
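The allele-trimming part of this step can be illustrated with a small function that drops ALT alleles no genotype refers to and renumbers the remaining allele indices. This is a simplified sketch of the idea, not GATK's code; genotypes are represented as tuples of allele indices (0 = REF, `None` = missing), which is a representation chosen for the example.

```python
def trim_unused_alts(alts, genotypes):
    """Drop ALT alleles that appear in no genotype and renumber the rest.

    alts: list of ALT allele strings (allele index 1 is alts[0]).
    genotypes: list of tuples of allele indices; None marks a missing call.
    Returns (new_alts, new_genotypes).
    """
    used = sorted({a for gt in genotypes for a in gt
                   if a is not None and a > 0})
    # Map old ALT indices to new, consecutive ones; REF stays 0.
    remap = {old: new for new, old in enumerate(used, start=1)}
    remap[0] = 0
    new_alts = [alts[i - 1] for i in used]
    new_gts = [tuple(None if a is None else remap[a] for a in gt)
               for gt in genotypes]
    return new_alts, new_gts


# ALT allele "T" (index 2) is carried by no sample, so it is removed
# and index 3 ("G") is renumbered to 2.
alts, gts = trim_unused_alts(["A", "T", "G"], [(0, 1), (3, 3), (None, None)])
print(alts)
print(gts)
```

A real implementation would also need to update per-allele annotations (for example allele counts and genotype likelihoods), which this sketch omits.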

10. Variant Effect Predictor

Add mutation impact to the INFO field.

11. VariantFiltration and custom filtering

Apply site- and genotype-level filters.
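The distinction between the two filter levels can be sketched as follows: a genotype-level filter judges each sample's call on its own, while a site-level filter judges the whole record, here partly using the outcome of the genotype filter. Every threshold and filter name below is a hypothetical value chosen for illustration, not one taken from this pipeline or the paper.

```python
def genotype_passes(depth, gq, min_depth=6, max_depth=100, min_gq=20):
    """Genotype-level filter on read depth and genotype quality.

    All thresholds are hypothetical example values.
    """
    if depth is None or gq is None:
        return False
    return min_depth <= depth <= max_depth and gq >= min_gq


def site_filter(qual, genotype_results, min_qual=30.0, max_fail_frac=0.2):
    """Site-level filter; returns a FILTER-style label.

    genotype_results: list of booleans from genotype_passes().
    Labels and thresholds are hypothetical example values.
    """
    if qual < min_qual:
        return "FAIL_qual"
    if not genotype_results:
        return "FAIL_missing"
    fail_frac = genotype_results.count(False) / len(genotype_results)
    if fail_frac > max_fail_frac:
        return "FAIL_missing"
    return "PASS"


# Made-up (depth, GQ) pairs for five samples at one site.
calls = [(12, 40), (3, 50), (20, 99), (15, 35), (18, 60)]
results = [genotype_passes(d, q) for d, q in calls]
print(results)
print(site_filter(250.0, results))
```

Genotypes that fail would typically be set to missing rather than deleted, so that the site-level missingness check reflects them.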
