Genotyping pipeline for short-read Illumina data generated from whole-genome sequencing

Scripts for processing raw reads into a set of high-quality genotypes, as in "Genomic signatures of extensive inbreeding in Isle Royale wolves, a population on the threshold of extinction" by Robinson et al. (2019). The pipeline is adapted from the GATK3 Best Practices.

Required software:

Other requirements:

Pipeline

1. FastqToSam

Convert raw read data to an unmapped BAM file.
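As a rough illustration of this step, the sketch below builds a Picard FastqToSam command line for one paired-end sample. The jar path, file names, and the read group and library naming scheme are assumptions made for the example, not values taken from this repository's scripts.

```python
from typing import List


def fastq_to_sam_cmd(fastq1: str, fastq2: str, out_bam: str, sample: str,
                     picard_jar: str = "picard.jar") -> List[str]:
    """Build a Picard FastqToSam command for one paired-end sample.

    The jar path and the naming scheme below are hypothetical.
    """
    return [
        "java", "-jar", picard_jar, "FastqToSam",
        f"FASTQ={fastq1}",             # forward reads
        f"FASTQ2={fastq2}",            # reverse reads
        f"OUTPUT={out_bam}",           # unmapped BAM written by Picard
        f"READ_GROUP_NAME={sample}",   # assumption: one read group per sample
        f"SAMPLE_NAME={sample}",
        f"LIBRARY_NAME={sample}_lib",  # assumption: illustrative library name
        "PLATFORM=illumina",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Picard.
    print(" ".join(fastq_to_sam_cmd("S1_R1.fastq.gz", "S1_R2.fastq.gz",
                                    "S1_unmapped.bam", "S1")))
```

The list form can be passed directly to `subprocess.run` once Picard is installed.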

2. MarkIlluminaAdapters

Mark Illumina adapter sequences in the unmapped BAM file.

3. AlignCleanBam

Efficiently align reads to a reference genome.
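One common way to do this in the GATK Best Practices is to pipe Picard SamToFastq into `bwa mem` and then into Picard MergeBamAlignment, so that no intermediate FASTQ or SAM files are written. The sketch below assembles such a pipeline as a shell string. It is an assumption that this repository's script follows the same pattern, and the jar path, thread count, and reduced option set are illustrative only.

```python
import shlex


def align_clean_bam_pipeline(marked_bam: str, reference: str, out_bam: str,
                             threads: int = 4,
                             picard_jar: str = "picard.jar") -> str:
    """Return a shell pipeline: SamToFastq | bwa mem | MergeBamAlignment."""
    q = shlex.quote
    # Convert the adapter-marked unmapped BAM to interleaved FASTQ on stdout,
    # clipping adapters recorded in the XT tag.
    sam_to_fastq = (
        f"java -jar {q(picard_jar)} SamToFastq INPUT={q(marked_bam)} "
        "FASTQ=/dev/stdout CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=2 "
        "INTERLEAVE=true NON_PF=true"
    )
    # -p tells bwa mem that the input is interleaved paired-end FASTQ.
    bwa = f"bwa mem -M -t {int(threads)} -p {q(reference)} /dev/stdin"
    # Merge the alignments with the unmapped BAM to restore read metadata.
    # The reference needs a sequence dictionary for this step.
    merge = (
        f"java -jar {q(picard_jar)} MergeBamAlignment ALIGNED_BAM=/dev/stdin "
        f"UNMAPPED_BAM={q(marked_bam)} OUTPUT={q(out_bam)} "
        f"REFERENCE_SEQUENCE={q(reference)} CREATE_INDEX=true"
    )
    return " | ".join([sam_to_fastq, bwa, merge])


if __name__ == "__main__":
    print(align_clean_bam_pipeline("S1_marked.bam", "reference.fa", "S1_aligned.bam"))
```

Building the string with `shlex.quote` keeps paths containing spaces safe when the pipeline is later run through a shell.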

4. MarkDuplicates

Mark (and optionally remove) duplicate reads.

5. RemoveBadReads (optional)

Remove reads that have low mapping quality or that do not align as a proper pair (as indicated by their SAM flags). This step is optional; it reduces the size of the BAM file by discarding reads that are not wanted in downstream processing.
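To make the filtering rule concrete, the sketch below checks the SAM proper-pair flag bit (0x2) and a mapping-quality threshold, and builds a `samtools view` command that applies the same rule. The threshold of 30 and the file names are assumptions chosen for illustration; the repository's script may use other values or tools.

```python
from typing import List

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


def keep_read(flag: int, mapq: int, min_mapq: int = 30) -> bool:
    """Return True if a read is in a proper pair and meets the MAPQ threshold.

    min_mapq=30 is a hypothetical default, not a value from the pipeline.
    """
    return bool(flag & PROPER_PAIR) and mapq >= min_mapq


def samtools_filter_cmd(in_bam: str, out_bam: str, min_mapq: int = 30) -> List[str]:
    """Build the samtools command that applies the same filter to a BAM file."""
    return [
        "samtools", "view", "-b",  # write BAM output
        "-f", str(PROPER_PAIR),    # require the proper-pair flag
        "-q", str(min_mapq),       # minimum mapping quality
        "-o", out_bam, in_bam,
    ]


if __name__ == "__main__":
    print(keep_read(flag=99, mapq=60))  # 99 = paired, proper pair, mate reverse, first in pair
    print(" ".join(samtools_filter_cmd("S1_dedup.bam", "S1_filtered.bam")))
```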

6. BaseQualityScoreRecalibration

Recalibrate base quality scores so that the reported scores converge with the empirical base quality scores.
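The idea of comparing reported and empirical quality can be sketched with the Phred definition, Q = -10 log10(error rate). The example below estimates an empirical quality from counts of mismatches and observed bases and reports its gap from the reported score. The pseudocount smoothing and the function names are assumptions for illustration; this is not GATK's exact estimator.

```python
import math


def empirical_quality(mismatches: int, observations: int) -> float:
    """Estimate a Phred-scaled empirical quality from mismatch counts.

    Adds a simple pseudocount (+1 / +2) so that zero mismatches does not
    give an infinite quality. This smoothing is an illustrative choice.
    """
    if observations < 0 or mismatches < 0 or mismatches > observations:
        raise ValueError("need 0 <= mismatches <= observations")
    error_rate = (mismatches + 1) / (observations + 2)
    return -10.0 * math.log10(error_rate)


def quality_gap(reported_q: float, mismatches: int, observations: int) -> float:
    """Reported minus empirical quality; positive means the score is overconfident."""
    return reported_q - empirical_quality(mismatches, observations)


if __name__ == "__main__":
    # Bases reported as Q30 that mismatch about 1% of the time are overconfident.
    print(round(quality_gap(30.0, mismatches=9, observations=998), 2))
```

Recalibration has converged when this gap is close to zero across the bins of reported quality.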

7. HaplotypeCaller

Generate a gVCF file for each BAM file.

8. GenotypeGVCFs

Generate a VCF file from gVCF files.
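The per-sample and joint steps chain together: each BAM file yields one gVCF, and all gVCFs are then genotyped jointly. The sketch below builds GATK3-style command lines for both steps. The jar path and file names are hypothetical, and any additional options used by this repository's scripts are not shown.

```python
from typing import List, Sequence

GATK_JAR = "GenomeAnalysisTK.jar"  # hypothetical path to the GATK3 jar


def haplotype_caller_cmd(bam: str, reference: str, out_gvcf: str) -> List[str]:
    """Build a GATK3 HaplotypeCaller command that emits a gVCF for one BAM."""
    return [
        "java", "-jar", GATK_JAR, "-T", "HaplotypeCaller",
        "-R", reference, "-I", bam,
        "--emitRefConfidence", "GVCF",  # gVCF mode rather than plain VCF
        "-o", out_gvcf,
    ]


def genotype_gvcfs_cmd(gvcfs: Sequence[str], reference: str, out_vcf: str) -> List[str]:
    """Build a GATK3 GenotypeGVCFs command that jointly genotypes the gVCFs."""
    if not gvcfs:
        raise ValueError("at least one gVCF is required")
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V argument per sample gVCF
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    samples = ["S1", "S2"]  # hypothetical sample names
    for s in samples:
        print(" ".join(haplotype_caller_cmd(f"{s}.bam", "reference.fa", f"{s}.g.vcf.gz")))
    print(" ".join(genotype_gvcfs_cmd([f"{s}.g.vcf.gz" for s in samples],
                                      "reference.fa", "joint.vcf.gz")))
```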

9. TrimAlternates and VariantAnnotator

Remove alternate alleles that do not appear in any genotype from the VCF file, and add the desired annotations to the INFO field.

10. Variant Effect Predictor

Add mutation impact to the INFO field.

11. VariantFiltration and custom filtering

Apply site- and genotype-level filters.
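As a sketch of how site-level and genotype-level filters differ, the example below builds a GATK3 VariantFiltration command with both kinds of expression. The filter names and thresholds (`QD < 2.0`, `DP < 6`) are placeholders chosen for illustration, not the filters used in Robinson et al. (2019), and the custom filtering mentioned above is not covered.

```python
from typing import Dict, List


def variant_filtration_cmd(vcf: str, reference: str, out_vcf: str,
                           site_filters: Dict[str, str],
                           genotype_filters: Dict[str, str],
                           gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build a GATK3 VariantFiltration command.

    site_filters and genotype_filters map a filter name to a JEXL expression.
    Site filters are written to the FILTER column; genotype filters are
    written to the per-sample FT field. Records are flagged, not removed.
    """
    cmd = ["java", "-jar", gatk_jar, "-T", "VariantFiltration",
           "-R", reference, "-V", vcf]
    for name, expr in site_filters.items():
        cmd += ["--filterName", name, "--filterExpression", expr]
    for name, expr in genotype_filters.items():
        cmd += ["--genotypeFilterName", name, "--genotypeFilterExpression", expr]
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    cmd = variant_filtration_cmd(
        "annotated.vcf.gz", "reference.fa", "filtered.vcf.gz",
        site_filters={"lowQD": "QD < 2.0"},     # hypothetical site-level threshold
        genotype_filters={"lowDP": "DP < 6"},   # hypothetical genotype-level threshold
    )
    print(" ".join(cmd))
```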
