halelab/GBS-SNP-CROP


GBS-SNP-CROP

Latest release v.4.1 (October 6, 2019)

Introduction

The GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP) is executed via a sequence of seven Perl scripts that integrate custom parsing and filtering procedures with well-known, vetted bioinformatic tools, giving the user full access to all intermediate files. By employing a novel strategy of variant calling (SNPs and indels) based on the correspondence of within-individual to across-population patterns of polymorphism, the pipeline is able to identify high-confidence variants and distinguish them from both sequencing and PCR errors, whether or not a reference genome is available. When no reference genome is available, the pipeline adopts a clustering strategy to build a population-tailored "Mock Reference" from the same GBS data for downstream calling and genotyping. Designed for libraries of either paired-end (PE) or single-end (SE) reads of arbitrary lengths, GBS-SNP-CROP maximizes data usage by avoiding the unnecessary data culling that imposed length-uniformity requirements cause. GBS-SNP-CROP is a complete bioinformatics pipeline developed primarily to support curation, research, and breeding programs wishing to utilize GBS for the cost-effective genome-wide characterization of plant genetic resources.

Pipeline workflow

Stage 1. Process the raw GBS data

  • Step 1: Parse the raw reads
  • Step 2: Trim based on quality and adaptors
  • Step 3: Demultiplex

Stage 2. Build the Mock Reference

  • Step 4: Cluster reads and assemble the Mock Reference

Stage 3. Map the processed reads and generate standardized alignment files

  • Step 5: Align with BWA-mem and process with SAMtools
  • Step 6: Parse mpileup outputs and produce the variants discovery matrix

Stage 4. Call Variants and Genotypes

  • Step 7: Filter variants and call genotypes

PLEASE NOTE: GBS-SNP-CROP is an intentionally modular and flexible pipeline. If your data are already demultiplexed and filtered, simply skip Stage 1 and enter the pipeline at Stage 2. If you have a reference genome and no need for a Mock Reference, simply skip Stage 2 and go directly to Stage 3. Refer to the User Manual for input file naming conventions for each Step.
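The entry-point rules in the note above can be sketched as a small planning helper. This is only an illustration of the skip logic and is not part of the pipeline: the function name `steps_to_run` and its two flags are hypothetical, and the stage-to-step mapping is copied from the workflow list above.

```python
# Stage -> Step layout, as listed in the "Pipeline workflow" section.
STAGES = {
    1: [1, 2, 3],  # parse, trim, demultiplex
    2: [4],        # cluster reads and assemble the Mock Reference
    3: [5, 6],     # align, then parse mpileup outputs
    4: [7],        # filter variants and call genotypes
}


def steps_to_run(already_processed: bool, has_reference: bool) -> list[int]:
    """Return the Step numbers to run, in order (hypothetical helper).

    already_processed: data are already demultiplexed and filtered (skip Stage 1).
    has_reference: a reference genome is available and no Mock Reference
                   is needed (skip Stage 2).
    """
    skipped = set()
    if already_processed:
        skipped.add(1)
    if has_reference:
        skipped.add(2)
    return [
        step
        for stage, steps in sorted(STAGES.items())
        if stage not in skipped
        for step in steps
    ]


print(steps_to_run(already_processed=False, has_reference=False))  # [1, 2, 3, 4, 5, 6, 7]
print(steps_to_run(already_processed=True, has_reference=True))    # [5, 6, 7]
```

Whichever Steps you run, the input file naming conventions in the User Manual still apply.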

Below is a schematic of the workflow, with inputs and outputs (boxes) indicated for each Step (arrows).

Released versions

v.4.1: Released on 10/6/2019
v.4.0: Released on 10/22/2018
v.3.0: Released on 2/8/2018
v.2.0: Released on 2/22/2017
v.1.1: Released on 3/11/2016
v.1.0: Released on 1/12/2016

Getting Help

Begin by carefully reading the GBS-SNP-CROP User Manual. Before posting a question or starting a discussion, please consult the FAQ page. Also check your barcode ID file for empty characters or blank spaces, and verify that it was saved as a tab-delimited file. If you still face an issue or have suggestions for improving this tool, please submit your question or comment to our Google Groups page.
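The barcode ID file check described above can be automated with a short script. The following is a minimal sketch, not a tool shipped with GBS-SNP-CROP. It assumes only what the paragraph states (tab-delimited, no empty or blank-space characters) and additionally assumes every line should have the same number of fields. The function names, the sample barcodes and the sample IDs are all made up for illustration; see the User Manual for the actual column layout.

```python
def check_barcode_lines(lines):
    """Return (line_number, problem) tuples for a tab-delimited barcode ID file."""
    problems = []
    expected_fields = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # drop the line ending only
        if line == "":
            problems.append((number, "empty line"))
            continue
        if "\t" not in line:
            problems.append((number, "no tab delimiter found"))
            continue
        fields = line.split("\t")
        if expected_fields is None:
            expected_fields = len(fields)  # first valid line sets the width
        elif len(fields) != expected_fields:
            problems.append((number, "inconsistent field count"))
        for field in fields:
            if field == "":
                problems.append((number, "empty field"))
            elif any(ch.isspace() for ch in field):
                problems.append((number, "blank space in field"))
    return problems


def check_barcode_file(path):
    """Run the same checks on a file on disk."""
    with open(path, newline="") as handle:
        return check_barcode_lines(handle)


# Made-up example: the second line has a trailing space after the sample ID.
sample = ["ACGT\tSample1\n", "TTGA\tSample2 \n"]
for number, problem in check_barcode_lines(sample):
    print(f"line {number}: {problem}")
```

An empty result means none of these particular problems were found. It does not confirm that the file matches the format the pipeline expects.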

Requirements

  • Java 7 or higher - The latest version of GBS-SNP-CROP (v.4.1) was tested using Java 8 (update 221)
  • Trimmomatic - Latest version tested with v.0.39 (Bolger et al., 2014)
  • PEAR - Latest version tested with v.0.9.11 (Zhang et al., 2014)
  • VSEARCH - Latest version tested with v.2.13.7 (Rognes et al., 2016)
  • BWA aligner - Latest version tested with v.0.7.12 (Li & Durbin, 2009)
  • SAMtools - Latest version tested with v.1.7 (Li et al., 2009)
  • The following five CPAN modules also need to be installed: Getopt::Long, IO::Zlib, List::Util, List::MoreUtils, Parallel::ForkManager
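
Before a first run, it can help to confirm that the requirements above are visible from your shell. The sketch below is an assumption-laden convenience check, not something distributed with GBS-SNP-CROP. The executable names in `TOOLS` are guesses at how each tool is typically installed, and Trimmomatic is left out because it is commonly run as a `.jar` file rather than from the PATH. Module availability is tested by asking Perl to load each module with `perl -MModule -e 1`. Versions are not checked.

```python
import shutil
import subprocess

# Assumed executable names; adjust to match your installation.
TOOLS = ["java", "pear", "vsearch", "bwa", "samtools"]

# The five CPAN modules named in the requirements list.
CPAN_MODULES = [
    "Getopt::Long",
    "IO::Zlib",
    "List::Util",
    "List::MoreUtils",
    "Parallel::ForkManager",
]


def missing_tools(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def perl_module_available(module):
    """Return True if `perl -M<module> -e 1` exits successfully."""
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # perl itself is not installed or not on the PATH
    return result.returncode == 0


if __name__ == "__main__":
    for name in missing_tools(TOOLS):
        print(f"not found on PATH: {name}")
    for module in CPAN_MODULES:
        if not perl_module_available(module):
            print(f"Perl module not loadable: {module}")
```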

Citing GBS-SNP-CROP

Melo et al. GBS-SNP-CROP: A reference-optional pipeline for SNP discovery and plant germplasm characterization using genotyping-by-sequencing data. BMC Bioinformatics. 2016. 17:29. DOI 10.1186/s12859-016-0879-y.