Skip to content

guilherme-pereira/tassel4-poly

Repository files navigation

Tassel4-Poly

Modified version of Tassel4 (v.4.3.7) for running the Tassel-GBS pipeline modified for polyploid species with high read depths

Overview

Tassel is a well-known Java program that has also implemented a genotyping-by-sequencing (GBS) analysis pipeline called Tassel-GBS. This pipeline has been broadly used by plant geneticists who want to use GBS-based single nucleotide polymorphism (SNP) in their genetic studies. However, one limitation for its use in polyploid species is that the pipeline provides an approximate or truncated number of read counts (up to 127), which are ultimately used to call diploidized genotypes. We modified Tassel-GBS pipeline so that it could store and return the actual read counts per individual per locus.

Running the pipeline

In the same way as Tassel3, Tassel4 is run from the terminal with command lines such as those found in this tutorial. Only two plugins have been directly affected by the modification: FastqToTBT and DiscoverySNPCaller. In order to get the exact read counts, one should change the flag -y to -sh so that a maximum number of 32,767 reads will be recorded (instead of up to 127 with the -y). This number should suffice the GBS experiments with higher depths demanded by polyploid species studies.

Example commands (adapted from the tutorial):

  • FastqToTBTPlugin
./tassel4-poly/run_pipeline.pl -fork1 -FastqToTBTPlugin -i fastq -k myGBSProject_key.txt -e ApeKI -o tbt -sh –t mergedTagCounts/myMasterTags.cnt -endPlugin -runfork1
  • DiscoverySNPCallerPlugin
./tassel4-poly/run_pipeline.pl -fork1 -DiscoverySNPCallerPlugin -i mergedTBT/myStudy.tbt.shrt -sh -m topm/myMasterTags.topm -mUpd topm/myMasterTagsWithVariants.topm -vcf -o vcf/raw/myGBSGenos_chr+ -mnF 0.8 -p myPedigreeFile.ped -mnMAF 0.02 -mnMAC 100000 -ref MyReferenceGenome.fa -sC 1 -eC 10 -endPlugin -runfork1

Remember to change the file extension *.byte to *.shrt where needed in subsequent plugins. Also, notice that VCF files returned by Tassel4-Poly still contains genotype calls as in a diploid species. In order to get quantitative (dosage) genotype calls for polyploid species as provided by SuperMASSA software, you may want to use the VCF files as input data in the VCF2SM pipeline.

Cite

If you end up using Tassel4-Poly in your study, please make sure to cite the publications below in your paper:

  • Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, Buckler ES. (2014) TASSEL-GBS: A High Capacity Genotyping by Sequencing Analysis Pipeline. PLoS ONE 9(2): e90346. https://doi.org/10.1371/journal.pone.0090346

  • Pereira GS, Garcia AAF, Margarido GRA. (2018) A fully automated pipeline for quantitative genotype calling from next generation sequencing data in autopolyploids. BMC Bioinformatics 19:398. https://doi.org/10.1186/s12859-018-2433-6.

About

Modified version of Tassel4 (v.4.3.7) for running the Tassel-GBS pipeline modified for polyploid species with high read depths

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages