A simple method to simulate Genotyping-by-Sequencing (GBS) data.

anshess/SimGBS.jl

SimGBS: A Julia Package to Simulate Genotyping-by-Sequencing (GBS) Data

Introduction

SimGBS is a versatile method for simulating Genotyping-by-Sequencing (GBS) data. It can be applied to any genome of choice. Users can modify different parameters to customise the GBS settings, such as the choice of restriction enzyme and the sequencing depth. By taking the gene-drop approach, users can also specify the demographic history and define the population structure (by supplying a pedigree file). Like a real sequencer, SimGBS writes its output in FASTQ format.
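To make the idea of virtual digestion concrete, here is a minimal, self-contained sketch in plain Julia. It does not use the SimGBS.jl API: the function names `digest` and `size_select`, the toy genome, and the size window are all hypothetical and chosen only for illustration. It cuts a single strand at every ApeKI recognition site (GCWGC, cut after the first G) and then keeps fragments within a length range.

```julia
# Illustrative sketch only; none of these names come from SimGBS.jl.

# Cut `genome` at every match of `site`. `offset` is the number of bases of
# the recognition site that remain on the upstream (left) fragment.
function digest(genome::AbstractString, site::Regex, offset::Int)
    fragments = String[]
    start = 1
    for m in eachmatch(site, genome)
        cut = m.offset + offset - 1          # last base of the upstream fragment
        push!(fragments, genome[start:cut])
        start = cut + 1
    end
    push!(fragments, genome[start:end])      # trailing fragment after the last cut
    return fragments
end

# Keep only fragments whose length lies in [lower, upper].
size_select(fragments, lower, upper) =
    filter(f -> lower <= length(f) <= upper, fragments)

genome = "AAAAGCAGCTTTTTTGCTGCCC"            # toy sequence (assumption)
# ApeKI recognises GCWGC (W = A or T) and cuts after the first G.
fragments = digest(genome, r"GC[AT]GC", 1)
selected = size_select(fragments, 6, 12)     # toy size window (assumption)
```

A real digestion would also need to handle both strands, multiple chromosomes and ambiguous bases; this sketch ignores those details.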

Installation

SimGBS.jl is registered in the General registry. It can be installed using Pkg.add:

julia> import Pkg;Pkg.add("SimGBS")

or simply

julia> ] 
pkg> add SimGBS

Input

  • Reference genome of the target species in FASTA format (e.g., xxx.fasta.gz/xxx.fa.gz)

  • A list of Illumina barcodes (e.g., GBS_Barcodes.txt)

  • (optional) Pedigree file (e.g., small.ped)
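The pedigree file drives the gene-drop step described in the introduction. As a rough illustration of the idea (not of SimGBS.jl's implementation), the sketch below drops founder alleles down a small pedigree at a single locus. The pedigree layout `(id, sire, dam)` with `0` for an unknown parent, the function name `gene_drop`, and the absence of recombination are all assumptions made for this example; the actual format of the .ped file is described in the package documentation.

```julia
using Random

# Hypothetical pedigree: (id, sire, dam); 0 marks an unknown parent (founder).
pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 2), (5, 3, 4)]

# Drop founder alleles down the pedigree at one locus.
# Each founder receives two unique allele labels; each offspring inherits
# one randomly chosen allele from its sire and one from its dam.
function gene_drop(pedigree, rng::AbstractRNG)
    genotypes = Dict{Int,Tuple{Int,Int}}()
    next_allele = 1
    for (id, sire, dam) in pedigree          # assumes parents precede offspring
        if sire == 0 && dam == 0
            genotypes[id] = (next_allele, next_allele + 1)
            next_allele += 2
        else
            genotypes[id] = (rand(rng, genotypes[sire]), rand(rng, genotypes[dam]))
        end
    end
    return genotypes
end

genotypes = gene_drop(pedigree, MersenneTwister(1))
```

Because founder alleles carry unique labels, the result shows which founder copy each descendant inherited; mapping those labels onto real haplotypes is a separate step.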

Output

  • GBS fragments generated by virtual digestion (e.g., rawGBStags.txt)

  • Selected GBS fragments after fragment size-selection (e.g., GBStags.txt)

  • Haplotypes, SNP and QTL genotypes (e.g., hap.txt, snpGeno.txt and qtlGeno.txt)

  • Basic information about the simulated GBS experiment (e.g., keyFile.txt)

  • Simulated GBS reads in FASTQ format (e.g., xxxxx.fastq)

etc.
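For readers unfamiliar with FASTQ, each read is stored as four lines: a header starting with `@`, the sequence, a `+` separator, and a quality string of the same length as the sequence. The sketch below formats one such record in plain Julia. It is only an illustration: the function name `fastq_record`, the read-naming scheme, the barcode-first read layout, and the constant quality character are assumptions, not SimGBS.jl's actual output routine.

```julia
# Illustrative only: format one simulated read as a four-line FASTQ record.
# The barcode is placed at the start of the read, as is common in GBS libraries.
function fastq_record(name::AbstractString, barcode::AbstractString,
                      fragment::AbstractString, read_length::Int;
                      quality::Char = 'I')   # 'I' encodes Phred 40 in Phred+33
    read = barcode * fragment
    read = read[1:min(read_length, length(read))]   # fixed number of sequencing cycles
    return string("@", name, "\n", read, "\n+\n", quality^length(read), "\n")
end

record = fastq_record("sample1_read1", "ACGT", "CAGCTTTTTTG", 10)
print(record)
```

A more realistic simulator would draw per-base quality scores and introduce sequencing errors rather than using a constant quality character.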

Overview

For more information, please visit the documentation page.

Citation

Please cite the following if you use SimGBS.jl,

What's Next?

The following tools are recommended for downstream analyses of GBS data:

  • snpGBS: a simple bioinformatics workflow to identify single nucleotide polymorphism (SNP) from Genotyping-by-Sequencing (GBS) data.

  • KGD: R code for the analysis of genotyping-by-sequencing (GBS) data, primarily to construct a genomic relationship matrix for the genotyped individuals.

  • GUSLD: An R package for estimating linkage disequilibrium using low and/or high coverage sequencing data without requiring filtering with respect to read depth.

  • SMAP: a software package that analyzes read mapping distributions and performs haplotype calling to create multi-allelic molecular markers.