A FASTQ read simulator


XS is a FASTQ read simulator that is flexible, portable (it does not need a reference sequence) and tunable in terms of sequence complexity. It simulates reads for four sequencing types: Ion Torrent, Roche-454, Illumina and ABI-SOLiD. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing for large-scale projects, and at testing FASTQ compression algorithms. XS can also simulate the three main FASTQ components individually (headers, DNA sequences and quality scores). Quality scores can be simulated using uniform or Gaussian distributions.


git clone
cd XS/

Run XS

Run XS:

./XS [parameters] [outFile]


To see the possible options type

./XS -h

For additional help:


System options:

 -h                       give this help -> show possible parameters;

 -v                       verbose mode -> show more information;


Main FASTQ options:

 -t  <sequencingType>     type: 1=Roche-454, 2=Illumina, 3=ABI SOLiD, 4=Ion 
                          Torrent -> the four main types of sequencing 
                          supported by XS, the default is option 1;

 -hf <headerFormat>       header format: 1=Length appendix, 2=Pair End -> uses
                          a paired-end format or a format with an appendix
                          reporting the size of the read, such as: "length=80"
                          (only for sequencing types 2 and 3);

 -i  n=<instrumentName>   the unique instrument name (use n= before name) -> 
                          this name appears in the read;

 -o                       use the same header in third line of the read ->
                          some FASTQ files have an optional header which is a 
                          copy of the real header;

 -ls <lineSize>           static line (bases/quality scores) size -> the size
                          of each DNA bases line and quality scores line. This
                          option assumes a fixed size;

 -ld <minSize>:<maxSize>  dynamic line (bases/quality scores) size -> this 
                          sets a dynamic line size ranging between a minimum
                          and a maximum, such as "-ld 35:80";

 -n  <numberOfReads>      number of reads per file -> example: "-n 10000";
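
The difference between the static (-ls) and dynamic (-ld) line size can be
pictured with a small Python sketch. This is not XS code: the function names
are hypothetical, and drawing the length uniformly between the minimum and the
maximum is an assumption, since the text above only says that the size ranges
between the two bounds.

```python
import random


def parse_dynamic(spec):
    """Parse a "<minSize>:<maxSize>" string such as "35:80"."""
    low, high = spec.split(":")
    return int(low), int(high)


def read_length(rng, static_size=None, dynamic_range=None):
    """Return the length of one simulated read.

    static_size mimics "-ls <lineSize>" and dynamic_range mimics
    "-ld <minSize>:<maxSize>". The uniform draw is an assumption.
    """
    if dynamic_range is not None:
        low, high = dynamic_range
        if low > high:
            raise ValueError("minSize must not exceed maxSize")
        return rng.randint(low, high)  # inclusive on both ends
    if static_size is None:
        raise ValueError("give either static_size or dynamic_range")
    return static_size


rng = random.Random(7)
# Every read gets its own length, as with "-ld 35:80".
lengths = [read_length(rng, dynamic_range=parse_dynamic("35:80"))
           for _ in range(1000)]
# Every read has the same length, as with "-ls 100".
fixed = read_length(rng, static_size=100)
```

With a dynamic range the reads differ in length but never leave the requested
interval; with a static size every read has the same length.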


DNA options:

 -f  <A>,<C>,<G>,<T>,<N>  symbols frequency -> this sets the nucleotide 
                          distribution: A for Adenine, C for Cytosine, G for 
                          Guanine, T for Thymine and N for any symbol.
                          Example for the overall human DNA:
                          "-f 0.29,0.19,0.19,0.29,0.04";

 -rn <numberOfRepeats>    repeats: number (default: 0) -> number of copies
                          (read article for more detailed explanation);

 -ri <repeatsMinSize>     repeats: minimum size -> minimum size of the 
                          repeats, such as: "-ri 300" (average minimum size of
                          transposable elements, TEs);

 -ra <repeatsMaxSize>     repeats: maximum size -> maximum size of the 
                          repeats. Example: "-ra 3000" (average maximum size 
                          of transposable elements, TEs);

 -rm <mutationRate>       repeats: mutation frequency -> mutation rate.
                          Example: "-rm 0.01" means that each base has a
                          uniform 1 in 100 chance of being mutated;

 -rr                      repeats: use reverse complement repeats (also known 
                          as inverted repeats). Use: "-rr". For example: the 
                          string "ACGTA" will be reverse complemented to 
                          "TACGT";
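
The DNA options can be illustrated with a short Python sketch. It is not the
XS implementation: the function names are hypothetical, and treating a
mutation as a substitution by a different base is an assumption (the article
cited below is the reference for the actual repeat model). The sketch only
shows how a frequency vector (-f), a mutation rate (-rm) and the reverse
complement (-rr) fit together.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def sample_bases(rng, freqs, length):
    """Draw `length` symbols from A, C, G, T, N with the given weights."""
    return "".join(rng.choices("ACGTN", weights=freqs, k=length))


def mutate(rng, seq, rate):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Assumption: a mutation always yields a different nucleotide.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq):
    """Reverse the sequence and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


rng = random.Random(42)
# Frequencies in the style of "-f 0.3,0.2,0.2,0.3,0.0" (no N symbols).
repeat = sample_bases(rng, [0.3, 0.2, 0.2, 0.3, 0.0], 300)
# A mutated copy of the repeat, in the style of "-rm 0.1".
copy = mutate(rng, repeat, 0.1)
# An inverted copy, in the style of "-rr".
inverted = reverse_complement(copy)

print(reverse_complement("ACGTA"))  # TACGT
```

Applying the reverse complement twice gives back the original string, which is
a quick sanity check for any implementation of inverted repeats.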


Quality scores options:

 -qt <assignmentType>     quality scores distribution: 1=uniform, 2=gaussian. 
                          There are two possible quality score distributions:
                          uniform (each symbol has equal probability) and
                          normal (Gaussian). Example: "-qt 1" will use a 
                          uniform distribution;

 -qf <statsFile>          load file: mean, standard deviation (when: -qt 2).
                          For a normal distribution there is the option of 
                          loading a file with the mean and standard deviation:
                          "-qf FILE", vi FILE:
                          If this option is not used, an array with both
                          (mean and stdev in init.h) will be used;

 -qc <template>           custom template ASCII alphabet -> it is possible to 
                          use custom quality scores, such as: 
                          "-qc 33:36,55,57:59" will use the ASCII values:

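As an illustration of the quality scores options, the Python sketch below
expands a -qc style template and draws quality symbols from it. It is not XS
code: the function names are hypothetical, reading "a:b" as an inclusive range
is an assumption, and so are the single mean/stdev pair and the clamping to
printable ASCII in the Gaussian case.

```python
import random


def expand_template(template):
    """Expand a template such as "33:36,55,57:59" into ASCII codes.

    Assumption: "a:b" denotes the inclusive range a..b.
    """
    values = []
    for part in template.split(","):
        if ":" in part:
            first, last = (int(x) for x in part.split(":"))
            values.extend(range(first, last + 1))
        else:
            values.append(int(part))
    return values


def uniform_qualities(rng, alphabet, length):
    """Each code of the alphabet is equally likely (as with "-qt 1")."""
    return "".join(chr(rng.choice(alphabet)) for _ in range(length))


def gaussian_qualities(rng, mean, stdev, length, low=33, high=126):
    """Draw rounded normal values (as with "-qt 2"), clamped to low..high."""
    chars = []
    for _ in range(length):
        value = round(rng.gauss(mean, stdev))
        chars.append(chr(min(high, max(low, value))))
    return "".join(chars)


rng = random.Random(1)
alphabet = expand_template("33:36,55,57:59")
uniform_line = uniform_qualities(rng, alphabet, 50)
gaussian_line = gaussian_qualities(rng, mean=60, stdev=5, length=50)
```

Under the inclusive-range assumption, the template "33:36,55,57:59" expands to
the codes 33, 34, 35, 36, 55, 57, 58 and 59.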

Filtering options:

 -eh                      excludes the use of headers from output -> The 
                          output FASTQ file will omit this data source;

 -eo                      excludes the use of optional headers (+) from output

 -ed                      excludes the use of DNA bases from output -> The
                          output FASTQ file will omit this data source;

 -edb                     excludes '\n' when DNA bases line size is reached ->
                          this is normally used to facilitate the output
                          analysis, avoiding the need for post-processing;

 -es                      excludes the use of quality scores from output ->
                          The output FASTQ file will omit this data source;


Stochastic options:

 -s  <seed>               generation seed -> a fixed seed with fixed 
                          parameters will generate the same output data.
                          However, a different seed with the same parameters
                          will generate different output data, but with the
                          same distribution;
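
The effect of the seed can be pictured with a short Python sketch. It uses
Python's own random generator, not the one in XS, and the function name is
hypothetical; it only illustrates the behaviour described above: the same seed
with the same parameters reproduces the data, while another seed gives
different data drawn from the same distribution.

```python
import random


def simulate_bases(seed, length, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Generate a base string from a dedicated, seeded generator."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", weights=freqs, k=length))


same_a = simulate_bases(seed=1, length=40)
same_b = simulate_bases(seed=1, length=40)
other = simulate_bases(seed=2, length=40)

print(same_a == same_b)  # True: fixed seed and parameters, identical output
```

Keeping the generator local to the function, instead of seeding a global one,
makes each run reproducible regardless of what other code has drawn before.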


<genFile>                 simulated output file -> FASTQ simulated file;


Common usage:

 ./XS -v -t 1 -i n=MySeq -ld 30:80 -n 20000 -qt 1 -qc 33,36,39:43 File
 (Common FASTQ file simulation)

 ./XS -v -ls 100 -n 10000 -eh -eo -es -edb -f 0.3,0.2,0.2,0.3,0.0 -rn 50 -ri 300 -ra 3000 -rm 0.1 File
 (Transposable Elements simulation example)


If you use this software, please cite:

Pratas, D., Pinho, A. J., & Rodrigues, J. M. R. (2014). XS: a FASTQ read simulator. BMC Research Notes, 7(1), 40.


To report any issue, please use the repository's issue tracker.


GPL v3.

For more information: