Jared Simpson edited this page Dec 4, 2013 · 9 revisions

sga includes a quality control and data exploration module, preqc. It estimates sequence coverage, per-base error rates, genome size, heterozygosity and repeat content. Running it on your data is highly recommended, because the results indicate how difficult the assembly will be. Once you have produced the preqc PDF report, feel free to share it on the sga-users mailing list and ask for advice on how best to proceed with the assembly.

A full description can be found in this announcement post on the sga mailing list:


A preprint of the preqc manuscript is available on arXiv:


To generate a preqc report for your data, run these four commands:

sga preprocess --pe-mode 1 reads_R1.fastq reads_R2.fastq > mygenome.fastq
sga index -a ropebwt --no-reverse -t 8 mygenome.fastq
sga preqc -t 8 mygenome.fastq > mygenome.preqc
sga-preqc-report.py mygenome.preqc sga/src/examples/*.preqc
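
The four commands above can be parameterized for reuse across samples. The following is only a sketch: the function name `preqc_commands` and its arguments (read files, output prefix, thread count) are illustrative and not part of sga. It prints the commands rather than running them, so the output can be inspected first; it uses only the options already shown on this page.

```shell
#!/bin/sh
# Dry run: print the four preqc commands for one sample; nothing is executed.
# preqc_commands and its arguments are hypothetical helpers, not part of sga.
preqc_commands() {
    r1=$1            # first-in-pair reads
    r2=$2            # second-in-pair reads
    prefix=$3        # output prefix, e.g. mygenome
    threads=${4:-8}  # thread count passed to -t (defaults to 8, as above)

    printf 'sga preprocess --pe-mode 1 %s %s > %s.fastq\n' "$r1" "$r2" "$prefix"
    printf 'sga index -a ropebwt --no-reverse -t %s %s.fastq\n' "$threads" "$prefix"
    printf 'sga preqc -t %s %s.fastq > %s.preqc\n' "$threads" "$prefix" "$prefix"
    # The glob is printed literally here; the shell expands it when the line is run.
    printf 'sga-preqc-report.py %s.preqc sga/src/examples/*.preqc\n' "$prefix"
}

preqc_commands reads_R1.fastq reads_R2.fastq mygenome 8
```

Once the printed lines look right, they can be saved to a file and run with `sh`, assuming `sga` and `sga-preqc-report.py` are on your `PATH`.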