sga comes with a quality control and data exploration module, preqc. The module estimates sequence coverage, per-base error rates, genome size, heterozygosity, and repeat content. Running it on your data before assembly is highly recommended, because the report gives a sense of how difficult the assembly will be. Once you have produced the preqc PDF report, feel free to share it on the sga-users mailing list and ask for advice on how best to proceed with the assembly.
A full description can be found in this announcement post on the sga mailing list:
A preprint of the preqc manuscript is available on arXiv:
To generate a preqc report for your data, run these four commands:
    sga preprocess --pe-mode 1 reads_R1.fastq reads_R2.fastq > mygenome.fastq
    sga index -a ropebwt --no-reverse -t 8 mygenome.fastq
    sga preqc -t 8 mygenome.fastq > mygenome.preqc
    sga-preqc-report.py mygenome.preqc sga/src/examples/*.preqc
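If you run preqc on several read sets, the four commands above can be wrapped in a small shell function. The sketch below is one possible way to do that and is not part of sga: the function name `preqc_pipeline`, the `DRY_RUN` switch, and the `EXAMPLES_DIR` variable are hypothetical names introduced for illustration. The sga commands and flags are copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with sga) around the four preqc commands.
# Usage: preqc_pipeline READS_R1 READS_R2 NAME [THREADS]
# Set DRY_RUN=1 to print the commands instead of running them.
# EXAMPLES_DIR is an assumed variable pointing at the example .preqc files.

preqc_pipeline() {
    r1=$1
    r2=$2
    name=$3
    threads=${4:-8}                          # default of 8 threads, as in the commands above
    examples=${EXAMPLES_DIR:-sga/src/examples}

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print each command exactly as it would be run.
        echo "sga preprocess --pe-mode 1 $r1 $r2 > $name.fastq"
        echo "sga index -a ropebwt --no-reverse -t $threads $name.fastq"
        echo "sga preqc -t $threads $name.fastq > $name.preqc"
        echo "sga-preqc-report.py $name.preqc $examples/*.preqc"
        return 0
    fi

    # Stop at the first failing step so later steps never see partial output.
    sga preprocess --pe-mode 1 "$r1" "$r2" > "$name.fastq" &&
    sga index -a ropebwt --no-reverse -t "$threads" "$name.fastq" &&
    sga preqc -t "$threads" "$name.fastq" > "$name.preqc" &&
    sga-preqc-report.py "$name.preqc" "$examples"/*.preqc
}
```

Calling `DRY_RUN=1 preqc_pipeline reads_R1.fastq reads_R2.fastq mygenome` prints the four commands without running anything, which is a quick way to check file names before starting a long job. Chaining the steps with `&&` is a design choice here: a failure in `sga preprocess` or `sga index` stops the pipeline instead of producing a report from incomplete data.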