Jared Simpson edited this page Nov 25, 2013 · 17 revisions


SGA is a de novo assembler designed to assemble large genomes from high-coverage short read data. It is built as a modular set of programs that are combined to form an assembly pipeline. A description of the SGA design is found here.

Data exploration and quality control

It is highly recommended that you run the 'preqc' module on your data prior to assembly. This module will give you information about the read quality and genome characteristics. See this page for more information.

First steps

The source directory contains examples of real assemblies using SGA. You should read these scripts or, better, download the data for one of the smaller genomes (I recommend the C. elegans data set) and run the example yourself. This will help you understand the SGA pipeline so you can run the assembler effectively on your own data.


It is highly recommended that you read the SGA FAQ page.

Further help

There is a mailing list for SGA on Google Groups.