SGA is a de novo assembler designed to assemble large genomes from high-coverage short-read data. It is built as a modular set of programs that are combined to form an assembly pipeline. A description of the SGA design is found here.
Data exploration and quality control
It is highly recommended that you run the 'preqc' module on your data prior to assembly. This module will give you information about the read quality and genome characteristics. See this page for more information.
The source directory contains examples of real assemblies using SGA. You should read these scripts or (better) download the data for one of the smaller genomes (I recommend the C. elegans data set) and run the example yourself. This will help you understand the SGA pipeline so you can run the assembler effectively on your own data.
It is highly recommended that you read the SGA FAQ page.
There is a mailing list for SGA on Google Groups.