
ASQG Format

Jared Simpson edited this page Mar 5, 2014 · 12 revisions

The ASQG format describes an assembly graph. Each line is a tab-delimited record. The first field in each record describes the record type. The three types are:

  1. HT - Header record. This record contains metadata tags for the file version (VN tag) and parameters associated with the graph (for example the minimum overlap length).
  2. VT - Vertex records. The second field contains the vertex identifier, the third field contains the sequence. Subsequent fields contain optional tags.
  3. ED - Edge description records. The second field describes a pair of overlapping sequences. A full description of this field is below. Subsequent fields contain optional tags.
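The three record types above can be told apart by splitting each line on tabs and inspecting the first field. The following is a minimal sketch, not the reference parser; the function names `parse_record` and `read_asqg` are hypothetical, and the sketch assumes blank lines may be skipped and that unknown record types should be rejected.

```python
def parse_record(line):
    """Split one tab-delimited ASQG line into (record_type, remaining_fields)."""
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0]
    if record_type not in ("HT", "VT", "ED"):
        # Assumption: anything other than the three documented types is an error.
        raise ValueError(f"unknown ASQG record type: {record_type!r}")
    return record_type, fields[1:]


def read_asqg(lines):
    """Collect header tags, vertex sequences, and raw edge descriptions."""
    header_tags = []   # raw tag strings from HT records
    vertices = {}      # vertex identifier -> sequence
    edges = []         # raw second field of each ED record
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines carry no information
        record_type, fields = parse_record(line)
        if record_type == "HT":
            header_tags.extend(fields)
        elif record_type == "VT":
            vertex_id, sequence = fields[0], fields[1]
            vertices[vertex_id] = sequence  # optional tags in fields[2:] are ignored here
        else:  # "ED"
            edges.append(fields[0])         # optional tags in fields[1:] are ignored here
    return header_tags, vertices, edges
```

Because `read_asqg` accepts any iterable of lines, it can be called on an open file handle or on a list of strings.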

Tags

Tags follow the same TAG:TYPE:VALUE format as optional fields in SAM, for example OL:i:45.
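SAM optional fields are written as TAG:TYPE:VALUE, where the type code `i` marks an integer, `f` a float, and `Z` a string. A minimal sketch of decoding one such tag is shown here; the helper name `parse_tag` is hypothetical, and only these three type codes are converted (any other code is returned as an unconverted string).

```python
def parse_tag(tag):
    """Decode one SAM-style tag such as 'OL:i:45' into (name, value)."""
    # Split at most twice so that a string value may itself contain colons.
    name, type_code, value = tag.split(":", 2)
    if type_code == "i":
        return name, int(value)
    if type_code == "f":
        return name, float(value)
    # 'Z' (string) and any type code not handled above: keep the raw text.
    return name, value


header = "HT\tVN:i:1\tER:f:0\tOL:i:45\tIN:Z:reads.fa\tCN:i:1\tTE:i:0"
tags = dict(parse_tag(t) for t in header.split("\t")[1:])
print(tags["OL"])  # 45
```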

Example

HT	VN:i:1	ER:f:0	OL:i:45	IN:Z:reads.fa	CN:i:1	TE:i:0
VT	read1	GATCGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGG
VT	read2	CGATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATA
VT	read3	ATCTAGCTAGCTAGCTAGCTAGTTAGATGCATGCATGCTAGCTGGATATT
ED	read2 read1 0 46 50 3 49 50 0 0
ED	read3 read2 0 47 50 2 49 50 0 0

Edge descriptions

The second field of an ED record describes an overlap between a pair of sequences. This field contains 10 space-separated elements:

  1. sequence 1 name
  2. sequence 2 name
  3. sequence 1 overlap start (0-based)
  4. sequence 1 overlap end (inclusive)
  5. sequence 1 length
  6. sequence 2 overlap start (0-based)
  7. sequence 2 overlap end (inclusive)
  8. sequence 2 length
  9. sequence 2 orientation (0 for the same orientation as sequence 1, 1 for reversed with respect to sequence 1)
  10. number of differences in overlap (0 for perfect overlaps, which is the default).
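The ten elements listed above can be unpacked as follows. This is a sketch only: the `Overlap` class, its field names, and `parse_overlap` are hypothetical, and the elements are assumed to be separated by whitespace as in the example records. Since the start and end coordinates are 0-based and inclusive, the overlap length on each sequence is `end - start + 1`.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    name1: str
    name2: str
    start1: int       # 0-based start of the overlap on sequence 1
    end1: int         # inclusive end of the overlap on sequence 1
    length1: int      # full length of sequence 1
    start2: int
    end2: int
    length2: int
    reversed2: bool   # True if sequence 2 is reversed relative to sequence 1
    differences: int  # number of differences in the overlap

    def overlap_lengths(self):
        """Overlap length on each sequence (coordinates are inclusive)."""
        return self.end1 - self.start1 + 1, self.end2 - self.start2 + 1


def parse_overlap(field):
    """Parse the second field of an ED record into an Overlap."""
    parts = field.split()
    if len(parts) != 10:
        raise ValueError(f"expected 10 elements, got {len(parts)}")
    s1, e1, l1, s2, e2, l2, orientation, diffs = (int(p) for p in parts[2:])
    return Overlap(parts[0], parts[1], s1, e1, l1, s2, e2, l2,
                   orientation == 1, diffs)


ov = parse_overlap("read2 read1 0 46 50 3 49 50 0 0")
print(ov.overlap_lengths())  # (47, 47)
```

For the first ED record in the example, positions 0 to 46 of read2 align with positions 3 to 49 of read1, so 47 bases overlap on each read, above the minimum overlap of 45 given by the OL tag in the header.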