- Learning Objective
This tutorial introduces you to the graph data structures that can be used to represent an alignment in SeqAn. You will learn basic techniques to create and modify such data structures and how to access certain information from these data structures.
- Difficulty
Basic
- Duration
15 min
- Prerequisites
tutorial-getting-started-first-steps-in-seqan, tutorial-datastructures-sequences
Another very useful representation of alignments is the Alignment Graph (AlignmentGraph). It is a graph in which each vertex corresponds to a sequence segment and each edge indicates an ungapped alignment between the connected vertices, or more precisely between the sequence segments stored in those vertices. Here is an example of such a graph:
In the following we will construct this example step by step. First we include the iostream header from the C++ standard library and the <seqan/align.h> header, which provides the functions and data structures we want to use. We use the namespace seqan and write the main function with an empty body.
demos/tutorial/alignment/graph.cpp
At the beginning of the function we define the types we want to use later on. We define TSequence as the type of our input strings. Since we work with a Dna alphabet, we define TSequence as a String over a Dna alphabet. For the AlignmentGraph we need two StringSets. The TStringSet is used to actually store the input sequences, and the TDepStringSet is used internally by the AlignmentGraph. That is, the AlignmentGraph does not copy the sources into its data structure but rather stores a reference to each of the given input strings, as it does not modify the input sequences. The Dependent StringSet (DependentStringSet) facilitates this behavior. In the end we define the actual AlignmentGraph type.
demos/tutorial/alignment/graph.cpp
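The difference between a set that owns its sequences and one that only refers to them can be illustrated without SeqAn. The following is a minimal sketch in plain C++; OwningSet and DependentView are hypothetical names chosen for this illustration and are not SeqAn types. It only demonstrates the idea of referring to existing strings instead of copying them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Owning container: holds its own copies of the sequences.
using OwningSet = std::vector<std::string>;

// Hypothetical non-owning view: stores pointers to sequences that live
// elsewhere. This is an illustration, not SeqAn's Dependent StringSet.
class DependentView
{
public:
    explicit DependentView(const OwningSet & source)
    {
        for (const std::string & s : source)
            refs_.push_back(&s);  // remember the address, do not copy the text
    }

    std::size_t size() const { return refs_.size(); }

    const std::string & operator[](std::size_t i) const { return *refs_[i]; }

    // True if entry i is the very same object as s, i.e. no copy was made.
    bool refersTo(std::size_t i, const std::string & s) const
    {
        return refs_[i] == &s;
    }

private:
    std::vector<const std::string *> refs_;
};
```

In this sketch the view stays valid only as long as the owning container is neither destroyed nor reallocated, which is the usual price of referring to data instead of copying it.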
We first create our two input sequences TTGT and TTAGT, append them to the StringSet strings using the appendValue function (StringConcept#appendValue), and pass the initialized strings object as a parameter to the constructor of the AlignmentGraph alignG.
demos/tutorial/alignment/graph.cpp
Before any vertices are added to the graph, printing it outputs an empty adjacency list and an empty edge list.
demos/tutorial/alignment/graph.cpp.stdout
Before we construct the alignment, we print the unmodified AlignmentGraph. Then we add some alignment information to the graph. To add an ungapped alignment segment, we have to add an edge connecting two vertices of different input sequences. To do so, we use the function Graph#addEdge and specify the two vertices that should be connected. Since we do not have any vertices yet, we create them on the fly using the function addVertex() (Graph#addVertex). The second parameter of addVertex is the id that identifies the correct input sequence within the strings object. We can use the function positionToId() (StringSet#positionToId) to obtain the id that corresponds to a certain position within the underlying Dependent StringSet of the AlignmentGraph. We can access the Dependent StringSet using the function stringSet() (Align#stringSet). The third parameter of addVertex specifies the begin position of the segment within the respective input sequence, and the fourth parameter specifies its length. Now we add an edge between the two vertices, one from each input sequence, that cover the first two positions. In the next step we have to add a gap. We can do this simply by adding a vertex that covers the inserted string. Finally we add the second edge to represent the last ungapped segment, and then we print the constructed alignment.
Note that we use findVertex() (AlignmentGraph#findVertex) to find the last two inserted vertices. The syntax is the same as for addVertex() (Graph#addVertex), but without the length parameter.
demos/tutorial/alignment/graph.cpp
Now printing the graph outputs the desired alignment.
demos/tutorial/alignment/graph.cpp.stdout
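The construction steps described above can be modelled in plain C++ to show how segments, edges, and an unconnected vertex together encode a gapped alignment. This is a sketch under stated assumptions, not SeqAn code: ToyAlignmentGraph, render, and buildExample are hypothetical names, the model handles only two sequences, and findVertex is assumed to return the vertex that covers a given position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-C++ model; names mirror the tutorial but this is not SeqAn.
struct Segment { std::size_t seqId, begin, length; };

const std::size_t kNone = static_cast<std::size_t>(-1);

struct ToyAlignmentGraph
{
    std::vector<std::string> seqs;     // input sequences
    std::vector<Segment> vertices;     // one vertex per sequence segment
    std::vector<std::size_t> partner;  // partner[v]: vertex aligned to v, or kNone

    std::size_t addVertex(std::size_t seqId, std::size_t begin, std::size_t length)
    {
        assert(seqId < seqs.size() && begin + length <= seqs[seqId].size());
        vertices.push_back(Segment{seqId, begin, length});
        partner.push_back(kNone);
        return vertices.size() - 1;
    }

    // An edge is an ungapped alignment, so both segments must have equal length.
    void addEdge(std::size_t u, std::size_t v)
    {
        assert(vertices[u].seqId != vertices[v].seqId);
        assert(vertices[u].length == vertices[v].length);
        partner[u] = v;
        partner[v] = u;
    }

    // Return the vertex of sequence seqId that covers position pos, or kNone.
    std::size_t findVertex(std::size_t seqId, std::size_t pos) const
    {
        for (std::size_t v = 0; v < vertices.size(); ++v)
        {
            const Segment & s = vertices[v];
            if (s.seqId == seqId && s.begin <= pos && pos < s.begin + s.length)
                return v;
        }
        return kNone;
    }

    std::string text(std::size_t v) const
    {
        return seqs[vertices[v].seqId].substr(vertices[v].begin, vertices[v].length);
    }
};

// Render both alignment rows. Assumes exactly two sequences, vertices that
// cover both sequences completely without overlap, and edges that do not cross.
std::pair<std::string, std::string> render(const ToyAlignmentGraph & g)
{
    std::string row0, row1;
    std::size_t p0 = 0, p1 = 0;
    while (p0 < g.seqs[0].size() || p1 < g.seqs[1].size())
    {
        std::size_t v0 = p0 < g.seqs[0].size() ? g.findVertex(0, p0) : kNone;
        std::size_t v1 = p1 < g.seqs[1].size() ? g.findVertex(1, p1) : kNone;
        if (v1 != kNone && g.partner[v1] == kNone)
        {   // unaligned segment in sequence 1: gap characters in row 0
            row0.append(g.vertices[v1].length, '-');
            row1 += g.text(v1);
            p1 += g.vertices[v1].length;
        }
        else if (v0 != kNone && g.partner[v0] == kNone)
        {   // unaligned segment in sequence 0: gap characters in row 1
            row0 += g.text(v0);
            row1.append(g.vertices[v0].length, '-');
            p0 += g.vertices[v0].length;
        }
        else
        {   // aligned pair of segments
            assert(v0 != kNone && g.partner[v0] == v1);
            row0 += g.text(v0);
            row1 += g.text(v1);
            p0 += g.vertices[v0].length;
            p1 += g.vertices[v1].length;
        }
    }
    return std::make_pair(row0, row1);
}

// Build the TTGT / TTAGT example following the steps in the text.
ToyAlignmentGraph buildExample()
{
    ToyAlignmentGraph g;
    g.seqs = {"TTGT", "TTAGT"};
    std::size_t a = g.addVertex(0, 0, 2);  // "TT" in the first sequence
    std::size_t b = g.addVertex(1, 0, 2);  // "TT" in the second sequence
    g.addEdge(a, b);
    g.addVertex(1, 2, 1);                  // "A" has no partner: a gap
    g.addVertex(0, 2, 2);                  // "GT" in the first sequence
    g.addVertex(1, 3, 2);                  // "GT" in the second sequence
    g.addEdge(g.findVertex(0, 2), g.findVertex(1, 3));
    return g;
}
```

Calling render(buildExample()) yields the rows TT-GT and TTAGT. The vertex covering A has no edge, so the model emits a gap character opposite it, which corresponds to the gap added in the tutorial by inserting a vertex without connecting it.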
The general usage of graphs is explained in the tutorial-datastructures-graphs tutorial.
- Type
Review
- Objective
Construct a multiple sequence alignment using the Alignment Graph data structure. Use the three sequences GARFIELDTHECAT, GARFIELDTHEBIGCAT and THEBIGCAT and align them such that you obtain the maximal number of matches.
- Hints
- Solution
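One possible way to prepare a solution is to check the segment coordinates before adding any vertices. The helper below is a sketch in plain C++ and not part of SeqAn; Segment and spellSameText are hypothetical names. It tests whether a group of segments, each given as sequence index, begin position, and length, spell the same text, so that connecting them with edges would only produce matches.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One candidate vertex: a segment of one input sequence (hypothetical helper type).
struct Segment
{
    std::size_t seqId;   // index of the input sequence
    std::size_t begin;   // begin position within that sequence
    std::size_t length;  // number of characters covered
};

// Return true if every segment lies inside its sequence and all segments
// spell exactly the same text.
bool spellSameText(const std::vector<std::string> & seqs,
                   const std::vector<Segment> & group)
{
    if (group.empty())
        return false;

    std::string expected;
    for (std::size_t i = 0; i < group.size(); ++i)
    {
        const Segment & s = group[i];
        if (s.seqId >= seqs.size() || s.begin + s.length > seqs[s.seqId].size())
            return false;  // segment does not fit into its sequence

        const std::string text = seqs[s.seqId].substr(s.begin, s.length);
        if (i == 0)
            expected = text;       // the first segment defines the reference text
        else if (text != expected)
            return false;          // a mismatch would be introduced
    }
    return true;
}
```

For the three sequences of this assignment, the group (0, 8, 3), (1, 8, 3), (2, 0, 3) passes the check because each segment spells THE, whereas (0, 11, 3), (1, 11, 3) fails because it would pair CAT with BIG.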