
Alignment Representation (Graph)

Learning Objective

This tutorial introduces you to the graph data structures that can be used to represent an alignment in SeqAn. You will learn basic techniques to create and modify such data structures and how to access certain information from these data structures.

Difficulty

Basic

Duration

15 min

Prerequisites

tutorial-getting-started-first-steps-in-seqan, tutorial-datastructures-sequences


Another very useful representation of alignments is the Alignment Graph. It is a graph in which each vertex corresponds to a sequence segment, and each edge indicates an ungapped alignment between the connected vertices, or more precisely between the sequence segments those vertices represent. Here is an example of such a graph:

[Figure: example Alignment Graph for the sequences TTGT and TTAGT]
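To make the vertex and edge semantics concrete, here is a minimal sketch in plain C++ that models an alignment graph without SeqAn. The types Segment and Edge and the function isValidUngappedEdge() are hypothetical illustrations, not part of SeqAn's API; the sketch only assumes that a vertex is described by a sequence id, a begin position and a length, as in the rest of this tutorial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not SeqAn's API): a vertex is a segment of one input
// sequence, identified by the sequence id, a begin position and a length.
struct Segment {
    std::size_t seqId;
    std::size_t begin;
    std::size_t length;
};

// An edge states that two segments are aligned without gaps.
struct Edge {
    std::size_t u;  // index into the vertex list
    std::size_t v;  // index into the vertex list
};

// An edge only makes sense as an ungapped alignment if it joins two segments
// of equal length that belong to different input sequences.
bool isValidUngappedEdge(const std::vector<Segment>& vertices, const Edge& e) {
    const Segment& a = vertices.at(e.u);
    const Segment& b = vertices.at(e.v);
    return a.seqId != b.seqId && a.length == b.length;
}
```

For the example graph, an edge between the TT segments of both sequences is valid, while an edge between two segments of the same sequence is not.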

In the following we construct this example step by step. First we include the iostream header from the STL and the <seqan/align.h> header to get all the functions and data structures we need. We use the namespace seqan and write the main function with an empty body.

demos/tutorial/alignment/graph.cpp

At the beginning of the function we define the types we want to use later on. TSequence is the type of our input strings; since we work with DNA, we define it as a String over the Dna alphabet. The AlignmentGraph needs two StringSets. The TStringSet actually stores the input sequences, while the TDepStringSet is used internally by the AlignmentGraph. That is, the AlignmentGraph does not copy the sources into its data structure; since it does not modify the input sequences, it stores a reference to each of them instead. The Dependent StringSet provides this behavior. Finally, we define the actual AlignmentGraph type.

demos/tutorial/alignment/graph.cpp
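The difference between an owning StringSet and a Dependent StringSet can be sketched with the standard library alone. OwningSet, DependentSet and makeDependentView() are hypothetical names; this models the idea of storing pointers instead of copies and is not how SeqAn implements Dependent StringSets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration (not SeqAn's implementation): an owning set
// stores copies of its strings.
using OwningSet = std::vector<std::string>;

// A "dependent" set only stores pointers to strings that live elsewhere,
// so nothing is copied.
struct DependentSet {
    std::vector<const std::string*> refs;

    void append(const std::string& s) { refs.push_back(&s); }
    const std::string& operator[](std::size_t i) const { return *refs[i]; }
    std::size_t size() const { return refs.size(); }
};

// Build a dependent view over every string of an owning set. The owning set
// must outlive the view and must not reallocate while the view is in use.
DependentSet makeDependentView(const OwningSet& owner) {
    DependentSet view;
    for (const std::string& s : owner) view.append(s);
    return view;
}
```

The trade-off is the usual one for views: no copies are made, but the strings in the owning set must stay alive and at a fixed address for as long as the dependent set refers to them.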

We first create our two input sequences TTGT and TTAGT, append them to the StringSet strings using the function appendValue(), and pass the initialized strings object as a parameter to the constructor of the AlignmentGraph alignG.

demos/tutorial/alignment/graph.cpp
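The function positionToId(), used further below, maps a position within a Dependent StringSet to the id of the string stored there. The following sketch models why the two can differ, under the assumption that each appended string receives a fixed id in append order while its position shifts when an earlier string is removed. IdStringSet and its members are hypothetical and do not reproduce SeqAn's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: every appended string gets a fresh id that never
// changes, while its position shifts when an earlier string is removed.
class IdStringSet {
public:
    std::size_t append(const std::string& s) {
        strings_.push_back(&s);
        ids_.push_back(nextId_);
        return nextId_++;
    }

    // Analogue of positionToId(): the id of the string at position pos.
    std::size_t positionToId(std::size_t pos) const { return ids_.at(pos); }

    void removeAt(std::size_t pos) {
        strings_.erase(strings_.begin() + static_cast<std::ptrdiff_t>(pos));
        ids_.erase(ids_.begin() + static_cast<std::ptrdiff_t>(pos));
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::vector<const std::string*> strings_;  // references, no copies
    std::vector<std::size_t> ids_;             // ids_[pos] is the id at pos
    std::size_t nextId_ = 0;
};
```

Looking the id up through the position, rather than assuming they are equal, keeps code correct under this model even after strings have been removed.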

Printing alignG before any vertices are added shows the empty adjacency and edge list.

demos/tutorial/alignment/graph.cpp.stdout

Before we construct the alignment we print the unmodified AlignmentGraph. Then we add some alignment information to the graph. To add an ungapped alignment segment, we add an edge connecting two vertices of different input sequences using the function addEdge(), specifying the two vertices that should be connected. Since we do not have any vertices yet, we create them on the fly using the function addVertex(). Its second parameter is the id that identifies the correct input sequence within the strings object. We can use the function positionToId() to obtain the id that corresponds to a certain position within the underlying Dependent StringSet of the AlignmentGraph.

We can access the Dependent StringSet using the function stringSet(). The third parameter of addVertex() specifies the begin position of the segment within the respective input sequence, and the fourth parameter specifies its length. Now we add an edge between two vertices, one for each input sequence, that cover the first two positions. Next we add the gap, which we do simply by adding a vertex that covers the inserted segment (the A of TTAGT) without connecting it to any other vertex. Finally, we add the second edge to represent the last ungapped segment and print the constructed alignment.
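The construction steps above can be mimicked with a small stand-in graph. MiniAlignGraph and its addVertex()/addEdge() members are hypothetical; they copy only the argument order used in this tutorial (sequence id, begin position, length) and assume that TTGT has id 0 and TTAGT id 1. Unlike the tutorial, the sketch keeps the indices returned by addVertex() instead of looking the last two vertices up again.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an alignment graph (not SeqAn's Graph class).
struct Segment { std::size_t seqId, begin, length; };

struct MiniAlignGraph {
    std::vector<Segment> vertices;
    std::vector<std::pair<std::size_t, std::size_t>> edges;

    // Same argument order as in the tutorial: id, begin, length.
    std::size_t addVertex(std::size_t id, std::size_t begin, std::size_t length) {
        vertices.push_back({id, begin, length});
        return vertices.size() - 1;
    }

    // An edge records an ungapped alignment, so both segments must have
    // the same length and come from different sequences.
    void addEdge(std::size_t u, std::size_t v) {
        assert(vertices.at(u).length == vertices.at(v).length);
        assert(vertices.at(u).seqId != vertices.at(v).seqId);
        edges.emplace_back(u, v);
    }
};

// Reproduces the described steps for TTGT (id 0) and TTAGT (id 1).
MiniAlignGraph buildExample() {
    MiniAlignGraph g;
    std::size_t u = g.addVertex(0, 0, 2);  // "TT" in TTGT
    std::size_t v = g.addVertex(1, 0, 2);  // "TT" in TTAGT
    g.addEdge(u, v);
    g.addVertex(1, 2, 1);                  // inserted "A": a vertex without an edge
    u = g.addVertex(0, 2, 2);              // "GT" in TTGT
    v = g.addVertex(1, 3, 2);              // "GT" in TTAGT
    g.addEdge(u, v);
    return g;
}
```

The result has five vertices and two edges; the gap is represented purely by the unconnected vertex for "A".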

Note that we use findVertex() to find the last two inserted vertices. Its syntax is the same as that of addVertex(), but without the length parameter.

demos/tutorial/alignment/graph.cpp
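Assuming findVertex() returns the vertex whose segment contains the given position of the given sequence, such a lookup can be sketched with a std::map keyed by (sequence id, end position): the first key greater than (id, pos) is the only candidate segment. SegmentIndex and kNil are hypothetical names, and the sketch assumes that segments of one sequence do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>

// Hypothetical segment index: maps (sequence id, end position) to a vertex,
// so the vertex covering a position is found with one upper_bound call.
class SegmentIndex {
public:
    static constexpr std::size_t kNil = std::numeric_limits<std::size_t>::max();

    void add(std::size_t vertex, std::size_t id, std::size_t begin, std::size_t length) {
        index_[{id, begin + length}] = {vertex, begin};
    }

    // Returns the vertex whose segment [begin, begin + length) of sequence
    // `id` contains `pos`, or kNil if no added segment covers it.
    std::size_t find(std::size_t id, std::size_t pos) const {
        auto it = index_.upper_bound({id, pos});  // first segment ending after pos
        if (it == index_.end() || it->first.first != id || it->second.second > pos)
            return kNil;
        return it->second.first;
    }

private:
    // key: (id, end), value: (vertex, begin)
    std::map<std::pair<std::size_t, std::size_t>,
             std::pair<std::size_t, std::size_t>> index_;
};
```

Keying on the end position rather than the begin position lets a single ordered search answer "which segment covers this position" in logarithmic time.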

Now printing alignG shows the desired alignment.

demos/tutorial/alignment/graph.cpp.stdout
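The printed alignment can be derived from the graph: each edge contributes one ungapped block, and every character not covered by an edge is placed against a gap. The function renderRows() and the Block type are a hypothetical sketch for the pairwise case only; blocks are assumed to be sorted and non-crossing, and when both sequences have unaligned characters between two blocks, the sketch emits those of the first sequence before those of the second.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one ungapped block per edge of a pairwise alignment graph.
struct Block { std::size_t begin0, begin1, length; };

// Turns sorted, non-crossing blocks into two gapped alignment rows.
std::pair<std::string, std::string>
renderRows(const std::string& s0, const std::string& s1, const std::vector<Block>& blocks) {
    std::string row0, row1;
    std::size_t p0 = 0, p1 = 0;  // next unprinted position in s0 / s1
    // Emit unaligned characters up to end0 / end1 against gaps.
    auto flushUnaligned = [&](std::size_t end0, std::size_t end1) {
        for (; p0 < end0; ++p0) { row0 += s0[p0]; row1 += '-'; }
        for (; p1 < end1; ++p1) { row0 += '-'; row1 += s1[p1]; }
    };
    for (const Block& b : blocks) {
        flushUnaligned(b.begin0, b.begin1);
        row0 += s0.substr(b.begin0, b.length);  // aligned segment, no gaps
        row1 += s1.substr(b.begin1, b.length);
        p0 = b.begin0 + b.length;
        p1 = b.begin1 + b.length;
    }
    flushUnaligned(s0.size(), s1.size());  // trailing unaligned characters
    return {row0, row1};
}
```

For TTGT and TTAGT with the blocks (0, 0, 2) and (2, 3, 2), this yields the rows TT-GT and TTAGT, matching the alignment printed above.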

The general usage of graphs is explained in the tutorial-datastructures-graphs tutorial.

Assignment 1

Type

Review

Objective

Construct a multiple sequence alignment using the Alignment Graph data structure. Use the three sequences GARFIELDTHECAT, GARFIELDTHEBIGCAT and THEBIGCAT and align them such that you obtain the maximal number of matches.

Hints
Solution