ToC
In SeqAn, alphabets are value types that can take a limited number of values and which hence can be mapped to a range of natural numbers. We can retrieve the number of different values of an alphabet, the alphabet size, by the metafunction FiniteOrderedAlphabetConcept#ValueSize
. Another useful metafunction called AlphabetConcept#BitsPerValue
can be used to determine the number of bits needed to store a value of a given alphabet. The order of a character in the alphabet (i.e. its corresponding natural number) can be retrieved by calling the function FiniteOrderedAlphabetConcept#ordValue
. In SeqAn, several standard alphabets are already predefined, for example Dna
Dna5
, Rna
, Rna5
, Iupac
, AminoAcid
, ....
Let's start with a simple task. We want to write a program that outputs all letters of the predefined AminoAcid
alphabet. First we include the corresponding header files and specify that we are using the namespace seqan
.
demos/tutorial/basics/show_alphabets.cpp
Next, we will define a template function template<typename TAlphabet> void showAllLettersOfMyAlphabet(TAlphabet const&)
which will iterate over all the characters of the alphabet and outputs them. For this, we need to determine the alphabet size using the metafunction FiniteOrderedAlphabetConcept#ValueSize ValueSize<TAlphabet>::VALUE
. Then we increment a counter from 0 to the alphabet size minus one and output the counter as well as the corresponding letter of the alphabet using a conversion constructor.
demos/tutorial/basics/show_alphabets.cpp
In the main program we simply call the above function using a number of alphabets that are predefined in SeqAn.
demos/tutorial/basics/show_alphabets.cpp
This program produces the following output:
demos/tutorial/basics/show_alphabets.cpp.stdout
An iterator is an object that is used to browse through the values of a container. The metafunction ContainerConcept#Iterator
can be used to determine an appropriate iterator type given a container. Some containers offer several kinds of iterators, which can be selected by an optional argument of Iterator. For example, the tag ContainerIteratorTags#Standard
can be used to get an iterator type that resembles the C++ standard random access iterator. The more elaborated RootedIteratorConcept Rooted\ Iterator
, i.e., an iterator that knows its container, can be selected by specifying the ContainerIteratorTags#Rooted
tag.
Rooted iterators offer some convenience for the user: They offer additional functions like RootedIteratorConcept#container
for determining the container on which the iterator works, and they simplify the interface for other functions like RootedIteratorConcept#atEnd
. Moreover, rooted iterators may change the container’s length or capacity, which makes it possible to implement a more intuitive variant of a remove algorithm.
While rooted iterators can usually be converted into standard iterators, it is not always possible to convert standard iterators back into rooted iterators, since standard iterators may lack the information about the container they work on. Therefore, many functions that return iterators like ContainerConcept#begin
or ContainerConcept#end
return rooted iterators instead of standard iterators; this way, they can be used to set both rooted and standard iterator variables. Alternatively it is possible to specify the returned iterator type explicitly by passing the iterator kind as a tag argument.
The following code piece shows examples for creating Iterators for ContainerConcept Containers
. If no iterator kind is specified, the metafunction ContainerConcept#Iterator
assumes ContainerIteratorTags#Standard
and the function ContainerConcept#begin
assumes ContainerIteratorTags#Rooted
. Both it1
and it2
are standard iterators, whereas it3
and it4
are rooted iterators.
demos/tutorial/basics/base.cpp
- Type
Transfer
Objective
- Write a program which does the following:
- Create an amino acid string of the following sequence: "MQDRVKRPMNAFIVWSRDQRRKMALEN".
- Iterate through the sequence and replace all ‘R’ with ‘A’.
- Create a second string where you count the number of occurrences of each amino acid.
- Iterate through the latter string and output the frequency table.
Hints
After a few hours browsing through the demos you should be able to solve this.
Solution
In this assignment we practice the use of alphabets, iterators and metafunctions in SeqAn. We start by including the seqan basic header and enter the namespace
seqan
to avoid writing it as a prefix (as we do with the namespacestd
in this example). In themain
function we first define a a typeTAmincoAcidString
which is aString<AminoAcid>
(Note the SeqAn naming conventions). Then we define a variablesourceSeq
of this type and initialize it with a string constant.demos/tutorial/basics/strings.cpp
Then we define an iterator type using the SeqAn metafunction
ContainerConcept#Iterator
. Using the correct iterator we iterate over our amino acid string using the SeqAn functionsContainerConcept#begin
,ContainerConcept#end
, andInputIteratorConcept#goNext
. In the body of the while loop we use the SeqAn functionIteratorAssociatedTypesConcept#value
to access the value the iterator is pointing to. Note that this function returns a reference which allows us to replace the occurrence of allR
's withA
's. So at this point we have solved parts a) and b) of the assignment.demos/tutorial/basics/strings.cpp
In the next part of the code we want to count, how often a specific letter of the alphabet occurs in the string. To obtain the size type of the used alphabet we call the SeqAn metafunction
ContainerConcept#Size Size
and define aString
of that type to hold the counters. TheString
has here basically the same functionality as a STLvector
. Since alphabets are mapped to a contiguous interval of the natural numbers, we can initialize the counter up to the size of the alphabet which we obtain by a call to the SeqAn metafunctionFiniteOrderedAlphabetConcept#ValueSize ValueSize
. We then iterate over the amino acid string and increment the counter for the corresponding letter of the alphabet. In order to know the corresponding natural number of an alphabet letter, we use the SeqAn functionFiniteOrderedAlphabetConcept#ordValue
. Note the use of theIteratorAssociatedTypesConcept#value
function. In this example one could also use theoperator[]
to writecounter[ordValue(value(it))]++
.demos/tutorial/basics/strings.cpp
Finally we iterate through the counter String and output the i-th aminoacid (by calling a constructor with the letter's ordinal value) ad its frequency.
demos/tutorial/basics/strings.cpp
The result looks like this:
demos/tutorial/basics/strings.cpp.stdout
Controlling memory allocation is one of the big advantages of C++ compared to other programming languages as for example Java. Depending on the size of objects and the pattern they are allocated during the program execution, certain memory allocation strategies have advantages compared to others. SeqAn supports a variety of memory allocation strategies.
The two functions Allocator#allocate
and Allocator#deallocate
are used in SeqAn to allocate and deallocate dynamic memory. Both functions take an allocator as an argument. An Allocator
is an object that is responsible for allocated memory. The default implementations of Allocator#allocate
and Allocator#deallocate
completely ignore the allocator but simply call the basic operators new
and delete
. Although in principle every kind of object can be used as allocator, typically the object that stores the pointer to the allocated memory is used as allocator. For example, if memory is allocated for an AllocString Alloc String
, this string itself acts as allocator. A memory block should be deallocated using the same allocator object as it was allocated for. The following allocators are available in SeqAn and support the Allocator#clear
function. This function deallocates at once all memory blocks that were previously allocated.
SimpleAllocator Simple Allocator
General purpose allocator.
SinglePoolAllocator Single Pool Allocator
Allocator that pools memory blocks of specific size. Blocks of different sizes are not pooled.
MultiPoolAllocator Multi Pool Allocator
Allocator that pools memory blocks. Only blocks up to a certain size are pooled. The user can specify the size limit in a template argument.
The function Allocator#allocate
has an optional argument to specify the intended allocator usage for the requested memory. The user can thereby specialize Allocator#allocate
for different allocator applications. For example, the tag AllocatorUsageTags#TagAllocateTemp
specifies that the memory will only be used temporarily, whereas AllocatorUsageTags#TagAllocateStorage
indicates that the memory will be used in the long run for storing values of a container.
SeqAn also offers more complex allocators which support the function Allocator#clear
. The library predefines some allocator specializations for different uses (see above). Most of these allocators are pool allocators. A pool allocator implements its own memory management. It reserves storage for multiple memory blocks at a time and recycles deallocated blocks. This reduces the number of expensive new
and delete
calls and speeds up the allocation and deallocation.
- Type
Application
Objective
Write a program which compares the runtimes of the
SimpleAllocator Simple Allocator
and theMultiPoolAllocator Multi Pool Allocator
for pool sizes (10,100,1000) for allocating and deallocating memory.
Hint
For timing the allocation you can use
sysTime
.
Solution
We start in this assignment by including the
basic.h
SeqAn header and defining two different allocators, oneMultiPoolAllocator Multi Pool Allocator
and oneSimpleAllocator Simple Allocator
.demos/tutorial/basics/allocator.cpp
Given these fixed allocators we allocate now various size blocks, namely of size 10, 100, and 1000. We repeat the allocation a number of times and then clear the allocated memory. For each of the block sizes we output the system time needed to allocate and clear the memory.
demos/tutorial/basics/allocator.cpp
Running this program results in the following output which shows the advantage of the
MultiPoolAllocator Multi Pool Allocator
:demos/tutorial/basics/allocator.cpp.stdout