Skip to content

Latest commit

 

History

History
212 lines (145 loc) · 12.6 KB

Basics.rst

File metadata and controls

212 lines (145 loc) · 12.6 KB

ToC

Basics

Alphabets

In SeqAn, alphabets are value types that can take a limited number of values and which hence can be mapped to a range of natural numbers. We can retrieve the number of different values of an alphabet, the alphabet size, by the metafunction FiniteOrderedAlphabetConcept#ValueSize. Another useful metafunction called AlphabetConcept#BitsPerValue can be used to determine the number of bits needed to store a value of a given alphabet. The order of a character in the alphabet (i.e. its corresponding natural number) can be retrieved by calling the function FiniteOrderedAlphabetConcept#ordValue. In SeqAn, several standard alphabets are already predefined, for example Dna Dna5, Rna, Rna5, Iupac, AminoAcid, ....

Let's start with a simple task. We want to write a program that outputs all letters of the predefined AminoAcid alphabet. First we include the corresponding header files and specify that we are using the namespace seqan.

demos/tutorial/basics/show_alphabets.cpp

Next, we will define a template function template<typename TAlphabet> void showAllLettersOfMyAlphabet(TAlphabet const&) which will iterate over all the characters of the alphabet and outputs them. For this, we need to determine the alphabet size using the metafunction FiniteOrderedAlphabetConcept#ValueSize ValueSize<TAlphabet>::VALUE. Then we increment a counter from 0 to the alphabet size minus one and output the counter as well as the corresponding letter of the alphabet using a conversion constructor.

demos/tutorial/basics/show_alphabets.cpp

In the main program we simply call the above function using a number of alphabets that are predefined in SeqAn.

demos/tutorial/basics/show_alphabets.cpp

This program produces the following output:

demos/tutorial/basics/show_alphabets.cpp.stdout

Iterators

An iterator is an object that is used to browse through the values of a container. The metafunction ContainerConcept#Iterator can be used to determine an appropriate iterator type given a container. Some containers offer several kinds of iterators, which can be selected by an optional argument of Iterator. For example, the tag ContainerIteratorTags#Standard can be used to get an iterator type that resembles the C++ standard random access iterator. The more elaborated RootedIteratorConcept Rooted\ Iterator, i.e., an iterator that knows its container, can be selected by specifying the ContainerIteratorTags#Rooted tag.

Rooted iterators offer some convenience for the user: They offer additional functions like RootedIteratorConcept#container for determining the container on which the iterator works, and they simplify the interface for other functions like RootedIteratorConcept#atEnd. Moreover, rooted iterators may change the container’s length or capacity, which makes it possible to implement a more intuitive variant of a remove algorithm.

While rooted iterators can usually be converted into standard iterators, it is not always possible to convert standard iterators back into rooted iterators, since standard iterators may lack the information about the container they work on. Therefore, many functions that return iterators like ContainerConcept#begin or ContainerConcept#end return rooted iterators instead of standard iterators; this way, they can be used to set both rooted and standard iterator variables. Alternatively it is possible to specify the returned iterator type explicitly by passing the iterator kind as a tag argument.

The following code piece shows examples for creating Iterators for ContainerConcept Containers. If no iterator kind is specified, the metafunction ContainerConcept#Iterator assumes ContainerIteratorTags#Standard and the function ContainerConcept#begin assumes ContainerIteratorTags#Rooted. Both it1 and it2 are standard iterators, whereas it3 and it4 are rooted iterators.

demos/tutorial/basics/base.cpp

Assignment 1

Type

Transfer

Objective

Write a program which does the following:
  1. Create an amino acid string of the following sequence: "MQDRVKRPMNAFIVWSRDQRRKMALEN".
  2. Iterate through the sequence and replace all ‘R’ with ‘A’.
  3. Create a second string where you count the number of occurrences of each amino acid.
  4. Iterate through the latter string and output the frequency table.

Hints

After a few hours browsing through the demos you should be able to solve this.

Solution

In this assignment we practice the use of alphabets, iterators and metafunctions in SeqAn. We start by including the seqan basic header and enter the namespace seqan to avoid writing it as a prefix (as we do with the namespace std in this example). In the main function we first define a a type TAmincoAcidString which is a String<AminoAcid> (Note the SeqAn naming conventions). Then we define a variable sourceSeq of this type and initialize it with a string constant.

demos/tutorial/basics/strings.cpp

Then we define an iterator type using the SeqAn metafunction ContainerConcept#Iterator. Using the correct iterator we iterate over our amino acid string using the SeqAn functions ContainerConcept#begin, ContainerConcept#end, and InputIteratorConcept#goNext. In the body of the while loop we use the SeqAn function IteratorAssociatedTypesConcept#value to access the value the iterator is pointing to. Note that this function returns a reference which allows us to replace the occurrence of all R's with A's. So at this point we have solved parts a) and b) of the assignment.

demos/tutorial/basics/strings.cpp

In the next part of the code we want to count, how often a specific letter of the alphabet occurs in the string. To obtain the size type of the used alphabet we call the SeqAn metafunction ContainerConcept#Size Size and define a String of that type to hold the counters. The String has here basically the same functionality as a STL vector. Since alphabets are mapped to a contiguous interval of the natural numbers, we can initialize the counter up to the size of the alphabet which we obtain by a call to the SeqAn metafunction FiniteOrderedAlphabetConcept#ValueSize ValueSize. We then iterate over the amino acid string and increment the counter for the corresponding letter of the alphabet. In order to know the corresponding natural number of an alphabet letter, we use the SeqAn function FiniteOrderedAlphabetConcept#ordValue. Note the use of the IteratorAssociatedTypesConcept#value function. In this example one could also use the operator[] to write counter[ordValue(value(it))]++.

demos/tutorial/basics/strings.cpp

Finally we iterate through the counter String and output the i-th aminoacid (by calling a constructor with the letter's ordinal value) ad its frequency.

demos/tutorial/basics/strings.cpp

The result looks like this:

demos/tutorial/basics/strings.cpp.stdout

Memory Allocation

Controlling memory allocation is one of the big advantages of C++ compared to other programming languages as for example Java. Depending on the size of objects and the pattern they are allocated during the program execution, certain memory allocation strategies have advantages compared to others. SeqAn supports a variety of memory allocation strategies.

The two functions Allocator#allocate and Allocator#deallocate are used in SeqAn to allocate and deallocate dynamic memory. Both functions take an allocator as an argument. An Allocator is an object that is responsible for allocated memory. The default implementations of Allocator#allocate and Allocator#deallocate completely ignore the allocator but simply call the basic operators new and delete. Although in principle every kind of object can be used as allocator, typically the object that stores the pointer to the allocated memory is used as allocator. For example, if memory is allocated for an AllocString Alloc String, this string itself acts as allocator. A memory block should be deallocated using the same allocator object as it was allocated for. The following allocators are available in SeqAn and support the Allocator#clear function. This function deallocates at once all memory blocks that were previously allocated.

SimpleAllocator Simple Allocator
   General purpose allocator.
SinglePoolAllocator Single Pool Allocator
   Allocator that pools memory blocks of specific size. Blocks of different sizes are not pooled.
MultiPoolAllocator Multi Pool Allocator
   Allocator that pools memory blocks. Only blocks up to a certain size are pooled. The user can specify the size limit in a template argument.

The function Allocator#allocate has an optional argument to specify the intended allocator usage for the requested memory. The user can thereby specialize Allocator#allocate for different allocator applications. For example, the tag AllocatorUsageTags#TagAllocateTemp specifies that the memory will only be used temporarily, whereas AllocatorUsageTags#TagAllocateStorage indicates that the memory will be used in the long run for storing values of a container.

SeqAn also offers more complex allocators which support the function Allocator#clear. The library predefines some allocator specializations for different uses (see above). Most of these allocators are pool allocators. A pool allocator implements its own memory management. It reserves storage for multiple memory blocks at a time and recycles deallocated blocks. This reduces the number of expensive new and delete calls and speeds up the allocation and deallocation.

Assignment 2

Type

Application

Objective

Write a program which compares the runtimes of the SimpleAllocator Simple Allocator and the MultiPoolAllocator Multi Pool Allocator for pool sizes (10,100,1000) for allocating and deallocating memory.

Hint

For timing the allocation you can use sysTime.

Solution

We start in this assignment by including the basic.h SeqAn header and defining two different allocators, one MultiPoolAllocator Multi Pool Allocator and one SimpleAllocator Simple Allocator.

demos/tutorial/basics/allocator.cpp

Given these fixed allocators we allocate now various size blocks, namely of size 10, 100, and 1000. We repeat the allocation a number of times and then clear the allocated memory. For each of the block sizes we output the system time needed to allocate and clear the memory.

demos/tutorial/basics/allocator.cpp

Running this program results in the following output which shows the advantage of the MultiPoolAllocator Multi Pool Allocator:

demos/tutorial/basics/allocator.cpp.stdout