Skip to content

er1009/Parallel-implementation-of-Sequence-Alignment-MPI-OpenMP-CUDA

Repository files navigation

Parallel-implementation-of-Sequence-Alignment-MPI-OpenMP-CUDA

Sequence Alignment – a way to estimate a similarity of two strings of letters - is an important field in bioinformatics . Sequence is a string of capital letters including hyphen sign (-), for example

image

Each letter in the sequence represents DNA, RNA, or protein. Identification of region of similarity of set of Sequences is extremely time consuming.

Alignment Score Definition with pair-wise comparison

  1. Similarity of two sequences Seq1 and Seq2 of equal length is defined as follows:

• Two Sequences are places one under another:

image

• Each letter from Seq1 is compared with the correspondent letter from Seq2. If these letters are identical the pair is marked with Star sign (*).

• Otherwise the additional check is provided. The letters are checked if they both present at least in one of 9 groups called Conservative Groups:

image

In case that the pair is found in one of Conservative Group it is marked with Colon sign (:). For example, the pair (E, K) is marked with sign : because they both were found in group NEQK

• If no Conservative Group is found, the pair is checked against 15 Semi-Conservative Groups

image

If the pair do presents in one of Semi-Conservative Groups, it is marked with Point sign (.). For example, the pair (K,S) is marked with sign . because they both were found in group STNK

• If the letters in the pair are not equal, do not present both not in Conservative nor in Semi-Conservative groups – the pair is marked with Space sign (‘ ‘).

  At the end of the check process the whole Sequence of Signs is obtained. This Sequence is used to estimate the similarity of two sequences – Seq1 and Seq2. For this project following formula is used to estimate the Alignment Score: S = W1NumberOfStars - W2NumberOfColons – W3NumberOfPoints - W4NumberOfSpaces where Wi are the given weight coefficients.

  1. Similarity of two sequences Seq1 and Seq2 in case that Seq2 is shorter than Seq1, is defined as follows:

• The Sequence Seq2 is places under the Sequence Seq1 with offset n from the start of the Sequence Seq1. The Sequence Seq2 do not allowed to pass behind the end of Seq1. • The letters from Seq1 that do not have a corresponding letter from Seq2 are ignored. • The Alignment Score is calculated according the pair-wise procedure described above.

For example, Sequence Seq2 is placed at different offsets under Seq1: Score = -27 Offset n = 5 image

Score = 17 Offset n = 15 image

Mutant Sequence Definition

For a given Sequence S we define a Mutant Sequence MS(n) which is received by substitution of one or more letter by other letter defined by Substitution Rule. We will define the Substitution Rules as follows:

  1. The original letter is allowed to be substituted by another letter if there is no Conservative Group that contains both letters. For example, • N is not allowed to be substituted by H because both letters present in Conservative Group NHQK • N may be substituted by W because there is now Conservative Group that contains both N and W
  2. It is not mandatory to substitute all instances of some letter by same substitution letter, for example the sequence PSHLSPSQ has Mutant Sequence PFHLSPLQ

in this repository we will perform the following task - For two given sequences Seq1 and Seq2, find a mutant of Seq2 and its offset that produce a maximum / minimum Alignment Score.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages