# DNA Sequences Matching Analysis

In this project, I implement a **strand-aware naive exact matching algorithm** as the function `naive_with_rc` in the module `dnautil`. Instead of looking only for occurrences of a read (p) in a genome (t), it also looks for occurrences of the reverse complement of p in t. For example, if p is ACT, we should find occurrences of both ACT and its reverse complement, AGT, in t. If p is identical to its reverse complement (e.g. AACGTT), each match offset should be reported only once.
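The source of `dnautil` is not shown here, so the following is only a minimal sketch of how a strand-aware naive matcher could work. The helper names `reverse_complement` and `naive` are hypothetical, and the real `dna.naive_with_rc` may differ (for example, in whether it returns a sorted list). The key idea is to scan t once for p and once for its reverse complement, and to collect offsets in a set so that a palindromic p such as AACGTT is never reported twice at the same offset.

```python
# Hypothetical sketch; the actual dnautil.naive_with_rc may be implemented differently.

def reverse_complement(s):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[base] for base in reversed(s))


def naive(p, t):
    """Return every offset where p occurs exactly in t, using a naive scan."""
    occurrences = []
    for i in range(len(t) - len(p) + 1):
        if t[i:i + len(p)] == p:
            occurrences.append(i)
    return occurrences


def naive_with_rc(p, t):
    """Return sorted offsets where p or its reverse complement occurs in t.

    Offsets are gathered in a set, so when p equals its own reverse
    complement each matching offset is reported only once.
    """
    rc = reverse_complement(p)
    offsets = set(naive(p, t))
    if rc != p:
        offsets.update(naive(rc, t))
    return sorted(offsets)


print(reverse_complement('ACT'))            # AGT
print(naive_with_rc('ACT', 'ACTTTAGT'))     # [0, 5]
print(naive_with_rc('AACGTT', 'AACGTTAA'))  # [0]
```

Skipping the second scan when `rc == p` is an optimization; the set alone already prevents double counting.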

```python
import dnautil as dna
```

First of all, we parse the lambda virus genome.

```python
genome = dna.readGenome('lambda_virus.fa')
```
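How `dna.readGenome` parses the file is not shown. Assuming `lambda_virus.fa` is a single-record FASTA file, a parser only needs to skip the `>` header line and join the remaining sequence lines. The sketch below uses the hypothetical names `parse_fasta_lines` and `read_genome`; it is an illustration, not the module's actual code.

```python
# Hypothetical FASTA reader; assumes one record per file.

def parse_fasta_lines(lines):
    """Join all non-header, non-empty lines into a single sequence string."""
    parts = []
    for line in lines:
        line = line.rstrip()
        if line and not line.startswith('>'):
            parts.append(line)
    return ''.join(parts)


def read_genome(filename):
    """Read a single-record FASTA file and return its sequence."""
    with open(filename) as f:
        return parse_fasta_lines(f)


print(parse_fasta_lines(['>example\n', 'ACGT\n', 'TTAA\n']))  # ACGTTTAA
```

With the data file present, `read_genome('lambda_virus.fa')` would play the same role as the `dna.readGenome` call above. Splitting the parsing out of the file handling makes it testable on a plain list of lines.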

## 1. How many times does AGGT or its reverse complement (ACCT) occur in the lambda virus genome?

```python
occurrences = dna.naive_with_rc('AGGT', genome)
print("AGGT or its reverse complement occurs %d times in the lambda virus genome." % len(occurrences))
```

```
AGGT or its reverse complement occurs 306 times in the lambda virus genome.
```

## 2. How many times does TTAA or its reverse complement occur in the lambda virus genome?

```python
p = 'TTAA'
occurrences = dna.naive_with_rc(p, genome)
print("%s or its reverse complement occurs %d times in the lambda virus genome." % (p, len(occurrences)))
```

```
TTAA or its reverse complement occurs 195 times in the lambda virus genome.
```

Note that TTAA is its own reverse complement, so each matching offset must be counted once, not twice.
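To see why this matters, the sketch below counts TTAA on a short toy sequence (not the lambda genome) in two ways. Simply adding the forward and reverse-complement counts double counts every hit when the pattern is its own reverse complement. For a non-palindromic pattern the sum is correct, because a single window of length len(p) cannot equal both p and a different string. The function names are illustrative only.

```python
# Illustrative only: toy sequence, hypothetical helper names.

def reverse_complement(s):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[b] for b in reversed(s))


def count_occurrences(p, t):
    """Count the offsets where p occurs exactly in t."""
    return sum(1 for i in range(len(t) - len(p) + 1) if t[i:i + len(p)] == p)


p = 'TTAA'
t = 'GTTAACCTTAAG'  # TTAA occurs at offsets 1 and 7
rc = reverse_complement(p)
print(rc == p)  # True: TTAA is a reverse-complement palindrome

double = count_occurrences(p, t) + count_occurrences(rc, t)  # counts each hit twice
single = count_occurrences(p, t) if rc == p else double      # correct count
print(double, single)  # 4 2
```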

## 3. What is the offset of the leftmost occurrence of ACTAAGT or its reverse complement in the lambda virus genome?

```python
p = 'ACTAAGT'
occurrences = dna.naive_with_rc(p, genome)
print('offset of leftmost occurrence of %s: %d' % (p, min(occurrences)))
```

```
offset of leftmost occurrence of ACTAAGT: 26028
```
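Taking `min(occurrences)` works, but it requires collecting every match first. If only the leftmost offset is needed, one alternative is a single left-to-right scan that stops at the first window matching either strand. The sketch below is a hypothetical helper, not part of `dnautil`, and is shown on a toy sequence.

```python
# Hypothetical early-exit variant; not part of dnautil.

def reverse_complement(s):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[b] for b in reversed(s))


def leftmost_with_rc(p, t):
    """Return the smallest offset where p or its reverse complement occurs, or -1."""
    rc = reverse_complement(p)
    n = len(p)
    for i in range(len(t) - n + 1):
        window = t[i:i + n]
        if window == p or window == rc:
            return i  # first hit scanning left to right is the leftmost
    return -1


print(reverse_complement('ACTAAGT'))                 # ACTTAGT
print(leftmost_with_rc('ACTAAGT', 'GGACTTAGTCC'))    # 2 (reverse-complement hit)
```

Returning -1 mirrors `str.find`; by contrast, `min(occurrences)` raises `ValueError` if there are no matches at all.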


## 4. What is the offset of the leftmost occurrence of AGTCGA or its reverse complement in the lambda virus genome?

```python
p = 'AGTCGA'
occurrences = dna.naive_with_rc(p, genome)
print('offset of leftmost occurrence of %s: %d' % (p, min(occurrences)))
```

```
offset of leftmost occurrence of AGTCGA: 450
```
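The offsets returned by `naive_with_rc` do not say which strand produced each hit, so the result above does not tell us whether offset 450 matches AGTCGA itself or its reverse complement, TCGACT. A small hypothetical variant can label each offset with `'+'` for a forward match and `'-'` for a reverse-complement match. It is demonstrated here on a toy sequence, not the lambda genome.

```python
# Hypothetical strand-labelling variant; not part of dnautil.

def reverse_complement(s):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return ''.join(complement[b] for b in reversed(s))


def occurrences_by_strand(p, t):
    """Map each match offset to '+' (p matched) or '-' (reverse complement matched).

    A palindromic p is always labelled '+', so no offset appears twice.
    """
    rc = reverse_complement(p)
    n = len(p)
    hits = {}
    for i in range(len(t) - n + 1):
        window = t[i:i + n]
        if window == p:
            hits[i] = '+'
        elif window == rc:
            hits[i] = '-'
    return hits


print(reverse_complement('AGTCGA'))                         # TCGACT
print(occurrences_by_strand('AGTCGA', 'TTCGACTAGTCGA'))     # {1: '-', 7: '+'}
```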
