# Matching Comparisons

This notebook implements versions of the naive exact matching and Boyer-Moore algorithms that also count and return:

- the number of character comparisons performed

- the number of alignments tried

We use these counts as rough proxies for how efficient each algorithm is: fewer alignments and fewer character comparisons mean less work.

---
### Boyer-Moore with Pattern Preprocessing and Counts

```python
from PyScripts.bm_preproc import BoyerMoore
from PyScripts.bm_with_counts import boyer_moore_with_counts
from PyScripts.geneReader import geneReader

# geneReader takes the file name directly, so no separate open()/close() is needed
filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'
t = geneReader(filename)  # genome excerpt used as the text t

p = 'GGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGG'
uppercase_alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
p_bm = BoyerMoore(p, uppercase_alphabet)  # preprocess the pattern once

# Run the search once and reuse the result instead of searching three times
result = boyer_moore_with_counts(p, p_bm, t)
print('Boyer Moore occurrences:', result[0])
print('Boyer Moore alignments:', result[1])
print('Boyer Moore character comparisons:', result[2])
```

```
Boyer Moore occurrences: [56922]
Boyer Moore alignments: 127974
Boyer Moore character comparisons: 165191
```
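To make the counting concrete, here is a minimal sketch of a Boyer-Moore search instrumented the same way. It is not the `BoyerMoore` class from `PyScripts.bm_preproc`: it is a simplified variant that uses only the bad character rule (no good suffix rule), and both function names are hypothetical. Because it can skip less aggressively than the full algorithm, its counts on the excerpt would not necessarily match the numbers above.

```python
def bad_char_table(p):
    """Map each character to its rightmost index in p (hypothetical helper)."""
    return {c: i for i, c in enumerate(p)}


def boyer_moore_bad_char_with_counts(p, t):
    """Boyer-Moore using only the bad character rule.

    Returns (occurrences, alignments tried, character comparisons).
    """
    last = bad_char_table(p)
    occurrences = []
    num_alignments = 0
    num_comparisons = 0
    i = 0
    while i <= len(t) - len(p):
        num_alignments += 1
        shift = 1
        mismatched = False
        # Compare right to left, as Boyer-Moore does
        for j in range(len(p) - 1, -1, -1):
            num_comparisons += 1
            if p[j] != t[i + j]:
                # Bad character rule (simple whole-pattern table): line the
                # mismatched text character up with its rightmost occurrence
                # in p, or move past it if p does not contain it. Always
                # shift by at least one position.
                shift = max(1, j - last.get(t[i + j], -1))
                mismatched = True
                break
        if not mismatched:
            occurrences.append(i)
        i += shift
    return occurrences, num_alignments, num_comparisons
```

Comparing from right to left lets a single mismatch justify a shift of several positions, which is why Boyer-Moore tries far fewer alignments than the naive scan in the next section.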


---
### Naive Exact Matching Counts

```python
from PyScripts.naive_with_counts import naive_with_counts
from PyScripts.geneReader import geneReader

filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'
t = geneReader(filename)  # genome excerpt used as the text t

p = 'GGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGG'

# Run the search once and reuse the result instead of searching three times
result = naive_with_counts(p, t)
print('Naive exact matching occurrences:', result[0])
print('Naive exact matching alignments:', result[1])
print('Naive exact matching character comparisons:', result[2])
```

```
Naive exact matching occurrences: [56922]
Naive exact matching alignments: 799954
Naive exact matching character comparisons: 984143
```
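The naive algorithm tries every alignment, so its alignment count is always `len(t) - len(p) + 1`. With the 47-character pattern above and 799,954 alignments, that implies the excerpt is 800,000 characters long. Here is a minimal sketch of how such counts can be collected; `naive_with_counts_sketch` is a hypothetical name, not the module imported above. Each alignment costs at least one comparison, and the inner loop stops at the first mismatch.

```python
def naive_with_counts_sketch(p, t):
    """Naive exact matching that also counts alignments and comparisons."""
    occurrences = []
    num_alignments = 0
    num_comparisons = 0
    for i in range(len(t) - len(p) + 1):  # every possible alignment
        num_alignments += 1
        match = True
        for j in range(len(p)):
            num_comparisons += 1
            if t[i + j] != p[j]:
                match = False
                break  # stop at the first mismatch
        if match:
            occurrences.append(i)
    return occurrences, num_alignments, num_comparisons
```

For example, `naive_with_counts_sketch('word', 'there would have been a time for such a word')` returns `([40], 41, 46)`: 41 alignments, plus 5 extra comparisons for the partial match at "would" and the full match at "word".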


---
### Naive Matching Allowing Up to Two Mismatches

```python
from PyScripts.naive_mismatches import naive_mismatches
from PyScripts.geneReader import geneReader

filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'
t = geneReader(filename)  # genome excerpt used as the text t

p = 'GGCGCGGTGGCTCACGCCTGTAAT'

print('Occurrences using naive matching with up to 2 mismatches:', len(naive_mismatches(p, t)))
```

```
Occurrences using naive matching with up to 2 mismatches: 19
```
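Here is a minimal sketch of naive matching with mismatches, assuming `naive_mismatches` counts Hamming-distance mismatches (substitutions only, no insertions or deletions); the function name below is hypothetical. Abandoning an alignment as soon as the mismatch budget is exceeded saves comparisons without changing the result.

```python
def naive_mismatches_sketch(p, t, max_mismatches=2):
    """Offsets where p occurs in t with at most max_mismatches substitutions."""
    occurrences = []
    for i in range(len(t) - len(p) + 1):
        mismatches = 0
        for j in range(len(p)):
            if t[i + j] != p[j]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this alignment can no longer qualify
        if mismatches <= max_mismatches:
            occurrences.append(i)
    return occurrences
```

For example, `naive_mismatches_sketch('ACTTTA', 'ACTTACTTGATAAAGT', 2)` returns `[0, 4]`.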


---

### Approximate Matching with an Indexed Object, Allowing Up to 2 Mismatches

```python
from PyScripts.kmer_index import Index
from PyScripts.approximate_match_idx import approximate_match_idx
from PyScripts.geneReader import geneReader

filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'
t = geneReader(filename)  # genome excerpt used as the text t

p = 'GGCGCGGTGGCTCACGCCTGTAAT'

# Third argument: maximum number of mismatches allowed
print('Approximate Matches of an Indexed Object:', approximate_match_idx(p, t, 2))
```

```
Approximate Matches of an Indexed Object: 90
```
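Index-based approximate matching typically relies on the pigeonhole principle: if `p` is split into n + 1 non-overlapping partitions and an occurrence has at most n mismatches, at least one partition must match exactly. A correct partition-based search should therefore find the same occurrences as the naive search (19 here), so the 90 above most likely counts index hits rather than distinct matches; that reading of `approximate_match_idx` is an assumption. The sketch below uses hypothetical names (`KmerIndexSketch`, `approximate_match_kmer_sketch`), a sorted list of `(k-mer, offset)` pairs searched with `bisect`, and assumes `len(p) == k * (n + 1)`. It returns both the verified matches and the number of index hits.

```python
import bisect


class KmerIndexSketch:
    """Sorted (k-mer, offset) pairs; a hypothetical stand-in for Index."""

    def __init__(self, t, k):
        self.k = k
        self.index = sorted((t[i:i + k], i) for i in range(len(t) - k + 1))

    def query(self, p):
        """Offsets in t where the first k characters of p occur."""
        kmer = p[:self.k]
        i = bisect.bisect_left(self.index, (kmer, -1))
        hits = []
        while i < len(self.index) and self.index[i][0] == kmer:
            hits.append(self.index[i][1])
            i += 1
        return hits


def approximate_match_kmer_sketch(p, t, n, index):
    """Pigeonhole search: split p into n + 1 partitions of length index.k.

    Returns (sorted match offsets, number of index hits).
    """
    k = index.k
    assert len(p) == k * (n + 1), "sketch assumes len(p) == k * (n + 1)"
    matches = set()
    num_hits = 0
    for part in range(n + 1):
        start = part * k
        for hit in index.query(p[start:start + k]):
            num_hits += 1
            offset = hit - start  # where p would begin in t
            if offset < 0 or offset + len(p) > len(t):
                continue
            # Verify the whole pattern, counting substitutions
            mismatches = sum(1 for a, b in zip(p, t[offset:offset + len(p)]) if a != b)
            if mismatches <= n:
                matches.add(offset)
    return sorted(matches), num_hits
```

For the 24-character pattern above with n = 2, each partition is 8 characters long, so the index would be built with `KmerIndexSketch(t, 8)`.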


---
### Approximate Matching with a Boyer-Moore Object, Up to 2 Mismatches

```python
from PyScripts.approximate_match import approximate_match
from PyScripts.geneReader import geneReader

filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'
t = geneReader(filename)  # genome excerpt used as the text t

p = 'GGCGCGGTGGCTCACGCCTGTAAT'

print('Matches using Approximate Matching with Boyer Moore Object:', len(approximate_match(p, t, 2)))
```

```
Matches using Approximate Matching with Boyer Moore Object: 61
```
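The pigeonhole idea also works without an index: split `p` into n + 1 partitions, find exact occurrences of each partition, and verify the full pattern around each hit. This sketch swaps in Python's built-in `str.find` for the exact search instead of Boyer-Moore, and its function names are hypothetical. It stores matches in a set because an occurrence with fewer than n mismatches can be reached through several partitions. An implementation that does not deduplicate would report such occurrences more than once, which is one possible (unverified) reason the count above differs from the 19 found by the naive search.

```python
def find_all(pattern, text):
    """All (possibly overlapping) offsets of pattern in text, via str.find."""
    offsets = []
    i = text.find(pattern)
    while i != -1:
        offsets.append(i)
        i = text.find(pattern, i + 1)
    return offsets


def approximate_match_pigeonhole(p, t, n):
    """Offsets where p occurs in t with at most n mismatches."""
    segment_length = round(len(p) / (n + 1))
    matches = set()  # an occurrence found via several partitions counts once
    for part in range(n + 1):
        start = part * segment_length
        # The last partition absorbs any remainder of p
        end = len(p) if part == n else min((part + 1) * segment_length, len(p))
        for hit in find_all(p[start:end], t):
            offset = hit - start  # where p would begin in t
            if offset < 0 or offset + len(p) > len(t):
                continue
            mismatches = sum(1 for a, b in zip(p, t[offset:offset + len(p)]) if a != b)
            if mismatches <= n:
                matches.add(offset)
    return sorted(matches)
```

Because the partitions are disjoint and cover `p`, this returns exactly the offsets a naive mismatch search would find; only the amount of verification work differs.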


---

### Occurrences Using a SubSeqIndex Object, Up to 2 Mismatches

```python
from PyScripts.subSequenceIdx import SubSeqIndex
from PyScripts.approximate_match_subseq import approximate_match_subseq
from PyScripts.geneReader import geneReader

filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'
t = geneReader(filename)  # genome excerpt used as the text t

p = 'GGCGCGGTGGCTCACGCCTGTAAT'
n = 2     # maximum number of mismatches allowed
ival = 3  # spacing between characters taken for each indexed subsequence

print('Matches with an indexed object using subsequences:', approximate_match_subseq(p, t, n, ival)[1])
```

```
Matches with an indexed object using subsequences: 79
```
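A subsequence index stores, for each offset in `t`, `k` characters taken every `ival` positions instead of `k` consecutive characters. With `ival = n + 1`, the n + 1 subsequences of `p` starting at offsets 0 through n are disjoint, so the pigeonhole argument still guarantees that any occurrence with at most n mismatches matches at least one of them exactly. This sketch is hypothetical (`SubseqIndexSketch`, `approximate_match_subseq_sketch`) and assumes `len(p) >= k * (n + 1)`. For the pattern here, `k = 8` would satisfy that, but the `k` used by `approximate_match_subseq` is not shown. Like the k-mer sketch, it returns both verified matches and the number of index hits.

```python
import bisect


class SubseqIndexSketch:
    """For each offset i, index k characters of t spaced ival apart."""

    def __init__(self, t, k, ival):
        self.k = k
        self.ival = ival
        self.span = 1 + ival * (k - 1)  # characters of t covered by one key
        self.index = sorted(
            (t[i:i + self.span:ival], i) for i in range(len(t) - self.span + 1)
        )

    def query(self, p):
        """Offsets where the subsequence p[0], p[ival], ... occurs in t."""
        subseq = p[:self.span:self.ival]
        i = bisect.bisect_left(self.index, (subseq, -1))
        hits = []
        while i < len(self.index) and self.index[i][0] == subseq:
            hits.append(self.index[i][1])
            i += 1
        return hits


def approximate_match_subseq_sketch(p, t, n, index):
    """Query the n + 1 interleaved subsequences of p starting at 0..n.

    Assumes index.ival == n + 1 and len(p) >= index.k * (n + 1).
    Returns (sorted match offsets, number of index hits).
    """
    matches = set()
    num_hits = 0
    for start in range(n + 1):
        for hit in index.query(p[start:]):
            num_hits += 1
            offset = hit - start  # where p would begin in t
            if offset < 0 or offset + len(p) > len(t):
                continue
            mismatches = sum(1 for a, b in zip(p, t[offset:offset + len(p)]) if a != b)
            if mismatches <= n:
                matches.add(offset)
    return sorted(matches), num_hits
```

Spacing the sampled characters lets one index entry span more of the pattern with the same key length, which can change how many spurious hits need verifying compared with consecutive k-mers.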
