# Week 2

This notebook does not implement the Boyer-Moore algorithm from scratch; that is explored separately as stand-alone, buildable executables in C and C++. It is neat that C++17 ships a Boyer-Moore searcher in its standard library (`std::boyer_moore_searcher`), and a C implementation is not too hard to write either.

Instead, this notebook uses an implementation provided by the course instructors. A nice feature of the module is that we can run its unit tests ourselves!

In [73]:
from src.bm_preproc import BoyerMoore

In [None]:
# Part of the assignment. The BoyerMoore object holds the preprocessed pattern.

def boyer_moore(
    pat: str,
    p_bm: BoyerMoore,
    tex: str
) -> list[int]:
    """Run a pattern search using the Boyer-Moore algorithm

    Parameters
    ----------
    pat : str
        Pattern
    p_bm : BoyerMoore
        Preprocessor for the pattern
    tex : str
        Text to search

    Returns
    -------
    list[int]
        Offsets in the text where the pattern occurs
    """
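    index_i = 0  # current alignment of the pattern against the text
    occurrences: list[int] = []
    while index_i < len(tex) - len(pat) + 1:
        shift = 1  # always advance by at least one position
        mismatched = False
        # Compare the pattern to the text from right to left
        for index_j in range(len(pat) - 1, -1, -1):
            if pat[index_j] != tex[index_i + index_j]:
                # On a mismatch, take the largest skip either rule allows
                skip_bc = p_bm.bad_character_rule(index_j, tex[index_i + index_j])
                skip_gs = p_bm.good_suffix_rule(index_j)
                shift = max(shift, skip_bc, skip_gs)
                mismatched = True
                break
        if not mismatched:
            # Full match: record it, then skip as far as the pattern allows
            occurrences.append(index_i)
            skip_gs = p_bm.match_skip()
            shift = max(shift, skip_gs)
        index_i += shift
    return occurrences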
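
To see how the loop above relies on the preprocessor, here is a minimal sketch that swaps in a hypothetical stand-in class, `BadCharacterOnly`, which is not part of the course module. It assumes only the three method names the function calls (`bad_character_rule`, `good_suffix_rule`, `match_skip`) and implements just the bad character rule; the other two fall back to a shift of 1. The `naive` helper and the sample strings are also illustrative. The search loop is repeated without type hints so the block runs on its own.

```python
class BadCharacterOnly:
    """Hypothetical stand-in exposing the three methods boyer_moore calls."""

    def __init__(self, pat: str) -> None:
        self.pat = pat

    def bad_character_rule(self, index_j: int, char: str) -> int:
        # Rightmost occurrence of char in pat[:index_j], or -1 if absent.
        rightmost = self.pat.rfind(char, 0, index_j)
        # Shift so that occurrence lines up with the mismatched text
        # character, or move just past it when there is none.
        return index_j - rightmost

    def good_suffix_rule(self, index_j: int) -> int:
        # Not implemented in this sketch: fall back to the always-safe shift.
        return 1

    def match_skip(self) -> int:
        # Likewise, advance by one position after a full match.
        return 1


def boyer_moore(pat, p_bm, tex):
    """Same loop as the cell above, repeated so this block is self-contained."""
    index_i = 0
    occurrences = []
    while index_i < len(tex) - len(pat) + 1:
        shift = 1
        mismatched = False
        for index_j in range(len(pat) - 1, -1, -1):
            if pat[index_j] != tex[index_i + index_j]:
                skip_bc = p_bm.bad_character_rule(index_j, tex[index_i + index_j])
                skip_gs = p_bm.good_suffix_rule(index_j)
                shift = max(shift, skip_bc, skip_gs)
                mismatched = True
                break
        if not mismatched:
            occurrences.append(index_i)
            shift = max(shift, p_bm.match_skip())
        index_i += shift
    return occurrences


def naive(pat, tex):
    """Brute-force reference: check every alignment."""
    return [
        i for i in range(len(tex) - len(pat) + 1)
        if tex[i:i + len(pat)] == pat
    ]


text = "GCTAGCTCTACGAGTCTA"
pattern = "TCTA"
hits = boyer_moore(pattern, BadCharacterOnly(pattern), text)
assert hits == naive(pattern, text)
print(hits)  # prints [6, 14]
```

Because every shift the stand-in returns is safe, the result matches the brute-force search; the instructors' `BoyerMoore` object should only make the skips larger, not change the occurrences found.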

In [81]:
!pytest src/bm_preproc.py::TestBoyerMoorePreproc -v

platform darwin -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /Users/mhogan/Documents/algorithms-genomic-sequencing/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/mhogan/Documents/algorithms-genomic-sequencing
plugins: anyio-3.6.2
collected 12 items

src/bm_preproc.py::TestBoyerMoorePreproc::test_big_l_prime_1 PASSED      [  8%]
src/bm_preproc.py::TestBoyerMoorePreproc::test_big_l_prime_2 PASSED      [ 16%]
src/bm_preproc.py::TestBoyerMoorePreproc::test_good_suffix_match_mismatch_1 PASSED [ 25%]
src/bm_preproc.py::TestBoyerMoorePreproc::test_good_suffix_table_1 PASSED [ 33%]
src/bm_preproc.py::TestBoyerMoorePreproc::test_good_suffix_table_2 PASSED [ 41%]
src/bm_preproc.py::TestBoyerMoorePreproc::test_n_1 PASSED                [ 50%]
src/bm_preproc.py::TestBoyerMoorePreproc::test_n_2 PASS

Neat! All the unit tests passed, so the instructors' module runs correctly under this Python 3 interpreter and we do not have to worry about differences between Python 2 and Python 3.