# Finite Automata pattern searching algorithm

The finite automata algorithm finds all occurrences of a pattern in a text in O(n) time, where n is the text length, once the automaton for the pattern has been built.

To do that, we use a delta function: a transition function that defines a state machine for the given pattern.

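Building the delta function as shown below costs O(m\*m\*m\*s) in the worst case, where m is the pattern length and s is the alphabet size (here, the number of distinct letters in the pattern): for each of the m + 1 states and each of the s letters we try up to m + 1 candidate prefixes, and each string comparison takes up to O(m) time.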

In [25]:
# Build the delta table: a list with one dict per state (0..len(pattern)),
# where each dict maps a letter to the next state
def get_delta(pattern):
    result = [{}] + [{} for _ in pattern]

    unique_letters = set(pattern)

    # create state mapping for each of the states
    # Each state corresponds to a pattern match
    # We are in the state N only if the last N letters of currently read text T,
    # match the first N letters of the pattern P
    for i, state_map in enumerate(result):
        current_match = pattern[:i]

        # a state change needs to be defined for every letter
        for letter in unique_letters:

            # Iterate over prefixes of the pattern, longest first.
            # We're searching for the best (longest) prefix of the pattern
            # that is a suffix of the `current_match + letter` string.
            candidate = current_match + letter
            for k in range(min(len(pattern), len(candidate)), 0, -1):
                if candidate.endswith(pattern[:k]):
                    break
            else:
                # no non-empty prefix fits, so fall back to the initial state
                k = 0

            state_map[letter] = k

    return result
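
To make the state mapping concrete, here is a small worked example for the pattern `"aba"`. It is a minimal sketch, not the notebook's own code: `build_transitions` is a hypothetical, compact helper that follows the same idea as `get_delta` (for each state and letter, find the longest pattern prefix that is a suffix of the text read so far).

```python
def build_transitions(pattern):
    """Hypothetical helper (not part of the original notebook).

    table[state][letter] is the length of the longest prefix of `pattern`
    that is also a suffix of pattern[:state] + letter.
    """
    alphabet = sorted(set(pattern))
    table = []
    for state in range(len(pattern) + 1):
        row = {}
        for letter in alphabet:
            candidate = pattern[:state] + letter
            # try the longest possible prefix first, then shorter ones
            k = min(len(pattern), len(candidate))
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            row[letter] = k
        table.append(row)
    return table


for state, row in enumerate(build_transitions("aba")):
    print(state, row)

# 0 {'a': 1, 'b': 0}
# 1 {'a': 1, 'b': 2}
# 2 {'a': 3, 'b': 0}
# 3 {'a': 1, 'b': 2}
```

State 3 is the accepting state. Its row shows why overlapping matches are found: after a full match of `"aba"`, reading `"b"` moves to state 2 because the trailing `"ab"` is already a prefix of the pattern.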

With the delta table in place, we can define the finite automata function that matches the pattern against the text.

In [26]:
def finite_automata(pattern, text):
    pattern_length = len(pattern)

    delta = get_delta(pattern)
    current_state = 0

    result = []

    # iterate over all letters in text
    for i, letter in enumerate(text):

        # get state change from delta function or go back to state 0 if no good match can be found
        current_state = delta[current_state].get(letter, 0)

        # if we found the pattern
        if current_state == pattern_length:
            result.append(i - current_state + 1)

    return result

In [27]:
test_cases = [
    (("abc", "xabcyabcabc"), [1, 5, 8]),
    (("hello", "hello world, hello again!"), [0, 13]),
    (("aa", "aaaaa"), [0, 1, 2, 3]),
    (("xyz", "abcdefg"), []),
    (("m", "mommy mammal"), [0, 2, 3, 6, 8, 9]),
]

for (pattern, text), result in test_cases:
    print(finite_automata(pattern, text))
    print(result)

[1, 5, 8]
[1, 5, 8]
[0, 13]
[0, 13]
[0, 1, 2, 3]
[0, 1, 2, 3]
[]
[]
[0, 2, 3, 6, 8, 9]
[0, 2, 3, 6, 8, 9]
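
The table construction in `get_delta` re-checks candidate prefixes from scratch for every state and letter. A commonly used refinement, sketched below as an assumption rather than as part of this notebook, builds the same table in O(m\*s) time by copying each row from a "fallback" state (the state the automaton would be in after reading the longest proper prefix of the current match that is also its suffix). The names `build_transitions_fast` and `search` are hypothetical, and the sketch assumes a non-empty pattern.

```python
def build_transitions_fast(pattern):
    """Hypothetical O(m*s) table construction (not from the original notebook)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")  # assumption of this sketch

    m = len(pattern)
    alphabet = set(pattern)
    table = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    table[0][pattern[0]] = 1

    fallback = 0  # state to imitate when the next letter does not extend the match
    for state in range(1, m + 1):
        # on a mismatch, behave exactly like the fallback state
        for letter in alphabet:
            table[state][letter] = table[fallback][letter]
        if state < m:
            # the matching letter advances to the next state
            table[state][pattern[state]] = state + 1
            # advance the fallback state with the same letter
            fallback = table[fallback][pattern[state]]
    return table


def search(pattern, text):
    """Return start indices of all (possibly overlapping) matches."""
    table = build_transitions_fast(pattern)
    state, hits = 0, []
    for i, letter in enumerate(text):
        # letters outside the pattern alphabet reset the automaton
        state = table[state].get(letter, 0)
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits


print(search("abc", "xabcyabcabc"))  # [1, 5, 8]
print(search("aa", "aaaaa"))         # [0, 1, 2, 3]
```

The search loop is unchanged; only the preprocessing differs, so both versions should agree on the test cases above.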
