# Building a regular expression engine in Python

### Introduction

The goal of this tutorial is to build out a regular expression engine from scratch. Our *MiniRegex* class will allow the client to 

### Basic Architecture
Our regex software will have three separate parts: a recursive descent parser that takes a string and returns an 

A simple regular expression engine that parses regexps, converts them into an
NFA (nondeterministic finite automaton), and runs it on input strings to find
matches.
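To make the "run the NFA on an input string" step concrete, here is a minimal sketch of set-of-states simulation. It assumes a hand-built automaton for the pattern `a*b`; the names `transitions`, `epsilon_closure`, and `accepts` are hypothetical and are not the tutorial's own NFA classes.

```python
# Hypothetical hand-built NFA for the pattern a*b (illustration only).
EPSILON = None  # label for transitions that consume no input

# transitions[state] is a list of (label, next_state) pairs.
transitions = {
    0: [(EPSILON, 1), (EPSILON, 2)],  # either loop on 'a' or move on to 'b'
    1: [("a", 0)],
    2: [("b", 3)],
    3: [],
}
START, ACCEPT = 0, 3


def epsilon_closure(states):
    """Return every state reachable from `states` using only epsilon moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        state = stack.pop()
        for label, nxt in transitions[state]:
            if label is EPSILON and nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(text):
    """Return True iff the whole of `text` is accepted by the NFA."""
    current = epsilon_closure({START})
    for ch in text:
        # Follow every transition labelled with ch from every live state.
        moved = {nxt for s in current for label, nxt in transitions[s] if label == ch}
        current = epsilon_closure(moved)
        if not current:
            return False  # no live states left, so no match is possible
    return ACCEPT in current
```

Tracking a set of live states avoids backtracking: each input character is examined once, and the work per character is bounded by the number of transitions in the automaton.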

Patterns supported:

- `ab` -- concatenation
- `a|b` -- alternation (or)
- `a*` -- Kleene star
- `.` -- metacharacter (conventionally matches any single character)
- `(ab (c|d)) | e` -- nested expressions
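The operators listed above can be parsed with a small recursive descent parser, one function per precedence level. The following is a sketch under stated assumptions, not the tutorial's `RegexParser`: the function `parse` and the nested-tuple AST shape are hypothetical, and spaces are treated as ordinary literal characters, so the nested example is written without them.

```python
# Hypothetical grammar, lowest to highest precedence:
#   alternation   := concatenation ('|' concatenation)*
#   concatenation := repetition*
#   repetition    := atom '*'*
#   atom          := '(' alternation ')' | '.' | literal character


def parse(pattern):
    """Parse `pattern` into a nested-tuple AST; raise ValueError on bad syntax."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def take():
        nonlocal pos
        if pos >= len(pattern):
            raise ValueError("unexpected end of pattern")
        ch = pattern[pos]
        pos += 1
        return ch

    def alternation():
        node = concatenation()
        while peek() == "|":
            take()
            node = ("or", node, concatenation())
        return node

    def concatenation():
        node = None
        # Stop at '|' or ')' so the enclosing rule can handle them.
        while peek() is not None and peek() not in "|)":
            nxt = repetition()
            node = nxt if node is None else ("cat", node, nxt)
        return ("empty",) if node is None else node

    def repetition():
        node = atom()
        while peek() == "*":
            take()
            node = ("star", node)
        return node

    def atom():
        ch = take()
        if ch == "(":
            node = alternation()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        if ch == ".":
            return ("any",)
        if ch == "*":
            raise ValueError("nothing to repeat")
        return ("char", ch)

    tree = alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


print(parse("a|b*"))  # ('or', ('char', 'a'), ('star', ('char', 'b')))
```

Because `repetition` is called from `concatenation`, which is called from `alternation`, the star binds tighter than concatenation, and concatenation binds tighter than `|`.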

In [1]:
# Let's start with the interface for MiniRegex. This is the main class that the client interacts with.

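class MiniRegex:
    def __init__(self, pattern_str):
        # Assumed constructor: the methods below read self._nfa, so build it once here.
        self._nfa = self._build_nfa(pattern_str)

    def _build_nfa(self, pattern_str):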
        tokenizer = Tokenizer(pattern_str)
        parser = RegexParser(tokenizer)
        return parser.construct_nfa()

    def find_match_at(self, search_space):
        """ Returns True iff there is a match starting at the first char of the
        search_space argument """
        runner = DFASimulator(self._nfa)
        for c in search_space:
            if runner.advance_multi_state(c):
                return True
            if not runner.is_active():
                return False
        return False

    def find_all_matches(self, search_space):
        """ Returns a list of (start_index, end_index) pairs, keeping the last
        match reported for each start index """
        runner = DFASimulator(self._nfa)
        matches = {}  # start_index: end_index
        for c in search_space:
            match = runner.advance_multi_state(c)
            if match:
                start_idx, end_idx = match
                matches[start_idx] = end_idx
        return list(matches.items())

    # def is_match(self, search_space):
    #     """ Returns True iff there is a match starting at the first char of the
    #     search_space argument """
    #     runner = DFASimulator(self._nfa)
    #     for c in search_space:
    #         if runner.advance_state(c):
    #             return True
    #         if not runner.is_active():
    #             return False
    #     return False

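    def first_match(self, search_space):
        """ Returns the index of the first position at which a match starts,
        or None if there is no match """
        for i in range(len(search_space)):
            # is_match is commented out above, so use find_match_at instead.
            if self.find_match_at(search_space[i:]):
                return i
        return None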
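The offset scan in `first_match` can be tried in isolation with a stand-in matcher. The sketch below assumes the per-position check is `find_match_at`; the class `LiteralPrefixMatcher` is hypothetical and matches only a fixed literal, so it runs without the tokenizer, parser, or simulator.

```python
class LiteralPrefixMatcher:
    """Hypothetical stand-in for MiniRegex: matches one literal string only."""

    def __init__(self, literal):
        self._literal = literal

    def find_match_at(self, search_space):
        # True iff the literal occurs at the very start of search_space.
        return search_space.startswith(self._literal)

    def first_match(self, search_space):
        # Try every start offset in order; the first hit is the leftmost match.
        for i in range(len(search_space)):
            if self.find_match_at(search_space[i:]):
                return i
        return None


matcher = LiteralPrefixMatcher("cd")
print(matcher.first_match("abcdcd"))  # 2
print(matcher.first_match("abc"))     # None
```

Each `search_space[i:]` slice copies the remainder of the string, so this scan can take quadratic time on long inputs. Passing a start index to the per-position check instead of a slice would avoid the copies.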



