# Building a RegEx engine using a Finite Automata based approach

# Finite Automata & Regular Expressions are just **Pattern Matching**      
Regular expressions *describe* patterns in text.  
Finite automata (DFAs and NFAs) are abstract machines that can *recognize* those patterns.       
A finite automaton is, at its core, a graph. It has nodes (**states**) and directed edges (**transitions**).      
Our goal is to write a Python program that can "walk" this graph based on an input string.

We go:
**Regular Expression** `(a|b)*c` → **Thompson's Algorithm** → **NFA** (Nondeterministic Finite Automaton) → **Subset Construction** → **DFA** (Deterministic Finite Automaton)

## DFA & NFA - [**Finite State Machines**](https://en.wikipedia.org/wiki/Finite-state_machine)           
           
[Finite automata](https://www.geeksforgeeks.org/theory-of-computation/introduction-of-finite-automata/) are simple "machines".             
They have a finite number of **states** and move between them based on input symbols.              
Kind of like Monopoly, Snakes & Ladders, or a similar board game, where each space is a state and the input tells you which path to take.             
           
[Here's a course I've enjoyed](https://www.youtube.com/playlist?list=PL7HjUNIdk93ThXvz2Oa_g30Jt3Owwm4HZ), hopefully more folks do too.           
         

  
### [Deterministic Finite Automaton (DFA)](https://en.wikipedia.org/wiki/Deterministic_finite_automaton)           
           
A DFA is strict, predictable.   
For any given state and any given input symbol, there is **exactly one** state it can move to.    
No ambiguity.           

                      
           

1.  **One Path:** For any input string, there's only one possible path through the machine.           
2.  **No $\epsilon$-transitions:** The machine can't change state without consuming an input symbol (no "free" moves).           
3.  **Every state must have a transition for every symbol** in the alphabet.           
           

A DFA has 5 parts - States, Alphabet, Transitions, Start State, Accept States:            
           

  * $Q$: A finite set of *states*.           
  * $\Sigma$: A finite set of input *symbols* (aka the alphabet).           
  * $\delta$: The transition *function*, $\delta: Q \times \Sigma \to Q$. (Given a state and a symbol, it returns *one* next state).           
  * $q_0$: The initial start state.           
  * $F$: A set of *final* or "*accept*" states.           

*(This Greek notation is what's formally used. IMO it's just dissonance; it takes longer for our brains to grok the main idea.)*

**Example: A DFA that accepts strings ending in 'a'**           
           
  * $Q$ - The set of **states:** {$State_0$, $State_1$}           
  * $\Sigma$ - **Alphabet:** {'a', 'b'}           
  * $q_0$ - **Start State:** $State_0$           
  * $F$ - **Accept States:** {$State_1$}           
  * $\delta$ - **Transitions:**           
      * From $State_0$, on 'a' go to $State_1$.           
      * From $State_0$, on 'b' go to $State_0$.           
      * From $State_1$, on 'a' go to $State_1$.           
      * From $State_1$, on 'b' go to $State_0$.          
           

Tracing the execution of the state machine for the input string "baba":           
           
1.  Start at $State_0$. Input 'b' -> Stay at $State_0$.           
2.  At $State_0$. Input 'a' -> Move to $State_1$.           
3.  At $State_1$. Input 'b' -> Move to $State_0$.           
4.  At $State_0$. Input 'a' -> Move to $State_1$.           
       
...end of string. We are in $State_1$, which is an accept state. **The string is accepted!**            
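The trace above can be replayed in a few lines of Python. This is only a sketch: the names `ends_in_a` and `trace` are made up for illustration, and the states are written as the plain strings `"State_0"` and `"State_1"`.

```python
# Hypothetical encoding of the "ends in 'a'" DFA from the example above.
ends_in_a = {
    "State_0": {"a": "State_1", "b": "State_0"},
    "State_1": {"a": "State_1", "b": "State_0"},
}

def trace(transitions, start, text):
    """Return the list of states visited while reading text, start included."""
    path = [start]
    for symbol in text:
        # Exactly one next state exists for each (state, symbol) pair.
        path.append(transitions[path[-1]][symbol])
    return path

path = trace(ends_in_a, "State_0", "baba")
print(" -> ".join(path))
# The string is accepted when the last state visited is the accept state.
print(path[-1] == "State_1")
```

The printed path should line up with the four numbered steps, one arrow per input symbol.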
          

#### The *trick* for a DFA is: from any given state, for any given input symbol, there is **exactly one** place to go next.     
Another check is to draw the machine as a graph: every node must have exactly one outgoing arrow for each symbol in the alphabet.   
The number of incoming arrows, by contrast, can be anything.

#### DFA State Machine **Execution Engine**  

`Engine` is a big word for a simple function.   
It expects a dictionary of the following format and cycles through any input string to `execute`...   
```Python
dfa = {
    "states": set(),         # set of state names, e.g. {"q0", "q1"}
    "alphabet": set(),       # set of input symbols, e.g. {"a", "b"}
    "transitions": {},       # dict of dicts - {"state_name": {"symbol": "next_state", "symbol": "next_state"...}, ...}
    "start_state": "",       # string - the "state_name" of the start state
    "accept_states": set(),  # set of accept state names
}
```

In [1]:
# RUN DA DFA - a method to run the DFA and provide a trace of how the machine processed things
# It expects a DFA with states, alphabet, transitions, start_state, accept_states
def run_da_dfa(machine, input_string):

    current_state = machine["start_state"]
    print(f"Start: state = {current_state}")

    # Process the string one symbol at a time.
    for symbol in input_string:
        # Check if the symbol is valid for this machine.
        if symbol not in machine["alphabet"]:
            print(f"  Error: Symbol '{symbol}' not in alphabet. Rejecting.")
            return False

        # Look up the next state from the transitions table.
        next_state = machine["transitions"][current_state][symbol]
        print(f"  Input '{symbol}': {current_state} -> {next_state}")
        current_state = next_state

    # After the loop, check if the final state is an accept state.
    is_accepted = current_state in machine["accept_states"]
    print(f"End: final state = {current_state}. Accepted: {is_accepted}\n")
    return is_accepted

#### Example 01:  **strings that contain an even number of 'a's**.   
The alphabet is {'a', 'b'}.

  * `"aba"` -> accepted (2 'a's)
  * `"b"` -> accepted (0 'a's)
  * `"a"` -> rejected (1 'a')
  * `"aaab"` -> rejected (3 'a's)

We build a machine with two states:
  
1.  `q0`: even number of 'a's seen in the string so far. This is our **start state**. Since 0 is even, it is also an **accept state**.
2.  `q1`: encountered an odd number of 'a's so far.
   

We represent this entire machine as a Python dictionary.   
This is just a data structure - there's no complex logic in it yet.

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q0
    q0 --> q1: a
    q0 --> q0: b
    q1 --> q0: a
    q1 --> q1: b
    q0: q0 (both, start state and accept state)
```

In [2]:
# DFA = {states, alphabets, transitions, start_state, accept_states}
even_a_machine = {
    # A set of all states for this machine.
    "states": {"q0", "q1"},
    # The set of allowed input symbols.
    "alphabet": {"a", "b"},
    # The transition function, mapping (state, symbol) to a next state.
    # For a DFA, the next state is a single state.
    "transitions": {
        "q0": {"a": "q1", "b": "q0"},
        "q1": {"a": "q0", "b": "q1"},
    },
    # The single state where the machine begins.
    "start_state": "q0",
    # A set of states that mean "accept" if the machine finishes in one of them.
    "accept_states": {"q0"},
}

In [3]:
print("Testing '':")
run_da_dfa(even_a_machine, "")     # Expected: True (0 is even)

Testing '':
Start: state = q0
End: final state = q0. Accepted: True



True

In [4]:
print("Testing 'aba':")
run_da_dfa(even_a_machine, "aba")  # Expected: True

Testing 'aba':
Start: state = q0
  Input 'a': q0 -> q1
  Input 'b': q1 -> q1
  Input 'a': q1 -> q0
End: final state = q0. Accepted: True



True

In [5]:
print("Testing 'a':")
run_da_dfa(even_a_machine, "a")    # Expected: False

Testing 'a':
Start: state = q0
  Input 'a': q0 -> q1
End: final state = q1. Accepted: False



False

In [6]:
print("Testing 'bbab':")
run_da_dfa(even_a_machine, "bbab") # Expected: False

Testing 'bbab':
Start: state = q0
  Input 'b': q0 -> q0
  Input 'b': q0 -> q0
  Input 'a': q0 -> q1
  Input 'b': q1 -> q1
End: final state = q1. Accepted: False



False

**Data and Logic**     
The `even_a_machine` dictionary is pure data.       
The `run_da_dfa` function is pure logic.      
This is a fundamental concept.        
The same `run_da_dfa` function could run *any* DFA we define, as long as it follows the same dictionary structure.                
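One way to lean on that split is a quiet, table-driven checker: the machine stays data, and so do the test cases. This is a sketch, not part of the engine above. `accepts` and `check_cases` are hypothetical helper names, and `accepts` simply repeats the engine's loop without the printing so the snippet runs on its own.

```python
def accepts(machine, input_string):
    """Silent variant of the engine: True if the DFA ends in an accept state."""
    state = machine["start_state"]
    for symbol in input_string:
        if symbol not in machine["alphabet"]:
            return False
        state = machine["transitions"][state][symbol]
    return state in machine["accept_states"]

def check_cases(machine, cases):
    """Return the inputs whose result differs from the expected value."""
    return [text for text, expected in cases.items()
            if accepts(machine, text) != expected]

# Same data as the even-number-of-'a's machine, repeated so this runs alone.
even_a = {
    "states": {"q0", "q1"},
    "alphabet": {"a", "b"},
    "transitions": {
        "q0": {"a": "q1", "b": "q0"},
        "q1": {"a": "q0", "b": "q1"},
    },
    "start_state": "q0",
    "accept_states": {"q0"},
}

failures = check_cases(even_a, {"": True, "aba": True, "a": False, "bbab": False})
print(failures)  # an empty list means every case matched
```

Swapping in a different machine dictionary and a different table of cases needs no change to either function.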

**State**    
The `current_state` variable is the machine's entire memory. It's a very limited memory, which is why it's a "finite" automaton. It only knows which state it's in; it doesn't remember the path it took to get there.                

**Determinism**      
Look at the `transitions` dictionary.     
For every state (`q0`, `q1`) and every symbol (`a`, `b`), there is **one and only one** resulting state specified.     
This certainty is what makes it deterministic.                 
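That "one and only one" property can be checked mechanically. A Python dict cannot hold two values for the same key, so "at most one" comes for free; what can go wrong is a missing entry or a target that isn't a known state. The sketch below looks for exactly those problems. `find_dfa_problems` and the `broken` machine are hypothetical names used only for illustration.

```python
def find_dfa_problems(machine):
    """Return a list of readable problems; an empty list means a well-formed DFA."""
    problems = []
    states, alphabet = machine["states"], machine["alphabet"]
    if machine["start_state"] not in states:
        problems.append("start state is not in states")
    if not machine["accept_states"] <= states:
        problems.append("accept states contain an unknown state")
    for state in sorted(states):
        row = machine["transitions"].get(state, {})
        for symbol in sorted(alphabet):
            if symbol not in row:
                problems.append(f"{state} has no transition on '{symbol}'")
            elif row[symbol] not in states:
                problems.append(f"{state} on '{symbol}' goes to an unknown state")
    return problems

broken = {
    "states": {"q0", "q1"},
    "alphabet": {"a", "b"},
    "transitions": {
        "q0": {"a": "q1", "b": "q0"},
        "q1": {"a": "q0"},  # the 'b' transition is missing on purpose
    },
    "start_state": "q0",
    "accept_states": {"q0"},
}
print(find_dfa_problems(broken))
```

Running a check like this before handing a dictionary to the engine turns a mid-run `KeyError` into an up-front message.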

#### Example 02: **accepts strings ending in '1'**    

For the binary alphabet {'0', '1'}, we need  
a machine that *accepts* binary strings like "001", "11", and "1" but *rejects* "10", "00", and "" (the empty string).   



  * `q0`: The initial state. We are in this state if we haven't seen a '1' yet, or the last symbol seen was a '0'. This is a **reject** state.                
  * `q1`: The state we are in if the most recent symbol was a '1'. This is an **accept** state.                

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q0
    q0 --> q0: 0
    q0 --> q1: 1
    q1 --> q0: 0
    q1 --> q1: 1
    q1: q1 (accept state)
    q0: q0 (reject state)
```

In [7]:
dfa_ends_with_1 = {                
    "states": {"q0", "q1"},                
    "alphabet": {"0", "1"},                
    "transitions": {                
        # From state q0: if we read a '0', we stay in q0. If '1', we go to q1.                
        "q0": {"0": "q0", "1": "q1"},                
        # From state q1: if we read a '0', we go back to q0. If '1', we stay in q1.                
        "q1": {"0": "q0", "1": "q1"},                
    },                
    "start_state": "q0",                
    "accept_states": {"q1"},                
}                

In [8]:
print("Testing '101':")
run_da_dfa(dfa_ends_with_1, "101")                

Testing '101':
Start: state = q0
  Input '1': q0 -> q1
  Input '0': q1 -> q0
  Input '1': q0 -> q1
End: final state = q1. Accepted: True



True

In [9]:
print("Testing '100':")
run_da_dfa(dfa_ends_with_1, "100")

Testing '100':
Start: state = q0
  Input '1': q0 -> q1
  Input '0': q1 -> q0
  Input '0': q0 -> q0
End: final state = q0. Accepted: False



False

#### Example 03: strings that **start with "ab"**  
The alphabet is {'a', 'b'}.

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q0 : Start
    q0 --> q1: a
    q0 --> q_dead: b
    q1 --> q2: b
    q1 --> q_dead: a
    q2 --> q2: a
    q2 --> q2: b
    q_dead --> q_dead: a
    q_dead --> q_dead: b
    q2: Accept
    q_dead: Reject
```

In [10]:
dfa_starts_with_ab = {                
    "states": {"q0", "q1", "q2", "q_dead"},                
    "alphabet": {"a", "b"},                
    "start_state": "q0",                
    "accept_states": {"q2"},                
    "transitions": {                
        "q0": {"a": "q1", "b": "q_dead"},         # Must start with 'a' - if 'a' is found at start, go to 'q1', where we check for 'b'                
        "q1": {"a": "q_dead", "b": "q2"},         # Then must have 'b' - if 'b' is also found, go to 'q2' which is our 'accept_state'...                
        "q2": {"a": "q2", "b": "q2"},             # It started with "ab", so accept                
        "q_dead": {"a": "q_dead", "b": "q_dead"}, # Any other path fails                
    },                
}

In [11]:
run_da_dfa(dfa_starts_with_ab, "abbab") # Should accept

Start: state = q0
  Input 'a': q0 -> q1
  Input 'b': q1 -> q2
  Input 'b': q2 -> q2
  Input 'a': q2 -> q2
  Input 'b': q2 -> q2
End: final state = q2. Accepted: True



True

In [12]:
run_da_dfa(dfa_starts_with_ab, "aab")   # Should reject

Start: state = q0
  Input 'a': q0 -> q1
  Input 'a': q1 -> q_dead
  Input 'b': q_dead -> q_dead
End: final state = q_dead. Accepted: False



False

#### Example 04: strings of **exactly length 2**     
The alphabet is {'a', 'b'}.

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q0: Start
    q0 --> q1: a
    q0 --> q1: b
    q1 --> q2: a
    q1 --> q2: b
    q2 --> q_dead: a
    q2 --> q_dead: b
    q_dead --> q_dead: a
    q_dead --> q_dead: b
    q2: Accept
    q_dead: Reject
```

In [13]:
dfa_len_2 = {                
    "states": {"q0", "q1", "q2", "q_dead"},                
    "alphabet": {"a", "b"},                
    "start_state": "q0",                
    "accept_states": {"q2"},                
    "transitions": {                
        "q0": {"a": "q1", "b": "q1"},           # First character                
        "q1": {"a": "q2", "b": "q2"},           # Second character                
        "q2": {"a": "q_dead", "b": "q_dead"},   # Third character (too long)                
        "q_dead": {"a": "q_dead", "b": "q_dead"},                
    },                
}

In [14]:
run_da_dfa(dfa_len_2, "ab") # Should accept

Start: state = q0
  Input 'a': q0 -> q1
  Input 'b': q1 -> q2
End: final state = q2. Accepted: True



True

In [15]:
run_da_dfa(dfa_len_2, "a")  # Should reject

Start: state = q0
  Input 'a': q0 -> q1
End: final state = q1. Accepted: False



False

#### Example 05: strings with an **odd number of '1's**  

The alphabet is {'1', '0'}.

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q_even: Start
    q_even --> q_even: 0
    q_even --> q_odd: 1
    q_odd --> q_odd: 0
    q_odd --> q_even: 1
    q_odd: Accept
```

In [16]:
dfa_odd_ones = {                
    "states": {"q_even", "q_odd"},                
    "alphabet": {"0", "1"},                
    "start_state": "q_even",                
    "accept_states": {"q_odd"},                
    "transitions": {                
        "q_even": {"0": "q_even", "1": "q_odd"},                
        "q_odd": {"0": "q_odd", "1": "q_even"},                
    },                
}

In [17]:
run_da_dfa(dfa_odd_ones, "01101") # Should accept

Start: state = q_even
  Input '0': q_even -> q_even
  Input '1': q_even -> q_odd
  Input '1': q_odd -> q_even
  Input '0': q_even -> q_even
  Input '1': q_even -> q_odd
End: final state = q_odd. Accepted: True



True

In [18]:
run_da_dfa(dfa_odd_ones, "11")    # Should reject

Start: state = q_even
  Input '1': q_even -> q_odd
  Input '1': q_odd -> q_even
End: final state = q_even. Accepted: False



False

#### Example 06: accepting binary numbers that are a **multiple of 3**     
   
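This is both a bad example and a clever one.    
It relies on how reading one more bit changes a binary number: the value doubles (and gains 1 if the bit is '1'), so each state only needs to track the remainder modulo 3.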
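The transition table for this machine doesn't have to be written by hand. If the bits read so far have remainder `r`, appending a bit gives remainder `(2 * r + bit) % 3`, and the same rule works for any modulus. The sketch below generates the table that way; `build_mod_dfa` is a hypothetical helper, and it uses the same dictionary layout as the other DFAs here.

```python
def build_mod_dfa(n):
    """Build a DFA accepting binary strings whose value is a multiple of n."""
    transitions = {
        # State q<r> means "the value read so far has remainder r".
        f"q{r}": {bit: f"q{(2 * r + int(bit)) % n}" for bit in "01"}
        for r in range(n)
    }
    return {
        "states": set(transitions),
        "alphabet": {"0", "1"},
        "transitions": transitions,
        "start_state": "q0",      # nothing read yet: remainder 0
        "accept_states": {"q0"},  # remainder 0 means "multiple of n"
    }

mod3 = build_mod_dfa(3)
for state in sorted(mod3["transitions"]):
    print(state, mod3["transitions"][state])
```

One side effect of starting in `q0`: the empty string is accepted too, since the start state is also the accept state.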

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q0: Start
    q0 --> q0: 0
    q0 --> q1: 1
    q1 --> q2: 0
    q1 --> q0: 1
    q2 --> q1: 0
    q2 --> q2: 1
    q0: Accept
```

In [19]:
dfa_multiple_of_3 = {                
    "states": {"q0", "q1", "q2"},                
    "alphabet": {"0", "1"},                
    "start_state": "q0", # Remainder 0                
    "accept_states": {"q0"},                
    "transitions": {                
        # If current value is `x`, reading '0' makes it `2x`. Reading '1' makes it `2x+1`.                
        # All calculations are modulo 3.                
        "q0": {"0": "q0", "1": "q1"}, # (2*0)%3=0, (2*0+1)%3=1                
        "q1": {"0": "q2", "1": "q0"}, # (2*1)%3=2, (2*1+1)%3=0                
        "q2": {"0": "q1", "1": "q2"}, # (2*2)%3=1, (2*2+1)%3=2                
    },                
}

In [20]:
run_da_dfa(dfa_multiple_of_3, "110") # 6 in binary, should accept

Start: state = q0
  Input '1': q0 -> q1
  Input '1': q1 -> q0
  Input '0': q0 -> q0
End: final state = q0. Accepted: True



True

In [21]:
run_da_dfa(dfa_multiple_of_3, "101") # 5 in binary, should reject

Start: state = q0
  Input '1': q0 -> q1
  Input '0': q1 -> q2
  Input '1': q2 -> q2
End: final state = q2. Accepted: False



False

#### Example 07: strings with **no consecutive 'a's**    

```mermaid
stateDiagram-v2
    direction LR
    [*] --> q_start: Start
    q_start --> q_saw_a: a
    q_start --> q_start: b
    q_saw_a --> q_dead: a
    q_saw_a --> q_start: b
    q_dead --> q_dead: a
    q_dead --> q_dead: b
    q_start: Accept
    q_saw_a: Accept
    q_dead: Reject

```

In [22]:
dfa_no_consecutive_as = {                
    "states": {"q_start", "q_saw_a", "q_dead"},                
    "alphabet": {"a", "b"},                
    "start_state": "q_start",                
    "accept_states": {"q_start", "q_saw_a"},                
    "transitions": {                
        "q_start": {"a": "q_saw_a", "b": "q_start"},                
        "q_saw_a": {"a": "q_dead", "b": "q_start"},                
        "q_dead": {"a": "q_dead", "b": "q_dead"},                
    },                
}

In [23]:
run_da_dfa(dfa_no_consecutive_as, "ababa") # Should accept

Start: state = q_start
  Input 'a': q_start -> q_saw_a
  Input 'b': q_saw_a -> q_start
  Input 'a': q_start -> q_saw_a
  Input 'b': q_saw_a -> q_start
  Input 'a': q_start -> q_saw_a
End: final state = q_saw_a. Accepted: True



True

In [24]:
run_da_dfa(dfa_no_consecutive_as, "bababa") # Should accept

Start: state = q_start
  Input 'b': q_start -> q_start
  Input 'a': q_start -> q_saw_a
  Input 'b': q_saw_a -> q_start
  Input 'a': q_start -> q_saw_a
  Input 'b': q_saw_a -> q_start
  Input 'a': q_start -> q_saw_a
End: final state = q_saw_a. Accepted: True



True

In [25]:
run_da_dfa(dfa_no_consecutive_as, "aab")  # Should reject

Start: state = q_start
  Input 'a': q_start -> q_saw_a
  Input 'a': q_saw_a -> q_dead
  Input 'b': q_dead -> q_dead
End: final state = q_dead. Accepted: False



False

In [26]:
run_da_dfa(dfa_no_consecutive_as, "baab")  # Should reject

Start: state = q_start
  Input 'b': q_start -> q_start
  Input 'a': q_start -> q_saw_a
  Input 'a': q_saw_a -> q_dead
  Input 'b': q_dead -> q_dead
End: final state = q_dead. Accepted: False



False

### [Nondeterministic Finite Automaton (NFA)](https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton)           
           
An NFA is flexible.    
It can have **multiple possible moves** from a single state on the same input symbol.  
           

Two ways in which we break the DFA rules:  

1.  **Multiple Paths:** From a state, an input symbol can lead to zero, one, or multiple states.           
2.  **$\epsilon$-transitions (Epsilon-moves):** A fancy way of saying an NFA can change its state *without* consuming an input symbol.  
    * This is a "free" move. Compare that with a DFA, where we can only move to another state (or stay in the same one) based on the next input symbol.           
       

The NFA accepts an input string if **at least one** of the possible paths ends in an accept state.           
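One way to read "at least one path" literally is a depth-first search that tries every choice and stops at the first path that works. The sketch below does that for an NFA without $\epsilon$-moves. `accepts_by_path_search` and the `second_last_is_a` machine are hypothetical, and the transition values are sets of states, which is an assumption about the data layout.

```python
def accepts_by_path_search(machine, text, state=None, pos=0):
    """Depth-first search for one accepting path (no ε-moves in this sketch)."""
    if state is None:
        state = machine["start_state"]
    if pos == len(text):
        return state in machine["accept_states"]
    options = machine["transitions"].get(state, {}).get(text[pos], set())
    # any() stops at the first branch that reaches an accept state.
    return any(accepts_by_path_search(machine, text, nxt, pos + 1)
               for nxt in options)

# Hypothetical NFA: strings over {a, b} whose second-to-last symbol is 'a'.
second_last_is_a = {
    "start_state": "p0",
    "accept_states": {"p2"},
    "transitions": {
        "p0": {"a": {"p0", "p1"}, "b": {"p0"}},
        "p1": {"a": {"p2"}, "b": {"p2"}},
    },
}

print(accepts_by_path_search(second_last_is_a, "bab"))
print(accepts_by_path_search(second_last_is_a, "abb"))
```

A search like this can revisit the same (state, position) pair many times on long inputs, which is one reason to track a *set* of current states instead of individual paths.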
           

##### *Formally*
Look, this is just me trying to look smart, ignore this bit please, I've got to find a better way to express this stuff...  
  
The formal definition is almost the same, but the transition function is different:           
           
  * $\delta$: The transition function, $\delta: Q \times (\Sigma \cup \{\epsilon\}) \to \mathcal{P}(Q)$.
      * (Given a state and an input (either a symbol or $\epsilon$), it returns a *set* of possible next states).
      * $\mathcal{P}(Q)$ is the power set of $Q$.           

NFAs are often much simpler and smaller than their equivalent DFAs.    
I think of these as 'higher order' than DFAs, so you can write in NFA, let it compile to a DFA etc.  
  

NFAs are also easy to build from regular expressions, which is where Thompson's Algorithm comes in.           
 

#### The *trick* for an NFA is: an NFA accepts a string if **any one** of the possible paths it could take ends in an accept state.

#### NFA State Machine **Execution Engine**  

Like the DFA one, `Engine` here is also a big word for a simple function.   
To review, our *engine* expects a dictionary of the following format and cycles through any input string to `execute`...   
```Python
nfa = {
    "states": set(),         # set of state names, e.g. {"q0", "q1"}
    "alphabet": set(),       # set of input symbols, e.g. {"0", "1"}
    "transitions": {},       # dict of dicts - {"state_name": {"symbol": {"next_state", ...}, "ε": {"next_state", ...}}, ...}
    "start_state": "",       # string - the "state_name" of the start state
    "accept_states": set(),  # set of accept state names
}
```

In [27]:
# The engine must now track a SET of current states.
def run_da_nfa(machine, input_string):
    # Start with a set containing only the start state.
    current_states = {machine["start_state"]}
    print(f"Input: '{input_string}'")
    print(f"  Start at states {current_states}")

    # Process each symbol.
    for symbol in input_string:
        next_states = set()
        # Find all possible next states from all current states.
        for state in current_states:
            # .get(state, {}) handles states with no outgoing transitions.
            # .get(symbol, set()) handles states that don't transition on this symbol.
            found_states = machine["transitions"].get(state, {}).get(symbol, set())
            next_states.update(found_states)
        current_states = next_states
        print(f"Read '{symbol}', new states are {current_states}")
    
    # Check if the intersection of final states and accept states is non-empty.
    if current_states & machine["accept_states"]:
        print("Finished with at least one path in an accept state. ACCEPTED.")
        return True
    else:
        print("Finished with no paths in an accept state. REJECTED.")
        return False

The above is a direct comparison to our DFA engine, but we'll need another that supports the 'epsilon' moves - where the engine can move to another state without consuming an input symbol.  
Let's build that too.  

The set of all states reachable from a given set of states using only free ($\epsilon$) moves, including those states themselves, is called the **epsilon closure** ($\epsilon$-closure).

In [28]:
# A helper function to find all states reachable from a set of states via epsilon moves.
# The starting states are always part of their own closure.
def get_epsilon_closure(states, transitions):
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        # Find all states reachable from the current state on an epsilon move.
        epsilon_states = transitions.get(state, {}).get("ε", set())
        for s in epsilon_states:
            if s not in closure:
                closure.add(s)
                stack.append(s)
    return closure

In [29]:
# A NFA engine that handles epsilon transitions.
def run_da_nfa_with_epsilon(machine, input_string):
    # The initial states are the closure of the start state.
    current_states = get_epsilon_closure({machine["start_state"]}, machine["transitions"])
    print(f"Input: '{input_string}'")
    print(f"  Start states (after ε-closure): {current_states}")
    
    for symbol in input_string:
        next_states_after_symbol = set()
        for state in current_states:
            # Find where the actual symbol takes us.
            found_states = machine["transitions"].get(state, {}).get(symbol, set())
            next_states_after_symbol.update(found_states)
        
        # The new current states are the epsilon closure of where the symbol took us.
        current_states = get_epsilon_closure(next_states_after_symbol, machine["transitions"])
        print(f"  Read '{symbol}', new states (after ε-closure) are {current_states}")

    if current_states & machine["accept_states"]:
        print("  Finished with at least one path in an accept state. ACCEPTED")
        return True
    else:
        print("  Finished with no paths in an accept state. REJECTED")
        return False

#### Example 01: An NFA that accepts strings containing "101"

This is tricky for a DFA but intuitive for an NFA.   
The NFA can "guess" when the "101" substring starts.

  * `q0`: Start state. The machine waits here, looking for the start of the pattern.
  * `q1`: "I might have just seen the first '1' of '101'".
  * `q2`: "I might have just seen '10'".
  * `q3`: "I have seen '101'". This is the **accept** state.

In [30]:
# The transition values are now sets of states.
nfa_contains_101 = {
    "states": {"q0", "q1", "q2", "q3"},
    "alphabet": {"0", "1"},
    "transitions": {
        # q0 loops on every symbol while it waits for the pattern to start.
        "q0": {"0": {"q0"}, "1": {"q0", "q1"}}, # on reading a '1', the machine is in both states `q0` and `q1` simultaneously, exploring both possibilities.
        "q1": {"0": {"q2"}},
        "q2": {"1": {"q3"}},
        # The accept state loops on any symbol.
        # This ensures that once "101" is found, the machine stays in an accept state.
        "q3": {"0": {"q3"}, "1": {"q3"}}, 
    },
    "start_state": "q0",
    "accept_states": {"q3"},
}

In [31]:
run_da_nfa(nfa_contains_101, "01010")

Input: '01010'
  Start at states {'q0'}
Read '0', new states are {'q0'}
Read '1', new states are {'q1', 'q0'}
Read '0', new states are {'q0', 'q2'}
Read '1', new states are {'q3', 'q1', 'q0'}
Read '0', new states are {'q3', 'q0', 'q2'}
Finished with at least one path in an accept state. ACCEPTED.


True

In [32]:
run_da_nfa(nfa_contains_101, "01001")

Input: '01001'
  Start at states {'q0'}
Read '0', new states are {'q0'}
Read '1', new states are {'q1', 'q0'}
Read '0', new states are {'q0', 'q2'}
Read '0', new states are {'q0'}
Read '1', new states are {'q1', 'q0'}
Finished with no paths in an accept state. REJECTED.


False

#### Example 02: An NFA with Epsilon ($\epsilon$) Moves

Let's model `a*b*` (any number of 'a's followed by any number of 'b's).   
Epsilon moves make this easy by "gluing" an 'a'-looping machine to a 'b'-looping machine.


In [33]:
nfa_a_star_b_star = {
    "states": {"q0", "q1", "q2", "q3"},
    "alphabet": {"a", "b"},
    "transitions": {
        "q0": {"ε": {"q1", "q2"}}, # Start by choosing to match a's or b's
        "q1": {"a": {"q1"}, "ε": {"q2"}}, # Match a's, then optionally move to b's
        "q2": {"b": {"q2"}, "ε": {"q3"}}, # Match b's, then optionally finish
    },
    "start_state": "q0",
    "accept_states": {"q3"},
}

In [34]:
# --- Testing our epsilon-NFA ---
run_da_nfa_with_epsilon(nfa_a_star_b_star, "aaabb")

Input: 'aaabb'
  Start states (after ε-closure): {'q3', 'q1', 'q0', 'q2'}
  Read 'a', new states (after ε-closure) are {'q3', 'q1', 'q2'}
  Read 'a', new states (after ε-closure) are {'q3', 'q1', 'q2'}
  Read 'a', new states (after ε-closure) are {'q3', 'q1', 'q2'}
  Read 'b', new states (after ε-closure) are {'q3', 'q2'}
  Read 'b', new states (after ε-closure) are {'q3', 'q2'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [35]:
run_da_nfa_with_epsilon(nfa_a_star_b_star, "aab_ba") # Rejected: no state has a transition on '_', so every path dies there


Input: 'aab_ba'
  Start states (after ε-closure): {'q3', 'q1', 'q0', 'q2'}
  Read 'a', new states (after ε-closure) are {'q3', 'q1', 'q2'}
  Read 'a', new states (after ε-closure) are {'q3', 'q1', 'q2'}
  Read 'b', new states (after ε-closure) are {'q3', 'q2'}
  Read '_', new states (after ε-closure) are set()
  Read 'b', new states (after ε-closure) are set()
  Read 'a', new states (after ε-closure) are set()
  Finished with no paths in an accept state. REJECTED


False
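As a sanity check, Python's built-in `re` module can act as an oracle for a machine like this: `re.fullmatch(r"a*b*", text)` should agree with the NFA on every string. The sketch below compares the two on all strings over {'a', 'b'} up to length 4. `epsilon_closure` and `nfa_accepts` are hypothetical, silent re-statements of the closure-then-step idea so the snippet runs on its own, and the length cut-off is an arbitrary choice.

```python
import re
from itertools import product

EPSILON = "ε"  # same marker used for free moves in the machine below

def epsilon_closure(states, transitions):
    """All states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        for nxt in transitions.get(stack.pop(), {}).get(EPSILON, set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def nfa_accepts(machine, text):
    """Silent ε-NFA simulation: track the set of states we could be in."""
    transitions = machine["transitions"]
    current = epsilon_closure({machine["start_state"]}, transitions)
    for symbol in text:
        moved = set()
        for state in current:
            moved |= transitions.get(state, {}).get(symbol, set())
        current = epsilon_closure(moved, transitions)
    return bool(current & machine["accept_states"])

a_star_b_star = {
    "start_state": "q0",
    "accept_states": {"q3"},
    "transitions": {
        "q0": {EPSILON: {"q1", "q2"}},
        "q1": {"a": {"q1"}, EPSILON: {"q2"}},
        "q2": {"b": {"q2"}, EPSILON: {"q3"}},
    },
}

mismatches = []
for length in range(5):
    for letters in product("ab", repeat=length):
        text = "".join(letters)
        expected = re.fullmatch(r"a*b*", text) is not None
        if nfa_accepts(a_star_b_star, text) != expected:
            mismatches.append(text)
print(mismatches)  # an empty list means the NFA and the regex agree
```

Note the use of `re.fullmatch` rather than `re.match` or `re.search`: the automaton decides on the whole string, so the regex has to as well.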

#### Example 03: NFA accepting strings that **end in "01"**  
The NFA "guesses" when the final "01" sequence begins.   
It can stay in `q0` or jump to `q1` upon seeing a '0'.

In [36]:
nfa_ends_in_01 = {
    "states": {"q0", "q1", "q2"},
    "alphabet": {"0", "1"},
    "start_state": "q0",
    "accept_states": {"q2"},
    "transitions": {
        "q0": {"0": {"q0", "q1"}, "1": {"q0"}}, # Stay or start guessing
        "q1": {"1": {"q2"}},                    # Must be followed by '1'
        "q2": {},
    },
}

In [37]:
run_da_nfa_with_epsilon(nfa_ends_in_01, "1101") # Should accept

Input: '1101'
  Start states (after ε-closure): {'q0'}
  Read '1', new states (after ε-closure) are {'q0'}
  Read '1', new states (after ε-closure) are {'q0'}
  Read '0', new states (after ε-closure) are {'q1', 'q0'}
  Read '1', new states (after ε-closure) are {'q0', 'q2'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [38]:
run_da_nfa_with_epsilon(nfa_ends_in_01, "010")  # Should reject

Input: '010'
  Start states (after ε-closure): {'q0'}
  Read '0', new states (after ε-closure) are {'q1', 'q0'}
  Read '1', new states (after ε-closure) are {'q0', 'q2'}
  Read '0', new states (after ε-closure) are {'q1', 'q0'}
  Finished with no paths in an accept state. REJECTED


False

#### Example 04: NFA where the **third-to-last symbol is 'a'**

This is famously awkward for a DFA, which has to remember the last three symbols and so needs $2^3 = 8$ states, but easy for an NFA.   
The machine guesses that the 'a' it just read is the third-to-last symbol and then just counts two more symbols.

In [39]:
nfa_third_last_is_a = {
    "states": {"q0", "q1", "q2", "q3"},
    "alphabet": {"a", "b"},
    "start_state": "q0",
    "accept_states": {"q3"},
    "transitions": {
        "q0": {"a": {"q0", "q1"}, "b": {"q0"}},
        "q1": {"a": {"q2"}, "b": {"q2"}}, # Count 1 after 'a'
        "q2": {"a": {"q3"}, "b": {"q3"}}, # Count 2 after 'a'
        "q3": {},
    },
}

In [40]:
run_da_nfa_with_epsilon(nfa_third_last_is_a, "baba") # Should accept

Input: 'baba'
  Start states (after ε-closure): {'q0'}
  Read 'b', new states (after ε-closure) are {'q0'}
  Read 'a', new states (after ε-closure) are {'q1', 'q0'}
  Read 'b', new states (after ε-closure) are {'q0', 'q2'}
  Read 'a', new states (after ε-closure) are {'q3', 'q1', 'q0'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [41]:
run_da_nfa_with_epsilon(nfa_third_last_is_a, "bbba") # Should reject

Input: 'bbba'
  Start states (after ε-closure): {'q0'}
  Read 'b', new states (after ε-closure) are {'q0'}
  Read 'b', new states (after ε-closure) are {'q0'}
  Read 'b', new states (after ε-closure) are {'q0'}
  Read 'a', new states (after ε-closure) are {'q1', 'q0'}
  Finished with no paths in an accept state. REJECTED


False
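To see what this costs on the DFA side, the NFA can be fed through subset construction, the NFA → DFA step from the pipeline at the top. The sketch below is a minimal version that assumes there are no $\epsilon$-moves. `subset_construction` is a hypothetical name, each DFA state is a `frozenset` of NFA states, and the NFA is repeated so the snippet runs on its own.

```python
from collections import deque

def subset_construction(nfa):
    """Convert an NFA without ε-moves into a DFA whose states are frozensets."""
    start = frozenset({nfa["start_state"]})
    transitions = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in transitions:
            continue  # already expanded this set of NFA states
        transitions[current] = {}
        for symbol in sorted(nfa["alphabet"]):
            target = set()
            for state in current:
                target |= nfa["transitions"].get(state, {}).get(symbol, set())
            target = frozenset(target)
            transitions[current][symbol] = target
            if target not in transitions:
                queue.append(target)
    return {
        "states": set(transitions),
        "alphabet": set(nfa["alphabet"]),
        "transitions": transitions,
        "start_state": start,
        # A DFA state accepts if it contains at least one NFA accept state.
        "accept_states": {s for s in transitions if s & nfa["accept_states"]},
    }

third_last_is_a = {
    "states": {"q0", "q1", "q2", "q3"},
    "alphabet": {"a", "b"},
    "start_state": "q0",
    "accept_states": {"q3"},
    "transitions": {
        "q0": {"a": {"q0", "q1"}, "b": {"q0"}},
        "q1": {"a": {"q2"}, "b": {"q2"}},
        "q2": {"a": {"q3"}, "b": {"q3"}},
        "q3": {},
    },
}

dfa = subset_construction(third_last_is_a)
print(len(third_last_is_a["states"]), "NFA states")
print(len(dfa["states"]), "reachable DFA states")
```

Four NFA states turn into eight reachable DFA states here, one for each combination of "was there an 'a' one, two, or three symbols ago". The result uses the same dictionary layout as the DFAs earlier, just with frozensets as state names.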

#### Example 05: NFA accepting strings that **start with 'a' OR end with 'b'**  
  
This NFA uses an $\epsilon$-move from the start state to branch into two independent sub-machines: one that checks for a starting 'a' and one that checks for an ending 'b'.
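The branching trick generalises: any two NFAs can be "OR-ed" by adding a fresh start state with $\epsilon$-moves into each of them. Here is a rough sketch of that idea. `union_nfa`, `accepts`, and the two small machines are hypothetical names for illustration, and the sketch assumes the two machines use disjoint state names.

```python
EPSILON = "ε"  # marker for free moves

def union_nfa(first, second, new_start="u0"):
    """Build an NFA accepting whatever `first` OR `second` accepts.

    Assumes the two machines share no state names and that
    `new_start` is not used by either of them.
    """
    transitions = {new_start: {EPSILON: {first["start_state"], second["start_state"]}}}
    for machine in (first, second):
        for state, row in machine["transitions"].items():
            transitions[state] = {symbol: set(targets) for symbol, targets in row.items()}
    return {
        "states": first["states"] | second["states"] | {new_start},
        "alphabet": first["alphabet"] | second["alphabet"],
        "start_state": new_start,
        "accept_states": first["accept_states"] | second["accept_states"],
        "transitions": transitions,
    }

def accepts(machine, text):
    """Silent ε-NFA simulation, used here only to exercise union_nfa."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for nxt in machine["transitions"].get(stack.pop(), {}).get(EPSILON, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({machine["start_state"]})
    for symbol in text:
        moved = set()
        for state in current:
            moved |= machine["transitions"].get(state, {}).get(symbol, set())
        current = closure(moved)
    return bool(current & machine["accept_states"])

starts_with_a = {
    "states": {"s0", "s1"}, "alphabet": {"a", "b"},
    "start_state": "s0", "accept_states": {"s1"},
    "transitions": {"s0": {"a": {"s1"}}, "s1": {"a": {"s1"}, "b": {"s1"}}},
}
ends_with_b = {
    "states": {"e0", "e1"}, "alphabet": {"a", "b"},
    "start_state": "e0", "accept_states": {"e1"},
    "transitions": {"e0": {"a": {"e0"}, "b": {"e0", "e1"}}},
}

either = union_nfa(starts_with_a, ends_with_b)
for text in ["ab", "bb", "ba", ""]:
    print(repr(text), accepts(either, text))
```

Neither sub-machine has to know about the other; the only new piece is the start state and its two free moves.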

In [42]:
# NFA for "starts with 'a' OR ends with 'b'"
nfa_starts_a_or_ends_b = {
    "states": {"q0", "q_a1", "q_a2", "q_b1", "q_b2"},
    "alphabet": {"a", "b", "x"}, # Added 'x' to handle test cases
    "start_state": "q0",
    "accept_states": {"q_a2", "q_b2"},
    "transitions": {
        # Initial branch into two independent machines.
        "q0": {"ε": {"q_a1", "q_b1"}},

        # --- Sub-machine 1: Starts with 'a' ---
        # Must see 'a' first.
        "q_a1": {"a": {"q_a2"}}, 
        # Once it has started with 'a', it accepts. This state consumes the rest of the string.
        "q_a2": {"a": {"q_a2"}, "b": {"q_a2"}, "x": {"q_a2"}},

        # --- Sub-machine 2: Ends with 'b' ---
        # This state consumes any character.
        "q_b1": {"a": {"q_b1"}, "x": {"q_b1"}, 
                 # On 'b', it can either stay or guess this is the end.
                 "b": {"q_b1", "q_b2"}},
        # This is a temporary accept state. If more input arrives, the path must die.
        "q_b2": {},
    },
}

In [43]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "ax") # Should accept

Input: 'ax'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'a', new states (after ε-closure) are {'q_b1', 'q_a2'}
  Read 'x', new states (after ε-closure) are {'q_b1', 'q_a2'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [44]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "xa") # Should reject

Input: 'xa'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'x', new states (after ε-closure) are {'q_b1'}
  Read 'a', new states (after ε-closure) are {'q_b1'}
  Finished with no paths in an accept state. REJECTED


False

In [45]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "xb") # Should accept

Input: 'xb'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'x', new states (after ε-closure) are {'q_b1'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [46]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "bx") # Should reject

Input: 'bx'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2'}
  Read 'x', new states (after ε-closure) are {'q_b1'}
  Finished with no paths in an accept state. REJECTED


False

In [47]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "abba") # Should accept

Input: 'abba'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'a', new states (after ε-closure) are {'q_b1', 'q_a2'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2', 'q_a2'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2', 'q_a2'}
  Read 'a', new states (after ε-closure) are {'q_b1', 'q_a2'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [48]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "baba") # Should reject

Input: 'baba'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2'}
  Read 'a', new states (after ε-closure) are {'q_b1'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2'}
  Read 'a', new states (after ε-closure) are {'q_b1'}
  Finished with no paths in an accept state. REJECTED


False

In [49]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "aabb") # Should accept

Input: 'aabb'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'a', new states (after ε-closure) are {'q_b1', 'q_a2'}
  Read 'a', new states (after ε-closure) are {'q_b1', 'q_a2'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2', 'q_a2'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2', 'q_a2'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [50]:
run_da_nfa_with_epsilon(nfa_starts_a_or_ends_b, "bbaa") # Should reject

Input: 'bbaa'
  Start states (after ε-closure): {'q_b1', 'q0', 'q_a1'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q_b2'}
  Read 'a', new states (after ε-closure) are {'q_b1'}
  Read 'a', new states (after ε-closure) are {'q_b1'}
  Finished with no paths in an accept state. REJECTED


False

#### Example 06: NFA accepting strings with an **optional "ab" substring** (`c(ab)?d`)

This machine must see a 'c', can optionally see "ab", and must end with a 'd'.   
The optional part is handled with an ε-move that bypasses the "ab" states.

In [51]:
nfa_optional_ab = {
    "states": {"q0", "q1", "q2", "q3", "q4"},
    "alphabet": {"a", "b", "c", "d"},
    "start_state": "q0",
    "accept_states": {"q4"},
    "transitions": {
        "q0": {"c": {"q1"}},
        "q1": {"a": {"q2"}, "ε": {"q3"}},  # Read 'a' to begin "ab", or take the ε-move to skip it
        "q2": {"b": {"q3"}},
        "q3": {"d": {"q4"}},
        "q4": {},
    },
}

In [52]:
run_da_nfa_with_epsilon(nfa_optional_ab, "cabd") # Should accept

Input: 'cabd'
  Start states (after ε-closure): {'q0'}
  Read 'c', new states (after ε-closure) are {'q3', 'q1'}
  Read 'a', new states (after ε-closure) are {'q2'}
  Read 'b', new states (after ε-closure) are {'q3'}
  Read 'd', new states (after ε-closure) are {'q4'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [53]:
run_da_nfa_with_epsilon(nfa_optional_ab, "cd")   # Should accept

Input: 'cd'
  Start states (after ε-closure): {'q0'}
  Read 'c', new states (after ε-closure) are {'q3', 'q1'}
  Read 'd', new states (after ε-closure) are {'q4'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [54]:
run_da_nfa_with_epsilon(nfa_optional_ab, "cad")  # Should reject

Input: 'cad'
  Start states (after ε-closure): {'q0'}
  Read 'c', new states (after ε-closure) are {'q3', 'q1'}
  Read 'a', new states (after ε-closure) are {'q2'}
  Read 'd', new states (after ε-closure) are set()
  Finished with no paths in an accept state. REJECTED


False
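
The traces above show `{'q3', 'q1'}` right after reading 'c' because the ε-closure of `q1` pulls in `q3`. A minimal sketch of that closure step follows, assuming the same dictionary layout as `nfa_optional_ab`. The helper name `epsilon_closure` is hypothetical, and the notebook's `run_da_nfa_with_epsilon` may compute the closure differently.

```python
def epsilon_closure(nfa, states):
    """Return every state reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        # States without an "ε" entry simply contribute nothing.
        for nxt in nfa["transitions"].get(state, {}).get("ε", set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


# Same machine as in this example, repeated so the sketch runs on its own.
nfa_optional_ab = {
    "start_state": "q0",
    "accept_states": {"q4"},
    "transitions": {
        "q0": {"c": {"q1"}},
        "q1": {"a": {"q2"}, "ε": {"q3"}},
        "q2": {"b": {"q3"}},
        "q3": {"d": {"q4"}},
        "q4": {},
    },
}

# After 'c' the machine is in q1; the ε-move adds q3 for free.
assert epsilon_closure(nfa_optional_ab, {"q1"}) == {"q1", "q3"}
# After 'a' the machine is in q2, which has no ε-moves, so 'd' has nowhere to go.
assert epsilon_closure(nfa_optional_ab, {"q2"}) == {"q2"}
```

This is why "cad" is rejected: once the machine commits to 'a', it sits in `q2` alone, and `q2` only accepts 'b'.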

#### Example 07: NFA accepting strings containing either **"aa" or "bb"**

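This NFA loops on the start state `q0` for every symbol, but each time it reads an 'a' or a 'b' it can also branch to a path that checks whether the next character is the same. It needs no ε-moves: the nondeterminism comes from `q0` having two target states for each symbol.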

In [55]:
nfa_aa_or_bb = {
    "states": {"q0", "q_a1", "q_a2", "q_b1", "q_b2"},
    "alphabet": {"a", "b"},
    "start_state": "q0",
    "accept_states": {"q_a2", "q_b2"},
    "transitions": {
        "q0": {"a": {"q0", "q_a1"}, "b": {"q0", "q_b1"}},
        "q_a1": {"a": {"q_a2"}},
        "q_b1": {"b": {"q_b2"}},
        # Once accepted, stay accepted
        "q_a2": {"a": {"q_a2"}, "b": {"q_a2"}},
        "q_b2": {"a": {"q_b2"}, "b": {"q_b2"}},
    },
}

In [56]:
run_da_nfa_with_epsilon(nfa_aa_or_bb, "baa") # Should accept

Input: 'baa'
  Start states (after ε-closure): {'q0'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q0'}
  Read 'a', new states (after ε-closure) are {'q0', 'q_a1'}
  Read 'a', new states (after ε-closure) are {'q_a2', 'q0', 'q_a1'}
  Finished with at least one path in an accept state. ACCEPTED


True

In [57]:
run_da_nfa_with_epsilon(nfa_aa_or_bb, "aba") # Should reject

Input: 'aba'
  Start states (after ε-closure): {'q0'}
  Read 'a', new states (after ε-closure) are {'q0', 'q_a1'}
  Read 'b', new states (after ε-closure) are {'q_b1', 'q0'}
  Read 'a', new states (after ε-closure) are {'q0', 'q_a1'}
  Finished with no paths in an accept state. REJECTED


False
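
Two hand-picked inputs are a thin test, so one option is to compare the NFA against Python's `re.search` on every short string over {a, b}. The sketch below is an assumption-laden stand-in rather than the notebook's own runner: `nfa_accepts` and `check_up_to` are hypothetical names, and because this machine has no ε-moves the simulator skips the ε-closure step entirely.

```python
import re
from itertools import product

# Same machine as in this example, repeated so the sketch runs on its own.
nfa_aa_or_bb = {
    "start_state": "q0",
    "accept_states": {"q_a2", "q_b2"},
    "transitions": {
        "q0": {"a": {"q0", "q_a1"}, "b": {"q0", "q_b1"}},
        "q_a1": {"a": {"q_a2"}},
        "q_b1": {"b": {"q_b2"}},
        "q_a2": {"a": {"q_a2"}, "b": {"q_a2"}},
        "q_b2": {"a": {"q_b2"}, "b": {"q_b2"}},
    },
}


def nfa_accepts(nfa, text):
    """Track the set of live states; accept if any ends in an accept state."""
    current = {nfa["start_state"]}
    for symbol in text:
        current = {
            nxt
            for state in current
            for nxt in nfa["transitions"].get(state, {}).get(symbol, set())
        }
    return bool(current & nfa["accept_states"])


def check_up_to(max_len):
    """Compare the NFA with re.search on all strings of length 0..max_len."""
    checked = 0
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            text = "".join(chars)
            expected = re.search(r"aa|bb", text) is not None
            assert nfa_accepts(nfa_aa_or_bb, text) == expected, text
            checked += 1
    return checked


print(check_up_to(8))  # 511 strings compared (2**9 - 1)
```

Exhaustive checks like this only scale to short strings, but they catch a missing or mistyped transition quickly.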