# Regular Languages and Finite Automata

This notebook explores regular languages through deterministic and non-deterministic finite automata.

## Learning Objectives
- Understand DFA and NFA construction
- Explore regular language properties
- Practice automata design patterns
- Visualize state transitions

In [None]:
import sys
sys.path.append('../src')
from automata import DFA, NFA

## 1. Basic DFA Construction

Let's start with a simple DFA that accepts strings with an even number of 0s.
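
To make the mechanics concrete, here is a minimal sketch of how a DFA processes input, written in plain Python. The helper `run_dfa` and the dict `even_zeros_table` are hypothetical names for illustration; they are not part of the notebook's `automata` module, and the sketch assumes a total transition function stored as a dict keyed by `(state, symbol)`.

```python
def run_dfa(transitions, start_state, accept_states, string):
    """Follow one transition per input symbol and report acceptance."""
    state = start_state
    for symbol in string:
        # A DFA has exactly one successor for each (state, symbol) pair.
        state = transitions[(state, symbol)]
    return state in accept_states


# q0 = an even number of 0s seen so far, q1 = an odd number seen so far.
even_zeros_table = {
    ('q0', '0'): 'q1',
    ('q0', '1'): 'q0',
    ('q1', '0'): 'q0',
    ('q1', '1'): 'q1',
}

for s in ['', '0', '0110', '000']:
    print(repr(s), run_dfa(even_zeros_table, 'q0', {'q0'}, s))
```

Reading a `1` never changes the state, and reading a `0` flips it, so the final state records only the parity of the 0s.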

In [None]:
# DFA for even number of 0s
even_zeros = DFA(
    states={'q0', 'q1'},
    alphabet={'0', '1'},
    transitions={
        ('q0', '0'): 'q1',
        ('q0', '1'): 'q0',
        ('q1', '0'): 'q0',
        ('q1', '1'): 'q1'
    },
    start_state='q0',
    accept_states={'q0'}
)

# Test the DFA
test_strings = ['', '0', '00', '000', '0110', '1010', '111']
print("Testing DFA for even number of 0s:")
for s in test_strings:
    result = even_zeros.accepts(s)
    zeros = s.count('0')
    expected = (zeros % 2 == 0)
    status = "✓" if result == expected else "✗"
    print(f"'{s}' ({zeros} zeros): {'ACCEPT' if result else 'REJECT'} {status}")

## 2. DFA for Divisibility by 3

A more complex example: a DFA that accepts binary strings (most significant bit first) whose value is divisible by 3. Each state records the remainder, modulo 3, of the prefix read so far.
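
The rule behind this machine is that appending a bit `b` to a binary number with remainder `r` gives remainder `(2r + b) mod 3`. The sketch below generates the transition table from that rule for any modulus. Both `build_divisibility_table` and `divisible` are hypothetical helpers for illustration, not part of the `automata` module.

```python
def build_divisibility_table(n):
    """Return DFA transitions for binary strings (MSB first) divisible by n."""
    table = {}
    for r in range(n):
        for bit in (0, 1):
            # Appending a bit doubles the value read so far and adds the bit.
            table[(f'q{r}', str(bit))] = f'q{(2 * r + bit) % n}'
    return table


def divisible(binary, n):
    """Run the generated DFA; q0 (remainder 0) is the only accept state."""
    table = build_divisibility_table(n)
    state = 'q0'
    for bit in binary:
        state = table[(state, bit)]
    return state == 'q0'


print(build_divisibility_table(3))
print(divisible('1001', 3))  # 1001 is 9 in decimal
```

With `n = 3` the generated table matches the six hand-written transitions used in this section.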

In [None]:
# DFA for binary numbers divisible by 3
div_by_3 = DFA(
    states={'q0', 'q1', 'q2'},  # remainders 0, 1, 2
    alphabet={'0', '1'},
    transitions={
        ('q0', '0'): 'q0',  # 0*2 + 0 = 0 mod 3
        ('q0', '1'): 'q1',  # 0*2 + 1 = 1 mod 3
        ('q1', '0'): 'q2',  # 1*2 + 0 = 2 mod 3
        ('q1', '1'): 'q0',  # 1*2 + 1 = 3 = 0 mod 3
        ('q2', '0'): 'q1',  # 2*2 + 0 = 4 = 1 mod 3
        ('q2', '1'): 'q2'   # 2*2 + 1 = 5 = 2 mod 3
    },
    start_state='q0',
    accept_states={'q0'}
)

# Test divisibility by 3
test_cases = ['0', '11', '110', '1001', '1100', '1111']
print("\nTesting DFA for divisibility by 3:")
for binary in test_cases:
    decimal = int(binary, 2)
    result = div_by_3.accepts(binary)
    expected = (decimal % 3 == 0)
    status = "✓" if result == expected else "✗"
    print(f"'{binary}' (decimal {decimal}): {'DIVISIBLE' if result else 'NOT DIVISIBLE'} {status}")

## 3. Non-deterministic Finite Automata (NFA)

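NFAs allow several transitions from the same state on the same symbol, as well as epsilon moves, which makes some languages easier to express. An NFA accepts a string if at least one sequence of choices ends in an accept state.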
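
One common way to simulate an NFA is to track the set of all states it could be in after each symbol. The following is a minimal sketch under that approach for NFAs without epsilon moves. `run_nfa` and `ends_with_01_table` are hypothetical names for illustration and do not describe how the notebook's `NFA` class is implemented.

```python
def run_nfa(transitions, start_state, accept_states, string):
    """Simulate an NFA (no epsilon moves) by tracking every reachable state."""
    current = {start_state}
    for symbol in string:
        # Take the union of all successors; a missing entry means that branch dies.
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, symbol), set())
        }
    # Accept if at least one surviving branch ended in an accept state.
    return bool(current & accept_states)


ends_with_01_table = {
    ('q0', '0'): {'q0', 'q1'},  # guess that this 0 starts the final "01"
    ('q0', '1'): {'q0'},
    ('q1', '1'): {'q2'},
}

for s in ['01', '1001', '10', '']:
    print(repr(s), run_nfa(ends_with_01_table, 'q0', {'q2'}, s))
```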

In [None]:
# NFA that accepts strings ending with "01"
ends_with_01 = NFA(
    states={'q0', 'q1', 'q2'},
    alphabet={'0', '1'},
    transitions={
        ('q0', '0'): {'q0', 'q1'},  # Non-deterministic choice
        ('q0', '1'): {'q0'},
        ('q1', '1'): {'q2'}         # Accept after seeing "01"
    },
    start_state='q0',
    accept_states={'q2'}
)

# Test NFA
test_strings = ['01', '001', '101', '1001', '0', '1', '10', '00']
print("\nTesting NFA for strings ending with '01':")
for s in test_strings:
    result = ends_with_01.accepts(s)
    expected = s.endswith('01')
    status = "✓" if result == expected else "✗"
    print(f"'{s}': {'ACCEPT' if result else 'REJECT'} {status}")

## 4. NFA with Epsilon Transitions

Epsilon transitions allow state changes without consuming input.
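
Simulating epsilon moves usually relies on the epsilon closure: the set of states reachable from a given set using only epsilon transitions. Below is a minimal sketch, assuming epsilon is encoded as the empty string `''` in the transition keys, as in this notebook. The helpers `epsilon_closure` and `run_epsilon_nfa` are hypothetical and not taken from the `automata` module.

```python
def epsilon_closure(transitions, states):
    """All states reachable from `states` using only epsilon ('') moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in transitions.get((state, ''), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def run_epsilon_nfa(transitions, start_state, accept_states, string):
    """Alternate between consuming a symbol and taking the epsilon closure."""
    current = epsilon_closure(transitions, {start_state})
    for symbol in string:
        moved = {
            nxt
            for state in current
            for nxt in transitions.get((state, symbol), set())
        }
        current = epsilon_closure(transitions, moved)
    return bool(current & accept_states)


a_star_b_star_table = {
    ('q0', 'a'): {'q0'},
    ('q0', ''): {'q1'},
    ('q1', 'b'): {'q1'},
    ('q1', ''): {'q2'},
}

for s in ['', 'aabb', 'ba', 'aba']:
    print(repr(s), run_epsilon_nfa(a_star_b_star_table, 'q0', {'q2'}, s))
```

Because `q2` is in the epsilon closure of both `q0` and `q1`, listing `q2` as the only accept state is enough in this sketch.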

In [None]:
# NFA with epsilon transitions: accepts "a*b*" (zero or more a's followed by zero or more b's)
a_star_b_star = NFA(
    states={'q0', 'q1', 'q2'},
    alphabet={'a', 'b'},
    transitions={
        ('q0', 'a'): {'q0'},        # Stay in q0 for more a's
        ('q0', ''): {'q1'},         # Epsilon transition to b's section
        ('q1', 'b'): {'q1'},        # Stay in q1 for more b's
        ('q1', ''): {'q2'}          # Epsilon transition to accept
    },
    start_state='q0',
    accept_states={'q0', 'q1', 'q2'}  # Accept at any stage
)

# Test epsilon NFA
test_cases = ['', 'a', 'aa', 'b', 'bb', 'ab', 'aab', 'abb', 'aabb', 'ba', 'aba']
print("\nTesting NFA with ε-transitions for a*b*:")
for s in test_cases:
    result = a_star_b_star.accepts(s)
    # Check if string matches a*b* pattern
    valid = True
    seen_b = False
    for char in s:
        if char == 'b':
            seen_b = True
        elif char == 'a' and seen_b:
            valid = False
            break
    status = "✓" if result == valid else "✗"
    print(f"'{s}': {'ACCEPT' if result else 'REJECT'} {status}")

## 5. Regular Language Properties

Regular languages are closed under union, intersection, and complement.
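
Closure under intersection and union is usually shown with the product construction: run both DFAs in lockstep and choose the accept states according to the operation. The following is a minimal sketch in plain Python. `product_dfa` and `run` are hypothetical helpers, each DFA is represented as a `(transitions, start, accept)` tuple chosen for this example, and both transition functions are assumed to be total.

```python
from itertools import product


def product_dfa(dfa1, dfa2, alphabet, mode):
    """Combine two total DFAs, each given as (transitions, start, accept)."""
    (t1, start1, accept1), (t2, start2, accept2) = dfa1, dfa2
    states1 = {state for (state, _) in t1}
    states2 = {state for (state, _) in t2}
    pairs = list(product(states1, states2))
    # The product machine advances both DFAs on the same input symbol.
    transitions = {
        ((s1, s2), a): (t1[(s1, a)], t2[(s2, a)])
        for (s1, s2) in pairs
        for a in alphabet
    }
    if mode == 'intersection':
        accept = {(s1, s2) for (s1, s2) in pairs if s1 in accept1 and s2 in accept2}
    elif mode == 'union':
        accept = {(s1, s2) for (s1, s2) in pairs if s1 in accept1 or s2 in accept2}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return transitions, (start1, start2), accept


def run(dfa, string):
    """Run a DFA given as (transitions, start, accept) on a string."""
    transitions, state, accept = dfa
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in accept


even_zeros_t = ({('q0', '0'): 'q1', ('q0', '1'): 'q0',
                 ('q1', '0'): 'q0', ('q1', '1'): 'q1'}, 'q0', {'q0'})
even_ones_t = ({('p0', '0'): 'p0', ('p0', '1'): 'p1',
                ('p1', '0'): 'p1', ('p1', '1'): 'p0'}, 'p0', {'p0'})

both = product_dfa(even_zeros_t, even_ones_t, {'0', '1'}, 'intersection')
either = product_dfa(even_zeros_t, even_ones_t, {'0', '1'}, 'union')
print(run(both, '0011'), run(both, '01'), run(either, '0'), run(either, '01'))
```

Complement needs no product at all: for a total DFA, keep the transitions and swap the accepting and non-accepting states.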

In [None]:
# Demonstrate closure properties

# Language 1: strings with even number of 0s (already defined)
# Language 2: strings with even number of 1s
even_ones = DFA(
    states={'p0', 'p1'},
    alphabet={'0', '1'},
    transitions={
        ('p0', '0'): 'p0',
        ('p0', '1'): 'p1',
        ('p1', '0'): 'p1',
        ('p1', '1'): 'p0'
    },
    start_state='p0',
    accept_states={'p0'}
)

# Test both languages
test_strings = ['', '0', '1', '01', '10', '11', '00', '0011', '1100']
print("\nClosure Properties Demonstration:")
print("String\tEven 0s\tEven 1s\tBoth\tEither")
print("-" * 45)

for s in test_strings:
    even_0s = even_zeros.accepts(s)
    even_1s = even_ones.accepts(s)
    both = even_0s and even_1s  # Intersection
    either = even_0s or even_1s  # Union
    
    print(f"'{s}'\t{even_0s}\t{even_1s}\t{both}\t{either}")

## 6. Pumping Lemma Demonstration

The pumping lemma helps prove that certain languages are not regular.
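
As an illustration of the non-regularity argument, the sketch below takes the classic language { a^n b^n : n >= 0 } and searches for a split of `a^p b^p` that survives pumping. The helpers `in_anbn` and `find_pumpable_split` are hypothetical names for this example. The search covers only small values of `p` and only `i = 0..3`, so it illustrates the proof idea rather than replacing it.

```python
def in_anbn(s):
    """Membership test for { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == 'a' * n + 'b' * n


def find_pumpable_split(s, p, member):
    """Return a split (x, y, z) with |xy| <= p and |y| > 0 such that
    x + y*i + z stays in the language for i = 0..3, or None if none exists."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(member(x + y * i + z) for i in range(4)):
                return x, y, z
    return None


for p in range(1, 6):
    s = 'a' * p + 'b' * p
    print(p, find_pumpable_split(s, p, in_anbn))  # None for every p
```

Every candidate `y` lies inside the leading block of a's because `|xy| <= p`, so removing or repeating it unbalances the counts. That is exactly the contradiction used in the standard proof.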

In [None]:
def pumping_lemma_test(dfa, string, pump_length=None):
    """
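    Demonstrate the pumping lemma by searching for a pumpable split of a string.
    For a regular language there exists a pumping length p such that any
    string s in the language with |s| >= p can be divided into xyz where
    |xy| <= p, |y| > 0, and xy^i z is in the language for all i >= 0.
    The number of DFA states is used as the default pumping length, and
    only i = 0, 1, 2 are checked, so a hit is evidence rather than a proof.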
    """
    if pump_length is None:
        pump_length = len(dfa.states)
    
    if len(string) < pump_length:
        return f"String too short for pumping (length {len(string)} < {pump_length})"
    
    # Try every split s = xyz with |y| > 0 and |xy| <= pump_length.
    # x may be empty, so the start index of y begins at 0.
    max_xy = min(pump_length, len(string))
    for i in range(max_xy):
        for j in range(i + 1, max_xy + 1):
            x = string[:i]
            y = string[i:j]
            z = string[j:]
                
            # Test pumping: xy^0z, xy^1z, xy^2z
            pumped_strings = [
                x + z,           # i=0 (remove y)
                x + y + z,       # i=1 (original)
                x + y + y + z    # i=2 (pump once)
            ]
            
            results = [dfa.accepts(s) for s in pumped_strings]
            
            if all(results) or not any(results):  # All same result
                return f"Pumpable: x='{x}', y='{y}', z='{z}' -> {results}"
    
    return "No valid pumping found"

# Test pumping lemma on our DFAs
print("\nPumping Lemma Demonstrations:")
test_cases = [
    (even_zeros, '000000'),
    (div_by_3, '110110'),
    (even_zeros, '101010')
]

for dfa, test_string in test_cases:
    result = pumping_lemma_test(dfa, test_string)
    print(f"String '{test_string}': {result}")

## 7. Pattern Matching Applications

Regular languages are fundamental to pattern matching and lexical analysis.
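
The connection to everyday pattern matching can be sketched with Python's standard `re` module. The simplified shape used in this section, letters@letters.letters, corresponds to the regular expression below. `EMAIL_PATTERN` and `matches_simple_email` are hypothetical names for illustration, and the pattern is deliberately far stricter than real email syntax.

```python
import re

# One or more lowercase letters, '@', letters, a literal dot, letters.
EMAIL_PATTERN = re.compile(r'[a-z]+@[a-z]+\.[a-z]+')


def matches_simple_email(s):
    """True if the whole string has the form letters@letters.letters."""
    # fullmatch requires the pattern to cover the entire string,
    # which mirrors a DFA accepting only after consuming all input.
    return EMAIL_PATTERN.fullmatch(s) is not None


for candidate in ['user@domain.com', 'invalid@', '@domain.com', 'user@domain']:
    print(repr(candidate), matches_simple_email(candidate))
```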

In [None]:
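# DFA for simple email validation (simplified pattern)
# Accepts strings of the form: letters@letters.letters
# Transitions that are not listed are left undefined; this assumes the DFA
# class rejects the input when it reaches a missing transition.
email_validator = DFA(
    states={'start', 'username', 'at', 'domain', 'dot', 'tld'},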
    alphabet=set('abcdefghijklmnopqrstuvwxyz@.'),
    transitions={
        **{('start', c): 'username' for c in 'abcdefghijklmnopqrstuvwxyz'},
        **{('username', c): 'username' for c in 'abcdefghijklmnopqrstuvwxyz'},
        ('username', '@'): 'at',
        **{('at', c): 'domain' for c in 'abcdefghijklmnopqrstuvwxyz'},
        **{('domain', c): 'domain' for c in 'abcdefghijklmnopqrstuvwxyz'},
        ('domain', '.'): 'dot',
        **{('dot', c): 'tld' for c in 'abcdefghijklmnopqrstuvwxyz'},
        **{('tld', c): 'tld' for c in 'abcdefghijklmnopqrstuvwxyz'}
    },
    start_state='start',
    accept_states={'tld'}
)

# Test email patterns
email_tests = [
    'user@domain.com',
    'test@example.org',
    'invalid@',
    '@domain.com',
    'user@domain',
    'user.domain.com'
]

print("\nEmail Pattern Validation:")
for email in email_tests:
    result = email_validator.accepts(email)
    print(f"'{email}': {'VALID' if result else 'INVALID'}")

## 8. State Minimization Concept

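Every DFA has an equivalent minimal DFA, unique up to renaming of states, with the fewest possible states for its language. Minimization merges states that no input string can tell apart.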
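
One standard way to find the mergeable states is partition refinement (often attributed to Moore): start with accepting versus non-accepting states and keep splitting blocks until successors agree. The following is a minimal sketch, assuming a total transition function and that every state is reachable. `equivalent_state_groups` is a hypothetical helper, not part of the `automata` module.

```python
def equivalent_state_groups(states, alphabet, transitions, accept_states):
    """Group indistinguishable DFA states by iterated partition refinement."""
    symbols = sorted(alphabet)
    # Initial partition: accepting states versus non-accepting states.
    block_of = {s: int(s in accept_states) for s in states}
    while True:
        # Signature = own block plus the blocks of all successors.
        signature = {
            s: (block_of[s],) + tuple(block_of[transitions[(s, a)]] for a in symbols)
            for s in states
        }
        # Renumber distinct signatures as small integers.
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        if len(ids) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = {s: ids[signature[s]] for s in states}
    groups = {}
    for s in states:
        groups.setdefault(block_of[s], []).append(s)
    return sorted(sorted(group) for group in groups.values())


table = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q2',
    ('q1', '0'): 'q0', ('q1', '1'): 'q3',
    ('q2', '0'): 'q3', ('q2', '1'): 'q0',
    ('q3', '0'): 'q2', ('q3', '1'): 'q0',
}
print(equivalent_state_groups({'q0', 'q1', 'q2', 'q3'}, {'0', '1'}, table, {'q0'}))
```

Each group becomes one state of the minimal DFA; for the four-state table above, `q2` and `q3` end up in the same group.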

In [None]:
# Example of a non-minimal DFA (redundant states)
redundant_dfa = DFA(
    states={'q0', 'q1', 'q2', 'q3'},
    alphabet={'0', '1'},
    transitions={
        ('q0', '0'): 'q1',
        ('q0', '1'): 'q2',
        ('q1', '0'): 'q0',
        ('q1', '1'): 'q3',
        ('q2', '0'): 'q3',  # q2 and q3 behave identically
        ('q2', '1'): 'q0',
        ('q3', '0'): 'q2',
        ('q3', '1'): 'q0'
    },
    start_state='q0',
    accept_states={'q0'}
)

# Equivalent minimal DFA
minimal_dfa = DFA(
    states={'q0', 'q1', 'q2'},
    alphabet={'0', '1'},
    transitions={
        ('q0', '0'): 'q1',
        ('q0', '1'): 'q2',
        ('q1', '0'): 'q0',
        ('q1', '1'): 'q2',
        ('q2', '0'): 'q2',
        ('q2', '1'): 'q0'
    },
    start_state='q0',
    accept_states={'q0'}
)

# Spot-check equivalence on sample strings (a sanity check, not a proof)
test_strings = ['', '0', '1', '00', '01', '10', '11', '000', '111']
print("\nDFA Minimization - Verifying Equivalence:")
print("String\tRedundant\tMinimal\tEquivalent")
print("-" * 40)

all_equivalent = True
for s in test_strings:
    result1 = redundant_dfa.accepts(s)
    result2 = minimal_dfa.accepts(s)
    equivalent = result1 == result2
    all_equivalent &= equivalent
    
    print(f"'{s}'\t{result1}\t\t{result2}\t{equivalent}")

print(f"\nAll tests equivalent: {all_equivalent}")
print(f"Original states: {len(redundant_dfa.states)}, Minimal states: {len(minimal_dfa.states)}")

## 9. Conclusion

### Key Concepts Learned:

1. **DFA Construction**: Systematic approach to building deterministic automata
2. **NFA Convenience**: Non-determinism and epsilon transitions make designs easier, although NFAs recognize exactly the same languages as DFAs
3. **Closure Properties**: Regular languages are closed under union, intersection, and complement
4. **Pumping Lemma**: Tool for proving non-regularity
5. **Practical Applications**: Pattern matching, lexical analysis
6. **Minimization**: Every regular language has a minimal DFA that is unique up to renaming of states

### Next Steps:
- Explore context-free languages and pushdown automata
- Study regular expressions and their equivalence to finite automata
- Learn about further closure properties, such as concatenation, Kleene star, and reversal
- Practice with real-world pattern matching problems