- **Alphabet** is a finite set of symbols ( or characters )

In [1]:
from typing import Set

Symbol = str  # in other languages it would be char ( here I mean str of length 1 )
Alphabet = Set[ Symbol ]

Example of alphabet:

In [2]:
alphabet = { "a", "b", "c" }

- **Word** is a finite sequence of symbols
- |w| is the length of word w
- #a(w) is the number of occurrences of symbol "a" in word w

In [3]:
Word = str

# |w| is len( w )
# #a(w) is w.count( a )
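
As a quick illustration of the two notations above, here is a minimal sketch on a made-up word ( the word "abcab" is only an example, it does not come from the text ):

```python
# Hypothetical example word; any str can play the role of a Word here.
w = "abcab"

length = len(w)          # |w|
count_a = w.count("a")   # #a(w)
count_c = w.count("c")   # #c(w)

print(length, count_a, count_c)  # 5 2 1
```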

- The set of all words over alphabet A is A* ( it includes the empty word )
- The set of all **non-empty** words over alphabet A is A+

In [4]:
from typing import Iterator
from collections import deque  # For storing already derived words

def all_words( alphabet: Alphabet ) -> Iterator[ Word ]:
    yield from all_words_help( alphabet, non_empty=False )

def all_words_help( alphabet: Alphabet, non_empty: bool ) -> Iterator[ Word ]:

    q = deque( [ "" ] )

    while q:
        current = q.popleft( )
        if len( current ) > 0 or ( not non_empty ):
            yield current

        for symbol in alphabet:
            q.append( current + symbol )

def all_nonempty_words( alphabet: Alphabet ) -> Iterator[ Word ]:
    yield from all_words_help( alphabet, non_empty=True )


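A few of the words generated by our alphabet ( the order of symbols follows the iteration order of the Python set, so it can differ between runs ):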

In [5]:

all_w = all_nonempty_words( alphabet )
for _ in range( 13 ):
    print( next( all_w ) )


b
c
a
bb
bc
ba
cb
cc
ca
ab
ac
aa
bbb
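
If a reproducible order is preferred, one possible alternative is to sort the alphabet and build words length by length with `itertools.product`. This is only a sketch, not the approach used above; the name `words_in_order` is an assumption made for this example.

```python
from itertools import count, islice, product
from typing import Iterator, Set


def words_in_order(alphabet: Set[str], non_empty: bool = False) -> Iterator[str]:
    """Yield words over `alphabet` by length, then alphabetically."""
    symbols = sorted(alphabet)  # fixes the order, unlike plain set iteration

    if not symbols:
        # Over an empty alphabet the only word is the empty one.
        if not non_empty:
            yield ""
        return

    start = 1 if non_empty else 0
    for length in count(start):
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


first = list(islice(words_in_order({"a", "b", "c"}, non_empty=True), 13))
print(first)
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'aaa']
```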


- Concatenation of words "." ( chaining ) puts two words together: u.v is u followed by v
- Power of a word is defined as: w^0 = "" ( the empty word ), w^(i+1) = w.w^i

In [6]:
def power_word( word: Word, i: int ) -> Word:
    if i == 0:
        return ""

    return word + power_word( word, i - 1 )

Example again:

In [7]:
power_word( "abc", 2 )

'abcabc'
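
For Python strings the same result can be obtained with the repetition operator. The sketch below assumes a non-negative exponent and uses a hypothetical name, `power_word_iterative`; it also checks the rule w^(i+j) = w^i.w^j on one example.

```python
def power_word_iterative(word: str, i: int) -> str:
    """Return word^i using str repetition instead of recursion."""
    if i < 0:
        raise ValueError("the exponent must be non-negative")
    return word * i  # word * 0 is the empty word


squared = power_word_iterative("abc", 2)
print(squared)  # abcabc

# w^(2+3) equals w^2 . w^3
same = power_word_iterative("ab", 5) == (
    power_word_iterative("ab", 2) + power_word_iterative("ab", 3)
)
print(same)  # True
```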

- Word a is a **subword** of word b if there are words c, d such that b = c.a.d
- if c is the empty word, then we call a a "prefix" of b
- if d is the empty word, then we call a a "suffix" of b

In [8]:
def is_subword( a: Word, b: Word ) -> bool:
    return a in b
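
Prefix and suffix checks can be sketched the same way with `str.startswith` and `str.endswith`. The helper `decompose` below is a hypothetical addition that returns one possible pair ( c, d ) with b = c.a.d, using the first occurrence of a in b.

```python
from typing import Optional, Tuple


def is_prefix(a: str, b: str) -> bool:
    return b.startswith(a)  # b = a.d for some word d


def is_suffix(a: str, b: str) -> bool:
    return b.endswith(a)    # b = c.a for some word c


def decompose(a: str, b: str) -> Optional[Tuple[str, str]]:
    """Return (c, d) with b == c + a + d, or None if a is not a subword of b."""
    start = b.find(a)
    if start == -1:
        return None
    return b[:start], b[start + len(a):]


print(decompose("bc", "abcd"))  # ('a', 'd')
print(decompose("ab", "abcd"))  # ('', 'cd'), so "ab" is a prefix
print(decompose("x", "abcd"))   # None
```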

- **Language** is an arbitrary set of words over the given alphabet

In [9]:
Language = Set[ Word ]

- We can perform standard set operations on languages, such as union, intersection and difference

In [10]:
# in python already provided by standard library

l1 = { "abc", "a", "b" }
l2 = { "aa", "ba" }

# l1.union( l2 )
# l1.intersection( l2 )
# l1 - l2
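
To see the three operations produce something non-trivial, here is a small sketch with two made-up languages that share a word ( `k` and `m` are example names, not part of the notebook ):

```python
# Two hypothetical languages over the alphabet { "a", "b", "c" }.
k = {"abc", "a", "b"}
m = {"a", "aa"}

union = k | m          # same as k.union( m )
intersection = k & m   # same as k.intersection( m )
difference = k - m     # same as k.difference( m )

print(sorted(union))         # ['a', 'aa', 'abc', 'b']
print(sorted(intersection))  # ['a']
print(sorted(difference))    # ['abc', 'b']
```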

**Concatenation** of languages ( chaining ) is defined as follows:
- K.L = { u.v | u belongs to K, v belongs to L }

In [11]:
def concat( l1: Language, l2: Language ) -> Language:
    res = set()
    
    for u in l1:
        for v in l2:
            res.add( u + v )
    
    return res

Example:

In [12]:
concat( l1, l2 )

{'aaa', 'aba', 'abcaa', 'abcba', 'baa', 'bba'}
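
A few properties of language concatenation can be checked on small examples. The sketch below rewrites the operation as a set comprehension under a hypothetical name, `concat_comprehension`, so that it runs on its own.

```python
from typing import Set


def concat_comprehension(k: Set[str], l: Set[str]) -> Set[str]:
    """K.L = { u.v | u in K, v in L } written as a set comprehension."""
    return {u + v for u in k for v in l}


k = {"abc", "a", "b"}

# { "" } is the neutral element: K.{ "" } = { "" }.K = K
neutral_ok = concat_comprehension(k, {""}) == k == concat_comprehension({""}, k)

# The empty language absorbs everything: K.{} = {}
absorbing_ok = concat_comprehension(k, set()) == set()

# Concatenation is not commutative in general
ab = concat_comprehension({"a"}, {"b"})
ba = concat_comprehension({"b"}, {"a"})

print(neutral_ok, absorbing_ok, ab, ba)  # True True {'ab'} {'ba'}
```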

Powers of a language are defined as follows:
- L^0 = { "" }
- L^(i+1) = L.L^i

In [13]:
def power_language( l: Language, i: int ) -> Language:
    if i == 0:
        return { "" }

    return concat( l, power_language( l, i - 1 ) )

Example:

In [14]:
power_language( l1, 2 )

{'aa', 'aabc', 'ab', 'abca', 'abcabc', 'abcb', 'ba', 'babc', 'bb'}
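
In the example above L^2 has 3 * 3 = 9 words, but that is not guaranteed: different pairs of words can concatenate to the same result. A minimal sketch with a made-up language, using a loop-based variant under the hypothetical name `power_language_iterative`:

```python
from typing import Set


def concat_sets(k: Set[str], l: Set[str]) -> Set[str]:
    return {u + v for u in k for v in l}


def power_language_iterative(l: Set[str], i: int) -> Set[str]:
    """Compute L^i by starting from L^0 = { "" } and concatenating i times."""
    result = {""}
    for _ in range(i):
        result = concat_sets(l, result)
    return result


unary = {"a", "aa"}  # hypothetical example language
squared = power_language_iterative(unary, 2)

# "a" + "aa" and "aa" + "a" give the same word, so only 3 words remain.
print(sorted(squared))  # ['aa', 'aaa', 'aaaa']
```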

Iteration of a language L is defined as follows:
- L* = Union over i = 0 to infinity of L^i ( iteration )
- L+ = Union over i = 1 to infinity of L^i ( positive iteration )

In [15]:
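def iteration_over( l: Language ) -> Iterator[ Language ]:
    # L* is infinite in general, so we yield its finite approximations:
    # the n-th yielded set is the union of L^0, L^1, ..., L^(n-1)
    i = 0
    current = set()
    while True:
        current = current.union( power_language( l, i ) )
        yield current
        i += 1

def positive_iteration_over( l: Language ) -> Iterator[ Language ]:
    # Same as above, but starting at L^1:
    # the n-th yielded set is the union of L^1, ..., L^n
    i = 1
    current = set()
    while True:
        current = current.union( power_language( l, i ) )
        yield current
        i += 1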

Example:

In [16]:
it = positive_iteration_over( l1 )

for _ in range( 3 ):
    print( next( it ) )


{'b', 'abc', 'a'}
{'abc', 'abcb', 'ba', 'babc', 'bb', 'ab', 'a', 'b', 'abcabc', 'aabc', 'aa', 'abca'}
{'abcabca', 'abcaabc', 'aabcb', 'abcab', 'b', 'abcabc', 'baa', 'aabc', 'bbabc', 'abcbb', 'abca', 'aab', 'abcabcb', 'abcb', 'bab', 'babca', 'aaabc', 'abb', 'ababc', 'abcbabc', 'abc', 'ba', 'babc', 'abcaa', 'baabc', 'bb', 'bbb', 'babcb', 'a', 'abcba', 'aaa', 'aba', 'bba', 'ab', 'babcabc', 'aa', 'abcabcabc', 'aabca', 'aabcabc'}
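
The generators above only approximate L* and L+ step by step. To decide whether one given word belongs to L* for a finite language L, a different technique can be used: dynamic programming over the prefixes of the word. This is a sketch under that assumption, with hypothetical names `in_iteration` and `in_positive_iteration`.

```python
from typing import Set


def in_iteration(word: str, language: Set[str]) -> bool:
    """Decide whether word is in L* for a finite language L."""
    # reachable[k] is True when word[:k] is a concatenation of words from L.
    reachable = [False] * (len(word) + 1)
    reachable[0] = True  # the empty word is in L^0

    for end in range(1, len(word) + 1):
        for piece in language:
            start = end - len(piece)
            # Empty pieces are skipped: they never extend a prefix.
            if piece and start >= 0 and reachable[start] and word[start:end] == piece:
                reachable[end] = True
                break

    return reachable[len(word)]


def in_positive_iteration(word: str, language: Set[str]) -> bool:
    """Decide whether word is in L+; the empty word needs "" in L itself."""
    if word == "":
        return "" in language
    return in_iteration(word, language)


l1 = {"abc", "a", "b"}
print(in_iteration("abcab", l1))       # True: abc . a . b
print(in_iteration("abcc", l1))        # False: no word of l1 ends with "c" alone
print(in_positive_iteration("", l1))   # False
```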


In [19]:

a = { "a" }
b = { "b" }

L_a = concat( concat( b, a ), b )

L_7 = power_language( L_a, 7 )

i = iteration_over( L_7 )
for _ in range( 15 ):
    print( next( i ) )

{'babbabbabbabbabbabbab'}
