Ternary search trees are similar in spirit to Btrees, but they carry a little more structure: they do not store values as such, they store characters. Whereas a Btree can store anything, a ternary search tree is intended to store strings. That is its purpose, so it is a little less general-purpose.

A ternary search tree has nodes with the following attributes:
* a character, can be `None`;
* a Boolean flag that indicates whether the character represented
  by this node has been the last in a string that was inserted in the
  tree;
* the "less-than" child;
* the "equals" child and
* the "larger-than" child.
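
The attribute list above maps naturally onto a small record type. The following is only a minimal sketch: the class name `Node` and the field names `char`, `terminates`, `lt`, `eq` and `gt` are assumptions made for illustration, and the class defined later in this notebook uses different names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node record mirroring the attribute list above."""
    char: Optional[str] = None       # the character; may be None
    terminates: bool = False         # True if an inserted string ends here
    lt: Optional["Node"] = None      # "less-than" child
    eq: Optional["Node"] = None      # "equals" child
    gt: Optional["Node"] = None      # "larger-than" child


# Hand-built nodes for the single string "ab": a -> (eq) b, where b terminates.
root = Node("a", eq=Node("b", terminates=True))
print(root.eq.char, root.eq.terminates)
```

Keeping the three children as separate named fields, rather than a list, makes the comparison that selects a branch explicit in the code that walks the tree.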

The data structure should support the following operations:
* string insert
* string search
* prefix string search
* return the number of strings stored in the data structure
* return all strings stored in the data structure
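
The expected behaviour of these operations can be pinned down with a brute-force reference built on a plain Python `set`. This is a specification aid for testing, not a ternary search tree; the class name `ReferenceStringSet` and its method names are assumptions chosen to mirror the list above.

```python
class ReferenceStringSet:
    """Brute-force stand-in that defines what each operation should return."""

    def __init__(self):
        self._items = set()

    def insert(self, string):
        # Inserting the same string twice stores it only once.
        self._items.add(string)

    def search(self, string):
        # Exact search: True only if the whole string was inserted.
        return string in self._items

    def prefix_search(self, prefix):
        # Every stored string that starts with the prefix, in sorted order.
        return sorted(s for s in self._items if s.startswith(prefix))

    def __len__(self):
        return len(self._items)

    def all_strings(self):
        return sorted(self._items)


ref = ReferenceStringSet()
for word in ["abc", "aqt", "abc"]:
    ref.insert(word)
print(len(ref), ref.prefix_search("ab"), ref.search("ab"))
```

Results from a tree implementation can then be compared against this reference on the same input.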

Also ensure that an instance of the data structure can be visually represented, e.g., in ASCII format.
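
One possible ASCII layout labels every child with the branch it hangs from, so that "equals" and "larger-than" children at the same depth can be told apart. The sketch below assumes, purely for illustration, that a node is a plain tuple `(char, terminates, lt, eq, gt)`; the function name `render` is hypothetical as well.

```python
def render(node, label="root", indent=""):
    """Return an indented ASCII view of a tuple-based ternary search tree.

    Each node is assumed to be (char, terminates, lt, eq, gt), where the
    three children are either None or another such tuple.
    """
    char, terminates, lt, eq, gt = node
    marker = "*" if terminates else ""
    lines = [f"{indent}{label}: {char}{marker}"]
    for child_label, child in (("lt", lt), ("eq", eq), ("gt", gt)):
        if child is not None:
            lines.append(render(child, child_label, indent + "  "))
    return "\n".join(lines)


# Hand-built tree holding the strings "abc" and "aqt".
t = ("t", True, None, None, None)
q = ("q", False, None, t, None)
c = ("c", True, None, None, None)
b = ("b", False, None, c, q)
a = ("a", False, None, b, None)
print(render(a))
```

A trailing `*` marks a node at which an inserted string ends.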

# Implementation

The data structure is implemented in one or two Python files, as a module that we need to write ourselves. The implementation can live either in that module or directly in the notebook.

In [1]:
%load_ext autoreload
%autoreload 2

The data structure has been implemented as a class.

In [2]:
from ternary_search_tree import TernarySearchTree

In [40]:
class TtreeNode:
    """
    Node in a ternary search tree, representing a single character in a string.
    """

    def __init__(self, char):
        """
        Initialize a tree node with the given character.

        Parameters
        ----------
        char : str
            The character to store in this node (only the first character is used).
        """
        self._char = char[0]
        self._lt = None
        self._eq = None
        self._gt = None
        self.flag_wordend = False

    def _insert(self, string):
        """
        Insert a string into the tree, starting at this node.

        Parameters
        ----------
        string : str
            The string to insert into the tree.
        """
        if not string:
            self.flag_wordend = True
            return

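        char, rest = string[0], string[1:]
        # Children are compared with None explicitly: truth-testing a node
        # would call __len__ and walk its whole subtree.
        if char < self._char:
            # Smaller character: continue in the "less-than" subtree.
            if self._lt is None:
                self._lt = TtreeNode(char)
            self._lt._insert(string)
        elif char > self._char:
            # Larger character: continue in the "larger-than" subtree.
            if self._gt is None:
                self._gt = TtreeNode(char)
            self._gt._insert(string)
        else:
            # Same character: consume it and continue with the rest.
            if not rest:
                self.flag_wordend = True
            else:
                if self._eq is None:
                    self._eq = TtreeNode(rest[0])
                self._eq._insert(rest)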

    def _all_strings(self, pf=''):
        """
        Get all strings stored in the subtree rooted at this node.

        Parameters
        ----------
        pf : str, optional
            The prefix built so far (default is '').

        Returns
        -------
        list of str
            All strings that originate from this node.
        """
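        words = []
        word = pf + self._char
        # In-order traversal (lt, this node, eq, gt) yields the strings in
        # lexicographical order.
        if self._lt is not None:
            words.extend(self._lt._all_strings(pf))
        if self.flag_wordend:
            words.append(word)
        if self._eq is not None:
            words.extend(self._eq._all_strings(word))
        if self._gt is not None:
            words.extend(self._gt._all_strings(pf))
        return words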

    def _psearch(self, string):
        """
        Search for a prefix in the tree starting from this node.

        Parameters
        ----------
        string : str
            The prefix to search for.

        Returns
        -------
        TtreeNode or None
            The node where the prefix ends, or None if not found.
        """
        if not string:
            return self if self.flag_wordend else None

        char, rest = string[0], string[1:]
        if char < self._char and self._lt is not None:
            return self._lt._psearch(string)
        elif char > self._char and self._gt is not None:
            return self._gt._psearch(string)
        elif char == self._char:
            if not rest:
                # The whole prefix has been matched at this node.
                return self
            if self._eq is not None:
                return self._eq._psearch(rest)
        return None

    def __len__(self):
        """
        Count how many strings are stored in the subtree.

        Returns
        -------
        int
            Number of complete strings in the subtree.
        """
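        # Test children against None: `if child:` would itself call __len__
        # and walk the subtree a second time at every level.
        count = 1 if self.flag_wordend else 0
        if self._lt is not None:
            count += len(self._lt)
        if self._eq is not None:
            count += len(self._eq)
        if self._gt is not None:
            count += len(self._gt)
        return count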

    def _to_string(self, indent=''):
        """
        Get a string representation of the tree rooted at this node.

        Parameters
        ----------
        indent : str
            Indentation for formatting child nodes.

        Returns
        -------
        str
            Formatted string representation of the subtree.
        """
        s = f"{indent}char: {repr(self)}, Terminates: {self.flag_wordend}"
        if self._lt is not None:
            s += '\n' + self._lt._to_string(indent + '  ')
        if self._eq is not None:
            s += '\n' + self._eq._to_string(indent + '  ')
        if self._gt is not None:
            s += '\n' + self._gt._to_string(indent + '  ')
        return s

    def __repr__(self):
        return f"{self._char}{'*' if self.flag_wordend else ''}"

In [41]:
class TernarySearchTree:
    """
    A ternary search tree for storing strings, including support for the empty string.
    """

    def __init__(self):
        """
        Initialize an empty ternary search tree.
        """
        self._root = None
        self._has_empty = False

    def insert(self, string):
        """
        Insert a string into the ternary search tree.

        Parameters
        ----------
        string : str
            The string to insert. Empty string is supported.
        """
        if string == '':
            self._has_empty = True
            return
        if self._root is None:
            self._root = TtreeNode(string[0])
        self._root._insert(string)

    def all_strings(self):
        """
        Get all strings stored in the ternary search tree.

        Returns
        -------
        list of str
            All stored strings in lexicographical order.
        """
        words = []
        if self._has_empty:
            words.append('')
        if self._root is not None:
            words.extend(self._root._all_strings())
        return words

    def __len__(self):
        """
        Count the number of strings stored in the ternary search tree.

        Returns
        -------
        int
            Number of stored strings.
        """
        base_len = len(self._root) if self._root is not None else 0
        return base_len + int(self._has_empty)

    def __repr__(self):
        """
        String representation of the ternary search tree.

        Returns
        -------
        str
            A string describing the tree contents.
        """
        rep = ''
        if self._has_empty:
            rep += "<empty string stored>\n"
        return rep + (self._root._to_string('') if self._root is not None else 'empty tree')

    def search(self, prefix):
        """
        Search for all strings in the tree with the given prefix.

        Parameters
        ----------
        prefix : str
            The prefix to search for.

        Returns
        -------
        list of str
            All strings in the tree that start with the prefix.
        """
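        if prefix == '':
            return [''] if self._has_empty else []
        node = self._root._psearch(prefix) if self._root is not None else None
        if node is None:
            return []
        # The prefix itself is a match when it was inserted as a full string.
        matches = [prefix] if node.flag_wordend else []
        # Every string below the "equals" child extends the prefix.
        if node._eq is not None:
            matches.extend(node._eq._all_strings(prefix))
        return matches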

# Example usage

Create a new empty ternary search tree.

In [42]:
tst = TernarySearchTree()

In [43]:
print(tst)

empty tree


Insert the string `'abc'` into the tree.

In [44]:
tst.insert('abc')

Display the tree.

In [45]:
print(tst)

char: a, Terminates: False
  char: b, Terminates: False
    char: c*, Terminates: True


Insert another string `'aqt'`.

In [46]:
tst.insert('aqt')

In [47]:
print(tst)

char: a, Terminates: False
  char: b, Terminates: False
    char: c*, Terminates: True
    char: q, Terminates: False
      char: t*, Terminates: True


The tree should now contain two strings.

In [48]:
len(tst)

2

In [49]:
tst.all_strings()

['abc', 'aqt']

Search for the string `'ab'`; it should be found since it is a prefix of `'abc'`.

In [50]:
tst.search('ab')

['abc']

The string `'ac'` should not be found.

In [51]:
tst.search('ac')

[]

The tree can also contain the empty string.

In [52]:
tst.insert('')

In [53]:
len(tst)

3

In [54]:
print(tst)

<empty string stored>
char: a, Terminates: False
  char: b, Terminates: False
    char: c*, Terminates: True
    char: q, Terminates: False
      char: t*, Terminates: True


In [55]:
tst.all_strings()

['', 'abc', 'aqt']

# Testing

In this project, the ternary search tree is to be implemented along the same lines as the Btree. Test properly that the algorithm actually works as expected. Pay attention to corner cases: for an empty ternary tree, does the right thing happen (i.e., is the length correct, are the stored strings returned correctly, etc.)? Make sure to have tests in place for these cases. The third item is the performance test: how does the tree scale with an increasing number of stored words, how long does it take to build the ternary search tree, and how long does it take to find strings (including in the worst case)? Basically, these are similar tests to the ones we did for the binary tree or the sorting algorithm.
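
The scaling question can be approached with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the benchmark used in this project: the helper names `random_words` and `time_inserts` are hypothetical, and a built-in `set` serves as a stand-in container so that the block runs on its own.

```python
import random
import string
import time


def random_words(n, length=8, seed=0):
    """Return n pseudo-random lowercase words (reproducible via the seed)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=length))
            for _ in range(n)]


def time_inserts(make_container, insert, sizes, seed=0):
    """Return {size: seconds} for filling a fresh container of each size."""
    timings = {}
    for n in sizes:
        words = random_words(n, seed=seed)
        container = make_container()
        start = time.perf_counter()
        for word in words:
            insert(container, word)
        timings[n] = time.perf_counter() - start
    return timings


# Stand-in container: a built-in set with its unbound add method.
timings = time_inserts(set, set.add, sizes=[100, 1_000, 10_000])
for n, seconds in timings.items():
    print(f"{n:>6} words: {seconds:.6f} s")
```

For the tree in this notebook, the call would presumably become `time_inserts(TernarySearchTree, TernarySearchTree.insert, sizes=...)`. Random words tend to give a reasonably balanced tree; feeding the same words in sorted order is one candidate for a worst-case measurement, since an unbalanced tree then grows long "larger-than" chains.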

The whole thing is supposed to be implemented using version control, so everything lives in a GitHub repository. The implementation should also be documented, just like the documentation seen in the sorting example.

A discussion should also be included: when you observe something, comment on it. Try to explain what you see and whether you expected it.

Create a new ternary search tree and insert some words.

In [56]:
tst = TernarySearchTree()
with open('data/search_trees/insert_words.txt') as file:
    words = [
        line.strip() for line in file
    ]
for word in words:
    tst.insert(word)
unique_words = set(words)

Verify the length of the data structure.

In [57]:
assert len(tst) == len(unique_words), \
       f'{len(tst)} in tree, expected {len(unique_words)}'

Verify that all words that were inserted can be found.

In [58]:
for word in unique_words:
    assert tst.search(word), f'{word} not found'

Verify that all prefixes can be found.

In [65]:
for word in unique_words:
    for i in range(len(word) - 1, 0, -1):
        prefix = word[:i]
        assert tst.search(prefix), f'{prefix} not found'

Check that when searching for an exact match, only the inserted words are found, and no prefixes.

In [66]:
for word in unique_words:
    for i in range(len(word), 0, -1):
        prefix = word[:i]
        if prefix not in unique_words:
            assert not tst.search(prefix, exact=True), \
                   f'{prefix} found'

TypeError: TernarySearchTree.search() got an unexpected keyword argument 'exact'
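
The `TypeError` above shows that `search` as defined in this notebook takes no `exact` argument. One way the keyword could be supported is sketched here as a hypothetical wrapper, `make_search`, around any prefix-search callable. Neither the wrapper nor the stand-in `prefix_search` below is part of the notebook's class; they are assumptions used to keep the example self-contained.

```python
def make_search(prefix_search):
    """Wrap a prefix-search callable so that it also accepts exact=True."""

    def search(term, exact=False):
        matches = prefix_search(term)
        if exact:
            # Keep only a match that equals the search term itself.
            return [m for m in matches if m == term]
        return matches

    return search


# Stand-in prefix search over a fixed list of stored strings.
stored = ["abc", "aqt"]


def prefix_search(prefix):
    return [s for s in stored if s.startswith(prefix)]


search = make_search(prefix_search)
print(search("ab"), search("ab", exact=True), search("abc", exact=True))
```

Filtering the prefix matches is simple but does more work than needed; an implementation inside the tree could instead check the word-end flag of the node at which the search term ends.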

Check that the empty string is in the tree (since it is a prefix of any string).

In [67]:
assert tst.search(''), 'empty string not found'

AssertionError: empty string not found

Check that the empty string is not in the tree for an exact search.

In [62]:
assert not tst.search('', exact=True), 'empty string found'

TypeError: TernarySearchTree.search() got an unexpected keyword argument 'exact'

Check that words in the file `data/search_trees/not_insert_words.txt` cannot be found in the tree.

In [63]:
with open('data/search_trees/not_insert_words.txt') as file:
    for line in file:
        word = line.strip()
        assert not tst.search(word), f'{word} should not be found'

Check that all strings are returned.

In [64]:
all_strings = tst.all_strings()
assert len(all_strings) == len(unique_words), \
       f'{len(all_strings)} words, expected {len(unique_words)}'
assert sorted(all_strings) == sorted(unique_words), 'words do not match'

If no output was generated, all tests have passed.