Ternary search trees are similar in spirit to Btrees, but their nodes do not store values as such: each node stores a single character. Whereas a Btree can store anything, a ternary search tree is intended to store strings. That is its purpose, so it is a little less general purpose.

A ternary search tree has nodes with the following attributes:
* a character, can be `None`;
* a Boolean flag that indicates whether the character represented
  by this node has been the last in a string that was inserted in the
  tree;
* the "less-than" child;
* the "equals" child and
* the "larger-than" child.
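
The attributes listed above map naturally onto a small class. The following is a minimal sketch, not the code from the `ternary_search_tree` module; the class name `Node` and its field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a ternary search tree (hypothetical layout)."""

    char: Optional[str] = None       # character stored here; may be None
    terminates: bool = False         # True if an inserted string ends at this node
    lt: Optional["Node"] = None      # "less-than" child
    eq: Optional["Node"] = None      # "equals" child
    gt: Optional["Node"] = None      # "larger-than" child


# Hand-built chain for the single string 'ab': a -> (eq) b*
leaf = Node(char='b', terminates=True)
root = Node(char='a', eq=leaf)
```

Only the last node of the chain carries the flag, so `'a'` on its own would not count as a stored string here.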

The data structure should support the following operations:
* string insert
* string search
* prefix string search
* return the number of strings stored in the data structure
* return all strings stored in the data structure
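
One possible shape for these operations is sketched below. It is an illustration under stated assumptions, not the `TernarySearchTree` class used later in this notebook: the names `SketchTST` and `_Node` are made up, the empty string is tracked with a separate flag, and `search` returns a Boolean instead of a list of matches.

```python
class _Node:
    """Tree node: a character, an end-of-string flag and three children."""

    def __init__(self, char):
        self.char = char
        self.terminates = False
        self.lt = self.eq = self.gt = None


class SketchTST:
    """Illustrative ternary search tree (hypothetical, simplified)."""

    def __init__(self):
        self._root = None
        self._has_empty = False   # assumption: '' is tracked outside the nodes
        self._size = 0

    def insert(self, string):
        if not string:
            if not self._has_empty:
                self._has_empty = True
                self._size += 1
            return
        if self._root is None:
            self._root = _Node(string[0])
        node, i = self._root, 0
        while True:
            char = string[i]
            if char < node.char:
                if node.lt is None:
                    node.lt = _Node(char)
                node = node.lt
            elif char > node.char:
                if node.gt is None:
                    node.gt = _Node(char)
                node = node.gt
            elif i == len(string) - 1:
                if not node.terminates:      # count each distinct string once
                    node.terminates = True
                    self._size += 1
                return
            else:
                i += 1
                if node.eq is None:
                    node.eq = _Node(string[i])
                node = node.eq

    def search(self, string, exact=False):
        """True if `string` is stored (exact) or is a prefix of a stored string."""
        if not string:
            return self._has_empty if exact else self._size > 0
        node, i = self._root, 0
        while node is not None:
            char = string[i]
            if char < node.char:
                node = node.lt
            elif char > node.char:
                node = node.gt
            elif i == len(string) - 1:
                return node.terminates or not exact
            else:
                i += 1
                node = node.eq
        return False

    def __len__(self):
        return self._size

    def all_strings(self):
        """Return every stored string in sorted order."""
        result = [''] if self._has_empty else []

        def walk(node, prefix):
            if node is None:
                return
            walk(node.lt, prefix)
            if node.terminates:
                result.append(prefix + node.char)
            walk(node.eq, prefix + node.char)
            walk(node.gt, prefix)

        walk(self._root, '')
        return result


sketch = SketchTST()
for word in ('abc', 'aqt', 'abc', ''):
    sketch.insert(word)
print(len(sketch))                      # 3
print(sketch.all_strings())             # ['', 'abc', 'aqt']
print(sketch.search('ab'))              # True
print(sketch.search('ab', exact=True))  # False
```

Inserting `'abc'` twice leaves the length unchanged because the counter only increases when the end-of-string flag is set for the first time.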

Also ensure that an instance of the data structure can be visually represented, e.g., in ASCII format.
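
A text representation can be produced with a short recursive function, as in the sketch below. The helper names `make_node` and `render`, the attribute names and the output layout are all assumptions for illustration; the module used in this notebook prints its trees in a different layout.

```python
from types import SimpleNamespace


def make_node(char, terminates=False, lt=None, eq=None, gt=None):
    """Build a throwaway node; the attribute names are assumptions."""
    return SimpleNamespace(char=char, terminates=terminates, lt=lt, eq=eq, gt=gt)


def render(node, label='root', depth=0):
    """Return the subtree below `node` as a list of indented text lines."""
    if node is None:
        return []
    marker = '*' if node.terminates else ''
    lines = [f"{'  ' * depth}{label}: {node.char}{marker}"]
    for child_label in ('lt', 'eq', 'gt'):
        lines += render(getattr(node, child_label), child_label, depth + 1)
    return lines


# Tree holding 'abc' and 'aqt': a -> b -> c*, with q -> t* as b's larger-than child
tree = make_node('a', eq=make_node(
    'b',
    eq=make_node('c', terminates=True),
    gt=make_node('q', eq=make_node('t', terminates=True)),
))
print('\n'.join(render(tree)))
```

Each level of indentation is one step down the tree, and a `*` marks a node where a stored string ends.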

# Implementation

In this case, the data structure is implemented as a Python module in one or two files, which we need to write ourselves. Alternatively, the implementation can be done directly in the notebook.

In [286]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


The data structure has been implemented as a class.

In [287]:
from ternary_search_tree import TernarySearchTree

# Example usage

Create a new empty ternary search tree.

In [288]:
tst = TernarySearchTree()

In [289]:
print(tst)

empty tree


Insert the string `'abc'` into the tree.

In [290]:
tst.insert('abc')

Display the tree.

In [291]:
print(tst)

char: abc, Terminates: False
_lt:  char: a,   Terminates: False
_eq:    char: b,     Terminates: False
_eq:      char: c*,       Terminates: True


Insert another string `'aqt'`.

In [292]:
tst.insert('aqt')

In [293]:
print(tst)

char: abc, Terminates: False
_lt:  char: a,   Terminates: False
_eq:    char: b,     Terminates: False
_eq:      char: c*,       Terminates: True
_gt:      char: q,       Terminates: False
_eq:        char: t*,         Terminates: True


The tree should now contain two strings.

In [294]:
len(tst)

printing length of abc


2

In [295]:
tst.all_strings()

empty string in less than? word: abc
this word has flag wordend: aqt
this word has flag wordend: abc


['aqt', 'abc']

Search for the string `'ab'`. It should be found, since it is a prefix of `'abc'`.

In [296]:
tst.search('ab', exact=False)

searching for char a at node abc
searching for char a at node a
searching for char b at node b
this word has flag wordend: abc


['abc']

In [297]:
tst.search('ab', exact=True)

searching for char a at node abc
searching for char a at node a
searching for char b at node b


[]

The string `'ac'` should not be found.

In [306]:
tst.search('ac', exact=False)

searching for char a at node abc
searching for char a at node a
searching for char c at node b
searching for char c at node q


[]

The tree can also contain the empty string.

In [299]:
tst.insert('')

this is an empty string


In [300]:
len(tst)

printing length of abc


2

In [301]:
print(tst.all_strings())

empty string in less than? word: abc
this word has flag wordend: aqt
this word has flag wordend: abc
['', 'aqt', 'abc']


In [305]:
t = TernarySearchTree()
t.insert("")
print(t.all_strings())
len(t)

start ttree with empty string
['']
printing length of *


0

In [179]:
print(tst)

char: abc*, Terminates: True
_lt:  char: a,   Terminates: False
_eq:    char: b,     Terminates: False
_eq:      char: c*,       Terminates: True
_gt:      char: q,       Terminates: False
_eq:        char: t*,         Terminates: True


In [304]:
tst.all_strings()

empty string in less than? word: abc
this word has flag wordend: aqt
this word has flag wordend: abc


['', 'aqt', 'abc']

# Testing

The worst case for the quicksort algorithm is an already sorted list. That is worth remembering when looking at this dataset (a hint from GJB).
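
One reading of this hint is that a ternary search tree without rebalancing degenerates in the same way when words arrive in sorted order: every new first character ends up on the previous node's larger-than link. The sketch below illustrates this with hypothetical helpers (`Node`, `insert`, `height`) that only model the shape of the tree and leave out the end-of-string flag.

```python
import random
import string


class Node:
    def __init__(self, char):
        self.char = char
        self.lt = self.eq = self.gt = None


def insert(node, word):
    """Insert a non-empty word below `node` and return the subtree root."""
    if node is None:
        node = Node(word[0])
    if word[0] < node.char:
        node.lt = insert(node.lt, word)
    elif word[0] > node.char:
        node.gt = insert(node.gt, word)
    elif len(word) > 1:
        node.eq = insert(node.eq, word[1:])
    return node


def height(node):
    """Number of nodes on the longest path from `node` down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.lt), height(node.eq), height(node.gt))


letters = list(string.ascii_lowercase)
shuffled = letters[:]
random.Random(0).shuffle(shuffled)

sorted_root = shuffled_root = None
for word in letters:
    sorted_root = insert(sorted_root, word)
for word in shuffled:
    shuffled_root = insert(shuffled_root, word)

print(height(sorted_root))    # 26: each node hangs off the previous one's gt link
print(height(shuffled_root))  # smaller; the exact value depends on the shuffle
```

If the word file turns out to be sorted, shuffling the words before insertion is one way to check whether insertion order affects the timings.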

In this project, the ternary search tree is to be implemented along the same lines as the Btree, with proper testing to show that the algorithm actually works as expected. Pay attention to corner cases: for an empty ternary tree, does the right thing happen (is the length correct, are the strings stored in the tree returned correctly, and so on)? Make sure tests are in place for these cases. The third item is the performance test: how does the tree scale with an increasing number of stored words, how long does it take to build the ternary search tree, and how long does a search take, including in the worst case? These are essentially the same tests as for the binary tree and the sorting algorithm.
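
A build-time measurement could be set up roughly as follows. This is only a sketch: `time_inserts` is a hypothetical helper, the random words are generated for the demonstration, and a built-in `set` stands in for the tree so that the block runs on its own.

```python
import random
import string
import time


def time_inserts(make_container, insert, words, sizes):
    """Return (size, seconds) pairs for building a container from words[:size]."""
    timings = []
    for size in sizes:
        container = make_container()
        start = time.perf_counter()
        for word in words[:size]:
            insert(container, word)
        timings.append((size, time.perf_counter() - start))
    return timings


rng = random.Random(42)
words = [''.join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(4000)]

# Stand-in container: a built-in set. For the real measurement one would pass
# TernarySearchTree and TernarySearchTree.insert instead.
results = time_inserts(set, set.add, words, sizes=[1000, 2000, 4000])
for size, seconds in results:
    print(f'{size:>5} words: {seconds:.6f} s')
```

Doubling the sizes makes it easy to see how the build time grows. Search times can be measured with the same pattern by timing a loop of `search` calls on a tree that has already been built.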

The whole project is supposed to be developed under version control, so everything lives in a GitHub repository. The implementation should also be documented, just like the documentation seen for the sorting algorithm.

A discussion should also be included: when you see something, comment on it. Try to explain what you see and whether you expected it.

Create a new ternary search tree and insert some words.

In [181]:
tst = TernarySearchTree()
with open('data/search_trees/insert_words.txt') as file:
    words = [
        line.strip() for line in file
    ]
for word in words:
    tst.insert(word)
unique_words = set(words)

this string is > 1: combine
this string is > 1: combine
this string is > 1: ombine
this string is > 1: mbine
this string is > 1: bine
this string is > 1: ine
this string is > 1: ne
this string is returned: e
this string is returned: ne
this string is returned: ine
this string is returned: bine
this string is returned: mbine
this string is returned: ombine
this string is returned: combine
this string is returned: combine
this string is > 1: combinations
this string is > 1: combinations
this string is > 1: ombinations
this string is > 1: mbinations
this string is > 1: binations
this string is > 1: inations
this string is > 1: nations
this string is > 1: ations
this string is > 1: ations
this string is > 1: tions
this string is > 1: ions
this string is > 1: ons
this string is > 1: ns
this string is returned: s
this string is returned: ns
this string is returned: ons
this string is returned: ions
this string is returned: tions
this string is returned: ations
this string is returned: ations

Verify the length of the data structure.

In [182]:
assert len(tst) == len(unique_words), \
       f'{len(tst)} in tree, expected {len(unique_words)}'

printing length of combine*
printing length of combine*


AssertionError: 19 in tree, expected 20

Verify that all words that were inserted can be found.

In [None]:
for word in unique_words:
    assert tst.search(word), f'{word} not found'

searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char a at node a
searching for char i at node i
searching for char n at node n
searching for char c at node c
searching for char o at node o
searching for char m at node m
searching for char b at node b
searching for char i at node i
searching for char n at node n
searching for char e at node e
searching for char c at node c
searching for char o at node o
searching for char m at node m
searching for char b at node b
searching for char i at node i
searching for char n at node n
searching for char e at node e
searching for char d at node d
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char u at node u
searching for char t at node t
searching for char i at node i
searching for char l at node l
searchin

AssertionError: combines not found

Verify that all prefixes can be found.
(This test still needs to be modified.)

In [None]:
for word in unique_words:
    for i in range(len(word) - 1, 0, -1):
        prefix = word[:i]
        assert tst.search(prefix), f'{prefix} not found'

searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char a at node a
searching for char i at node i
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char a at node a
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char f at node c
searching for char f at node d
searchin

Check that when searching for an exact match, only the inserted words are found, and no prefixes.
This check uses `exact=True`.

In [None]:
for word in unique_words:
    for i in range(len(word), 0, -1):
        prefix = word[:i]
        if prefix not in unique_words:
            assert not tst.search(prefix, exact=True), \
                   f'{prefix} found'

searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char a at node a
searching for char i at node i
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char a at node a
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char c at node c
searchin

AssertionError: c found

Check that the empty string is in the tree (since it is a prefix of any string).

In [None]:
assert tst.search(''), 'empty string not found'

searching for char  at node c
searching for char  at node b


AssertionError: empty string not found

Check that the empty string is not in the tree for an exact search (using the `exact` argument).

In [None]:
assert not tst.search('', exact=True), 'empty string found'

searching for char  at node c
searching for char  at node b


Check that words in the file `data/search_trees/not_insert_words.txt` cannot be found in the tree.

In [None]:
with open('data/search_trees/not_insert_words.txt') as file:
    for line in file:
        word = line.strip()
        assert not tst.search(word), f'{word} should not be found'

searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char u at node u
searching for char t at node t
searching for char u at node i
searching for char u at node u
searching for char r at node r
searching for char e at node e
searching for char f at node c
searching for char f at node d
searching for char f at node f
searching for char o at node u
searching for char o at node o
searching for char n at node n
searching for char t at node t
searching for char a at node a
searching for char i at node i
searching for char n at node n
searching for char a at node c
searching for char a at node b
searching for char g at node c
searching for char g at node d
searching for char g at node f
searching for char g at node t
searching for char m at node c
searching for char m at node d
searching for char m at node f
searching for char m at node t
searching for char t at node c
searching for char t at node d
searching for char t at node f
searchin

Check that all strings are returned.

In [None]:
all_strings = tst.all_strings()
assert len(all_strings) == len(unique_words), \
       f'{len(all_strings)} words, expected {len(unique_words)}'
assert sorted(all_strings) == sorted(unique_words), 'words do not match'

AssertionError: 19 words, expected 20

If no output was generated, all tests have passed.