# Performance Demonstration of Ternary Search Trees (TST)

This notebook evaluates the performance of Ternary Search Trees across key operations (insertion, prefix search, exact search, and listing all strings) on datasets of varying sizes.

In [3]:
import random
import time
import matplotlib.pyplot as plt
from ternary_search_tree import TernarySearchTree


In [2]:
# Load English words from file and strip whitespace
with open('data/search_trees/corncob_lowercase.txt') as file:
    words = [line.strip() for line in file]

## Small-Sized TST
### Insertion Performance
Let's start by inserting small datasets into our TST (100 to 5,000 words).

In [52]:
sizes = [100, 500, 1000, 5000]  # Sizes of trees to build
nr_runs = 10  # Number of benchmark runs to average timing results

# create a list of random samples for each size
samples = [
    random.sample(words, k=size) for size in sizes
]

Now let's build a tree for each of the prespecified sizes and time how long it takes to insert a fixed 20-word sample into it:

In [None]:
insert_sample = random.sample(words, k=20)  # Fixed 20-word sample to test insert time
print(insert_sample)
insert_times = {}
all_times = {}

# Benchmark insert performance as tree size increases
for sample in samples:
    all_runs = []  # Raw per-run timings in nanoseconds
    # Accumulate the time to insert the 20-word sample over nr_runs runs
    insert_times[len(sample)] = 0.0
    for _ in range(nr_runs):
        tst = TernarySearchTree()

        random.shuffle(sample)
        # First, build the tree from the shuffled sample (this part is not timed)
        for word in sample:
            tst.insert(word)

        start_time = time.time_ns()
        for word in insert_sample:
            tst.insert(word)
        end_time = time.time_ns()

        insert_times[len(sample)] += end_time - start_time
        all_runs.append((end_time-start_time))

    # Average over number of runs and convert to milliseconds
    insert_times[len(sample)] /= nr_runs * 1_000_000.0
    all_times[len(sample)] = all_runs

print(all_times)

['obeyed', 'troubadours', 'irritable', 'unorthodox', 'securest', 'inclusively', 'peatlands', 'climbdown', 'blacklisted', 'varnishes', 'apprehensions', 'amnesty', 'roving', 'patchable', 'bivouacs', 'massless', 'encloses', 'weaver', 'entrapped', 'relied']
{100: [0, 1001000, 0, 1204300, 0, 0, 0, 999300, 999500, 0], 500: [0, 1018800, 654000, 0, 0, 0, 0, 0, 0, 0], 1000: [0, 546800, 0, 1802300, 1035100, 0, 630900, 519900, 508400, 0], 5000: [1014400, 0, 1280800, 0, 0, 0, 551600, 1010800, 0, 0]}
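
The recorded timings above contain many `0` entries next to values close to 1 ms, a pattern that suggests the clock ticked more coarsely than the work being measured. A minimal sketch of the same measurement with `time.perf_counter_ns()`, a monotonic high-resolution clock from the standard library, is shown below. The helper `benchmark_ns` and the set-based stand-in workload are assumptions made for illustration; they are not part of the notebook.

```python
import statistics
import time


def benchmark_ns(setup, work, nr_runs=10):
    """Time `work(state)` once per run and return the durations in nanoseconds.

    `setup` builds fresh state for every run and is not timed.
    """
    durations = []
    for _ in range(nr_runs):
        state = setup()
        start = time.perf_counter_ns()  # monotonic, high-resolution clock
        work(state)
        durations.append(time.perf_counter_ns() - start)
    return durations


# Stand-in workload (hypothetical): a set replaces the tree so the sketch runs alone.
base_words = ["obeyed", "troubadours", "irritable", "unorthodox"]
new_words = ["securest", "inclusively", "peatlands"]


def build():
    # Untimed setup: create the container that the timed step will modify
    return set(base_words)


def insert_new(container):
    # Timed step: add the new words to the freshly built container
    for word in new_words:
        container.add(word)


runs = benchmark_ns(build, insert_new, nr_runs=10)
print(f"median: {statistics.median(runs) / 1_000_000.0:.6f} ms")
```

For the benchmark above, `setup` would build a `TernarySearchTree` from the shuffled sample and `work` would call `tst.insert(word)` for each word in `insert_sample`. Reporting the median alongside the mean makes a single slow run less influential.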


### Prefix Search Performance



### Exact Search Performance

### All Strings

## Medium-Sized TST

## Large-Sized TST


## Time and Space Complexity of Ternary Search Trees (TST)

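Time Complexity (n = number of stored words, L = length of the word being inserted or searched):

- Insertion:
    - Worst Case: O(L + n) → if the tree becomes unbalanced (like a linked list), e.g. when words are inserted in sorted order; each left/right step rules out at least one stored word, so at most n such steps are added to the L middle steps
    - Average Case: O(L + log n) → assuming a roughly balanced ternary tree, which a random insertion order tends to produce
    - Best Case: O(L) → when the path follows only middle links

- Search:
    - Worst Case: O(L + n) → in a degenerate (unbalanced) tree
    - Average Case: O(L + log n)
    - Best Case: O(L)

- Prefix Search:
    - A TST supports prefix search naturally: locating the prefix costs the same as searching for a key of the prefix's length, and collecting the k matching strings takes time proportional to the size of the subtree below the prefix node.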
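
The operations listed above can be sketched with a small, self-contained tree. This is an illustrative sketch only: `MiniTST`, `_Node`, `contains`, and `with_prefix` are hypothetical names chosen here, and the `TernarySearchTree` class used in this notebook may be implemented differently.

```python
class _Node:
    """One character of a stored word, with three child links."""
    __slots__ = ("char", "left", "mid", "right", "is_end")

    def __init__(self, char):
        self.char = char
        self.left = self.mid = self.right = None
        self.is_end = False  # True if a stored word ends at this node


class MiniTST:
    """Illustrative ternary search tree for non-empty strings."""

    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.char:
            node.left = self._insert(node.left, word, i)
        elif ch > node.char:
            node.right = self._insert(node.right, word, i)
        elif i + 1 < len(word):
            node.mid = self._insert(node.mid, word, i + 1)
        else:
            node.is_end = True
        return node

    def _find(self, key):
        """Return the node matching the last character of `key`, or None."""
        node, i = self.root, 0
        while node is not None:
            ch = key[i]
            if ch < node.char:
                node = node.left
            elif ch > node.char:
                node = node.right
            elif i + 1 == len(key):
                return node
            else:
                node, i = node.mid, i + 1
        return None

    def contains(self, word):
        node = self._find(word) if word else None
        return node is not None and node.is_end

    def with_prefix(self, prefix):
        """Return all stored words that start with `prefix`, in sorted order."""
        out = []
        if not prefix:
            self._collect(self.root, "", out)
            return out
        node = self._find(prefix)
        if node is None:
            return out
        if node.is_end:
            out.append(prefix)
        self._collect(node.mid, prefix, out)
        return out

    def _collect(self, node, path, out):
        # In-order walk: left subtree, this character, then right subtree
        if node is None:
            return
        self._collect(node.left, path, out)
        word = path + node.char
        if node.is_end:
            out.append(word)
        self._collect(node.mid, word, out)
        self._collect(node.right, path, out)


tst = MiniTST()
for w in ["car", "cart", "care", "cat", "dog", "do"]:
    tst.insert(w)

print(tst.contains("care"))    # True
print(tst.contains("ca"))      # False: "ca" is only a prefix
print(tst.with_prefix("car"))  # ['car', 'care', 'cart']
```

The recursive `_insert` keeps the sketch short. A tree that has degenerated into long left/right chains could make an iterative version preferable.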

Space Complexity:

- O(n * L), where:
    - n = number of distinct words
    - L = average length of each word
    - Each character may be stored in a separate node, but space is reused for common prefixes.
- Additional memory for three pointers (left, mid, right) per node.
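
The effect of prefix sharing can be estimated without building a tree. Assuming one node per character and no separate terminator nodes, each node corresponds to exactly one distinct non-empty prefix of the stored words, so counting those prefixes gives the node count. The helper name `tst_node_count` is hypothetical.

```python
def tst_node_count(words):
    """Count the nodes of a one-character-per-node TST holding `words`.

    Assumption: each node corresponds to one distinct non-empty prefix.
    """
    prefixes = {word[:i] for word in words for i in range(1, len(word) + 1)}
    return len(prefixes)


demo = ["car", "care", "cart", "cat"]
total_chars = sum(len(w) for w in demo)  # characters stored without sharing
nodes = tst_node_count(demo)             # distinct prefixes: c, ca, car, care, cart, cat

print(total_chars, nodes)  # 14 6
```

The O(n * L) bound is reached only when the words share no prefixes. Each node still carries three links, so the per-node overhead matters in practice.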

Comparison:
- Python's set() has O(1) average insert and search time using hash tables, but does not support prefix search.
- TST trades off speed for space efficiency and support for prefix queries.
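
To make the comparison concrete, here is a minimal sketch of prefix matching with a plain `set`. Because a hash table keeps no ordering or prefix structure, every element has to be checked. The helper name `set_prefix_search` is hypothetical.

```python
def set_prefix_search(word_set, prefix):
    """Return the words in `word_set` that start with `prefix`, sorted.

    Scans every element, so the cost grows with the size of the whole set.
    """
    return sorted(w for w in word_set if w.startswith(prefix))


word_set = {"car", "cart", "care", "cat", "dog", "do"}
matches = set_prefix_search(word_set, "car")

print("care" in word_set)  # True: average O(1) membership test
print(matches)             # ['car', 'care', 'cart']
```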
