# Trie Implementation and Performance

## Very Basic Trie Implementation
The first approach might look something like the code below.
* Each node stores a letter, a map from each character `c` to the corresponding child node, and a color.

![](images/color.png)

```java
public class TrieSet {
    private static final int R = 128; // alphabet size (ASCII): each node has up to R children
    private Node root; // root of trie

    private static class Node {
        private char ch; // every node has a character
        private boolean isKey; // indicates blue or white (whether a key ends at this node)

        // A map from a character to the child node for that character.
        // For example, the letter 'a' on the right side has 3
        // children: d, m, p
        private DataIndexedCharMap<Node> next;

        private Node(char c, boolean b, int R) {
            ch = c; isKey = b;
            next = new DataIndexedCharMap<Node>(R);
        }
    }
}
```

## Zooming in On a Node

Let's try to do a box-and-pointer diagram for a `DataIndexedCharMap`.
![](images/zoom.png)

The node for the letter `a` contains a `next` pointer to an object that holds the links to all of its children. The link at index `119` (the ASCII code of `w` is 119) leads to the node that contains `w`. `w`'s `next` in turn contains a bunch of links to its own children.

This is fundamentally what a `Trie` looks like. 

A diagrammatic representation is as follows:

![](images/zoom2.png)

Imagine `a` has 128 links where only one is used, and the rest are `null`.
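
To make the picture concrete, here is a minimal sketch of what an array-backed character map could look like. The class `CharMapSketch` and its methods are hypothetical stand-ins (the actual `DataIndexedCharMap` is not shown in these notes), and the sketch assumes every character has a code below `R`.

```java
// Hypothetical stand-in for DataIndexedCharMap: an array indexed by character code.
class CharMapSketch<V> {
    private final V[] items; // one slot per possible character

    @SuppressWarnings("unchecked")
    CharMapSketch(int R) {
        items = (V[]) new Object[R];
    }

    // Assumes c < R; a larger character code would throw ArrayIndexOutOfBoundsException.
    void put(char c, V value) {
        items[c] = value;
    }

    V get(char c) {
        return items[c];
    }

    boolean containsKey(char c) {
        return items[c] != null;
    }

    // Number of slots that hold no value, i.e. the null links.
    int unusedSlots() {
        int count = 0;
        for (V item : items) {
            if (item == null) {
                count++;
            }
        }
        return count;
    }
}

class CharMapDemo {
    public static void main(String[] args) {
        CharMapSketch<String> next = new CharMapSketch<>(128);
        next.put('w', "node for w");            // stored at index 119
        System.out.println((int) 'w');          // 119
        System.out.println(next.get('w'));      // node for w
        System.out.println(next.unusedSlots()); // 127
    }
}
```

With a single child stored, 127 of the 128 slots stay `null`, which matches the diagram above.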

## Very Basic Trie Implementation
If we use a `DataIndexedCharMap` to track children, every node has R links.

![](images/basic.png)

Notice that there's a redundancy:
* When we follow the `a` link, we end up at a node containing the letter `a`.
* When we follow the `w` link, we end up at a node containing the letter `w`.

Since the link we follow already tells us the letter, we can remove the letters from the nodes and things will work just fine!

![](images/basic2.png)
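
A minimal sketch of this letter-free representation might look like the following. It is an illustration under assumptions, not the course's implementation: the class name `LetterlessTrieSet` is made up, a plain `Node[]` array stands in for `DataIndexedCharMap`, and all keys are assumed to be ASCII.

```java
// Sketch of a trie whose nodes do not store their own letter:
// the index of the link that leads to a node already identifies it.
class LetterlessTrieSet {
    private static final int R = 128; // ASCII alphabet size
    private final Node root = new Node();

    private static class Node {
        boolean isKey;             // true if a key ends at this node
        Node[] next = new Node[R]; // next[c] is the child reached via character c
    }

    // Assumes every character of key is ASCII (code below R).
    void add(String key) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (cur.next[c] == null) {
                cur.next[c] = new Node(); // create the missing child
            }
            cur = cur.next[c];
        }
        cur.isKey = true; // mark the end of the key
    }

    boolean contains(String key) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            cur = cur.next[key.charAt(i)];
            if (cur == null) {
                return false; // fell off the trie
            }
        }
        return cur.isKey; // a prefix of a key is not itself a key
    }
}

class LetterlessTrieDemo {
    public static void main(String[] args) {
        LetterlessTrieSet trie = new LetterlessTrieSet();
        trie.add("sam");
        trie.add("sad");
        trie.add("sap");
        System.out.println(trie.contains("sam"));  // true
        System.out.println(trie.contains("sa"));   // false: only a prefix
        System.out.println(trie.contains("same")); // false: never added
    }
}
```

Nothing in `add` or `contains` ever needs to read a letter out of a node, which is why dropping the `ch` field is safe.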

## Trie Performance in Terms of `N`

Suppose we are given a Trie with `N` keys. Note that `N` is the number of keys (e.g. "sam", "sad", "same", "sap"), not the number of nodes.

What is the runtime for `add` and `contains`?

**Ans**: Both are constant with respect to `N`: $\Theta(1)$.

Suppose we have billions of items and we call `contains("potato")`. We start at the root, follow the links for "p-o-t-a-t-o", and then we're done. The runtime is independent of the number of keys.

Or, in terms of `L`, the length of the key:
* `add`: $\Theta(L)$
* `contains`: $O(L)$
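
To see both claims in action, the sketch below counts how many links `contains` follows. Everything here is illustrative: `CountingTrie` and its `linksFollowed` field are hypothetical names, a plain array replaces `DataIndexedCharMap`, and keys are assumed to be ASCII.

```java
// Sketch: a trie that records how many links the last contains() call followed.
class CountingTrie {
    private static final int R = 128; // ASCII alphabet size

    private static class Node {
        boolean isKey;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();
    int linksFollowed; // links followed by the most recent contains() call

    void add(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            if (cur.next[c] == null) {
                cur.next[c] = new Node();
            }
            cur = cur.next[c];
        }
        cur.isKey = true;
    }

    boolean contains(String key) {
        linksFollowed = 0;
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = cur.next[c];
            if (cur == null) {
                return false; // a miss can stop before reading all L characters
            }
            linksFollowed++;
        }
        return cur.isKey;
    }
}

class CountingTrieDemo {
    public static void main(String[] args) {
        CountingTrie small = new CountingTrie();
        small.add("potato");

        CountingTrie large = new CountingTrie();
        large.add("potato");
        for (int i = 0; i < 1000; i++) {
            large.add("key" + i); // many unrelated keys
        }

        small.contains("potato");
        System.out.println(small.linksFollowed); // 6
        large.contains("potato");
        System.out.println(large.linksFollowed); // 6, same work despite more keys
        large.contains("pizza");
        System.out.println(large.linksFollowed); // 1, stops at the missing 'i' link
    }
}
```

The hit follows exactly `L = 6` links no matter how many keys are stored, while the miss stops early, which is why `contains` is stated as $O(L)$ rather than $\Theta(L)$.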

When our keys are strings, Tries give us slightly better performance on `contains` and `add`.

![](images/performance.png)

The downside of this `DataIndexedCharMap`-based `Trie` is the huge memory cost of storing R links per node.
* Recall that a node with a single child has 127 unused links in our ASCII implementation (R = 128).
* This is wasteful, since most links are unused in real-world usage.
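
As a rough worked example of this cost, the sketch below builds a trie for the four example keys "sam", "sad", "same", and "sap" and counts how many links are allocated versus used. The class `LinkCountingTrie` and its helper methods are hypothetical, and a plain array of R = 128 links per node is assumed in place of `DataIndexedCharMap`.

```java
// Sketch: count allocated vs. used links in an array-backed trie.
class LinkCountingTrie {
    static final int R = 128; // ASCII alphabet size

    private static class Node {
        boolean isKey;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();

    void add(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            if (cur.next[c] == null) {
                cur.next[c] = new Node();
            }
            cur = cur.next[c];
        }
        cur.isKey = true;
    }

    // Total number of nodes, including the root.
    int nodeCount() {
        return countNodes(root);
    }

    private int countNodes(Node n) {
        int total = 1;
        for (Node child : n.next) {
            if (child != null) {
                total += countNodes(child);
            }
        }
        return total;
    }

    // Every node allocates R links, whether or not they are used.
    int totalLinks() {
        return nodeCount() * R;
    }

    // Each non-root node is reached by exactly one non-null link.
    int usedLinks() {
        return nodeCount() - 1;
    }
}

class LinkCountingDemo {
    public static void main(String[] args) {
        LinkCountingTrie trie = new LinkCountingTrie();
        for (String key : new String[] {"sam", "sad", "same", "sap"}) {
            trie.add(key);
        }
        System.out.println(trie.nodeCount());                      // 7
        System.out.println(trie.totalLinks());                     // 896
        System.out.println(trie.usedLinks());                      // 6
        System.out.println(trie.totalLinks() - trie.usedLinks());  // 890
    }
}
```

Four short keys produce 7 nodes and 896 allocated links, of which only 6 point anywhere.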