
The **trie** is a kind of tree that is ideal for text-based features such as autocomplete. The word *trie* is derived from the word *retrieval*.
Most people pronounce trie as "try".

**The Trie Node** Like most other trees, the trie is a collection of nodes that point to other nodes. However, the trie is not a binary tree.

In our implementation, each trie node contains a hash table, where the keys are English characters and the values are the other nodes of the trie.

In [None]:
class TrieNode:
  def __init__(self):
    self.children = {}

# the Trie class keeps track of self.root variable that points to root node
class Trie:
  def __init__(self):
    self.root = TrieNode()    
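
To make the node structure concrete, here is a minimal sketch that wires up a few nodes by hand for the words "ace" and "act". The variable names (`root`, `a`, `c`) are only illustrative, and no insertion method is assumed yet.

```python
class TrieNode:
    def __init__(self):
        # Maps a single character to the child TrieNode for that character.
        self.children = {}

# Build the path a -> c -> {e, t} by hand.
root = TrieNode()
a = root.children["a"] = TrieNode()
c = a.children["c"] = TrieNode()
c.children["e"] = TrieNode()
c.children["t"] = TrieNode()

print(sorted(root.children))  # ['a']
print(sorted(c.children))     # ['e', 't']
```

Both words share the nodes for "a" and "c"; they only branch at the third character.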

#**Trie Search**

The following algorithm performs a **prefix search**: it checks whether a string is the beginning of any word in the trie. This search also finds complete words, since a complete word is a prefix of itself.

1.   We establish a variable called `currentNode`. At the beginning of the algorithm, this points to the root node.
2.   We iterate over each character of our search string.
3.   As we point to each character of our search string, we look to see if the `currentNode` has a child with that character key.
4.   If it does not, we return `None`, as it means our search string does not exist in the trie.
5.   If the `currentNode` does have a child with the current character as the key, we update the `currentNode` to become that child. We then go back to step 2, continuing to iterate over each character in our search string.
6.   If we get to the end of our search string, it means we've found our search string.







In [None]:
def search(self, word):
  currentNode = self.root

  for char in word:
    # If the current node has child key with current character:
    if currentNode.children.get(char):
      # Follow the child node:
      currentNode = currentNode.children[char]
    else:
      # If the current character isn't found among
      # the current node's children, our search word
      # must not be in the trie:
      return None

  # Every character was matched, so return the node
  # that represents the last character of the search string:
  return currentNode
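
The search steps can also be tried in isolation. The sketch below is a standalone version, not the notebook's `Trie` class: `prefix_search` takes the root node as an argument, and `add_path` is a hypothetical helper that exists only so there is something to search.

```python
class TrieNode:
    def __init__(self):
        self.children = {}

def add_path(root, word):
    # Hypothetical helper: creates one node per character of `word`.
    node = root
    for char in word:
        node = node.children.setdefault(char, TrieNode())
    return node

def prefix_search(root, prefix):
    current = root
    for char in prefix:
        child = current.children.get(char)
        if child is None:
            # No child for this character: the prefix is not in the trie.
            return None
        current = child
    # All characters matched: return the node for the last character.
    return current

root = TrieNode()
add_path(root, "cat")

print(prefix_search(root, "ca") is not None)  # True
print(prefix_search(root, "co"))              # None
```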

**Time Complexity of Trie Search**
We focus on each character of our search string one at a time. As we do so, we use each node's hash table to find the appropriate child node in one step (a hash table lookup takes $O(1)$ time). Then, our algorithm takes as many steps as there are characters in our search string.

Expressing trie search in terms of Big $O$ is slightly tricky. We can't quite call it $O(1)$, since the number of steps is not constant, as it depends on the search string's length. And $O(N)$ is misleading, since $N$ normally refers to the amount of data in the data structure. That would be the number of nodes in our trie, which is a much greater number than the number of characters in our search string.


---


Most references have decided to call this $O(K)$, where $K$ is the number of characters in our search string.


---


Even though $O(K)$ is not constant, it is similar to constant time in one important sense. Our trie can grow tremendously, but that will have no effect on the speed of our search. An $O(K)$ algorithm on a string of three characters will always take at most three steps, no matter how large the trie is.
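
As a rough illustration of this point, the sketch below counts child lookups for the same three-character search in a one-word trie and in a trie holding every three-letter lowercase string. The helpers `insert` and `count_search_steps` are assumptions made for this demo, not part of the notebook's classes.

```python
import itertools
import string

class TrieNode:
    def __init__(self):
        self.children = {}

def insert(root, word):
    # Minimal insertion, used only to populate the tries for this demo.
    node = root
    for char in word:
        node = node.children.setdefault(char, TrieNode())
    node.children["*"] = None

def count_search_steps(root, word):
    # Returns how many child lookups the search performs.
    node, steps = root, 0
    for char in word:
        steps += 1
        node = node.children.get(char)
        if node is None:
            break
    return steps

small = TrieNode()
insert(small, "cat")

large = TrieNode()
for letters in itertools.product(string.ascii_lowercase, repeat=3):
    insert(large, "".join(letters))  # 26**3 = 17,576 words

print(count_search_steps(small, "cat"))  # 3
print(count_search_steps(large, "cat"))  # 3
```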

#**Trie Insertion** 

Inserting a new word into a trie is similar to searching for an existing word.

1.   We establish a variable called `currentNode`. At the beginning of the algorithm, this points to the root node.
2.   We iterate over each character of the word we're inserting.
3.   As we point to each character of the new word, we look to see if the `currentNode` has a child with that character as a key.
4.   If it does, we update the `currentNode` to become that child node and we go back to step 2, moving on to the next character of the new word.
5.   If `currentNode` does not have a child node that matches the current character, we create such a child node and update `currentNode` to be this new node. We then go back to step 2, moving on to the next character of the new word.
6.   After we insert the final character of our new word, we add a `*` child to the last node to indicate the word is complete.

Like search, trie insertion takes about $O(K)$ steps. If we count the adding of the "*" at the end, it's technically $K+1$ steps, but because we drop the constants, we express the speed as $O(K)$.

In [None]:
def insert(self, word):
  currentNode = self.root

  for char in word:
    if currentNode.children.get(char):
      currentNode = currentNode.children[char]
    else:
      # If the current character isn't found among 
      # the current node's children,
      # we add the character as a new child node:
      newNode = TrieNode()
      currentNode.children[char] = newNode

      currentNode = newNode  

  # After inserting the entire word into the trie,
  # we add a "*" key at the end:
  currentNode.children["*"] = None   
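
One consequence of following existing children before creating new ones is that words with a common prefix share nodes. The sketch below illustrates this with a compact restatement of the insertion logic and a hypothetical helper, `count_nodes`, that counts character nodes while skipping the `"*"` markers.

```python
class TrieNode:
    def __init__(self):
        self.children = {}

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        current = self.root
        for char in word:
            # Create the child only if it does not exist yet.
            if char not in current.children:
                current.children[char] = TrieNode()
            current = current.children[char]
        # Mark the end of a complete word.
        current.children["*"] = None

def count_nodes(node):
    # Hypothetical helper: counts character nodes below `node`,
    # skipping the "*" end-of-word markers (whose value is None).
    total = 0
    for key, child in node.children.items():
        if key != "*":
            total += 1 + count_nodes(child)
    return total

trie = Trie()
trie.insert("cat")
print(count_nodes(trie.root))  # 3
trie.insert("car")
print(count_nodes(trie.root))  # 4: "c" and "a" are reused, only "r" is new
trie.insert("cat")
print(count_nodes(trie.root))  # 4: inserting an existing word adds nothing
```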

#**Autocomplete**

The following method, called `collectAllWords`, collects a list of all the trie's words starting from a particular node.

This method accepts three arguments:

> The first is the `node` whose descendants we're collecting words from. 

> The second argument, `word`, begins as an empty string, and we add characters to it as we move through the trie. 

> The third argument, `words`, begins as an empty list, and by the end of the function will contain all the words reachable from `node` (every word in the trie when we start from the root).

In [None]:
def collectAllWords(self, node=None, word="", words=None):
  # Start with a fresh list on each top-level call. A list used
  # directly as a default value would be shared between calls
  # and keep the results of earlier calls:
  if words is None:
    words = []

  # The current node is the node passed in as the first parameter,
  # or the root node if none is provided:
  currentNode = node or self.root

  # We iterate through all the current node's children:
  for key, childNode in currentNode.children.items():
    # If the current key is *, it means we hit the end of a 
    # complete word, so we can add it to our words list:
    if key == "*":
      words.append(word)
    else:
      # If we're still in the middle of a word,
      # we recursively call this function on the child node.
      self.collectAllWords(childNode, word + key, words)
  return words
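
A Python-specific note on the `words` parameter: a list written directly as a default argument value is created once, when the function is defined, and is then shared by every call that omits the argument. The generic sketch below contrasts that with a `None` default; the function names are illustrative and not part of the trie.

```python
def collect_shared(item, items=[]):
    # The same list object is reused on every call that omits `items`.
    items.append(item)
    return items

def collect_fresh(item, items=None):
    # A new list is created for each call that omits `items`.
    if items is None:
        items = []
    items.append(item)
    return items

print(collect_shared("cat"))  # ['cat']
print(collect_shared("car"))  # ['cat', 'car']
print(collect_fresh("cat"))   # ['cat']
print(collect_fresh("car"))   # ['car']
```

For a word collector this matters because a second call would otherwise return the first call's words as well.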

The `autocomplete` method accepts the `prefix` parameter, which is the string of characters the user begins typing in.

> First, we search the trie for the existence of the `prefix`.

> If the search method doesn't find the prefix in the trie, it returns `None`. 

> However, if the prefix is found in the trie, the `search` method returns the node in the trie that represents the final character in the prefix.

> Our `autocomplete` method continues by calling the `collectAllWords`
method on the node returned by the `search` method.

> Our method finally returns an array of all possible endings to the user's prefix, which we could then display to the user as possible autocomplete options.



In [None]:
def autocomplete(self, prefix):
  currentNode = self.search(prefix)
  if not currentNode:
    return None
  return self.collectAllWords(currentNode)  
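
Putting the pieces together, the sketch below is a self-contained restatement of the methods above in a single class, so it can be run on its own. The snake_case name `collect_all_words`, the `None` default for `words`, and the sample words are choices made for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.children["*"] = None

    def search(self, prefix):
        node = self.root
        for char in prefix:
            node = node.children.get(char)
            if node is None:
                return None
        return node

    def collect_all_words(self, node, word="", words=None):
        if words is None:
            words = []
        for key, child in node.children.items():
            if key == "*":
                words.append(word)
            else:
                self.collect_all_words(child, word + key, words)
        return words

    def autocomplete(self, prefix):
        node = self.search(prefix)
        if node is None:
            return None
        return self.collect_all_words(node)

trie = Trie()
for w in ["cat", "catnap", "catnip", "can", "dog"]:
    trie.insert(w)

endings = trie.autocomplete("cat")
print(endings)                       # ['', 'nap', 'nip']
print(["cat" + e for e in endings])  # ['cat', 'catnap', 'catnip']
print(trie.autocomplete("cow"))      # None
```

Note that the result holds endings rather than full words, so the caller prepends the prefix before displaying suggestions. The empty string appears because "cat" is itself a complete word.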

#**Leetcode 208. Implement Trie (Prefix Tree)** `Medium`
A trie (pronounced as "try") or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.

Implement the Trie class:

`Trie()` Initializes the trie object.

void `insert`(String word) Inserts the string word into the trie.

boolean `search`(String `word`) Returns `true` if the string word is in the trie (i.e., was inserted before), and `false` otherwise.

boolean `startsWith`(String `prefix`) Returns `true` if there is a previously inserted string word that has the prefix prefix, and `false` otherwise.

In [None]:
class TrieNode:
    def __init__(self):
         self.children = {}
            
class Trie:

    def __init__(self):
        self.root = TrieNode()
       
    def insert(self, word: str) -> None:
        cur = self.root
        
        for char in word:
            if cur.children.get(char):
                cur = cur.children[char]
            else:
                newNode = TrieNode()
                cur.children[char] = newNode
                cur = newNode
        cur.children["*"] = None        
         

    def search(self, word: str) -> bool:
        cur = self.root
        
        for char in word:
            if cur.children.get(char):
                cur = cur.children[char]
            else:
                return False
                
        return "*" in cur.children

    def startsWith(self, prefix: str) -> bool:
        cur = self.root
        
        for char in prefix:
            if cur.children.get(char):
                cur = cur.children[char]
            else:
                return False
                
        return True 

In [None]:
# Example 1:
# Input
# ["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
# [[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
# Output
# [null, null, true, false, true, null, true]
trie = Trie()
trie.insert("apple")
trie.search("apple")
trie.search("app")
trie.startsWith("app")
trie.insert("app")
trie.search("app")

True

#**Leetcode 211. Design Add and Search Words Data Structure** `Medium`
Design a data structure that supports adding new words and finding if a string matches any previously added string.

Implement the `WordDictionary` class:

* `WordDictionary()` Initializes the object.
* void `addWord(word)` Adds `word` to the data structure; it can be matched later.
* bool `search(word)` Returns `true` if there is any string in the data structure that matches word or `false` otherwise.
* `word` may contain dots '.' where dots can be matched with any letter.





In [None]:
class TrieNode:
  def __init__(self):
    self.children = {}
    self.isWord = False

class WordDictionary:

    def __init__(self):
        self.root = TrieNode()  

    def addWord(self, word: str) -> None:
      cur = self.root

      for letter in word:
        if letter not in cur.children:
          newNode = TrieNode()
          cur.children[letter] = newNode

        cur = cur.children[letter]

      cur.isWord = True
    
    def search(self, word: str) -> bool:

      def dfs(idx, root):
        cur = root

        for i in range(idx, len(word)):
          char = word[i]
          if char == ".":
            for child in cur.children.values():
              if dfs(i+1, child):
                return True
            return False
          else:
            if char not in cur.children:
              return False
            cur = cur.children[char]
        return cur.isWord

      return dfs(0, self.root)   
          

In [None]:
# Input
# ["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
# [[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
# Output
# [null,null,null,null,false,true,true,true]

wordDictionary = WordDictionary()
wordDictionary.addWord("bad")
wordDictionary.addWord("dad")
wordDictionary.addWord("mad")
wordDictionary.search("pad") # return False
wordDictionary.search("bad") # return True
wordDictionary.search(".ad") # return True
wordDictionary.search("b..") # return True

True
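
A few edge cases are worth checking: each dot must match exactly one letter, and a prefix of an added word is not itself a match. The sketch below uses a purely recursive variant of `search` (an alternative written for this example, not the solution above) so that it can run on its own.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.isWord = False

class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        cur = self.root
        for letter in word:
            cur = cur.children.setdefault(letter, TrieNode())
        cur.isWord = True

    def search(self, word):
        def dfs(idx, node):
            # All characters consumed: match only if a word ends here.
            if idx == len(word):
                return node.isWord
            char = word[idx]
            if char == ".":
                # A dot may stand for any one letter: try every child.
                return any(dfs(idx + 1, child) for child in node.children.values())
            child = node.children.get(char)
            return child is not None and dfs(idx + 1, child)

        return dfs(0, self.root)

wd = WordDictionary()
wd.addWord("bad")

print(wd.search("..."))   # True
print(wd.search("b.d"))   # True
print(wd.search(".."))    # False: too short
print(wd.search("...."))  # False: too long
print(wd.search("ba"))    # False: a prefix, not an added word
```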