A Trie (the name comes from reTRIEval) is an efficient information retrieval data structure. With a Trie, search cost can be brought down to the optimal limit, the key length. If we store keys in a binary search tree, a well-balanced BST needs time proportional to M * log N, where M is the maximum string length and N is the number of keys in the tree. With a Trie, we can search for a key in O(M) time. The penalty is the Trie's storage requirement.

Every node of a Trie has multiple branches, and each branch represents a possible character of a key. We need to mark the last node of every key as an end-of-word node. The node field isEndOfWord is used for this purpose.

Inserting a key into a Trie is straightforward. Every character of the input key is inserted as an individual Trie node. Note that children is an array of pointers (or references) to next-level Trie nodes, and the key character acts as an index into this array. If the input key is new or an extension of an existing key, we need to construct the missing nodes of the key and mark the last node as end of word. If the input key is a prefix of an existing key in the Trie, we simply mark the last node of the key as end of word. The length of the longest key determines the Trie depth.

Searching for a key is similar to the insert operation, except that we only compare characters and move down. The search can terminate either because the key string ends or because the trie has no branch for the next character. In the former case, the key exists in the trie only if the isEndOfWord field of the last node is true. In the latter case, the search terminates without examining all the characters of the key, since the key is not present in the trie.
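
The two ways a search can stop can be made visible with a small sketch. It assumes lowercase a-z keys, and `search_outcome` is a hypothetical helper introduced only for illustration. It reports the reason the walk ended instead of returning a plain boolean.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # assumption: keys use only 'a'-'z'
        self.isEndOfWord = False

def insert(root, key):
    node = root
    for ch in key:
        index = ord(ch) - ord('a')
        if node.children[index] is None:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.isEndOfWord = True

def search_outcome(root, key):
    # Hypothetical helper: describes why the search stopped.
    node = root
    for depth, ch in enumerate(key):
        child = node.children[ord(ch) - ord('a')]
        if child is None:
            # Stopped early: no branch for this character.
            return f"missing branch at index {depth}"
        node = child
    # Stopped because the key string ended.
    return "found" if node.isEndOfWord else "path exists, not a stored key"

root = TrieNode()
for key in ["the", "there"]:
    insert(root, key)

print(search_outcome(root, "the"))   # found
print(search_outcome(root, "ther"))  # path exists, not a stored key
print(search_outcome(root, "thaw"))  # missing branch at index 2
```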

<strong>Insert and search cost O(key_length); however, the memory requirement of a Trie is O(ALPHABET_SIZE * key_length * N), where N is the number of keys in the Trie.</strong>
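
To make the storage cost concrete, the sketch below counts how many nodes and child slots an array-based Trie allocates for a small key set. The sample keys and the helper `count_nodes` are assumptions made for illustration, and keys are assumed to be lowercase a-z.

```python
ALPHABET_SIZE = 26  # assumption: keys use only 'a'-'z'

class TrieNode:
    def __init__(self):
        self.children = [None] * ALPHABET_SIZE
        self.isEndOfWord = False

def insert(root, key):
    node = root
    for ch in key:
        index = ord(ch) - ord('a')
        if node.children[index] is None:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.isEndOfWord = True

def count_nodes(node):
    # Hypothetical helper: this node plus all of its descendants.
    return 1 + sum(count_nodes(c) for c in node.children if c is not None)

root = TrieNode()
keys = ["the", "there", "their", "a"]
for key in keys:
    insert(root, key)

nodes = count_nodes(root)        # root + 8 character nodes
slots = nodes * ALPHABET_SIZE    # child slots allocated in total
used = nodes - 1                 # every non-root node fills exactly one slot
print(nodes, slots, used)        # 9 234 8
```

Only 8 of the 234 allocated slots are in use here, which is the storage penalty the bound above describes. Shared prefixes keep the real node count below the worst case of key_length * N.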

# Trie | (Insert and Search)

In [1]:
class TrieNode:
    def __init__(self):
        self.children=[None]*26
        self.isEndOfWord=False

class Trie:
    def __init__(self):
        self.root=self.getTrieNode()

    def getTrieNode(self):
        return TrieNode()

    def getIndex(self,ch):
        return ord(ch)-ord('a')

    def insert(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                root.children[index]=self.getTrieNode()
            root=root.children[index]
        root.isEndOfWord=True

    def searchNode(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                return False
            root=root.children[index]
        return root is not None and root.isEndOfWord

if __name__ == '__main__':
    keys=["the","a","there","anaswe","any",
            "by","their"]
    t=Trie()
    for key in keys:
        t.insert(key)
    keysForSearch=['the','these','their','thaw']
    for key in keysForSearch:
        print(t.searchNode(key))


True
False
True
False


# Trie | (Delete)

During the delete operation we remove the key in a bottom-up manner using recursion. The following cases can arise when deleting a key from a trie:

The key may not be in the trie. The delete operation should not modify the trie.

The key is present as a unique key (no part of the key contains another key as a prefix, nor is the key itself a prefix of another key in the trie). Delete all of its nodes.

The key is a prefix of another, longer key in the trie. Unmark the end-of-word flag on the key's last node and keep the nodes.

The key is present in the trie and has at least one other key as a prefix. Delete nodes from the end of the key back to, but not including, the last node of its longest prefix key.
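
The four cases can be checked by counting nodes before and after a deletion. This is a minimal sketch under the assumption of lowercase a-z keys. The function names (`delete`, `count_nodes`, `nodes_before_after`) and the sample keys are hypothetical and chosen only for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # assumption: keys use only 'a'-'z'
        self.isEndOfWord = False

def insert(root, key):
    node = root
    for ch in key:
        i = ord(ch) - ord('a')
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.isEndOfWord = True

def is_empty(node):
    return all(child is None for child in node.children)

def delete(node, key, depth=0):
    # Returns the node to keep in the parent's slot, or None to prune it.
    if node is None:
        return None
    if depth == len(key):
        node.isEndOfWord = False
    else:
        i = ord(key[depth]) - ord('a')
        node.children[i] = delete(node.children[i], key, depth + 1)
    # Prune nodes that neither end a key nor lead to one (the root is kept).
    if depth > 0 and is_empty(node) and not node.isEndOfWord:
        return None
    return node

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children if c is not None)

def nodes_before_after(keys, victim):
    root = TrieNode()
    for key in keys:
        insert(root, key)
    before = count_nodes(root)
    delete(root, victim)
    return before, count_nodes(root)

print(nodes_before_after(["the", "there"], "thy"))    # (6, 6) key absent
print(nodes_before_after(["the", "cat"], "cat"))      # (7, 4) unique key
print(nodes_before_after(["the", "there"], "the"))    # (6, 6) prefix of longer key
print(nodes_before_after(["the", "there"], "there"))  # (6, 4) has a prefix key
```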

In [1]:
class TrieNode:
    def __init__(self):
        self.children=[None]*26
        self.isEndOfWord=False

class Trie:
    def __init__(self):
        self.root=self.getTrieNode()

    def getTrieNode(self):
        return TrieNode()

    def getIndex(self,ch):
        return ord(ch)-ord('a')

    def insert(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                root.children[index]=self.getTrieNode()
            root=root.children[index]
        root.isEndOfWord=True

    def searchNode(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                return False
            root=root.children[index]
        return root is not None and root.isEndOfWord

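    def isEmpty(self,root):
        # True when the node has no children at all.
        for i in range(26):
            if root.children[i] is not None:
                return False
        return True

    def deleteKey(self,root,key,depth):
        # Returns the node that should stay in the parent's slot (None = removed).
        if root is None:
            return None

        # Reached the node for the last character of the key.
        if depth==len(key):
            if root.isEndOfWord:
                root.isEndOfWord=False
            # With no children left, the node is no longer needed.
            if self.isEmpty(root):
                root=None
            return root

        index=self.getIndex(key[depth])
        root.children[index]=self.deleteKey(root.children[index],key,depth+1)

        # Prune this node if it has no children and does not end another key.
        if self.isEmpty(root) and not root.isEndOfWord:
            root=None
        return root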


if __name__ == '__main__':
    trie=Trie()
    keys=["the","a","there","bye","anaswe","any","by","their","cat"]
    for key in keys:
        trie.insert(key)
    for key in keys:
        print(f"{key} -> {trie.searchNode(key)}")
    print()
    trie.deleteKey(trie.root, "cat", 0)
    for key in keys:
        print(f"{key} -> {trie.searchNode(key)}")
    print()
    trie.deleteKey(trie.root, "thy", 0)
    for key in keys:
        print(f"{key} -> {trie.searchNode(key)}")
    print()
    trie.deleteKey(trie.root, "by", 0)
    for key in keys:
        print(f"{key} -> {trie.searchNode(key)}")
    print()
    trie.deleteKey(trie.root, "there", 0)
    for key in keys:
        print(f"{key} -> {trie.searchNode(key)}")
    print()


the -> True
a -> True
there -> True
bye -> True
anaswe -> True
any -> True
by -> True
their -> True
cat -> True

the -> True
a -> True
there -> True
bye -> True
anaswe -> True
any -> True
by -> True
their -> True
cat -> False

the -> True
a -> True
there -> True
bye -> True
anaswe -> True
any -> True
by -> True
their -> True
cat -> False

the -> True
a -> True
there -> True
bye -> True
anaswe -> True
any -> True
by -> False
their -> True
cat -> False

the -> True
a -> True
there -> False
bye -> True
anaswe -> True
any -> True
by -> False
their -> True
cat -> False



# Count Number of Strings with given Prefix

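Maintain a prefix count (pc) in every Trie node: the number of inserted keys whose path passes through that node. The count stored at the node reached by walking a prefix is then the number of keys that start with that prefix.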

In [2]:
class TrieNode:
    def __init__(self):
        self.children=[None]*26
        self.pc=1

class Trie:
    def __init__(self):
        self.root=self.getTrieNode()

    def getTrieNode(self):
        return TrieNode()

    def getIndex(self,ch):
        return ord(ch)-ord('a')

    def insert(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                root.children[index]=self.getTrieNode()
            else:
                root.children[index].pc+=1
            root=root.children[index]

    def getPrefixCount(self,prefix):
        root=self.root
        l=len(prefix)
        for i in range(l):
            index=self.getIndex(prefix[i])
            if not root.children[index]:
                return 0
            root=root.children[index]
        return root.pc

if __name__ == '__main__':
    keys=['abac','abaa','abab','aabb','aabc']
    t=Trie()
    for key in keys:
        t.insert(key)
    prefixes=['ab','aba','abaa','ac','aa']
    for i in prefixes:
        print(f"Prefixes with {i} are {t.getPrefixCount(i)}")


Prefixes with ab are 3
Prefixes with aba are 3
Prefixes with abaa are 1
Prefixes with ac are 0
Prefixes with aa are 2
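
The counts above can be cross-checked with a direct scan over the key list. The sketch below is not the Trie method. It is a plain reference implementation using `str.startswith`, and `brute_force_prefix_count` is a hypothetical helper name.

```python
keys = ['abac', 'abaa', 'abab', 'aabb', 'aabc']
prefixes = ['ab', 'aba', 'abaa', 'ac', 'aa']

def brute_force_prefix_count(keys, prefix):
    # Reference scan: checks every key, so it costs O(N * len(prefix)).
    return sum(1 for key in keys if key.startswith(prefix))

counts = {p: brute_force_prefix_count(keys, p) for p in prefixes}
print(counts)  # {'ab': 3, 'aba': 3, 'abaa': 1, 'ac': 0, 'aa': 2}
```

The scan touches every key for each query, whereas the Trie answers each query in time proportional to the prefix length alone.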


# Longest prefix matching

Given a word, find the longest key stored in the Trie that is a prefix of that word. Walk down the Trie along the characters of the word and remember the last position at which an end-of-word node was seen.

In [3]:
class TrieNode:
    def __init__(self):
        self.children=[None]*26
        self.isEndOfWord=False

class Trie:
    def __init__(self):
        self.root=self.getTrieNode()

    def getTrieNode(self):
        return TrieNode()

    def getIndex(self,ch):
        return ord(ch)-ord('a')

    def insert(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                root.children[index]=self.getTrieNode()
            root=root.children[index]
        root.isEndOfWord=True

    def getLongestPrefix(self,word):
        root=self.root
        l=len(word)
        result=""
        curr=""
        for i in range(l):
            index=self.getIndex(word[i])
            if not root.children[index]:
                break
            else:
                curr+=word[i]
                if root.children[index].isEndOfWord:
                    result=curr
            root=root.children[index]
        return result if result else "No Prefix Found"

if __name__ == '__main__':
    keys=['are','area','base','cat','cater','children','basement']
    t=Trie()
    for key in keys:
        t.insert(key)
    word="basemexy"
    # word="caterer"
    # word="child"
    print(t.getLongestPrefix(word))


base
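
The two commented-out inputs in the cell above can be checked against a simple reference scan. This sketch swaps the Trie walk for a set lookup over prefixes, tried from longest to shortest. `longest_key_prefix` is a hypothetical helper, and it returns None where the cell above returns "No Prefix Found".

```python
keys = {'are', 'area', 'base', 'cat', 'cater', 'children', 'basement'}

def longest_key_prefix(keys, word):
    # Try prefixes from longest to shortest; the first stored key wins.
    for end in range(len(word), 0, -1):
        if word[:end] in keys:
            return word[:end]
    return None  # no stored key is a prefix of word

print(longest_key_prefix(keys, "basemexy"))  # base
print(longest_key_prefix(keys, "caterer"))   # cater
print(longest_key_prefix(keys, "child"))     # None
```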


# Find shortest unique prefix for every word in a given list

This reuses the prefix count (pc) from the prefix-counting section. A node with pc equal to 1 is visited by exactly one key, so the path down to the first such node is that key's shortest unique prefix.

In [4]:
class TrieNode:
    def __init__(self):
        self.children=[None]*26
        self.pc=1

class Trie:
    def __init__(self):
        self.root=self.getTrieNode()

    def getTrieNode(self):
        return TrieNode()

    def getIndex(self,ch):
        return ord(ch)-ord('a')

    def insert(self,key):
        root=self.root
        l=len(key)
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                root.children[index]=self.getTrieNode()
            else:
                root.children[index].pc+=1
            root=root.children[index]

    def getShortestUniquePrefix(self,key):
        root=self.root
        l=len(key)
        currPrefix=""
        for i in range(l):
            index=self.getIndex(key[i])
            if not root.children[index]:
                return None
            else:
                currPrefix+=key[i]
                if root.children[index].pc==1:
                    return currPrefix
            root=root.children[index]
        # Every prefix of key is shared with another key, so none is unique.
        return None

if __name__ == '__main__':
    trie=Trie()
    keys=["zebra","dog","duck","dove"]
    for key in keys:
        trie.insert(key)
    for key in keys:
        print(f"Unique Prefix for {key} is {trie.getShortestUniquePrefix(key)}")


Unique Prefix for zebra is z
Unique Prefix for dog is dog
Unique Prefix for duck is du
Unique Prefix for dove is dov
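
One edge case is worth a quick check: when a key is itself a prefix of another key, it has no unique prefix at all. The sketch below assumes lowercase a-z keys and uses hypothetical free functions and sample keys. Unlike the cell above, it starts pc at 0 and increments it on every visit, which yields the same counts.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # assumption: keys use only 'a'-'z'
        self.pc = 0  # number of inserted keys passing through this node

def insert(root, key):
    node = root
    for ch in key:
        i = ord(ch) - ord('a')
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
        node.pc += 1

def shortest_unique_prefix(root, key):
    node = root
    for depth, ch in enumerate(key, start=1):
        node = node.children[ord(ch) - ord('a')]
        if node is None:
            return None          # key was never inserted
        if node.pc == 1:
            return key[:depth]   # only this key passes through here
    return None                  # every prefix is shared with another key

root = TrieNode()
for key in ["dog", "dogs", "duck"]:
    insert(root, key)

print(shortest_unique_prefix(root, "dog"))   # None
print(shortest_unique_prefix(root, "dogs"))  # dogs
print(shortest_unique_prefix(root, "duck"))  # du
```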
