<p>Strings are among the most important and common topics in programming problems. String processing also has many real-world applications, such as:</p>
<ul>
<li>Search Engines   </li>
<li>Genome Analysis   </li>
<li>Data Analytics  </li>
</ul>
<p>Any content presented to us in textual form can be viewed as a string.</p>
<p><strong>Tries:</strong> </p>
<p>A trie is a useful data structure built around the <em>prefixes of strings</em>. The name comes from the word “Re<strong>trie</strong>val”, since tries are used to retrieve data.</p>
<p><strong>What is a prefix?</strong></p>
<p>A prefix of a string $$S$$ consists of its first $$n$$ letters, where $$1 \le n \le |S|$$; it always begins at the first letter of the string. For example, the word “abacaba” has the following prefixes:</p>
<p>a<br />
ab<br />
aba<br />
abac<br />
abaca<br />
abacab<br />
abacaba</p>
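<p>The definition above can be checked with a short Python sketch. The helper name <code>prefixes</code> is hypothetical, and the sketch assumes the whole string counts as a prefix of itself (the case $$n = |S|$$).</p>

```python
def prefixes(s):
    """Return every non-empty prefix of s, shortest first."""
    # s[:n] takes the first n characters, for n = 1 .. len(s)
    return [s[:n] for n in range(1, len(s) + 1)]

print(prefixes("abacaba"))
# ['a', 'ab', 'aba', 'abac', 'abaca', 'abacab', 'abacaba']
```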
<p>A trie is a data structure for storing strings that can be visualized as a graph (more precisely, a tree) of nodes and edges. Each node has at most 26 children, and an edge connects each parent node to each of its children. The 26 child pointers correspond to the 26 letters of the English alphabet, and a separate edge is maintained for every letter.</p>
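<p>One way to picture such a node is a fixed array of 26 child slots. The sketch below is only an illustration: the names <code>TrieNode</code>, <code>is_end</code> and <code>child_index</code> are hypothetical, and it assumes the keys use lowercase letters 'a' to 'z' only.</p>

```python
class TrieNode:
    """One trie node with a fixed slot per lowercase letter (illustrative sketch)."""

    def __init__(self):
        # children[i] holds the child reached by the letter chr(ord('a') + i),
        # or None if there is no edge for that letter
        self.children = [None] * 26
        # True if some stored word ends exactly at this node
        self.is_end = False


def child_index(letter):
    """Map 'a'..'z' to the slot numbers 0..25."""
    return ord(letter) - ord('a')


root = TrieNode()
root.children[child_index('c')] = TrieNode()  # add the edge labelled 'c'
print(sum(child is not None for child in root.children))  # 1
```

<p>A dictionary per node, as in the Python code later in this section, avoids reserving 26 slots for nodes that have only a few children.</p>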
<p>Strings are stored top to bottom in a trie on the basis of their prefixes. All prefixes of length 1 are stored at level 1, all prefixes of length 2 are stored at level 2, and so on.</p>
<p>For example, consider the following diagram:
<img alt="enter image description here" src="https://he-s3.s3.amazonaws.com/media/uploads/fb14630.png" /></p>
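<p>The level structure can also be listed without building a trie at all. The sketch below groups the distinct prefixes of a few sample words by length; each distinct prefix corresponds to exactly one non-root node of the trie, at the level equal to its length. The function name <code>prefixes_by_level</code> and the sample words are assumptions chosen for illustration.</p>

```python
from collections import defaultdict


def prefixes_by_level(words):
    """Group the distinct non-empty prefixes of all words by their length."""
    levels = defaultdict(set)
    for w in words:
        for n in range(1, len(w) + 1):
            levels[n].add(w[:n])  # a prefix shared by several words is stored once
    return {n: sorted(p) for n, p in sorted(levels.items())}


for level, prefs in prefixes_by_level(["the", "there", "hello"]).items():
    print(level, prefs)
# 1 ['h', 't']
# 2 ['he', 'th']
# 3 ['hel', 'the']
# 4 ['hell', 'ther']
# 5 ['hello', 'there']
```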
<p>Why use a data structure such as a trie to process a single string? In practice, tries are generally used on groups of strings rather than on a single string. Given multiple strings, we can solve a variety of problems on them. For example, given an English dictionary and a single string $$s$$, find the longest prefix of $$s$$ that is also a prefix of some word in the dictionary. A naive approach would compare $$s$$ character by character against every word in the dictionary and record the maximum match length. This is expensive, because the time taken grows with the number of words in the dictionary. A trie solves this problem much more efficiently.</p>
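<p>For comparison, the naive approach described above might be sketched as follows. The function names and the sample dictionary are hypothetical; the point is only that every query compares $$s$$ against every word.</p>

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def longest_prefix_naive(words, s):
    """Length of the longest prefix of s that is also a prefix of some word."""
    # One comparison pass per dictionary word, for every query
    return max((common_prefix_len(w, s) for w in words), default=0)


words = ["hello", "there", "the"]
print(longest_prefix_naive(words, "theory"))  # 3
```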
<p>Before processing queries that ask for the length of the longest matching prefix, we first need to insert all the dictionary words into the trie. A trie has a special node called the root node. This node has no incoming edges; it has only outgoing edges, at most 26 of them, one for each letter of the alphabet.</p>
<p>The insertion of any string into a trie therefore starts from the root node. All prefixes of length 1 are direct children of the root node, and all prefixes of length 2 are children of the nodes at level 1.</p>
<p>A Python implementation of a trie-based map, including the insertion of a string, could look as follows:</p>

In [1]:
class TrieMap(object):
    """ Trie implementation of a map, associating keys (strings or other
        sequence types) with values.  Values can be any type. """
    
    def __init__(self, kvs):
        self.root = {}
        # For each key (string)/value pair
        for (k, v) in kvs: self.add(k, v)
    
    def add(self, k, v):
        """ Add a key-value pair """
        cur = self.root
        for c in k: # for each character in the string
            if c not in cur:
                cur[c] = {} # if not there, make new edge on character c
            cur = cur[c]
        # at the end of the path, store the value; for string keys the
        # multi-character key 'value' cannot clash with a one-character edge
        cur['value'] = v
    
    def query(self, k):
        """ Given key, return associated value or None """
        cur = self.root
        for c in k:
            if c not in cur:
                return None # key wasn't in the trie
            cur = cur[c]
        # get value, or None if there's no value associated with this node
        return cur.get('value')

In [2]:
mp = TrieMap([("hello", "value 1"), ("there", 2), ("the", "value 3")])
mp.query("hello")

'value 1'

In [3]:
mp.query("hello there") # returns None

In [4]:
mp.query("there")

2

In [5]:
mp.query("the")

'value 3'
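
The same dictionary-of-dictionaries layout can answer the longest-prefix question posed earlier. The sketch below is a minimal illustration under stated assumptions, not part of the `TrieMap` class: the names `build_trie` and `longest_prefix_len` are hypothetical, and no values are stored. It walks down from the root along the characters of the query and stops at the first missing edge; the depth reached is the length of the longest prefix shared with any inserted word.

```python
def build_trie(words):
    """Insert every word into a nested-dict trie and return the root."""
    root = {}
    for w in words:
        cur = root
        for c in w:
            # follow the edge for c, creating it if it does not exist yet
            cur = cur.setdefault(c, {})
    return root


def longest_prefix_len(root, s):
    """Length of the longest prefix of s that is a prefix of an inserted word."""
    cur, depth = root, 0
    for c in s:
        if c not in cur:
            break  # fell off the trie: no word continues with c
        cur = cur[c]
        depth += 1
    return depth


root = build_trie(["hello", "there", "the"])
print(longest_prefix_len(root, "theory"))  # 3
print(longest_prefix_len(root, "help"))    # 3
print(longest_prefix_len(root, "xyz"))     # 0
```

Each query takes at most one step per character of the query string, regardless of how many words were inserted.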

### Suffix Trie

In [6]:
class SuffixTrie(object):
    """ Encapsulates a suffix trie of a provided string t """
    
    def __init__(self, t):
        """ Make suffix trie from t """
        t += '$'  # terminator symbol
        self.root = {}
        for i in range(len(t)):  # for each suffix
            cur = self.root
            for c in t[i:]:  # for each character in i'th suffix
                if c not in cur:
                    cur[c] = {}  # add outgoing edge if necessary
                cur = cur[c]
    
    def follow_path(self, s):
        """ Follow path given by characters of s.  Return node at
            end of path, or None if we fall off. """
        cur = self.root
        for c in s:
            if c not in cur:
                return None  # no outgoing edge on next character
            cur = cur[c]  # descend one level
        return cur
    
    def has_substring(self, s):
        """ Return true if s appears as a substring of t """
        return self.follow_path(s) is not None
    
    def has_suffix(self, s):
        """ Return true if s is a suffix of t """
        node = self.follow_path(s)
        return node is not None and '$' in node
    
    def to_dot(self):
        """ Return dot representation of trie to make a picture """
        lines = []
        def _to_dot_helper(node, parid):
            childid = parid
            for c, child in node.items():
                lines.append('  %d -> %d [ label="%s" ];' % (parid, childid+1, c))
                childid = _to_dot_helper(child, childid+1)
            return childid
        lines.append('digraph "Suffix trie" {')
        lines.append('  node [shape=circle label=""];')
        _to_dot_helper(self.root, 0)
        lines.append('}')
        return '\n'.join(lines) + '\n'

In [7]:
strie = SuffixTrie('there would have been a time for such a word')

In [8]:
strie.has_substring('nope')

False

In [9]:
strie.has_substring('would have been')

True

In [10]:
strie.has_substring('such a word')

True

In [11]:
strie.has_suffix('would have been')

False
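
The role of the `$` terminator is easier to see on a short string. The sketch below re-implements the construction as two standalone functions (`build_suffix_trie` and `follow` are hypothetical names, and the text `abaaba` is an arbitrary example) so that it runs on its own. A path that exists in the trie spells a substring; it spells a suffix only if the node where it ends has an outgoing `$` edge.

```python
def build_suffix_trie(t):
    """Build a nested-dict suffix trie of t, with '$' appended as terminator."""
    t += '$'
    root = {}
    for i in range(len(t)):          # for each suffix t[i:]
        cur = root
        for c in t[i:]:
            cur = cur.setdefault(c, {})
    return root


def follow(root, s):
    """Return the node at the end of the path spelling s, or None."""
    cur = root
    for c in s:
        if c not in cur:
            return None
        cur = cur[c]
    return cur


root = build_suffix_trie("abaaba")

node = follow(root, "aba")
print(node is not None)  # True: "aba" is a substring
print('$' in node)       # True: "aba" is also a suffix

node = follow(root, "ab")
print(node is not None)  # True: "ab" is a substring
print('$' in node)       # False: "ab" is never followed by the end of the text
```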

In [12]:
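# %dotstr is not a built-in magic; it only works once an IPython extension
# that defines it has been loaded. Printing the DOT source needs no extension.
print(SuffixTrie("CAT").to_dot())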
