# Design Application - Spell Checker


## Algorithm Design Canvas

Every design question should be approached by following the predefined [Design Canvas](https://www.hiredintech.com/the-algorithm-design-canvas.pdf).

Before getting into the code:
1. Identify the constraints of the design by asking the interviewer clarifying questions
2. Come up with possible ideas
3. Identify time and space constraints
4. Identify test cases, including edge cases
5. Now get into the code


![image](https://user-images.githubusercontent.com/2688478/33750976-527a7a58-db8c-11e7-9b6f-41dfb727b7ee.png)



## Question 
Design a Spell Checker

Let's approach the Spell Checker problem by going over the design canvas steps.

### Constraints
1. The spell checker stores words encoded in UTF-8
2. It checks the validity of a word after the word has been typed
3. False positives are not allowed (a word that is not valid must never be identified as valid)  
4. Memory / space requirement / constraint:
5. Time requirement / constraint: O(k), where k is the length of the longest word in the dictionary


### Use Cases
1. The user types a word in a document and the spell checker checks whether the word is valid
2. The spell checker accepts a String as input and returns true if the word is valid, false otherwise.

### Possible Solutions
<span style="color:red">1. Hashtable </span>   
Store all words of the English dictionary in a hashtable. When a word is typed in the application, it is looked up in the hashtable: if it is found, the word is correct; otherwise it is incorrect.  
Once the word is completely typed, we compute its hash value using the hash function.  
The hash value selects a location in the hash table (any collisions are resolved), and if the word is found at that location it is considered correct.  
Search, Insert and Delete  
 - Best Case Time Taken = O(1) --> provided we have a uniform hash function
 - Worst Case Time Taken = O(N) --> when the hash function puts all N values into a single bucket and, with separate chaining, we end up searching the elements one by one.
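
The hashtable lookup described above can be sketched with Java's `java.util.HashSet`, which is backed by a hash table. This is only a minimal sketch: the class name `HashSpellChecker` and the three-word dictionary are assumptions made for illustration, not part of the original design.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class HashSpellChecker {
    // HashSet is backed by a hash table; contains() is O(1) on average.
    private final Set<String> words = new HashSet<>();

    HashSpellChecker(String[] dictionary) {
        for (String word : dictionary) {
            words.add(word);
        }
    }

    // A word is valid only if it is present in the dictionary.
    boolean isValid(String word) {
        return word != null && words.contains(word);
    }

    public static void main(String[] args) {
        // Assumed sample dictionary, not a real word list.
        HashSpellChecker sc = new HashSpellChecker(new String[] {"test", "this", "line"});
        System.out.println(sc.isValid("test"));    // true
        System.out.println(sc.isValid("missing")); // false
    }
}
```

Because membership is exact, this approach produces no false positives, which matches constraint 3.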

<span style="color:red">2. Tries </span>   
If a hashtable provides such good performance, why do we even talk about any other data structure?

That is because:
 - For the best performance we assume a uniform hash function, that is, one that distributes the input evenly across the buckets and so minimizes collisions.
 - In the worst case, the time taken is O(N), proportional to the number of stored words, and computing the hash always requires examining the entire key.
 - A hash table does not support **Ordered Operations** such as rank and floor.

So can we do better?  
The answer is yes, if we can avoid examining the entire key, as with string sorting.

![image](https://user-images.githubusercontent.com/2688478/33792683-3f3d468a-dc5b-11e7-8c2d-630623597a77.png)
Typically a search miss is faster than a search hit, because we may find out that the key does not exist before we have examined all the characters in the input.

We store each word in the English language in a **Ternary Search Trie**. When searching for a word, we look up each of its characters in the trie. If the entire word is found and marked as a complete word, its spelling is correct; otherwise it is incorrect.
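
To make the search-miss observation concrete, here is a minimal sketch that counts how many characters of the key a ternary search trie reads before the search ends. The class `TstProbe`, the method `charsExamined` and the sample words are hypothetical names chosen for this illustration.

```java
// Hypothetical class: a small TST that reports how much of a key it had to read.
class TstProbe {
    private static class Node {
        char c;                 // character stored in this node
        boolean isWord;         // true if a word ends here
        Node left, mid, right;  // smaller char, next char of the word, larger char
    }

    private Node root;

    void put(String key) {
        if (key == null || key.isEmpty()) return;
        root = put(root, key, 0);
    }

    private Node put(Node x, String key, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if      (c < x.c)              x.left   = put(x.left, key, d);
        else if (c > x.c)              x.right  = put(x.right, key, d);
        else if (d < key.length() - 1) x.mid    = put(x.mid, key, d + 1);
        else                           x.isWord = true;
        return x;
    }

    // Returns the number of characters of key that were read during the search.
    int charsExamined(String key) {
        if (key == null || key.isEmpty()) return 0;
        Node x = root;
        int d = 0;
        int examined = 0;
        while (x != null) {
            char c = key.charAt(d);
            examined = d + 1;   // characters 0..d have been read so far
            if      (c < x.c) x = x.left;
            else if (c > x.c) x = x.right;
            else if (d < key.length() - 1) { x = x.mid; d++; }
            else break;         // matched the last character of key
        }
        return examined;
    }

    public static void main(String[] args) {
        TstProbe trie = new TstProbe();
        for (String w : new String[] {"test", "this", "line"}) trie.put(w);
        System.out.println(trie.charsExamined("test"));    // hit: 4
        System.out.println(trie.charsExamined("missing")); // miss: 1
        System.out.println(trie.charsExamined("thus"));    // miss: 3
    }
}
```

With this sample dictionary the miss for "missing" is decided after its first character, whereas a hash-based lookup would have to hash the whole word first.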


### Test Cases
TODO


### Performance Improvement

In a regular R-way trie, each node stores an array of R references, one for every character that can appear in the trie.  
So for an 8-bit character set (extended ASCII, or UTF-8 handled byte by byte) each node needs an array of size 256; for 16-bit Unicode characters (UTF-16 code units) it would be 65,536. This results in a significant amount of wasted space.  
The solution is to use **Ternary Search Tries**, in which each node has only 3 links (left, middle and right).
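
The wasted space can be illustrated with a minimal R-way trie sketch that counts how many link slots are allocated and how many are actually used. The class `RWayTrieSpace` and its methods are hypothetical, and R = 256 is an assumption that every character fits in one byte.

```java
// Hypothetical class: an R-way trie that reports allocated versus used links.
class RWayTrieSpace {
    private static final int R = 256; // assumption: 8-bit alphabet

    private static class Node {
        boolean isWord;
        Node[] next = new Node[R];    // one slot per possible character, mostly null
    }

    private final Node root = new Node();
    private int nodeCount = 1;        // the root is always allocated

    void put(String key) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (c >= R) throw new IllegalArgumentException("character outside alphabet: " + c);
            if (x.next[c] == null) { x.next[c] = new Node(); nodeCount++; }
            x = x.next[c];
        }
        x.isWord = true;
    }

    boolean contains(String key) {
        Node x = root;
        for (int d = 0; d < key.length() && x != null; d++) {
            char c = key.charAt(d);
            if (c >= R) return false;
            x = x.next[c];
        }
        return x != null && x.isWord;
    }

    // Every node allocates R link slots.
    int totalLinks() { return nodeCount * R; }

    // Each node except the root is referenced by exactly one link.
    int usedLinks() { return nodeCount - 1; }

    public static void main(String[] args) {
        RWayTrieSpace trie = new RWayTrieSpace();
        for (String w : new String[] {"test", "this", "line"}) trie.put(w);
        System.out.println(trie.totalLinks()); // 3072
        System.out.println(trie.usedLinks());  // 11
    }
}
```

For these three words the trie allocates 12 nodes and 3072 link slots, of which only 11 are non-null. A TST node, by contrast, carries three links regardless of the alphabet size.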

### Advantages of TST over Hashing
- A TST is as fast as hashing and is space efficient. It is faster than hashing especially for search misses.
- Hashing requires examining the entire key. A TST does not need to examine the entire key for misses.
- Hashing performance relies heavily on the hash function.
- Hashing, unlike a TST, does not support ordered operations.
- In practice, a **Hybrid of R-Way tries and TST** is used: R-way branching on the first 2 characters (R² entries) is set up at the root, and each root entry then leads to a smaller TST for the rest of the word.

![image](https://user-images.githubusercontent.com/2688478/33792942-7e830bc0-dc62-11e7-927c-c101e5a975e8.png)
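
The hybrid layout can be sketched as an array of R² small TSTs indexed by the first two characters of the word. This is a minimal sketch under stated assumptions: the class `HybridTst` is hypothetical, R = 256 assumes the first two characters fit in one byte each, and one- and two-character words are tracked in separate flag arrays because they leave no suffix to store in a TST.

```java
// Hypothetical class: R^2-way branching at the root, then one TST per two-character prefix.
class HybridTst {
    private static final int R = 256; // assumption: 8-bit alphabet for the first two characters

    private static class Node {
        char c;
        boolean isWord;
        Node left, mid, right;
    }

    private final Node[] roots = new Node[R * R];              // TST holding the suffix after each prefix
    private final boolean[] oneCharWord = new boolean[R];      // words of length 1
    private final boolean[] twoCharWord = new boolean[R * R];  // words of length 2

    private static boolean prefixInAlphabet(String key) {
        return key.charAt(0) < R && (key.length() < 2 || key.charAt(1) < R);
    }

    void put(String key) {
        if (key == null || key.isEmpty()) return;
        if (!prefixInAlphabet(key)) throw new IllegalArgumentException("prefix outside alphabet");
        if (key.length() == 1) { oneCharWord[key.charAt(0)] = true; return; }
        int slot = key.charAt(0) * R + key.charAt(1);
        if (key.length() == 2) { twoCharWord[slot] = true; return; }
        roots[slot] = put(roots[slot], key, 2); // store characters from index 2 onward
    }

    private Node put(Node x, String key, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if      (c < x.c)              x.left   = put(x.left, key, d);
        else if (c > x.c)              x.right  = put(x.right, key, d);
        else if (d < key.length() - 1) x.mid    = put(x.mid, key, d + 1);
        else                           x.isWord = true;
        return x;
    }

    boolean contains(String key) {
        if (key == null || key.isEmpty() || !prefixInAlphabet(key)) return false;
        if (key.length() == 1) return oneCharWord[key.charAt(0)];
        int slot = key.charAt(0) * R + key.charAt(1);
        if (key.length() == 2) return twoCharWord[slot];
        Node x = roots[slot];
        int d = 2;
        while (x != null) {
            char c = key.charAt(d);
            if      (c < x.c) x = x.left;
            else if (c > x.c) x = x.right;
            else if (d < key.length() - 1) { x = x.mid; d++; }
            else return x.isWord;
        }
        return false;
    }
}
```

The first two characters are resolved with a single array index, so only the remaining characters are searched in a TST that is much smaller than one trie holding the whole dictionary.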

### TST disadvantages
- Only works with strings (or digital keys)

In [1]:
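public class SpellChecker {
    private Node root; // root of the ternary search trie

    // A TST node: one character, an end-of-word flag and three links.
    private class Node {
        private char c;                // character stored in this node
        private boolean val;           // true if a dictionary word ends here
        private Node left, mid, right; // smaller char, next char of the word, larger char
    }

    // Loads every dictionary word into the trie.
    public void loadTrie(String[] data) {
        for (String a : data)
            put(a, true);
    }

    public void put(String key, boolean val) {
        // Guard: charAt(0) in the recursive put would throw on an empty key.
        if (key == null || key.isEmpty()) return;
        root = put(root, key, val, 0);
    }

    // Inserts key, starting at character d, into the subtrie rooted at x.
    private Node put(Node x, String key, boolean val, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if      (c < x.c)              x.left  = put(x.left,  key, val, d);
        else if (c > x.c)              x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid   = put(x.mid,   key, val, d + 1);
        else                           x.val   = val;
        return x;
    }

    // Returns true if key is a word in the dictionary.
    public boolean isValid(String key) {
        return get(key);
    }

    public boolean get(String key) {
        // Guard: a null or empty key is never a valid word.
        if (key == null || key.isEmpty()) return false;
        Node x = get(root, key, 0);
        if (x == null) return false;
        return x.val;
    }

    // Returns the node matching the last character of key, or null on a search miss.
    private Node get(Node x, String key, int d) {
        if (x == null) return null;
        char c = key.charAt(d);
        if      (c < x.c)              return get(x.left,  key, d);
        else if (c > x.c)              return get(x.right, key, d);
        else if (d < key.length() - 1) return get(x.mid,   key, d + 1);
        else                           return x;
    }
}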


In [2]:
String[] a = {"test", "this", "line"}; 
SpellChecker sc = new SpellChecker();
sc.loadTrie(a);

System.out.println("is word \"test\" valid: " + sc.isValid("test"));
System.out.println("is word \"line\" valid: " + sc.isValid("line"));
System.out.println("is word \"missing\" valid: " + sc.isValid("missing"));

is word "test" valid: true
is word "line" valid: true
is word "missing" valid: false



## Performance
![image](https://user-images.githubusercontent.com/2688478/33792923-fa33cf1c-dc61-11e7-96e4-b5d3ac8d203b.png)