# Autocomplete

## The Autocomplete Problem
For example, when we type "how are" into Google, we immediately get 10 suggestions such as "how are you", "how are you in spanish", and "how are you doing".

One way to do this is to create a Trie-based map from strings to values:
* Value represents the priority - how important Google thinks that string is
* Can store billions of strings efficiently since they share nodes
* When a user types in a string "hello", we:
    * Call `keysWithPrefix("hello")`
    * Return the 10 strings with the highest value
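The steps above can be sketched in Java as follows. This is a minimal, naive sketch rather than a definitive implementation: the class name `TrieMap`, the `topMatches` helper, and the use of a `HashMap` for child links are assumptions made for illustration; only `keysWithPrefix` comes from the notes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a trie-based map from strings to integer priorities. */
class TrieMap {
    private static class Node {
        Integer value;                                // null means "not a key"
        Map<Character, Node> next = new HashMap<>();  // links to child nodes
    }

    private final Node root = new Node();

    /** Associates key with value, creating nodes along the path as needed. */
    void put(String key, int value) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.value = value;
    }

    /** Returns the value stored for key, or null if key is absent. */
    Integer get(String key) {
        Node node = find(key);
        return node == null ? null : node.value;
    }

    /** Returns every key that starts with prefix (in no particular order). */
    List<String> keysWithPrefix(String prefix) {
        List<String> results = new ArrayList<>();
        Node start = find(prefix);
        if (start != null) {
            collect(start, new StringBuilder(prefix), results);
        }
        return results;
    }

    /** Naive autocomplete: collect ALL matching keys, then keep the k best. */
    List<String> topMatches(String prefix, int k) {
        List<String> keys = keysWithPrefix(prefix);
        keys.sort((a, b) -> Integer.compare(get(b), get(a)));  // descending by value
        return new ArrayList<>(keys.subList(0, Math.min(k, keys.size())));
    }

    /** Walks down from the root along s; returns null if the path does not exist. */
    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.next.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    /** Depth-first traversal that records every key at or below node. */
    private void collect(Node node, StringBuilder path, List<String> results) {
        if (node.value != null) {
            results.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, results);
            path.deleteCharAt(path.length() - 1);
        }
    }
}
```

For instance, after `put("spit", 15)` and `put("spite", 20)`, calling `topMatches("sp", 1)` returns `[spite]`. Note that `topMatches` visits every key under the prefix before discarding all but `k` of them.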

## Autocomplete Example, for Top 3 Matches
Suppose we have 6 strings with values shown below:

* `buck` = 10
* `sad` = 12
* `smog` = 5
* `spit` = 15
* `spite` = 20
* `spy` = 7

![](images/top.png)

If the user types `s`, we:
* Call `keysWithPrefix("s")`, which returns `sad`, `smog`, `spit`, `spite`, `spy`
* Return the 3 keys with the greatest value: `spite` (20), `spit` (15), `sad` (12)

This approach has a major flaw: if we enter a short string, far too many keys have the matching prefix!
* The program may collect billions of results, only to keep 10 in the end. This is very inefficient!

## A More Efficient Autocomplete
One way to address this issue is to have each node store:
* Its own value, as well as
* The value of its best substring, i.e. the highest value of any key at or below that node

![](images/more.png)

For example, the root node has:
* `value = None`, indicating that this node is not a key
* `best = 20`, indicating that its best substring is the key `spite`

The `t` node has a value of `15`, representing the string `spit`, but its best substring is also `spite`, with a value of `20`.
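As a rough illustration of how `best` could be maintained, the sketch below updates it on every node along the insertion path. The names `BestTrie`, `Node`, `put`, and `find` are hypothetical, and the sketch assumes each key is inserted only once (overwriting a key with a lower value would leave stale `best` values behind).

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a trie whose nodes track the best value found at or below them. */
class BestTrie {
    static class Node {
        Integer value;                  // this node's own value; null means "not a key"
        int best = Integer.MIN_VALUE;   // highest value of any key at or below this node
        Map<Character, Node> next = new HashMap<>();
    }

    final Node root = new Node();

    /** Inserts key and raises "best" on every node along its path. */
    void put(String key, int value) {
        Node node = root;
        node.best = Math.max(node.best, value);
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
            node.best = Math.max(node.best, value);
        }
        node.value = value;
    }

    /** Returns the node reached by following prefix, or null if there is none. */
    Node find(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.next.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

After inserting the six example strings, `root.best` is 20, and the node returned by `find("spit")` has `value` 15 and `best` 20.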

The search considers nodes in order of "best". In this case, the program will consider `sp(20)` before `sm(5)`.
* The search stops when the top 3 matches are all better than the best remaining node

Details left as an exercise. Hint: Use a PQ! See Bear Maps gold points for more.
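As one possible reading of the hint, the sketch below runs a best-first search with `java.util.PriorityQueue`. It is not the reference solution: the names `BestFirstAutocomplete`, `Candidate`, and `topMatches` are hypothetical, and the stopping rule is phrased slightly differently. Here a finished key re-enters the queue with its own value, so a key is only output once nothing left in the queue can beat it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Sketch of autocomplete that explores nodes in decreasing order of "best". */
class BestFirstAutocomplete {
    private static class Node {
        Integer value;                  // null means "not a key"
        int best = Integer.MIN_VALUE;   // highest value at or below this node
        Map<Character, Node> next = new HashMap<>();
    }

    /** Queue entry: a finished key (node == null) or a subtree still to expand. */
    private static class Candidate {
        final String text;
        final Node node;
        final int priority;

        Candidate(String text, Node node, int priority) {
            this.text = text;
            this.node = node;
            this.priority = priority;
        }
    }

    private final Node root = new Node();

    /** Inserts key and raises "best" along its path (assumes one insert per key). */
    void put(String key, int value) {
        Node node = root;
        node.best = Math.max(node.best, value);
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
            node.best = Math.max(node.best, value);
        }
        node.value = value;
    }

    /** Returns up to k keys with the given prefix, highest value first. */
    List<String> topMatches(String prefix, int k) {
        List<String> results = new ArrayList<>();
        Node start = root;
        for (char c : prefix.toCharArray()) {
            start = start.next.get(c);
            if (start == null) {
                return results;  // no key has this prefix
            }
        }
        // Max-heap on priority: the most promising candidate comes out first.
        PriorityQueue<Candidate> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(b.priority, a.priority));
        pq.add(new Candidate(prefix, start, start.best));
        while (!pq.isEmpty() && results.size() < k) {
            Candidate c = pq.poll();
            if (c.node == null) {
                // A finished key at the front beats everything still unexplored.
                results.add(c.text);
                continue;
            }
            if (c.node.value != null) {
                pq.add(new Candidate(c.text, null, c.node.value));
            }
            for (Map.Entry<Character, Node> e : c.node.next.entrySet()) {
                Node child = e.getValue();
                pq.add(new Candidate(c.text + e.getKey(), child, child.best));
            }
        }
        return results;
    }
}
```

With the six example strings, `topMatches("s", 3)` returns `[spite, spit, sad]` without ever expanding the `sm` or `spy` branches, since their `best` values (5 and 7) never reach the front of the queue.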

## Even More Efficient Autocomplete

Looking at the previous Trie, the nodes b-u-c-k are redundant: there's really no need to keep them separate. The same goes for a-d, m-o-g. We can merge nodes that are redundant!
* This version of a trie is known as a "radix trie"

![](images/radix.png)
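To make the merging step concrete, here is a small sketch that converts an ordinary trie into a radix trie by absorbing chains of non-key nodes that have exactly one child. All names (`RadixSketch`, `RadixNode`, `compress`) are hypothetical, and a real radix trie would usually split and merge edges during insertion instead of compressing afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: build a radix trie by compressing an ordinary trie. */
class RadixSketch {
    /** Ordinary trie node: each link is labeled with one character. */
    static class Node {
        Integer value;  // null means "not a key"
        Map<Character, Node> next = new HashMap<>();
    }

    /** Radix trie node: each link is labeled with a whole string. */
    static class RadixNode {
        Integer value;  // null means "not a key"
        Map<String, RadixNode> next = new TreeMap<>();
    }

    /** Standard trie insertion, one node per character. */
    static void put(Node root, String key, int value) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.value = value;
    }

    /** Returns a radix trie holding the same keys and values as the given trie. */
    static RadixNode compress(Node node) {
        RadixNode result = new RadixNode();
        result.value = node.value;
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            StringBuilder label = new StringBuilder();
            label.append(e.getKey().charValue());
            Node child = e.getValue();
            // Absorb the child while it is not a key and has only one way forward.
            while (child.value == null && child.next.size() == 1) {
                Map.Entry<Character, Node> only = child.next.entrySet().iterator().next();
                label.append(only.getKey().charValue());
                child = only.getValue();
            }
            result.next.put(label.toString(), compress(child));
        }
        return result;
    }
}
```

For the six example strings, the compressed root has just two links, `buck` and `s`, and the `s` node has links `ad`, `mog`, and `p`.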

# Trie Summary

## Tries 

When the key is a string, we can use a Trie.
* Theoretically better performance than a hash table or search tree: operation runtime depends on the length of the key, not on the number of keys stored
* Have to decide on a mapping from letter to node. 3 natural choices:
    * `DataIndexedCharMap`
        * e.g. an array of all possible links
    * Bushy BST
    * Hash Table

All 3 choices are fine, though a hash table is probably the most natural.
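The three choices can be compared side by side with a small sketch. The interface `ChildLinks` and the classes `ArrayLinks`, `MapLinks`, and `LinkChoices` are hypothetical names introduced here; the array version assumes ASCII characters only, and `java.util.TreeMap` (a red-black tree) stands in for the bushy BST.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical interface: what a trie node needs from its letter-to-child mapping. */
interface ChildLinks<N> {
    N get(char c);
    void put(char c, N child);
}

/** An array of all possible links, in the spirit of DataIndexedCharMap (ASCII assumed). */
class ArrayLinks<N> implements ChildLinks<N> {
    private final Object[] links = new Object[128];  // one slot per ASCII character

    @SuppressWarnings("unchecked")
    public N get(char c) {
        return (N) links[c];
    }

    public void put(char c, N child) {
        links[c] = child;
    }
}

/** Backed by any java.util.Map: a TreeMap gives a balanced BST, a HashMap a hash table. */
class MapLinks<N> implements ChildLinks<N> {
    private final Map<Character, N> links;

    MapLinks(Map<Character, N> backing) {
        this.links = backing;
    }

    public N get(char c) {
        return links.get(c);
    }

    public void put(char c, N child) {
        links.put(c, child);
    }
}

/** Factory methods for the three choices. */
class LinkChoices {
    static <N> ChildLinks<N> array() {
        return new ArrayLinks<N>();
    }

    static <N> ChildLinks<N> bst() {
        return new MapLinks<N>(new TreeMap<Character, N>());
    }

    static <N> ChildLinks<N> hash() {
        return new MapLinks<N>(new HashMap<Character, N>());
    }
}
```

The array gives the fastest lookups but reserves 128 slots per node even when most are empty, while the two map-backed versions only pay for the links that actually exist.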

* Supports special string operations (e.g. `longestPrefixOf` and `keysWithPrefix`)
    * `keysWithPrefix` is the heart of important technology like autocomplete
    * Optimal implementation of autocomplete involves use of a priority queue
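As an illustration of the other string operation mentioned above, here is a minimal sketch of `longestPrefixOf` on a trie-based set. The class name `PrefixTrieSet` and the choice to return `null` when no key matches are assumptions, not part of the notes.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a trie-based set of strings that supports longestPrefixOf. */
class PrefixTrieSet {
    private static class Node {
        boolean isKey;  // true if the path to this node spells a stored key
        Map<Character, Node> next = new HashMap<>();
    }

    private final Node root = new Node();

    /** Adds key to the set, creating nodes along the path as needed. */
    void add(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.isKey = true;
    }

    /** Returns the longest stored key that is a prefix of query, or null if none is. */
    String longestPrefixOf(String query) {
        Node node = root;
        int longest = node.isKey ? 0 : -1;  // length of the longest key seen so far
        for (int i = 0; i < query.length(); i++) {
            node = node.next.get(query.charAt(i));
            if (node == null) {
                break;  // fell off the trie; no longer key can match
            }
            if (node.isKey) {
                longest = i + 1;
            }
        }
        return longest < 0 ? null : query.substring(0, longest);
    }
}
```

With `sad`, `spit`, and `spite` in the set, `longestPrefixOf("spiteful")` returns `spite` and `longestPrefixOf("spitting")` returns `spit`. The walk touches at most one node per character of the query, regardless of how many keys are stored.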

## Domain Specific Sets and Maps
More generally, we can sometimes take special advantage of our key type to improve sets and maps. For example:

* Tries handle String keys. This allows fast string-specific operations
* Note that there are many other types of string sets / maps out there:
    * Suffix Trees
    * Directed acyclic word graphs (DAWG)