# 12. Autocomplete and Suggestions

Autocomplete is a critical feature of a modern search experience. It predicts the rest of a word or sentence as the user types.

## Technique: Trie (Prefix Tree)
We will implement a **Trie** data structure to efficiently store the vocabulary and retrieve words sharing a common prefix.

In [1]:
import os
import glob
from collections import Counter

# Reuse vocabulary loader
def load_vocabulary(data_dir="../data"):
    """Count lowercased, whitespace-separated tokens across all .txt files in data_dir."""
    vocab = Counter()
    for filepath in glob.glob(os.path.join(data_dir, "*.txt")):
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read()
        tokens = text.lower().split()
        # Drop purely numeric tokens (str.isnumeric also matches Devanagari digits)
        valid_tokens = [t for t in tokens if not t.isnumeric()]
        vocab.update(valid_tokens)
    return vocab

vocab_counts = load_vocabulary()
print(f"Loaded {len(vocab_counts)} unique words.")

Loaded 457 unique words.


## 1. Trie Data Structure Implementation
Each node in the Trie, other than the root, represents a character. The path from the root to a node spells out a prefix, and a node flagged as end-of-word marks a complete word.
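
To make the path idea concrete, here is a minimal sketch that uses plain nested dictionaries instead of node objects. The `END` marker, the `build` and `walk` helpers, and the three-word vocabulary are assumptions made for illustration only; they are not part of the notebook's implementation.

```python
END = "$"  # assumed end-of-word marker; any key that is never a real character works

def build(words):
    """Build a nested-dict trie: each key is one character along a path."""
    root = {}
    for word in words:
        node = root
        for char in word:
            # Follow the edge for this character, creating it if missing.
            node = node.setdefault(char, {})
        node[END] = True  # a complete word ends at this node
    return root

def walk(root, prefix):
    """Return the node reached by following `prefix`, or None if the path breaks."""
    node = root
    for char in prefix:
        if char not in node:
            return None
        node = node[char]
    return node

root = build(["car", "cat", "do"])  # hypothetical mini-vocabulary
print(root)
print(walk(root, "ca") is not None)  # is "ca" a stored prefix?
print(END in walk(root, "ca"))       # is "ca" a complete word?
print(END in walk(root, "cat"))      # is "cat" a complete word?
```

Here "car" and "cat" share the path `c -> a`, so the common prefix is stored only once, and the two words differ only in their last edge.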

In [2]:
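class TrieNode:
    """One node of the Trie; the edge leading to it carries a single character."""
    def __init__(self):
        self.children = {}           # next character -> child TrieNode
        self.is_end_of_word = False  # True if a stored word ends at this node
        self.frequency = 0           # count of that word (0 if no word ends here)

class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root stands for the empty prefix

    def insert(self, word, frequency=1):
        """Store `word`; inserting the same word again overwrites its frequency."""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True
        node.frequency = frequency

    def search(self, word):
        """Return True only if `word` was inserted as a complete word."""
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def _dfs(self, node, prefix, results):
        """Collect (word, frequency) for every word in the subtree under `node`."""
        if node.is_end_of_word:
            results.append((prefix, node.frequency))

        for char, child_node in node.children.items():
            self._dfs(child_node, prefix + char, results)

    def starts_with(self, prefix):
        """Return all (word, frequency) pairs for words beginning with `prefix`."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]

        results = []
        self._dfs(node, prefix, results)
        return results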

## 2. Building the Autocomplete Index
We populate the Trie with our document vocabulary.

In [3]:
trie = Trie()

for word, freq in vocab_counts.items():
    trie.insert(word, freq)

print("Trie built successfully!")

Trie built successfully!


## 3. Testing Autocomplete
Let's simulate a user typing and show suggestions ranked by frequency.

In [4]:
def get_suggestions(prefix, top_k=5):
    # Get all words starting with prefix
    candidates = trie.starts_with(prefix)
    
    # Sort by frequency (descending)
    candidates.sort(key=lambda x: x[1], reverse=True)
    
    return candidates[:top_k]

# Interactive Demo
prefixes = ["ने", "का", "स", "रा"]

print(f"{'Prefix':<10} | {'Suggestions (Word, Freq)'}")
print("-" * 50)

for p in prefixes:
    suggs = get_suggestions(p)
    print(f"{p:<10} | {suggs}")

Prefix     | Suggestions (Word, Freq)
--------------------------------------------------
ने         | [('नेपालको', 15), ('नेपालमा', 15), ('नेपाल', 9), ('नेपाली', 9), ('नेपालका', 1)]
का         | [('काठमाडौं', 2), ('काठमाडौंमा', 1), ('काम', 1)]
स          | [('स्वास्थ्य', 6), ('स्थानीय', 3), ('सबैभन्दा', 3), ('सरकारले', 3), ('सेवा', 3)]
रा         | [('राष्ट्रिय', 5), ('राजमार्ग', 4), ('राष्ट्र', 1), ('राष्ट्रभाषा', 1), ('राष्ट्रपति', 1)]
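
Sorting every candidate is fine for a vocabulary of this size, but only the first `top_k` entries are ever shown. One possible alternative, sketched below, selects them with the standard library's `heapq.nlargest`, which avoids a full sort. The helper name `top_k_suggestions` is hypothetical, and the sample candidates are copied from the "रा" row above in an arbitrary order.

```python
import heapq

# (word, frequency) pairs in the shape returned by starts_with();
# values copied from the "रा" row of the output, deliberately unsorted.
candidates = [
    ("राष्ट्र", 1),
    ("राजमार्ग", 4),
    ("राष्ट्रपति", 1),
    ("राष्ट्रिय", 5),
    ("राष्ट्रभाषा", 1),
]

def top_k_suggestions(candidates, top_k=5):
    """Hypothetical helper: pick the top_k most frequent pairs without a full sort."""
    return heapq.nlargest(top_k, candidates, key=lambda pair: pair[1])

best_two = top_k_suggestions(candidates, top_k=2)
print(best_two)  # [('राष्ट्रिय', 5), ('राजमार्ग', 4)]
```

For a few hundred words the difference is negligible; the gain would only matter when a short prefix matches a very large number of words.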


## 4. Next Steps
This basic Trie works well for prefixes. For more advanced features, we could:
- Implement a **Ternary Search Tree** for memory efficiency.
- Add **Fuzzy Matching** to handle typos in prefixes (connecting with the previous module).
- Cache top suggestions at each node for faster retrieval.
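
As a rough illustration of the last idea, the sketch below keeps a small list of the best `(word, frequency)` pairs on every node, so a lookup only walks the prefix and never traverses the subtree. The names `CachedTrieNode`, `CachedTrie`, and `suggest` are hypothetical, the sample words are made up, and the sketch assumes each word is inserted exactly once (as when looping over a `Counter`); re-inserting a word with a lower frequency is not handled.

```python
class CachedTrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # cached (word, frequency) pairs, most frequent first

class CachedTrie:
    def __init__(self, k=5):
        self.root = CachedTrieNode()
        self.k = k  # how many suggestions each node remembers

    def _update(self, node, word, frequency):
        # Add the new pair, re-rank, and keep only the best k.
        entries = node.top + [(word, frequency)]
        entries.sort(key=lambda pair: pair[1], reverse=True)
        node.top = entries[: self.k]

    def insert(self, word, frequency=1):
        node = self.root
        self._update(node, word, frequency)  # root caches the empty prefix
        for char in word:
            node = node.children.setdefault(char, CachedTrieNode())
            self._update(node, word, frequency)

    def suggest(self, prefix):
        # Walk the prefix only; the ranked answer is already stored on the node.
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        return list(node.top)

cached_trie = CachedTrie(k=2)
# Hypothetical sample data with distinct frequencies.
for word, freq in [("nepal", 9), ("nepali", 7), ("nepalko", 15), ("news", 3)]:
    cached_trie.insert(word, freq)

print(cached_trie.suggest("ne"))   # [('nepalko', 15), ('nepal', 9)]
print(cached_trie.suggest("new"))  # [('news', 3)]
```

The trade-off is extra memory on every node and more work per insert, in exchange for lookups whose cost depends only on the prefix length.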