**Assignment No. 7:** Write a better auto-complete algorithm using an N-gram model (similar models are used for
machine translation, determining the author of a text, and speech recognition).


**STEPS:**
1. Import Libraries – Import re and random, plus defaultdict and Counter from collections.

2. Initialize Model – Define an NgramModel class that stores the n-gram size n and a table of next-word counts.

3. Preprocess Text – Convert the text to lowercase, remove punctuation, and split it into words.

4. Train Model – For each position in the text, take the preceding (N-1) words as the prefix and count how often each word follows it.

5. Predict Next Word – Look up the last (N-1) words of the input and suggest the words that most often followed them in training.

6. Test with Sample Input – Train on a demo corpus and predict the next word for a sample context.
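
To make steps 3 and 4 concrete, here is a minimal standalone sketch that preprocesses one short text and prints the resulting prefix table. The sample sentence and the variable names (`text`, `counts`) are assumptions chosen for illustration, not part of the assignment.

```python
import re
from collections import defaultdict, Counter

text = "The cat sat. The cat ran!"   # hypothetical sample text
n = 3                                # trigram model: 2-word prefix -> next word

# Step 3: lowercase, strip punctuation, split on whitespace
words = re.sub(r'[^a-z0-9\s]', '', text.lower()).split()
# words is now ['the', 'cat', 'sat', 'the', 'cat', 'ran']

# Step 4: slide a window of n words; the first n-1 words form the prefix
counts = defaultdict(Counter)
for i in range(len(words) - n + 1):
    prefix = tuple(words[i:i + n - 1])
    counts[prefix][words[i + n - 1]] += 1

for prefix, followers in counts.items():
    print(prefix, dict(followers))
# ('the', 'cat') {'sat': 1, 'ran': 1}
# ('cat', 'sat') {'the': 1}
# ('sat', 'the') {'cat': 1}
```

The prefix ('the', 'cat') is followed once by 'sat' and once by 'ran', so both would be offered as suggestions for a context ending in "the cat".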

In [None]:
import re
import random
from collections import defaultdict, Counter

class NgramModel:
    def __init__(self, n=3):
        self.n = n  # N-gram size
        self.ngram_counts = defaultdict(Counter)

    def preprocess(self, text):
        text = text.lower()
        text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # Remove punctuation
        words = text.split()
        return words

    def train(self, corpus):
        words = self.preprocess(corpus)
        for i in range(len(words) - self.n + 1):
            prefix = tuple(words[i:i+self.n-1])  # (N-1)-gram as key
            next_word = words[i+self.n-1]  # The next word
            self.ngram_counts[prefix][next_word] += 1

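    def predict(self, context, top_k=3):
        words = self.preprocess(context)
        k = self.n - 1  # Prefix length
        # Guard k == 0 (n=1): words[-0:] would return the whole list, not an empty prefix
        context = tuple(words[-k:]) if k > 0 else ()
        if context in self.ngram_counts:
            suggestions = self.ngram_counts[context].most_common(top_k)
            return [word for word, _ in suggestions]
        return []  # Unseen context: no suggestions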

# Example corpus
demo_corpus = "The quick brown fox jumps over the lazy dog. The quick brown cat sleeps on the mat."

# Train the model
model = NgramModel(n=3)
model.train(demo_corpus)

# Test prediction
context = "The quick brown"
predictions = model.predict(context)
print(f"Predictions for '{context}': {predictions}")


Predictions for 'The quick brown': ['fox', 'cat']
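
The model above returns an empty list whenever the last (N-1) words were never seen in training. One possible refinement is a simple backoff: if the full prefix is unknown, retry with a shorter one, down to plain word frequencies. The sketch below is an assumption about how that could look, not part of the original solution; the helper names `tokenize`, `train_all_orders`, and `predict_with_backoff` are hypothetical.

```python
import re
from collections import defaultdict, Counter

def tokenize(text):
    # Same preprocessing idea as above: lowercase, drop punctuation, split
    return re.sub(r'[^a-z0-9\s]', '', text.lower()).split()

def train_all_orders(corpus, max_n=3):
    """Count next words for every prefix length from 0 to max_n - 1."""
    words = tokenize(corpus)
    counts = defaultdict(Counter)
    for k in range(max_n):                    # k = prefix length
        for i in range(len(words) - k):
            counts[tuple(words[i:i + k])][words[i + k]] += 1
    return counts

def predict_with_backoff(counts, context, max_n=3, top_k=3):
    """Try the longest usable prefix first, then drop the oldest word and retry."""
    words = tokenize(context)
    for k in range(min(max_n - 1, len(words)), -1, -1):
        prefix = tuple(words[len(words) - k:])   # k == 0 gives the empty prefix ()
        if prefix in counts:
            return [w for w, _ in counts[prefix].most_common(top_k)]
    return []

demo_corpus = ("The quick brown fox jumps over the lazy dog. "
               "The quick brown cat sleeps on the mat.")
counts = train_all_orders(demo_corpus, max_n=3)

# ('sleepy', 'brown') is unseen, so the lookup backs off to ('brown',)
print(predict_with_backoff(counts, "a sleepy brown"))   # ['fox', 'cat']

# 'purple' is unseen at every length, so the result is the overall
# most frequent words, led by 'the'
print(predict_with_backoff(counts, "purple"))
```

This version only picks the longest prefix that has any counts. It does not weight or smooth probabilities across prefix lengths, which a fuller backoff scheme would do.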
