<p style="font-family:Roboto; font-size: 26px; color: magenta"> 1.2 - N-Gram Language Modelling with NLTK</p>

<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> N-gram Language Model</p>

<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> An N-gram language model estimates the probability of a word given the N-1 words that precede it, using counts of N-grams (contiguous sequences of N words) observed in a corpus.</p>
<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> A well-trained N-gram model can therefore suggest a plausible next word in a sentence.</p>
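
<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> To make the term concrete, here is a minimal sketch that extracts bigrams and trigrams from a toy sentence. The helper name make_ngrams and the sentence are illustrative assumptions; the helper is a plain-Python stand-in for NLTK's bigrams and trigrams functions so that it runs without any downloads.</p>

```python
def make_ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple (hypothetical helper)."""
    # zip over n shifted copies of the token list; zip stops at the shortest one
    return list(zip(*(tokens[i:] for i in range(n))))

# Toy sentence, split on whitespace for simplicity (an assumption, not the Reuters corpus)
tokens = "the stock market fell sharply".split()

bi_grams = make_ngrams(tokens, 2)
tri_grams = make_ngrams(tokens, 3)

print(bi_grams)
print(tri_grams)
```

<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> A sentence of five tokens yields four bigrams and three trigrams, since each window slides forward by one token.</p>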

<p style="font-family:Roboto; font-size: 26px; color: magenta"> Implementing N-Gram Language Modelling in NLTK</p>

In [1]:
# Import necessary libraries
import nltk
from nltk import bigrams, trigrams
from nltk.corpus import reuters
from collections import defaultdict

# Download necessary NLTK resources (newer NLTK releases may need 'punkt_tab' instead of 'punkt')
nltk.download('reuters')
nltk.download('punkt')

[nltk_data] Downloading package reuters to
[nltk_data]     C:\Users\38067\AppData\Roaming\nltk_data...
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\38067\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [2]:
# Tokenize the text (reuters.words() is already tokenized, so this is a second tokenization pass)
words = nltk.word_tokenize(' '.join(reuters.words()))

# Create trigrams
tri_grams = list(trigrams(words))

# Initialize the trigram model as nested counts: model[(w1, w2)][w3]
model = defaultdict(lambda: defaultdict(int))

In [3]:
# Count how often each word w3 follows the pair (w1, w2)
for w1, w2, w3 in tri_grams:
    model[(w1, w2)][w3] += 1

# Transform the counts into probabilities
for w1_w2 in model:
    total_count = float(sum(model[w1_w2].values()))
    for w3 in model[w1_w2]:
        model[w1_w2][w3] /= total_count
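
<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> The two loops above amount to the estimate P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2, any word). The sketch below repeats the same steps on a small made-up token list so the numbers can be checked by hand. The names toy_tokens, counts and toy_model are assumptions for illustration and are separate from the Reuters model.</p>

```python
from collections import Counter, defaultdict

# Made-up token list (not from the Reuters corpus), small enough to verify by hand
toy_tokens = (
    "the stock market rose and the stock market fell and the stock price rose"
).split()

# Step 1: count how often each word follows each two-word context
counts = defaultdict(Counter)
for w1, w2, w3 in zip(toy_tokens, toy_tokens[1:], toy_tokens[2:]):
    counts[(w1, w2)][w3] += 1

# Step 2: divide each count by the total for its context to get probabilities
toy_model = {
    context: {w3: c / sum(followers.values()) for w3, c in followers.items()}
    for context, followers in counts.items()
}

print(toy_model[("the", "stock")])
```

<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> The context ("the", "stock") occurs three times, followed twice by "market" and once by "price", so the estimated probabilities are 2/3 and 1/3.</p>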

In [4]:
# Function to predict the next word
def predict_next_word(w1, w2):
    """
    Predict the next word from the previous two words using the trigram model.

    Args:
        w1 (str): The first word of the context.
        w2 (str): The second word of the context.

    Returns:
        str: The most probable next word, or "No prediction available"
        if the pair (w1, w2) never occurred in the corpus.
    """
    # Use .get() so an unseen context does not insert an empty entry into the defaultdict
    candidates = model.get((w1, w2))
    if candidates:
        # Choose the word with the highest conditional probability
        return max(candidates, key=candidates.get)
    return "No prediction available"

# Example usage
print("Next Word:", predict_next_word('the', 'stock'))

Next Word: of
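
<p style="font-family:Consolas; font-size: 18px; color: lightgreen"> Returning only the single most likely word hides how close the alternatives are. As a hedged sketch, the hypothetical helper top_candidates below ranks the k most probable continuations for a context. The probabilities in toy_model are invented for illustration and are not taken from the Reuters model, although the helper should also accept that model, since it only relies on dictionary lookups.</p>

```python
def top_candidates(model, w1, w2, k=3):
    """Return up to k (word, probability) pairs for the context, most probable first."""
    # .get() with a default keeps unseen contexts from raising or being inserted
    followers = model.get((w1, w2), {})
    ranked = sorted(followers.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Invented probabilities for a single context (assumption, for illustration only)
toy_model = {("the", "stock"): {"market": 0.5, "exchange": 0.3, "price": 0.2}}

ranked = top_candidates(toy_model, "the", "stock", k=2)
unseen = top_candidates(toy_model, "no", "context")

print(ranked)
print(unseen)
```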
