### *Step 1: Imports*

*Import the modules and functions used throughout the notebook. `data_utils` contains the preprocessing functions, `model` defines our neural network, `train_utils` handles the training loop, `test_utils` handles evaluation on the test set, and `predict` provides next-word prediction.*


In [1]:
from pathlib import Path
from data_utils import load_and_tokenize, build_vocab, prepare_data
from model import FeedforwardNN
from train_utils import train_model
from test_utils import test_model 
from predict import predict_next_word

### *Step 2: Load Dataset*

*Load the training, validation, and test splits of the Penn Treebank dataset from the `ptbdataset/` directory using our `load_and_tokenize` function.*

In [2]:
train_sentences = load_and_tokenize("ptbdataset/ptb.train.txt")
val_sentences   = load_and_tokenize("ptbdataset/ptb.valid.txt")
test_sentences  = load_and_tokenize("ptbdataset/ptb.test.txt")

### *Step 3: Build Vocabulary and Prepare Data*

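*Create word-to-index and index-to-word mappings using `build_vocab`. Then, `prepare_data` converts the tokenized sentences into numerical sequences for model training: inputs are sequences of word indices, and outputs are the next words. The vocabulary is built from the training sentences only, and the `max_seq_len` returned for the training set is passed as `max_len` for the validation and test sets so that all inputs share the same length.*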

In [3]:
word_to_index, index_to_word = build_vocab(train_sentences)
train_inputs, train_outputs, max_seq_len = prepare_data(train_sentences, word_to_index)
val_inputs, val_outputs, _ = prepare_data(val_sentences, word_to_index, max_len=max_seq_len)
test_inputs, test_outputs, _ = prepare_data(test_sentences, word_to_index, max_len=max_seq_len)
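
*The implementations inside `data_utils` are not shown in this notebook. As a rough illustration of what such helpers might do, here is a minimal sketch. Everything in it is an assumption: the `_sketch` function names, the `<pad>` and `<unk>` special tokens, left padding, and the choice of the final word as the prediction target are hypothetical and may differ from the real `build_vocab` and `prepare_data`.*

```python
def build_vocab_sketch(sentences):
    """Map each distinct token to an integer id (hypothetical layout)."""
    word_to_index = {"<pad>": 0, "<unk>": 1}  # assumed special tokens
    for sentence in sentences:
        for word in sentence:
            if word not in word_to_index:
                word_to_index[word] = len(word_to_index)
    index_to_word = {i: w for w, i in word_to_index.items()}
    return word_to_index, index_to_word


def prepare_data_sketch(sentences, word_to_index, max_len=None):
    """Turn token lists into fixed-length (input ids, next-word id) pairs."""
    unk, pad = word_to_index["<unk>"], word_to_index["<pad>"]
    # Unknown words fall back to <unk>; sentences shorter than 2 tokens are skipped.
    encoded = [[word_to_index.get(w, unk) for w in s] for s in sentences if len(s) >= 2]
    if max_len is None:
        max_len = max(len(ids) for ids in encoded)
    inputs, outputs = [], []
    for ids in encoded:
        ids = ids[-max_len:]                         # truncate overly long sentences
        padded = [pad] * (max_len - len(ids)) + ids  # left-pad to max_len
        inputs.append(padded[:-1])                   # context: first max_len - 1 ids
        outputs.append(padded[-1])                   # target: the final word
    return inputs, outputs, max_len


# Toy usage with two already-tokenized sentences
toy = [["the", "cat", "sat"], ["a", "dog", "ran", "home"]]
w2i, i2w = build_vocab_sketch(toy)
X, y, toy_max_len = prepare_data_sketch(toy, w2i)
print(X, y, toy_max_len)
```

*Under these assumptions every input has length `max_len - 1`, which matches the `seq_len = max_seq_len - 1` used when the model is configured.*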

### *Step 4: Initialize Model*

*Set the hyperparameters of the network: the input length (`seq_len`, one less than the maximum sequence length), the embedding dimension (50), two hidden layers (128 and 64 neurons), and an output size equal to the vocabulary size. The `FeedforwardNN` instance itself is created at the start of Step 5.*

In [4]:
seq_len = max_seq_len - 1
embedding_dim = 50
hidden_size1, hidden_size2 = 128, 64
vocab_size = len(word_to_index)
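
*To get a feel for how these hyperparameters translate into model size, the sketch below counts trainable parameters. It assumes a layout that the notebook does not confirm: an embedding table, a flattened input of `seq_len * embedding_dim` values, and three fully connected layers with biases. The values `seq_len=20` and `vocab_size=10000` are illustrative placeholders, not the values computed above.*

```python
def count_parameters(seq_len, embedding_dim, hidden1, hidden2, vocab_size):
    """Count weights and biases for an assumed embedding + 3-layer MLP layout."""
    embedding = vocab_size * embedding_dim                  # one vector per word
    fc1 = (seq_len * embedding_dim) * hidden1 + hidden1     # flattened input -> hidden1
    fc2 = hidden1 * hidden2 + hidden2                       # hidden1 -> hidden2
    output = hidden2 * vocab_size + vocab_size              # hidden2 -> vocabulary scores
    return {
        "embedding": embedding,
        "fc1": fc1,
        "fc2": fc2,
        "output": output,
        "total": embedding + fc1 + fc2 + output,
    }


# Illustrative values only (seq_len and vocab_size are placeholders)
sizes = count_parameters(seq_len=20, embedding_dim=50,
                         hidden1=128, hidden2=64, vocab_size=10000)
for name, n in sizes.items():
    print(f"{name:>9}: {n:,}")
```

*With a layout like this, the embedding table and the output layer both scale with the vocabulary size, so they account for most of the parameters.*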

### *Step 5: Train the Model*

*Create the `FeedforwardNN` instance, then train it with `train_model`, which handles batching, loss computation, backpropagation, and validation. For simplicity, we predict only the first next word in each sequence.*


In [5]:
# Instantiate the model with the hyperparameters defined in Step 4
model = FeedforwardNN(seq_len=seq_len,
                      embedding_dim=embedding_dim,
                      hidden1=hidden_size1,
                      hidden2=hidden_size2,
                      vocab_size=vocab_size)

In [6]:
train_model(model, train_inputs, train_outputs, val_inputs, val_outputs)

Epoch 1/5, Loss: 4.9836
Validation Loss: 3.6094
Epoch 2/5, Loss: 2.4653
Validation Loss: 2.6687
Epoch 3/5, Loss: 1.1740
Validation Loss: 2.2921
Epoch 4/5, Loss: 0.4830
Validation Loss: 2.2266
Epoch 5/5, Loss: 0.1631
Validation Loss: 2.4360


*Observations from the Loss Values*

*Training Loss Trend*

*Epoch 1: 4.9836 → Epoch 5: 0.1631*

*The training loss drops sharply in every epoch, which shows that the model fits the training data very closely.*

*Validation Loss Trend*

*Starts at 3.6094 (Epoch 1), reaches its minimum of 2.2266 at Epoch 4, and ends at 2.4360 (Epoch 5).*

*The validation loss decreases for the first four epochs but rises in the last one while the training loss keeps falling. This widening gap suggests that the model is starting to overfit the training data.*
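
*One common response to this pattern is early stopping. The sketch below is not part of `train_model`; the helper names and the `patience` parameter are hypothetical. It only replays the validation losses logged above to show which epoch a simple rule would pick.*

```python
# Validation losses copied from the training log above
val_losses = [3.6094, 2.6687, 2.2921, 2.2266, 2.4360]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


def early_stop_epoch(losses, patience=1):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the loss has failed to improve on the best
    value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


print(best_epoch(val_losses))                    # (4, 2.2266)
print(early_stop_epoch(val_losses, patience=1))  # 5
```

*With these five values, the lowest validation loss is at Epoch 4, so a checkpoint saved at that point would be the one to keep under this rule.*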


### *Step 6: Save the Trained Model*

In [7]:
import torch
torch.save(model.state_dict(), "model.pth")

### *Step 7: Evaluate the Model on the Test Set*

In [8]:
# After training the model
test_model(model, test_inputs, test_outputs)

Test Loss: 1.8733


*Test Loss Observation*

*Test Loss: 1.8733*

*The test loss (1.8733) is lower than every validation loss recorded during training (the lowest was 2.2266 at Epoch 4), which suggests that the model generalizes reasonably well to unseen data. It is still far above the final training loss (0.1631), which points to some overfitting.*
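
*Language-model losses are often easier to interpret as perplexity. The sketch below assumes that the reported values are mean cross-entropy losses measured in nats, which is the default behaviour of PyTorch's `nn.CrossEntropyLoss`. The notebook does not show which loss `train_utils` and `test_utils` use, so treat this as an assumption.*

```python
import math


def perplexity(mean_cross_entropy):
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    return math.exp(mean_cross_entropy)


# Loss values copied from the outputs above
losses = {
    "train (epoch 5)": 0.1631,
    "validation (epoch 5)": 2.4360,
    "test": 1.8733,
}

for name, loss in losses.items():
    print(f"{name:>20}: loss={loss:.4f}  perplexity={perplexity(loss):.2f}")
```

*Under that assumption, the test loss corresponds to a perplexity of roughly 6.5, meaning the model is about as uncertain as if it were choosing uniformly among six or seven words.*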


### *Step 8: Make Predictions / Inference*

In [11]:
# Example: use the first training input sequence as the seed
seed_seq = train_inputs[0]
next_word = predict_next_word(model, seed_seq, word_to_index, index_to_word)
print("Predicted next word:", next_word)

Predicted next word: banknote
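
*The internals of `predict_next_word` are not shown here. A common way to turn a model's output scores into a word is to apply a softmax and take the highest-probability index. The sketch below illustrates that idea on made-up scores; the function name `top_k_words`, the toy vocabulary, and the score values are all hypothetical.*

```python
import math


def top_k_words(scores, index_to_word, k=3):
    """Return the k highest-scoring words with their softmax probabilities."""
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(index_to_word[i], probs[i]) for i in ranked]


# Made-up vocabulary and output scores, for illustration only
toy_index_to_word = {0: "<pad>", 1: "<unk>", 2: "bank", 3: "market", 4: "stock"}
toy_scores = [-2.0, 0.1, 1.5, 2.3, 0.7]

top = top_k_words(toy_scores, toy_index_to_word, k=2)
for word, prob in top:
    print(f"{word}: {prob:.3f}")
```

*Looking at the top few candidates rather than only the single best one can help judge how confident a prediction such as `banknote` really is.*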
