### *Step 1: Imports*

*Import necessary modules and functions. `data_utils` contains preprocessing functions, `model` defines our neural network, and `train_utils` handles the training loop.*


In [46]:
from pathlib import Path
from data_utils import load_and_tokenize, build_vocab, prepare_data
from model import FeedforwardNN
from train_utils import train_model

### *Step 2: Load Dataset*

*Load the training and validation sentences from the Penn Treebank dataset files using our `load_and_tokenize` function.*

In [47]:
train_sentences = load_and_tokenize("ptbdataset/ptb.train.txt")
val_sentences   = load_and_tokenize("ptbdataset/ptb.valid.txt")
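
*The body of `load_and_tokenize` is not shown in this notebook. As a rough sketch only, assuming each line of the file holds one sentence with tokens separated by whitespace, a loader of this kind might look like the following. The name `load_and_tokenize_sketch` and the sample file are hypothetical and stand in for the real function and data.*

```python
import tempfile
from pathlib import Path
from typing import List


def load_and_tokenize_sketch(path: str) -> List[List[str]]:
    """Hypothetical stand-in for load_and_tokenize: one sentence per line."""
    sentences = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            tokens = line.split()  # split on any run of whitespace
            if tokens:             # skip blank lines
                sentences.append(tokens)
    return sentences


# Try the sketch on a tiny made-up file rather than the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text(" the cat sat \n\n a dog barked \n", encoding="utf-8")
    sample = load_and_tokenize_sketch(str(sample_path))

print(sample)
```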

### *Step 3: Build Vocabulary and Prepare Data*

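*Create word-to-index and index-to-word mappings from the training sentences using `build_vocab`. Then `prepare_data` converts the tokenized sentences into numerical sequences for model training. The inputs are sequences of word indices, and the outputs are the words that follow them. The validation set reuses the training vocabulary and `max_seq_len`, so both splits have the same sequence length.*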

In [48]:
word_to_index, index_to_word = build_vocab(train_sentences)
train_inputs, train_outputs, max_seq_len = prepare_data(train_sentences, word_to_index)
val_inputs, val_outputs, _ = prepare_data(val_sentences, word_to_index, max_len=max_seq_len)
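
*The internals of `build_vocab` and `prepare_data` live in `data_utils` and are not shown here. The sketch below is one plausible reading of the description above, not the actual implementation: it assumes padding and unknown-word tokens (`<pad>`, `<unk>`, both hypothetical) and assumes the outputs are the inputs shifted by one position, which would explain why the model later uses `max_seq_len - 1` as its sequence length.*

```python
from typing import Dict, List, Optional, Tuple

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab_sketch(sentences: List[List[str]]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Assign each distinct word an integer index in order of first appearance."""
    word_to_index = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for word in sentence:
            if word not in word_to_index:
                word_to_index[word] = len(word_to_index)
    index_to_word = {i: w for w, i in word_to_index.items()}
    return word_to_index, index_to_word


def prepare_data_sketch(
    sentences: List[List[str]],
    word_to_index: Dict[str, int],
    max_len: Optional[int] = None,
) -> Tuple[List[List[int]], List[List[int]], int]:
    """Map words to indices, pad or truncate to max_len, then shift by one."""
    unk, pad = word_to_index[UNK], word_to_index[PAD]
    ids = [[word_to_index.get(w, unk) for w in s] for s in sentences]
    if max_len is None:
        max_len = max(len(s) for s in ids)
    padded = [(s + [pad] * max_len)[:max_len] for s in ids]
    inputs = [s[:-1] for s in padded]   # every position except the last
    outputs = [s[1:] for s in padded]   # the word that follows each input position
    return inputs, outputs, max_len


toy_train = [["the", "cat", "sat"], ["a", "dog"]]
toy_val = [["the", "bird"]]  # "bird" is unseen, so it maps to <unk>

w2i, i2w = build_vocab_sketch(toy_train)
tr_in, tr_out, toy_max_len = prepare_data_sketch(toy_train, w2i)
va_in, va_out, _ = prepare_data_sketch(toy_val, w2i, max_len=toy_max_len)
print(tr_in, tr_out, va_in, va_out)
```

*Under these assumptions, each input row has length `toy_max_len - 1`, mirroring `seq_len = max_seq_len - 1` in the next step.*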

### *Step 4: Initialize Model*

*Set the hyperparameters of the neural network: an input size equal to the sequence length, an embedding dimension of 50, two hidden layers (128 and 64 neurons), and an output size equal to the vocabulary size. The `FeedforwardNN` instance itself is created in the next step.*

In [49]:
seq_len = max_seq_len - 1
embedding_dim = 50
hidden_size1, hidden_size2 = 128, 64
vocab_size = len(word_to_index)
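
*To get a feel for how these hyperparameters translate into model size, the sketch below counts parameters for one assumed layout: an embedding table, the embeddings of all input positions flattened into one vector, two fully connected layers, and an output layer over the vocabulary. The actual `FeedforwardNN` is defined in `model` and may differ. The sequence length of 20 and vocabulary size of 10,000 are hypothetical example values, not the ones computed above.*

```python
def count_parameters(seq_len, embedding_dim, hidden1, hidden2, vocab_size):
    """Parameter count for an assumed embedding -> flatten -> 2 hidden -> output layout."""
    embedding = vocab_size * embedding_dim                   # one vector per word
    fc1 = (seq_len * embedding_dim) * hidden1 + hidden1      # weights + biases
    fc2 = hidden1 * hidden2 + hidden2
    out = hidden2 * vocab_size + vocab_size
    return {
        "embedding": embedding,
        "fc1": fc1,
        "fc2": fc2,
        "out": out,
        "total": embedding + fc1 + fc2 + out,
    }


# Hypothetical example values for seq_len and vocab_size.
example_counts = count_parameters(
    seq_len=20, embedding_dim=50, hidden1=128, hidden2=64, vocab_size=10_000
)
print(example_counts)
```

*In this assumed layout, the embedding table and the output layer dominate the total because both scale with the vocabulary size.*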

### *Step 5: Train the Model*

*Create the `FeedforwardNN` instance, then train it using `train_model`, which handles batching, loss computation, backpropagation, and validation. For simplicity, the model predicts only the first of the next words in each sequence.*


In [50]:
# Instantiate the model with the hyperparameters defined in Step 4
model = FeedforwardNN(seq_len=seq_len,
                      embedding_dim=embedding_dim,
                      hidden1=hidden_size1,
                      hidden2=hidden_size2,
                      vocab_size=vocab_size)

In [52]:
train_model(model, train_inputs, train_outputs, val_inputs, val_outputs)

Epoch 1/5, Loss: 5.0188
Validation Loss: 3.6656
Epoch 2/5, Loss: 2.4526
Validation Loss: 2.6854
Epoch 3/5, Loss: 1.1483
Validation Loss: 2.3516
Epoch 4/5, Loss: 0.4708
Validation Loss: 2.3418
Epoch 5/5, Loss: 0.1720
Validation Loss: 2.5293
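
*In the log above, the training loss falls every epoch, while the validation loss reaches its minimum at epoch 4 and rises at epoch 5. This pattern usually points to overfitting. The short sketch below copies the logged values by hand and finds the epoch with the lowest validation loss, which is the kind of check an early-stopping rule would apply. It is an illustration only and is not part of `train_model`.*

```python
# Loss values copied from the training log above.
train_losses = [5.0188, 2.4526, 1.1483, 0.4708, 0.1720]
val_losses = [3.6656, 2.6854, 2.3516, 2.3418, 2.5293]

# Epochs are 1-indexed in the log, so add 1 to the list position.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
best_val_loss = val_losses[best_epoch - 1]

# Gap between validation and training loss per epoch; a growing gap
# suggests the model is fitting the training set more than it generalizes.
gaps = [v - t for t, v in zip(train_losses, val_losses)]

print(f"Lowest validation loss {best_val_loss:.4f} at epoch {best_epoch}")
for epoch, gap in enumerate(gaps, start=1):
    print(f"Epoch {epoch}: validation - training = {gap:+.4f}")
```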
