# Model Evaluation Notebook

This notebook loads preprocessed text data and evaluates several machine learning models:
- Hidden Markov Model
- Naive Bayes
- Neural Network
- Bayesian Network
- Decision Tree

Each model is trained and evaluated with accuracy scores, classification reports, and confusion matrices.

## Setup and Imports

In [2]:
import os
import sys
# Add project root to sys.path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
sys.path.append(project_root)

import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from src.data.preprocess import DataPreprocessor
from src.models.train_model import ModelTrainer
from src.models.predict_model import ModelPredictor
from src.config import TEST_DIR  # explicit import; TEST_DIR is the only config name used below
from sklearn.decomposition import TruncatedSVD

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


## 1. Data Preparation

Load the data, clean it, split it into training and test sets (80/20), and vectorize the text features.

In [4]:
preprocessor = DataPreprocessor(TEST_DIR)
preprocessor.clean_data()
X_train, X_test, y_train, y_test = preprocessor.split_data(test_size=0.2)
(X_train_vec, X_test_vec), vectorizer = preprocessor.vectorize_text()
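
The internals of `DataPreprocessor.vectorize_text()` are not shown in this notebook. As a rough illustration of the usual pattern, and assuming a TF-IDF representation (an assumption, not confirmed by the source), the sketch below fits a vectorizer on toy training text only and then reuses it on toy test text. All `toy_*` names and sentences are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the cleaned training and test texts.
toy_train_texts = [
    "great product, works well",
    "terrible quality, broke fast",
    "it is okay, nothing special",
]
toy_test_texts = ["works great", "unknown words only here"]

# Fit the vocabulary and IDF weights on the training split only.
toy_vectorizer = TfidfVectorizer()
toy_train_vec = toy_vectorizer.fit_transform(toy_train_texts)

# Reuse the fitted vectorizer on the test split so no test information leaks
# into the features. Words unseen during fitting are simply dropped.
toy_test_vec = toy_vectorizer.transform(toy_test_texts)

print(toy_train_vec.shape, toy_test_vec.shape)
```

Both matrices share the same number of columns (the training vocabulary), which is what lets a model trained on one be applied to the other.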

## 2. Evaluation Helper

Define a function to evaluate model performance using standard metrics.

In [5]:
def evaluate_model(y_true, y_pred, model_name):
    print(f"\n=== {model_name} Performance ===")
    print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
    print("Classification Report:")
    print(classification_report(y_true, y_pred))
    print("Confusion Matrix:")
    print(confusion_matrix(y_true, y_pred))
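
To make the printed matrices easier to read, here is a small self-contained example on made-up labels (the `toy_*` names are hypothetical). It shows that scikit-learn's `confusion_matrix` puts true classes on the rows and predicted classes on the columns, and that accuracy equals the diagonal sum divided by the number of samples.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: one class-0 sample is wrongly predicted as class 2.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 2, 1, 1, 2, 2]

toy_cm = confusion_matrix(toy_true, toy_pred)
toy_acc = accuracy_score(toy_true, toy_pred)

# Row i, column j counts samples whose true class is i and predicted class is j.
print(toy_cm.tolist())    # [[1, 0, 1], [0, 2, 0], [0, 0, 2]]
print(round(toy_acc, 4))  # 0.8333
```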

## 3. Model Training and Evaluation

Initialize the trainer and predictor objects.

In [6]:
trainer = ModelTrainer()
predictor = ModelPredictor()

### 3.1 Hidden Markov Model

Apply dimensionality reduction (truncated SVD with 40 components) before training the HMM.

In [7]:
# Dimensionality reduction for HMM
svd = TruncatedSVD(n_components=40)
X_train_hmm = svd.fit_transform(X_train_vec)
X_test_hmm = svd.transform(X_test_vec)
hmm_model = trainer.train_hidden_markov_model(X_train_hmm, y_train, n_components=2)
hmm_pred = predictor.predict_hidden_markov_model(X_test_hmm, trainer)
evaluate_model(y_test, hmm_pred, "Hidden Markov Model")

Successfully trained HMM for sentiment 0 with 2400 samples
Successfully trained HMM for sentiment 1 with 2471 samples
Successfully trained HMM for sentiment 2 with 2399 samples
HMM models saved at: d:\Documents\CODE\HCMUT\Machine Learning Assignment\models\trained\hmm_models.pkl

=== Hidden Markov Model Performance ===
Accuracy: 0.3273
Classification Report:
              precision    recall  f1-score   support

           0       0.42      0.02      0.04       600
           1       0.00      0.00      0.00       618
           2       0.33      0.97      0.49       600

    accuracy                           0.33      1818
   macro avg       0.25      0.33      0.18      1818
weighted avg       0.25      0.33      0.18      1818

Confusion Matrix:
[[ 14   0 586]
 [  0   0 618]
 [ 19   0 581]]




The Hidden Markov Model (HMM) reaches only 32.7% accuracy, which is essentially chance level for three nearly balanced classes. The confusion matrix shows why: the model assigns almost every test sample to class 2 (1,785 of 1,818 predictions), so classes 0 and 1 are almost entirely misclassified. Class 1 is never predicted, which gives it 0% recall and an undefined precision (reported as 0.00). HMMs are designed for sequential data, so they may be a poor fit for the fixed-length feature vectors used here. The SVD step may also have discarded discriminative features, or the per-class models may not capture the structure of the text well.
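
The collapse onto a single class can be read directly from the confusion matrix. The short sketch below copies the matrix from the output above and derives the predicted-class counts and the overall accuracy; the variable names are illustrative only.

```python
# Confusion matrix copied from the HMM output above
# (rows = true class, columns = predicted class).
hmm_cm = [
    [14, 0, 586],
    [0, 0, 618],
    [19, 0, 581],
]

n_classes = len(hmm_cm)
total = sum(sum(row) for row in hmm_cm)

# Column sums: how many test samples were assigned to each class.
predicted_counts = [sum(row[j] for row in hmm_cm) for j in range(n_classes)]

# Diagonal sum over total: overall accuracy.
hmm_accuracy = sum(hmm_cm[i][i] for i in range(n_classes)) / total

# Share of all predictions that went to class 2.
share_class_2 = predicted_counts[2] / total

print(predicted_counts)         # [33, 0, 1785]
print(round(hmm_accuracy, 4))   # 0.3273
print(round(share_class_2, 3))  # 0.982
```

A classifier that always predicted class 2 would score 600 / 1818, or about 33%, so the HMM adds almost nothing over that trivial baseline.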

### 3.2 Naive Bayes Model

In [13]:
nb_model = trainer.train_naive_bayes(preprocessor=preprocessor)
# nb_pred = predictor.predict_naive_bayes(X_test_vec)
# evaluate_model(y_test, nb_pred, "Naive Bayes")

Starting Naive Bayes model training...
Class weights for balancing: {0: 1.0097222222222222, 1: 0.9807095642789694, 2: 1.0101431151868834}
Training Multinomial Naive Bayes model...
Best hyperparameters: {'alpha': 0.01, 'fit_prior': True}
Test Accuracy: 0.8405
              precision    recall  f1-score   support

           0       0.92      0.88      0.90       600
           1       0.73      0.93      0.82       618
           2       0.92      0.72      0.81       600

    accuracy                           0.84      1818
   macro avg       0.86      0.84      0.84      1818
weighted avg       0.86      0.84      0.84      1818

Prediction class distribution:
Class 0: 569 predictions
Class 1: 781 predictions
Class 2: 468 predictions
Naive Bayes model saved at: d:\Documents\CODE\HCMUT\Machine Learning Assignment\models\trained\naive_bayes_model.pkl


The Naive Bayes model reaches 84.05% accuracy. Precision is high for classes 0 and 2 (92% each) but lower for class 1 (73%), while recall shows the opposite pattern: 93% for class 1 against 88% and 72% for classes 0 and 2. In other words, the model over-predicts the neutral class (781 predictions for 618 true samples). Class 2 has the lowest recall, which suggests that many positive texts are labeled neutral. Naive Bayes assumes that features are independent given the class; this may not hold for text and could contribute to these errors.
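
The class weights printed in the training log are all close to 1, which fits the nearly balanced training set. They are consistent with the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, applied to the per-class training counts reported in the HMM log (2400, 2471, 2399). Whether `ModelTrainer` computes them exactly this way is an assumption; the sketch only shows that the formula reproduces the logged values.

```python
# Per-class training counts, taken from the HMM training log above.
train_counts = {0: 2400, 1: 2471, 2: 2399}

n_samples = sum(train_counts.values())
n_classes = len(train_counts)

# "Balanced" heuristic: rarer classes get weights above 1, and more
# frequent classes get weights below 1.
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in train_counts.items()
}

for label, weight in class_weights.items():
    print(label, round(weight, 6))
```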

### 3.3 Neural Network Model

In [12]:
nn_model = trainer.train_neural_network(batch_size=8)
# nn_pred = predictor.predict_neural_network()
# evaluate_model(y_test, nn_pred, "Neural Network")

Starting RNN model training...
Class weights for balancing: {0: 1.0097222222222222, 1: 0.9807095642789694, 2: 1.0101431151868834}
Epoch 1/20
727/727 ━━━━━━━━━━━━━━━━━━━━ 32s 31ms/step - accuracy: 0.5767 - loss: 0.8855 - val_accuracy: 0.8535 - val_loss: 0.4336 - learning_rate: 0.0010
Epoch 2/20
727/727 ━━━━━━━━━━━━━━━━━━━━ 20s 28ms/step - accuracy: 0.8627 - loss: 0.3978 - val_accuracy: 0.8638 - val_loss: 0.3760 - learning_rate: 0.0010
Epoch 3/20
727/727 ━━━━━━━━━━━━━━━━━━━━ 20s 28ms/step - accuracy: 0.9158 - loss: 0.2645 - val_accuracy: 0.8673 - val_loss: 0.3687 - learning_rate: 0.0010
Epoch 4/20
727/727 ━━━━━━━━━━━━━━━━━━━━ 20s 27ms/step - accuracy: 0.9384 - loss: 0.2150 - val_accuracy: 0.8631 - val_loss: 0.4282 - learning_rate: 0.0010
Epoch 5/20
727/727 ━━━━━━━━━━━━━━━━━━━━ 20s 28ms/step - accuracy: 0.9514 - loss: 0.1

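The Neural Network (RNN) achieved 85.37% accuracy, the highest of the models evaluated here. The training log above is cut off during epoch 5, so the final evaluation output and confusion matrix are not shown. According to that evaluation, the model performs well across all classes, with class 2 (positive) slightly weaker and some room for improvement on class 1 (neutral). RNNs process text as a sequence and can therefore use word order and context, which likely explains the higher accuracy. Note that validation loss rises from 0.3687 to 0.4282 between epochs 3 and 4 while training accuracy keeps climbing, an early sign of overfitting.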
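
The validation losses visible in the log can be used to illustrate a patience-based early-stopping rule. This is a minimal pure-Python sketch; `best_epoch` and `should_stop` are hypothetical helpers, and whether `train_neural_network` applies any such rule is not shown in the notebook. Keras offers an `EarlyStopping` callback for the same purpose.

```python
# Validation losses for epochs 1 to 4, copied from the training log above.
val_losses = [0.4336, 0.3760, 0.3687, 0.4282]


def best_epoch(losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(losses)), key=lambda i: losses[i]) + 1


def should_stop(losses, patience=1):
    """Return True if the loss has not improved for `patience` epochs."""
    epochs_since_best = len(losses) - best_epoch(losses)
    return epochs_since_best >= patience


print(best_epoch(val_losses))               # 3
print(should_stop(val_losses, patience=1))  # True
print(should_stop(val_losses, patience=3))  # False
```

With a patience of 1, training would stop after epoch 4 and the epoch-3 weights would be the ones to keep; a larger patience tolerates short-lived increases in validation loss.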

### 3.4 Bayesian Network Model

In [19]:
bayesian_model = trainer.train_bayesian_network(X_train, y_train)
bayesian_pred = predictor.predict_bayesian_network(X_test)
evaluate_model(y_test, bayesian_pred, "Bayesian Network")

  0%|          | 236/1000000 [07:29<528:30:25,  1.90s/it] 



=== Bayesian Network Performance ===
Accuracy: 0.7514
Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.73      0.83       600
           1       0.61      0.90      0.73       618
           2       0.83      0.62      0.71       600

    accuracy                           0.75      1818
   macro avg       0.80      0.75      0.75      1818
weighted avg       0.80      0.75      0.75      1818

Confusion Matrix:
[[439 134  27]
 [ 15 554  49]
 [  7 220 373]]


### 3.5 Decision Tree Model

In [16]:
dt_model = trainer.train_decision_tree(X_train_vec, y_train)
dt_pred = predictor.predict_decision_tree(X_test_vec, trainer)
evaluate_model(y_test, dt_pred, "Decision Tree")


=== Decision Tree Performance ===
Accuracy: 0.8179
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.83      0.88       600
           1       0.69      0.94      0.80       618
           2       0.90      0.68      0.78       600

    accuracy                           0.82      1818
   macro avg       0.85      0.82      0.82      1818
weighted avg       0.84      0.82      0.82      1818

Confusion Matrix:
[[500  76  24]
 [ 20 578  20]
 [ 10 181 409]]


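The Decision Tree model reaches 81.79% accuracy. It does best on class 0 (negative), with 94% precision. Class 1 (neutral) has the lowest precision (69%) but the highest recall (94%): the tree over-predicts neutral, and 181 of the 600 positive samples are labeled as class 1. The classes are nearly balanced, so this is unlikely to be an imbalance effect; overfitting to the high-dimensional text features is a more plausible cause. The model's interpretability remains a plus.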
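
The per-class figures in the report can be recomputed from the confusion matrix, which also shows where the class 1 false positives come from. The sketch below uses the matrix printed above; variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the Decision Tree output above
# (rows = true class, columns = predicted class).
dt_cm = np.array([
    [500, 76, 24],
    [20, 578, 20],
    [10, 181, 409],
])

true_pos = np.diag(dt_cm)
precision = true_pos / dt_cm.sum(axis=0)  # column sums = predicted counts
recall = true_pos / dt_cm.sum(axis=1)     # row sums = true counts

# Samples wrongly predicted as class 1, broken down by their true class.
fp_class_1 = {true_cls: int(dt_cm[true_cls, 1]) for true_cls in (0, 2)}

print(np.round(precision, 2).tolist())  # [0.94, 0.69, 0.9]
print(np.round(recall, 2).tolist())     # [0.83, 0.94, 0.68]
print(fp_class_1)                       # {0: 76, 2: 181}
```

Most of the precision loss for class 1 comes from positive samples (181) rather than negative ones (76).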

## 4. Conclusion

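This notebook trained and evaluated five classification models on the same text dataset. By accuracy, the Neural Network (RNN) ranks first (85.37%), followed by Naive Bayes (84.05%), Decision Tree (81.79%), Bayesian Network (75.14%), and the Hidden Markov Model (32.73%), which performs at chance level. Naive Bayes, the Bayesian Network, and the Decision Tree all show the same pattern: high recall but low precision for class 1 (neutral), meaning they over-predict the neutral class at the expense of classes 0 and 2.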
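
For a quick side-by-side comparison, the accuracies reported in the sections above can be collected and sorted. The values below are copied from this notebook; the RNN figure comes from the analysis text, since its evaluation output is truncated.

```python
# Accuracies as reported in this notebook.
accuracies = {
    "Hidden Markov Model": 0.3273,
    "Naive Bayes": 0.8405,
    "Neural Network (RNN)": 0.8537,  # from the analysis text, output truncated
    "Bayesian Network": 0.7514,
    "Decision Tree": 0.8179,
}

# Sort from best to worst accuracy.
ranking = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)

for name, acc in ranking:
    print(f"{name:<22} {acc:.4f}")
```

Accuracy alone hides per-class behavior, so the classification reports above remain the better basis for choosing a model.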