# SMS Spam Classification: Model Training & Evaluation

This notebook covers model training, scoring, evaluation, validation, hyperparameter tuning, and benchmarking for SMS spam classification.

## 1. Import Required Libraries
We begin by importing all necessary libraries for feature extraction, modeling, and evaluation.

In [8]:
# Import required libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
from sklearn.model_selection import GridSearchCV

## 2. Load Data Splits
Load the train, validation, and test CSV files for model training and evaluation.

In [9]:
# Load train, validation, and test splits
train_df = pd.read_csv('data/train.csv')
val_df = pd.read_csv('data/validation.csv')
test_df = pd.read_csv('data/test.csv')

print(f"Train shape: {train_df.shape}")
print(f"Validation shape: {val_df.shape}")
print(f"Test shape: {test_df.shape}")

Train shape: (4457, 3)
Validation shape: (557, 3)
Test shape: (558, 3)


In [11]:
# Drop rows whose 'text' is NaN before feature extraction
# (the vectorizer cannot process missing values; train goes from 4457 to 4456 rows)
train_df = train_df.dropna(subset=['text'])
val_df = val_df.dropna(subset=['text'])
test_df = test_df.dropna(subset=['text'])

## 3. Feature Extraction and Preprocessing
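Convert the SMS text into numerical features. This notebook uses TF-IDF (`TfidfVectorizer`); `CountVectorizer` is imported as an alternative but is not used below. The vectorizer is fit on the training split only and then applied unchanged to the validation and test splits.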

In [12]:
# Use TF-IDF for feature extraction
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_df['text'])
X_val = vectorizer.transform(val_df['text'])
X_test = vectorizer.transform(test_df['text'])
y_train = train_df['label_num']
y_val = val_df['label_num']
y_test = test_df['label_num']

print(f"TF-IDF feature matrix shape (train): {X_train.shape}")

TF-IDF feature matrix shape (train): (4456, 8356)
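
To make the difference between the two vectorizers concrete, here is a minimal sketch on three made-up messages (the `toy_texts` list is hypothetical and not taken from the dataset). It shows that both vectorizers build the same vocabulary, but TF-IDF down-weights a term that occurs in several messages relative to one that occurs in only one.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy messages, used only for illustration.
toy_texts = [
    "free prize call now",
    "call me now",
    "see you at lunch",
]

count_vec = CountVectorizer()
tfidf_vec = TfidfVectorizer()

X_count = count_vec.fit_transform(toy_texts)  # raw term counts per message
X_tfidf = tfidf_vec.fit_transform(toy_texts)  # counts reweighted by inverse document frequency

# vocabulary_ maps each term to its column index in the matrix.
i_free = tfidf_vec.vocabulary_["free"]  # appears in 1 of 3 messages
i_call = tfidf_vec.vocabulary_["call"]  # appears in 2 of 3 messages

print("Vocabulary:", sorted(count_vec.vocabulary_))
print("Counts in message 0  -> free:", X_count[0, count_vec.vocabulary_["free"]],
      "call:", X_count[0, count_vec.vocabulary_["call"]])
print("TF-IDF in message 0 -> free: %.3f call: %.3f" % (X_tfidf[0, i_free], X_tfidf[0, i_call]))
```

In the first message both words have a count of 1, yet "free" receives the larger TF-IDF weight because it is rarer across the toy corpus.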


## 4. Define and Fit Models on Training Data
We will define and train three benchmark models: Logistic Regression, Multinomial Naive Bayes, and Linear SVM.

In [13]:
# Define and fit models
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'MultinomialNB': MultinomialNB(),
    'LinearSVC': LinearSVC(random_state=42)
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} trained.")

Logistic Regression trained.
MultinomialNB trained.
LinearSVC trained.


## 5. Score Models on Given Data
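We define a helper, `score_model`, that reports accuracy and F1 for a trained model on a given dataset. Here it is applied to the train and validation sets; the test set is scored in Section 9.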

In [14]:
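# Function to score a fitted model on one dataset
def score_model(model, X, y, dataset_name):
    """Print and return (accuracy, F1) for `model` on (X, y); F1 is computed for class 1."""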
    y_pred = model.predict(X)
    acc = accuracy_score(y, y_pred)
    f1 = f1_score(y, y_pred)
    print(f"{model.__class__.__name__} on {dataset_name} - Accuracy: {acc:.4f}, F1: {f1:.4f}")
    return acc, f1

# Score all models on train and validation sets
for name, model in models.items():
    print(f"\n{name}:")
    score_model(model, X_train, y_train, 'Train')
    score_model(model, X_val, y_val, 'Validation')


Logistic Regression:
LogisticRegression on Train - Accuracy: 0.9711, F1: 0.8796
LogisticRegression on Validation - Accuracy: 0.9623, F1: 0.8346

MultinomialNB:
MultinomialNB on Train - Accuracy: 0.9681, F1: 0.8653
MultinomialNB on Validation - Accuracy: 0.9515, F1: 0.7769

LinearSVC:
LinearSVC on Train - Accuracy: 0.9993, F1: 0.9975
LinearSVC on Validation - Accuracy: 0.9820, F1: 0.9286
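
One quick way to read the scores above is to compare each model's train and validation F1. The sketch below simply copies the printed F1 values into a dictionary (the `scores` and `gaps` names are introduced here for illustration) and computes the difference.

```python
# F1 values copied from the score output above.
scores = {
    "LogisticRegression": {"train_f1": 0.8796, "val_f1": 0.8346},
    "MultinomialNB": {"train_f1": 0.8653, "val_f1": 0.7769},
    "LinearSVC": {"train_f1": 0.9975, "val_f1": 0.9286},
}

# Gap between train and validation F1 for each model.
gaps = {name: round(s["train_f1"] - s["val_f1"], 4) for name, s in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: train-validation F1 gap = {gap:.4f}")
```

A larger gap is one possible sign of overfitting, but it should be read together with the absolute validation score: LinearSVC has a larger gap than Logistic Regression yet still has the highest validation F1 of the three.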


## 6. Evaluate Model Predictions
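We evaluate each model's predictions on the validation set with a confusion matrix and the per-class precision, recall, and F1-score from `classification_report`. In scikit-learn's confusion matrix, rows are true labels and columns are predicted labels, so the bottom-left entry counts class-1 messages that were predicted as class 0.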

In [15]:
# Function to evaluate model predictions
def evaluate_model(model, X, y, dataset_name):
    y_pred = model.predict(X)
    print(f"\nEvaluation for {model.__class__.__name__} on {dataset_name}:")
    print(confusion_matrix(y, y_pred))
    print(classification_report(y, y_pred))

# Evaluate all models on validation set
for name, model in models.items():
    evaluate_model(model, X_val, y_val, 'Validation')


Evaluation for LogisticRegression on Validation:
[[483   0]
 [ 21  53]]
              precision    recall  f1-score   support

           0       0.96      1.00      0.98       483
           1       1.00      0.72      0.83        74

    accuracy                           0.96       557
   macro avg       0.98      0.86      0.91       557
weighted avg       0.96      0.96      0.96       557


Evaluation for MultinomialNB on Validation:
[[483   0]
 [ 27  47]]
              precision    recall  f1-score   support

           0       0.95      1.00      0.97       483
           1       1.00      0.64      0.78        74

    accuracy                           0.95       557
   macro avg       0.97      0.82      0.87       557
weighted avg       0.95      0.95      0.95       557


Evaluation for LinearSVC on Validation:
[[482   1]
 [  9  65]]
              precision    recall  f1-score   support

           0       0.98      1.00      0.99       483
           1       0.98      0.8
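
As a worked example, the class-1 metrics for Logistic Regression can be re-derived by hand from its validation confusion matrix shown above. This is a sketch for checking the arithmetic; it assumes class 1 is the spam class, which the notebook does not state explicitly.

```python
# Validation confusion matrix for LogisticRegression, copied from the output above.
# scikit-learn layout: rows are true labels, columns are predicted labels.
tn, fp = 483, 0   # true class 0: predicted 0, predicted 1
fn, tp = 21, 53   # true class 1: predicted 0, predicted 1

total = tn + fp + fn + tp

precision = tp / (tp + fp)                           # of predicted class 1, how many are correct
recall = tp / (tp + fn)                              # of true class 1, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / total

# Accuracy of a trivial classifier that predicts class 0 for every message.
baseline_accuracy = (tn + fp) / total

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"accuracy={accuracy:.4f} majority-class baseline={baseline_accuracy:.4f}")
```

The derived F1 (0.8346) and accuracy (0.9623) match the values printed by `score_model` in Section 5. The baseline shows why accuracy alone is a weak signal here: always predicting class 0 already scores about 0.87 on this split.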

## 7. Validation: Fit, Score, and Evaluate on Train and Validation Sets
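The models were already fit in Section 4, so no refitting happens here. We score and evaluate each model on both the train and validation sets side by side, which makes any gap between the two easy to spot.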

In [16]:
# Validation: Fit, Score, and Evaluate
for name, model in models.items():
    print(f"\n{name} - Train Set:")
    score_model(model, X_train, y_train, 'Train')
    evaluate_model(model, X_train, y_train, 'Train')
    print(f"\n{name} - Validation Set:")
    score_model(model, X_val, y_val, 'Validation')
    evaluate_model(model, X_val, y_val, 'Validation')


Logistic Regression - Train Set:
LogisticRegression on Train - Accuracy: 0.9711, F1: 0.8796

Evaluation for LogisticRegression on Train:
[[3856    2]
 [ 127  471]]
              precision    recall  f1-score   support

           0       0.97      1.00      0.98      3858
           1       1.00      0.79      0.88       598

    accuracy                           0.97      4456
   macro avg       0.98      0.89      0.93      4456
weighted avg       0.97      0.97      0.97      4456


Logistic Regression - Validation Set:
LogisticRegression on Validation - Accuracy: 0.9623, F1: 0.8346

Evaluation for LogisticRegression on Validation:
[[483   0]
 [ 21  53]]
              precision    recall  f1-score   support

           0       0.96      1.00      0.98       483
           1       1.00      0.72      0.83        74

    accuracy                           0.96       557
   macro avg       0.98      0.86      0.91       557
weighted avg       0.96      0.96      0.96       557


Mult

## 8. Hyperparameter Tuning
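We tune the regularization strength `C` of Logistic Regression with `GridSearchCV`, which runs 3-fold cross-validation on the training set and selects the value with the best mean F1. The refit best model is then evaluated on the validation set.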

In [17]:
# Example: Hyperparameter tuning for Logistic Regression
param_grid = {'C': [0.01, 0.1, 1, 10]}
gs = GridSearchCV(LogisticRegression(max_iter=1000, random_state=42), param_grid, cv=3, scoring='f1')
gs.fit(X_train, y_train)
print(f"Best params: {gs.best_params_}")
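# best_score_ is the mean cross-validated F1 over the 3 folds of the training split
print(f"Best F1 (train): {gs.best_score_:.4f}")
# Evaluate the refit best estimator on the validation set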
y_val_pred = gs.predict(X_val)
print(classification_report(y_val, y_val_pred))

Best params: {'C': 10}
Best F1 (train): 0.8772
              precision    recall  f1-score   support

           0       0.98      1.00      0.99       483
           1       1.00      0.88      0.94        74

    accuracy                           0.98       557
   macro avg       0.99      0.94      0.96       557
weighted avg       0.98      0.98      0.98       557
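
The cell above uses `cv=3`, so the validation split plays no part in choosing `C`. If the intent is to tune directly against the fixed validation split, one option is scikit-learn's `PredefinedSplit`. The sketch below uses a few made-up messages in place of `train_df` and `val_df` (all data and the `gs_val` name are hypothetical), and wraps the vectorizer in a `Pipeline` so it is fit only on the training rows during the search.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real train and validation splits.
train_texts = [
    "win a free prize now", "free cash call now", "claim your free prize",
    "are we still on for lunch", "see you at home tonight", "call me when you are home",
]
train_labels = [1, 1, 1, 0, 0, 0]
val_texts = ["free prize waiting call now", "lunch at home today"]
val_labels = [1, 0]

texts = train_texts + val_texts
labels = np.array(train_labels + val_labels)

# -1 marks rows that are only ever used for training; 0 marks the single validation fold.
test_fold = np.array([-1] * len(train_texts) + [0] * len(val_texts))
split = PredefinedSplit(test_fold)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000, random_state=42)),
])
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

# With the default refit=True, the best pipeline is refit on train + validation rows combined.
gs_val = GridSearchCV(pipe, param_grid, cv=split, scoring="f1")
gs_val.fit(texts, labels)

print(f"Best params: {gs_val.best_params_}")
print(f"Best F1 (validation fold): {gs_val.best_score_:.4f}")
```

With this setup `best_score_` is the F1 on the validation rows rather than a cross-validated average. The trade-off is that a single fixed split gives a noisier estimate than k-fold cross-validation.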



## 9. Benchmark and Select Best Model on Test Data
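We score all benchmark models, plus the tuned Logistic Regression, on the test set, compare their F1 scores, and select the best-performing model. Because the same test set is used both to choose the model and to report its score, the winning F1 is an optimistic estimate of performance on unseen data.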

In [18]:
# Score all models on test set and select the best
results = {}
for name, model in models.items():
    print(f"\n{name} - Test Set:")
    acc, f1 = score_model(model, X_test, y_test, 'Test')
    results[name] = {'accuracy': acc, 'f1': f1}
    evaluate_model(model, X_test, y_test, 'Test')

# Add tuned Logistic Regression if the grid search from Section 8 has been run
if 'gs' in locals():
    print("\nTuned Logistic Regression - Test Set:")
    y_test_pred = gs.predict(X_test)  # predict once and reuse the result
    acc = accuracy_score(y_test, y_test_pred)
    f1 = f1_score(y_test, y_test_pred)
    print(f"Accuracy: {acc:.4f}, F1: {f1:.4f}")
    print(classification_report(y_test, y_test_pred))
    results['Tuned Logistic Regression'] = {'accuracy': acc, 'f1': f1}

# Select best model
best_model = max(results, key=lambda k: results[k]['f1'])
print(f"\nBest model on test set: {best_model} (F1: {results[best_model]['f1']:.4f})")


Logistic Regression - Test Set:
LogisticRegression on Test - Accuracy: 0.9695, F1: 0.8722

Evaluation for LogisticRegression on Test:
[[482   0]
 [ 17  58]]
              precision    recall  f1-score   support

           0       0.97      1.00      0.98       482
           1       1.00      0.77      0.87        75

    accuracy                           0.97       557
   macro avg       0.98      0.89      0.93       557
weighted avg       0.97      0.97      0.97       557


MultinomialNB - Test Set:
MultinomialNB on Test - Accuracy: 0.9533, F1: 0.7903

Evaluation for MultinomialNB on Test:
[[482   0]
 [ 26  49]]
              precision    recall  f1-score   support

           0       0.95      1.00      0.97       482
           1       1.00      0.65      0.79        75

    accuracy                           0.95       557
   macro avg       0.97      0.83      0.88       557
weighted avg       0.96      0.95      0.95       557


LinearSVC - Test Set:
LinearSVC on Test - Acc
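
A common alternative to picking the winner on the test set is to select on validation F1 and then report the test score of that single model once. The sketch below illustrates the selection step only, using validation F1 values copied from earlier outputs in this notebook; the `val_f1` and `selected` names are introduced here for illustration.

```python
# Validation F1 scores copied from the outputs in Sections 5 and 8.
# The tuned Logistic Regression value is the rounded figure from its classification report.
val_f1 = {
    "Logistic Regression": 0.8346,
    "MultinomialNB": 0.7769,
    "LinearSVC": 0.9286,
    "Tuned Logistic Regression": 0.94,
}

# Choose the model with the highest validation F1; the test set is not consulted here.
selected = max(val_f1, key=val_f1.get)
print(f"Selected on validation: {selected} (F1: {val_f1[selected]:.4f})")
```

Only the selected model would then be scored on the test set, so the reported test F1 is not influenced by the selection itself.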

---

**End of Model Training & Evaluation Notebook**