# Imports & Notebook Setup

We import the packages used for modeling and evaluation in this notebook. The setup cell also wraps `plt.show` so that every figure is saved to the `results/figures/modeling & evaluation/` directory, and it creates the `results/metrics/` directory for exported metrics:

In [None]:
import os
import sys 
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    classification_report
)

# Allow imports from ../src directory
sys.path.append(os.path.abspath(".."))
from src.train_models import (
    build_logistic_regression_model,
    build_random_forest_model,
    build_xgboost
)
from src.preprocessing import (
    train_val_test_split
)
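
# Output directories for figures and metrics
FIG_DIR = "../results/figures/modeling & evaluation/"
METRICS_DIR = "../results/metrics/"
os.makedirs(FIG_DIR, exist_ok=True)
os.makedirs(METRICS_DIR, exist_ok=True)

# Capture the real plt.show only once. Re-running this cell must not wrap
# the wrapper, otherwise _save_and_show would end up calling itself.
if not getattr(plt.show, "_is_save_wrapper", False):
    _plt_original_show = plt.show
_plt_fig_counter = {"count": 0}

def _save_and_show(*args, **kwargs) -> None:
    """Save the current figure to FIG_DIR, then display it."""
    _plt_fig_counter["count"] += 1
    filename = os.path.join(FIG_DIR, f"figure_{_plt_fig_counter['count']:03d}.png")
    plt.savefig(filename, dpi=300, bbox_inches="tight")
    _plt_original_show(*args, **kwargs)

_save_and_show._is_save_wrapper = True
plt.show = _save_and_show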

Compilation complete
Compilation complete


- `pandas`: data wrangling and CSV loading
- `numpy`: array and numerical operations
- `matplotlib` & `seaborn`: visualizations and plots
- `os`, `sys`, `pathlib`: directory management and custom imports
- `sklearn`: train/test splitting and evaluation metrics
- `src.train_models`: builders for the Logistic Regression, Random Forest, and XGBoost models
- `src.preprocessing`: the `train_val_test_split` helper

# Feature and Target Separation

Load `ais_data_model_ready.csv` and separate the features (X) from the target (y).

In [3]:
model_df = pd.read_csv("../data/ais_data_model_ready.csv")
TARGET_COL = "navigationalstatus"
X = model_df.drop(columns=[TARGET_COL])
y = model_df[TARGET_COL]
print(f"Number of features: {X.shape[1]}")
print(f"Number of samples: {X.shape[0]}")
print(f"List of feature names: {X.columns.tolist()}")
print(f"Target variable: {y.name}")

Number of features: 7
Number of samples: 326066
List of feature names: ['sog', 'cog', 'heading', 'width', 'length', 'draught', 'shiptype']
Target variable: navigationalstatus


# Train / Validation / Test Split

We use a two-stage stratified split so that the class distribution is preserved in all three sets:

1. Stage 1 (80/20): 80% -> Train+Val, 20% -> Test
2. Stage 2 (80/20 of the 80%): 80% of 80% -> Train (64%), 20% of 80% -> Val (16%)

Final Distribution
- Training: 64%, Validation: 16%, Test: 20%

Function Parameters
- `test_size`: fraction of the full dataset reserved for final evaluation (20%, untouched during development)
- `val_size`: fraction of the full dataset used for hyperparameter tuning (16%)
- `random_state`: seed that makes the split reproducible
- Stratification: keeps the imbalanced class proportions the same in every split

The function returns 6 variables: `X_train, X_val, X_test, y_train, y_val, y_test`
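
The body of `train_val_test_split` lives in `src/preprocessing.py` and is not shown in this notebook. As a rough illustration of the two-stage idea described above, here is a minimal sketch built on scikit-learn's `train_test_split`. The function name `two_stage_split_sketch` and the toy data are hypothetical, and the real helper may round split sizes differently by a sample or so.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def two_stage_split_sketch(X, y, test_size=0.2, val_size=0.16, random_state=42):
    """Hypothetical stand-in for src.preprocessing.train_val_test_split."""
    # Stage 1: hold out the test set, stratifying on the target.
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    # Stage 2: val_size is a fraction of the full dataset, so rescale it to a
    # fraction of the remaining train+val portion (0.16 / 0.80 = 0.20).
    relative_val_size = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval,
        y_trainval,
        test_size=relative_val_size,
        stratify=y_trainval,
        random_state=random_state,
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Synthetic, imbalanced toy data (not the AIS dataset).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({"f1": rng.normal(size=1000), "f2": rng.normal(size=1000)})
y_demo = pd.Series(["major"] * 900 + ["minor"] * 100, name="label")

X_tr, X_va, X_te, y_tr, y_va, y_te = two_stage_split_sketch(X_demo, y_demo)
print(len(X_tr), len(X_va), len(X_te))
```

Passing `stratify` in both stages is what keeps the minority classes represented in each split; without it, a class that makes up well under 1% of the rows could be badly under-represented in the validation or test set.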

In [6]:
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(
    X,
    y,
    test_size=0.2,  # 20% reserved for final evaluation (untouched during development)
    val_size=0.16,  # 16% for hyperparameter tuning
    random_state=42
)
print("Train/val/test split complete!")

Train/val/test split complete!


In [7]:
# Print split information
print(f"\nTotal samples: {len(X)}")
print(f"\nTrain set:")
print(f"  Shape: {X_train.shape}")
print(f"  Percentage: {len(X_train)/len(X)*100:.1f}%")
print(f"\nValidation set:")
print(f"  Shape: {X_val.shape}")
print(f"  Percentage: {len(X_val)/len(X)*100:.1f}%")
print(f"\nTest set:")
print(f"  Shape: {X_test.shape}")
print(f"  Percentage: {len(X_test)/len(X)*100:.1f}%")


Total samples: 326066

Train set:
  Shape: (208682, 7)
  Percentage: 64.0%

Validation set:
  Shape: (52170, 7)
  Percentage: 16.0%

Test set:
  Shape: (65214, 7)
  Percentage: 20.0%


In [9]:
# Check train set class distribution
print("\nTrain set class distribution:")
print(y_train.value_counts(normalize=True).sort_index())


Train set class distribution:
navigationalstatus
At anchor                                                0.001509
Constrained by her draught                               0.037617
Engaged in fishing                                       0.015061
Moored                                                   0.010336
Power-driven vessel pushing ahead or towing alongside    0.000724
Power-driven vessel towing astern                        0.000757
Reserved for future amendment [HSC]                      0.005362
Restricted maneuverability                               0.004974
Under way sailing                                        0.004049
Under way using engine                                   0.917779
Unknown value                                            0.001831
Name: proportion, dtype: float64


In [10]:
# Check validation set class distribution
print("\nValidation set class distribution:")
print(y_val.value_counts(normalize=True).sort_index())


Validation set class distribution:
navigationalstatus
At anchor                                                0.001514
Constrained by her draught                               0.037627
Engaged in fishing                                       0.015066
Moored                                                   0.010332
Power-driven vessel pushing ahead or towing alongside    0.000728
Power-driven vessel towing astern                        0.000748
Reserved for future amendment [HSC]                      0.005348
Restricted maneuverability                               0.004965
Under way sailing                                        0.004064
Under way using engine                                   0.917788
Unknown value                                            0.001821
Name: proportion, dtype: float64


In [11]:
# Check test set class distribution
print("\nTest set class distribution:")
print(y_test.value_counts(normalize=True).sort_index())


Test set class distribution:
navigationalstatus
At anchor                                                0.001503
Constrained by her draught                               0.037615
Engaged in fishing                                       0.015058
Moored                                                   0.010351
Power-driven vessel pushing ahead or towing alongside    0.000721
Power-driven vessel towing astern                        0.000751
Reserved for future amendment [HSC]                      0.005367
Restricted maneuverability                               0.004984
Under way sailing                                        0.004048
Under way using engine                                   0.917778
Unknown value                                            0.001825
Name: proportion, dtype: float64
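
Comparing three separate printouts by eye is error-prone, so it can help to put the proportions side by side. The sketch below is one possible way to do that; `compare_class_distributions` is a hypothetical helper (not part of `src`), and the toy labels stand in for `y_train`, `y_val`, and `y_test`.

```python
import pandas as pd


def compare_class_distributions(**splits):
    """Return per-class proportions for each named split, side by side."""
    table = (
        pd.DataFrame(
            {name: labels.value_counts(normalize=True) for name, labels in splits.items()}
        )
        .fillna(0.0)  # a class missing from a split has proportion 0
        .sort_index()
    )
    # Largest gap between any two splits, per class.
    table["max_abs_diff"] = table.max(axis=1) - table.min(axis=1)
    return table


# Toy labels standing in for y_train, y_val, y_test.
y_train_demo = pd.Series(["a"] * 90 + ["b"] * 10)
y_val_demo = pd.Series(["a"] * 18 + ["b"] * 2)
y_test_demo = pd.Series(["a"] * 27 + ["b"] * 3)

dist = compare_class_distributions(
    train=y_train_demo, val=y_val_demo, test=y_test_demo
)
print(dist)
```

In this notebook the call would be `compare_class_distributions(train=y_train, val=y_val, test=y_test)`; a small `max_abs_diff` for every class indicates that stratification worked as intended.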


# Define Evaluation Helper

Define a helper that scores every model with the same set of metrics, so that results are directly comparable across models and data splits.

In [12]:
def evaluate_classifier(model, X, y, model_name, split_name):
    """
    Evaluate a fitted classifier and return a dictionary of metrics.
    
    Args:
        model: A fitted sklearn-style classifier with .predict() method
        X: Feature matrix (NumPy array or pandas DataFrame)
        y: True labels
        model_name: String identifier for the model (e.g., "Logistic Regression")
        split_name: String identifier for the data split (e.g., "val" or "test")
    
    Returns:
        dict: Dictionary containing all computed metrics, predictions, and metadata
    """
    
    # Get predictions
    y_pred = model.predict(X)
    
    # Compute metrics
    accuracy = accuracy_score(y, y_pred)
    macro_f1 = f1_score(y, y_pred, average="macro")
    weighted_f1 = f1_score(y, y_pred, average="weighted")
    precision_macro = precision_score(y, y_pred, average="macro")
    recall_macro = recall_score(y, y_pred, average="macro")
    
    # Confusion matrix and classification report
    cm = confusion_matrix(y, y_pred)
    cr = classification_report(y, y_pred)
    
    # Print concise summary
    print(
        f"{model_name} [{split_name}] | "
        f"Accuracy: {accuracy:.3f} | "
        f"Macro-F1: {macro_f1:.3f} | "
        f"Weighted-F1: {weighted_f1:.3f} | "
        f"Precision (Macro): {precision_macro:.3f} | "
        f"Recall (Macro): {recall_macro:.3f}"
    )
    
    # Return results dictionary
    return {
        "model_name": model_name,
        "split": split_name,
        "accuracy": accuracy,
        "macro_f1": macro_f1,
        "weighted_f1": weighted_f1,
        "precision_macro": precision_macro,
        "recall_macro": recall_macro,
        "confusion_matrix": cm,
        "classification_report": cr,
        "y_true": y,
        "y_pred": y_pred
    }

print("Evaluation function instantiated.")

Evaluation function instantiated.
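
Because roughly 92% of the samples belong to a single class, accuracy alone can look strong for a model that has learned nothing about the minority classes. The sketch below illustrates this with scikit-learn's `DummyClassifier` on synthetic labels; the class names and counts are made up for the example and are not taken from the AIS data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic labels with a 92% majority class (illustrative only).
y_demo = np.array(["engine"] * 920 + ["fishing"] * 50 + ["moored"] * 30)
X_demo = np.zeros((len(y_demo), 1))  # features are ignored by the baseline

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
y_pred = baseline.predict(X_demo)

accuracy = accuracy_score(y_demo, y_pred)
# zero_division=0 scores never-predicted classes as 0 without a warning.
macro_f1 = f1_score(y_demo, y_pred, average="macro", zero_division=0)
weighted_f1 = f1_score(y_demo, y_pred, average="weighted", zero_division=0)

print(f"Accuracy: {accuracy:.3f}")
print(f"Macro-F1: {macro_f1:.3f}")
print(f"Weighted-F1: {weighted_f1:.3f}")
```

The baseline's accuracy matches the majority-class share, while its macro-F1 is far lower because the two minority classes each contribute an F1 of zero. This is why `evaluate_classifier` reports macro-averaged metrics alongside accuracy.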


# Build and Train Models