# 14. Model Deployment (Saving the Model)

## Objective
Train the final, tuned XGBoost classifier and save it together with its preprocessing steps as a single pipeline, so that the same transformations are applied automatically at prediction time.

## Steps
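1.  **Load Data**: Read `trajectory_ml_ready_advanced.csv` and separate the features from the target.
2.  **Build Pipeline**: Wrap One-Hot Encoding in a `ColumnTransformer` (with `handle_unknown='ignore'`) so that categories not seen during training do not raise errors at prediction time, then chain SMOTE and the classifier after it.
3.  **Train**: Fit the pipeline on the full dataset using the best hyperparameters.
4.  **Save**: Export the fitted pipeline to disk using `joblib`.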

In [1]:
import pandas as pd
from xgboost import XGBClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
import joblib

print("Libraries Loaded.")

Libraries Loaded.


## 1. Load and Prepare Data

In [2]:
df = pd.read_csv('trajectory_ml_ready_advanced.csv')

# Define features and target: drop identifiers, metadata, and both target columns
# so that no label information leaks into X
drop_cols = ['UNITID', 'Institution_Name', 'Year', 'Target_Trajectory', 'Target_Label', 'State']
X = df.drop(columns=drop_cols)
y = df['Target_Label']

# Identify Categorical and Numerical Columns
categorical_cols = ['Division']
numerical_cols = [col for col in X.columns if col not in categorical_cols]

print(f"Categorical: {categorical_cols}")
print(f"Numerical: {len(numerical_cols)} features")

Categorical: ['Division']
Numerical: 20 features
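
Before training on this split, it can help to sanity-check the target column. The sketch below is not part of the original notebook: `check_labels` is a hypothetical helper, and the toy series stands in for `df['Target_Label']`. It assumes the labels are integer encoded as `0..K-1` (which recent `XGBClassifier` versions expect) and that every class has more samples than SMOTE's default `k_neighbors=5`.

```python
import pandas as pd


def check_labels(y: pd.Series, k_neighbors: int = 5) -> pd.Series:
    """Validate label encoding and class sizes; return per-class counts.

    Hypothetical helper for illustration, not part of the notebook.
    """
    classes = sorted(y.unique().tolist())
    expected = list(range(len(classes)))
    if classes != expected:
        raise ValueError(f"Expected labels {expected}, got {classes}")

    counts = y.value_counts().sort_index()
    # SMOTE looks up k_neighbors neighbours within a class, so each class
    # needs more than k_neighbors samples.
    too_small = counts[counts <= k_neighbors]
    if not too_small.empty:
        raise ValueError(f"Classes too small for SMOTE: {too_small.to_dict()}")
    return counts


# Toy stand-in for df['Target_Label'] (made-up values)
toy_y = pd.Series([0] * 12 + [1] * 8 + [2] * 6, name="Target_Label")
label_counts = check_labels(toy_y)
print(label_counts.to_dict())
```

In the notebook itself, the equivalent call would be `check_labels(y)` right after `y` is defined.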


## 2. Build the Pipeline

In [3]:
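# Preprocessing for categorical data: one-hot encode, and encode categories
# not seen during fit as all zeros instead of raising an error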
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_cols),
        ('num', 'passthrough', numerical_cols)
    ])

# Best hyperparameters (from the earlier tuning step)
best_params = {
    'n_estimators': 300,
    'max_depth': 3,
    'learning_rate': 0.05,
    'subsample': 0.7,
    'colsample_bytree': 0.9,
    'min_child_weight': 3,
    'use_label_encoder': False,  # ignored by recent XGBoost releases; triggers the "not used" warning during training
    'eval_metric': 'mlogloss',
    'random_state': 42
}

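# Create the full pipeline with SMOTE
# Note: scikit-learn's Pipeline does not accept resamplers, so imblearn's Pipeline is required.
# SMOTE runs only during fit; predict() skips it, so new data is never resampled.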
pipeline = ImbPipeline(steps=[
    ('preprocessor', preprocessor),
    ('smote', SMOTE(random_state=42)),
    ('classifier', XGBClassifier(**best_params))
])

print("Pipeline Created.")

Pipeline Created.
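
To illustrate what `handle_unknown='ignore'` does for new data, here is a minimal sketch on made-up data. The category values (`D1`, `D9`, ...) and the `score` column are assumptions for illustration only; the preprocessor has the same structure as the one in the cell above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy training data: three known categories (made-up values)
train = pd.DataFrame({"Division": ["D1", "D2", "D3"], "score": [1.0, 2.0, 3.0]})
# New data: "D9" never appeared during fit
new = pd.DataFrame({"Division": ["D2", "D9"], "score": [4.0, 5.0]})

demo_preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Division"]),
        ("num", "passthrough", ["score"]),
    ]
)
demo_preprocessor.fit(train)

encoded = demo_preprocessor.transform(new)
# ColumnTransformer may return a sparse matrix; convert it for display
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()

# The first three columns are the one-hot slots for D1, D2, D3;
# the unseen category "D9" gets zeros in all three.
print(encoded)
```

With the default `handle_unknown='error'`, the same `transform` call would raise instead, which is why the setting matters for a model that will score data collected later.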


## 3. Train and Save

In [4]:
# Train on the full dataset for the final model
pipeline.fit(X, y)
print("Model Trained on Full Dataset.")

# Save the entire fitted pipeline (preprocessing, SMOTE step, and classifier) as one file
model_filename = 'final_trajectory_model.joblib'
joblib.dump(pipeline, model_filename)
print(f"Model saved to {model_filename}")

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


Model Trained on Full Dataset.
Model saved to final_trajectory_model.joblib
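
For the "future use" side of this step, the sketch below shows a save, load, and predict round trip. So that it runs without the notebook's CSV, it uses a small stand-in pipeline: `LogisticRegression`, the toy columns, and the file name `demo_model.joblib` are all assumptions for illustration, not the notebook's model.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in data (made-up values)
train_X = pd.DataFrame({
    "Division": ["D1", "D2", "D1", "D2", "D1", "D2"],
    "score": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
})
train_y = [0, 1, 0, 1, 0, 1]

# Stand-in pipeline with the same preprocessing structure as the notebook's
stand_in = Pipeline(steps=[
    ("preprocessor", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Division"]),
        ("num", "passthrough", ["score"]),
    ])),
    ("classifier", LogisticRegression()),
])
stand_in.fit(train_X, train_y)

# Round trip: dump the fitted pipeline, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")
    joblib.dump(stand_in, path)
    loaded = joblib.load(path)

# Raw (un-encoded) rows go straight into the loaded pipeline;
# the preprocessing is applied internally, including the unseen "D9"
new_rows = pd.DataFrame({"Division": ["D2", "D9"], "score": [0.85, 0.15]})
predictions = loaded.predict(new_rows)
probabilities = loaded.predict_proba(new_rows)
print(predictions, probabilities.shape)
```

For the notebook's own artifact, the equivalent would be `joblib.load('final_trajectory_model.joblib')` followed by `predict` on a DataFrame with the same feature columns as `X`. Because the saved object contains the SMOTE step and the XGBoost classifier, `imblearn` and `xgboost` need to be importable in the loading environment, ideally in the same versions used for training.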
