# ParkSense: Model Training

**Goal**: Train a machine learning model to predict whether a parking bay will be free in the next 15 minutes.

**Approach**:
1.  **Input Features**: Location (`Latitude`, `Longitude`) and Time (`hour`, `day_of_week`, `is_weekend`).
2.  **Model**: Random Forest Classifier (non-linear, needs no feature scaling, captures interactions between location and time).
3.  **Target**: `is_free_15m` (1 = Free, 0 = Occupied).
4.  **Output**: A saved model file (`.joblib`) ready for the backend API.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score

# Settings
sns.set_style("whitegrid")

## 1. Load Data
Load the clean dataset we prepared in Notebook 01.

In [None]:
DATA_PATH = '../data/processed_parking_data.csv'
df = pd.read_csv(DATA_PATH)

# Features (X) and Target (y)
features = ['Latitude', 'Longitude', 'hour', 'day_of_week', 'is_weekend']
target = 'is_free_15m'

X = df[features]
y = df[target]

print(f"Features: {features}")
print(f"Target: {target}")
X.head()
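
Before splitting, it can be worth confirming that the expected columns are present and seeing how balanced the target is. The sketch below uses a small made-up frame in place of the real CSV; `check_dataset` is a hypothetical helper, not part of the project.

```python
import pandas as pd

FEATURES = ['Latitude', 'Longitude', 'hour', 'day_of_week', 'is_weekend']
TARGET = 'is_free_15m'

def check_dataset(df, features=FEATURES, target=TARGET):
    """Return basic health stats; raise KeyError if required columns are missing."""
    required = features + [target]
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    return {
        'rows': len(df),
        'null_counts': df[required].isna().sum().to_dict(),
        'free_rate': float(df[target].mean()),  # share of rows labelled 1 (free)
    }

# Toy stand-in for the processed CSV (all values are invented).
toy = pd.DataFrame({
    'Latitude': [10.01, 10.02, 10.01, 10.03],
    'Longitude': [20.01, 20.02, 20.03, 20.01],
    'hour': [8, 12, 18, 23],
    'day_of_week': [0, 2, 5, 6],
    'is_weekend': [0, 0, 1, 1],
    'is_free_15m': [0, 0, 1, 1],
})

report = check_dataset(toy)
print(report)
```

A `free_rate` far from 0.5 would suggest reading accuracy with care later on, since a model that always predicts the majority class would already score well.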

## 2. Split Data (Train vs Test)
We keep 20% of the data unseen to test how well the model generalizes.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")
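
If the target turns out to be imbalanced, one option is to pass `stratify=y` so both splits keep the same share of free bays. This is a sketch on synthetic data (the 20% free rate is an assumption chosen for illustration, not a property of the parking dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 5))        # 5 synthetic features
y_demo = np.array([1] * 200 + [0] * 800)   # assumed: 20% of rows are "free"

# stratify=y_demo preserves the class ratio in both train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Free rate (train): {y_tr.mean():.3f}")
print(f"Free rate (test):  {y_te.mean():.3f}")
```

A random split also ignores time order. If rows from the same bay and period land in both sets, the test score may look better than performance on genuinely future data.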

## 3. Train Model (Random Forest)
We use a Random Forest with 100 trees. It needs no feature scaling and makes a solid baseline for tabular data.

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
print("Model trained successfully!")
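
To make the "100 trees" concrete: in scikit-learn, a forest's predicted probability is the average of its individual trees' probabilities. The sketch below checks this on synthetic data from `make_classification`; the data and variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the parking features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_demo, y_demo)

sample = X_demo[:5]
# Stack each tree's class probabilities: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(sample) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

matches = np.allclose(manual_average, forest.predict_proba(sample))
print(f"Manual average matches forest.predict_proba: {matches}")
```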

## 4. Evaluation
How good is it? We check Accuracy, Precision, Recall, and ROC-AUC.

In [None]:
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Metrics
acc = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print(f"Accuracy: {acc:.4f}")
print(f"ROC-AUC: {auc:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
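
Accuracy is easier to interpret next to a trivial baseline. The sketch below compares a forest with `DummyClassifier(strategy='most_frequent')`, which always predicts the majority class. It runs on synthetic, deliberately imbalanced data, so the numbers say nothing about the parking model itself.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 80% of rows in class 0 (an assumption for illustration)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
forest_acc = accuracy_score(y_te, forest.predict(X_te))

print(f"Baseline accuracy: {baseline_acc:.4f}")
print(f"Forest accuracy:   {forest_acc:.4f}")
```

The baseline scores close to the majority-class share without learning anything, which is why recall and ROC-AUC are reported alongside accuracy.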

### Confusion Matrix
Rows are true labels and columns are predicted labels, so the off-diagonal cells show where the model makes mistakes.

In [None]:
plt.figure(figsize=(6, 5))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues', cbar=False)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()
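
The numbers in the classification report can be derived from the four cells of the matrix. A small worked example with hand-made labels (not model output) is sketched below; for binary labels `0` and `1`, `confusion_matrix(...).ravel()` yields the cells in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-made labels for illustration: 1 = free, 0 = occupied
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_hat = [1, 0, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # of bays predicted free, how many really were
recall = tp / (tp + fn)      # of bays that were free, how many were found
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")

# Cross-check against scikit-learn's own implementations
same_as_sklearn = (
    abs(precision - precision_score(y_true, y_hat)) < 1e-12
    and abs(recall - recall_score(y_true, y_hat)) < 1e-12
)
```

For a parking app, a false positive (predicting free when the bay is occupied) sends a driver to a full bay, so precision on class `1` is arguably the number to watch.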

### Feature Importance
What matters most? Location? Time?

In [None]:
importances = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)

plt.figure(figsize=(8, 5))
# One color for all bars: `palette` without `hue` is deprecated in seaborn 0.13+
sns.barplot(x=importances.values, y=importances.index, color='steelblue')
plt.title('Feature Importance')
plt.show()
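
`feature_importances_` is impurity-based: it is computed on the training data and can favor features with many distinct values, such as continuous coordinates. Permutation importance on the test set is a common cross-check. The sketch below uses synthetic data in which only the first feature determines the label; the names `f0`..`f4` are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(600, 5))
y_demo = (X_demo[:, 0] > 0).astype(int)   # label depends only on feature f0
names = ['f0', 'f1', 'f2', 'f3', 'f4']

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle each column in turn and measure the drop in test accuracy
result = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=42)

for name, mean, std in zip(names, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")

top_feature = names[int(np.argmax(result.importances_mean))]
```

If the two rankings disagree sharply on the real data, the permutation result is usually the safer guide to what the model relies on at prediction time.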

## 5. Save Model
Export the trained model to `models/` so the backend can load it.

In [None]:
import os

MODEL_DIR = '../models'
os.makedirs(MODEL_DIR, exist_ok=True)

MODEL_PATH = os.path.join(MODEL_DIR, 'parking_model_v1.joblib')
joblib.dump(model, MODEL_PATH)

print(f"Model saved to {MODEL_PATH}")
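
On the backend side, loading the file and scoring one bay might look like the sketch below. It trains a throwaway model on invented rows and writes it to a temporary directory so the example is self-contained; the query values and `toy_model.joblib` are placeholders, and how the real API builds its input is an assumption.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ['Latitude', 'Longitude', 'hour', 'day_of_week', 'is_weekend']

# Invented training rows, only so there is a model to save and reload
toy = pd.DataFrame({
    'Latitude': [10.01, 10.02, 10.01, 10.03, 10.02, 10.01, 10.03, 10.02],
    'Longitude': [20.01, 20.02, 20.03, 20.01, 20.03, 20.02, 20.02, 20.01],
    'hour': [8, 9, 12, 13, 18, 20, 22, 23],
    'day_of_week': [0, 1, 2, 3, 4, 5, 6, 6],
    'is_weekend': [0, 0, 0, 0, 0, 1, 1, 1],
    'is_free_15m': [0, 0, 0, 0, 1, 1, 1, 1],
})
toy_model = RandomForestClassifier(n_estimators=10, random_state=42)
toy_model.fit(toy[FEATURES], toy['is_free_15m'])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_model.joblib')
    joblib.dump(toy_model, path)
    loaded = joblib.load(path)

# Build the query with the same column names and order used in training
query = pd.DataFrame(
    [{'Latitude': 10.02, 'Longitude': 20.02, 'hour': 21,
      'day_of_week': 5, 'is_weekend': 1}],
    columns=FEATURES,
)
prob_free = float(loaded.predict_proba(query)[0, 1])  # column 1 = class 1 (free)
print(f"P(free in 15 min) = {prob_free:.2f}")
```

Joblib files are pickles, so they should only be loaded from trusted sources, and the backend should run the same scikit-learn version that was used for training.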