# ParkSense: Availability Prediction Model

**Objective**: Predict the probability of parking availability for groups of bays 15 and 30 minutes ahead.

## 1. Grouping & Data Preparation
We verified that kerbside IDs are sequential, so we group bays into blocks of 20 consecutive IDs, on the working hypothesis that neighbouring IDs are spatially close.

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

# 1. Load data
df = pd.read_csv('data/supabase_snapshots.csv')
df['status_timestamp'] = pd.to_datetime(df['status_timestamp'])

# 2. Create Groups (Hypothesis: Sequential IDs = Spatial Proximity)
df['group_id'] = (df['kerbsideid'] // 20) * 20

# 3. Convert status to binary (1 = Occupied, 0 = Available)
df['is_occupied'] = (df['status'] == 'Present').astype(int)

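# 4. Aggregate into 15-minute bins per group
# The mean is the share of status records in the bin that read 'Present' (0.0 to 1.0).
# Bins with no records for a group produce no row, so the series can contain gaps.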
group_ts = df.groupby(['group_id', pd.Grouper(key='status_timestamp', freq='15min')])['is_occupied'].mean().reset_index()
group_ts.columns = ['group_id', 'timestamp', 'occupancy_ratio']

print(f"Aggregation complete. {len(group_ts)} time-series samples created.")
group_ts.head()

Aggregation complete. 288341 time-series samples created.


| | group_id | timestamp | occupancy_ratio |
|---|---|---|---|
| 0 | 5680 | 2023-10-25 22:15:00+00:00 | 0.0 |
| 1 | 5700 | 2023-10-26 01:00:00+00:00 | 0.0 |
| 2 | 5720 | 2023-04-22 06:30:00+00:00 | 1.0 |
| 3 | 5720 | 2023-10-24 22:45:00+00:00 | 0.0 |
| 4 | 5720 | 2023-10-31 06:00:00+00:00 | 1.0 |
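
The sample rows above show group 5720 with bins that are days or months apart: grouping with `pd.Grouper` emits a row only for bins that contain at least one status record, so a group's series is not guaranteed to sit on a regular 15-minute grid. The following is a minimal sketch on toy data (all values invented for illustration) of how one might count such holes and reindex a group onto a full grid. Forward-filling the holes assumes a status persists until the next record, which this notebook does not establish; `to_regular_grid` is a hypothetical helper.

```python
import pandas as pd

# Toy status records for one group (values are made up for illustration).
toy = pd.DataFrame({
    "group_id": [5720, 5720, 5720],
    "status_timestamp": pd.to_datetime(
        ["2023-10-24 22:47", "2023-10-24 22:52", "2023-10-24 23:40"], utc=True
    ),
    "is_occupied": [1, 0, 1],
})

# Same aggregation as in the cell above: only bins that contain records are emitted.
toy_ts = (
    toy.groupby(["group_id", pd.Grouper(key="status_timestamp", freq="15min")])["is_occupied"]
    .mean()
    .reset_index()
    .rename(columns={"status_timestamp": "timestamp", "is_occupied": "occupancy_ratio"})
)

# Time elapsed since the previous row of the same group; more than 15 minutes means a hole.
toy_ts["gap"] = toy_ts.groupby("group_id")["timestamp"].diff()
n_holes = int((toy_ts["gap"] > pd.Timedelta("15min")).sum())


def to_regular_grid(g: pd.DataFrame) -> pd.DataFrame:
    """Reindex one group's series onto a full 15-minute grid.

    Assumption: the last observed ratio is carried forward across holes.
    """
    full = pd.date_range(g["timestamp"].min(), g["timestamp"].max(), freq="15min")
    out = g.set_index("timestamp")[["occupancy_ratio"]].reindex(full).ffill()
    out.index.name = "timestamp"
    return out.reset_index()


regular = to_regular_grid(toy_ts[toy_ts["group_id"] == 5720])
print(f"{n_holes} hole(s) found; {len(toy_ts)} observed bins became {len(regular)} grid rows")
```

Whether to fill, drop, or flag the holes is a modelling choice; the point of the check is to know how many there are before shifting rows to build targets.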


## 2. Feature Engineering
We create the target columns the models will predict: each group's occupancy ratio 15 and 30 minutes ahead.

In [4]:
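# Create target variables: the occupancy ratio one and two rows ahead within each group.
# A one-row shift equals 15 minutes only where consecutive bins are contiguous;
# across a gap it points at the next observed bin, however much later that is.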
group_ts = group_ts.sort_values(['group_id', 'timestamp'])
group_ts['target_15m'] = group_ts.groupby('group_id')['occupancy_ratio'].shift(-1)
group_ts['target_30m'] = group_ts.groupby('group_id')['occupancy_ratio'].shift(-2)

# Standard Time Features
group_ts['hour'] = group_ts['timestamp'].dt.hour
group_ts['day_of_week'] = group_ts['timestamp'].dt.dayofweek

# Lag feature: the previous observed bin (15 minutes earlier when there is no gap)
group_ts['lag_15m'] = group_ts.groupby('group_id')['occupancy_ratio'].shift(1)

# Drop rows with NaN (edges of the shifts)
model_data = group_ts.dropna()

print(f"Feature engineering complete. Prepared {len(model_data)} rows for training.")

Feature engineering complete. Prepared 287065 rows for training.
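
Row-based shifts line up with 15 and 30 minutes only when a group's bins are contiguous. As an alternative, here is a minimal sketch that builds the targets by joining on the timestamp itself, so a row whose exact future bin is missing gets `NaN` rather than a value from a later bin. `add_future_target` is a hypothetical helper and the toy values are invented; this is not the method used to produce the numbers in this notebook.

```python
import pandas as pd


def add_future_target(ts: pd.DataFrame, minutes: int, name: str) -> pd.DataFrame:
    """Attach the occupancy ratio observed exactly `minutes` later in the same group."""
    future = ts[["group_id", "timestamp", "occupancy_ratio"]].copy()
    # Move each observation back in time so it lines up with the row it is a target for.
    future["timestamp"] = future["timestamp"] - pd.Timedelta(minutes=minutes)
    future = future.rename(columns={"occupancy_ratio": name})
    return ts.merge(future, on=["group_id", "timestamp"], how="left")


# Toy series with a hole between 10:30 and 11:15 (values invented for illustration).
toy = pd.DataFrame({
    "group_id": [5720] * 4,
    "timestamp": pd.to_datetime(
        ["2023-10-24 10:00", "2023-10-24 10:15", "2023-10-24 10:30", "2023-10-24 11:15"],
        utc=True,
    ),
    "occupancy_ratio": [0.2, 0.4, 0.9, 1.0],
})

toy = add_future_target(toy, 15, "target_15m")
toy = add_future_target(toy, 30, "target_30m")
print(toy)
```

With a row shift, the 10:30 row would take the 11:15 value (45 minutes later) as its 15-minute target; with the timestamp join it is left empty and removed by `dropna()`.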


## 3. Training the 15-Minute Predictor
The model predicts each group's occupancy ratio 15 minutes ahead, which we treat as the probability that a bay in the group is occupied.

In [5]:
X = model_data[['group_id', 'occupancy_ratio', 'hour', 'day_of_week', 'lag_15m']]
y = model_data['target_15m']

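# Positional 80/20 split without shuffling (random shuffles leak future data in time series).
# Note: model_data is sorted by group_id, then timestamp, so this holds out the
# highest group_ids rather than the most recent period.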
split_idx = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

model_15m = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5)
model_15m.fit(X_train, y_train)

preds = model_15m.predict(X_test)
mae = mean_absolute_error(y_test, preds)

print(f"15-Minute Prediction MAE: {mae:.4f}")
print(f"Interpretation: On average, the prediction is off by {mae*100:.2f} percentage points of occupancy ratio.")

15-Minute Prediction MAE: 0.2457
Interpretation: On average, the prediction is off by 24.57 percentage points of occupancy ratio.
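
Because `group_ts` was sorted by `group_id` and then `timestamp`, a positional 80/20 split holds out the highest-numbered groups rather than the latest period. Below is a minimal sketch of a split on a timestamp cutoff, together with a persistence baseline (predict that the ratio stays where it is now) to give an MAE some context. `time_split` is a hypothetical helper and the toy values are invented; the MAE reported above was not produced this way.

```python
import numpy as np
import pandas as pd


def time_split(data: pd.DataFrame, train_frac: float = 0.8):
    """Split on a timestamp cutoff so that every test row is later than every train row."""
    ordered = data["timestamp"].sort_values()
    cutoff = ordered.iloc[int(len(ordered) * train_frac) - 1]
    train = data[data["timestamp"] <= cutoff]
    test = data[data["timestamp"] > cutoff]
    return train, test, cutoff


# Toy, gap-free series for two groups, sorted by group then time like group_ts.
stamps = pd.date_range("2023-10-24 08:00", periods=5, freq="15min", tz="UTC")
toy = pd.DataFrame({
    "group_id": np.repeat([5700, 5720], len(stamps)),
    "timestamp": stamps.append(stamps),
    "occupancy_ratio": [0.1, 0.2, 0.4, 0.7, 0.9, 0.9, 0.8, 0.5, 0.3, 0.2],
})
toy["target_15m"] = toy.groupby("group_id")["occupancy_ratio"].shift(-1)
toy = toy.dropna()

train, test, cutoff = time_split(toy)

# Persistence baseline: assume the ratio 15 minutes ahead equals the current ratio.
baseline_mae = float(np.mean(np.abs(test["target_15m"] - test["occupancy_ratio"])))
print(f"cutoff={cutoff}, train rows={len(train)}, test rows={len(test)}")
print(f"Persistence baseline MAE: {baseline_mae:.4f}")
```

A model is only adding value at the 15-minute horizon if its MAE on the same test rows is lower than the persistence baseline.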


## 4. Answering Your Question
*"At 5pm all these spots are occupied (Ratio=1.0), what is the prob of being un-occupied in the next 15 mins?"*

In [6]:
# Simulating your scenario
# Group 18000, 5 PM (Hour 17), Ratio 1.0 (Full)
scenario = pd.DataFrame([{
    'group_id': 18000,
    'occupancy_ratio': 1.0,
    'hour': 17,
    'day_of_week': 3, # Thursday
    'lag_15m': 0.95
}])

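# The regressor's output is not bounded, so clip it to [0, 1] before treating it as a probability
prob_occupied = float(np.clip(model_15m.predict(scenario[X.columns])[0], 0.0, 1.0))
prob_available = 1.0 - prob_occupied

print("SCENARIO: Group is currently 100% full at 5:00 PM")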
print(f"RESULT: Probability of finding an available spot at 5:15 PM: {prob_available*100:.2f}%")

SCENARIO: Group is currently 100% full at 5:00 PM
RESULT: Probability of finding an available spot at 5:15 PM: 65.17%
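
One caveat on reading this number: the regressor predicts an expected occupancy ratio, so `1 - ratio` is the expected share of free bays, not strictly the chance that at least one bay is free. A rough sketch of the difference follows, under the strong assumption that each of `n_bays` bays is occupied independently with the same probability. `availability_summary` and `n_bays` are hypothetical, the input of 0.35 is an illustrative value, and a block of 20 IDs need not contain exactly 20 bays.

```python
import numpy as np


def availability_summary(pred_ratio: float, n_bays: int) -> dict:
    """Turn a predicted occupancy ratio into two availability figures.

    Assumption: bays are occupied independently with probability `pred_ratio`.
    """
    # Regression output can fall outside [0, 1]; clip before using it as a probability.
    p_occ = float(np.clip(pred_ratio, 0.0, 1.0))
    return {
        "expected_free_share": 1.0 - p_occ,
        "p_at_least_one_free": 1.0 - p_occ ** n_bays,
    }


summary = availability_summary(pred_ratio=0.35, n_bays=20)
print(summary)
```

Real bays on the same kerb are unlikely to be independent, so the second figure should be read as an optimistic bound rather than a calibrated probability.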
