<a href="https://colab.research.google.com/github/l3ft-debug/summer-analytics-25/blob/main/Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dynamic Pricing for Urban Parking Lots
Capstone Project — Summer Analytics 2025

This notebook builds a data-driven pricing engine for urban parking lots, following the project problem statement. It replays the recorded data for **all 14 lots** as a simulated stream, engineers features, and implements three successively richer dynamic pricing models, with results plotted in Bokeh.

## Table of Contents
1. [Introduction](#introduction)
2. [Data Loading & Preprocessing](#data)
3. [Feature Engineering](#features)
4. [Model 1: Baseline Linear](#baseline)
5. [Model 2: Demand-Based](#demand)
6. [Model 3: Competitive Pricing](#competitive)
7. [Real-Time Simulation](#simulation)
8. [Visualization](#visualization)
9. [Discussion & Next Steps](#discussion)

## 1. Introduction

Urban parking spaces are scarce, and static prices use them inefficiently: lots end up either overcrowded or underused. We develop and simulate three levels of real-time pricing logic using numpy, pandas, and Pathway (plus geopy for distances between lots), with visualizations in Bokeh.

In [1]:
# Install necessary libraries (quiet mode); panel and geopy are imported in later cells
!pip install pathway bokeh panel geopy --quiet


In [2]:
import numpy as np
import pandas as pd
import pathway as pw
import bokeh.plotting
import panel as pn
pn.extension()

## 2. Data Loading & Preprocessing

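We load data for all 14 parking lots. Each record holds the lot's location and capacity, its current occupancy and queue length, the vehicle type, nearby traffic conditions, and a special-day flag. We combine the date and time columns into a single timestamp and sort the records by lot and time; categorical features are encoded in the next section, before the data is saved for streaming.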
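
The date strings in this dataset are day-first (`04-10-2016` is 4 October 2016), which is why an explicit format is passed when parsing. As a minimal sketch of the same parsing step using only the standard library (the helper name `parse_timestamp` is hypothetical and not part of the notebook):

```python
from datetime import datetime

# Day-first format, matching the format string passed to pd.to_datetime
FORMAT = "%d-%m-%Y %H:%M:%S"

def parse_timestamp(date_str, time_str):
    """Join the date and time strings and parse them as one timestamp."""
    return datetime.strptime(f"{date_str} {time_str}", FORMAT)

ts = parse_timestamp("04-10-2016", "07:59:00")
print(ts.isoformat())  # 2016-10-04T07:59:00
```

Without the explicit format, a month-first parser could read the same string as 10 April.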

In [3]:
# Load your dataset (update path as needed)
df = pd.read_csv('dataset.csv')
# Combine into a single timestamp
df['Timestamp'] = pd.to_datetime(df['LastUpdatedDate'] + ' ' + df['LastUpdatedTime'], format='%d-%m-%Y %H:%M:%S')
# Sort and reset index
df = df.sort_values(['SystemCodeNumber', 'Timestamp']).reset_index(drop=True)
# Display first few rows
df.head()

| | ID | SystemCodeNumber | Capacity | Latitude | Longitude | Occupancy | VehicleType | TrafficConditionNearby | QueueLength | IsSpecialDay | LastUpdatedDate | LastUpdatedTime | Timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 61 | car | low | 1 | 0 | 04-10-2016 | 07:59:00 | 2016-10-04 07:59:00 |
| 1 | 1 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 64 | car | low | 1 | 0 | 04-10-2016 | 08:25:00 | 2016-10-04 08:25:00 |
| 2 | 2 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 80 | car | low | 2 | 0 | 04-10-2016 | 08:59:00 | 2016-10-04 08:59:00 |
| 3 | 3 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 107 | car | low | 2 | 0 | 04-10-2016 | 09:32:00 | 2016-10-04 09:32:00 |
| 4 | 4 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 150 | bike | low | 2 | 0 | 04-10-2016 | 09:59:00 | 2016-10-04 09:59:00 |


## 3. Feature Engineering

We encode categorical features for modeling:
- `VehicleType` → numeric weight (unmapped values default to 1.0)
- `TrafficConditionNearby` → numeric traffic level (unmapped values default to 0.5)
- `QueueLength` → coerced to numeric, with unparseable values set to 0

In [4]:
# Encode VehicleType
vehicle_map = {'car': 1.0, 'bike': 0.5, 'truck': 1.5, 'cycle': 0.3}
df['VehicleTypeWeight'] = df['VehicleType'].map(vehicle_map).fillna(1.0)

# Encode TrafficConditionNearby
traffic_map = {'low': 0.2, 'average': 0.5, 'high': 1.0}
df['TrafficLevel'] = df['TrafficConditionNearby'].map(traffic_map).fillna(0.5)

# Ensure QueueLength is numeric
df['QueueLength'] = pd.to_numeric(df['QueueLength'], errors='coerce').fillna(0)

# Save prepared CSV for streaming
stream_cols = [
    'Timestamp','SystemCodeNumber','Occupancy','Capacity','QueueLength','VehicleTypeWeight',
    'TrafficLevel','IsSpecialDay','Latitude','Longitude'
]
df[stream_cols].to_csv('parking_stream_all.csv', index=False)

## 4. Model 1: Baseline Linear Model

$$\text{Price}_{t+1} = \text{Price}_t + \alpha \frac{\text{Occupancy}}{\text{Capacity}}$$
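Each lot starts at the base price and, at every time step, its price rises by $\alpha$ times the current occupancy ratio. Because that ratio is never negative, the baseline price can only stay flat or increase over time.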
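
To make the recurrence concrete, here is a small worked sketch on toy numbers. The occupancy readings and the capacity of 100 are hypothetical values chosen for easy arithmetic, and `baseline_prices` is an illustrative helper rather than the notebook's own function; the base price and $\alpha$ match the constants used in the cell below.

```python
BASE_PRICE = 10.0
ALPHA = 2.0

def baseline_prices(occupancy, capacity, base=BASE_PRICE, alpha=ALPHA):
    """Return the price path for one lot.

    The first reading gets the base price; each later reading adds
    alpha * occupancy / capacity to the previous price.
    """
    prices = [base]
    for occ in occupancy[1:]:
        prices.append(prices[-1] + alpha * occ / capacity)
    return prices

# Hypothetical readings for a single lot with 100 spaces
occupancy = [20, 50, 80, 100]
prices = baseline_prices(occupancy, capacity=100)
print([round(p, 2) for p in prices])  # [10.0, 11.0, 12.6, 14.6]
```

The increments are 1.0, 1.6, and 2.0, so a fuller lot raises the price faster.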

In [5]:
class BaselineSchema(pw.Schema):
    Timestamp: str
    SystemCodeNumber: str
    Occupancy: int
    Capacity: int

# Simulate stream for the baseline
# (data_bl is not consumed below; the prices are computed offline on df)
data_bl = pw.demo.replay_csv('parking_stream_all.csv', schema=BaselineSchema, input_rate=1000)

# Price update logic: each lot is independent, and its price is carried forward
BASE_PRICE = 10.0
ALPHA = 2.0  # Tune as needed

# Compute BaselinePrice per lot and assign it back to df
df['BaselinePrice'] = np.nan
for lot, lot_df in df.groupby('SystemCodeNumber'):
    lot_df = lot_df.sort_values('Timestamp')
    # Per-step increment: ALPHA * occupancy ratio
    increments = ALPHA * (lot_df['Occupancy'] / lot_df['Capacity']).to_numpy(copy=True)
    increments[0] = 0.0  # the first record keeps BASE_PRICE
    # Running sum of increments reproduces Price_{t+1} = Price_t + increment
    df.loc[lot_df.index, 'BaselinePrice'] = BASE_PRICE + np.cumsum(increments)


## 5. Model 2: Demand-Based Price Function

We combine several features into a single demand score, min-max normalize it per lot, and map it to a price:
$$
\text{Demand} = \alpha \frac{\text{Occupancy}}{\text{Capacity}} + \beta \cdot \text{QueueLength} - \gamma \cdot \text{TrafficLevel} + \delta \cdot \text{IsSpecialDay} + \epsilon \cdot \text{VehicleTypeWeight}
$$
$$
\text{Price} = \text{BasePrice} \cdot (1 + \lambda \cdot \text{NormDemand})
$$
The price is then clipped to lie between `0.5x` and `2x` the base price.
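
The pipeline from raw features to a price can be traced on a few toy readings. The sketch below uses only the standard library. The three readings are hypothetical, and the helper names (`demand`, `min_max`, `price`) are illustrative; the coefficients, the `1e-6` guard in the normalization, and the clipping bounds follow the cell below.

```python
BASE_PRICE = 10.0
PARAMS = dict(alpha=1.0, beta=0.5, gamma=0.3, delta=1.0, epsilon=0.6, lam=0.8)

def demand(occ, cap, queue, traffic, special, vehicle, p=PARAMS):
    """Weighted demand score for one reading."""
    return (p["alpha"] * occ / cap + p["beta"] * queue - p["gamma"] * traffic
            + p["delta"] * special + p["epsilon"] * vehicle)

def min_max(values, eps=1e-6):
    """Scale values to [0, 1); eps guards against a zero range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]

def price(norm_demand, base=BASE_PRICE, lam=PARAMS["lam"]):
    """Map normalized demand to a price, clipped to [0.5 * base, 2 * base]."""
    raw = base * (1 + lam * norm_demand)
    return min(max(raw, 0.5 * base), 2.0 * base)

# Hypothetical readings: (occupancy, capacity, queue, traffic, special day, vehicle weight)
readings = [
    (61, 577, 1, 0.2, 0, 1.0),   # quiet morning, car
    (150, 577, 2, 0.2, 0, 0.5),  # filling up, bike
    (500, 577, 6, 1.0, 1, 1.5),  # near full, long queue, special day, truck
]
demands = [demand(*r) for r in readings]
norm = min_max(demands)
prices = [price(d) for d in norm]
print([round(p, 2) for p in prices])
```

One consequence worth noting: because the normalized demand lies in `[0, 1)` and `lam` is 0.8, the unclipped price already falls between the base price and 1.8 times the base price, so the `0.5x` and `2x` bounds only become active if `lam` is raised above 1 or the normalization is changed.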

In [6]:
# Demand function parameters (tune as appropriate)
params = dict(alpha=1.0, beta=0.5, gamma=0.3, delta=1.0, epsilon=0.6, lam=0.8)

def compute_demand(row, p):
    return (
        p['alpha']*row['Occupancy']/row['Capacity'] +
        p['beta'] * row['QueueLength'] -
        p['gamma'] * row['TrafficLevel'] +
        p['delta'] * row['IsSpecialDay'] +
        p['epsilon'] * row['VehicleTypeWeight']
    )

# Compute demand, normalize it per lot, and map it to a price
df['Demand'] = np.nan
df['NormDemand'] = np.nan
df['DemandPrice'] = np.nan
def demand_price(norm_demand, base=BASE_PRICE, lam=params['lam']):
    price = base * (1 + lam * norm_demand)
    return np.clip(price, 0.5*base, 2*base)

for lot in df['SystemCodeNumber'].unique():
    mask = df['SystemCodeNumber'] == lot
    lot_df = df[mask].copy()
    df.loc[mask, 'Demand'] = lot_df.apply(lambda r: compute_demand(r, params), axis=1)
    dmin, dmax = df.loc[mask, 'Demand'].min(), df.loc[mask, 'Demand'].max()
    norm = (df.loc[mask, 'Demand']-dmin)/(dmax-dmin+1e-6)
    df.loc[mask, 'NormDemand'] = norm
    df.loc[mask, 'DemandPrice'] = norm.apply(lambda d: demand_price(d))


## 6. Model 3: Competitive Pricing Model

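Now we incorporate competition. For each record, we look up the demand-based price of every lot within 0.5 km (geodesic distance computed from latitude/longitude) at the nearest timestamp. If the lot is full and a nearby lot is cheaper by more than the margin, we lower our price by the margin (this is also the point where a rerouting suggestion would fit). Otherwise, if a nearby lot is more expensive by more than the margin, we raise our price by the margin. In all other cases the demand-based price is kept.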
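
The decision rule itself is independent of how the nearby prices are gathered, so it can be isolated and checked on its own. The following is a minimal sketch of that rule as a pure function; the name `adjust_for_competition` and the example prices are hypothetical, while the branch order and the default margin of 0.5 mirror the cell below.

```python
def adjust_for_competition(my_price, occupancy, capacity, nearby_prices, margin=0.5):
    """Adjust a lot's price given the prices of nearby lots.

    - Full lot and some neighbor cheaper by more than `margin`: lower by `margin`.
    - Otherwise, some neighbor pricier by more than `margin`: raise by `margin`.
    - Otherwise: keep the price unchanged.
    """
    is_full = occupancy >= capacity
    if is_full and any(p < my_price - margin for p in nearby_prices):
        return my_price - margin
    if any(p > my_price + margin for p in nearby_prices):
        return my_price + margin
    return my_price

# Full lot with a clearly cheaper neighbor: price drops by the margin
print(adjust_for_competition(12.0, 100, 100, [10.0, 11.8]))  # 11.5
# Lot with free space and a pricier neighbor: price rises by the margin
print(adjust_for_competition(12.0, 50, 100, [13.0]))         # 12.5
# Lot with free space and only cheaper neighbors: no change
print(adjust_for_competition(12.0, 50, 100, [10.0]))         # 12.0
```

Note that a full lot with both a cheaper and a pricier neighbor takes the first branch, so the price is lowered.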

In [7]:
from geopy.distance import geodesic

# Precompute lot locations
lot_locations = df.groupby('SystemCodeNumber')[['Latitude','Longitude']].first().to_dict('index')

def get_nearby_lots(lot, radius_km=0.5):
    origin = lot_locations[lot]
    return [other for other, loc in lot_locations.items() if other != lot and geodesic((origin['Latitude'],origin['Longitude']),(loc['Latitude'],loc['Longitude'])).km < radius_km]

def competitive_price(row, df, margin=0.5):
    lot = row['SystemCodeNumber']
    time = row['Timestamp']
    my_price = row['DemandPrice']
    occ = row['Occupancy']
    cap = row['Capacity']
    competitors = get_nearby_lots(lot)
    nearby_prices = []
    for c in competitors:
        df_c = df[(df['SystemCodeNumber']==c)]
        if not df_c.empty:
            idx = (df_c['Timestamp']-time).abs().idxmin()
            nearby_prices.append(df_c.loc[idx,'DemandPrice'])
    if occ >= cap and any([p < my_price-margin for p in nearby_prices]):
        return my_price - margin
    elif any([p > my_price+margin for p in nearby_prices]):
        return my_price + margin
    else:
        return my_price

# Compute CompPrice per lot
df['CompPrice'] = np.nan
for lot in df['SystemCodeNumber'].unique():
    mask = df['SystemCodeNumber'] == lot
    df.loc[mask, 'CompPrice'] = df[mask].apply(lambda row: competitive_price(row, df), axis=1)


## 7. Real-Time Simulation with Pathway

We simulate streaming data for all lots. (For full real-time, implement each model as a Pathway operator.)

In [8]:
class FullSchema(pw.Schema):
    Timestamp: str
    SystemCodeNumber: str
    Occupancy: int
    Capacity: int
    QueueLength: int
    VehicleTypeWeight: float
    TrafficLevel: float
    IsSpecialDay: int
    Latitude: float
    Longitude: float

data_stream = pw.demo.replay_csv('parking_stream_all.csv', schema=FullSchema, input_rate=500)

# For real-time pricing per lot, each model's logic would be reimplemented here as Pathway operations on data_stream.
# For brevity, the models above are computed offline; see the sample notebook for Pathway usage.
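
To illustrate what "price carried forward per lot" means when records arrive one at a time, here is a plain-Python sketch of a stateful updater for the baseline model. It is not Pathway code and makes no claim about Pathway's API; the class name `StreamingBaselinePricer`, the lot IDs, and the records are all hypothetical. It only shows the state that a streaming implementation would need to keep.

```python
BASE_PRICE = 10.0
ALPHA = 2.0

class StreamingBaselinePricer:
    """Keeps one running price per lot and updates it as records arrive."""

    def __init__(self, base=BASE_PRICE, alpha=ALPHA):
        self.base = base
        self.alpha = alpha
        self.prices = {}  # lot id -> latest price

    def update(self, record):
        """Consume one record and return the lot's new price."""
        lot = record["SystemCodeNumber"]
        if lot not in self.prices:
            # First record for this lot: start at the base price
            self.prices[lot] = self.base
        else:
            ratio = record["Occupancy"] / record["Capacity"]
            self.prices[lot] += self.alpha * ratio
        return self.prices[lot]

# Hypothetical interleaved stream for two lots, already in time order
stream = [
    {"SystemCodeNumber": "LOT_A", "Occupancy": 20, "Capacity": 100},
    {"SystemCodeNumber": "LOT_B", "Occupancy": 50, "Capacity": 200},
    {"SystemCodeNumber": "LOT_A", "Occupancy": 50, "Capacity": 100},
    {"SystemCodeNumber": "LOT_B", "Occupancy": 100, "Capacity": 200},
    {"SystemCodeNumber": "LOT_A", "Occupancy": 100, "Capacity": 100},
]

pricer = StreamingBaselinePricer()
emitted = [(r["SystemCodeNumber"], pricer.update(r)) for r in stream]
print(emitted)
```

Each lot's price depends only on that lot's own history, so the per-lot state is a single number. The demand-based model would additionally need a running minimum and maximum per lot, since its normalization above uses the full history.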

## 8. Visualization

We use Bokeh to plot the three price series (Baseline, Demand-based, and Competitive) over time for each lot. The demo below shows the first four lots in a 2×2 grid.

In [13]:
from bokeh.io import output_notebook, show
output_notebook()
from bokeh.layouts import gridplot
plots = []
lots = df['SystemCodeNumber'].unique()[:4]  # Demo: first 4 lots
for lot in lots:
    lot_df = df[df['SystemCodeNumber']==lot]
    fig = bokeh.plotting.figure(title=f"Lot {lot}", x_axis_type='datetime', width=400, height=300)
    fig.line(lot_df['Timestamp'], lot_df['BaselinePrice'], color='blue', legend_label='Baseline')
    fig.line(lot_df['Timestamp'], lot_df['DemandPrice'], color='green', legend_label='Demand-based')
    fig.line(lot_df['Timestamp'], lot_df['CompPrice'], color='red', legend_label='Competitive')
    fig.legend.location = 'top_left'
    plots.append(fig)
show(gridplot([plots[:2],plots[2:]]))

## 9. Discussion & Next Steps

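**Demand function:**
- Weighted sum of occupancy ratio, queue length, traffic level, special-day flag, and vehicle-type weight.
- Normalized per lot and tuned for smooth price changes; the resulting price is bounded to 0.5x–2x the base price.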

**Assumptions:**
- Features are directly related to willingness to pay/demand.
- Proximity defined as 0.5 km (change as needed).
- Prices are updated at each time step.

**How price changes:**
- Increases with occupancy, queue, special events, larger vehicles.
- Decreases when nearby traffic is high (an optional term that treats congestion as a deterrent).
- Competitive logic adjusts price in context of nearby lots.

### Further steps
- Implement all logic directly in Pathway for true streaming.
- Add rerouting suggestions (when a lot is full and a cheaper space is available nearby).
- Add more visualizations: queue, occupancy, revenue.
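
As a starting point for the rerouting step, the sketch below picks the cheapest nearby lot that still has free space. Everything in it is an assumption for illustration: the function names, the dictionary layout, the lot IDs, and the coordinates are hypothetical, and distance is computed with the haversine formula instead of the `geopy` geodesic used earlier (the two agree closely at sub-kilometer scales).

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def suggest_reroute(current, others, radius_km=0.5):
    """Return the id of the cheapest nearby lot with free space, or None.

    A suggestion is made only when `current` is full and the candidate
    is cheaper, has free space, and lies within `radius_km`.
    """
    if current["occupancy"] < current["capacity"]:
        return None
    candidates = [
        o for o in others
        if o["occupancy"] < o["capacity"]
        and o["price"] < current["price"]
        and haversine_km(current["lat"], current["lon"], o["lat"], o["lon"]) < radius_km
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o["price"])["id"]

# Hypothetical snapshot of four lots at one point in time
full_lot = {"id": "LOT_A", "lat": 26.1445, "lon": 91.7362, "occupancy": 100, "capacity": 100, "price": 15.0}
others = [
    {"id": "LOT_B", "lat": 26.1450, "lon": 91.7365, "occupancy": 40, "capacity": 80, "price": 12.0},
    {"id": "LOT_C", "lat": 26.1447, "lon": 91.7363, "occupancy": 60, "capacity": 60, "price": 11.0},  # full
    {"id": "LOT_D", "lat": 26.2000, "lon": 91.8000, "occupancy": 10, "capacity": 90, "price": 9.0},   # too far
]
print(suggest_reroute(full_lot, others))  # LOT_B
```

`LOT_C` is cheaper but full, and `LOT_D` is cheapest but several kilometers away, so `LOT_B` is suggested.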