# Lot-Level Enforcement Aggregation
This notebook rebuilds the enforcement data at the **lot number level** instead of the zone level, so that predictions can be made per individual lot, for example:
- Lot 150 (CUE Garage)
- Lot 71 (Library Garage)
- Lot 146 (Student Rec Center)

Each lot has its own enforcement pattern.

In [3]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
print("Loading data...")

Loading data...


## Load Raw Data with Lot Numbers

In [4]:
# Load tickets with lot numbers
tickets = pd.read_csv('../../data/processed/tickets_enriched.csv', parse_dates=['Issue_DateTime'])
print(f"Tickets: {len(tickets):,} records")
print(f"Date range: {tickets['Issue_DateTime'].min()} to {tickets['Issue_DateTime'].max()}")
print(f"Unique lots with tickets: {tickets['Lot_number'].nunique()}")
# Load LPR with lot numbers
lpr = pd.read_csv('../../data/processed/lpr_enriched.csv', parse_dates=['Date_Time'])
print(f"\nLPR scans: {len(lpr):,} records")
print(f"Date range: {lpr['Date_Time'].min()} to {lpr['Date_Time'].max()}")
print(f"Unique lots in LPR: {lpr['Lot_number'].nunique()}")
# Note: AMP data is zone-level only, so it cannot be joined on Lot_number
print("\n(Note: AMP data unavailable at lot level - using LPR as unpaid proxy)")

Tickets: 192,709 records
Date range: 2018-07-02 08:13:00 to 2025-10-30 15:56:00
Unique lots with tickets: 187

LPR scans: 1,780,391 records
Date range: 2022-07-01 05:24:26 to 2025-06-30 21:58:49
Unique lots in LPR: 185

(Note: AMP data unavailable at lot level - using LPR as unpaid proxy)


## Create Hourly Date Range

In [5]:
# Create hourly date range from July 2022 to June 2025
start_date = pd.to_datetime('2022-07-01 00:00:00')
end_date = pd.to_datetime('2025-06-30 23:00:00')
hourly_dates = pd.date_range(start=start_date, end=end_date, freq='h')  # 'h' replaces the 'H' alias deprecated in pandas 2.2
print(f"Created {len(hourly_dates):,} hourly timestamps")
# Get unique lot numbers from tickets and LPR only (AMP doesn't have lot numbers)
tickets_lots = set(tickets['Lot_number'].dropna().astype(int).unique())
lpr_lots = set(lpr['Lot_number'].dropna().astype(int).unique())
all_lots = sorted(tickets_lots | lpr_lots)
print(f"\nTotal unique lots: {len(all_lots)}")
print(f"Lots: {all_lots[:20]}...")  # Show first 20

Created 26,304 hourly timestamps

Total unique lots: 190
Lots: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]...


## Aggregate Tickets by Lot + Hour

In [6]:
# Floor ticket timestamps to the start of the hour ('h' replaces the deprecated 'H' alias)
tickets['datetime'] = tickets['Issue_DateTime'].dt.floor('h')
tickets['date'] = tickets['datetime'].dt.date
# Aggregate by lot + datetime
tickets_hourly = tickets.groupby(['Lot_number', 'datetime']).size().reset_index(name='tickets_issued')
print(f"Tickets aggregated: {len(tickets_hourly):,} lot-hour combinations")
print(f"Date range: {tickets_hourly['datetime'].min()} to {tickets_hourly['datetime'].max()}")

Tickets aggregated: 76,981 lot-hour combinations
Date range: 2018-07-02 08:00:00 to 2025-10-30 15:00:00
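
The floor-then-count step above can be checked on a tiny example. This is a minimal sketch on made-up ticket rows; only the column names (`Lot_number`, `Issue_DateTime`, `tickets_issued`) come from the notebook, and `toy_tickets` / `toy_hourly` are hypothetical names.

```python
import pandas as pd

# Toy ticket records: column names mirror the notebook, values are made up.
toy_tickets = pd.DataFrame({
    "Lot_number": [150, 150, 150, 71],
    "Issue_DateTime": pd.to_datetime([
        "2024-03-04 11:05", "2024-03-04 11:40",
        "2024-03-04 12:10", "2024-03-04 11:59",
    ]),
})

# Floor each timestamp to the start of its hour, then count rows per lot-hour.
toy_tickets["datetime"] = toy_tickets["Issue_DateTime"].dt.floor("h")
toy_hourly = (
    toy_tickets.groupby(["Lot_number", "datetime"])
    .size()
    .reset_index(name="tickets_issued")
)
print(toy_hourly)
```

Two tickets written at 11:05 and 11:40 in the same lot collapse into one lot-hour row with a count of 2, which is why the aggregated table has far fewer rows than the raw ticket file.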


## Aggregate LPR by Lot + Hour

In [7]:
# Floor LPR timestamps to the start of the hour ('h' replaces the deprecated 'H' alias)
lpr['datetime'] = lpr['Date_Time'].dt.floor('h')
lpr['date'] = lpr['datetime'].dt.date
# Aggregate by lot + datetime
lpr_hourly = lpr.groupby(['Lot_number', 'datetime']).size().reset_index(name='lpr_scans')
print(f"LPR aggregated: {len(lpr_hourly):,} lot-hour combinations")
print(f"Date range: {lpr_hourly['datetime'].min()} to {lpr_hourly['datetime'].max()}")

LPR aggregated: 76,550 lot-hour combinations
Date range: 2022-07-01 05:00:00 to 2025-06-30 21:00:00


## Skip AMP Aggregation
AMP data exists only at the zone level and carries no lot numbers, so `amp_sessions` is set to 0 as a placeholder and LPR scans serve as the proxy for the unpaid estimate.

In [8]:
# Skip AMP aggregation - not available at lot level
print("Skipping AMP aggregation (zone-level data only)")
print("Will use LPR as proxy for unpaid estimate")

Skipping AMP aggregation (zone-level data only)
Will use LPR as proxy for unpaid estimate


## Merge All Data

In [9]:
# Create full grid of lot + hour combinations
lot_hour_grid = pd.MultiIndex.from_product(
    [all_lots, hourly_dates],
    names=['Lot_number', 'datetime']
).to_frame(index=False)
print(f"Full grid: {len(lot_hour_grid):,} lot-hour combinations")
# Merge data (LPR and tickets only, no AMP at lot level)
enforcement_lot = lot_hour_grid.copy()
enforcement_lot = enforcement_lot.merge(
    lpr_hourly,
    on=['Lot_number', 'datetime'],
    how='left'
).merge(
    tickets_hourly,
    on=['Lot_number', 'datetime'],
    how='left'
)
# Fill NaN with 0
enforcement_lot['lpr_scans'] = enforcement_lot['lpr_scans'].fillna(0).astype(int)
enforcement_lot['amp_sessions'] = 0  # Not available at lot level
enforcement_lot['tickets_issued'] = enforcement_lot['tickets_issued'].fillna(0).astype(int)
print(f"\nMerged data: {len(enforcement_lot):,} records")
print(f"Columns: {list(enforcement_lot.columns)}")

Full grid: 4,997,760 lot-hour combinations

Merged data: 4,997,760 records
Columns: ['Lot_number', 'datetime', 'lpr_scans', 'tickets_issued', 'amp_sessions']
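
The grid-and-merge pattern used in this cell can be seen more easily at small scale. The sketch below assumes two lots and three hours with a single observed ticket row; all values and the names `grid`, `observed`, and `full` are illustrative only.

```python
import pandas as pd

lots = [71, 150]
hours = pd.date_range("2024-03-04 10:00", periods=3, freq="h")

# Every lot paired with every hour, as a flat two-column frame.
grid = pd.MultiIndex.from_product(
    [lots, hours], names=["Lot_number", "datetime"]
).to_frame(index=False)

# Sparse observations: only lot-hours where something actually happened.
observed = pd.DataFrame({
    "Lot_number": [150],
    "datetime": pd.to_datetime(["2024-03-04 11:00"]),
    "tickets_issued": [2],
})

# The left merge keeps every grid row; unobserved lot-hours become NaN, then 0.
full = grid.merge(observed, on=["Lot_number", "datetime"], how="left")
full["tickets_issued"] = full["tickets_issued"].fillna(0).astype(int)
print(full)
```

The result has one row per lot-hour whether or not anything was recorded, which is why the merged table above is dominated by zeros.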


## Calculate Enforcement Metrics

In [10]:
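# AMP is not available at lot level, so LPR scans stand in for total vehicles.
# Unpaid estimate = LPR scans (assumes every scanned vehicle is a potential violator)
enforcement_lot['unpaid_estimate'] = enforcement_lot['lpr_scans']
# Enforcement rate: tickets / LPR scans, computed only where scans > 0 (otherwise 0).
# This is a ratio of counts, not a bounded probability: it exceeds 1 whenever an hour
# has more tickets than scans (the recorded summary shows a maximum of 20).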
enforcement_lot['enforcement_rate'] = 0.0
mask = enforcement_lot['lpr_scans'] > 0
enforcement_lot.loc[mask, 'enforcement_rate'] = (
    enforcement_lot.loc[mask, 'tickets_issued'] / enforcement_lot.loc[mask, 'lpr_scans']
)
# Flag for estimated LPR (set to True only where scans are imputed in the July-Oct 2025 step)
enforcement_lot['lpr_estimated'] = False
print("Calculated enforcement metrics")
print(f"\nSummary:")
print(enforcement_lot[['lpr_scans', 'amp_sessions', 'tickets_issued', 'unpaid_estimate', 'enforcement_rate']].describe())

Calculated enforcement metrics

Summary:
          lpr_scans  amp_sessions  tickets_issued  unpaid_estimate  \
count  4.997760e+06     4997760.0    4.997760e+06     4.997760e+06   
mean   3.562378e-01           0.0    1.868197e-02     3.562378e-01   
std    5.681619e+00           0.0    3.144816e-01     5.681619e+00   
min    0.000000e+00           0.0    0.000000e+00     0.000000e+00   
25%    0.000000e+00           0.0    0.000000e+00     0.000000e+00   
50%    0.000000e+00           0.0    0.000000e+00     0.000000e+00   
75%    0.000000e+00           0.0    0.000000e+00     0.000000e+00   
max    6.660000e+02           0.0    4.900000e+01     6.660000e+02   

       enforcement_rate  
count      4.997760e+06  
mean       1.245646e-03  
std        4.083554e-02  
min        0.000000e+00  
25%        0.000000e+00  
50%        0.000000e+00  
75%        0.000000e+00  
max        2.000000e+01  
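
The maximum `enforcement_rate` of 20 in the summary follows directly from the definition: tickets divided by scans is unbounded when an hour has few scans. The sketch below reproduces the same definition on made-up counts. The capped column is a hypothetical variant, not something the notebook computes.

```python
import pandas as pd

# Made-up counts covering the edge cases of the tickets / scans ratio.
toy = pd.DataFrame({
    "lpr_scans":      [0, 10, 1, 0],
    "tickets_issued": [0,  2, 3, 1],
})

# Same definition as the notebook: tickets / scans where scans > 0, else 0.
scans = toy["lpr_scans"].where(toy["lpr_scans"] > 0)  # NaN where there were no scans
toy["enforcement_rate"] = (toy["tickets_issued"] / scans).fillna(0.0)

# Hypothetical variant (an assumption, not part of the notebook): cap the ratio at 1.
toy["enforcement_rate_capped"] = toy["enforcement_rate"].clip(upper=1.0)
print(toy)
```

Note the last row: an hour with a ticket but no scans gets a rate of 0 under this definition, so the column understates enforcement in hours where LPR coverage is missing.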


## Add Temporal Features

In [11]:
enforcement_lot['date'] = enforcement_lot['datetime'].dt.date
enforcement_lot['hour'] = enforcement_lot['datetime'].dt.hour
enforcement_lot['year'] = enforcement_lot['datetime'].dt.year
enforcement_lot['month'] = enforcement_lot['datetime'].dt.month
enforcement_lot['day_of_week'] = enforcement_lot['datetime'].dt.dayofweek
enforcement_lot['is_weekend'] = (enforcement_lot['day_of_week'] >= 5).astype(int)
print("Added temporal features")

Added temporal features


## Add Contextual Features (Calendar, Weather)

In [12]:
# Load contextual data
calendar = pd.read_csv('../../data/academic_calendar.csv', parse_dates=['Start_Date', 'End_Date'])
games = pd.read_csv('../../data/football_games.csv', parse_dates=['Date'])
weather = pd.read_csv('../../data/weather_pullman_hourly_2020_2025.csv', parse_dates=['datetime'])
# Merge weather
enforcement_lot = enforcement_lot.merge(
    weather[['datetime', 'temperature_f', 'precipitation_inches', 'snowfall_inches', 
             'snow_depth_inches', 'wind_mph', 'weather_code', 'weather_category',
             'is_rainy', 'is_snowy', 'is_cold', 'is_hot', 'is_windy', 'is_severe']],
    on='datetime',
    how='left'
)
print(f"Merged weather data")
# Add calendar features
enforcement_lot['is_game_day'] = 0
enforcement_lot['is_dead_week'] = 0
enforcement_lot['is_finals_week'] = 0
enforcement_lot['is_spring_break'] = 0
enforcement_lot['is_thanksgiving_break'] = 0
enforcement_lot['is_winter_break'] = 0
enforcement_lot['is_any_break'] = 0
for _, event in calendar.iterrows():
    mask = (enforcement_lot['date'] >= event['Start_Date'].date()) & (enforcement_lot['date'] <= event['End_Date'].date())
    if 'Dead Week' in event['Event_Type']:
        enforcement_lot.loc[mask, 'is_dead_week'] = 1
    elif 'Finals Week' in event['Event_Type']:
        enforcement_lot.loc[mask, 'is_finals_week'] = 1
    elif 'Spring Break' in event['Event_Type']:
        enforcement_lot.loc[mask, 'is_spring_break'] = 1
        enforcement_lot.loc[mask, 'is_any_break'] = 1
    elif 'Thanksgiving' in event['Event_Type']:
        enforcement_lot.loc[mask, 'is_thanksgiving_break'] = 1
        enforcement_lot.loc[mask, 'is_any_break'] = 1
    elif 'Winter Break' in event['Event_Type']:
        enforcement_lot.loc[mask, 'is_winter_break'] = 1
        enforcement_lot.loc[mask, 'is_any_break'] = 1
# Game days
game_dates = games['Date'].dt.date.unique()
enforcement_lot.loc[enforcement_lot['date'].isin(game_dates), 'is_game_day'] = 1
print("Added calendar features")

Merged weather data
Added calendar features
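
The calendar loop compares Python `date` objects across the whole table once per event. A possible alternative, sketched below on toy data, keeps everything as timestamps by normalizing to midnight and using `Series.between`, which is inclusive on both ends by default. The event dates here are illustrative and are not taken from the real academic calendar file.

```python
import pandas as pd

# One week of hourly rows (toy data).
hours = pd.DataFrame({
    "datetime": pd.date_range("2024-11-24 00:00", "2024-11-30 23:00", freq="h")
})

# Illustrative event table with the same column names as the calendar CSV.
events = pd.DataFrame({
    "Event_Type": ["Thanksgiving Break"],
    "Start_Date": pd.to_datetime(["2024-11-25"]),
    "End_Date": pd.to_datetime(["2024-11-29"]),
})

# Normalize once so every hour of a day compares equal to that day's midnight.
day = hours["datetime"].dt.normalize()

hours["is_thanksgiving_break"] = 0
for event in events.itertuples(index=False):
    in_range = day.between(event.Start_Date, event.End_Date)
    if "Thanksgiving" in event.Event_Type:
        hours.loc[in_range, "is_thanksgiving_break"] = 1

print(hours["is_thanksgiving_break"].sum())
```

All 24 hours of both the start and end date are flagged, which matches the inclusive date comparison in the loop above.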


## Add Time of Day Categories

In [13]:
def categorize_time_of_day(hour):
    if 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 21:
        return 'Evening'
    elif 21 <= hour < 24:
        return 'Night'
    else:
        return 'Late Night'
enforcement_lot['time_of_day'] = enforcement_lot['hour'].apply(categorize_time_of_day)
time_map = {'Afternoon': 0, 'Evening': 1, 'Late Night': 2, 'Morning': 3, 'Night': 4}
enforcement_lot['time_of_day_code'] = enforcement_lot['time_of_day'].map(time_map)
print("Added time of day categories")

Added time of day categories
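
On a table of this size, a row-wise `apply` can be slow. As a sketch of one vectorized alternative, `pd.cut` with left-closed bins yields the same labels as `categorize_time_of_day`; the bin edges are read off the function above, and `time_map` is copied from the cell.

```python
import pandas as pd

hours = pd.Series(range(24), name="hour")

# Left-closed bins: [0, 6), [6, 12), [12, 17), [17, 21), [21, 24)
bins = [0, 6, 12, 17, 21, 24]
labels = ["Late Night", "Morning", "Afternoon", "Evening", "Night"]
time_of_day = pd.cut(hours, bins=bins, labels=labels, right=False)

# Same integer codes as the notebook's time_map.
time_map = {"Afternoon": 0, "Evening": 1, "Late Night": 2, "Morning": 3, "Night": 4}
time_of_day_code = time_of_day.astype(str).map(time_map)

print(pd.DataFrame({"hour": hours, "time_of_day": time_of_day, "code": time_of_day_code}))
```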


## Save July 2022 - June 2025 Data

In [14]:
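# Filter to actual LPR date range (July 2022 - June 2025).
# Caution: '2025-06-30' is parsed as 2025-06-30 00:00:00, so this upper bound drops
# hours 01:00-23:00 of June 30 (4,370 rows = 190 lots x 23 hours; see the output).
# Using `< '2025-07-01'` instead would keep the full final day.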
enforcement_2022_2025 = enforcement_lot[
    (enforcement_lot['datetime'] >= '2022-07-01') &
    (enforcement_lot['datetime'] <= '2025-06-30')
].copy()
print(f"\nData 2022-2025: {len(enforcement_2022_2025):,} records")
print(f"Unique lots: {enforcement_2022_2025['Lot_number'].nunique()}")
print(f"Date range: {enforcement_2022_2025['datetime'].min()} to {enforcement_2022_2025['datetime'].max()}")
# Save
output_file = '../../data/processed/enforcement_lot_level.csv'
enforcement_2022_2025.to_csv(output_file, index=False)
print(f"\nSaved: {output_file}")


Data 2022-2025: 4,993,390 records
Unique lots: 190
Date range: 2022-07-01 00:00:00 to 2025-06-30 00:00:00

Saved: ../../data/processed/enforcement_lot_level.csv
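
The recorded maximum of `2025-06-30 00:00:00` comes from how a date-only string is compared against timestamps: it is treated as midnight. The sketch below shows the difference between an inclusive date-string bound and a half-open bound on a short hourly series; the variable names are illustrative.

```python
import pandas as pd

# 26 hourly timestamps spanning the end of June 2025.
hours = pd.Series(pd.date_range("2025-06-29 22:00", "2025-06-30 23:00", freq="h"))

# '2025-06-30' means 2025-06-30 00:00:00, so only midnight of the last day survives.
inclusive_string = hours[hours <= "2025-06-30"]

# A half-open bound on the next day keeps every hour of June 30.
half_open = hours[hours < "2025-07-01"]

print(len(inclusive_string), inclusive_string.max())
print(len(half_open), half_open.max())
```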


## Extend to October 2025
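Estimate LPR scans for July-October 2025 from historical averages per lot, day of week, hour, and month. Note that in this run the hourly grid stops at 2025-06-30 23:00, so the July-October slice selected in the second cell is empty (0 records) and no estimates are produced; the grid's `end_date` would need to reach October 2025 for this step to have any effect.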

In [15]:
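# Calculate historical LPR patterns by lot, day of week, hour, and month, using data
# through 2024 (the '2024-12-31' bound is midnight, so only hour 00:00 of Dec 31 is included)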
lpr_historical = enforcement_2022_2025[
    (enforcement_2022_2025['datetime'] >= '2022-07-01') &
    (enforcement_2022_2025['datetime'] <= '2024-12-31')
].copy()
lpr_patterns = lpr_historical.groupby(['Lot_number', 'day_of_week', 'hour', 'month']).agg({
    'lpr_scans': 'mean',
    'date': 'nunique'
}).reset_index()
lpr_patterns.columns = ['Lot_number', 'day_of_week', 'hour', 'month', 'avg_lpr_scans', 'num_dates']
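# Only use patterns seen on at least 2 distinct dates. Because the grid holds a row
# (possibly with 0 scans) for every lot-hour, every combination passes this filter here:
# 190 lots x 2,016 (7 days x 24 hours x 12 months) = 383,040.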
lpr_patterns = lpr_patterns[lpr_patterns['num_dates'] >= 2]
print(f"Historical LPR patterns: {len(lpr_patterns):,} lot-day-hour-month combinations")
print(f"Patterns per lot: {lpr_patterns.groupby('Lot_number').size().describe()}")

Historical LPR patterns: 383,040 lot-day-hour-month combinations
Patterns per lot: count     190.0
mean     2016.0
std         0.0
min      2016.0
25%      2016.0
50%      2016.0
75%      2016.0
max      2016.0
dtype: float64
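
To make the pattern-based estimate concrete, here is a minimal sketch on made-up history: average scans per lot, day of week, hour, and month, keep patterns seen on at least two dates, then left-merge onto future rows. The scan counts are invented, and treating lot-hours with no pattern as 0 scans is an assumption of this sketch.

```python
import pandas as pd

def add_keys(df):
    """Add the grouping keys used for the historical patterns."""
    df = df.copy()
    df["date"] = df["datetime"].dt.date
    df["day_of_week"] = df["datetime"].dt.dayofweek
    df["hour"] = df["datetime"].dt.hour
    df["month"] = df["datetime"].dt.month
    return df

# Made-up history: three July Mondays at 11:00 for lot 150, one for lot 71.
history = add_keys(pd.DataFrame({
    "Lot_number": [150, 150, 150, 71],
    "datetime": pd.to_datetime([
        "2023-07-03 11:00", "2024-07-01 11:00", "2024-07-08 11:00", "2024-07-01 11:00",
    ]),
    "lpr_scans": [10, 20, 30, 8],
}))

keys = ["Lot_number", "day_of_week", "hour", "month"]
patterns = (
    history.groupby(keys)
    .agg(avg_lpr_scans=("lpr_scans", "mean"), num_dates=("date", "nunique"))
    .reset_index()
)
patterns = patterns[patterns["num_dates"] >= 2]  # lot 71 has one date and is dropped

# Future rows to fill: a July Monday at 11:00 for both lots.
future = add_keys(pd.DataFrame({
    "Lot_number": [150, 71],
    "datetime": pd.to_datetime(["2025-07-07 11:00", "2025-07-07 11:00"]),
}))
future = future.merge(patterns[keys + ["avg_lpr_scans"]], on=keys, how="left")
future["lpr_estimated"] = future["avg_lpr_scans"].notna()
future["lpr_scans"] = future["avg_lpr_scans"].fillna(0).round().astype(int)
print(future[["Lot_number", "datetime", "lpr_scans", "lpr_estimated"]])
```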


In [16]:
# Create July-October 2025 data
july_oct_2025 = enforcement_lot[
    (enforcement_lot['datetime'] >= '2025-07-01') &
    (enforcement_lot['datetime'] <= '2025-10-30')
].copy()
print(f"July-Oct 2025 before estimation: {len(july_oct_2025):,} records")
# Merge with historical patterns
july_oct_2025 = july_oct_2025.merge(
    lpr_patterns[['Lot_number', 'day_of_week', 'hour', 'month', 'avg_lpr_scans']],
    on=['Lot_number', 'day_of_week', 'hour', 'month'],
    how='left'
)
# Use estimated LPR where we have patterns
has_pattern = july_oct_2025['avg_lpr_scans'].notna()
july_oct_2025.loc[has_pattern, 'lpr_scans'] = july_oct_2025.loc[has_pattern, 'avg_lpr_scans'].round().astype(int)
july_oct_2025.loc[has_pattern, 'lpr_estimated'] = True
# Recalculate unpaid estimate and enforcement rate (using LPR as proxy)
july_oct_2025['unpaid_estimate'] = july_oct_2025['lpr_scans']
july_oct_2025['enforcement_rate'] = 0.0
mask = july_oct_2025['lpr_scans'] > 0
july_oct_2025.loc[mask, 'enforcement_rate'] = (
    july_oct_2025.loc[mask, 'tickets_issued'] / july_oct_2025.loc[mask, 'lpr_scans']
)
july_oct_2025 = july_oct_2025.drop('avg_lpr_scans', axis=1)
estimated_count = july_oct_2025['lpr_estimated'].sum()
print(f"\nEstimated LPR for {estimated_count:,} lot-hours ({estimated_count/len(july_oct_2025)*100:.1f}%)")
print(f"Total LPR scans July-Oct 2025: {july_oct_2025['lpr_scans'].sum():,}")
print(f"Total tickets July-Oct 2025: {july_oct_2025['tickets_issued'].sum():,}")

July-Oct 2025 before estimation: 0 records

Estimated LPR for 0 lot-hours (nan%)
Total LPR scans July-Oct 2025: 0
Total tickets July-Oct 2025: 0
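
The "0 records" result above is a consequence of the hourly grid, not of the estimation logic. The sketch below compares the grid as built in this notebook with a hypothetical grid whose end is moved to 2025-10-30 23:00; the helper `july_oct_hours` and the extended end date are assumptions for illustration.

```python
import pandas as pd

# Grid as built in the notebook, and a hypothetical grid extended through October.
grid_to_june = pd.date_range("2022-07-01 00:00", "2025-06-30 23:00", freq="h")
grid_to_oct = pd.date_range("2022-07-01 00:00", "2025-10-30 23:00", freq="h")

def july_oct_hours(index):
    """Return the timestamps falling in 2025-07-01 through 2025-10-30 (half-open bound)."""
    return index[(index >= "2025-07-01") & (index < "2025-10-31")]

print(len(grid_to_june), len(july_oct_hours(grid_to_june)))
print(len(grid_to_oct), len(july_oct_hours(grid_to_oct)))
```

With the original end date the July-October selection has no timestamps at all, so every downstream total is 0 and the percentage is a division by zero.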


## Combine and Save Full Extended Dataset

In [17]:
# Combine 2022-2025 with July-Oct 2025 extended
enforcement_full_extended = pd.concat([
    enforcement_2022_2025,
    july_oct_2025
], ignore_index=True)
# Sort by lot and datetime
enforcement_full_extended = enforcement_full_extended.sort_values(['Lot_number', 'datetime'])
print(f"\nFull extended dataset: {len(enforcement_full_extended):,} records")
print(f"Unique lots: {enforcement_full_extended['Lot_number'].nunique()}")
print(f"Date range: {enforcement_full_extended['datetime'].min()} to {enforcement_full_extended['datetime'].max()}")
print(f"\nRecords by year:")
for year in sorted(enforcement_full_extended['year'].unique()):
    count = len(enforcement_full_extended[enforcement_full_extended['year'] == year])
    print(f"  {year}: {count:,} records")
# Save
output_file_extended = '../../data/processed/enforcement_lot_level_extended.csv'
enforcement_full_extended.to_csv(output_file_extended, index=False)
print(f"\nSaved: {output_file_extended}")


Full extended dataset: 4,993,390 records
Unique lots: 190
Date range: 2022-07-01 00:00:00 to 2025-06-30 00:00:00

Records by year:
  2022: 839,040 records
  2023: 1,664,400 records
  2024: 1,668,960 records
  2025: 820,990 records

Saved: ../../data/processed/enforcement_lot_level_extended.csv


## Sample of Lot-Specific Patterns

In [18]:
# Show enforcement patterns for key paid lots
key_lots = [150, 71, 146]  # CUE Garage, Library Garage, Student Rec Center
lot_names = {150: 'CUE Garage', 71: 'Library Garage', 146: 'Student Rec Center'}
for lot in key_lots:
    if lot not in enforcement_full_extended['Lot_number'].values:
        continue
    lot_data = enforcement_full_extended[enforcement_full_extended['Lot_number'] == lot]
    tickets_total = lot_data['tickets_issued'].sum()
    lpr_total = lot_data['lpr_scans'].sum()
    hours_with_tickets = (lot_data['tickets_issued'] > 0).sum()
    print(f"\nLot {lot} - {lot_names.get(lot, 'Unknown')}:")
    print(f"  Total tickets: {tickets_total:,}")
    print(f"  Total LPR scans: {lpr_total:,}")
    print(f"  Hours with enforcement: {hours_with_tickets:,} ({hours_with_tickets/len(lot_data)*100:.1f}%)")
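    # Monday 11 AM pattern: share of Monday 11:00 hours with at least one ticket
    # (this differs from the tickets / scans enforcement_rate column)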
    monday_11am = lot_data[(lot_data['day_of_week'] == 0) & (lot_data['hour'] == 11)]
    if len(monday_11am) > 0:
        enf_rate = (monday_11am['tickets_issued'] > 0).mean()
        print(f"  Monday 11 AM enforcement rate: {enf_rate*100:.1f}%")


Lot 150 - CUE Garage:
  Total tickets: 14,258
  Total LPR scans: 179,315
  Hours with enforcement: 3,556 (13.5%)
  Monday 11 AM enforcement rate: 51.3%

Lot 71 - Library Garage:
  Total tickets: 13,418
  Total LPR scans: 120,700
  Hours with enforcement: 3,473 (13.2%)
  Monday 11 AM enforcement rate: 48.1%

Lot 146 - Student Rec Center:
  Total tickets: 6,137
  Total LPR scans: 88,426
  Hours with enforcement: 1,054 (4.0%)
  Monday 11 AM enforcement rate: 11.5%


## Summary
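Created lot-level enforcement data:
- Each lot has separate hourly records (190 lots, July 2022 through June 30, 2025)
- Includes tickets and LPR scans per lot; `amp_sessions` is a placeholder set to 0 because AMP data is zone-level only
- The extension through October 2025 produced no rows in this run, so the extended file holds the same 4,993,390 records as the base file
- Ready for lot-specific predictions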