# Temporal Trends: Analyzing Time-Based Patterns

This notebook explores temporal patterns in MTA Bus Automated Camera Enforcement violation data, including hourly, daily, and weekday/weekend trends. We will visualize peaks, state a hypothesis, and summarize findings.

## 1. Import Required Libraries

We will use pandas and NumPy for data handling, and matplotlib and seaborn for visualization.

In [7]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='whitegrid')

## 2. Load the MTA Bus Automated Camera Enforcement Violations Dataset

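We will load the first 10,000 rows of the dataset (`nrows=10000`) and print the columns to identify time-related fields. These rows are not a random sample: if the export is ordered by time, they may cover only a short window.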

In [8]:
# Load the MTA Bus Automated Camera Enforcement Violations dataset
violations_url = "https://data.ny.gov/api/views/kh8p-hcbm/rows.csv?accessType=DOWNLOAD"
violations_df = pd.read_csv(violations_url, nrows=10000)

print('Columns in violations_df:')
print(violations_df.columns.tolist())
display(violations_df.head())

Columns in violations_df:
['Violation ID', 'Vehicle ID', 'First Occurrence', 'Last Occurrence', 'Violation Status', 'Violation Type', 'Bus Route ID', 'Violation Latitude', 'Violation Longitude', 'Stop ID', 'Stop Name', 'Bus Stop Latitude', 'Bus Stop Longitude', 'Violation Georeference', 'Bus Stop Georeference']


|   | Violation ID | Vehicle ID | First Occurrence | Last Occurrence | Violation Status | Violation Type | Bus Route ID | Violation Latitude | Violation Longitude | Stop ID | Stop Name | Bus Stop Latitude | Bus Stop Longitude | Violation Georeference | Bus Stop Georeference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 489749182 | c5ae1411153b52556a1e648cc80d718aa519a4bdd189ab... | 08/20/2025 11:12:08 PM | 08/21/2025 12:24:08 AM | TECHNICAL ISSUE/OTHER | MOBILE BUS STOP | BX36 | 40.840509 | -73.881189 | 102498 | EAST TREMONT AV/VYSE AV | 40.841076 | -73.882483 | POINT (-73.881189 40.840509) | POINT (-73.882483 40.841076) |
| 1 | 489744714 | df9044acf85cf55488aea4cd3ce1d0e17ef050551726b6... | 08/20/2025 11:48:59 PM | 08/20/2025 11:54:47 PM | EXEMPT - BUS/PARATRANSIT | MOBILE BUS STOP | BX28 | 40.874017 | -73.890646 | 100080 | PAUL AV/BEDFORD PARK BLVD | 40.874629 | -73.891539 | POINT (-73.890646 40.874017) | POINT (-73.891539 40.874629) |
| 2 | 489743631 | eb5a337966ba65f66ab1db8e169d2446a4fb429b0efc63... | 08/20/2025 10:33:13 PM | 08/20/2025 11:56:02 PM | TECHNICAL ISSUE/OTHER | MOBILE DOUBLE PARKED | Q53+ | 40.721971 | -73.867136 | 550473 | WOODHAVEN BLVD/PENELOPE AV | 40.722487 | -73.867736 | POINT (-73.867136 40.721971) | POINT (-73.867736 40.722487) |
| 3 | 489741945 | 3f877f70d9b253515a945be807c9c62d5814949f810310... | 08/20/2025 10:50:45 PM | 08/20/2025 11:32:43 PM | EXEMPT - OTHER | MOBILE BUS STOP | Q44+ | 40.762529 | -73.831728 | 501140 | UNION ST/35 AV | 40.765422 | -73.827944 | POINT (-73.831728 40.762529) | POINT (-73.827944 40.765422) |
| 4 | 489741940 | 7feac037b62d591ffb1214e356157f3dd197fc22fee5bb... | 08/20/2025 10:52:57 AM | 08/20/2025 11:16:57 AM | EXEMPT - EMERGENCY VEHICLE | MOBILE BUS STOP | M101 | 40.815113 | -73.95504 | 401458 | AMSTERDAM AV/W 131 ST | 40.816009 | -73.954424 | POINT (-73.95504 40.815113) | POINT (-73.954424 40.816009) |
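Because `nrows=10000` reads only the first rows of the export, any temporal pattern is limited to whatever period those rows happen to cover. A minimal sketch of an alternative, assuming a rough random sample is acceptable: `read_csv` accepts a callable for `skiprows`, so each data row can be kept with probability `fraction`. The helper name `read_csv_row_sample`, the seed, and the synthetic CSV in the demo are illustrative. This approach still streams the whole file, trading download time for coverage of the full period.

```python
import io
import random

import pandas as pd


def read_csv_row_sample(source, fraction, seed=0, **read_csv_kwargs):
    """Read roughly `fraction` of the data rows, spread across the whole file."""
    rng = random.Random(seed)
    # skiprows accepts a callable on the row index; row 0 (the header) is always kept
    return pd.read_csv(
        source,
        skiprows=lambda i: i > 0 and rng.random() >= fraction,
        **read_csv_kwargs,
    )


# Synthetic 1,000-row CSV standing in for the real export
demo_csv = "Violation ID,First Occurrence\n" + "\n".join(
    f"{i},08/20/2025 11:12:08 PM" for i in range(1000)
)
sampled = read_csv_row_sample(io.StringIO(demo_csv), fraction=0.1)
print(len(sampled), list(sampled.columns))
```

Against the real export this would be called as `read_csv_row_sample(violations_url, fraction=0.01)`, with the fraction chosen to fit available memory.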


In [9]:
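# Diagnostic: check non-null and unique counts for likely time columns.
# 'occurrence' is included because this dataset stores timestamps in
# 'First Occurrence' and 'Last Occurrence'.
likely_time_keys = ['date', 'time', 'day', 'hour', 'weekday', 'occurrence']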
for key in likely_time_keys:
    matches = [col for col in violations_df.columns if key in col.lower()]
    for col in matches:
        non_null = violations_df[col].notnull().sum()
        unique = violations_df[col].nunique()
        print(f"{col}: non-null={non_null}, unique={unique}")
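Name-based searches like the one above only work when column names contain an expected keyword; here the timestamps sit in `First Occurrence` and `Last Occurrence`, which none of the generic keywords `date`, `time`, `day`, `hour`, or `weekday` match. A content-based check avoids depending on names. The sketch below assumes the `%m/%d/%Y %I:%M:%S %p` format seen in the preview; the function name `find_datetime_columns` and the 90% threshold are illustrative choices.

```python
import pandas as pd


def find_datetime_columns(df: pd.DataFrame, fmt: str, min_fraction: float = 0.9) -> list:
    """Return columns whose non-null values mostly parse with the given datetime format."""
    found = []
    for col in df.columns:
        values = df[col].dropna()
        # Numeric columns (IDs, coordinates) cannot hold timestamp strings
        if values.empty or pd.api.types.is_numeric_dtype(values):
            continue
        parsed = pd.to_datetime(values.astype(str), format=fmt, errors='coerce')
        if parsed.notna().mean() >= min_fraction:
            found.append(col)
    return found


# Two rows copied from the preview above, with other columns trimmed
preview_cols = pd.DataFrame({
    'Violation ID': [489749182, 489744714],
    'First Occurrence': ['08/20/2025 11:12:08 PM', '08/20/2025 11:48:59 PM'],
    'Last Occurrence': ['08/21/2025 12:24:08 AM', '08/20/2025 11:54:47 PM'],
    'Stop Name': ['EAST TREMONT AV/VYSE AV', 'PAUL AV/BEDFORD PARK BLVD'],
})
print(find_datetime_columns(preview_cols, '%m/%d/%Y %I:%M:%S %p'))
```

On the notebook's data the same call would be `find_datetime_columns(violations_df, '%m/%d/%Y %I:%M:%S %p')`, run before the columns are converted in Section 3.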

## 3. Extract and Parse Time Columns

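Timestamps in this dataset are stored as text in `First Occurrence` and `Last Occurrence` (12-hour clock, e.g. `08/20/2025 11:12:08 PM`), so we will parse them before deriving hour, weekday, and date features for temporal analysis.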

In [10]:
# Parse the violation timestamps and derive calendar features.
# The dataset has no column whose name contains 'date' or 'time'; timestamps are
# stored as text in 'First Occurrence' / 'Last Occurrence', e.g. '08/20/2025 11:12:08 PM'.
# 'First Occurrence' is used as the violation time here.
date_col = 'First Occurrence'
if date_col in violations_df.columns:
    # Explicit 12-hour format (%I with %p); unparseable values become NaT
    violations_df[date_col] = pd.to_datetime(
        violations_df[date_col], format='%m/%d/%Y %I:%M:%S %p', errors='coerce'
    )
    violations_df['hour'] = violations_df[date_col].dt.hour
    violations_df['dayofweek'] = violations_df[date_col].dt.dayofweek  # Monday=0
    violations_df['weekday'] = violations_df['dayofweek'].map(
        {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
    )
    violations_df['date_only'] = violations_df[date_col].dt.date
    print(f"Unparsed timestamps in {date_col}: {violations_df[date_col].isna().sum()}")
else:
    print(f"Column {date_col!r} not found.")
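Parsing with an explicit format matters here because the timestamps use a 12-hour clock: `%I` combined with `%p` maps `12:24:08 AM` to hour 0 and `11:12:08 PM` to hour 23. A minimal sketch of a reusable helper, assuming the format seen in the preview; `add_time_features` and the `is_weekend` flag are illustrative names, and the three timestamps are copied from the preview rows.

```python
import pandas as pd

# Assumed from the preview rows: month/day/year, 12-hour clock with AM/PM
TIMESTAMP_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def add_time_features(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of df with hour, weekday, date and weekend flag derived from df[col]."""
    out = df.copy()
    ts = pd.to_datetime(out[col], format=TIMESTAMP_FORMAT, errors='coerce')
    out['hour'] = ts.dt.hour                   # 0-23
    out['weekday'] = ts.dt.day_name().str[:3]  # 'Mon' ... 'Sun'
    out['date_only'] = ts.dt.date
    out['is_weekend'] = ts.dt.dayofweek >= 5   # Saturday=5, Sunday=6
    return out


# Timestamps copied from the preview above (First and Last Occurrence values)
preview_ts = pd.DataFrame({'timestamp': [
    '08/20/2025 11:12:08 PM',
    '08/20/2025 10:52:57 AM',
    '08/21/2025 12:24:08 AM',
]})
preview_features = add_time_features(preview_ts, 'timestamp')
print(preview_features[['timestamp', 'hour', 'weekday', 'is_weekend']])
```

Returning a copy keeps the raw strings intact, which makes it easy to inspect any rows that failed to parse.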


## 4. Visualize Temporal Patterns

We will visualize hourly, daily, and weekday/weekend trends in violation counts.

In [11]:
# Hourly trend
if 'hour' in violations_df.columns:
    plt.figure(figsize=(10,4))
    sns.histplot(violations_df['hour'].dropna(), discrete=True)  # one bar per hour, 0-23
    plt.title('Violations by Hour of Day')
    plt.xlabel('Hour')
    plt.ylabel('Count')
    plt.show()
else:
    print('Hour column not available.')

# Daily trend
if 'date_only' in violations_df.columns:
    daily_counts = violations_df['date_only'].value_counts().sort_index()
    plt.figure(figsize=(12,4))
    daily_counts.plot()
    plt.title('Violations by Date')
    plt.xlabel('Date')
    plt.ylabel('Count')
    plt.tight_layout()
    plt.show()
else:
    print('Date column not available.')

# Weekday trend
if 'weekday' in violations_df.columns:
    weekday_counts = violations_df['weekday'].value_counts().reindex(['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])
    plt.figure(figsize=(8,4))
    sns.barplot(x=weekday_counts.index, y=weekday_counts.values)
    plt.title('Violations by Weekday')
    plt.xlabel('Weekday')
    plt.ylabel('Count')
    plt.show()
else:
    print('Weekday column not available.')

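Per-weekday totals from a short window mostly reflect how many of each day the window contains, and weekdays outnumber weekend days five to two. A fairer weekday-versus-weekend comparison is the mean number of violations per calendar day, counting days with no violations as zero. The sketch below assumes a column of parsed timestamps; the function name `weekday_weekend_daily_mean` and the synthetic timestamps in the demo are illustrative.

```python
import pandas as pd


def weekday_weekend_daily_mean(timestamps: pd.Series) -> pd.Series:
    """Mean violations per calendar day, for weekdays vs weekend days."""
    ts = pd.to_datetime(timestamps).dropna()
    daily = ts.dt.normalize().value_counts()
    # Include days with no violations as zeros so they count in the mean
    all_days = pd.date_range(daily.index.min(), daily.index.max(), freq='D')
    daily = daily.reindex(all_days, fill_value=0)
    is_weekend = daily.index.dayofweek >= 5  # Saturday=5, Sunday=6
    return pd.Series({
        'weekday': daily[~is_weekend].mean(),
        'weekend': daily[is_weekend].mean(),
    })


# Synthetic timestamps: Fri 15 Aug 2025 (3), Sat 16 Aug (1), Sun 17 Aug (none), Mon 18 Aug (5)
demo_ts = pd.Series([
    '2025-08-15 08:00', '2025-08-15 09:30', '2025-08-15 17:45',
    '2025-08-16 12:00',
    '2025-08-18 07:15', '2025-08-18 08:20', '2025-08-18 09:05',
    '2025-08-18 16:40', '2025-08-18 18:10',
])
demo_means = weekday_weekend_daily_mean(demo_ts)
print(demo_means)
```

On the notebook's data this would be `weekday_weekend_daily_mean(violations_df['First Occurrence'])` after parsing. If the sampled window contains no weekend days, the weekend mean comes back as `NaN`; partial first and last days in a truncated sample also pull the means down, so they may be worth trimming.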


## 5. Fallback: Aggregate by Any Available Time Columns

If standard time columns are missing, aggregate by any available time-related columns.

In [14]:
# Fallback: only needed if Section 3 could not derive the standard time features.
if 'hour' not in violations_df.columns:
    # 'occurrence' catches 'First Occurrence' / 'Last Occurrence' in this dataset
    time_keys = ['date', 'time', 'day', 'hour', 'weekday', 'occurrence']
    fallback_keys = [col for col in violations_df.columns
                     if any(k in col.lower() for k in time_keys)]
    if not fallback_keys:
        print('No time-related columns found in the dataset. '
              'Check the raw data or column names for time information.')
    for key in fallback_keys:
        counts = violations_df[key].value_counts().head(20)
        print(f"Top 20 values of {key}:")
        print(counts)
        plt.figure(figsize=(10, 4))
        # Cast labels to str so timestamp values plot as categories
        sns.barplot(x=counts.values, y=counts.index.astype(str), orient='h')
        plt.title(f'Top 20 {key} values by violation count')
        plt.xlabel('Violation count')
        plt.ylabel(key)
        plt.tight_layout()
        plt.show()


## 6. Summary, Hypothesis, and Actionable Insights

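- **Hypothesis:** Violation counts are not spread evenly over time; they peak at certain hours, dates, or days of the week (including weekday versus weekend differences), likely reflecting traffic conditions, enforcement coverage, or bus operating schedules.
- **Key Findings:**
    - Summarize peaks and patterns by hour, date, and weekday.
    - Note limitations: the analysis uses the first 10,000 rows of the export rather than a random sample, counts include records with exempt or technical-issue statuses (see `Violation Status`), and the `MOBILE ...` violation types suggest detections depend on when camera-equipped buses are in service.
- **Actionable Insights:**
    - Recommend targeted enforcement or outreach during peak violation periods.
    - Suggest further data collection or cleaning if time columns are sparse or inconsistent.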

*Edit this cell after running the analysis to reflect your actual findings.*
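The hypothesis can be checked against a flat baseline before interpreting individual peaks. The sketch below is a Monte Carlo version of Pearson's chi-square goodness-of-fit test: it estimates how often evenly spread violations would produce hour-of-day counts at least as uneven as the observed ones. It uses only NumPy; the function name `hourly_uniformity_test`, the simulation count, and the seed are illustrative, and the demo inputs are synthetic. Rejecting uniformity says only that counts vary by hour, not why.

```python
import numpy as np


def hourly_uniformity_test(hours, n_sims=10_000, seed=0):
    """Monte Carlo chi-square test of 'violations are spread evenly over 24 hours'."""
    hours = np.asarray(hours, dtype=int)
    observed = np.bincount(hours, minlength=24)
    n = observed.sum()
    expected = n / 24
    stat = ((observed - expected) ** 2 / expected).sum()
    # Simulate hour counts under the uniform null and compare statistics
    rng = np.random.default_rng(seed)
    sims = rng.multinomial(n, np.full(24, 1 / 24), size=n_sims)
    sim_stats = ((sims - expected) ** 2 / expected).sum(axis=1)
    # Add-one correction keeps the estimated p-value above zero
    p_value = (np.sum(sim_stats >= stat) + 1) / (n_sims + 1)
    return stat, p_value


# Synthetic inputs: perfectly even hours vs a strong 8 AM spike
flat_stat, flat_p = hourly_uniformity_test(list(range(24)) * 10)
peaked_stat, peaked_p = hourly_uniformity_test([8] * 200 + list(range(24)))
print(flat_stat, flat_p)
print(peaked_stat, peaked_p)
```

Applied to the notebook's data this would be `hourly_uniformity_test(violations_df['hour'].dropna())`. Because a 10,000-row slice may cover only part of a day or week, a small p-value there can reflect the sampled window as much as any underlying pattern.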