# Traffic Flow Analysis: Urban Analytics & Traffic Engineering Fundamentals

## Overview

This notebook provides a comprehensive introduction to traffic flow analysis using real-world urban intersection data. We'll explore fundamental traffic engineering concepts and apply quantitative methods to understand traffic patterns, identify congestion, and support transportation planning decisions.

## Dataset Description

Our dataset contains **hourly traffic measurements** from multiple urban intersections, including:
- **Intersection identifiers** and names
- **Temporal information**: date, hour, day of week
- **Traffic metrics**: vehicle counts (vehicles/hour), average speeds (mph), and intersection capacity (vehicles/hour)

## Methods & Concepts

We'll apply key traffic engineering principles:

### Volume-to-Capacity (V/C) Ratio
A fundamental metric comparing traffic demand to available capacity. Higher ratios indicate more congestion.

### Level of Service (LOS)
A qualitative measure describing operational conditions ranging from A (best) to F (worst):
- **LOS A**: V/C < 0.60 - Free flow, unrestricted operations
- **LOS B**: 0.60 ≤ V/C < 0.70 - Stable flow, slight delays
- **LOS C**: 0.70 ≤ V/C < 0.80 - Stable but restricted flow
- **LOS D**: 0.80 ≤ V/C < 0.90 - Approaching unstable conditions
- **LOS E**: 0.90 ≤ V/C ≤ 1.00 - Unstable flow, at capacity
- **LOS F**: V/C > 1.00 - Forced flow, breakdown conditions

### Speed-Flow Relationship
How traffic volume and average speed move together. Under uncongested conditions, speed typically decreases as volume increases. Once demand exceeds capacity and flow breaks down, speed and volume both fall.
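To illustrate this behavior, here is a minimal sketch of Greenshields' linear speed-density model, a classic idealized textbook model. It is not fitted to this dataset, and the free-flow speed and jam density values are illustrative assumptions.

```python
import numpy as np


def greenshields_flow(density, free_flow_speed, jam_density):
    """Return (speed, flow) under Greenshields' linear speed-density model.

    speed = v_f * (1 - k / k_j), flow = k * speed
    Units: density in veh/mi, speeds in mph, flow in veh/h.
    """
    density = np.asarray(density, dtype=float)
    speed = free_flow_speed * (1.0 - density / jam_density)
    flow = density * speed
    return speed, flow


# Illustrative parameters (assumptions, not estimated from our data)
v_f, k_j = 40.0, 200.0
densities = np.linspace(0, k_j, 201)
speeds, flows = greenshields_flow(densities, v_f, k_j)

# In this model, maximum flow (capacity) occurs at half the jam density
i_max = np.argmax(flows)
print(
    f"Capacity ~ {flows[i_max]:.0f} veh/h at {speeds[i_max]:.0f} mph "
    f"(density {densities[i_max]:.0f} veh/mi)"
)
```

Below half the jam density, adding vehicles raises flow while lowering speed. Above it, both speed and flow decline, which corresponds to the breakdown branch.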

### Peak Hour Factor (PHF)
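Measures how evenly traffic is distributed within the peak hour: PHF = (hourly volume) / (4 × peak 15-minute volume). Values near 1.0 indicate uniform flow. Lower values point to a sharp short-term surge, which helps identify the most critical intervals.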
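The following is a minimal sketch of the PHF calculation. It assumes you have four consecutive 15-minute counts for the peak hour, which this notebook's hourly dataset does not include, and the example counts are made up for illustration.

```python
def peak_hour_factor(counts_15min):
    """PHF = V / (4 * V15_max) for four consecutive 15-minute counts."""
    if len(counts_15min) != 4:
        raise ValueError("Expected exactly four 15-minute counts for one hour")
    hourly_volume = sum(counts_15min)
    peak_15 = max(counts_15min)
    if peak_15 <= 0:
        raise ValueError("Peak 15-minute count must be positive")
    return hourly_volume / (4 * peak_15)


# Hypothetical counts: 400, 450, 500, 450 vehicles per 15 minutes
print(peak_hour_factor([400, 450, 500, 450]))  # 1800 / (4 * 500) = 0.9
```

PHF is bounded between 0.25 (all of the hour's traffic in one 15-minute interval) and 1.0 (identical counts in every interval).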

### Temporal Patterns
Understanding time-of-day variations, weekday vs. weekend patterns, and AM/PM peak periods.
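If a dataset provides only dates, you can derive the day-of-week and weekday/weekend labels used throughout this notebook with pandas' datetime accessor. This is a minimal sketch. The column names `date`, `day_of_week`, and `day_type` mirror the ones used later, but the helper function is hypothetical.

```python
import pandas as pd


def add_calendar_fields(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with day-of-week name and Weekday/Weekend flag from `date`."""
    out = frame.copy()
    dates = pd.to_datetime(out["date"])
    out["day_of_week"] = dates.dt.day_name()
    # dt.dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
    out["day_type"] = (dates.dt.dayofweek >= 5).map({True: "Weekend", False: "Weekday"})
    return out


example = pd.DataFrame({"date": ["2024-01-05", "2024-01-06", "2024-01-08"]})
print(add_calendar_fields(example))
```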

## 1. Setup & Import Libraries

We'll use standard Python data science libraries for our analysis.

In [None]:
# Data manipulation and numerical computing
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical analysis
from scipy.stats import pearsonr

# Configure visualization settings
plt.style.use("seaborn-v0_8-darkgrid")
sns.set_palette("husl")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 10

# Display settings
pd.set_option("display.max_columns", None)
pd.set_option("display.precision", 2)

print("Libraries imported successfully!")

## 2. Load and Explore Data

First, we'll load our traffic data and perform initial exploration to understand the dataset structure and coverage.
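Before starting the analysis, it can help to confirm that the columns this notebook relies on are present and contain plausible values. This is a minimal sketch. The specific checks and the function name are illustrative assumptions, not part of the original workflow.

```python
import pandas as pd

# Columns referenced by the analysis cells in this notebook
REQUIRED_COLUMNS = [
    "intersection_id", "intersection_name", "date", "hour",
    "day_of_week", "vehicle_count", "avg_speed_mph", "capacity",
]


def validate_traffic_data(frame: pd.DataFrame) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        return [f"Missing columns: {missing}"]
    problems = []
    # between() is False for NaN, so missing hours are flagged here too
    if not frame["hour"].between(0, 23).all():
        problems.append("hour values outside 0-23 (or missing)")
    if (frame["vehicle_count"] < 0).any():
        problems.append("negative vehicle counts")
    if (frame["capacity"] <= 0).any():
        problems.append("non-positive capacity (V/C would be undefined)")
    return problems
```

In the notebook, you might call `validate_traffic_data(df)` right after loading and review any reported problems before continuing.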

In [None]:
# Load the traffic data
# Note: Update the path to match your data location
df = pd.read_csv("traffic_data.csv")

# Convert date to datetime for time series analysis
df["date"] = pd.to_datetime(df["date"])

# Display first few rows
print("First 5 rows of traffic data:")
print(df.head())
print("\n" + "=" * 80 + "\n")

# Dataset dimensions
print(f"Dataset shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
print(f"Total days: {df['date'].nunique()}")

In [None]:
# Dataset information
print("Dataset Information:")
df.info()  # info() prints directly and returns None, so don't wrap it in print()
print("\n" + "=" * 80 + "\n")

# Check for missing values
print("Missing values:")
print(df.isnull().sum())
print("\n" + "=" * 80 + "\n")

# Basic statistics
print("Summary statistics:")
print(df.describe())

In [None]:
# Intersection information
print("Unique Intersections:")
print(f"Total number of intersections: {df['intersection_id'].nunique()}\n")

intersection_summary = (
    df.groupby(["intersection_id", "intersection_name"])
    .agg({"vehicle_count": ["count", "mean", "max"], "avg_speed_mph": "mean", "capacity": "first"})
    .round(1)
)

intersection_summary.columns = [
    "Observations",
    "Avg_Vehicles",
    "Max_Vehicles",
    "Avg_Speed",
    "Capacity",
]
print(intersection_summary)

print("\n" + "=" * 80 + "\n")
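print("Day of Week Distribution:")
# Reindex to calendar order (sort_index would sort day names alphabetically)
print(
    df["day_of_week"]
    .value_counts()
    .reindex(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])
)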

## 3. Traffic Volume Analysis

Traffic volume (vehicle count) is the fundamental measure of demand. We'll analyze hourly patterns and identify typical volume trends throughout the day.
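Network-wide averages can hide intersections whose peak falls at a different hour. The following is a minimal sketch for finding each intersection's busiest average hour with the notebook's column names. The helper name and the toy intersection names are hypothetical.

```python
import pandas as pd


def peak_hour_by_intersection(frame: pd.DataFrame) -> pd.DataFrame:
    """Return each intersection's hour with the highest mean vehicle count."""
    hourly = (
        frame.groupby(["intersection_name", "hour"])["vehicle_count"]
        .mean()
        .reset_index()
    )
    # idxmax returns the row label of the largest mean within each intersection
    idx = hourly.groupby("intersection_name")["vehicle_count"].idxmax()
    return hourly.loc[idx].set_index("intersection_name")


toy_counts = pd.DataFrame({
    "intersection_name": ["Main & 1st"] * 4 + ["Oak & 5th"] * 4,
    "hour": [8, 8, 17, 17, 8, 8, 17, 17],
    "vehicle_count": [900, 1000, 700, 800, 500, 600, 1100, 1200],
})
print(peak_hour_by_intersection(toy_counts))
```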

In [None]:
# Calculate average hourly volumes across all intersections
hourly_volumes = df.groupby("hour")["vehicle_count"].agg(["mean", "std", "min", "max"]).round(1)

print("Average Hourly Traffic Volumes:")
print(hourly_volumes)
print(
    f"\nPeak hour: {hourly_volumes['mean'].idxmax()}:00 with {hourly_volumes['mean'].max():.0f} vehicles"
)
print(
    f"Lowest hour: {hourly_volumes['mean'].idxmin()}:00 with {hourly_volumes['mean'].min():.0f} vehicles"
)

In [None]:
# Visualize hourly volume patterns
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Plot 1: Average hourly volumes with a ±1 standard deviation band
hourly_avg = df.groupby("hour")["vehicle_count"].mean()
hourly_std = df.groupby("hour")["vehicle_count"].std()

axes[0].plot(
    hourly_avg.index, hourly_avg.values, marker="o", linewidth=2, markersize=6, label="Mean Volume"
)
axes[0].fill_between(
    hourly_avg.index,
    hourly_avg - hourly_std,
    hourly_avg + hourly_std,
    alpha=0.3,
    label="±1 Std Dev",
)
axes[0].set_xlabel("Hour of Day", fontsize=12, fontweight="bold")
axes[0].set_ylabel("Vehicle Count", fontsize=12, fontweight="bold")
axes[0].set_title("Average Hourly Traffic Volume Pattern", fontsize=14, fontweight="bold")
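axes[0].grid(True, alpha=0.3)
axes[0].set_xticks(range(0, 24))

# Add peak period annotations
# Busiest hour within each peak window (label-based .loc slices include both ends)
am_peak_hour = hourly_avg.loc[6:9].idxmax()
pm_peak_hour = hourly_avg.loc[16:19].idxmax()
# Hour h covers h:00-h:59, so hours 6-9 span 6:00-10:00 and hours 16-19 span 16:00-20:00
axes[0].axvspan(6, 10, alpha=0.1, color="orange", label="AM Peak Period")
axes[0].axvspan(16, 20, alpha=0.1, color="red", label="PM Peak Period")
axes[0].axvline(am_peak_hour, color="orange", linestyle=":", linewidth=1.5)
axes[0].axvline(pm_peak_hour, color="red", linestyle=":", linewidth=1.5)
axes[0].legend()  # called after axvspan so the peak-period labels appear in the legend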

# Plot 2: Volume by day of week
daily_volumes = (
    df.groupby("day_of_week")["vehicle_count"]
    .mean()
    .reindex(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])
)

colors = ["#1f77b4"] * 5 + ["#ff7f0e", "#ff7f0e"]  # Different colors for weekends
axes[1].bar(
    daily_volumes.index, daily_volumes.values, color=colors, edgecolor="black", linewidth=1.5
)
axes[1].set_xlabel("Day of Week", fontsize=12, fontweight="bold")
axes[1].set_ylabel("Average Vehicle Count", fontsize=12, fontweight="bold")
axes[1].set_title("Average Hourly Traffic Volume by Day of Week", fontsize=14, fontweight="bold")
axes[1].grid(True, alpha=0.3, axis="y")
axes[1].tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()

weekday_mean = daily_volumes.iloc[:5].mean()
weekend_mean = daily_volumes.iloc[5:].mean()
print(f"\nWeekday average: {weekday_mean:.0f} vehicles/hour")
print(f"Weekend average: {weekend_mean:.0f} vehicles/hour")
print(f"Weekday/Weekend ratio: {weekday_mean / weekend_mean:.2f}x")

## 4. Speed Analysis

Average speed is a key indicator of traffic flow quality. According to traffic flow theory, speed and volume typically have an **inverse relationship** - as more vehicles use the road, speeds decrease due to increased interactions between vehicles.
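Pearson's r, which this section uses, measures *linear* association, whereas the speed-flow curve is typically nonlinear. Spearman's rank correlation is a common cross-check for monotonic relationships. This sketch adds that check and is not part of the original analysis. It computes Spearman's rho as Pearson's r on the ranks, which avoids any extra dependency.

```python
import pandas as pd


def compare_correlations(volume: pd.Series, speed: pd.Series) -> dict:
    """Return Pearson (linear) and Spearman (rank-based) correlations."""
    pearson = volume.corr(speed)  # pandas' default method is Pearson
    # Spearman's rho is Pearson's r computed on ranks (ties get average ranks)
    spearman = volume.rank().corr(speed.rank())
    return {"pearson": pearson, "spearman": spearman}


# Toy data: speed drops faster at high volumes (nonlinear but monotonic)
volume = pd.Series([200, 400, 600, 800, 1000, 1200])
speed = pd.Series([35.0, 34.5, 33.5, 31.0, 26.0, 18.0])
print(compare_correlations(volume, speed))
```

A Spearman value noticeably stronger than the Pearson value suggests a consistent but curved relationship.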

In [None]:
# Speed statistics
print("Speed Statistics:")
print(f"Mean speed: {df['avg_speed_mph'].mean():.1f} mph")
print(f"Median speed: {df['avg_speed_mph'].median():.1f} mph")
print(f"Speed range: {df['avg_speed_mph'].min():.1f} - {df['avg_speed_mph'].max():.1f} mph")
print(f"Standard deviation: {df['avg_speed_mph'].std():.1f} mph")

# Hourly speed patterns
print("\n" + "=" * 80 + "\n")
hourly_speeds = df.groupby("hour")["avg_speed_mph"].mean()
print("Average Speed by Hour:")
print(hourly_speeds.round(1))

In [None]:
# Volume-Speed Relationship Analysis
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Scatter plot: Volume vs Speed
sample_data = df.sample(min(5000, len(df)), random_state=42)  # Sample for clearer visualization
axes[0].scatter(
    sample_data["vehicle_count"], sample_data["avg_speed_mph"], alpha=0.3, s=20, edgecolors="none"
)

# Add trend line
z = np.polyfit(df["vehicle_count"], df["avg_speed_mph"], 2)
p = np.poly1d(z)
x_trend = np.linspace(df["vehicle_count"].min(), df["vehicle_count"].max(), 100)
axes[0].plot(x_trend, p(x_trend), "r-", linewidth=2, label="Trend (2nd order polynomial)")

axes[0].set_xlabel("Vehicle Count (vehicles/hour)", fontsize=12, fontweight="bold")
axes[0].set_ylabel("Average Speed (mph)", fontsize=12, fontweight="bold")
axes[0].set_title("Speed-Flow Relationship\n(Inverse Correlation)", fontsize=14, fontweight="bold")
axes[0].grid(True, alpha=0.3)
axes[0].legend()

# Calculate correlation
correlation, p_value = pearsonr(df["vehicle_count"], df["avg_speed_mph"])
axes[0].text(
    0.05,
    0.95,
    f"Correlation: {correlation:.3f}\np-value: {p_value:.2e}",
    transform=axes[0].transAxes,
    fontsize=10,
    verticalalignment="top",
    bbox={"boxstyle": "round", "facecolor": "wheat", "alpha": 0.5},
)

# Hourly speed pattern
hourly_speed_mean = df.groupby("hour")["avg_speed_mph"].mean()
hourly_speed_std = df.groupby("hour")["avg_speed_mph"].std()

axes[1].plot(
    hourly_speed_mean.index,
    hourly_speed_mean.values,
    marker="s",
    linewidth=2,
    markersize=6,
    color="green",
    label="Mean Speed",
)
axes[1].fill_between(
    hourly_speed_mean.index,
    hourly_speed_mean - hourly_speed_std,
    hourly_speed_mean + hourly_speed_std,
    alpha=0.3,
    color="green",
)
axes[1].set_xlabel("Hour of Day", fontsize=12, fontweight="bold")
axes[1].set_ylabel("Average Speed (mph)", fontsize=12, fontweight="bold")
axes[1].set_title("Average Speed Throughout the Day", fontsize=14, fontweight="bold")
axes[1].grid(True, alpha=0.3)
axes[1].set_xticks(range(0, 24))
axes[1].legend()

# Highlight peak periods
axes[1].axvspan(6, 10, alpha=0.1, color="orange")
axes[1].axvspan(16, 20, alpha=0.1, color="red")

plt.tight_layout()
plt.show()

print(f"\nSpeed-Volume Correlation: {correlation:.3f}")
strength = "Strong" if abs(correlation) > 0.7 else "Moderate" if abs(correlation) > 0.4 else "Weak"
direction = "negative" if correlation < 0 else "positive"
print(f"Interpretation: {strength} {direction} correlation")
print(f"Statistical significance: p-value = {p_value:.2e}")

## 5. Congestion Metrics: V/C Ratio and Level of Service

The **Volume-to-Capacity (V/C) ratio** is one of the most widely used measures of congestion. It compares observed traffic volume to the intersection's capacity, which is the maximum hourly flow the intersection can serve.

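We'll calculate V/C ratios and assign **Level of Service (LOS)** grades using the V/C thresholds defined above. These grades are a simplified, V/C-based scheme: the Highway Capacity Manual grades signalized-intersection LOS mainly by control delay.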
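The cell below assigns LOS with a row-wise `apply`. For large datasets, a vectorized alternative with `numpy.select` produces the same grades from the same thresholds, with E including V/C = 1.00 exactly. This is a sketch, and the function name is hypothetical. As with the `apply` version, missing V/C values fall through to "F".

```python
import numpy as np
import pandas as pd


def assign_los_vectorized(vc: pd.Series) -> pd.Series:
    """Map V/C ratios to LOS grades A-F using the notebook's thresholds."""
    conditions = [vc < 0.60, vc < 0.70, vc < 0.80, vc < 0.90, vc <= 1.00]
    grades = ["A", "B", "C", "D", "E"]
    # np.select takes the first matching condition; anything left over is F
    return pd.Series(np.select(conditions, grades, default="F"), index=vc.index)


vc = pd.Series([0.55, 0.60, 0.85, 1.00, 1.05])
print(assign_los_vectorized(vc).tolist())  # ['A', 'B', 'D', 'E', 'F']
```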

In [None]:
# Calculate V/C ratio
df["vc_ratio"] = df["vehicle_count"] / df["capacity"]


# Assign Level of Service (LOS) based on V/C ratio
def assign_los(vc_ratio):
    """Assign Level of Service grade based on V/C ratio"""
    if vc_ratio < 0.60:
        return "A"
    elif vc_ratio < 0.70:
        return "B"
    elif vc_ratio < 0.80:
        return "C"
    elif vc_ratio < 0.90:
        return "D"
    elif vc_ratio <= 1.00:
        return "E"
    else:
        return "F"


df["los"] = df["vc_ratio"].apply(assign_los)

# Summary statistics
print("Volume-to-Capacity (V/C) Ratio Statistics:")
print(f"Mean V/C: {df['vc_ratio'].mean():.3f}")
print(f"Median V/C: {df['vc_ratio'].median():.3f}")
print(f"Max V/C: {df['vc_ratio'].max():.3f}")
print(
    f"\nPercentage over capacity (V/C > 1.0): {(df['vc_ratio'] > 1.0).sum() / len(df) * 100:.1f}%"
)
print(f"Percentage near capacity (V/C > 0.9): {(df['vc_ratio'] > 0.9).sum() / len(df) * 100:.1f}%")

print("\n" + "=" * 80 + "\n")
print("Level of Service (LOS) Distribution:")
los_dist = df["los"].value_counts().sort_index()
los_pct = (los_dist / len(df) * 100).round(1)
los_summary = pd.DataFrame({"Count": los_dist, "Percentage": los_pct})
print(los_summary)

# LOS descriptions
los_descriptions = {
    "A": "Free flow - minimal delays",
    "B": "Stable flow - slight delays",
    "C": "Stable but restricted",
    "D": "Approaching unstable",
    "E": "Unstable - at capacity",
    "F": "Forced flow - breakdown",
}

print("\n" + "=" * 80 + "\n")
print("LOS Grade Descriptions:")
for grade, desc in los_descriptions.items():
    print(f"  LOS {grade}: {desc}")

In [None]:
# Visualize V/C ratio and LOS
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: V/C Ratio Distribution
axes[0, 0].hist(df["vc_ratio"], bins=50, edgecolor="black", alpha=0.7, color="steelblue")
axes[0, 0].axvline(
    df["vc_ratio"].mean(),
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {df['vc_ratio'].mean():.2f}",
)
axes[0, 0].axvline(1.0, color="darkred", linestyle="-", linewidth=2, label="Capacity (V/C=1.0)")
axes[0, 0].set_xlabel("V/C Ratio", fontsize=12, fontweight="bold")
axes[0, 0].set_ylabel("Frequency", fontsize=12, fontweight="bold")
axes[0, 0].set_title("Distribution of Volume-to-Capacity Ratios", fontsize=14, fontweight="bold")
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: LOS Distribution
los_colors = {
    "A": "#2ecc71",
    "B": "#f1c40f",
    "C": "#e67e22",
    "D": "#e74c3c",
    "E": "#c0392b",
    "F": "#8b0000",
}
los_data = df["los"].value_counts().sort_index()
colors = [los_colors[grade] for grade in los_data.index]
axes[0, 1].bar(los_data.index, los_data.values, color=colors, edgecolor="black", linewidth=1.5)
axes[0, 1].set_xlabel("Level of Service", fontsize=12, fontweight="bold")
axes[0, 1].set_ylabel("Count", fontsize=12, fontweight="bold")
axes[0, 1].set_title("Level of Service Distribution", fontsize=14, fontweight="bold")
axes[0, 1].grid(True, alpha=0.3, axis="y")

# Add percentage labels on bars
for i, (_, count) in enumerate(los_data.items()):
    pct = count / len(df) * 100
    axes[0, 1].text(i, count, f"{pct:.1f}%", ha="center", va="bottom", fontweight="bold")

# Plot 3: Average V/C by Hour
hourly_vc = df.groupby("hour")["vc_ratio"].mean()
axes[1, 0].plot(
    hourly_vc.index, hourly_vc.values, marker="o", linewidth=2, markersize=6, color="darkblue"
)
axes[1, 0].axhline(
    y=0.90, color="orange", linestyle="--", linewidth=2, label="LOS E threshold (0.90)"
)
axes[1, 0].axhline(y=1.00, color="red", linestyle="--", linewidth=2, label="Capacity (1.00)")
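axes[1, 0].fill_between(hourly_vc.index, 0.90, 1.00, alpha=0.2, color="orange")
# Shade the over-capacity zone only if some hourly average actually exceeds 1.0
if hourly_vc.max() > 1.00:
    axes[1, 0].fill_between(hourly_vc.index, 1.00, hourly_vc.max(), alpha=0.2, color="red")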
axes[1, 0].set_xlabel("Hour of Day", fontsize=12, fontweight="bold")
axes[1, 0].set_ylabel("Average V/C Ratio", fontsize=12, fontweight="bold")
axes[1, 0].set_title("Average V/C Ratio by Hour", fontsize=14, fontweight="bold")
axes[1, 0].set_xticks(range(0, 24))
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Average V/C by Day of Week (bars colored by congestion level)
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
daily_vc = df.groupby("day_of_week")["vc_ratio"].mean().reindex(day_order)
bar_colors = [
    "red" if vc > 0.9 else "orange" if vc > 0.8 else "yellow" if vc > 0.7 else "green"
    for vc in daily_vc.values
]
axes[1, 1].bar(
    range(len(daily_vc)), daily_vc.values, color=bar_colors, edgecolor="black", linewidth=1.5
)
axes[1, 1].axhline(y=0.90, color="orange", linestyle="--", linewidth=2, alpha=0.7)
axes[1, 1].set_xlabel("Day of Week", fontsize=12, fontweight="bold")
axes[1, 1].set_ylabel("Average V/C Ratio", fontsize=12, fontweight="bold")
axes[1, 1].set_title("Average V/C Ratio by Day of Week", fontsize=14, fontweight="bold")
axes[1, 1].set_xticks(range(len(day_order)))
axes[1, 1].set_xticklabels(day_order, rotation=45, ha="right")
axes[1, 1].grid(True, alpha=0.3, axis="y")

plt.tight_layout()
plt.show()

## 6. Time-of-Day Patterns: Heatmap Analysis

Heatmaps provide a comprehensive view of traffic patterns across different times and days, helping identify systematic congestion patterns.
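To turn a heatmap into a ranked list of recurring hot spots, you can stack the day-by-hour pivot table and take its largest cells. This ranks the *average* conditions for each day-hour combination rather than individual observations. The following is a minimal sketch using a small hypothetical pivot table.

```python
import pandas as pd


def top_cells(pivot: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n largest (row, column) cells of a pivot table."""
    # stack() reshapes the 2-D table into a Series indexed by (day, hour)
    return pivot.stack().nlargest(n)


toy_vc = pd.DataFrame(
    {7: [0.82, 0.45], 8: [0.97, 0.50], 17: [0.93, 0.61]},
    index=["Monday", "Saturday"],
)
print(top_cells(toy_vc, n=3))
```

Applied to `vc_pivot` from the next cell, `top_cells(vc_pivot)` would list the day-hour slots with the highest average V/C.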

In [None]:
# Create pivot tables for heatmaps
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Volume heatmap data
volume_pivot = df.pivot_table(
    values="vehicle_count", index="day_of_week", columns="hour", aggfunc="mean"
).reindex(day_order)

# V/C ratio heatmap data
vc_pivot = df.pivot_table(
    values="vc_ratio", index="day_of_week", columns="hour", aggfunc="mean"
).reindex(day_order)

# Speed heatmap data
speed_pivot = df.pivot_table(
    values="avg_speed_mph", index="day_of_week", columns="hour", aggfunc="mean"
).reindex(day_order)

# Create heatmaps
fig, axes = plt.subplots(3, 1, figsize=(16, 12))

# Heatmap 1: Traffic Volume
sns.heatmap(
    volume_pivot,
    annot=True,
    fmt=".0f",
    cmap="YlOrRd",
    cbar_kws={"label": "Vehicles/Hour"},
    ax=axes[0],
    linewidths=0.5,
)
axes[0].set_title("Average Traffic Volume by Day and Hour", fontsize=14, fontweight="bold", pad=20)
axes[0].set_xlabel("Hour of Day", fontsize=12, fontweight="bold")
axes[0].set_ylabel("Day of Week", fontsize=12, fontweight="bold")

# Heatmap 2: V/C Ratio
sns.heatmap(
    vc_pivot,
    annot=True,
    fmt=".2f",
    cmap="RdYlGn_r",
    cbar_kws={"label": "V/C Ratio"},
    ax=axes[1],
    linewidths=0.5,
    vmin=0,
    vmax=1.2,
)
axes[1].set_title(
    "Volume-to-Capacity Ratio by Day and Hour", fontsize=14, fontweight="bold", pad=20
)
axes[1].set_xlabel("Hour of Day", fontsize=12, fontweight="bold")
axes[1].set_ylabel("Day of Week", fontsize=12, fontweight="bold")

# Heatmap 3: Average Speed
sns.heatmap(
    speed_pivot,
    annot=True,
    fmt=".1f",
    cmap="RdYlGn",
    cbar_kws={"label": "Speed (mph)"},
    ax=axes[2],
    linewidths=0.5,
)
axes[2].set_title("Average Speed by Day and Hour", fontsize=14, fontweight="bold", pad=20)
axes[2].set_xlabel("Hour of Day", fontsize=12, fontweight="bold")
axes[2].set_ylabel("Day of Week", fontsize=12, fontweight="bold")

plt.tight_layout()
plt.show()

# Identify peak congestion times
print("\nPeak Congestion Times (Top 5 V/C ratios):")
peak_times = df.nlargest(5, "vc_ratio")[
    ["day_of_week", "hour", "intersection_name", "vc_ratio", "los"]
]
print(peak_times.to_string(index=False))

## 7. Rush Hour Analysis

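**Rush hours** are critical planning periods with peak traffic demand. We'll identify and quantify the AM and PM peak periods and examine how peaked demand is within them. The **Peak Hour Factor (PHF)** measures flow-rate variability *within* the peak hour and strictly requires 15-minute counts. Because our data are hourly, we compute a rough proxy instead.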
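The period definitions used below (AM Peak = hours 6-9, Midday = hours 10-15, PM Peak = hours 16-19, all other hours Off-Peak) can also be expressed as a single vectorized mapping. This avoids overwriting labels in several passes. The following is a sketch using `numpy.select`, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def label_period(hours: pd.Series) -> pd.Series:
    """Label each hour (0-23) as AM Peak, Midday, PM Peak, or Off-Peak."""
    conditions = [
        hours.between(6, 9),    # 06:00-09:59
        hours.between(10, 15),  # 10:00-15:59
        hours.between(16, 19),  # 16:00-19:59
    ]
    choices = ["AM Peak", "Midday", "PM Peak"]
    return pd.Series(np.select(conditions, choices, default="Off-Peak"), index=hours.index)


print(label_period(pd.Series([5, 6, 12, 19, 20])).tolist())
# ['Off-Peak', 'AM Peak', 'Midday', 'PM Peak', 'Off-Peak']
```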

In [None]:
# Define time periods (hour h covers h:00-h:59); all other hours remain Off-Peak
df["period"] = "Off-Peak"
df.loc[df["hour"].between(6, 9), "period"] = "AM Peak"
df.loc[df["hour"].between(10, 15), "period"] = "Midday"
df.loc[df["hour"].between(16, 19), "period"] = "PM Peak"

# Peak period statistics
period_stats = (
    df.groupby("period")
    .agg(
        {
            "vehicle_count": ["mean", "max", "std"],
            "avg_speed_mph": "mean",
            "vc_ratio": "mean",
            "los": lambda x: x.mode()[0] if len(x.mode()) > 0 else "N/A",
        }
    )
    .round(2)
)

period_stats.columns = [
    "Avg_Volume",
    "Max_Volume",
    "Std_Volume",
    "Avg_Speed",
    "Avg_VC",
    "Modal_LOS",
]
period_order = ["AM Peak", "Midday", "PM Peak", "Off-Peak"]
period_stats = period_stats.reindex(period_order)

print("Traffic Statistics by Time Period:")
print(period_stats)
print("\n" + "=" * 80 + "\n")

# Peak Hour Factor (PHF)
# True PHF = (hourly volume) / (4 × peak 15-minute volume), which requires 15-minute counts.
# With hourly data only, we compute a peaking ratio instead: mean / max hourly volume
# within each peak period (across all intersections and days). This is a rough proxy, not true PHF.
am_peak_data = df[df["period"] == "AM Peak"]
pm_peak_data = df[df["period"] == "PM Peak"]

am_phf = am_peak_data["vehicle_count"].mean() / am_peak_data["vehicle_count"].max()
pm_phf = pm_peak_data["vehicle_count"].mean() / pm_peak_data["vehicle_count"].max()

print("Peaking Ratio (mean / max hourly volume, a rough PHF proxy):")
print(f"AM Peak: {am_phf:.3f}")
print(f"PM Peak: {pm_phf:.3f}")
print("\nInterpretation:")
print("  Values closer to 1.0 indicate more uniform volumes across the peak period.")
print("  Lower values indicate that a few hours, days, or intersections carry much higher volumes.")
print("  True PHF (from 15-minute counts) ranges from 0.25 to 1.0; typical urban values are 0.80-0.95.")

In [None]:
# Visualize rush hour patterns
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Volume by period
period_volumes = df.groupby("period")["vehicle_count"].mean().reindex(period_order)
colors_period = ["#ff9999", "#66b3ff", "#ff6666", "#99ff99"]
axes[0, 0].bar(
    period_volumes.index,
    period_volumes.values,
    color=colors_period,
    edgecolor="black",
    linewidth=1.5,
)
axes[0, 0].set_ylabel("Average Vehicle Count", fontsize=12, fontweight="bold")
axes[0, 0].set_title("Average Traffic Volume by Time Period", fontsize=14, fontweight="bold")
axes[0, 0].grid(True, alpha=0.3, axis="y")
axes[0, 0].tick_params(axis="x", rotation=45)

# Plot 2: V/C ratio by period
period_vc = df.groupby("period")["vc_ratio"].mean().reindex(period_order)
axes[0, 1].bar(
    period_vc.index, period_vc.values, color=colors_period, edgecolor="black", linewidth=1.5
)
axes[0, 1].axhline(y=0.90, color="red", linestyle="--", linewidth=2, label="LOS E threshold")
axes[0, 1].set_ylabel("Average V/C Ratio", fontsize=12, fontweight="bold")
axes[0, 1].set_title("Average V/C Ratio by Time Period", fontsize=14, fontweight="bold")
axes[0, 1].grid(True, alpha=0.3, axis="y")
axes[0, 1].tick_params(axis="x", rotation=45)
axes[0, 1].legend()

# Plot 3: Box plot of volumes by period
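period_data = [df[df["period"] == p]["vehicle_count"].values for p in period_order]
bp = axes[1, 0].boxplot(period_data, patch_artist=True, showmeans=True)
# Set tick labels separately: boxplot's `labels` keyword was renamed `tick_labels` in Matplotlib 3.9
axes[1, 0].set_xticks(range(1, len(period_order) + 1))
axes[1, 0].set_xticklabels(period_order)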
for patch, color in zip(bp["boxes"], colors_period):
    patch.set_facecolor(color)
axes[1, 0].set_ylabel("Vehicle Count", fontsize=12, fontweight="bold")
axes[1, 0].set_title("Traffic Volume Distribution by Period", fontsize=14, fontweight="bold")
axes[1, 0].grid(True, alpha=0.3, axis="y")
axes[1, 0].tick_params(axis="x", rotation=45)

# Plot 4: LOS distribution by period
los_by_period = pd.crosstab(df["period"], df["los"], normalize="index") * 100
los_by_period = los_by_period.reindex(period_order)
los_by_period.plot(
    kind="bar",
    stacked=True,
    ax=axes[1, 1],
    color=[los_colors.get(col, "gray") for col in los_by_period.columns],
    edgecolor="black",
    linewidth=0.5,
)
axes[1, 1].set_ylabel("Percentage (%)", fontsize=12, fontweight="bold")
axes[1, 1].set_title("Level of Service Distribution by Period", fontsize=14, fontweight="bold")
axes[1, 1].legend(title="LOS", bbox_to_anchor=(1.05, 1), loc="upper left")
axes[1, 1].tick_params(axis="x", rotation=45)
axes[1, 1].grid(True, alpha=0.3, axis="y")

plt.tight_layout()
plt.show()

## 8. Intersection Comparison

Different intersections may experience varying levels of congestion due to location, design, or surrounding land use. We'll compare performance metrics across all monitored intersections.
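Averages can mask how often an intersection actually operates near capacity. A complementary metric is the share of observed hours at LOS E or F, meaning V/C ≥ 0.90 under this notebook's thresholds. The following is a minimal sketch on hypothetical data, and the function name is illustrative.

```python
import pandas as pd


def share_congested(frame: pd.DataFrame, threshold: float = 0.90) -> pd.Series:
    """Fraction of observations per intersection with V/C >= threshold."""
    congested = frame["vc_ratio"] >= threshold
    # The mean of a boolean Series is the fraction of True values
    return (
        congested.groupby(frame["intersection_name"])
        .mean()
        .sort_values(ascending=False)
    )


toy_obs = pd.DataFrame({
    "intersection_name": ["Main & 1st"] * 4 + ["Oak & 5th"] * 4,
    "vc_ratio": [0.95, 1.02, 0.70, 0.88, 0.60, 0.91, 0.55, 0.50],
})
print(share_congested(toy_obs))
```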

In [None]:
# Intersection-level summary statistics
intersection_stats = (
    df.groupby(["intersection_id", "intersection_name"])
    .agg(
        {
            "vehicle_count": ["mean", "max", "std"],
            "avg_speed_mph": "mean",
            "vc_ratio": ["mean", "max"],
            "capacity": "first",
            "los": lambda x: x.mode()[0] if len(x.mode()) > 0 else "N/A",
        }
    )
    .round(2)
)

intersection_stats.columns = [
    "Avg_Volume",
    "Peak_Volume",
    "Volume_Std",
    "Avg_Speed",
    "Avg_VC",
    "Peak_VC",
    "Capacity",
    "Modal_LOS",
]
intersection_stats = intersection_stats.sort_values("Avg_VC", ascending=False)

print("Intersection Performance Comparison:")
print("(Sorted by Average V/C Ratio - Most Congested First)")
print(intersection_stats)
print("\n" + "=" * 80 + "\n")

# Identify problem intersections
problem_threshold = 0.85
problem_intersections = intersection_stats[intersection_stats["Avg_VC"] > problem_threshold]
print(f"\nProblem Intersections (Avg V/C > {problem_threshold}):")
print(f"Count: {len(problem_intersections)}")
if len(problem_intersections) > 0:
    print(problem_intersections[["Avg_VC", "Peak_VC", "Modal_LOS"]])

In [None]:
# Visualize intersection comparisons
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Get intersection names for x-axis
intersection_names = intersection_stats.index.get_level_values("intersection_name")
x_pos = np.arange(len(intersection_names))

# Plot 1: Average V/C by intersection
colors_vc = [
    "red" if v > 0.9 else "orange" if v > 0.8 else "yellow" if v > 0.7 else "green"
    for v in intersection_stats["Avg_VC"].values
]
axes[0, 0].barh(x_pos, intersection_stats["Avg_VC"].values, color=colors_vc, edgecolor="black")
axes[0, 0].set_yticks(x_pos)
axes[0, 0].set_yticklabels(intersection_names, fontsize=9)
axes[0, 0].set_xlabel("Average V/C Ratio", fontsize=12, fontweight="bold")
axes[0, 0].set_title(
    "Average V/C Ratio by Intersection\n(Most Congested at Top)", fontsize=14, fontweight="bold"
)
axes[0, 0].axvline(x=0.9, color="red", linestyle="--", linewidth=2, alpha=0.5)
axes[0, 0].grid(True, alpha=0.3, axis="x")
axes[0, 0].invert_yaxis()

# Plot 2: Average vs Peak V/C
axes[0, 1].scatter(
    intersection_stats["Avg_VC"],
    intersection_stats["Peak_VC"],
    s=200,
    alpha=0.6,
    c=colors_vc,
    edgecolors="black",
    linewidth=2,
)
axes[0, 1].plot([0, 1.2], [0, 1.2], "k--", alpha=0.3, label="y = x (Peak = Average)")
axes[0, 1].axhline(y=1.0, color="red", linestyle="--", linewidth=2, alpha=0.5, label="Capacity")
axes[0, 1].axvline(
    x=0.9, color="orange", linestyle="--", linewidth=2, alpha=0.5, label="LOS E threshold"
)
axes[0, 1].set_xlabel("Average V/C Ratio", fontsize=12, fontweight="bold")
axes[0, 1].set_ylabel("Peak V/C Ratio", fontsize=12, fontweight="bold")
axes[0, 1].set_title("Average vs Peak V/C Ratio by Intersection", fontsize=14, fontweight="bold")
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Annotate points
for idx, name in enumerate(intersection_names):
    axes[0, 1].annotate(
        name,
        (intersection_stats["Avg_VC"].iloc[idx], intersection_stats["Peak_VC"].iloc[idx]),
        fontsize=8,
        alpha=0.7,
        xytext=(5, 5),
        textcoords="offset points",
    )

# Plot 3: Traffic volume comparison
axes[1, 0].barh(
    x_pos, intersection_stats["Avg_Volume"].values, color="steelblue", edgecolor="black", alpha=0.7
)
axes[1, 0].set_yticks(x_pos)
axes[1, 0].set_yticklabels(intersection_names, fontsize=9)
axes[1, 0].set_xlabel("Average Vehicles per Hour", fontsize=12, fontweight="bold")
axes[1, 0].set_title("Average Traffic Volume by Intersection", fontsize=14, fontweight="bold")
axes[1, 0].grid(True, alpha=0.3, axis="x")
axes[1, 0].invert_yaxis()

# Plot 4: LOS distribution by intersection
los_by_intersection = pd.crosstab(df["intersection_name"], df["los"], normalize="index") * 100
los_by_intersection = los_by_intersection.reindex(intersection_names)
los_by_intersection.plot(
    kind="barh",
    stacked=True,
    ax=axes[1, 1],
    color=[los_colors.get(col, "gray") for col in los_by_intersection.columns],
    edgecolor="black",
    linewidth=0.5,
)
axes[1, 1].set_xlabel("Percentage (%)", fontsize=12, fontweight="bold")
axes[1, 1].set_title("LOS Distribution by Intersection", fontsize=14, fontweight="bold")
axes[1, 1].legend(title="LOS", bbox_to_anchor=(1.05, 1), loc="upper left")
axes[1, 1].grid(True, alpha=0.3, axis="x")
axes[1, 1].invert_yaxis()  # match the other panels: most congested intersection at top

plt.tight_layout()
plt.show()

## 9. Summary Statistics & Key Metrics Dashboard

A comprehensive summary of all key traffic metrics and performance indicators.

In [None]:
# Calculate comprehensive summary statistics
print("=" * 80)
print(" " * 20 + "TRAFFIC FLOW ANALYSIS SUMMARY")
print("=" * 80)
print()

# Dataset overview
print("DATASET OVERVIEW")
print("-" * 80)
print(
    f"Analysis Period: {df['date'].min().strftime('%Y-%m-%d')} to {df['date'].max().strftime('%Y-%m-%d')}"
)
print(f"Total Observations: {len(df):,}")
print(f"Intersections Monitored: {df['intersection_id'].nunique()}")
print(f"Days Analyzed: {df['date'].nunique()}")
print()

# Traffic volume metrics
print("TRAFFIC VOLUME METRICS")
print("-" * 80)
print(f"Average Hourly Volume: {df['vehicle_count'].mean():.0f} vehicles")
print(f"Peak Hourly Volume: {df['vehicle_count'].max():.0f} vehicles")
print(f"Total Vehicles Counted: {df['vehicle_count'].sum():,.0f}")
print(
    f"Average Daily Volume (all intersections combined): {df.groupby('date')['vehicle_count'].sum().mean():,.0f} vehicles"
)
print()

# Speed metrics
print("SPEED METRICS")
print("-" * 80)
print(f"Average Speed: {df['avg_speed_mph'].mean():.1f} mph")
print(f"Minimum Speed: {df['avg_speed_mph'].min():.1f} mph")
print(f"Maximum Speed: {df['avg_speed_mph'].max():.1f} mph")
print(f"Speed Standard Deviation: {df['avg_speed_mph'].std():.1f} mph")
correlation, p_value = pearsonr(df["vehicle_count"], df["avg_speed_mph"])
print(f"Volume-Speed Correlation: {correlation:.3f} (p = {p_value:.2e})")
print()

# Congestion metrics
print("CONGESTION METRICS")
print("-" * 80)
print(f"Average V/C Ratio: {df['vc_ratio'].mean():.3f}")
print(f"Maximum V/C Ratio: {df['vc_ratio'].max():.3f}")
over_cap = (df["vc_ratio"] > 1.0).sum()
near_cap = (df["vc_ratio"] > 0.9).sum()
good_flow = (df["vc_ratio"] < 0.7).sum()
print(
    f"Intersection-Hours Over Capacity (V/C > 1.0): {over_cap:,} ({over_cap / len(df) * 100:.1f}%)"
)
print(
    f"Intersection-Hours Near Capacity (V/C > 0.9): {near_cap:,} ({near_cap / len(df) * 100:.1f}%)"
)
print(
    f"Intersection-Hours with Good Flow (V/C < 0.7): {good_flow:,} ({good_flow / len(df) * 100:.1f}%)"
)
print()

# Level of Service distribution
print("LEVEL OF SERVICE DISTRIBUTION")
print("-" * 80)
los_counts = df["los"].value_counts().sort_index()
for grade in ["A", "B", "C", "D", "E", "F"]:
    if grade in los_counts.index:
        count = los_counts[grade]
        pct = count / len(df) * 100
        print(f"  LOS {grade}: {count:6,} observations ({pct:5.1f}%) - {los_descriptions[grade]}")
    else:
        print(f"  LOS {grade}:      0 observations (  0.0%) - {los_descriptions[grade]}")
print()

# Peak period analysis
print("PEAK PERIOD ANALYSIS")
print("-" * 80)
am_peak = df[df["period"] == "AM Peak"]
pm_peak = df[df["period"] == "PM Peak"]
print("AM Peak (6:00-10:00):")
print(f"  Average Volume: {am_peak['vehicle_count'].mean():.0f} vehicles")
print(f"  Average V/C: {am_peak['vc_ratio'].mean():.3f}")
print(f"  Average Speed: {am_peak['avg_speed_mph'].mean():.1f} mph")
print(f"  Modal LOS: {am_peak['los'].mode()[0] if len(am_peak['los'].mode()) > 0 else 'N/A'}")
print("PM Peak (16:00-20:00):")
print(f"  Average Volume: {pm_peak['vehicle_count'].mean():.0f} vehicles")
print(f"  Average V/C: {pm_peak['vc_ratio'].mean():.3f}")
print(f"  Average Speed: {pm_peak['avg_speed_mph'].mean():.1f} mph")
print(f"  Modal LOS: {pm_peak['los'].mode()[0] if len(pm_peak['los'].mode()) > 0 else 'N/A'}")
print()

# Most congested locations
print("TOP 3 MOST CONGESTED INTERSECTIONS")
print("-" * 80)
top_congested = (
    df.groupby("intersection_name")["vc_ratio"].mean().sort_values(ascending=False).head(3)
)
for rank, (name, vc) in enumerate(top_congested.items(), 1):
    print(f"{rank}. {name}: Average V/C = {vc:.3f}")
print()

# Temporal patterns
print("TEMPORAL PATTERNS")
print("-" * 80)
peak_hour = df.groupby("hour")["vehicle_count"].mean().idxmax()
peak_day = df.groupby("day_of_week")["vehicle_count"].mean().idxmax()
print(f"Busiest Hour: {peak_hour}:00")
print(f"Busiest Day: {peak_day}")
weekday_avg = df[df["day_of_week"].isin(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"])][
    "vehicle_count"
].mean()
weekend_avg = df[df["day_of_week"].isin(["Saturday", "Sunday"])]["vehicle_count"].mean()
print(f"Weekday vs Weekend Volume Ratio: {weekday_avg / weekend_avg:.2f}x")
print()

print("=" * 80)

In [None]:
# Create a visual dashboard
fig = plt.figure(figsize=(16, 10))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

# Metric 1: Overall LOS distribution (pie chart)
ax1 = fig.add_subplot(gs[0, 0])
los_counts = df["los"].value_counts().sort_index()
colors_pie = [los_colors[grade] for grade in los_counts.index]
ax1.pie(
    los_counts.values,
    labels=los_counts.index,
    autopct="%1.1f%%",
    colors=colors_pie,
    startangle=90,
    textprops={"fontsize": 10, "fontweight": "bold"},
)
ax1.set_title("Overall LOS\nDistribution", fontsize=12, fontweight="bold")

# Metric 2: Average V/C by hour (line)
ax2 = fig.add_subplot(gs[0, 1:])
hourly_vc = df.groupby("hour")["vc_ratio"].mean()
ax2.plot(hourly_vc.index, hourly_vc.values, marker="o", linewidth=3, markersize=8, color="darkred")
ax2.axhline(
    y=0.90, color="orange", linestyle="--", linewidth=2, alpha=0.7, label="LOS E threshold (0.90)"
)
ax2.axhline(
    y=1.00, color="red", linestyle="--", linewidth=2, alpha=0.7, label="Capacity (V/C = 1.00)"
)
ax2.fill_between(hourly_vc.index, 0, hourly_vc.values, alpha=0.3, color="darkred")
ax2.set_xlabel("Hour", fontsize=11, fontweight="bold")
ax2.set_ylabel("V/C Ratio", fontsize=11, fontweight="bold")
ax2.set_title("V/C Ratio Throughout Day", fontsize=12, fontweight="bold")
ax2.set_xticks(range(0, 24, 2))
ax2.grid(True, alpha=0.3)
ax2.legend(loc="upper left", fontsize=8)

# Metric 3: Volume by period (bar)
ax3 = fig.add_subplot(gs[1, 0])
period_vol = df.groupby("period")["vehicle_count"].mean().reindex(period_order)
ax3.bar(
    range(len(period_vol)), period_vol.values, color=colors_period, edgecolor="black", linewidth=1.5
)
ax3.set_xticks(range(len(period_order)))
ax3.set_xticklabels(period_order, rotation=45, ha="right", fontsize=9)
ax3.set_ylabel("Vehicles", fontsize=11, fontweight="bold")
ax3.set_title("Volume by Period", fontsize=12, fontweight="bold")
ax3.grid(True, alpha=0.3, axis="y")

# Metric 4: Speed vs Volume scatter
ax4 = fig.add_subplot(gs[1, 1])
sample = df.sample(min(2000, len(df)), random_state=42)
scatter = ax4.scatter(
    sample["vehicle_count"],
    sample["avg_speed_mph"],
    c=sample["vc_ratio"],
    cmap="RdYlGn_r",
    alpha=0.5,
    s=15,
    vmin=0,
    vmax=1.2,
)
ax4.set_xlabel("Volume", fontsize=11, fontweight="bold")
ax4.set_ylabel("Speed (mph)", fontsize=11, fontweight="bold")
ax4.set_title("Speed-Flow Relationship", fontsize=12, fontweight="bold")
plt.colorbar(scatter, ax=ax4, label="V/C Ratio")
ax4.grid(True, alpha=0.3)

# Metric 5: Weekday vs Weekend
ax5 = fig.add_subplot(gs[1, 2])
df["day_type"] = df["day_of_week"].apply(
    lambda x: "Weekend" if x in ["Saturday", "Sunday"] else "Weekday"
)
daytype_stats = df.groupby("day_type").agg(
    {"vehicle_count": "mean", "vc_ratio": "mean", "avg_speed_mph": "mean"}
)
x = np.arange(len(daytype_stats.index))
width = 0.25
ax5_2 = ax5.twinx()
ax5.bar(
    x - width,
    daytype_stats["vehicle_count"],
    width,
    label="Volume",
    color="steelblue",
    edgecolor="black",
)
ax5.bar(
    x, daytype_stats["vc_ratio"] * 1000, width, label="V/C×1000", color="orange", edgecolor="black"
)
ax5_2.bar(
    x + width,
    daytype_stats["avg_speed_mph"],
    width,
    label="Speed",
    color="green",
    edgecolor="black",
)
ax5.set_xticks(x)
ax5.set_xticklabels(daytype_stats.index)
ax5.set_ylabel("Volume / V/C×1000", fontsize=10, fontweight="bold")
ax5_2.set_ylabel("Speed (mph)", fontsize=10, fontweight="bold")
ax5.set_title("Weekday vs Weekend", fontsize=12, fontweight="bold")
ax5.legend(loc="upper left", fontsize=8)
ax5_2.legend(loc="upper right", fontsize=8)

# Metric 6: Key statistics table
ax6 = fig.add_subplot(gs[2, :])
ax6.axis("off")

# Create summary table data
table_data = [
    ["Metric", "Value", "Metric", "Value"],
    [
        "Avg Volume",
        f"{df['vehicle_count'].mean():.0f} veh/hr",
        "Avg Speed",
        f"{df['avg_speed_mph'].mean():.1f} mph",
    ],
    [
        "Peak Volume",
        f"{df['vehicle_count'].max():.0f} veh/hr",
        "Min Speed",
        f"{df['avg_speed_mph'].min():.1f} mph",
    ],
    [
        "Avg V/C Ratio",
        f"{df['vc_ratio'].mean():.3f}",
        "Peak Hour",
        f"{df.groupby('hour')['vehicle_count'].mean().idxmax()}:00",
    ],
    [
        "% Over Capacity",
        f"{(df['vc_ratio'] > 1.0).sum() / len(df) * 100:.1f}%",
        "Modal LOS",
        f"{df['los'].mode()[0]}",
    ],
    ["Intersections", f"{df['intersection_id'].nunique()}", "Observations", f"{len(df):,}"],
]

table = ax6.table(cellText=table_data, cellLoc="center", loc="center", bbox=[0, 0, 1, 1])
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 2)

# Style header row
for i in range(4):
    cell = table[(0, i)]
    cell.set_facecolor("#4472C4")
    cell.set_text_props(weight="bold", color="white")

# Alternate row colors
for i in range(1, len(table_data)):
    for j in range(4):
        cell = table[(i, j)]
        if i % 2 == 0:
            cell.set_facecolor("#E7E6E6")
        else:
            cell.set_facecolor("#FFFFFF")

fig.suptitle("Traffic Flow Analysis Dashboard", fontsize=18, fontweight="bold", y=0.98)
plt.show()

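The dashboard's hourly V/C line plots reference levels at 0.90 (the LOS E boundary) and 1.00 (capacity). Those crossings are easier to act on as a plain list of hours. Below is a minimal sketch, assuming an `hourly_vc` Series indexed by hour like the one built in the dashboard cell. `hours_above` is a hypothetical helper, and the demo profile is made-up illustrative data, not drawn from the dataset:

```python
import pandas as pd


def hours_above(hourly_vc: pd.Series, threshold: float) -> list[int]:
    """Return the hours (index values) whose mean V/C exceeds `threshold`."""
    return [int(h) for h in hourly_vc[hourly_vc > threshold].index]


# Illustrative profile only (not from the dataset): a modest AM and PM peak.
demo = pd.Series({7: 0.85, 8: 0.93, 9: 0.78, 16: 0.88, 17: 1.04, 18: 0.96})
print("Above LOS E boundary (0.90):", hours_above(demo, 0.90))
print("Above capacity (1.00):", hours_above(demo, 1.00))
```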
## 10. Planning Recommendations

Based on our analysis, here are key findings and recommendations for transportation planners and traffic engineers.

In [None]:
# Generate automated recommendations based on analysis
print("=" * 80)
print(" " * 20 + "PLANNING RECOMMENDATIONS")
print("=" * 80)
print()

# 1. Identify critical intersections
critical_intersections = (
    df.groupby("intersection_name")["vc_ratio"].mean().sort_values(ascending=False).head(3)
)
print("1. PRIORITY INTERSECTIONS FOR IMPROVEMENT")
print("-" * 80)
print("   The following intersections show the highest average congestion:")
print()
for rank, (name, vc) in enumerate(critical_intersections.items(), 1):
    los = df[df["intersection_name"] == name]["los"].mode()[0]
    print(f"   {rank}. {name}")
    print(f"      - Average V/C: {vc:.3f}")
    print(f"      - Modal LOS: {los}")
    if vc > 0.95:
        print("      - CRITICAL: Average V/C near or above capacity")
        print("      - Recommend: Capacity expansion or demand management")
    elif vc > 0.85:
        print("      - WARNING: Approaching capacity during peak periods")
        print("      - Recommend: Signal timing optimization and monitoring")
    print()

# 2. Peak period management
print("2. PEAK PERIOD MANAGEMENT")
print("-" * 80)
am_vc = df[df["period"] == "AM Peak"]["vc_ratio"].mean()
pm_vc = df[df["period"] == "PM Peak"]["vc_ratio"].mean()
print(f"   AM Peak Average V/C: {am_vc:.3f}")
print(f"   PM Peak Average V/C: {pm_vc:.3f}")
print()
if pm_vc > am_vc * 1.1:
    print("   PM peak experiences significantly higher congestion than AM peak.")
    print("   Recommendations:")
    print("   - Implement afternoon peak-hour pricing or restrictions")
    print("   - Promote flexible work schedules to spread PM demand")
    print("   - Consider dedicated transit lanes during PM hours")
elif am_vc > pm_vc * 1.1:
    print("   AM peak experiences significantly higher congestion than PM peak.")
    print("   Recommendations:")
    print("   - Implement morning peak-hour pricing or restrictions")
    print("   - Promote staggered work start times")
    print("   - Enhance transit service during AM hours")
else:
    print("   Both peak periods show similar congestion levels.")
    print("   Recommendations:")
    print("   - Implement all-day congestion management strategies")
    print("   - Focus on modal shift to transit, cycling, walking")
    print("   - Consider adaptive signal timing for both peaks")
print()

# 3. Speed management
print("3. SPEED AND SAFETY CONSIDERATIONS")
print("-" * 80)
low_speed_pct = (df["avg_speed_mph"] < 15).sum() / len(df) * 100
print(f"   Percentage of hours with speeds < 15 mph: {low_speed_pct:.1f}%")
if low_speed_pct > 10:
    print("   High frequency of very low speeds indicates severe congestion.")
    print("   Recommendations:")
    print("   - Investigate bottlenecks and geometric constraints")
    print("   - Consider grade-separated improvements at key locations")
    print("   - Implement active traffic management systems")
correlation = df["vehicle_count"].corr(df["avg_speed_mph"])  # Pearson r (pandas default method)
print(f"\n   Volume-Speed Correlation: {correlation:.3f}")
if correlation < -0.5:
    print("   Strong inverse relationship confirms congestion impact on speeds.")
    print("   This is typical urban behavior - focus on volume reduction strategies.")
print()

# 4. Temporal strategies
print("4. TEMPORAL DEMAND MANAGEMENT")
print("-" * 80)
weekday_vc = df[df["day_type"] == "Weekday"]["vc_ratio"].mean()
weekend_vc = df[df["day_type"] == "Weekend"]["vc_ratio"].mean()
print(f"   Weekday Average V/C: {weekday_vc:.3f}")
print(f"   Weekend Average V/C: {weekend_vc:.3f}")
print(f"   Weekday/Weekend Ratio: {weekday_vc / weekend_vc:.2f}x")
print()
if weekday_vc / weekend_vc > 1.3:
    print("   Strong weekday peak indicates commute-focused congestion.")
    print("   Recommendations:")
    print("   - Promote telecommuting and flexible work arrangements")
    print("   - Enhance public transit for commuters")
    print("   - Implement workplace travel demand management programs")
    print("   - Consider weekday-specific pricing strategies")
print()

# 5. Overall system performance
print("5. SYSTEM-WIDE PERFORMANCE TARGETS")
print("-" * 80)
los_f_pct = (df["los"] == "F").sum() / len(df) * 100
los_de_pct = (df["los"].isin(["D", "E"])).sum() / len(df) * 100
los_abc_pct = (df["los"].isin(["A", "B", "C"])).sum() / len(df) * 100
print(f"   Current LOS F (failed): {los_f_pct:.1f}%")
print(f"   Current LOS D-E (poor): {los_de_pct:.1f}%")
print(f"   Current LOS A-C (acceptable): {los_abc_pct:.1f}%")
print()
print("   Performance Targets:")
print(f"   - Reduce LOS F to < 5% (currently {los_f_pct:.1f}%)")
print(f"   - Achieve LOS A-C for > 70% of observations (currently {los_abc_pct:.1f}%)")
print("   - Maintain average V/C ratio < 0.85 system-wide")
print()

# 6. Implementation priorities
print("6. IMPLEMENTATION PRIORITIES (SHORT TO LONG TERM)")
print("-" * 80)
print("   SHORT TERM (0-1 year):")
print("   - Optimize signal timing at the top 3 congested intersections")
print("   - Implement real-time traveler information systems")
print("   - Enhance incident response and clearance procedures")
print("   - Promote alternative mode use (transit, bike, walk)")
print()
print("   MEDIUM TERM (1-3 years):")
print("   - Install adaptive traffic signal control systems")
print("   - Implement demand-based pricing strategies")
print("   - Expand transit service on congested corridors")
print("   - Develop comprehensive travel demand management (TDM) programs")
print()
print("   LONG TERM (3-10 years):")
print("   - Capacity expansion at chronically congested locations")
print("   - Major transit infrastructure improvements")
print("   - Land use policy changes to reduce travel demand")
print("   - Connected and automated vehicle integration")
print()

print("=" * 80)
print("\nNote: All recommendations should be validated with detailed engineering")
print("studies, cost-benefit analysis, and community engagement before implementation.")
print("=" * 80)

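The recommendation cell embeds two decision rules inside `print` branches. The first is an intersection-priority rule: average V/C above 0.95 is critical, and above 0.85 is a warning. The second is a peak-comparison rule: one peak counts as "significantly" worse when its V/C exceeds the other's by more than 10%. Below is a minimal sketch that pulls those same thresholds into small testable functions. The function names and the `"monitor"` label for the case the original cell leaves unhandled are hypothetical additions:

```python
def intersection_priority(avg_vc: float) -> str:
    """Classify an intersection with the thresholds used in the recommendation cell."""
    if avg_vc > 0.95:
        return "critical"  # capacity expansion or demand management
    if avg_vc > 0.85:
        return "warning"  # signal timing optimization and monitoring
    return "monitor"  # hypothetical label: the original cell prints nothing here


def dominant_peak(am_vc: float, pm_vc: float, margin: float = 0.10) -> str:
    """Return 'PM', 'AM', or 'similar' using the original 10% margin rule."""
    if pm_vc > am_vc * (1 + margin):
        return "PM"
    if am_vc > pm_vc * (1 + margin):
        return "AM"
    return "similar"


for vc in (1.02, 0.90, 0.70):
    print(f"V/C {vc:.2f} -> {intersection_priority(vc)}")
print(dominant_peak(am_vc=0.78, pm_vc=0.92))  # prints: PM
```

With the thresholds in one place, they can be revised or compared against the LOS bands in the Overview without editing the report text.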
## Conclusion

This notebook has demonstrated fundamental traffic flow analysis techniques using real-world urban intersection data. Key takeaways:

### Traffic Engineering Concepts Applied:
- **Volume-to-Capacity (V/C) Ratio**: Core congestion metric comparing demand to capacity
- **Level of Service (LOS)**: Qualitative performance grades from A (best) to F (worst)
- **Speed-Flow Relationship**: Inverse correlation between traffic volume and average speed
- **Peak Hour Analysis**: Identification and quantification of AM/PM peak demand periods
- **Temporal Patterns**: Weekday vs weekend and time-of-day variations

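The V/C-based LOS grades listed in the Overview can be reproduced with `pandas.cut`. Below is a minimal sketch using the Overview's band edges (0.60, 0.70, 0.80, 0.90, 1.00). It assumes left-closed bins, so a V/C of exactly 0.60 is graded B and exactly 1.00 is graded F. The classification used earlier in this notebook may treat those boundaries differently:

```python
import numpy as np
import pandas as pd

# Band edges from the Overview's V/C-based LOS table.
# Assumption: bins are closed on the left (0.60 -> B, 1.00 -> F).
LOS_EDGES = [0.0, 0.60, 0.70, 0.80, 0.90, 1.00, np.inf]
LOS_LABELS = ["A", "B", "C", "D", "E", "F"]


def los_from_vc(vc: pd.Series) -> pd.Series:
    """Map V/C ratios to LOS grades A-F using left-closed bins."""
    return pd.cut(vc, bins=LOS_EDGES, labels=LOS_LABELS, right=False)


ratios = pd.Series([0.45, 0.60, 0.75, 0.95, 1.00, 1.20])
print(los_from_vc(ratios).tolist())  # prints: ['A', 'B', 'C', 'E', 'F', 'F']
```

Because `pd.cut` returns an ordered categorical, `sort_values()` orders the grades from A to F rather than by raw string comparison.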
### Analytical Methods:
- Descriptive statistics and data exploration
- Time series analysis of traffic patterns
- Comparative analysis across intersections
- Visualization techniques (heatmaps, line plots, scatter plots, dashboards)
- Statistical correlation analysis

### Planning Applications:
- Identification of problem locations requiring intervention
- Understanding demand patterns to inform operational strategies
- Performance monitoring and target setting
- Data-driven decision support for infrastructure investments

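The recommendation cell sets three performance targets: LOS F below 5% of observations, LOS A-C above 70%, and system-wide average V/C below 0.85. These targets can be tracked as a small pass/fail report. Below is a minimal sketch, assuming a frame with the notebook's `los` and `vc_ratio` columns. `check_targets` is a hypothetical helper, and the demo rows are illustrative, not drawn from the dataset:

```python
import pandas as pd


def check_targets(df: pd.DataFrame) -> dict[str, bool]:
    """Evaluate the three system-wide targets; True means the target is met."""
    los_f_share = (df["los"] == "F").mean() * 100
    los_abc_share = df["los"].isin(["A", "B", "C"]).mean() * 100
    return {
        "LOS F < 5%": bool(los_f_share < 5),
        "LOS A-C > 70%": bool(los_abc_share > 70),
        "Avg V/C < 0.85": bool(df["vc_ratio"].mean() < 0.85),
    }


demo = pd.DataFrame(
    {
        "los": ["A", "B", "C", "C", "C", "F", "B", "A", "C", "E"],
        "vc_ratio": [0.50, 0.65, 0.72, 0.78, 0.76, 1.05, 0.62, 0.40, 0.74, 0.93],
    }
)
for target, met in check_targets(demo).items():
    print(f"{target}: {'met' if met else 'not met'}")
```

In this demo, one LOS F row out of ten (10%) misses the first target, while the A-C share (80%) and mean V/C (0.715) meet theirs.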
### Next Steps:
To extend this analysis, consider:
- **Predictive modeling**: Forecast future traffic conditions
- **Scenario analysis**: Evaluate impacts of proposed improvements
- **Before/after studies**: Measure effectiveness of implemented changes
- **Multi-modal analysis**: Incorporate transit, bicycle, and pedestrian data
- **Environmental impact**: Estimate emissions and fuel consumption
- **Economic analysis**: Calculate congestion costs and benefit-cost ratios

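For the before/after studies listed above, a first-pass comparison can reuse metrics already computed in this notebook. Below is a minimal descriptive sketch, assuming two frames with the notebook's `vc_ratio` and `avg_speed_mph` columns that cover comparable hours and intersections. `before_after_summary` is hypothetical and the demo values are illustrative. A full evaluation would also account for background demand changes and test whether differences are significant:

```python
import pandas as pd


def before_after_summary(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Compare mean V/C, percent of hours over capacity, and mean speed across two periods."""

    def metrics(frame: pd.DataFrame) -> pd.Series:
        return pd.Series(
            {
                "mean_vc": frame["vc_ratio"].mean(),
                "pct_over_capacity": (frame["vc_ratio"] > 1.0).mean() * 100,
                "mean_speed_mph": frame["avg_speed_mph"].mean(),
            }
        )

    summary = pd.DataFrame({"before": metrics(before), "after": metrics(after)})
    summary["change"] = summary["after"] - summary["before"]  # after minus before
    return summary


# Illustrative values only (not from the dataset).
before = pd.DataFrame({"vc_ratio": [0.95, 1.05, 1.10, 0.80], "avg_speed_mph": [18, 12, 10, 24]})
after = pd.DataFrame({"vc_ratio": [0.88, 0.97, 1.02, 0.75], "avg_speed_mph": [21, 17, 14, 26]})
print(before_after_summary(before, after).round(2))
```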
---

**References:**
- Highway Capacity Manual (HCM), Transportation Research Board
- Traffic Engineering Handbook, Institute of Transportation Engineers
- FHWA Traffic Analysis Toolbox