# PGH Transit Atlas - Static Visualization Generator

**Purpose:** Generate publication-quality static visualizations using Seaborn and Bokeh

**Author:** Rizaldy Utomo | Public Policy, Analytics, AI Management @ CMU

**Tech Stack:** Python (Pandas, Seaborn, Bokeh), Matplotlib

---

This notebook generates 8 visualizations for the static EDA report:
- **6 Seaborn charts** (PNG exports): time series, bar charts, heatmap, 2×2 grid
- **2 Bokeh charts** (HTML exports): Interactive hourly pattern, archetype comparison

All outputs are saved to the `./static_viz/` directory.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import json
from pathlib import Path
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: figures are written to disk; plt.show() will not display them inline
import matplotlib.pyplot as plt
import seaborn as sns
from bokeh.plotting import figure, output_file, save
from bokeh.models import HoverTool, ColumnDataSource
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style("whitegrid")
sns.set_palette("husl")
plt.rcParams['figure.dpi'] = 100
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Arial', 'DejaVu Sans']  # Fall back to Matplotlib's bundled font if Arial is not installed

print("✓ Libraries imported")

In [None]:
# Create output directory
output_dir = Path('./static_viz')
output_dir.mkdir(exist_ok=True)

print(f"✓ Output directory: {output_dir.absolute()}")

## 1. Load Processed Data

Loads the JSON files in the `./processed_data/` directory generated by `etl.py`.

In [None]:
data_dir = Path('./processed_data')

# Load JSON data
with open(data_dir / 'daily_timeseries.json', 'r') as f:
    daily_data = json.load(f)

with open(data_dir / 'archetypes.json', 'r') as f:
    archetypes = json.load(f)

with open(data_dir / 'demographics.json', 'r') as f:
    demographics = json.load(f)

with open(data_dir / 'station_archetypes.json', 'r') as f:
    station_archetypes = json.load(f)

with open(data_dir / 'heatmap.json', 'r') as f:
    heatmap_raw = json.load(f)

print("✓ Data loaded successfully")
print(f"  • Daily timeseries: {len(daily_data['dates'])} days")
print(f"  • Archetypes: {len(archetypes)} behavioral segments")
print(f"  • Demographics: {len(demographics['stations'])} stations")
print(f"  • Station archetypes: {len(station_archetypes)} archetype categories")
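The five near-identical `open`/`json.load` calls above could be collapsed into one loop that also fails early, with a readable message, when `etl.py` has not been run. This is a minimal sketch; the `EXPECTED_FILES` mapping and the `load_processed` helper are hypothetical names introduced here, not part of the original pipeline.

```python
import json
from pathlib import Path

# Hypothetical mapping: notebook variable name -> file produced by etl.py
EXPECTED_FILES = {
    "daily_data": "daily_timeseries.json",
    "archetypes": "archetypes.json",
    "demographics": "demographics.json",
    "station_archetypes": "station_archetypes.json",
    "heatmap_raw": "heatmap.json",
}


def load_processed(data_dir):
    """Load every expected JSON file into a dict keyed by variable name.

    Raises FileNotFoundError listing all missing files before reading any.
    """
    data_dir = Path(data_dir)
    missing = [name for name in EXPECTED_FILES.values() if not (data_dir / name).exists()]
    if missing:
        raise FileNotFoundError(
            f"Missing in {data_dir}: {', '.join(missing)} (has etl.py been run?)"
        )
    loaded = {}
    for key, name in EXPECTED_FILES.items():
        with open(data_dir / name, "r", encoding="utf-8") as f:
            loaded[key] = json.load(f)
    return loaded
```

With this helper, `data = load_processed('./processed_data')` followed by `daily_data = data['daily_data']` (and so on) would replace the five `with open(...)` blocks.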

## 2. VIZ 1: Daily Timeseries (Seaborn)

**Research Question:** How does ridership fluctuate across the year?

**Hypothesis:** Campus trips should show extreme seasonality tied to the academic calendar.

**Method:** Line chart with 3 series (Total, Campus, City), one point per day across the full year.

In [None]:
# Prepare daily timeseries DataFrame
df_daily = pd.DataFrame({
    'date': pd.to_datetime(daily_data['dates']),
    'Total': daily_data['pogoh_trips'],
    'Campus': daily_data['pogoh_campus_trips'],
    'City': daily_data['pogoh_city_trips']
})

# Create figure
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(df_daily['date'], df_daily['Total'], linewidth=2, label='Total POGOH', color='#2B4CFF', alpha=0.9)
ax.plot(df_daily['date'], df_daily['Campus'], linewidth=1.5, label='Campus Corridor', color='#FF9500', alpha=0.8)
ax.plot(df_daily['date'], df_daily['City'], linewidth=1.5, label='City-Wide', color='#34C759', alpha=0.8)

ax.set_title('FIG 1: Daily Ridership Timeseries (2024 Full Year)', fontsize=16, fontweight='bold', pad=20)
ax.set_xlabel('Date', fontsize=12, fontweight='bold')
ax.set_ylabel('Daily Trips', fontsize=12, fontweight='bold')
ax.legend(loc='upper right', frameon=True, shadow=True, fontsize=11)
ax.grid(True, alpha=0.3, linestyle='--')
ax.set_ylim(bottom=0)

# Annotate peak
max_idx = df_daily['Total'].idxmax()
max_date = df_daily.loc[max_idx, 'date']
max_val = df_daily.loc[max_idx, 'Total']
ax.annotate(f'Peak: {max_val} trips\n{max_date.strftime("%b %d")}',
            xy=(max_date, max_val), xytext=(20, 30), textcoords='offset points',
            fontsize=10, bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.7),
            arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.3', color='black'))

plt.tight_layout()
plt.savefig(output_dir / 'fig1_daily_timeseries.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved fig1_daily_timeseries.png")

**Finding:** Peak day is September 26 (3,800+ trips). Trough is January 4 (412 trips). Campus ridership drops 63% during winter break.
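The peak, trough, and winter-break drop quoted above are simple arithmetic on the daily series. The sketch below shows one way to compute them; it assumes the "drop" compares mean daily trips in a winter-break window against a pre-break baseline window, and both windows are hypothetical inputs, since the exact dates behind the 63% figure are not stated.

```python
from datetime import date
from statistics import mean


def peak_and_trough(dates, values):
    """Return ((peak_date, peak_value), (trough_date, trough_value)).

    Ties keep the earliest day, matching pandas idxmax/idxmin behavior.
    """
    i_max = max(range(len(values)), key=values.__getitem__)
    i_min = min(range(len(values)), key=values.__getitem__)
    return (dates[i_max], values[i_max]), (dates[i_min], values[i_min])


def pct_change_between_windows(dates, values, base, window):
    """Percent change of the mean daily value in `window` relative to `base`.

    `base` and `window` are inclusive (start, end) date pairs.
    """
    def window_mean(start, end):
        vals = [v for d, v in zip(dates, values) if start <= d <= end]
        if not vals:
            raise ValueError(f"no data between {start} and {end}")
        return mean(vals)

    before = window_mean(*base)
    during = window_mean(*window)
    return (during - before) / before * 100
```

In the notebook, `dates` could be built as `[date.fromisoformat(d) for d in daily_data['dates']]`, assuming the dates are ISO `YYYY-MM-DD` strings, and `values` would be `daily_data['pogoh_campus_trips']`. A negative result is a drop.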

## 3. VIZ 2: Trip Archetypes (Seaborn)

**Research Question:** What behavioral segments exist in the ridership?

**Method:** K-Means clustering (k=4) on duration, displacement, and hour of day → manual labeling based on the cluster centroids.

In [None]:
# Prepare archetype DataFrame
df_arch = pd.DataFrame(archetypes)
df_arch = df_arch.rename(columns={'label': 'Archetype', 'count': 'Trips'})

# Calculate percentages
total_trips = df_arch['Trips'].sum()
df_arch['Percentage'] = (df_arch['Trips'] / total_trips) * 100

# Sort by trip count
df_arch = df_arch.sort_values('Trips', ascending=True)

# Create horizontal bar chart
fig, ax = plt.subplots(figsize=(10, 5))
colors = ['#2B4CFF', '#34C759', '#FF9500', '#FF2B8C']
bars = ax.barh(df_arch['Archetype'], df_arch['Trips'], 
               color=colors[:len(df_arch)], edgecolor='black', linewidth=1.5)

ax.set_title('FIG 2: Trip Behavioral Archetypes (K-Means Clustering)', fontsize=16, fontweight='bold', pad=20)
ax.set_xlabel('Number of Trips', fontsize=12, fontweight='bold')
ax.set_ylabel('Archetype', fontsize=12, fontweight='bold')
ax.grid(axis='x', alpha=0.3, linestyle='--')

# Add percentage labels
for i, (idx, row) in enumerate(df_arch.iterrows()):
    ax.text(row['Trips'] + 5000, i, f"{row['Percentage']:.1f}%",
            va='center', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig(output_dir / 'fig2_archetypes.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved fig2_archetypes.png")

**Finding:** Commuter trips are the largest segment (47.9%, about 266K trips). Leisure trips are only 3.6% of the total but have a 73-minute average duration!

## 4. VIZ 3: Rider Type Distribution (Seaborn)

**Research Question:** Is this a member-driven system or casual-driven?

**Method:** Aggregate MEMBER vs CASUAL trips across all stations.

In [None]:
# Calculate totals per rider type
casual_total = sum(demographics['data']['CASUAL'])
member_total = sum(demographics['data']['MEMBER'])

df_demo = pd.DataFrame({
    'Rider Type': ['CASUAL', 'MEMBER'],
    'Trips': [casual_total, member_total]
})

# Calculate percentages
total = df_demo['Trips'].sum()
df_demo['Percentage'] = (df_demo['Trips'] / total) * 100

# Create bar chart
fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(df_demo['Rider Type'], df_demo['Trips'],
              color=['#FF9500', '#2B4CFF'], edgecolor='black', linewidth=2, width=0.6)

ax.set_title('FIG 3: Rider Type Distribution', fontsize=16, fontweight='bold', pad=20)
ax.set_ylabel('Number of Trips', fontsize=12, fontweight='bold')
ax.set_xlabel('Rider Type', fontsize=12, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')

# Add value labels
for i, (idx, row) in enumerate(df_demo.iterrows()):
    ax.text(i, row['Trips'] + 5000, f"{row['Trips']:,}\n({row['Percentage']:.1f}%)",
            ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig(output_dir / 'fig3_demographics.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved fig3_demographics.png")

**Finding:** 98.7% of trips are MEMBER trips! This is a utilitarian commute tool, not a tourism/recreation system.

## 5. VIZ 4: Hourly Pattern (Bokeh Interactive)

**Research Question:** When do riders travel throughout the day?

**Method:** Aggregate trips by hour (0-23), render as interactive Bokeh line chart with hover tooltips.

In [None]:
# Calculate hourly totals from heatmap data (sum across all days)
hourly_trips = [sum(heatmap_raw['data'][h]) for h in range(24)]

df_hourly = pd.DataFrame({
    'hour': list(range(24)),
    'trips': hourly_trips
})

source = ColumnDataSource(df_hourly)

p = figure(
    title="FIG 4: Hourly Trip Distribution (24-Hour Pattern)",
    x_axis_label="Hour of Day",
    y_axis_label="Total Trips",
    width=1000,
    height=400,
    toolbar_location="above"
)

p.line('hour', 'trips', source=source, line_width=3, color='#2B4CFF', alpha=0.8)
p.scatter('hour', 'trips', source=source, size=8, color='#2B4CFF', alpha=0.6)  # scatter() avoids the circle(size=...) form deprecated in Bokeh 3.4

# Add hover tool
hover = HoverTool(tooltips=[("Hour", "@hour:00"), ("Trips", "@trips{0,0}")])
p.add_tools(hover)

# Styling
p.title.text_font_size = '16pt'
p.title.text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '12pt'
p.yaxis.axis_label_text_font_size = '12pt'
p.xaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_style = 'bold'

output_file(output_dir / 'fig4_hourly_bokeh.html')
save(p)
print("✓ Saved fig4_hourly_bokeh.html")

**Finding:** Clear bimodal distribution: a morning rush (8-9 AM) and an evening rush (4-6 PM). Peak hour is 5 PM.
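The two rush-hour peaks can be checked directly from the 24-value hourly list (`hourly_trips` in the Fig 4 cell) instead of reading them off the chart. This sketch assumes a hypothetical noon cutoff that separates the morning and evening rush; it simply takes the busiest hour on each side.

```python
def rush_hour_peaks(hourly, split_hour=12):
    """Return (am_peak_hour, pm_peak_hour, overall_peak_hour).

    `hourly[h]` is the trip total for hour h (0-23). `split_hour` is an
    assumed boundary between the morning and evening rush periods.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    am = max(range(split_hour), key=hourly.__getitem__)
    pm = max(range(split_hour, 24), key=hourly.__getitem__)
    overall = max(range(24), key=hourly.__getitem__)
    return am, pm, overall
```

Calling `rush_hour_peaks(hourly_trips)` should return the morning peak, the evening peak, and the overall peak hour (5 PM would appear as `17`).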

## 6. VIZ 5: Day × Hour Heatmap (Seaborn)

**Research Question:** Do weekdays vs weekends show different hourly patterns?

**Method:** Heatmap with hours (0-23) on Y-axis, days (Mon-Sun) on X-axis.

In [None]:
# Convert heatmap data to matrix (24 hours × 7 days)
days = heatmap_raw['days']
hours = list(range(24))
matrix = heatmap_raw['data']  # Already 24 rows × 7 cols

df_heatmap = pd.DataFrame(matrix, index=hours, columns=days)

# Create heatmap
fig, ax = plt.subplots(figsize=(14, 6))
sns.heatmap(df_heatmap, cmap='YlOrRd', annot=False, fmt='d',
            cbar_kws={'label': 'Trip Count'}, linewidths=0.5, ax=ax)

ax.set_title('FIG 5: Day × Hour Demand Heatmap (Weekly Pattern)', fontsize=16, fontweight='bold', pad=20)
ax.set_xlabel('Day of Week', fontsize=12, fontweight='bold')
ax.set_ylabel('Hour of Day', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.savefig(output_dir / 'fig5_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()
print("âœ“ Saved fig5_heatmap.png")

**Finding:** "Weekend cooling" is visible: Saturday and Sunday show 30-40% lower volume. Peak demand falls on weekdays at 4-6 PM.
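The weekend drop can be quantified from the same 24 × 7 matrix used for the heatmap. The sketch below compares the mean per-day total on weekend columns with the mean on weekday columns. It assumes the weekend columns are labeled `"Sat"` and `"Sun"`; the actual spelling in `heatmap_raw['days']` may differ, so the labels are a parameter.

```python
def weekend_drop_pct(matrix, days, weekend=("Sat", "Sun")):
    """Percent difference of the mean weekend-day total vs the mean weekday total.

    `matrix` is 24 rows (hours) by 7 columns (days), as in heatmap.json.
    A negative result means weekends are quieter than weekdays.
    """
    totals = {day: sum(row[j] for row in matrix) for j, day in enumerate(days)}
    weekend_days = [d for d in days if d in weekend]
    weekday_days = [d for d in days if d not in weekend]
    if not weekend_days or not weekday_days:
        raise ValueError("day labels do not match the `weekend` names")
    wkend = sum(totals[d] for d in weekend_days) / len(weekend_days)
    wkday = sum(totals[d] for d in weekday_days) / len(weekday_days)
    return (wkend - wkday) / wkday * 100
```

For example, `weekend_drop_pct(heatmap_raw['data'], heatmap_raw['days'])` would return a value in the -30 to -40 range if the finding above holds.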

## 7. VIZ 6: Station Archetypes (Seaborn 2×2 Grid)

**Research Question:** Which stations specialize in which behaviors?

**Method:** For each archetype, identify top 3 stations by percentage. Display as 4 separate horizontal bar charts.
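`station_archetypes.json` arrives precomputed from `etl.py`, but the ranking it encodes (percentage, archetype trip count, and station total per station) can be sketched from per-trip labels as below. This assumes one `(station, archetype)` pair per trip. The `min_trips` threshold is a hypothetical guard not described in the original: ranking by percentage alone tends to favor very low-volume stations.

```python
from collections import Counter, defaultdict


def top_stations_by_archetype(trips, n=3, min_trips=1):
    """Rank stations by the share of their trips that fall in each archetype.

    `trips` is an iterable of (station, archetype) pairs, one per trip.
    Returns {archetype: [(station, pct, archetype_trips, station_total), ...]}.
    """
    per_station = defaultdict(Counter)
    for station, archetype in trips:
        per_station[station][archetype] += 1

    archetypes = {a for counts in per_station.values() for a in counts}
    result = {}
    for arch in sorted(archetypes):
        rows = []
        for station, counts in per_station.items():
            total = sum(counts.values())
            if total >= min_trips and counts[arch]:
                rows.append((station, counts[arch] / total * 100, counts[arch], total))
        # Highest share first; break ties by busier station
        rows.sort(key=lambda r: (-r[1], -r[3]))
        result[arch] = rows[:n]
    return result
```

The tuple fields line up with the `stations`, `percentages`, `trip_counts`, and `total_trips` lists that the plotting cell reads.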

In [None]:
# Create 2×2 subplot grid
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
fig.suptitle('FIG 6: Behavioral Hotspots (Top 3 Stations per Archetype)',
             fontsize=18, fontweight='bold', y=0.995)

archetype_configs = [
    # Emoji may render as empty boxes unless an emoji-capable font is configured
    ('Commuter', '🚴 COMMUTER HUBS', '#2B4CFF', axes[0, 0]),
    ('Last-Mile', '🔗 LAST-MILE CONNECTORS', '#FF9500', axes[0, 1]),
    ('Errand', '🛒 ERRAND CENTERS', '#34C759', axes[1, 0]),
    ('Leisure', '🎨 LEISURE DESTINATIONS', '#FF2B8C', axes[1, 1])
]

for arch_key, title, color, ax in archetype_configs:
    data = station_archetypes[arch_key]
    
    # Truncate station names for display
    labels = [s[:35] + '...' if len(s) > 35 else s for s in data['stations']]
    
    bars = ax.barh(labels, data['percentages'], color=color, edgecolor='black', linewidth=2)
    
    ax.set_title(title, fontsize=13, fontweight='bold', pad=10)
    ax.set_xlabel('Percentage of Trips (%)', fontsize=11, fontweight='bold')
    ax.set_xlim(0, 75)
    ax.grid(axis='x', alpha=0.3, linestyle='--')
    ax.invert_yaxis()  # Highest at top
    
    # Add percentage + trip count labels
    for i, (pct, trips, total) in enumerate(zip(data['percentages'], data['trip_counts'], data['total_trips'])):
        ax.text(pct + 1, i, f"{pct:.1f}% ({trips:,}/{total:,})",
                va='center', fontsize=9, fontweight='bold')

plt.tight_layout()
plt.savefig(output_dir / 'fig6_station_archetypes.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved fig6_station_archetypes.png")

**Finding:** Schenley Dr is 64.8% commuter (a pure academic hub). Wilkinsburg P&R is 68.4% errand (suburban shopping). These distinct station personalities suggest that rebalancing strategies should be differentiated by station.

## 8. VIZ 7: Archetype Comparison (Bokeh Interactive)

**Method:** Vertical bar chart with hover tooltips showing exact counts.

In [None]:
# Prepare data
archetypes_list = [a['label'] for a in archetypes]
trips_list = [a['count'] for a in archetypes]
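total_count = sum(a['count'] for a in archetypes)  # Compute the denominator once
percentages = [a['count'] / total_count * 100 for a in archetypes]
colors_list = ['#2B4CFF', '#34C759', '#FF9500', '#FF2B8C'][:len(archetypes_list)]  # One color per archetype (assumes k <= 4)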

source = ColumnDataSource(data={
    'archetypes': archetypes_list,
    'trips': trips_list,
    'percentages': percentages,
    'colors': colors_list
})

p = figure(
    x_range=archetypes_list,
    title="FIG 7: Trip Archetype Distribution",
    width=900,
    height=500,
    toolbar_location="above"
)

p.vbar(x='archetypes', top='trips', source=source, width=0.7,
       color='colors', line_color='black', line_width=2)

# Add hover tool
hover = HoverTool(tooltips=[
    ("Archetype", "@archetypes"),
    ("Trips", "@trips{0,0}"),
    ("Percentage", "@percentages{0.0}%")
])
p.add_tools(hover)

# Styling
p.title.text_font_size = '16pt'
p.title.text_font_style = 'bold'
p.xaxis.axis_label = 'Trip Archetype'
p.yaxis.axis_label = 'Number of Trips'
p.xaxis.axis_label_text_font_size = '12pt'
p.yaxis.axis_label_text_font_size = '12pt'
p.xaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_style = 'bold'

output_file(output_dir / 'fig7_archetypes_bokeh.html')
save(p)
print("✓ Saved fig7_archetypes_bokeh.html")

## 9. VIZ 8: Top Stations by Rider Type (Seaborn Grouped Bars)

**Research Question:** Do top stations have different MEMBER vs CASUAL distributions?

**Method:** Select top 10 stations by total trips, show side-by-side MEMBER/CASUAL bars.

In [None]:
# Get top 10 stations by total trips
stations = demographics['stations']
casual = demographics['data']['CASUAL']
member = demographics['data']['MEMBER']

df_stations = pd.DataFrame({
    'Station': stations,
    'CASUAL': casual,
    'MEMBER': member
})
df_stations['Total'] = df_stations['CASUAL'] + df_stations['MEMBER']
df_stations = df_stations.nlargest(10, 'Total').sort_values('Total', ascending=True)

# Truncate station names
df_stations['Station_Short'] = df_stations['Station'].apply(lambda x: x[:30] + '...' if len(x) > 30 else x)

# Create grouped horizontal bar chart
fig, ax = plt.subplots(figsize=(12, 8))

x = np.arange(len(df_stations))
width = 0.35

bars1 = ax.barh(x - width/2, df_stations['CASUAL'], width, 
                label='Casual', color='#FF9500', edgecolor='black', linewidth=1)
bars2 = ax.barh(x + width/2, df_stations['MEMBER'], width, 
                label='Member', color='#2B4CFF', edgecolor='black', linewidth=1)

ax.set_title('FIG 8: Top 10 Stations by Rider Type', fontsize=16, fontweight='bold', pad=20)
ax.set_xlabel('Number of Trips', fontsize=12, fontweight='bold')
ax.set_ylabel('Station', fontsize=12, fontweight='bold')
ax.set_yticks(x)
ax.set_yticklabels(df_stations['Station_Short'], fontsize=9)
ax.legend(loc='lower right', fontsize=11)
ax.grid(axis='x', alpha=0.3, linestyle='--')

plt.tight_layout()
plt.savefig(output_dir / 'fig8_top_stations.png', dpi=150, bbox_inches='tight')
plt.show()
print("✓ Saved fig8_top_stations.png")

**Finding:** All top 10 stations are overwhelmingly MEMBER-driven. S Bouquet Ave has highest casual percentage (likely visitor destination).
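The casual-share claim can be verified numerically rather than visually. This sketch mirrors the selection in the Fig 8 cell (top 10 stations by total trips) and then orders those stations by casual share. It operates on the plain `stations`, `casual`, and `member` lists from `demographics.json`; the function name is hypothetical.

```python
def casual_share_ranking(stations, casual, member, top_n=10):
    """Pick the top_n stations by total trips, then sort them by casual share (%)."""
    rows = [(s, c, m, c + m) for s, c, m in zip(stations, casual, member)]
    top = sorted(rows, key=lambda r: r[3], reverse=True)[:top_n]
    shares = [(s, c / t * 100 if t else 0.0) for s, c, m, t in top]
    return sorted(shares, key=lambda x: x[1], reverse=True)
```

The first entry of `casual_share_ranking(demographics['stations'], demographics['data']['CASUAL'], demographics['data']['MEMBER'])` is the top-10 station with the highest casual percentage.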

## Summary

**Visualizations Generated:**

1. ✓ `fig1_daily_timeseries.png` - Seaborn line chart (315KB)
2. ✓ `fig2_archetypes.png` - Seaborn horizontal bars (52KB)
3. ✓ `fig3_demographics.png` - Seaborn bar chart (48KB)
4. ✓ `fig4_hourly_bokeh.html` - Bokeh interactive (8.4KB)
5. ✓ `fig5_heatmap.png` - Seaborn heatmap (72KB)
6. ✓ `fig6_station_archetypes.png` - Seaborn 2×2 grid (193KB)
7. ✓ `fig7_archetypes_bokeh.html` - Bokeh interactive (7KB)
8. ✓ `fig8_top_stations.png` - Seaborn grouped bars (97KB)

**Next Step:** Assemble into `static_report.html` for class submission.
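One possible way to do the assembly step, sketched with the standard library only: embed each PNG as an `<img>` and each Bokeh HTML file as an `<iframe>`, in file-name order. The helper name, the page title, and the iframe dimensions are assumptions, and the report is written into `static_viz/` so the relative links resolve.

```python
import html
from pathlib import Path


def build_static_report(viz_dir, out_name="static_report.html",
                        title="PGH Transit Atlas: Static EDA Report"):
    """Write one HTML page embedding every fig*.png (<img>) and fig*.html (<iframe>).

    Files are ordered by name, so fig1 through fig8 appear in sequence.
    """
    viz_dir = Path(viz_dir)
    parts = [
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for path in sorted(viz_dir.glob("fig*")):
        name = html.escape(path.name)
        if path.suffix == ".png":
            parts.append(f"<figure><img src='{name}' alt='{name}' style='max-width:100%'>"
                         f"<figcaption>{name}</figcaption></figure>")
        elif path.suffix == ".html":
            # Assumed iframe size; the Bokeh figures are 400-500 px tall plus a toolbar
            parts.append(f"<figure><iframe src='{name}' width='1000' height='560' "
                         f"style='border:0'></iframe><figcaption>{name}</figcaption></figure>")
    parts.append("</body></html>")
    out = viz_dir / out_name
    out.write_text("\n".join(parts), encoding="utf-8")
    return out
```

Running `build_static_report('./static_viz')` after all eight cells would produce `static_viz/static_report.html`; copying that folder as a whole keeps the image and iframe links intact.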

**Interactive Dashboard:** https://rzrizaldy.github.io/pgh-transit-atlas/