# Lecture 7: Data Storytelling & Visualization - Transforming Statistical Insights into Visual Intelligence

## Learning Objectives

By the end of this lecture, you will be able to:

- Define data visualization and explain its fundamental importance for communicating transportation insights
- Identify the key principles of effective data visualization design and their applications to bike-sharing analysis
- Distinguish between different visualization types and select appropriate charts for specific analytical purposes
- Create effective visualizations using Python and matplotlib to communicate transportation patterns

---

## 1. The Presentation That Changes Everything: Visual Storytelling for Transportation Consulting

Six weeks into your engagement, your bike-sharing client's CEO calls an emergency board meeting. "Our investors are flying in tomorrow," she explains urgently. "They want to see the data insights that will justify our Series A expansion. Can you present your findings in a way that will convince them to invest $5 million in our growth strategy?"

This is the defining moment every consultant dreams of and fears. You may have the analytical insights - **strong temperature correlation patterns**, seasonal demand variations showing significant growth from winter to spring, and rush hour peaks demonstrating clear commuter behavior. But these powerful statistical discoveries are meaningless unless you can transform them into **compelling visual narratives** that enable investors to immediately grasp the business opportunity and strategic potential.

Your statistical analysis represents weeks of rigorous work, but success now depends on your ability to communicate complex analytical findings to stakeholders who will make million-dollar decisions based on your presentation. **The difference between securing investment and losing the opportunity often comes down to visualization effectiveness and storytelling mastery.** Tomorrow's presentation will show whether you can convert sophisticated analytical insights into $5 million in growth capital for your client's strategic expansion.

## 2. Data Visualization Fundamentals: Building the Visual Foundation

Let's explore the essential foundations of data visualization that will transform your statistical analysis into compelling business communications. This section covers three critical areas: the theoretical foundations of visual perception, design principles for transportation data, and a comprehensive framework for selecting appropriate visualization types. We'll start by understanding how human vision and cognition process graphical information.

### 2.1. Basics of Visual Perception: What Your Eyes Do Best

You're preparing tomorrow's investor presentation. You have three critical insights to communicate: hourly demand peaks, the temperature-demand relationship, and seasonal growth patterns. Here's the question every consultant faces: **Which chart type will help your audience grasp each insight instantly?**

The answer lies in understanding what human vision does well—and what it struggles with.

**The Visual Hierarchy: What We See Most Accurately**

Your eyes process different visual elements with dramatically different accuracy levels. Understanding this hierarchy helps you choose the right chart every time:

1. **Position (Best)**: Comparing heights or positions along a common scale - line charts, bar charts, scatter plots
2. **Length**: Comparing bar lengths or distances
3. **Color**: Distinguishing categories or highlighting specific elements
4. **Area (Worst)**: Comparing sizes of circles or regions - pie charts, bubble charts

When you need stakeholders to **compare exact values** (like identifying the precise peak demand hour), use position. When you need to **distinguish different groups** (weekdays vs. weekends), add color. When you need both, combine them.
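One way to see why area sits at the bottom of this ranking: when a value doubles, a bar doubles in length, but a circle drawn with double the area grows only about 1.41 times in radius, so the difference looks smaller than it is. The sketch below works through that arithmetic. The function names are illustrative helpers, not part of any library.

```python
import math


def bar_length_ratio(value_a: float, value_b: float) -> float:
    """Ratio of bar lengths when bars encode values by length."""
    return value_b / value_a


def circle_radius_ratio(value_a: float, value_b: float) -> float:
    """Ratio of circle radii when circles encode values by area."""
    # area = pi * r**2, so the radius scales with the square root of the value
    return math.sqrt(value_b / value_a)


# Illustrative values only: the second value is twice the first
small, large = 100, 200
print(f"Bar length ratio:    {bar_length_ratio(small, large):.2f}")
print(f"Circle radius ratio: {circle_radius_ratio(small, large):.2f}")
```

The bar makes the doubling visible directly, while the circle compresses it, which is one reason position and length are usually preferred when stakeholders must compare magnitudes.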

Let's see this principle in action with your bike-sharing data.

**Example 1: Position Shows Precise Patterns**

When you want stakeholders to identify **exact peak hours** for operational decisions, position along a shared axis gives them maximum accuracy:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
df['hour'] = df['datetime'].dt.hour

hourly = df.groupby('hour')['count'].mean()
hourly.plot(kind='line', marker='o', figsize=(10, 5))
plt.title('Average Hourly Bike Demand - Position Shows Peaks Clearly')
plt.xlabel('Hour of Day')
plt.ylabel('Rentals per Hour')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Why this works:** Your eyes immediately spot the 8am and 5pm peaks because vertical position (height) is the most accurate way your brain compares quantities. Operations can see exactly when to deploy rebalancing crews—no guesswork required.

**Example 2: Adding Color to Distinguish Groups**

When comparing **two operational contexts** (not just precise values), color helps stakeholders immediately see which pattern applies:

In [None]:
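# Note: workingday == 0 typically covers holidays as well as weekends,
# so 'Weekend' here is shorthand for any non-working day
df['daytype'] = df['workingday'].map({1: 'Weekday', 0: 'Weekend'})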

weekday_hourly = df[df['daytype'] == 'Weekday'].groupby('hour')['count'].mean()
weekend_hourly = df[df['daytype'] == 'Weekend'].groupby('hour')['count'].mean()

plt.figure(figsize=(10, 5))
plt.plot(weekday_hourly.index, weekday_hourly.values, marker='o', label='Weekday', color='#1f77b4')
plt.plot(weekend_hourly.index, weekend_hourly.values, marker='s', label='Weekend', color='#ff7f0e')
plt.title('Demand by Day Type - Color Distinguishes Patterns')
plt.xlabel('Hour of Day')
plt.ylabel('Rentals per Hour')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

**Why this works:** Color (blue vs. orange) instantly signals "these are different operational contexts"—commuter-driven weekdays with dual peaks versus leisure-driven weekends with single midday peaks. But you still rely on **position** to read exact peak times. This combination tells operations: "You need two different capacity strategies."

### 2.2. Visualization Types and Selection Framework

Now that you understand visual perception principles, let's apply them systematically to transportation data analysis. Different analytical purposes require different visualization approaches, and selecting the wrong type can obscure critical insights or mislead stakeholders. We'll explore three fundamental visualization categories - **comparative, relationship, and distribution visualizations** - with specific selection criteria and transportation applications. This framework enables you to match visualization techniques to analytical goals, ensuring your presentations communicate insights effectively to diverse stakeholder audiences.

**1. Comparative Visualizations**

**Definition**: Comparative visualizations enable stakeholders to evaluate differences between categories, time periods, or operational conditions, forming the foundation of business decision-making.

**Bar charts** excel at category comparisons where precise value differences drive critical business decisions. When your statistical analysis reveals demand patterns - weekday mean of 193 rides per hour versus weekend mean of 189 rides per hour - bar chart visualization enables immediate comparison through visual height differences. This precise magnitude assessment directly supports capacity planning decisions, staffing optimization, and resource allocation strategies that maximize operational efficiency.

**Column charts** effectively present temporal comparisons across months or seasons. Summer demand averaging 237 rides per hour compared to winter demand averaging 125 rides per hour translates to clear visual differences that enable seasonal planning and resource allocation decisions.

**Python Example - Creating Comparative Visualizations:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])

# Create weekday vs weekend comparison (bar chart)
weekday_mean = df[df['workingday'] == 1]['count'].mean()
weekend_mean = df[df['workingday'] == 0]['count'].mean()

fig, ax = plt.subplots(figsize=(8, 6))
categories = ['Weekday', 'Weekend']
means = [weekday_mean, weekend_mean]
ax.bar(categories, means, color=['#1f77b4', '#ff7f0e'])
ax.set_ylabel('Mean Hourly Rides', fontsize=12)
ax.set_title('Weekday vs Weekend Demand Comparison', fontsize=14, fontweight='bold')
ax.set_ylim(0, max(means) * 1.2)

# Add value labels on bars
for i, v in enumerate(means):
    ax.text(i, v + 5, f'{v:.0f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

# Print summary statistics
print(f"Weekday mean: {weekday_mean:.1f} rides per hour")
print(f"Weekend mean: {weekend_mean:.1f} rides per hour")
print(f"Weekday advantage: {((weekday_mean/weekend_mean - 1) * 100):.1f}%")

This bar chart visualization **reveals surprisingly similar demand patterns between weekdays and weekends at the hourly aggregation level**. Weekdays average 193 rides per hour compared to weekends' 189 rides per hour—only a 2.4% weekday advantage. This small difference indicates that while weekdays and weekends have different temporal patterns (as we'll see in the daily profile analysis), their average hourly demand remains nearly equivalent. The visual height comparison enables instant pattern recognition, while precise value labels support capacity planning decisions. This comparative visualization type excels when stakeholders need to evaluate categorical differences that drive resource allocation strategies.

**2. Relationship Visualizations**

**Definition**: Relationship visualizations reveal connections between variables that enable predictive understanding and operational optimization.

**Scatter plots** provide optimal presentation for continuous variable relationships. Your temperature-demand correlation of r = 0.394 becomes clear in a scatter plot, where point positions show both the direction of the pattern and its magnitude. Each data point represents a specific temperature-demand combination, while the overall pattern reveals the strength of the relationship and the substantial role of other factors.

**Line plots** excel at temporal relationship presentation where changes over time reveal trends, cycles, and inflection points. Daily demand patterns across the 24 hours of the day show clear commuting peaks and overnight valleys that enable operational understanding and strategic planning.

**Python Example - Relationship Visualization with Binned Scatter Plot:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])

# Create temperature bins to aggregate data and reduce visual clutter
df['temp_bin'] = pd.cut(df['temp'], bins=20)
binned = df.groupby('temp_bin', observed=True)['count'].agg(['mean', 'std', 'count'])
binned['se'] = binned['std'] / np.sqrt(binned['count'])
binned['temp_center'] = binned.index.map(lambda x: x.mid)

# Create figure for professional presentation
fig, ax = plt.subplots(figsize=(10, 6))

# Plot binned means with 95% confidence intervals
ax.errorbar(binned['temp_center'], binned['mean'], 
            yerr=binned['se']*1.96,  # 95% confidence interval
            fmt='o', markersize=8, capsize=5, capthick=2,
            color='#2ECC71', ecolor='#2ECC71', alpha=0.7,
            label='Mean Demand (95% CI)')

# Add trend line using original (unbinned) data
slope, intercept, r_value, p_value, std_err = stats.linregress(df['temp'], df['count'])
line_x = np.array([df['temp'].min(), df['temp'].max()])
line_y = slope * line_x + intercept
ax.plot(line_x, line_y, 'r--', linewidth=2.5, label=f'Trend Line (r = {r_value:.3f})')

ax.set_xlabel('Temperature (°C)', fontsize=12, fontweight='bold')
ax.set_ylabel('Hourly Bike Rentals', fontsize=12, fontweight='bold')
ax.set_title('Temperature-Demand Relationship: Warmer Weather Drives Higher Ridership',
             fontsize=13, fontweight='bold')
ax.legend(loc='lower right', fontsize=10)  # keep clear of the annotation box in the upper left
ax.grid(True, alpha=0.3)

# Add correlation annotation
ax.text(0.05, 0.95, f'Correlation: {r_value:.3f}\nR² = {r_value**2:.3f}',
        transform=ax.transAxes, fontsize=11, verticalalignment='top',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

plt.tight_layout()
plt.show()

# Print key insights
print(f"Temperature-demand correlation: r = {r_value:.3f}")
print(f"Temperature explains {(r_value**2)*100:.1f}% of demand variation")
print(f"For each 1°C increase, demand increases by approximately {slope:.1f} rentals")

When visualizing relationships in large datasets (here, thousands of hourly observations), raw scatter plots create visual clutter that obscures patterns. This **binned scatter approach aggregates observations into temperature bins**, calculating mean demand and a confidence interval for each bin. The result is a clean, professional visualization that immediately reveals the temperature-demand relationship while maintaining statistical rigor through confidence interval display.

This visualization **clearly demonstrates the positive temperature-demand relationship** through an upward trend of binned means. The correlation coefficient r = 0.394 indicates a moderate relationship, with temperature explaining 15.6% of demand variation. The confidence intervals (error bars) reveal consistent patterns across most temperature ranges, with slightly wider intervals at temperature extremes due to fewer observations. The trend line quantifies the average relationship: **for each 1°C temperature increase, demand increases by approximately 9.2 rentals per hour**. This aggregated presentation enables immediate pattern recognition for stakeholders while communicating statistical uncertainty appropriately. The visualization demonstrates that while temperature significantly influences demand, other factors (time of day, day of week, seasonality beyond temperature) play substantial roles—a critical insight for building comprehensive demand forecasting models that weather alone cannot provide.
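The error bars in the binned plot come from a simple per-bin calculation: the bin mean plus or minus 1.96 standard errors. The sketch below reproduces that calculation for a single bin using only the standard library. The `mean_with_ci` helper and the sample counts are hypothetical and serve only to illustrate the formula.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    # statistics.stdev uses n - 1 in the denominator, like the pandas default.
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se


# Hypothetical hourly rental counts that fall into one temperature bin
bin_counts = [120, 150, 180, 210, 140, 160, 200, 170]
mean, lower, upper = mean_with_ci(bin_counts)
print(f"Mean: {mean:.1f}, 95% CI: [{lower:.1f}, {upper:.1f}]")
```

Bins with fewer observations have a larger standard error, which is why the intervals widen at the temperature extremes.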

**3. Distribution Visualizations**

**Definition**: Distribution visualizations reveal data spread, central tendencies, and outlier patterns essential for operational planning and risk management.

**Histograms** show demand frequency patterns that enable capacity planning and resource allocation. If analysis reveals that a quarter of operating hours see fewer than 42 rides, half fall between 42 and 284 rides, and a quarter exceed 284 rides, a histogram makes those bands visible at a glance and supports tiered operational strategies appropriate for different demand conditions.

**Box plots** reveal distribution characteristics including median performance, variability ranges, and outlier identification. Hourly demand distributions with median = 145 rides, 25th percentile (Q1) = 42 rides, and 75th percentile (Q3) = 284 rides provide operational planning ranges and performance benchmarks through clear visual presentation.

**Python Example - Distribution Visualization with Histogram and Box Plot:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])

# Create side-by-side distribution visualizations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle('Hourly Demand Distribution Analysis', fontsize=14, fontweight='bold')

# Panel 1: Histogram showing frequency distribution
axes[0].hist(df['count'], bins=40, color='#3498DB', edgecolor='black', alpha=0.7)
axes[0].set_xlabel('Hourly Bike Rentals', fontsize=11)
axes[0].set_ylabel('Frequency (Number of Hours)', fontsize=11)
axes[0].set_title('Demand Frequency Distribution', fontsize=12, fontweight='bold')
axes[0].axvline(x=df['count'].mean(), color='red', linestyle='--',
                linewidth=2, label=f'Mean: {df["count"].mean():.0f}')
axes[0].axvline(x=df['count'].median(), color='orange', linestyle='--',
                linewidth=2, label=f'Median: {df["count"].median():.0f}')
axes[0].legend()
axes[0].grid(axis='y', alpha=0.3)

# Panel 2: Box plot showing distribution characteristics
box_data = [df['count']]
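bp = axes[1].boxplot(box_data, patch_artist=True)  # vertical orientation is the default
axes[1].set_xticklabels(['Hourly Demand'])  # label set here; the boxplot `labels` keyword is deprecated in newer Matplotlib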
bp['boxes'][0].set_facecolor('#E74C3C')
bp['boxes'][0].set_alpha(0.7)
axes[1].set_ylabel('Hourly Bike Rentals', fontsize=11)
axes[1].set_title('Distribution Summary Statistics', fontsize=12, fontweight='bold')
axes[1].grid(axis='y', alpha=0.3)

# Add quartile annotations to box plot
q1 = df['count'].quantile(0.25)
q2 = df['count'].quantile(0.50)  # median
q3 = df['count'].quantile(0.75)
axes[1].text(1.15, q1, f'Q1: {q1:.0f}', fontsize=9, va='center')
axes[1].text(1.15, q2, f'Median: {q2:.0f}', fontsize=9, va='center', fontweight='bold')
axes[1].text(1.15, q3, f'Q3: {q3:.0f}', fontsize=9, va='center')

plt.tight_layout()
plt.show()

# Print distribution insights
print("=== Demand Distribution Summary ===")
print(f"Mean: {df['count'].mean():.1f} rentals per hour")
print(f"Median: {df['count'].median():.1f} rentals per hour")
print(f"Standard deviation: {df['count'].std():.1f} rentals")
print(f"25th percentile (Q1): {q1:.1f} rentals")
print(f"75th percentile (Q3): {q3:.1f} rentals")
print(f"Interquartile range (IQR): {(q3-q1):.1f} rentals")
print(f"\nInterpretation: 50% of hours fall between {q1:.0f} and {q3:.0f} rentals")

These distribution visualizations **reveal critical capacity planning insights** through complementary views. The histogram (left) shows that hourly demand follows a right-skewed distribution: most hours experience low-to-moderate demand (under 200 rentals), with a long tail extending to 800+ rentals during peak periods. The mean (192 rentals) exceeds the median (145 rentals), confirming the right skew, where high-demand periods pull the average upward. This pattern tells operations: "Plan for moderate demand most of the time, but maintain surge capacity for recurring high-demand periods." The box plot (right) provides precise quartiles: 25% of hours see fewer than 42 rentals (overnight/early morning periods), 50% fall between 42 and 284 rentals (typical daytime operations), and 25% exceed 284 rentals (peak commute and weekend afternoon periods). This distribution intelligence enables **tiered operational strategies**: minimal staffing for hours below Q1 (under 42 rentals), standard operations for hours between Q1 and Q3 (42-284 rentals), and surge capacity deployment for hours above Q3 (more than 284 rentals).
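The tiered strategy described above can be expressed as a small lookup on the quartile boundaries. This is a minimal sketch: the thresholds are the Q1 and Q3 values quoted in this lecture, while the `staffing_tier` function and the tier names are hypothetical labels chosen for illustration.

```python
Q1, Q3 = 42, 284  # quartiles of hourly rentals reported in this lecture


def staffing_tier(hourly_rentals: float, q1: float = Q1, q3: float = Q3) -> str:
    """Map an hourly rental count to an illustrative operating tier."""
    if hourly_rentals < q1:
        return "minimal"   # bottom quarter of hours
    if hourly_rentals <= q3:
        return "standard"  # middle half of hours
    return "surge"         # top quarter of hours


# Example demand levels (illustrative) spanning the three tiers
for rentals in (15, 145, 480):
    print(f"{rentals:>4} rentals/hour -> {staffing_tier(rentals)}")
```

In practice the thresholds would be recomputed from current data rather than hard-coded, since the quartiles shift as ridership grows.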

## 3. Transportation Data Visualization Applications

Now, let's apply these principles to specific transportation analysis challenges. This section demonstrates how theoretical concepts translate into practical visualization solutions for three critical areas: temporal patterns, weather relationships, and business performance. You'll see how to create effective visualizations that communicate complex transportation insights to business stakeholders.

### 3.1. Temporal Pattern Visualization Strategies

Transportation systems exhibit **complex temporal patterns operating simultaneously across multiple time scales**. Let's explore effective visualization strategies that reveal these multi-scale patterns while maintaining clarity and enabling business decision-making. We'll examine daily demand profiles, weekly and monthly patterns, and seasonal and annual trends.

**Daily Demand Profile Visualization**

Daily demand patterns represent **fundamental operational information requiring clear presentation** that enables immediate pattern recognition and decision-making. Your statistical analysis identified morning peaks averaging 363 rides per hour at 8am and evening peaks reaching 469 rides per hour at 5pm when averaged across all days, with overnight minimums dropping to very low levels during early morning hours.

Line plot visualization provides optimal presentation for daily patterns because **temporal sequencing requires continuous connection between time points**. Hourly demand data across the 24 hours of the day creates a natural temporal progression that a line preserves. Position along the horizontal time axis enables accurate time identification, while line height communicates demand magnitude with precision suitable for operational planning.

Color coding enhances daily pattern visualization by distinguishing different operational conditions without interfering with temporal pattern recognition. Weekday demand profiles in blue and weekend profiles in orange create immediate categorical distinction while preserving detailed temporal information. This dual-layer presentation enables **pattern comparison and operational strategy development**.

Multi-line presentation reveals seasonal variations in daily patterns. Spring daily profiles show earlier morning peaks and sustained evening demand reflecting extended daylight hours. Winter profiles demonstrate compressed peak periods and reduced overall demand levels. Seasonal line overlays enable strategic planning that accommodates systematic temporal variations.

**Python Example - Daily Demand Profile Visualization:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
df['hour'] = df['datetime'].dt.hour

# Calculate mean demand by hour for weekdays vs weekends
weekday_hourly = df[df['workingday'] == 1].groupby('hour')['count'].mean()
weekend_hourly = df[df['workingday'] == 0].groupby('hour')['count'].mean()

# Create line plot showing daily patterns
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(weekday_hourly.index, weekday_hourly.values, marker='o',
        linewidth=2, markersize=6, color='#1f77b4', label='Weekday')
ax.plot(weekend_hourly.index, weekend_hourly.values, marker='s',
        linewidth=2, markersize=6, color='#ff7f0e', label='Weekend')

ax.set_xlabel('Hour of Day', fontsize=12)
ax.set_ylabel('Mean Hourly Demand (rides)', fontsize=12)
ax.set_title('Daily Demand Profile: Weekday vs Weekend Patterns',
             fontsize=14, fontweight='bold')
ax.set_xticks(range(0, 24, 2))
ax.grid(True, alpha=0.3)
# Highlight peak periods (drawn before the legend so their labels are included)
ax.axvspan(7, 9, alpha=0.2, color='yellow', label='Morning Peak')
ax.axvspan(17, 19, alpha=0.2, color='orange', label='Evening Peak')

ax.legend(loc='upper left', fontsize=11)

plt.tight_layout()
plt.show()

# Print key insights
print(f"Weekday morning peak (8am): {weekday_hourly[8]:.0f} rides")
print(f"Weekday evening peak (5pm): {weekday_hourly[17]:.0f} rides")
print(f"Weekend midday peak: {weekend_hourly.max():.0f} rides at {weekend_hourly.idxmax()}:00")

This line plot visualization **immediately reveals distinct operational patterns** between weekdays and weekends. Weekdays show a clear bimodal pattern with a morning commute peak at 8am (480 rides) and an evening peak at 5pm (529 rides), while weekends exhibit a single broad midday peak at 1pm (388 rides) reflecting recreational usage. The sharp weekday peaks, roughly 90 to 140 rides above the weekend maximum, demonstrate commuter-driven demand requiring surge capacity deployment during rush hours. The temporal continuity preserved by line connections enables stakeholders to understand demand evolution throughout the day, supporting staffing optimization and capacity planning decisions.
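The peaks that the eye picks out on the line plot can also be located programmatically, which helps when annotating charts automatically. The following is a minimal sketch under stated assumptions: `find_peaks` is a hypothetical helper, and the 24-value profile is made up for illustration, with only the 8am (480) and 5pm (529) values taken from the text.

```python
def find_peaks(profile, min_share=0.5):
    """Return hours that are local maxima and reach at least min_share of the daily maximum."""
    threshold = min_share * max(profile)
    peaks = []
    for hour in range(1, len(profile) - 1):
        is_local_max = profile[hour - 1] < profile[hour] > profile[hour + 1]
        if is_local_max and profile[hour] >= threshold:
            peaks.append(hour)
    return peaks


# Hypothetical weekday profile (index = hour of day, 0-23).
# Only the values at hour 8 and hour 17 come from the lecture text.
weekday_profile = [
    35, 20, 12, 8, 6, 25, 100, 290, 480, 240, 140, 160,
    200, 195, 180, 200, 290, 529, 500, 350, 250, 180, 130, 80,
]
print(find_peaks(weekday_profile))
```

The `min_share` threshold filters out small midday bumps so that only the commute peaks are reported; the value 0.5 is an arbitrary choice for this sketch.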

**Weekly and Monthly Pattern Analysis**

Weekly patterns distinguish between **operational and recreational demand cycles** that require different business strategies. While overall hourly averages are similar (weekday mean of 193 rides versus weekend mean of 189 rides), the temporal distributions differ significantly, with weekdays showing commute peaks and weekends showing recreational patterns.

Bar chart visualization excels at weekly pattern presentation because discrete day categories enable precise comparisons essential for staffing and resource allocation decisions. Monday through Friday operational demands contrast clearly with Saturday-Sunday recreational patterns. **Bar height differences immediately communicate magnitude variations** that drive business decisions.

Monthly pattern visualization reveals seasonal cycles and growth trends essential for strategic planning. Line plots connecting monthly mean demand values show seasonal progressions: winter baseline averaging 125 rides per hour, spring growth reaching 184 rides per hour, and summer peaks averaging 237 rides per hour. **Temporal connection between months preserves seasonal transition understanding** while enabling year-over-year comparisons.

Combination visualizations integrate multiple temporal scales for comprehensive understanding. Monthly trend lines with embedded weekly pattern details provide strategic overview with operational specificity. This multi-level approach serves diverse stakeholder needs within integrated presentations.

**Python Example - Monthly Seasonal Pattern Visualization:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
df['month'] = df['datetime'].dt.month
df['year'] = df['datetime'].dt.year

# Calculate monthly mean demand
monthly_demand = df.groupby('month')['count'].mean()

# Create column chart for monthly patterns
fig, ax = plt.subplots(figsize=(12, 6))
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
colors = ['#5DADE2' if m in [12,1,2] else '#2ECC71' if m in [3,4,5]
          else '#E74C3C' if m in [6,7,8] else '#F39C12'
          for m in range(1, 13)]

ax.bar(months, monthly_demand.values, color=colors, edgecolor='black', linewidth=1.5)
ax.set_xlabel('Month', fontsize=12)
ax.set_ylabel('Mean Hourly Demand (rides)', fontsize=12)
ax.set_title('Seasonal Demand Patterns Across Annual Cycle',
             fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3)

# Add season labels
# Bar positions are 0-indexed (Jan = 0 ... Dec = 11); center each label over its season
label_y = monthly_demand.max() * 1.05
ax.set_ylim(0, monthly_demand.max() * 1.15)  # headroom so labels sit above the bars
ax.text(0.5, label_y, 'Winter', fontsize=10,  # over Jan-Feb; Dec sits at the far right
        ha='center', style='italic', color='#5DADE2')
ax.text(3, label_y, 'Spring', fontsize=10,
        ha='center', style='italic', color='#2ECC71')
ax.text(6, label_y, 'Summer', fontsize=10,
        ha='center', style='italic', color='#E74C3C')
ax.text(9, label_y, 'Fall', fontsize=10,
        ha='center', style='italic', color='#F39C12')

plt.tight_layout()
plt.show()

# Print seasonal insights
winter_months = [12, 1, 2]
spring_months = [3, 4, 5]
summer_months = [6, 7, 8]
fall_months = [9, 10, 11]

winter_mean = monthly_demand[winter_months].mean()
spring_mean = monthly_demand[spring_months].mean()
summer_mean = monthly_demand[summer_months].mean()
fall_mean = monthly_demand[fall_months].mean()

print(f"Winter mean: {winter_mean:.0f} rides per hour")
print(f"Spring mean: {spring_mean:.0f} rides per hour")
print(f"Summer mean: {summer_mean:.0f} rides per hour")
print(f"Fall mean: {fall_mean:.0f} rides per hour")
print(f"Spring growth from winter: {((spring_mean/winter_mean - 1) * 100):.1f}%")

This monthly column chart **reveals clear seasonal cycles** in bike-sharing demand. The color coding by season (winter blue, spring green, summer red, fall orange) enhances pattern recognition while maintaining quantitative precision. The visualization shows winter baseline around 125 rides per hour, spring growth reaching 184 rides per hour (a 46.8% increase), summer peaks at 237 rides per hour, and fall maintaining strong demand at 218 rides per hour. This seasonal intelligence enables **strategic resource allocation and annual planning** that optimizes business performance across the complete annual cycle, with summer showing nearly double (89% higher than) winter demand.

**Seasonal and Annual Trend Visualization**

Long-term patterns reveal **market development opportunities and strategic planning requirements**. Your statistical analysis identified systematic seasonal variations with spring showing 47% growth from winter baselines (125 to 184 rides per hour) and summer reaching 89% above winter levels (237 rides per hour).

Multi-year line plots reveal growth trajectories and market development patterns when analyzing data across multiple years. Such visualization preserves temporal continuity essential for trend extrapolation and strategic forecasting, enabling identification of **consistent growth patterns that support expansion planning** and investment decisions.

Seasonal overlay analysis shows consistent annual cycles. Winter-to-spring transitions show approximately 47% demand increases, spring-to-summer growth adds another 29%, while summer-to-fall patterns demonstrate modest 8% declines before the sharper drop back to winter levels. These seasonal patterns enable **predictive planning and resource optimization** across annual cycles.
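The season-to-season percentages can be checked directly from the seasonal means quoted in this lecture. The sketch below uses the rounded means (125, 184, 237, 218), so results may differ by a few tenths of a percentage point from figures computed on unrounded data. The `pct_change` helper is illustrative.

```python
# Seasonal mean demand (rides per hour), rounded values quoted in this lecture
season_means = {"Winter": 125, "Spring": 184, "Summer": 237, "Fall": 218}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new / old - 1) * 100


# Walk the annual cycle, ending with the drop from fall back to winter
order = ["Winter", "Spring", "Summer", "Fall", "Winter"]
transitions = {
    f"{start} -> {end}": pct_change(season_means[start], season_means[end])
    for start, end in zip(order, order[1:])
}

for label, change in transitions.items():
    print(f"{label}: {change:+.1f}%")
```

The last transition quantifies the drop back to winter levels, which is considerably larger than the summer-to-fall decline.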

Comparative visualization reveals performance variations across different years. Side-by-side annual comparisons identify exceptional performance periods and challenging conditions that inform strategic planning. Weather impact variations, special event effects, and competitive market changes become visible through systematic annual comparisons.

### 3.2. Weather Relationship Visualization Techniques

Weather represents **the most significant environmental factor affecting transportation demand**, requiring sophisticated visualization approaches that reveal complex relationships while supporting operational decision-making. Let's explore how to visualize temperature-demand relationships and integrate multiple weather variables for comprehensive understanding.

**Temperature-Demand Relationship Analysis**

Temperature correlation with bike demand shows **moderate positive relationships** but exhibits complex patterns requiring careful visualization design. Your statistical analysis revealed correlation coefficient r = 0.394, indicating that temperature explains approximately 15.6% of demand variation, highlighting the importance of other demand drivers.
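The link between the correlation coefficient and the share of variation explained is simply r squared. The sketch below works through it with the rounded r = 0.394, which gives a value very close to the quoted 15.6%; the small gap is consistent with rounding in r. It also applies the slope of about 9.2 rentals per 1°C reported earlier. The `expected_change` helper is hypothetical, and a linear extrapolation like this ignores all other demand drivers.

```python
r = 0.394    # temperature-demand correlation reported in this lecture (rounded)
slope = 9.2  # rentals per hour per 1 °C, from the trend line reported earlier

# Share of demand variation associated with temperature
r_squared = r ** 2
print(f"Share of variation explained by temperature: {r_squared:.1%}")
print(f"Share left to other factors: {1 - r_squared:.1%}")


def expected_change(delta_temp_c: float, rentals_per_degree: float = slope) -> float:
    """Average change in hourly rentals implied by the linear trend."""
    return rentals_per_degree * delta_temp_c


print(f"+5 °C -> about {expected_change(5):.0f} more rentals per hour on average")
```

Because most of the variation is left unexplained, a chart title or annotation should present the slope as an average tendency rather than a forecast for any single hour.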

Scatter plot visualization provides optimal temperature-demand relationship presentation because both variables represent continuous measurements requiring precise positioning. **Temperature in °C along the horizontal axis enables accurate temperature assessment**, while hourly rentals along the vertical axis preserve demand magnitude relationships.

The scatter plot pattern immediately reveals relationship characteristics: **positive correlation evident in upward trend**, relationship strength visible in point clustering around trend line, and relationship consistency shown through point distribution patterns. Individual data points enable detailed analysis while overall pattern provides immediate relationship understanding.

Color coding enhances temperature analysis by revealing seasonal variations within temperature relationships. Spring data points in green, summer in red, fall in orange, and winter in blue show that **identical temperatures produce different demand levels across seasons**. This seasonal context provides operational insights beyond simple temperature-demand correlation.

Trend line overlay quantifies relationship strength and enables predictive applications. Best-fit line through scatter plot data with equation and correlation coefficient provides **precise relationship quantification** while maintaining visual pattern presentation. Confidence intervals around trend lines communicate uncertainty levels appropriate for business planning.

**Python Example - Temperature-Demand Binned Scatter with Seasonal Context:**

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])

# Map seasons to colors for enhanced visualization
season_colors = {1: '#5DADE2', 2: '#2ECC71', 3: '#E74C3C', 4: '#F39C12'}
season_names = {1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Fall'}

# Create figure for professional presentation
fig, ax = plt.subplots(figsize=(12, 7))

# Create temperature bins for each season to reduce visual clutter
for season in [1, 2, 3, 4]:
    season_data = df[df['season'] == season].copy()

    # Bin temperature data within each season
    season_data['temp_bin'] = pd.cut(season_data['temp'], bins=15)
    binned = season_data.groupby('temp_bin', observed=True)['count'].agg(['mean', 'std', 'count'])
    binned['se'] = binned['std'] / np.sqrt(binned['count'])
    # Use each interval's midpoint as a plain float x-position
    binned['temp_center'] = [interval.mid for interval in binned.index]

    # Plot binned means with 95% confidence intervals by season
    ax.errorbar(binned['temp_center'], binned['mean'],
                yerr=binned['se'] * 1.96,  # 95% confidence interval (normal approximation)
                fmt='o', markersize=7, capsize=4, capthick=1.5,
                color=season_colors[season], ecolor=season_colors[season],
                alpha=0.7, label=season_names[season])

# Add overall trend line using original (unbinned) data
slope, intercept, r_value, p_value, std_err = stats.linregress(df['temp'], df['count'])
line_x = np.array([df['temp'].min(), df['temp'].max()])
line_y = slope * line_x + intercept
ax.plot(line_x, line_y, 'k--', linewidth=2.5, label=f'Trend Line (r = {r_value:.3f})')

ax.set_xlabel('Temperature (°C)', fontsize=13, fontweight='bold')
ax.set_ylabel('Hourly Bike Rentals', fontsize=13, fontweight='bold')
ax.set_title('Temperature-Demand Relationship with Seasonal Context',
             fontsize=15, fontweight='bold')
ax.legend(title='Season', loc='upper left', fontsize=10, framealpha=0.9)
ax.grid(True, alpha=0.3)

# Add correlation statistics box
stats_text = f'Correlation: r = {r_value:.3f}\n'
stats_text += f'R² = {r_value**2:.3f}\n'
stats_text += f'Temperature explains {(r_value**2)*100:.1f}%\nof demand variation'
ax.text(0.98, 0.05, stats_text, transform=ax.transAxes,
        fontsize=10, ha='right', va='bottom',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

plt.tight_layout()
plt.show()

# Print detailed insights
print(f"Overall temperature-demand correlation: r = {r_value:.3f}")
print(f"Temperature explains {(r_value**2)*100:.1f}% of demand variation")
print(f"Trend equation: Demand = {slope:.1f} × Temperature + {intercept:.1f}")
print("\nSeasonal temperature correlations:")
for season in [1, 2, 3, 4]:
    season_data = df[df['season'] == season]
    season_corr = season_data['temp'].corr(season_data['count'])
    print(f"  {season_names[season]}: r = {season_corr:.3f}")
```

This **binned scatter approach with seasonal color coding** resolves the visual clutter inherent in plotting thousands of raw hourly observations while preserving critical seasonal insights. By aggregating observations into temperature bins within each season, the visualization clearly reveals how temperature-demand relationships vary across winter (blue), spring (green), summer (red), and fall (orange) periods. The confidence intervals show pattern consistency within temperature ranges while highlighting data sparsity at temperature extremes.

The visualization **reveals sophisticated seasonal dynamics** beyond simple temperature effects. The overall correlation (r = 0.394) masks important seasonal variations: winter shows the strongest temperature sensitivity (r = 0.457), followed by spring (r = 0.404), summer (r = 0.366), and fall (r = 0.324). This declining pattern indicates that **temperature matters most during cold conditions**: when temperatures rise from 5°C to 15°C in winter, riders respond strongly. During warmer seasons, other factors (daylight hours, vacation patterns, weekend effects) likely play a larger role, weakening the pure temperature effect.

At similar temperatures around 20°C, summer demand (red points) often exceeds spring demand (green points), demonstrating that seasonal context matters beyond temperature alone. For operations, this points to season-specific weather forecasting models, to temperature sensitivity that varies systematically across the annual cycle, and to demand forecasts that combine temperature with seasonal indicators and temporal patterns.

**Multi-Weather Variable Integration**

Comprehensive weather analysis requires **integrating multiple environmental factors** that interact to influence transportation demand. Temperature, precipitation, humidity, and wind speed exhibit complex interactions requiring sophisticated visualization approaches.

Multi-variable scatter plots reveal interaction effects between different weather conditions. **Temperature-demand relationships vary under different precipitation conditions**: clear weather typically shows stronger temperature correlation, while precipitation days may exhibit weaker temperature effects as precipitation becomes a dominant demand driver.
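Before plotting, the interaction can be checked numerically by computing the temperature-demand correlation separately for each precipitation condition. The sketch below uses synthetic data with an assumed data-generating process (a strong temperature effect in dry hours, a muted one in rain); the `condition` column and the `df_demo` frame are hypothetical names, not part of the course dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
temp = rng.uniform(0, 35, n)
rain = rng.random(n) < 0.25  # roughly a quarter of hours are wet (assumption)

# Assumed process: clear temperature effect when dry, muted effect in rain
demand = np.where(rain, 60 + 1.0 * temp, 80 + 8.0 * temp) + rng.normal(0, 60, n)

df_demo = pd.DataFrame({
    'temp': temp,
    'count': demand,
    'condition': np.where(rain, 'rain', 'clear'),
})

# Pearson correlation between temperature and demand within each condition
corr_by_condition = {
    name: group['temp'].corr(group['count'])
    for name, group in df_demo.groupby('condition')
}
for name, r in corr_by_condition.items():
    print(f"{name}: r = {r:.3f}")
```

With real data, the same `groupby` pattern could be applied to whatever weather-category column the dataset provides.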

Color coding enables third-variable integration within two-dimensional scatter plots. Temperature-demand scatter plot with color indicating precipitation levels reveals that **high demand requires both favorable temperature and absence of precipitation**. This multi-dimensional insight supports operational planning that considers comprehensive weather conditions.
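A minimal sketch of this third-variable encoding follows, assuming synthetic hourly data in which precipitation suppresses demand. The variable names and the precipitation scale (mm/h) are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 600
temp = rng.uniform(0, 35, n)
# Most hours are dry; wet hours get an exponentially distributed amount (assumption)
precip = rng.exponential(1.5, n) * (rng.random(n) < 0.3)
demand = np.clip(40 + 9 * temp - 80 * precip + rng.normal(0, 50, n), 0, None)

fig, ax = plt.subplots(figsize=(8, 5))
# Position encodes temperature and demand; color encodes precipitation
points = ax.scatter(temp, demand, c=precip, cmap='viridis_r', s=18, alpha=0.8)
fig.colorbar(points, ax=ax, label='Precipitation (mm/h)')
ax.set_xlabel('Temperature (°C)')
ax.set_ylabel('Hourly Bike Rentals')
ax.set_title('Temperature-Demand Scatter Colored by Precipitation')
ax.grid(True, alpha=0.3)
plt.show()

# Numeric check of the visual impression: how many top-decile hours were dry?
top = demand >= np.quantile(demand, 0.9)
dry_share_top = np.mean(precip[top] == 0)
print(f"Dry share among top 10% demand hours: {dry_share_top:.0%}")
```

A sequential colormap suits a continuous third variable like precipitation amount; a small set of distinct colors would suit a categorical weather code.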

Seasonal weather pattern analysis reveals systematic changes in weather sensitivity across annual cycles. Spring weather improvements have maximum psychological impact with strong demand response, while summer weather changes show diminishing demand benefits. **Fall weather deterioration produces gradual demand decline**, while winter conditions create sharp demand reductions.

Panel visualization presents multiple weather relationships simultaneously for comprehensive understanding. Temperature effects, precipitation impacts, humidity influences, and wind speed relationships displayed in coordinated panels provide complete weather analysis while maintaining individual relationship clarity.
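One way to lay out such coordinated panels is a 2×2 grid with a shared demand axis, so that the four relationships can be compared on the same vertical scale. The sketch below uses synthetic weather values and assumed effect sizes purely for illustration; the `weather` frame and its `precip` column are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 800
weather = pd.DataFrame({
    'temp': rng.uniform(0, 35, n),
    'humidity': rng.uniform(20, 100, n),
    'windspeed': rng.uniform(0, 40, n),
    'precip': rng.exponential(1.0, n),
})
# Assumed effects: demand rises with temperature, falls with the other three
weather['count'] = np.clip(
    150 + 7 * weather['temp'] - 1.5 * weather['humidity']
    - 1.0 * weather['windspeed'] - 30 * weather['precip']
    + rng.normal(0, 50, n), 0, None)

panels = [
    ('temp', 'Temperature', 'Temperature (°C)'),
    ('humidity', 'Humidity', 'Humidity (%)'),
    ('windspeed', 'Wind Speed', 'Wind Speed (km/h)'),
    ('precip', 'Precipitation', 'Precipitation (mm/h)'),
]

# Shared y-axis keeps demand comparable across all four panels
fig, axes = plt.subplots(2, 2, figsize=(12, 8), sharey=True)
for ax, (col, title, xlabel) in zip(axes.flat, panels):
    r = weather[col].corr(weather['count'])
    ax.scatter(weather[col], weather['count'], s=8, alpha=0.4)
    ax.set_xlabel(xlabel)
    ax.set_title(f'{title} vs. Demand (r = {r:.2f})')
    ax.grid(True, alpha=0.3)
for ax in axes[:, 0]:
    ax.set_ylabel('Hourly Bike Rentals')
fig.tight_layout()
plt.show()
```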

### 3.3. Business Performance Visualization Design

Transportation business performance requires visualization approaches that **connect operational metrics to financial outcomes and strategic objectives**. Let's explore how professional performance visualization enables stakeholder understanding and supports evidence-based decision-making across operational efficiency, revenue patterns, and growth analysis.

**Operational Efficiency Metrics**

Operational efficiency visualization reveals **system utilization patterns and optimization opportunities** essential for business performance improvement. Understanding how demand varies across time periods enables strategic capacity planning and resource allocation decisions.

Efficiency trend visualization shows utilization patterns across temporal scales. Daily utilization profiles reveal peak efficiency periods and underutilization opportunities. Weekly patterns distinguish operational efficiency from recreational efficiency. **Monthly trends identify seasonal optimization requirements** and strategic planning needs.

Comparative efficiency analysis benchmarks current performance against optimal targets and competitive standards. Performance gaps become visible through side-by-side comparisons that highlight improvement opportunities and resource allocation priorities.

Efficiency distribution analysis reveals system performance consistency and reliability. Distribution of daily utilization rates shows operational stability and identifies **performance variability requiring management attention**. Consistent high performance versus volatile performance patterns require different strategic approaches.

**Python Example - Operational Efficiency Dashboard:**

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the bike-sharing dataset
df = pd.read_csv("https://raw.githubusercontent.com/pmarcelino/predictive-modeling/main/datasets/dataset.csv")
df['datetime'] = pd.to_datetime(df['datetime'])
df['hour'] = df['datetime'].dt.hour
df['weekday'] = df['datetime'].dt.dayofweek  # 0 = Monday
df['month'] = df['datetime'].dt.month

# Assume fleet size for utilization calculation.
# Utilization here is a proxy: hourly rentals as a share of the assumed fleet.
fleet_size = 3000  # bikes
df['utilization_rate'] = (df['count'] / fleet_size) * 100

# Create multi-panel efficiency dashboard
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Operational Efficiency Dashboard', fontsize=16, fontweight='bold')

# Panel 1: Hourly utilization pattern
hourly_util = df.groupby('hour')['utilization_rate'].mean()
axes[0, 0].plot(hourly_util.index, hourly_util.values, marker='o',
                linewidth=2, markersize=6, color='#2ECC71')
axes[0, 0].set_xlabel('Hour of Day', fontsize=11)
axes[0, 0].set_ylabel('Utilization Rate (%)', fontsize=11)
axes[0, 0].set_title('Daily Utilization Profile', fontsize=12, fontweight='bold')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].axhline(y=hourly_util.mean(), color='r', linestyle='--',
                   label=f'Mean: {hourly_util.mean():.1f}%')
axes[0, 0].legend()

# Panel 2: Utilization distribution
axes[0, 1].hist(df['utilization_rate'], bins=30, color='#3498DB',
                edgecolor='black', alpha=0.7)
axes[0, 1].set_xlabel('Utilization Rate (%)', fontsize=11)
axes[0, 1].set_ylabel('Frequency', fontsize=11)
axes[0, 1].set_title('Utilization Rate Distribution', fontsize=12, fontweight='bold')
axes[0, 1].axvline(x=df['utilization_rate'].mean(), color='r',
                   linestyle='--', linewidth=2, label=f'Mean: {df["utilization_rate"].mean():.1f}%')
axes[0, 1].axvline(x=df['utilization_rate'].median(), color='orange',
                   linestyle='--', linewidth=2, label=f'Median: {df["utilization_rate"].median():.1f}%')
axes[0, 1].legend()

# Panel 3: Weekly efficiency comparison
weekly_util = df.groupby('weekday')['utilization_rate'].mean()
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
axes[1, 0].bar(days, weekly_util.values, color='#E74C3C', edgecolor='black')
axes[1, 0].set_xlabel('Day of Week', fontsize=11)
axes[1, 0].set_ylabel('Mean Utilization Rate (%)', fontsize=11)
axes[1, 0].set_title('Weekly Efficiency Pattern', fontsize=12, fontweight='bold')
axes[1, 0].grid(axis='y', alpha=0.3)

# Panel 4: Monthly efficiency trends
monthly_util = df.groupby('month')['utilization_rate'].mean()
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
axes[1, 1].plot(months, monthly_util.values, marker='s',
                linewidth=2, markersize=8, color='#9B59B6')
axes[1, 1].set_xlabel('Month', fontsize=11)
axes[1, 1].set_ylabel('Mean Utilization Rate (%)', fontsize=11)
axes[1, 1].set_title('Seasonal Efficiency Trends', fontsize=12, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Print efficiency insights
print("=== Operational Efficiency Summary ===")
print(f"Overall mean utilization: {df['utilization_rate'].mean():.2f}%")
print(f"Peak hour utilization: {hourly_util.max():.2f}% at hour {hourly_util.idxmax()}")
print(f"Lowest utilization: {hourly_util.min():.2f}% at hour {hourly_util.idxmin()}")
print(f"Utilization variability (std): {df['utilization_rate'].std():.2f}%")
print(f"Best day: {days[weekly_util.idxmax()]} ({weekly_util.max():.2f}%)")
print(f"Best month: {months[monthly_util.idxmax()-1]} ({monthly_util.max():.2f}%)")
```

This multi-panel efficiency dashboard **provides comprehensive operational intelligence** through coordinated visualizations:

- **Daily utilization profile (top-left)**: peak efficiency at 5pm (15.63%) and dramatic overnight lows at 4am (0.21%), an hourly variation of more than 70-fold.
- **Distribution histogram (top-right)**: most hours operate at modest utilization levels (mean 6.39%, median 4.83%), with a right-skewed distribution indicating occasional high-demand periods.
- **Weekly pattern (bottom-left)**: relatively consistent performance across days, peaking on Friday (6.59%).
- **Monthly trend (bottom-right)**: seasonal patterns with peak utilization in June (8.07%) and the lowest efficiency in the winter months.

This integrated presentation enables **strategic decision-making across multiple operational dimensions**. Under the assumed 3,000-bike fleet, average utilization stays below 10%, which suggests either overcapacity in the fleet or opportunities for demand stimulation during off-peak periods. Because every percentage scales with the assumed fleet size, that conclusion should be rechecked against the client's actual fleet.

**Revenue and Growth Pattern Analysis**

Revenue visualization **connects operational performance to financial outcomes** essential for business sustainability and growth planning. High-demand periods generate disproportionate revenue concentration requiring strategic optimization approaches.

Revenue concentration analysis reveals business dependency patterns and risk management requirements. If peak periods representing 23% of operating days generate 45% of total revenue, visualization highlights **revenue concentration risks** and optimization opportunities during traditionally slow periods.
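Revenue concentration of this kind can be measured by ranking days by revenue and summing the top share. The sketch below is a rough illustration on synthetic daily ride counts with a hypothetical flat price per ride; `top_share` is a helper defined here, not a library function.

```python
import numpy as np
import pandas as pd

def top_share(daily_revenue: pd.Series, day_fraction: float) -> float:
    """Share of total revenue earned on the highest-revenue `day_fraction` of days."""
    n_top = max(1, int(round(len(daily_revenue) * day_fraction)))
    return daily_revenue.nlargest(n_top).sum() / daily_revenue.sum()

# Synthetic year of daily rides (assumption): right-skewed, like typical demand data
rng = np.random.default_rng(3)
daily_rides = pd.Series(rng.lognormal(mean=8.0, sigma=0.6, size=365))
price_per_ride = 2.50  # hypothetical flat price
daily_revenue = daily_rides * price_per_ride

share = top_share(daily_revenue, 0.23)
print(f"Top 23% of days generate {share:.0%} of revenue")
```

If revenue were spread evenly, the top 23% of days would earn 23% of revenue; the further the computed share sits above that baseline, the stronger the concentration risk.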

Growth trajectory visualization shows business development patterns and market opportunity assessment. Revenue growth trends, user base expansion patterns, and market share development require clear presentation that enables investor communication and strategic planning support.

Seasonal revenue analysis reveals financial planning requirements and cash flow management needs. **Revenue peaks during favorable weather periods**, revenue valleys during challenging conditions, and transition period performance create annual financial cycles requiring strategic financial management.

Comparative performance analysis benchmarks current business performance against market opportunities and competitive positioning. Market share trends, competitive performance comparisons, and growth opportunity identification enable **strategic positioning and investment priority decisions**.

## 4. From Insights to Impact: Professional Storytelling and Visualization Excellence

You've now mastered the fundamentals—understanding how visual perception works (position beats area every time), selecting the right chart types for your analytical purpose (scatter plots for relationships, line plots for time), and applying these principles to transportation challenges (daily demand profiles, temperature correlations, operational efficiency dashboards). But here's tomorrow's reality: **those investors won't fund your bike-sharing expansion based on beautiful charts alone.** They need a story that connects your analytical discoveries to business value.

Let's complete your visualization expertise by integrating everything you've learned into professional consulting delivery that wins that $5 million investment.

### 4.1. Building Your Analytical Narrative

Remember that emergency board meeting from Section 1? Your presentation needs a clear narrative arc that investors can follow instantly:

**The Three-Act Structure for Data Stories**

1. **Business Context (The Challenge)**: "Your bike-sharing system faces capacity planning uncertainty. Should you invest in 500 more bikes? Where? When?"

2. **Analytical Discovery (The Evidence)**: Here's where your visualizations from Section 3 come alive. Show the daily demand profile with those clear 8am (363 rides) and 5pm (469 rides) peaks (using position along the time axis for precision). Layer in the temperature-demand scatter plot revealing that r = 0.394 correlation (moderate but significant). Add the seasonal column chart showing 47% spring growth from winter and 89% summer growth from winter.

3. **Business Resolution (The Recommendation)**: "Deploy 300 bikes for weekday commuter peaks, 200 for weekend recreational demand. Spring expansion in March maximizes ROI based on temperature-demand patterns."

**Adapt to Your Audience**

Different stakeholders need different emphasis from the same analytical foundation:

- **Investors** (tomorrow's meeting): Focus on growth trajectories and market opportunity. That 89% summer peak over winter baseline? That's revenue expansion potential through seasonal optimization.
- **Operations managers**: Emphasize those hourly patterns for staffing optimization and rebalancing strategies.
- **Technical reviewers**: Include your statistical methodology and confidence intervals.

The visualization principles from Section 2 remain constant—position for precision, color for categories—but your narrative emphasis shifts based on who's receiving your $5 million pitch.

### 4.2. Quality Checklist: Ensuring Professional Excellence

Before you present tomorrow, evaluate your visualizations against these essential criteria that separate consulting-grade work from academic exercises:

**1. Immediate Clarity Test**

Can stakeholders understand your key insight within 5 seconds? Your weekday vs. weekend daily demand profile comparison (Section 3.1) passes this test because the line patterns immediately communicate the different temporal structures—bimodal weekday peaks versus single midday weekend peak—even though average hourly demands are similar. If your visualization requires extensive explanation, redesign it using the visual perception hierarchy from Section 2.1.

**2. Accuracy and Integrity Check**

- **Avoid scale manipulation**: Don't truncate or stretch the axes of your temperature-demand scatter plot to make the trend look steeper or tighter than it is. The r = 0.394 relationship tells an honest story about moderate temperature effects alongside other demand drivers.
- **Maintain proportional spacing**: Your monthly seasonal patterns need consistent time intervals—don't compress winter months just because demand is lower.
- **Show uncertainty appropriately**: Include confidence intervals on trend lines without overwhelming the primary insight.

**3. Common Pitfalls to Avoid**

Based on the visualization types from Section 2.2, watch for these mistakes:

- **Chart type mismatches**: Don't use pie charts for your hourly demand profile (area comparison is inaccurate). Use line plots that leverage position along a common time axis.
- **Color overload**: Your temperature-demand scatter plot with seasonal color coding works because it adds meaningful context. Random decorative colors reduce clarity.
- **Complexity creep**: That operational efficiency dashboard in Section 3.3 uses four coordinated panels effectively. Adding six more panels would overwhelm rather than illuminate.

### 4.3. The Professional Edge

What distinguishes your $5 million pitch from a basic data presentation? **Integration of evidence types**:

- **Statistical rigor**: "Temperature correlation r = 0.394, statistically significant at p < 0.001"
- **Visual proof**: The scatter plot pattern that investors see instantly
- **Business logic**: "This means we can forecast demand based on weather predictions, optimizing bike deployment daily"

This three-layer approach—quantitative evidence (your statistical analysis), visual evidence (your visualization design from Sections 2-3), and business logic (operational reality)—creates **persuasive argumentation that wins investment decisions**.
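The quantitative layer is straightforward to produce in code. As a minimal sketch on synthetic data (the effect size and noise level are assumptions chosen only to give a moderate correlation), `scipy.stats.pearsonr` returns both the correlation coefficient and its p-value:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption): moderate positive temperature effect
rng = np.random.default_rng(11)
temp = rng.uniform(0, 35, 1000)
demand = 100 + 5 * temp + rng.normal(0, 120, temp.size)

# Pearson correlation and two-sided p-value for the null hypothesis r = 0
r, p_value = stats.pearsonr(temp, demand)
print(f"r = {r:.3f}, R² = {r**2:.3f}, p = {p_value:.2e}")
```

With large hourly datasets, even weak correlations reach very small p-values, so the pitch should report the size of r alongside its significance.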

**Your Pre-Presentation Checklist**

Before tomorrow's board meeting, verify that each visualization:

✓ Uses the most accurate visual encoding (position > length > color > area)  
✓ Matches chart type to analytical purpose (comparative/relationship/distribution)  
✓ Applies transportation-specific insights (temporal patterns, weather effects, efficiency metrics)  
✓ Tells a clear story without requiring extensive explanation  
✓ Connects directly to business decisions (capacity planning, seasonal optimization, ROI)  
✓ Maintains statistical integrity (accurate scales, appropriate uncertainty)  
✓ Targets your specific audience (investor focus on growth and returns)

---

## Summary and Transition to Programming Implementation

You've mastered essential data visualization principles: **chart selection strategies, design guidelines, and professional quality standards**. These skills transform statistical analysis results into compelling visual narratives that communicate insights effectively to business stakeholders.

Your ability to select appropriate chart types, apply effective design principles, and evaluate visualization quality prepares you to create professional presentations that drive strategic decisions and operational improvements in transportation consulting.

In the next lecture, you'll build on the Python examples from this lecture and go deeper into implementing these visualization concepts with Python libraries, creating production-quality charts that communicate demand patterns, weather relationships, and business recommendations to your bike-sharing client.