# Exercise 2.6 - Interactive Charts with Plotly

## Citi Bike NYC Expansion Dashboard - Chart Development

**Author:** Saurabh Singh  
**Exercise:** Achievement 2, Exercise 2.6  
**Date:** February 2026

---

## Project Overview

### What are we doing?

This notebook develops interactive Plotly charts that will be integrated into a Streamlit dashboard. We're converting the static matplotlib and seaborn visualizations from previous exercises into interactive Plotly versions.

### Why Plotly?

**Interactive capabilities:**
- Hover tooltips show exact values automatically
- Zoom and pan functionality
- Responsive design (works on desktop, tablet, mobile)
- Easy integration with Streamlit dashboards

**Business value:**
- Stakeholders can explore data themselves
- No technical knowledge required to interact
- Professional, polished appearance
- Shareable via web browser

### Charts to create:

1. **Bar chart** - Top 20 most popular stations
2. **Dual-axis line chart** - Weather and ridership correlation

These will form the core of the interactive dashboard.

---

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from datetime import datetime as dt

---

## 2. Load Data

Loading the merged dataset from Exercise 2.2.

In [None]:
# Load merged dataset
df = pd.read_csv('outputs/merged_citibike_weather_2022.csv')

In [None]:
df.head()

In [None]:
df.shape
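
Before aggregating, it can help to confirm that the columns used later in this notebook are present. The following is a minimal sketch: `missing_columns` is a hypothetical helper, and the toy frame only stands in for the real dataset.

```python
import pandas as pd


def missing_columns(frame: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]


# Toy frame standing in for the merged dataset (illustrative values only)
sample = pd.DataFrame({
    'start_station_name': ['Station A', 'Station B'],
    'date': ['2022-01-01', '2022-01-02'],
})

# Columns this notebook relies on in later sections
required = ['start_station_name', 'date', 'avgTemp']
missing = missing_columns(sample, required)
print('Missing columns:', missing)
```

On the real data, the same call would be `missing_columns(df, required)`.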

---

## 3. Bar Chart - Top 20 Stations

### Purpose:

Identify the most popular starting stations to inform capacity expansion decisions.

### Data aggregation:

Using the same groupby pattern from previous exercises to count trips per station.

In [None]:
# Create value column for counting
df['value'] = 1

In [None]:
# Group by station and aggregate
df_groupby_bar = df.groupby('start_station_name', as_index=False).agg({'value': 'sum'})

In [None]:
df_groupby_bar.head()

In [None]:
# Get top 20 stations
top20 = df_groupby_bar.nlargest(20, 'value')

In [None]:
top20
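
As a cross-check on the aggregation above, the same counts can probably be obtained without the helper `value` column by using `groupby(...).size()`. This is a sketch on toy data, not the notebook's actual dataset; the station names are placeholders.

```python
import pandas as pd

# Toy trips table (placeholder station names, not real data)
trips = pd.DataFrame({
    'start_station_name': ['A', 'B', 'A', 'C', 'A', 'B'],
})

# Approach used in this notebook: helper column of ones, then sum
with_helper = trips.assign(value=1).groupby(
    'start_station_name', as_index=False
).agg({'value': 'sum'})

# Alternative: count rows per group directly
with_size = trips.groupby('start_station_name').size().reset_index(name='value')

# Compare the two results station by station
same_counts = bool((with_helper['value'] == with_size['value']).all())
print('Counts match:', same_counts)

# Top-N selection works the same way on either result
top2 = with_size.nlargest(2, 'value')
print(top2)
```

Either form feeds `nlargest` identically, so the choice is mostly a matter of keeping the pattern consistent with earlier exercises.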

### Create Plotly bar chart:

**Design choices:**
- **Color scheme**: Blues gradient (consistent with previous exercises)
- **Orientation**: Horizontal bars (station names are long)
- **Interactivity**: Hover shows exact trip counts
- **Size**: 900x600 for dashboard visibility

In [None]:
# Create bar chart
fig = go.Figure(go.Bar(
    x=top20['value'], 
    y=top20['start_station_name'],
    orientation='h',
    marker={'color': top20['value'], 'colorscale': 'Blues'}
))

In [None]:
# Update layout
fig.update_layout(
    title='Top 20 Most Popular Bike Stations in NYC',
    xaxis_title='Number of Trips',
    yaxis_title='Start Station',
    # Reverse the y-axis so the busiest station appears at the top
    yaxis_autorange='reversed',
    width=900,
    height=600
)

In [None]:
# Display chart
fig.show()

### Save top20 data for dashboard:

In [None]:
# Save for Streamlit dashboard
top20.to_csv('outputs/top20.csv', index=False)

---

## 4. Dual-Axis Line Chart

### Purpose:

Show the correlation between temperature and bike ridership to demonstrate seasonal demand patterns.

### Data preparation:

Aggregate trips by date to create daily counts.

In [None]:
# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])

In [None]:
# Load weather data
df_weather = pd.read_csv('outputs/weather_data_2022.csv')

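# Convert weather dates (df['date'] was already converted in the previous cell)
df_weather['date'] = pd.to_datetime(df_weather['date'])

# Merge weather data only if avgTemp is missing, so that re-running this
# cell does not produce duplicated avgTemp_x / avgTemp_y columns
if 'avgTemp' not in df.columns:
    df = df.merge(df_weather, on='date', how='left')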

# Check if avgTemp is now present
print("Columns:", df.columns.tolist())
print("avgTemp present:", 'avgTemp' in df.columns)
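
A left merge on `date` is only safe for trip counts if the weather table has one row per date; otherwise trips get duplicated. The sketch below shows one way to guard against that with the `validate` argument of `DataFrame.merge`, using toy tables rather than the real files.

```python
import pandas as pd

# Toy trips and weather tables (illustrative values only)
trips = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-01', '2022-01-01', '2022-01-02']),
    'start_station_name': ['A', 'B', 'A'],
})
weather = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-01', '2022-01-02']),
    'avgTemp': [1.5, 3.0],
})

# validate='many_to_one' raises pandas.errors.MergeError if the weather
# table has more than one row for any date
merged = trips.merge(weather, on='date', how='left', validate='many_to_one')

# A left merge against unique keys should preserve the number of trip rows
rows_preserved = len(merged) == len(trips)

# Count trips that found no matching weather row
unmatched = int(merged['avgTemp'].isna().sum())
print('Rows preserved:', rows_preserved, '| unmatched trips:', unmatched)
```

Applied to this notebook, the equivalent would be passing `validate='many_to_one'` to the `df.merge(df_weather, ...)` call.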

In [None]:
# Create daily aggregation
df_daily = df.groupby('date', as_index=False).agg({
    'value': 'sum',
    'avgTemp': 'first'
})

In [None]:
# Rename for clarity
df_daily.rename(columns={'value': 'bike_rides_daily'}, inplace=True)

In [None]:
df_daily.head()
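
Since the purpose of this chart is to show a correlation, it may be worth quantifying it alongside the visual. A minimal sketch using the Pearson coefficient from `Series.corr` follows; the sample values are invented for illustration and are not results from the real data.

```python
import pandas as pd

# Toy daily table with the same column names as df_daily (invented values)
daily_sample = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-15', '2022-04-15', '2022-07-15', '2022-10-15']),
    'bike_rides_daily': [20000, 60000, 100000, 80000],
    'avgTemp': [0.0, 12.0, 26.0, 15.0],
})

# Pearson correlation coefficient between daily rides and temperature
r = daily_sample['bike_rides_daily'].corr(daily_sample['avgTemp'])
print(f'Pearson r = {r:.2f}')
```

On the real data, the same call would be `df_daily['bike_rides_daily'].corr(df_daily['avgTemp'])`.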

### Create dual-axis line chart:

**Design choices:**
- **Primary axis (left)**: Daily bike rides in blue
- **Secondary axis (right)**: Temperature in red
- **Why dual-axis**: Different scales (thousands of trips vs. degrees)
- **Interactivity**: Hover shows both values for any date

In [None]:
# Create subplot with secondary y-axis
fig_2 = make_subplots(specs=[[{"secondary_y": True}]])

In [None]:
# Add bike rides trace (primary y-axis)
fig_2.add_trace(
    go.Scatter(
        x=df_daily['date'],
        y=df_daily['bike_rides_daily'],
        name='Daily Bike Rides',
        mode='lines',
        line={'color': 'blue'}
    ),
    secondary_y=False
)

In [None]:
# Add temperature trace (secondary y-axis)
fig_2.add_trace(
    go.Scatter(
        x=df_daily['date'],
        y=df_daily['avgTemp'],
        name='Daily Temperature',
        mode='lines',
        line={'color': 'red'}
    ),
    secondary_y=True
)

In [None]:
# Update layout
fig_2.update_layout(
    title='Daily Bike Rides and Temperature Correlation - 2022',
    xaxis_title='Date',
    # Show both traces' values in one tooltip for the hovered date
    hovermode='x unified',
    height=600
)

# Update y-axes titles
fig_2.update_yaxes(title_text='Number of Bike Rides', secondary_y=False)
fig_2.update_yaxes(title_text='Temperature (°C)', secondary_y=True)

In [None]:
# Display chart
fig_2.show()

### Save daily data for dashboard:

In [None]:
# Save for Streamlit dashboard
df_daily.to_csv('outputs/daily_data.csv', index=False)
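
One thing to keep in mind when the dashboard reads this file back: CSV does not store dtypes, so the `date` column returns as text unless it is parsed on load. The sketch below demonstrates this with an in-memory buffer and toy values instead of the real file.

```python
import io

import pandas as pd

# Toy daily table with the same column names as df_daily (invented values)
daily_sample = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-01', '2022-01-02']),
    'bike_rides_daily': [20000, 25000],
    'avgTemp': [1.5, 3.0],
})

# Write to an in-memory buffer instead of a file for this demonstration
buffer = io.StringIO()
daily_sample.to_csv(buffer, index=False)

# Without parse_dates, the date column comes back as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# With parse_dates, it is restored to a datetime column
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['date'])

print(plain['date'].dtype, parsed['date'].dtype)
```

In the dashboard, the equivalent would likely be `pd.read_csv('outputs/daily_data.csv', parse_dates=['date'])`.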

---

## Summary

### Charts created:

1. ✅ **Bar chart**: Top 20 stations with interactive hover tooltips
2. ✅ **Dual-axis line chart**: Weather-ridership correlation with zoom capability

### Data files prepared for dashboard:

- `outputs/top20.csv` - Top 20 stations data
- `outputs/daily_data.csv` - Daily aggregated trip and weather data

### Next steps:

1. Create Streamlit `.py` file
2. Configure dashboard page settings
3. Add these plotly charts to the dashboard
4. Integrate the interactive map from Exercise 2.5
5. Test and refine dashboard layout

### Technical notes:

- **Plotly advantages**: Automatic interactivity, responsive design, professional appearance
- **Color consistency**: Blues palette maintained from previous exercises
- **Performance**: Charts are built from aggregated data (20 stations, one row per day), which keeps rendering fast
- **Accessibility**: Hover tooltips provide exact values without cluttering the visual

The interactive charts are ready for integration into the Streamlit dashboard!