# Achievement 2.4: Fundamentals of Visualizations – Part 2

This notebook uses Seaborn and Matplotlib to visualize Citi Bike and NYC weather data for 2022. It includes a bar chart, a dual-axis line plot, a box plot, and a facet grid. Seaborn's themes and color palettes are used to keep the charts clear and visually consistent. A colorblind-friendly palette was chosen so the charts stay readable for as many viewers as possible, in both dark and light display modes.


## Table of Contents
1. [Imports and Setup](#1.-Imports-and-Setup)
2. [Set Plot Theme and Palette](#2.-Set-Plot-Theme-and-Palette)
3. [Top 20 Starting Stations Bar Chart](#3.-Top-20-Starting-Stations-Bar-Chart)
4. [Dual-Axis Line Plot](#4.-Dual-Axis-Line-Plot)
5. [Trip Duration Box Plot](#5.-Trip-Duration-Box-Plot)
6. [FacetGrid by User Type](#6.-FacetGrid-by-User-Type)


## 1. Imports and Setup

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Load dataset
data_path = "/content/drive/MyDrive/citibike_project/citibike_weather_merged_2022.csv"
df = pd.read_csv(data_path, parse_dates=["started_at"])
print("✅ Data shape:", df.shape)
df.head()

## 2. Set Plot Theme and Palette

In [None]:
# Set global theme and colorblind-friendly palette
sns.set_theme(style="whitegrid", palette="colorblind")

**Note:** A colorblind-safe palette was selected so that the series remain distinguishable for viewers with color vision deficiencies, on devices using either light or dark themes. The choice was also informed by personal experience in the military, where the prevalence of colorblindness became apparent.
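To see which colors the theme will cycle through, the palette can be inspected directly. The following is a minimal sketch; the names `palette`, `hex_codes`, `trip_color`, and `temp_color` are illustrative and not part of the original notebook.

```python
import seaborn as sns

# Retrieve the named Seaborn palette as a list of RGB tuples
palette = sns.color_palette("colorblind")

# Convert to hex codes, which are convenient for passing explicit colors to plots
hex_codes = palette.as_hex()
print(len(hex_codes), "colors:", hex_codes)

# Illustrative only: reserve the first two palette colors for a two-series chart
trip_color, temp_color = hex_codes[0], hex_codes[1]
```

In a notebook, `sns.palplot(palette)` draws the same colors as swatches, which is a quick way to check how they look against the current theme.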


## 3. Top 20 Starting Stations Bar Chart

In [None]:
# Count trip starts by station
top_stations = df['start_station_name'].value_counts().head(20).reset_index()
top_stations.columns = ['start_station_name', 'trip_count']

# Plot bar chart
plt.figure(figsize=(12, 6))
sns.barplot(data=top_stations, y='start_station_name', x='trip_count')
plt.title("Top 20 Most Frequent Starting Stations - 2022")
plt.xlabel("Trip Count")
plt.ylabel("Start Station Name")
plt.tight_layout()
plt.show()

## 4. Dual-Axis Line Plot

In [None]:
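# Derive a daily 'date' column (midnight timestamps) from the parsed start time;
# keeping it as datetime64 is faster than Python date objects and plots cleanly
df['date'] = df['started_at'].dt.normalize()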

# Aggregate daily data
daily_weather = df.groupby('date')[['TMAX', 'TMIN']].mean().reset_index()
daily_trips = df.groupby('date').size().reset_index(name='trip_count')
merged = pd.merge(daily_weather, daily_trips, on='date')

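# Create dual-axis plot using two colors from the colorblind palette
trip_color, temp_color = sns.color_palette("colorblind", 2)
fig, ax1 = plt.subplots(figsize=(14, 5))
sns.lineplot(data=merged, x='date', y='trip_count', ax=ax1, color=trip_color, label='Trip Count', legend=False)
ax1.set_ylabel("Trip Count", color=trip_color)
ax1.tick_params(axis='y', labelcolor=trip_color)

# Second y-axis for temperature, sharing the same x-axis
ax2 = ax1.twinx()
sns.lineplot(data=merged, x='date', y='TMAX', ax=ax2, color=temp_color, label='Max Temp', legend=False)
ax2.set_ylabel("Max Temperature (°F)", color=temp_color)
ax2.tick_params(axis='y', labelcolor=temp_color)
ax2.grid(False)  # keep only ax1's grid so the two sets of gridlines do not clash

# Combine both lines into a single legend (drawn on ax2 so it sits on top)
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper left')

# Labels and title
ax1.set_xlabel("Date")
ax1.set_title("Daily Citi Bike Trips vs. Max Temperature (2022)")
fig.tight_layout()
plt.show()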

## 5. Trip Duration Box Plot

In [None]:
# Compute trip duration in minutes
df['ended_at'] = pd.to_datetime(df['ended_at'])
df['tripduration'] = (df['ended_at'] - df['started_at']).dt.total_seconds() / 60

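# Keep plausible durations only: positive and under two hours
# (zero or negative values can appear when timestamps are inconsistent)
df_filtered = df[(df['tripduration'] > 0) & (df['tripduration'] < 120)]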

# Create box plot
plt.figure(figsize=(8, 5))
sns.boxplot(data=df_filtered, x='member_casual', y='tripduration')
plt.title("Trip Duration by User Type")
plt.xlabel("User Type")
plt.ylabel("Trip Duration (minutes)")
plt.tight_layout()
plt.show()

### Box Plot Analysis
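From this chart, casual users tend to take longer rides than members: their median trip duration is higher and their interquartile range is wider, meaning their trips vary more in length. Members ride more consistently, with fewer long-duration outliers, which is consistent with routine trips such as work commutes or errands. Because rides of 120 minutes or more were filtered out before plotting, the chart understates the longest trips for both groups.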
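The visual comparison can be backed with numbers by computing the same statistics the box plot draws. The sketch below uses a small synthetic frame in place of `df_filtered`, so the printed values are made up and say nothing about the real data; `duration_summary` and `demo` are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_filtered (NOT the real Citi Bike data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "member_casual": ["member"] * 500 + ["casual"] * 500,
    "tripduration": np.concatenate([
        rng.gamma(shape=2.0, scale=5.0, size=500),   # shorter, tighter rides
        rng.gamma(shape=2.0, scale=10.0, size=500),  # longer, more spread-out rides
    ]),
})

def duration_summary(frame):
    """Return quartiles, median, and IQR of tripduration per user type."""
    q = (
        frame.groupby("member_casual")["tripduration"]
        .quantile([0.25, 0.5, 0.75])
        .unstack()  # one column per quantile
    )
    q.columns = ["q1", "median", "q3"]
    q["iqr"] = q["q3"] - q["q1"]  # height of the box in the box plot
    return q

summary = duration_summary(demo)
print(summary.round(1))
```

In the notebook itself, `duration_summary(df_filtered)` would give the medians and interquartile ranges that correspond to the boxes in the chart above.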


## 6. FacetGrid by User Type

In [None]:
# Average trip duration per day and user type
daily_duration = df_filtered.groupby(['date', 'member_casual'])['tripduration'].mean().reset_index()

# Create facet grid of line plots, one panel per user type
g = sns.FacetGrid(data=daily_duration, col='member_casual', height=4, aspect=1.6)
g.map_dataframe(sns.lineplot, x='date', y='tripduration', color=sns.color_palette("colorblind")[2])
g.set_axis_labels("Date", "Avg Trip Duration (min)")
g.set_titles("User Type: {col_name}")
# 'figure' is the current name for the grid's Matplotlib figure ('fig' is deprecated)
g.figure.suptitle("Daily Average Trip Duration by User Type (2022)", fontsize=14)
g.figure.subplots_adjust(top=0.85)  # leave room for the suptitle
plt.show()

### FacetGrid Analysis
This chart shows seasonal differences in riding behavior. Casual users' average trip durations rise in the warm months and drop in winter, while members' durations stay comparatively steady throughout the year. This pattern suggests commute-driven use for members and leisure-driven use for casual riders.
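One way to quantify the seasonal pattern is to average the daily values by month and compare the swing for each user type. The sketch below runs on synthetic data shaped like `daily_duration`; the numbers are invented for illustration, and `monthly_profile`, `demo`, and `swing` are hypothetical names that do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for daily_duration (NOT the real data):
# a sine wave peaking in mid-July, stronger for casual riders than for members
dates = pd.date_range("2022-01-01", "2022-12-31", freq="D")
seasonal = 10 * np.sin((dates.dayofyear.to_numpy() - 105) / 365 * 2 * np.pi)
demo = pd.concat([
    pd.DataFrame({"date": dates, "member_casual": "casual", "tripduration": 22 + 0.6 * seasonal}),
    pd.DataFrame({"date": dates, "member_casual": "member", "tripduration": 12 + 0.1 * seasonal}),
], ignore_index=True)

def monthly_profile(frame):
    """Mean trip duration per calendar month, one column per user type."""
    month = pd.to_datetime(frame["date"]).dt.month.rename("month")
    return frame.groupby([month, "member_casual"])["tripduration"].mean().unstack()

profile = monthly_profile(demo)
swing = profile.max() - profile.min()  # gap between longest and shortest month
print(profile.round(1))
print(swing.round(1))
```

Applied to the real `daily_duration` frame, a larger `swing` for casual riders than for members would support the reading above, while similar values would call it into question.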
