# Achievement 2.5: Advanced Geospatial Plotting with Kepler.gl

This notebook uses `kepler.gl` to visualize the 50 most-traveled Citi Bike routes in New York City in 2022. It combines weather data, station locations, and ride metadata into an interactive map for route analysis. The map shows trip arcs, start and end stations, and dynamic filters, all styled for clarity.

## Table of Contents
1. [Imports and Setup](#1.-Imports-and-Setup)
2. [Load and Prepare Data](#2.-Load-and-Prepare-Data)
3. [Aggregate Top 50 Routes](#3.-Aggregate-Top-50-Routes)
4. [Create Kepler Map](#4.-Create-Kepler-Map)
5. [Customization and Layers](#5.-Customization-and-Layers)
6. [Filters and Insights](#6.-Filters-and-Insights)
7. [Export HTML Map](#7.-Export-HTML-Map)

## 1. Imports and Setup

In [16]:
from keplergl import KeplerGl
import pandas as pd

## 2. Load and Prepare Data

We load the merged dataset created in Exercise 2.4 and add a `trip_count` column to prepare for aggregation.

In [17]:
# Load dataset
df = pd.read_csv("citibike_weather_merged_2022.csv", parse_dates=["started_at"])

# Add a column for trip count to allow aggregation
df["trip_count"] = 1
print("\u2705 Data shape:", df.shape)
df.head()



✅ Data shape: (29838806, 18)


Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,date,PRCP,TMAX,TMIN,trip_count
0,BFD29218AB271154,electric_bike,2022-01-21 13:13:43.392,2022-01-21 13:22:31.463,West End Ave & W 107 St,7650.05,Mt Morris Park W & W 120 St,7685.14,40.802117,-73.968181,40.804038,-73.945925,member,2022-01-21,0.0,23.0,15.0,1
1,7C953F2FD7BE1302,classic_bike,2022-01-10 11:30:54.162,2022-01-10 11:41:43.422,4 Ave & 3 St,4028.04,Boerum Pl\t& Pacific St,4488.09,40.673746,-73.985649,40.688489,-73.99116,member,2022-01-10,0.0,42.0,26.0,1
2,95893ABD40CED4B8,electric_bike,2022-01-26 10:52:43.096,2022-01-26 11:06:35.227,1 Ave & E 62 St,6753.08,5 Ave & E 29 St,6248.06,40.761227,-73.96094,40.745168,-73.986831,member,2022-01-26,0.0,30.0,21.0,1
3,F853B50772137378,classic_bike,2022-01-03 08:35:48.247,2022-01-03 09:10:50.475,2 Ave & E 96 St,7338.02,5 Ave & E 29 St,6248.06,40.783964,-73.947167,40.745168,-73.986831,member,2022-01-03,0.0,39.0,24.0,1
4,7590ADF834797B4B,classic_bike,2022-01-22 14:14:23.043,2022-01-22 14:34:57.474,6 Ave & W 34 St,6364.1,5 Ave & E 29 St,6248.06,40.74964,-73.98805,40.745168,-73.986831,member,2022-01-22,0.0,30.0,14.0,1
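
The preview above shows station IDs such as `6364.1`, which suggests they were parsed as floats. As a minimal sketch, assuming the IDs are better treated as labels than as numbers, `read_csv` can be given explicit string types for those columns. The sample rows below are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for citibike_weather_merged_2022.csv (hypothetical rows).
sample_csv = io.StringIO(
    "ride_id,started_at,start_station_id,end_station_id\n"
    "A1,2022-01-22 14:14:23.043,6364.10,6248.06\n"
    "B2,2022-01-03 08:35:48.247,7338.02,6248.06\n"
)

# Reading the ID columns as strings keeps trailing zeros and avoids
# type inference on columns that are identifiers rather than quantities.
sample_df = pd.read_csv(
    sample_csv,
    parse_dates=["started_at"],
    dtype={"start_station_id": str, "end_station_id": str},
)

print(sample_df.dtypes)
```

The same `dtype` argument could be added to the real `pd.read_csv` call; whether it is worthwhile depends on how the IDs are used downstream.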


## 3. Aggregate Top 50 Routes

We group trips by origin and destination station name and coordinates, then sum `trip_count` to identify the most frequent routes.

In [18]:
# Group by trip route
top_routes = df.groupby([
    "start_station_name", "start_lat", "start_lng",
    "end_station_name", "end_lat", "end_lng"
])["trip_count"].sum().reset_index()

# Sort and keep top 50
top_routes = top_routes.sort_values(by="trip_count", ascending=False).head(50)

# Filter the full dataset to trips on the top routes, keeping all original fields
df["route"] = (
    df["start_station_name"] + " -> " + df["end_station_name"]
)
top50_df = df[df["route"].isin(
    top_routes["start_station_name"] + " -> " + top_routes["end_station_name"]
)]

print("\u2705 Top 50 trip data shape:", top50_df.shape)
top50_df.head()

✅ Top 50 trip data shape: (236998, 19)


Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,date,PRCP,TMAX,TMIN,trip_count,route
2452,EF0CBDDC95319C42,classic_bike,2022-01-11 19:32:09.252,2022-01-11 19:34:21.964,Amsterdam Ave & W 73 St,7260.09,Amsterdam Ave & W 79 St,7311.02,40.779668,-73.98093,40.782939,-73.978652,member,2022-01-11,0.0,26.0,17.0,1,Amsterdam Ave & W 73 St -> Amsterdam Ave & W 7...
2453,2E5CB4185318D812,classic_bike,2022-01-14 17:40:08.285,2022-01-14 17:42:39.175,Amsterdam Ave & W 73 St,7260.09,Amsterdam Ave & W 79 St,7311.02,40.779668,-73.98093,40.782939,-73.978652,member,2022-01-14,0.0,43.0,22.0,1,Amsterdam Ave & W 73 St -> Amsterdam Ave & W 7...
2454,0DC9B946329DF832,electric_bike,2022-01-11 18:09:35.888,2022-01-11 18:11:32.954,Amsterdam Ave & W 73 St,7260.09,Amsterdam Ave & W 79 St,7311.02,40.779668,-73.98093,40.782939,-73.978652,member,2022-01-11,0.0,26.0,17.0,1,Amsterdam Ave & W 73 St -> Amsterdam Ave & W 7...
2455,ECD78BB5A277C8C3,classic_bike,2022-01-16 18:16:57.025,2022-01-16 18:23:01.743,Amsterdam Ave & W 73 St,7260.09,Amsterdam Ave & W 79 St,7311.02,40.779668,-73.98093,40.782939,-73.978652,member,2022-01-16,0.36,40.0,10.0,1,Amsterdam Ave & W 73 St -> Amsterdam Ave & W 7...
2457,856E0EA52160358D,classic_bike,2022-01-20 21:57:20.543,2022-01-20 21:59:04.827,Amsterdam Ave & W 73 St,7260.09,Amsterdam Ave & W 79 St,7311.02,40.779668,-73.98093,40.782939,-73.978652,member,2022-01-20,0.25,47.0,23.0,1,Amsterdam Ave & W 73 St -> Amsterdam Ave & W 7...
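
An alternative worth considering is to group on station names only and carry one representative coordinate per route. This is a sketch under an assumption that has not been verified against the real data: that the same station name can appear with slightly different coordinates, which would split one route across several groups. The toy `trips` frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy trips: route A -> B appears with a slightly different start_lat.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B"],
    "end_station_name": ["B", "B", "B", "A"],
    "start_lat": [40.70, 40.7001, 40.70, 40.80],
    "start_lng": [-74.00, -74.00, -74.00, -73.90],
    "end_lat": [40.80, 40.80, 40.80, 40.70],
    "end_lng": [-73.90, -73.90, -73.90, -74.00],
})

# One row per route: count the trips and take the median coordinate
# so small coordinate differences do not create separate groups.
routes = (
    trips.groupby(["start_station_name", "end_station_name"], as_index=False)
    .agg(
        trip_count=("start_lat", "size"),
        start_lat=("start_lat", "median"),
        start_lng=("start_lng", "median"),
        end_lat=("end_lat", "median"),
        end_lng=("end_lng", "median"),
    )
    .sort_values("trip_count", ascending=False)
    .head(50)
)

print(routes)
```

A table like `routes` has one row per route, so its `trip_count` holds the route volume. In the per-trip dataframe, `trip_count` is 1 on every row, so a color or stroke scale driven by that column has nothing to vary over unless the data is aggregated first.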


## 4. Create Kepler Map

We pass the full-detail dataframe (trips on the top 50 routes only) into Kepler.gl.

In [23]:
# Initialize the map and pass in dataset
map_50 = KeplerGl(height=600)
map_50.add_data(data=top50_df, name="Top 50 Routes - Full Detail")

# Display map
map_50

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


Out of range float values are not JSON compliant
Supporting this message is deprecated in jupyter-client 7, please make sure your message is JSON-compliant
  content = self.pack(content)


KeplerGl(data={'Top 50 Routes - Full Detail': {'index': [2452, 2453, 2454, 2455, 2457, 2458, 2467, 2477, 2479,…
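
The "Out of range float values are not JSON compliant" message is the error Python's JSON encoder produces for `NaN` or infinite floats, so it most likely points to missing values in the dataframe. A minimal sketch, assuming rows without complete coordinates can be dropped before the data is handed to Kepler.gl (the `demo` frame is hypothetical):

```python
import json

import numpy as np
import pandas as pd

# Hypothetical rows: one clean, one with a missing latitude, one with an infinite value.
demo = pd.DataFrame({
    "start_lat": [40.78, np.nan, 40.76],
    "start_lng": [-73.98, -73.97, -73.96],
    "end_lat": [40.78, 40.79, 40.75],
    "end_lng": [-73.97, -73.95, np.inf],
})

coord_cols = ["start_lat", "start_lng", "end_lat", "end_lng"]

# Treat infinities as missing, then drop rows that lack any coordinate.
clean = demo.replace([np.inf, -np.inf], np.nan).dropna(subset=coord_cols)

# Strict JSON encoding now succeeds; it would raise ValueError on NaN or inf.
payload = json.dumps(clean.to_dict(orient="list"), allow_nan=False)
print(len(clean), "rows kept")
```

Other columns, such as the weather fields, may also contain missing values; whether to drop or fill them depends on the analysis.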

## 5. Customization and Layers

The following layers were configured in Kepler.gl's UI:

- **Start Station Layer (point):**
  - Colored by `trip_count`
  - Color scale: Quantile, Opacity: 0.7
  - Radius: 10

- **End Station Layer (point):**
  - Colored by `trip_count`
  - Color scale: Quantile, Opacity: 0.7
  - Radius: 10
  
- **Trip Flow by Volume (arc):**
  - Colored by `trip_count`
  - Color scale: Quantile, Opacity: 0.8
  - Stroke range: Min 2,000, Max 11,000
  - Stroke based on: `trip_count`

> Trip count colors both arcs and points, which visually reinforces which stations and routes are used most often.
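
The same layer settings can also be written as a configuration dictionary instead of being set by hand. The sketch below is not the notebook's exported config. The key names follow the structure of configs exported from the Kepler.gl UI as best I recall it, and the layer `id` values are hypothetical, so compare it against `map_50.config` before relying on it.

```python
# Dataset label passed to add_data(); layers refer to it through "dataId".
DATA_ID = "Top 50 Routes - Full Detail"

# Shared color channel: color by trip_count on a quantile scale.
COLOR_CHANNEL = {
    "colorField": {"name": "trip_count", "type": "integer"},
    "colorScale": "quantile",
}


def point_layer(layer_id, label, lat_col, lng_col):
    """Build a point-layer entry (radius 10, opacity 0.7)."""
    return {
        "id": layer_id,  # hypothetical identifier
        "type": "point",
        "config": {
            "dataId": DATA_ID,
            "label": label,
            "columns": {"lat": lat_col, "lng": lng_col, "altitude": None},
            "isVisible": True,
            "visConfig": {"radius": 10, "opacity": 0.7},
        },
        "visualChannels": dict(COLOR_CHANNEL),
    }


arc_layer = {
    "id": "trip_arcs",  # hypothetical identifier
    "type": "arc",
    "config": {
        "dataId": DATA_ID,
        "label": "Trip Flow by Volume",
        "columns": {
            "lat0": "start_lat", "lng0": "start_lng",
            "lat1": "end_lat", "lng1": "end_lng",
        },
        "isVisible": True,
        "visConfig": {"opacity": 0.8},
    },
    # Stroke width is driven by trip_count as well.
    "visualChannels": {
        **COLOR_CHANNEL,
        "sizeField": {"name": "trip_count", "type": "integer"},
        "sizeScale": "linear",
    },
}

kepler_config = {
    "version": "v1",
    "config": {
        "visState": {
            "layers": [
                point_layer("start_pts", "Start Station", "start_lat", "start_lng"),
                point_layer("end_pts", "End Station", "end_lat", "end_lng"),
                arc_layer,
            ]
        }
    },
}
```

A dictionary like this can be supplied through the `config` argument when constructing the map, for example `KeplerGl(height=600, config=kepler_config)`, which makes the styling reproducible across runs.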

## 6. Filters and Insights

A filter on the `started_at` datetime column shows how trip volume changes across seasons.

### Key Observations
- **High-volume trip clusters** center around Midtown and Lower Manhattan.
- **Arcs form dense corridors**, often following subway and business commuting patterns.
- **Uptick during warmer months** is observable with the `started_at` time filter.

These trends suggest a blend of commuting and recreational usage influenced by seasonality and urban infrastructure.
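
The seasonal pattern seen through the time filter can be cross-checked numerically. This is a minimal sketch, assuming meteorological seasons (December to February as winter, and so on) and using a hypothetical handful of rides in place of `top50_df`:

```python
import pandas as pd

# Hypothetical rides standing in for top50_df["started_at"].
rides = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2022-01-11 19:32:09",
        "2022-07-04 10:15:00",
        "2022-07-15 18:40:00",
        "2022-10-02 08:05:00",
    ])
})

# Map each month number to a meteorological season (an assumed definition).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

months = rides["started_at"].dt.month
trips_by_month = months.value_counts().sort_index()
trips_by_season = months.map(SEASON_BY_MONTH).value_counts()

print(trips_by_month)
print(trips_by_season)
```

Running the same two aggregations on the real data would give counts to set beside the visual impression from the map.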

## 7. Export HTML Map

This cell saves the interactive map to a standalone HTML file for submission.

In [24]:
# Save map configuration and HTML
config = map_50.config
map_50.save_to_html(file_name="citibike_arc_map_2022.html", config=config)
print("\u2705 Map successfully saved as HTML")

Map saved to citibike_arc_map_2022.html!
✅ Map successfully saved as HTML
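
Because the layer styling lives only in the map's configuration, it can be useful to store that configuration next to the HTML file. The sketch below writes a dictionary to JSON and reads it back. The `saved_config` value is a small hypothetical stand-in for `map_50.config`, and the output location is a temporary directory chosen for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for map_50.config.
saved_config = {
    "version": "v1",
    "config": {
        "mapState": {"latitude": 40.75, "longitude": -73.98, "zoom": 11}
    },
}

# Write the configuration as readable JSON.
config_path = Path(tempfile.gettempdir()) / "citibike_kepler_config.json"
config_path.write_text(json.dumps(saved_config, indent=2), encoding="utf-8")

# Read it back; the result can be passed as the config argument later.
restored_config = json.loads(config_path.read_text(encoding="utf-8"))
print("Config round-trip OK:", restored_config == saved_config)
```

In a later session, the restored dictionary could be passed as `config=restored_config` when constructing `KeplerGl` or calling `save_to_html`, so the exported map keeps the same layers and filters.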
