# Kepler.gl Geospatial Mapping – Citi Bike NYC 2022

## Table of Contents
1. Setup & Imports
2. Load & Inspect Data
3. Aggregate Trips
4. Create & Customize Kepler.gl Map
5. Final Summary – What the Map Shows
6. Map use - information about layers

## 1. Setup & Imports <a name="setup-imports"></a>

In [1]:
import pandas as pd
import os
from keplergl import KeplerGl
from pyproj import CRS
import numpy as np
from matplotlib import pyplot as plt



## 2. Load & Inspect Data <a name="load-inspect-data"></a>

In [2]:
# Load the cleaned dataset (update filename if different)
df = pd.read_csv("/Users/emilsafarov/Library/CloudStorage/OneDrive-Personal/CF/CF_S2/nyc_citibike_dashboard_old/Data/Output/citibike_weather_merged_2022.csv")

# Preview the first few rows
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,date,avg_temp
0,3255D3E3F33CDC45,classic_bike,2022-03-18 15:38:17,2022-03-18 15:45:34,Mama Johnson Field - 4 St & Jackson St,HB404,South Waterfront Walkway - Sinatra Dr & 1 St,HB103,40.74314,-74.040041,40.736982,-74.027781,casual,2022-03-18,12.2
1,17FA5604A37338F9,electric_bike,2022-03-04 16:44:48,2022-03-04 16:50:45,Baldwin at Montgomery,JC020,Grove St PATH,JC005,40.723659,-74.064194,40.719586,-74.043117,member,2022-03-04,-2.7
2,7DEC9ADDB8D6BBE1,electric_bike,2022-03-13 17:44:32,2022-03-13 17:54:44,Baldwin at Montgomery,JC020,Grove St PATH,JC005,40.723659,-74.064194,40.719586,-74.043117,member,2022-03-13,-2.3
3,9D69F74EEF231A2E,classic_bike,2022-03-13 15:33:47,2022-03-13 15:41:22,Baldwin at Montgomery,JC020,Grove St PATH,JC005,40.723659,-74.064194,40.719586,-74.043117,member,2022-03-13,-2.3
4,C84AE4A9D78A6347,classic_bike,2022-03-11 12:21:18,2022-03-11 12:33:24,Baldwin at Montgomery,JC020,Grove St PATH,JC005,40.723659,-74.064194,40.719586,-74.043117,member,2022-03-11,4.9


In [3]:
df.shape

(895485, 15)

In [4]:
df.columns

Index(['ride_id', 'rideable_type', 'started_at', 'ended_at',
       'start_station_name', 'start_station_id', 'end_station_name',
       'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng',
       'member_casual', 'date', 'avg_temp'],
      dtype='object')
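
Before aggregating, it can be worth checking the station-name columns for missing values, because pandas `groupby` excludes rows whose key is NaN by default. The sketch below is only an illustration: `sample` is a made-up three-row frame that reuses the column names shown above, not the Citi Bike data itself.

```python
import pandas as pd

# Hypothetical sample that mimics the relevant columns of the trip table.
sample = pd.DataFrame({
    "start_station_name": ["Grove St PATH", None, "Baldwin at Montgomery"],
    "end_station_name": ["Baldwin at Montgomery", "Grove St PATH", None],
    "started_at": [
        "2022-03-18 15:38:17",
        "2022-03-04 16:44:48",
        "2022-03-13 17:44:32",
    ],
})

# Count missing values per key column; rows with a missing key would be
# left out of a groupby on these columns (dropna=True is the default).
missing = sample[["start_station_name", "end_station_name"]].isna().sum()
print(missing)

# Parse timestamps so that later time-based filters work on datetimes.
sample["started_at"] = pd.to_datetime(sample["started_at"])

# Rows that would survive a groupby on both station columns.
complete = sample.dropna(subset=["start_station_name", "end_station_name"])
print(len(complete))
```

On the real data the same two checks could be run on `df` directly.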

## 3. Aggregate Trips <a name="aggregate-trips"></a>

In [5]:
# Add a value column to count each trip
df["value"] = 1

# Group by start and end station, then count trips
df_grouped = df.groupby(["start_station_name", "end_station_name"])["value"].count().reset_index()

# Rename for clarity
df_grouped.rename(columns={
    "start_station_name": "start_station",
    "end_station_name": "end_station",
    "value": "trips"
}, inplace=True)

In [6]:
# Recreate grouped data from scratch just in case
df["value"] = 1
df_grouped = df.groupby(["start_station_name", "end_station_name"])["value"].count().reset_index()
df_grouped.rename(columns={
    "start_station_name": "start_station",
    "end_station_name": "end_station",
    "value": "trips"
}, inplace=True)

# Create station coordinate reference – keep only 1 row per station name
start_coords = df.drop_duplicates(subset="start_station_name")[["start_station_name", "start_lat", "start_lng"]]
end_coords = df.drop_duplicates(subset="end_station_name")[["end_station_name", "end_lat", "end_lng"]]

# Merge in clean coordinates
df_grouped = df_grouped.merge(
    start_coords,
    left_on="start_station",
    right_on="start_station_name",
    how="left"
)

df_grouped = df_grouped.merge(
    end_coords,
    left_on="end_station",
    right_on="end_station_name",
    how="left"
)

# Drop redundant keys
df_grouped.drop(columns=["start_station_name", "end_station_name"], inplace=True)

# Final check
df_grouped.shape

(6953, 7)

### Note on Duplicated Rows
To ensure accurate aggregation of trips between stations, duplicated rows were dropped before exporting the map.

With duplicates included, the dataset contained over 1 million records, which would have inflated trip counts and potentially distorted the visual weight of arcs on the map.
While dropped for clarity in the main map, this raw trip volume could be visualized as a separate layer in future versions to highlight absolute usage density or temporal patterns across all records.
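
One possible reading of the note above is that a coordinate lookup with several rows per station multiplies rows during the merge. The following sketch shows that effect on a made-up four-trip table (`trips` and its values are hypothetical); it is meant as an illustration of the mechanism, not as a reproduction of the numbers in this notebook.

```python
import pandas as pd

# Hypothetical trips: station "A" appears with slightly different coordinates.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B"],
    "end_station_name":   ["B", "B", "A", "A"],
    "start_lat": [40.7500, 40.7501, 40.7500, 40.7200],
    "start_lng": [-74.0270, -74.0271, -74.0270, -74.0430],
})

# One row per (start, end) pair with its trip count.
pairs = (
    trips.groupby(["start_station_name", "end_station_name"])
    .size()
    .reset_index(name="trips")
)

# Lookup with duplicates: every raw trip contributes a coordinate row,
# so each pair is repeated once per matching raw row.
raw_coords = trips[["start_station_name", "start_lat", "start_lng"]]
inflated = pairs.merge(raw_coords, on="start_station_name", how="left")

# Lookup with one row per station: the merge keeps one row per pair.
unique_coords = raw_coords.drop_duplicates(subset="start_station_name")
clean = pairs.merge(unique_coords, on="start_station_name", how="left")

print(len(pairs), len(inflated), len(clean))  # prints: 3 7 3
```

Keeping the first coordinate pair per station name, as `drop_duplicates` does by default, is a simplification; averaging the coordinates per station would be an alternative.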

In [7]:
df_grouped.head(10)

Unnamed: 0,start_station,end_station,trips,start_lat,start_lng,end_lat,end_lng
0,11 St & Washington St,11 St & Washington St,1132,40.749985,-74.02715,40.749985,-74.02715
1,11 St & Washington St,12 Ave & W 40 St,1,40.749985,-74.02715,40.760875,-74.002777
2,11 St & Washington St,12 St & Sinatra Dr N,253,40.749985,-74.02715,40.750604,-74.02402
3,11 St & Washington St,14 St Ferry - 14 St & Shipyard Ln,395,40.749985,-74.02715,40.752961,-74.024353
4,11 St & Washington St,4 St & Grand St,350,40.749985,-74.02715,40.742258,-74.035111
5,11 St & Washington St,6 St & Grand St,314,40.749985,-74.02715,40.744398,-74.034501
6,11 St & Washington St,7 St & Monroe St,242,40.749985,-74.02715,40.746413,-74.037977
7,11 St & Washington St,8 St & Washington St,425,40.749985,-74.02715,40.745984,-74.028199
8,11 St & Washington St,9 St HBLR - Jackson St & 8 St,489,40.749985,-74.02715,40.747907,-74.038412
9,11 St & Washington St,Adams St & 11 St,315,40.749985,-74.02715,40.750916,-74.033541


## 4. Create & Customize Kepler.gl Map <a name="initialize-keplergl-map"></a>

In [8]:
from keplergl import KeplerGl
import json

# Create an empty map first, then attach the aggregated trips as a named dataset
map_final = KeplerGl(height=700)
map_final.add_data(data=df_grouped, name="NYC Bike Trips")

# Read back the current map configuration so it can be reused and saved
config = map_final.config

# Manual export: write the HTML returned by _repr_html_ directly,
# bypassing the export step that raised internal errors.
# The file is opened in binary mode because the HTML is written as bytes.
with open("nyc_bike_map_v2.html", "wb") as f:
    html = map_final._repr_html_(
        data={"NYC Bike Trips": df_grouped},
        config=config
    )
    f.write(html)

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


In [9]:
# (Optional) Save config to a separate JSON file for future reuse
with open("config.json", "w") as outfile:
    json.dump(config, outfile)

#### Note
Because Kepler.gl did not render properly in JupyterLab and its export raised internal errors, the data and configuration were added manually before the map was exported to HTML. This produced a correctly generated output map, and all further customizations could be applied directly in the browser. Map creation, styling, and export therefore stay in one cohesive step.
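
To illustrate how a saved configuration might be reused, the sketch below writes a config dictionary to JSON and reads it back. The dictionary is a hypothetical, heavily trimmed example: the key layout and the arc-layer column names (`lat0`, `lng0`, `lat1`, `lng1`) are assumptions that should be checked against a config actually exported from Kepler.gl.

```python
import json
import os
import tempfile

# Hypothetical, trimmed-down config; a real exported config has many more keys.
config = {
    "version": "v1",
    "config": {
        "visState": {
            "layers": [{
                "type": "arc",
                "config": {
                    # Assumed to match the dataset name used in add_data().
                    "dataId": "NYC Bike Trips",
                    "columns": {
                        "lat0": "start_lat", "lng0": "start_lng",
                        "lat1": "end_lat", "lng1": "end_lng",
                    },
                },
            }]
        }
    },
}

# Write to a temporary folder so the sketch does not overwrite a real config.json.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as outfile:
    json.dump(config, outfile, indent=2)

# Read the file back, as one would do in a later session.
with open(path) as infile:
    loaded = json.load(infile)

layer = loaded["config"]["visState"]["layers"][0]
print(layer["type"], layer["config"]["dataId"])
```

In a later session the loaded dictionary could be passed back when the map is created, for example `KeplerGl(height=700, data={"NYC Bike Trips": df_grouped}, config=loaded)`. The dataset name and the `dataId` in the config need to match for the layer to bind to the data.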

## 5. Final Summary – What the Map Shows <a name="map-observations"></a>

After plotting the bike trip data using an Arc Layer in Kepler.gl, the map reveals a clear and directional flow of trips:

- The majority of trips originate in Jersey City and Hoboken, and also near the intersection of Highways 78 and 139, all of which are dense, well-connected zones.
- Many of these trips end in lower Manhattan, suggesting cross-river commuting, likely driven by proximity to job centers and transit hubs.
- Some stations act as major bidirectional nodes, while others show heavily one-way traffic, which points to strong commuter patterns.
- By applying filters, the busiest start and end stations can be isolated, and lower-density zones can be flagged as potentially underserved.
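
The observations about one-way traffic and busiest stations can also be checked numerically. The sketch below is a minimal illustration on a made-up table (`pairs` is hypothetical) that uses the same column names as `df_grouped`: it sums departures and arrivals per station and ranks stations by departures.

```python
import pandas as pd

# Hypothetical aggregated pairs with the same columns as df_grouped.
pairs = pd.DataFrame({
    "start_station": ["A", "A", "B", "C"],
    "end_station":   ["B", "C", "A", "A"],
    "trips":         [100, 10, 90, 60],
})

# Total trips leaving and arriving at each station.
departures = pairs.groupby("start_station")["trips"].sum()
arrivals = pairs.groupby("end_station")["trips"].sum()

# Align both series on station name; stations missing on one side get 0.
flow = pd.DataFrame({"departures": departures, "arrivals": arrivals}).fillna(0)

# Positive values mean more bikes leave than arrive (one-way outflow).
flow["net_outflow"] = flow["departures"] - flow["arrivals"]

# The two busiest start stations by departures.
busiest = flow["departures"].nlargest(2)

print(flow)
print(busiest)
```

A net outflow close to zero would suggest a bidirectional node, while a large positive or negative value would suggest mostly one-way traffic.
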

### Neighborhood Profile: Jersey City & Hoboken
Based on bestneighborhood.org, the neighborhoods where most trips start share the following traits:

- Demographics: Predominantly White and Asian populations
- Economic Indicators: High-income households, low to moderate unemployment
- Housing: Mostly rental units with upper-market pricing
- Urban Form: Transit-rich, walkable areas likely to support and depend on bike share access

This context helps explain the high trip volume and directional flows toward economic centers.

### Suggestions for Further Investigation
- Equity Analysis: Compare trip volume and infrastructure access across income and demographic lines.
- Demand Management: Assess if popular Jersey City/Hoboken hubs are under pressure for bike availability.
- Underserved Zones: Look for areas with low trip counts but growing populations or limited public transport.
- Time Filters: Segment by hour to validate commute timing and potential for rebalancing bike supply.
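
As a starting point for the time-filter suggestion, trips could be counted per start hour and rider type. The sketch below is only an illustration on a made-up four-row frame (`sample` is hypothetical) that reuses the `started_at` and `member_casual` column names from the trip table.

```python
import pandas as pd

# Hypothetical sample with the same column names as the trip table.
sample = pd.DataFrame({
    "started_at": [
        "2022-03-18 08:05:00", "2022-03-18 08:40:00",
        "2022-03-18 17:15:00", "2022-03-19 13:00:00",
    ],
    "member_casual": ["member", "member", "member", "casual"],
})

# Parse timestamps and extract the hour of day (0-23).
sample["started_at"] = pd.to_datetime(sample["started_at"])
sample["hour"] = sample["started_at"].dt.hour

# Trips per hour, one column per rider type; missing combinations become 0.
hourly = (
    sample.groupby(["hour", "member_casual"])
    .size()
    .unstack(fill_value=0)
)
print(hourly)
```

Peaks around typical commute hours for members would support the commuting interpretation; a flatter profile would argue against it.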

## 6. Map use - information about layers

Details on the individual map layers could be added here in the future.