### 2.5 Advanced Geospatial Plotting

#### Import libraries and load data

In [1]:
import pandas as pd
import os
from keplergl import KeplerGl
from pyproj import CRS
import numpy as np
from matplotlib import pyplot as plt

In [2]:
df = pd.read_csv('NY_data_sample.csv', index_col = 0)

In [3]:
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,start_lat,start_lng,end_lat,end_lng,member_casual,date,avgTemp,value,bike_rides_daily,trip_duration
0,2993FF066FCB3D4D,electric_bike,2022-09-14 18:12:25.303,2022-09-14 18:18:43.395,W 18 St & 6 Ave,Sullivan St & Washington Sq,40.739713,-73.994564,40.730477,-73.999061,member,2022-09-14,22.9,1,1384,6.301533
1,F29D0D965A2E4F34,classic_bike,2022-09-14 11:01:04.111,2022-09-14 11:04:01.488,21 St & 4 Ave,4 Ave & 17 St,40.662584,-73.995554,40.665507,-73.993037,member,2022-09-14,22.9,1,1384,2.956283
2,414C54C4E7AF5121,electric_bike,2022-09-14 17:12:54.897,2022-09-14 17:21:17.592,W 13 St & 5 Ave,E 11 St & 3 Ave,40.735445,-73.99431,40.73127,-73.98849,casual,2022-09-14,22.9,1,1384,8.37825
3,EFD664E543812DAB,classic_bike,2022-09-14 05:04:20.448,2022-09-14 05:05:36.216,1 Ave & E 30 St,2 Ave & E 29 St,40.741444,-73.975361,40.741724,-73.978093,member,2022-09-14,22.9,1,1384,1.2628
4,02E14999E265B4B2,classic_bike,2022-09-14 23:09:16.578,2022-09-14 23:27:34.445,Henry St & Remsen St,Fulton St & Clermont Ave,40.69401,-73.994651,40.684157,-73.969223,member,2022-09-14,22.9,1,1384,18.297783


In [5]:
# Create a dataframe with unique station pairs and their coordinates
station_coords = df.drop_duplicates(
    subset=["start_station_name", "end_station_name"]
)[["start_station_name", "end_station_name", "start_lat", "start_lng", "end_lat", "end_lng"]]

# Drop the original lat/long columns from the main dataframe
df_drop_dups = df.drop(columns=["start_lat", "start_lng", "end_lat", "end_lng"])

# Merge the unique coordinates back into the main dataframe
df_drop_dups = df_drop_dups.merge(station_coords, on=["start_station_name", "end_station_name"], how="left")

# Verify the number of rows stays the same
assert len(df_drop_dups) == len(df)


#### Add a `value` column set to 1, then build an aggregated dataframe with three columns: starting station, ending station, and the count of trips between them.

In [6]:
# Create a value column and group by start and end station
df_drop_dups['value'] = 1
df_group = df_drop_dups.groupby(['start_station_name', 'end_station_name'])['value'].count().reset_index()

In [7]:
df_group

Unnamed: 0,start_station_name,end_station_name,value
0,1 Ave & E 110 St,1 Ave & E 110 St,6
1,1 Ave & E 110 St,1 Ave & E 44 St,1
2,1 Ave & E 110 St,1 Ave & E 78 St,1
3,1 Ave & E 110 St,1 Ave & E 94 St,3
4,1 Ave & E 110 St,2 Ave & E 104 St,5
...,...,...,...
149688,Yankee Ferry Terminal,Pioneer St & Van Brunt St,1
149689,Yankee Ferry Terminal,Soissons Landing,45
149690,Yankee Ferry Terminal,South St & Gouverneur Ln,1
149691,Yankee Ferry Terminal,South St & Whitehall St,2


In [8]:
print(df_group['value'].sum())
print(df_drop_dups.shape)

297668
(298382, 16)
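
The grouped counts above sum to 297668, which is 714 fewer than the 298382 rows in `df_drop_dups`. One plausible explanation, which this notebook does not verify, is that some rows have a missing start or end station name: by default, pandas `groupby` drops rows whose grouping keys are NaN. The sketch below reproduces the effect on a toy dataframe (the station names `A` and `B` are made up for illustration) and shows that `dropna=False` keeps those rows.

```python
import pandas as pd

# Toy data (hypothetical): four trips, one of them with a missing end station
toy = pd.DataFrame({
    "start_station_name": ["A", "A", "B", "B"],
    "end_station_name": ["B", "B", "A", None],
})
toy["value"] = 1

keys = ["start_station_name", "end_station_name"]

# Default behaviour: rows with a NaN grouping key are silently dropped
default_counts = toy.groupby(keys)["value"].count().reset_index()

# dropna=False keeps the NaN group, so the totals match the row count again
kept_counts = toy.groupby(keys, dropna=False)["value"].count().reset_index()

total_default = int(default_counts["value"].sum())
total_kept = int(kept_counts["value"].sum())

# Number of rows where at least one station name is missing
missing_rows = int(toy[keys].isna().any(axis=1).sum())

print(total_default, total_kept, missing_rows, len(toy))
```

Running the same `isna()` check on `df_drop_dups` would show whether missing station names account for the gap here.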


In [9]:
df_group.rename(columns = {'value': 'trips'}, inplace = True)

In [10]:
df_group.head()

Unnamed: 0,start_station_name,end_station_name,trips
0,1 Ave & E 110 St,1 Ave & E 110 St,6
1,1 Ave & E 110 St,1 Ave & E 44 St,1
2,1 Ave & E 110 St,1 Ave & E 78 St,1
3,1 Ave & E 110 St,1 Ave & E 94 St,3
4,1 Ave & E 110 St,2 Ave & E 104 St,5


##### Create the final dataframe with stations, trip counts, and latitude and longitude

In [11]:
# Group by station pair and coordinates so each route keeps its start and end location
df_final = df_drop_dups.groupby(['start_station_name', 'end_station_name', 'start_lat', 'start_lng', 'end_lat', 'end_lng'])['value'].count().reset_index()

In [12]:
df_final.head()

Unnamed: 0,start_station_name,end_station_name,start_lat,start_lng,end_lat,end_lng,value
0,1 Ave & E 110 St,1 Ave & E 110 St,40.792327,-73.9383,40.792327,-73.9383,6
1,1 Ave & E 110 St,1 Ave & E 44 St,40.792327,-73.9383,40.75002,-73.969053,1
2,1 Ave & E 110 St,1 Ave & E 78 St,40.792327,-73.9383,40.771404,-73.953517,1
3,1 Ave & E 110 St,1 Ave & E 94 St,40.792327,-73.9383,40.781721,-73.94594,3
4,1 Ave & E 110 St,2 Ave & E 104 St,40.792327,-73.9383,40.789211,-73.943708,5


In [13]:
df_final.rename(columns = {'value':'trips',}, inplace = True)

In [14]:
df_final.head()

Unnamed: 0,start_station_name,end_station_name,start_lat,start_lng,end_lat,end_lng,trips
0,1 Ave & E 110 St,1 Ave & E 110 St,40.792327,-73.9383,40.792327,-73.9383,6
1,1 Ave & E 110 St,1 Ave & E 44 St,40.792327,-73.9383,40.75002,-73.969053,1
2,1 Ave & E 110 St,1 Ave & E 78 St,40.792327,-73.9383,40.771404,-73.953517,1
3,1 Ave & E 110 St,1 Ave & E 94 St,40.792327,-73.9383,40.781721,-73.94594,3
4,1 Ave & E 110 St,2 Ave & E 104 St,40.792327,-73.9383,40.789211,-73.943708,5


##### Export the new dataframe to CSV

In [15]:
df_final.to_csv('df_final_locations_for_map_.csv')

#### Initialize an instance of a kepler.gl map.

In [17]:
# Create KeplerGl instance

m = KeplerGl(height = 700, data={"data_1": df_final})
m

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(data={'data_1':            start_station_name           end_station_name  start_lat  \
0            1…

In [18]:
config = m.config

In [19]:
config

{}
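
The empty dictionary above suggests the config was read before any layers were customized in the widget. After the map has been styled, `m.config` should return a nested dictionary that can be stored and reused. The following is a minimal sketch of saving and reloading such a dictionary with the standard `json` module. The helper names `save_config` and `load_config`, the file name, and the stand-in dictionary are all hypothetical; the stand-in is a placeholder, not a real kepler.gl config.

```python
import json
import tempfile
from pathlib import Path


def save_config(config, path):
    """Write a config dictionary to a JSON file."""
    Path(path).write_text(json.dumps(config, indent=2))


def load_config(path):
    """Read a config dictionary back from a JSON file."""
    return json.loads(Path(path).read_text())


# Placeholder standing in for m.config after the map has been customized
example_config = {"version": "v1", "config": {}}

# Hypothetical file name, written to the system temp directory for this demo
config_path = Path(tempfile.gettempdir()) / "kepler_config_example.json"

save_config(example_config, config_path)
restored = load_config(config_path)
print(restored == example_config)
```

In the notebook, the restored dictionary could then be passed back when recreating the map, for example `KeplerGl(height=700, data={"data_1": df_final}, config=restored)`, assuming the dataset name matches the one the config was saved with.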

### Map Customization Explanation

Point Layer:

Color Adjustment: Points were customized with magenta to represent the starting stations and yellow to represent the ending stations. This choice keeps the two sets of points distinct and easy to locate.

Arc Layer:

Color Palette: The arcs connecting starting and ending stations were customized with a warm red-orange gradient to emphasize the flow of trips.

Filters:

A filter was applied to show only station pairs with a higher number of trips. This made it possible to identify the most common trips and the high-activity zones within NYC.
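
The same kind of filter can be approximated in pandas to cross-check what the map shows. The sketch below uses a handful of rows copied from the aggregated output earlier in this notebook; the threshold `MIN_TRIPS` is a hypothetical value chosen for illustration, not the one used in the kepler.gl filter.

```python
import pandas as pd

# Rows copied from the aggregated trip counts shown earlier in the notebook
sample = pd.DataFrame({
    "start_station_name": [
        "1 Ave & E 110 St",
        "1 Ave & E 110 St",
        "1 Ave & E 110 St",
        "Yankee Ferry Terminal",
        "Yankee Ferry Terminal",
    ],
    "end_station_name": [
        "1 Ave & E 110 St",
        "1 Ave & E 44 St",
        "2 Ave & E 104 St",
        "Soissons Landing",
        "South St & Whitehall St",
    ],
    "trips": [6, 1, 5, 45, 2],
})

MIN_TRIPS = 5  # hypothetical threshold for a "common" trip

# Keep station pairs at or above the threshold, busiest first
busiest = (
    sample[sample["trips"] >= MIN_TRIPS]
    .sort_values("trips", ascending=False)
    .reset_index(drop=True)
)

print(busiest)
```

Applied to the full `df_final`, the same expression would list the station pairs that remain visible once the map filter is set to a comparable threshold.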


### Add a filter to your map and use it to see what the most common trips are in New York City. What else makes an impression? For example, are there any zones that seem particularly busy? Using some additional research, write a few sentences to make sense of that output.

The most common trips in New York City are concentrated in Manhattan, particularly in high-traffic areas like Midtown and Downtown. Zones around Times Square, Central Park, and the Financial District stand out as particularly busy, reflecting their significance as hubs for both commuters and tourists. Additionally, there is a noticeable flow of trips between Manhattan and nearby areas like Jersey City and Hoboken, indicating a strong commuter connection.

Midtown Manhattan: This area sees heavy activity due to its dense concentration of offices, entertainment venues, and landmarks. The proximity of major transit hubs like Penn Station and Grand Central Terminal further amplifies its traffic.

Downtown Manhattan: The Financial District, with its corporate offices and tourist attractions like the 9/11 Memorial, also experiences significant trip volumes.

Commuter Zones: The connections between Manhattan and New Jersey highlight the importance of cross-river commuting, likely driven by professionals working in Manhattan but living in more affordable areas across the Hudson.