In [1]:
import pandas as pd
import os
from keplergl import KeplerGl
from pyproj import CRS 
import numpy as np
from matplotlib import pyplot as plt

In [13]:
df = pd.read_csv('merged_citibike_weather_sample_cleaned.csv', low_memory=False)

In [14]:
# Add a 'value' column to represent each trip
df['value'] = 1

In [15]:
# Group by starting and ending stations and count the number of trips
df_grouped = df.groupby(['start_station_name', 'end_station_name'])['value'].count().reset_index()

In [16]:
# Create a subset of the dataframe with station names and coordinates
station_coords = df[['start_station_name', 'end_station_name', 'start_lat', 'start_lng', 'end_lat', 'end_lng']].drop_duplicates(subset=['start_station_name', 'end_station_name'])

In [17]:
# Merge the coordinates back to the grouped dataframe
df_grouped_with_coords = pd.merge(df_grouped, station_coords, on=['start_station_name', 'end_station_name'], how='left')
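The count-then-merge steps above could also be collapsed into a single aggregation. The following is a minimal sketch on a made-up four-row table (the station names and coordinates are placeholders, not values from the dataset). It assumes the same column names as the CSV and that taking the first recorded coordinates for each station pair is acceptable, which is roughly what `drop_duplicates` does above.

```python
import pandas as pd

# Toy stand-in for the trip-level dataframe; all values are invented.
trips = pd.DataFrame({
    'start_station_name': ['A', 'A', 'A', 'B'],
    'end_station_name':   ['B', 'B', 'C', 'A'],
    'start_lat': [40.70, 40.70, 40.70, 40.71],
    'start_lng': [-74.00, -74.00, -74.00, -74.01],
    'end_lat':   [40.71, 40.71, 40.72, 40.70],
    'end_lng':   [-74.01, -74.01, -74.02, -74.00],
})

# One pass: count rows per station pair and keep the first coordinates seen.
# Note: 'first' skips missing values, whereas drop_duplicates keeps the first row as-is.
routes = (
    trips.groupby(['start_station_name', 'end_station_name'], as_index=False)
    .agg(
        trip_count=('start_lat', 'size'),
        start_lat=('start_lat', 'first'),
        start_lng=('start_lng', 'first'),
        end_lat=('end_lat', 'first'),
        end_lng=('end_lng', 'first'),
    )
)

print(routes)
```

A quick sanity check for either approach is that the trip counts sum back to the number of rows that have both station names present.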

In [18]:
# Rename the columns for clarity
df_grouped_with_coords.rename(columns={
    'start_station_name': 'Starting Station', 
    'end_station_name': 'Ending Station', 
    'value': 'Trip Count'}, 
    inplace=True)

In [19]:
# Save the result to a CSV file
df_grouped_with_coords.to_csv('trip_counts_with_coords.csv', index=False)

In [20]:
m = KeplerGl(height=700)

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


In [21]:
m.add_data(data=df_grouped_with_coords, name="Bike trips aggregated")

In [12]:
m

KeplerGl(data={'Bike trips aggregated': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…

### Map Customization

1. **Color Customization for Points**:
    - I set the **start station points** to be represented in **blue** and the **end station points** in **yellow**. This choice was made to visually differentiate between the start and end of bike trips, making it easier for the viewer to distinguish the two station types.
    
2. **Arc Layer for Bike Trips**:
    - I added an **Arc Layer** to connect the start and end stations. This layer visually represents the trips taken between stations.
    - I used a **purple-orange color gradient** for the arcs, where brighter colors represent more frequent trips between two stations. The thickness of the arcs was also adjusted to emphasize heavily trafficked routes.
    - This setup allows for an intuitive understanding of which stations are most connected by bike trips and which routes are the most popular.



### Filter Customization

**Filter Setup**:

- I added a filter based on the **Trip Count** column to highlight only the most common trips (those with a trip count above 45).
- By moving the slider to the right, I was able to exclude less frequent trips and focus on the busiest routes in the city.
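The same cut-off can be reproduced outside the map to list the routes that survive the filter. This is only a sketch on invented data: the station names and counts are placeholders, and `THRESHOLD` and `busy_routes` are hypothetical names. It assumes the renamed `Trip Count` column and the "above 45" rule described above.

```python
import pandas as pd

# Toy stand-in for df_grouped_with_coords; values are invented.
routes = pd.DataFrame({
    'Starting Station': ['Station A', 'Station B', 'Station C', 'Station D'],
    'Ending Station':   ['Station B', 'Station C', 'Station D', 'Station A'],
    'Trip Count':       [120, 46, 45, 3],
})

THRESHOLD = 45  # same cut-off as the slider position described above

# Keep routes strictly above the threshold, busiest first.
busy_routes = (
    routes[routes['Trip Count'] > THRESHOLD]
    .sort_values('Trip Count', ascending=False)
    .reset_index(drop=True)
)

print(busy_routes)
```

Listing the filtered rows this way makes it easier to check which station pairs are actually behind the arcs that remain visible.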


### Observations from the Map:

1. **Midtown Manhattan**: As expected, the most frequent trips occur around **Midtown Manhattan**, especially near **Central Park**. This makes sense because it’s one of the busiest areas in the city, with a high concentration of both tourists and locals using bike-sharing services.

2. **Busy Areas Around Universities**: You can also see some prominent routes around **Columbia University** and **New York University (NYU)**. These areas have a high density of students and staff, who may be using bikes for shorter commutes around campus or to nearby subway stations.


### What Else Stands Out:

- **Tourist Areas**: Some of the busiest routes seem to be near **Times Square** and **Lower Manhattan**, near landmarks like the **World Trade Center** and **Battery Park**, indicating heavy bike traffic in these areas.
- **Less Activity Uptown**: Uptown Manhattan, especially north of Central Park, has relatively few trips, which suggests that the bike-sharing program sees less use in those residential areas than in the business-heavy Midtown and Lower Manhattan areas.

### Why These Patterns Make Sense:

**High Tourist Traffic**: Central Park, Times Square, and Lower Manhattan are some of the most visited spots in New York City, with tourists frequently opting to use bikes to travel between these locations.


### Conclusion:

The filter I applied highlights the areas of New York City with the most bike-sharing activity. These patterns suggest that bike-sharing is most popular in tourist hotspots and areas with high pedestrian traffic, like Midtown and the bridges connecting boroughs. A deeper analysis, for example by time of day or by specific trip patterns, might reveal more about how bike trips are distributed across the city.
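As one possible starting point for a time-of-day analysis, trips could be counted per starting hour. This sketch assumes the trip-level data has a timestamp column named `started_at`, which is not confirmed anywhere in this notebook. The timestamps below are made up.

```python
import pandas as pd

# Toy trip-level data; `started_at` is an assumed column name and the values are invented.
trips = pd.DataFrame({
    'started_at': [
        '2000-01-03 08:15:00',
        '2000-01-03 08:45:00',
        '2000-01-03 17:30:00',
    ]
})

# Parse the strings into datetimes so the hour can be extracted.
trips['started_at'] = pd.to_datetime(trips['started_at'])

# Count trips by hour of day (0-23).
trips_per_hour = trips.groupby(trips['started_at'].dt.hour).size()

print(trips_per_hour)
```

The same hour value could be added as a grouping key alongside the station pair to see whether the busiest routes shift between morning and evening.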

In [22]:
# Save the current configuration of the map
config = m.config

In [23]:
# Save the map to an HTML file with the configuration settings
m.save_to_html(file_name='NY_Bike_Trips_Customized.html', read_only=False, config=config)

Map saved to NY_Bike_Trips_Customized.html!
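Since `m.config` is a plain dictionary, it can also be written to a JSON file so that the layer and filter settings can be reloaded in a later session. The sketch below uses a heavily simplified stand-in for the real config, and the file name `kepler_config.json` is hypothetical.

```python
import json
import os
import tempfile

# Simplified stand-in for `m.config`; the real dictionary is much larger.
config = {
    'version': 'v1',
    'config': {'visState': {'filters': [], 'layers': []}},
}

# Hypothetical output path; a temp directory keeps the sketch side-effect free.
config_path = os.path.join(tempfile.gettempdir(), 'kepler_config.json')

# Write the config to disk.
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)

# Read it back, as one would in a new session.
with open(config_path) as f:
    restored = json.load(f)

# In the notebook, the restored dict could then be passed back to the map,
# e.g. KeplerGl(height=700, data={...}, config=restored).
print(restored == config)
```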
