Machine learning

Predict the number of rides at a given station or area.
Estimate how many stations will be needed if ridership increases.

Goal
Improve bikeshare connectivity in Prince George's County, Maryland.
KPIs
Geographic & Accessibility KPIs
These show spatial patterns of bikeshare use.
Top Origin & Destination Stations – Identify the most frequently used stations.
Most Popular Routes – Determine the most common start-to-end station pairs.
Trips per Square Mile – Identify high- and low-density usage areas.
Coverage & Accessibility – Percentage of the county covered by bikeshare stations.
Ward-to-Ward Trip Flow – Count trips between different wards to see mobility patterns.
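The first two KPIs reduce to simple counts. A minimal sketch, assuming the trip table has the `start_station_name` and `end_station_name` columns used in this notebook; the rows below are invented for illustration.

```python
import pandas as pd

# Toy trips: column names match the notebook, the rows are made up.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "B", "A", "C", None],
    "end_station_name":   ["B", "B", "A", "C", "A", "B"],
})

# Top origin and destination stations (value_counts skips missing names).
top_origins = trips["start_station_name"].value_counts()
top_destinations = trips["end_station_name"].value_counts()

# Most popular routes: count each start-to-end pair, dropping trips
# where either station name is missing.
routes = (
    trips.dropna(subset=["start_station_name", "end_station_name"])
         .groupby(["start_station_name", "end_station_name"], observed=True)
         .size()
         .sort_values(ascending=False)
)

print(top_origins.head(3))
print(routes.head(3))
```

`observed=True` matters once the station columns are loaded as `category`, as they are below: without it, older pandas versions return a row for every possible station pair, including pairs with zero trips.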


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statistics as st
import seaborn as sns
import datetime 
from geopy import distance
import folium
from folium.plugins import MarkerCluster
from folium.features import GeoJsonTooltip
from branca.colormap import LinearColormap
from collections import Counter
import json
from shapely.geometry import Point
import geopandas as gpd
from shapely.geometry import shape
from shapely.wkt import loads 

Predicting Ride Demand (Number of Rides per Station per Hour/Day)

Features: Station proximity, metro distance, time of day, weekday/weekend.
Model: Poisson regression, time-series models (ARIMA), gradient-boosted trees (XGBoost), neural networks.
Insight: Helps predict peak usage times and optimize bike availability.
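The time and distance features listed above could be derived roughly as follows. This is a sketch under assumptions: the metro coordinates are hypothetical placeholders, and the distance uses a plain haversine formula rather than `geopy`, so the example has no dependencies beyond pandas.

```python
import math
import pandas as pd

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two (lat, lng) points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical metro coordinates, for illustration only.
metros = {"Metro X": (38.9000, -76.9100), "Metro Y": (38.9650, -76.9550)}

trips = pd.DataFrame({
    "started_at": pd.to_datetime(["2022-01-01 01:14:29", "2022-01-03 08:08:08"]),
    "start_lat": [38.888528, 38.955485],
    "start_lng": [-76.913045, -76.940117],
})

# Time features: hour of day and a weekend flag (Saturday=5, Sunday=6).
trips["hour"] = trips["started_at"].dt.hour
trips["is_weekend"] = trips["started_at"].dt.dayofweek >= 5

# Distance from each trip start to the nearest metro in the lookup.
trips["metro_distance_miles"] = [
    min(haversine_miles(lat, lng, m_lat, m_lng) for m_lat, m_lng in metros.values())
    for lat, lng in zip(trips["start_lat"], trips["start_lng"])
]

print(trips[["hour", "is_weekend", "metro_distance_miles"]])
```

For the full dataset, a vectorized or spatial-index approach would likely be faster than the per-row loop shown here.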

In [7]:
data_types = {
    "rideable_type": "category",
    "start_station_name": "category",
    "end_station_name": "category",
    "member_casual": "category",
    # "ride_id": "uint32",
    "time_of_day": "category",
    "trip_type": "category",
}

In [8]:
prince_george = pd.read_csv(
    "prince_georgy_cabi.csv",
    parse_dates=["started_at", "ended_at"],
    dtype=data_types,
    low_memory=False,
)

1. Defining the Problem
Target Variable: Number of rides per station per time unit (hour/day/week).
Features:
Station Proximity: Average distance to nearest stations.
Metro Distance: Distance to the nearest metro station.
Time Factors: Hour of the day, day of the week, season.
Area: DC vs. Maryland (as a categorical variable).
2. Data Preprocessing
Aggregate the data by station and time interval (hourly or daily).
Encode categorical features (e.g., one-hot encoding for area).
Normalize numerical features like distance.
Handle missing values if any exist.
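The aggregation and encoding steps might look like the sketch below. The rows are invented; only the column names `start_station_name`, `started_at`, and `area` come from this notebook, while `day`, `rides`, and `day_of_week` are names introduced here for illustration.

```python
import pandas as pd

# Toy trips, including one without a start station name.
trips = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", None],
    "started_at": pd.to_datetime([
        "2022-01-01 01:14", "2022-01-01 09:51", "2022-01-02 10:28",
        "2022-01-01 08:08", "2022-01-01 06:27",
    ]),
    "area": ["Maryland", "Maryland", "Maryland", "DC", "Maryland"],
})

# Trips with no station name cannot be assigned to a station, so this
# sketch drops them; another option is to snap them to the nearest station.
named = trips.dropna(subset=["start_station_name"]).assign(
    day=lambda d: d["started_at"].dt.floor("D")
)

# Target variable: rides per station per day.
daily = (
    named.groupby(["start_station_name", "area", "day"], observed=True)
         .size()
         .reset_index(name="rides")
)
daily["day_of_week"] = daily["day"].dt.dayofweek

# One-hot encode the area (DC vs. Maryland).
features = pd.get_dummies(daily, columns=["area"])
print(features)
```

Station-days with no rides do not appear in `daily`. If zero-demand days should count toward the model, the table would need to be reindexed over every station and date.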
3. Choose a Model
Linear Regression (for simple relationships).
Poisson Regression (good for count data like demand).
Random Forest / XGBoost (for non-linear relationships and interactions).
4. Model Training & Evaluation
Train the model on historical data.
Use RMSE or Mean Absolute Error (MAE) for evaluation.
Tune hyperparameters (if using tree-based models).
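Steps 3 and 4 can be sketched with scikit-learn's `PoissonRegressor`. This is an illustration on synthetic data, not a result from the bikeshare dataset: every feature value and coefficient below is invented, and the feature names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600

# Synthetic station-day rows (invented values, hypothetical feature names).
X = pd.DataFrame({
    "is_weekend": rng.integers(0, 2, n),
    "is_maryland": rng.integers(0, 2, n),
    "metro_distance_miles": rng.uniform(0.0, 2.0, n),
})
rate = np.exp(
    1.5 - 0.4 * X["is_weekend"] - 0.3 * X["is_maryland"]
    - 0.8 * X["metro_distance_miles"]
).to_numpy()
y = rng.poisson(rate)  # ride counts drawn from a Poisson distribution

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit a Poisson GLM with a log link.
model = make_pipeline(StandardScaler(), PoissonRegressor(alpha=1e-3, max_iter=300))
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))

# Reference point: always predict the mean of the training counts.
baseline = np.full(len(y_test), y_train.mean())
baseline_mae = mean_absolute_error(y_test, baseline)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  baseline MAE={baseline_mae:.2f}")
```

A random split is used here only for brevity. With real trip data, a split by date (train on earlier periods, test on later ones) would give a more honest estimate of forecasting error.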

In [13]:
# Drop the leftover index column and other columns not needed for modeling.
prince_george = prince_george.drop(
    columns=["Unnamed: 0", "AREA_COVER", "index_right", "ACREAGE",
             "IMPRT_DATE", "SHAPE_AREA", "SHAPE_LEN"]
)

In [14]:
prince_george

Unnamed: 0,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual,start_lat,start_lng,end_lat,end_lng,trip_duration_minutes,time_of_day,year,geometry,WARD,NAME_left,COUNTY,area,NAME_right
0,electric_bike,2022-01-01 01:14:29,2022-01-01 01:18:46,Capitol Heights Metro,,member,38.888528,-76.913045,38.880000,-76.920000,4.0,night,2022,POINT (-76.913045 38.888527833333335),,,Prince George's,Maryland,TOWN OF CAPITOL HEIGHTS
1,classic_bike,2022-01-01 06:27:29,2022-01-01 06:50:59,Chillum Rd & Riggs Rd / Riggs Plaza,The Mall at Prince Georges,member,38.961737,-76.995922,38.968842,-76.954171,24.0,morning,2022,POINT (-76.995922 38.961737),,,Prince George's,Maryland,CHILLUM
2,electric_bike,2022-01-01 08:08:08,2022-01-01 08:14:01,Baltimore Ave & Jefferson St,,casual,38.955485,-76.940117,38.970000,-76.940000,6.0,morning,2022,POINT (-76.94011666666667 38.955485),,,Prince George's,Maryland,CITY OF HYATTSVILLE
3,classic_bike,2022-01-01 09:51:55,2022-01-01 10:18:21,The Mall at Prince Georges,Chillum Rd & Riggs Rd / Riggs Plaza,member,38.968842,-76.954171,38.961737,-76.995922,26.0,morning,2022,POINT (-76.954171 38.968842),,,Prince George's,Maryland,CITY OF HYATTSVILLE
4,electric_bike,2022-01-01 10:28:21,2022-01-01 10:33:19,,Prince George's Plaza Metro,casual,38.960000,-76.950000,38.965742,-76.954803,5.0,morning,2022,POINT (-76.95 38.96),,,Prince George's,Maryland,CITY OF HYATTSVILLE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
130381,electric_bike,2024-12-12 17:52:14.055000,2024-12-12 17:55:16.686,,,member,38.960000,-76.970000,38.960000,-76.960000,,,2024,POINT (-76.97 38.96),,,Prince George's,Maryland,CITY OF HYATTSVILLE
130382,electric_bike,2024-12-05 20:25:09.306000,2024-12-05 20:35:14.973,,,member,38.970000,-76.950000,38.960000,-76.940000,,,2024,POINT (-76.95 38.97),,,Prince George's,Maryland,CITY OF HYATTSVILLE
130383,electric_bike,2024-12-29 12:31:24.147000,2024-12-29 12:34:45.223,,,member,38.900000,-76.910000,38.900000,-76.910000,,,2024,POINT (-76.91 38.9),,,Prince George's,Maryland,CEDAR HEIGHTS
130384,electric_bike,2024-12-17 15:00:00.890000,2024-12-17 15:01:46.507,,,member,38.960000,-76.970000,38.960000,-76.970000,,,2024,POINT (-76.97 38.96),,,Prince George's,Maryland,CITY OF HYATTSVILLE


In [15]:
prince_george.isna().sum()

rideable_type                 0
started_at                    0
ended_at                      0
start_station_name        62863
end_station_name          63976
member_casual                 0
start_lat                     0
start_lng                     0
end_lat                     168
end_lng                     168
trip_duration_minutes     77877
time_of_day               77877
year                          0
geometry                      0
WARD                     130386
NAME_left                130386
COUNTY                        0
area                          0
NAME_right                    0
dtype: int64

In [17]:
prince_george["rideable_type"].value_counts()

rideable_type
electric_bike    93566
classic_bike     32922
docked_bike       3898
Name: count, dtype: int64

In [18]:
ebikes = prince_george[prince_george["rideable_type"] == "electric_bike"]

In [21]:
# Classic and docked bikes. A row cannot be both types at once, so the filter
# must match either value (isin), not both (&), which always returns no rows.
docked = prince_george[prince_george["rideable_type"].isin(["classic_bike", "docked_bike"])]

In [22]:
docked.isna().sum()

rideable_type                0
started_at                   0
ended_at                     0
start_station_name           0
end_station_name           531
member_casual                0
start_lat                    0
start_lng                    0
end_lat                    168
end_lng                    168
trip_duration_minutes    10956
time_of_day              10956
year                         0
geometry                     0
WARD                     36820
NAME_left                36820
COUNTY                       0
area                         0
NAME_right                   0
dtype: int64

In [19]:
ebikes.isna().sum()

rideable_type                0
started_at                   0
ended_at                     0
start_station_name       62863
end_station_name         63445
member_casual                0
start_lat                    0
start_lng                    0
end_lat                      0
end_lng                      0
trip_duration_minutes    66921
time_of_day              66921
year                         0
geometry                     0
WARD                     93566
NAME_left                93566
COUNTY                       0
area                         0
NAME_right                   0
dtype: int64

All 62,863 missing start station names belong to e-bike trips, as do nearly all of the missing end station names (63,445 of 63,976).
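The same check can be done in one step by counting missing station names per bike type. A minimal sketch on invented rows, using the notebook's column names:

```python
import pandas as pd

# Toy trips: the rows are made up, the column names match the notebook.
trips = pd.DataFrame({
    "rideable_type": ["electric_bike", "electric_bike", "classic_bike",
                      "docked_bike", "electric_bike"],
    "start_station_name": [None, "A", "B", "C", None],
    "end_station_name": [None, None, None, "A", "B"],
})

station_cols = ["start_station_name", "end_station_name"]

# isna() gives a boolean frame; summing it per bike type counts missing names.
missing_by_type = trips[station_cols].isna().groupby(trips["rideable_type"]).sum()
print(missing_by_type)
```

Applied to `prince_george`, this would show in a single table how the missing start and end station names split across `electric_bike`, `classic_bike`, and `docked_bike`.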