# Fremont Bridge Cyclist Crossings
by Michael Kearns

# Business Understanding
Seattle is home to many cyclists who ride for recreation or to commute. The city has already built extensive infrastructure for its residents, including designated walking/cycling paths and protected bike lanes. Unique to Seattle's geography, North Seattle and Downtown are separated by bodies of water that connect Puget Sound to Lake Washington, so riders traveling from North Seattle to Downtown must cross a bridge that motor vehicles also use. I would like to build a model that predicts the daily number of cyclists crossing a bridge from the daily weather. This project focuses on the Fremont Bridge, which provides direct access to the South Lake Union and Downtown areas, home to many major companies and business districts. The goal is for the Seattle Department of Transportation to use this model to help make travel across this bridge safer for cyclists.

# Data Understanding
Data for this project comes from two sources. First, the City of Seattle tracks the number of cyclists crossing the Fremont Bridge in both the northbound and southbound directions. Second, historical daily weather data from the National Centers for Environmental Information (NCEI) provides temperature, precipitation, snowfall, and snow depth. The data covers 2013 through 2024.
## Data Preparation
A key difference between the two datasets is that the cyclist counts are recorded hourly, while the weather data is daily, so the cyclist counts must be aggregated into daily totals. All of the data is numerical and must be cleaned of missing or incorrectly recorded values.
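
One way to picture the hourly-to-daily conversion is pandas' `resample`, shown here as an alternative to the string-slicing and `groupby` steps used below. This is a minimal sketch on a small, made-up series of hourly counts (the values and the `hourly` name are hypothetical, not Fremont Bridge data). Passing `min_count=1` keeps a day with no valid readings as `NaN` rather than reporting it as zero crossings, which matters when checking for missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly counts spanning two days; the second day has no valid readings
index = pd.date_range("2024-01-01", periods=48, freq="h")
counts = np.ones(48)
counts[24:] = np.nan  # simulate a day where the counter recorded nothing
hourly = pd.Series(counts, index=index, name="Total")

# Sum each calendar day; min_count=1 returns NaN when a day has no values at all
daily = hourly.resample("D").sum(min_count=1)
print(daily)
```

With the default `min_count=0`, the second day would instead sum to `0.0`, which is indistinguishable from a genuinely empty day.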

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the hourly bicycle counts
bike = pd.read_csv('data/Fremont_Bridge_Bicycle_Counter_20250217.csv')

# Check the DataFrame info
bike.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107352 entries, 0 to 107351
Data columns (total 4 columns):
 #   Column                                                              Non-Null Count   Dtype  
---  ------                                                              --------------   -----  
 0   Date                                                                107352 non-null  object 
 1   Fremont Bridge Sidewalks, south of N 34th St Total                  107324 non-null  float64
 2   Fremont Bridge Sidewalks, south of N 34th St Cyclist West Sidewalk  107324 non-null  float64
 3   Fremont Bridge Sidewalks, south of N 34th St Cyclist East Sidewalk  107324 non-null  float64
dtypes: float64(3), object(1)
memory usage: 3.3+ MB
```
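
The `info()` output shows 28 hourly rows with missing counts (107,352 entries versus 107,324 non-null). A quick check for missing or inconsistent records is sketched below on a tiny hypothetical frame that uses the short column names adopted in the next step. It assumes that `Total` should equal the sum of the two sidewalk counts, which is worth confirming against the real data before treating mismatches as errors.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly records using the short column names
bike = pd.DataFrame({
    "Total":        [10.0, 7.0, np.nan, 5.0],
    "WestSidewalk": [4.0,  3.0, np.nan, 1.0],
    "EastSidewalk": [6.0,  4.0, np.nan, 2.0],
})
counts = ["Total", "WestSidewalk", "EastSidewalk"]

# Rows where any count is missing
missing = bike[bike[counts].isna().any(axis=1)]

# Among complete rows, those where Total disagrees with West + East (assumed invariant)
complete = bike.dropna(subset=counts)
mismatch = complete[complete["Total"] != complete["WestSidewalk"] + complete["EastSidewalk"]]

print(len(missing), "rows with missing counts")
print(len(mismatch), "rows where Total != West + East")
```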


```python
# Rename columns for simplicity
bike.columns = ['Date-Hr', 'Total', 'WestSidewalk', 'EastSidewalk']
```
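
Assigning a list to `bike.columns` relies on the columns arriving in exactly this order. A slightly more defensive variant, sketched here with a one-row hypothetical frame, maps each original header (taken from the `info()` output above) to its short name with `DataFrame.rename`, so a reordered file still gets the right labels. The `prefix` and `column_map` names are illustrative only.

```python
import pandas as pd

prefix = "Fremont Bridge Sidewalks, south of N 34th St"

# Map each original header to a short name instead of relying on column order
column_map = {
    "Date": "Date-Hr",
    f"{prefix} Total": "Total",
    f"{prefix} Cyclist West Sidewalk": "WestSidewalk",
    f"{prefix} Cyclist East Sidewalk": "EastSidewalk",
}

# Hypothetical one-row frame whose columns arrive in a different order
bike = pd.DataFrame({
    f"{prefix} Cyclist East Sidewalk": [2.0],
    "Date": ["01/01/2024 12:00:00 AM"],
    f"{prefix} Total": [5.0],
    f"{prefix} Cyclist West Sidewalk": [3.0],
})

# errors="raise" fails loudly if an expected header is absent from the file
bike = bike.rename(columns=column_map, errors="raise")
print(bike.columns.tolist())
```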

```python
# Split the timestamp string into its date (first 10 characters) and time-of-day parts
bike['Date'] = bike['Date-Hr'].str[:10]
bike['Hr'] = bike['Date-Hr'].str[11:]
```

```python
# Convert Date to datetime format
bike['Date'] = pd.to_datetime(bike['Date'])
```
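
Slicing the timestamp string works as long as every value has the same layout. An alternative is to parse the full timestamp once and derive both parts from it. The sketch below assumes timestamps look like `MM/DD/YYYY HH:MM:SS AM` (consistent with the 10-character date slice, but not confirmed here), and the sample values are hypothetical. Unlike the string slice, `Hr` becomes an integer hour from 0 to 23.

```python
import pandas as pd

# Hypothetical timestamps; the real file's exact format is an assumption
bike = pd.DataFrame({"Date-Hr": ["01/31/2024 11:00:00 PM", "02/01/2024 12:00:00 AM"]})

# Parse the full timestamp once with an explicit format (raises if a value does not match)
stamp = pd.to_datetime(bike["Date-Hr"], format="%m/%d/%Y %I:%M:%S %p")

bike["Date"] = stamp.dt.normalize()  # midnight of each day, still datetime64
bike["Hr"] = stamp.dt.hour           # integer hour 0-23
print(bike)
```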

```python
# Keep hourly records from 2013 through 2024, then sum them into daily crossings
bike_filtered = bike[(bike['Date'] > '2012-12-31') & (bike['Date'] < '2025-01-01')]
bike_daily = bike_filtered.groupby('Date')[['Total', 'WestSidewalk', 'EastSidewalk']].sum()
```
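
After aggregating, it is worth confirming that no calendar day is absent entirely, since a day with no hourly rows simply produces no row in the grouped result. A minimal sketch on hypothetical daily totals follows; on the real data the expected range would run from `2013-01-01` to `2024-12-31` and be compared against the `Date` index of `bike_daily`.

```python
import pandas as pd

# Hypothetical daily totals with one day (2024-01-03) absent
bike_daily = pd.DataFrame(
    {"Total": [100.0, 120.0, 90.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)

# Every calendar day expected in the window
expected = pd.date_range("2024-01-01", "2024-01-04", freq="D")

# Days with no aggregated row at all
missing_days = expected.difference(bike_daily.index)
print(missing_days)
```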

```python
# Import the daily weather data; header=1 uses the file's second line as the column names
weather = pd.read_csv('data/sea-tac_weather_2024.csv', header=1)
```

```python
# Convert Date to datetime format
weather['Date'] = pd.to_datetime(weather['Date'])
```
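
Reading the file and converting `Date` can also happen in one step with `parse_dates`. The sketch below uses an in-memory stand-in for the weather CSV; the title line and the two data rows are hypothetical, and it assumes the real file has a single line above the header row (which is what `header=1` implies).

```python
import io
import pandas as pd

# Hypothetical file: a title line, then the real header row, then data
raw = io.StringIO(
    "Daily weather export\n"
    "Date,TAVG (Degrees Fahrenheit),PRCP (Inches)\n"
    "2013-04-01,55,0.00\n"
    "2013-04-02,53,0.00\n"
)

# header=1 takes column names from the second line and discards the first;
# parse_dates converts the Date column to datetime64 while reading
weather = pd.read_csv(raw, header=1, parse_dates=["Date"])
print(weather.dtypes)
```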

```python
# Filter weather data to match the timeframe of the bike data (2013 through 2024)
weather_filtered = weather[(weather['Date'] > '2012-12-31') & (weather['Date'] < '2025-01-01')]
```

```python
weather_filtered.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 4383 entries, 23742 to 28124
Data columns (total 7 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   Date                       4383 non-null   datetime64[ns]
 1   TAVG (Degrees Fahrenheit)  4293 non-null   float64       
 2   TMAX (Degrees Fahrenheit)  4383 non-null   float64       
 3   TMIN (Degrees Fahrenheit)  4382 non-null   float64       
 4   PRCP (Inches)              4379 non-null   float64       
 5   SNOW (Inches)              4380 non-null   float64       
 6   SNWD (Inches)              4380 non-null   float64       
dtypes: datetime64[ns](1), float64(6)
memory usage: 273.9 KB
```
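
The `info()` output shows that the gaps are concentrated in `TAVG` (90 missing days, versus at most 4 in any other column). Before dropping rows, a per-column count and the list of affected dates make it clear what will be lost. This is a sketch on a small hypothetical slice of the weather table using the simplified column names.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the weather table with a few gaps
weather = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"]),
    "TAVG": [np.nan, 41.0, 43.0],
    "TMAX": [45.0, 47.0, 44.0],
    "PRCP": [0.10, np.nan, 0.00],
})

# Missing values per column
print(weather.isna().sum())

# Dates that dropna() would remove (any column missing)
gap_dates = weather.loc[weather.isna().any(axis=1), "Date"]
print(gap_dates.dt.date.tolist())
```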


Remove missing data

```python
# Simplify column names
weather_filtered.columns = ['Date', 'TAVG', 'TMAX', 'TMIN', 'PRCP', 'SNOW', 'SNWD']
```

```python
# Drop rows with any missing value, then renumber the index from 0
weather_filtered_clean = weather_filtered.dropna(axis=0).reset_index(drop=True)
weather_filtered_clean.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4286 entries, 0 to 4285
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    4286 non-null   datetime64[ns]
 1   TAVG    4286 non-null   float64       
 2   TMAX    4286 non-null   float64       
 3   TMIN    4286 non-null   float64       
 4   PRCP    4286 non-null   float64       
 5   SNOW    4286 non-null   float64       
 6   SNWD    4286 non-null   float64       
dtypes: datetime64[ns](1), float64(6)
memory usage: 234.5 KB
```
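
Dropping incomplete rows removes 97 of the 4,383 days, and the cleaned table starts on 2013-04-01, so the first quarter of 2013 is lost. Because most gaps are in `TAVG` while `TMAX` and `TMIN` are nearly complete, one possible alternative is to approximate a missing `TAVG` by the midpoint of `TMAX` and `TMIN` before dropping. This is a sketch under that assumption: the midpoint is only an approximation and may not match how the station's `TAVG` values were computed. The rows below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: TAVG is missing on the first day but TMAX and TMIN are present
weather = pd.DataFrame({
    "TAVG": [np.nan, 53.0],
    "TMAX": [63.0, 57.0],
    "TMIN": [47.0, 48.0],
})

# Approximate a missing daily average by the midpoint of the daily max and min
# (an assumption, not necessarily how NCEI derives TAVG)
midpoint = (weather["TMAX"] + weather["TMIN"]) / 2
weather["TAVG"] = weather["TAVG"].fillna(midpoint)

# Rows still incomplete after filling would then be dropped as before
weather = weather.dropna(axis=0).reset_index(drop=True)
print(weather)
```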


```python
weather_filtered_clean.head()
```

| | Date | TAVG | TMAX | TMIN | PRCP | SNOW | SNWD |
|---|---|---|---|---|---|---|---|
| 0 | 2013-04-01 | 55.0 | 63.0 | 47.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2013-04-02 | 53.0 | 57.0 | 48.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2013-04-03 | 53.0 | 62.0 | 46.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2013-04-04 | 54.0 | 58.0 | 50.0 | 0.33 | 0.0 | 0.0 |
| 4 | 2013-04-05 | 54.0 | 57.0 | 50.0 | 0.73 | 0.0 | 0.0 |
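
With both datasets at daily resolution, they can be joined on the date. A minimal sketch with hypothetical frames is below: the cyclist totals are indexed by `Date` (as `groupby('Date').sum()` produces), while the weather rows keep `Date` as a column. An inner join keeps only days present in both, so days removed during weather cleaning drop out of the combined table, and `validate="one_to_one"` raises an error if either side has duplicate dates.

```python
import pandas as pd

# Hypothetical daily cyclist totals, indexed by Date like the groupby result
bike_daily = pd.DataFrame(
    {"Total": [100.0, 200.0, 300.0]},
    index=pd.to_datetime(["2013-04-01", "2013-04-02", "2013-04-03"]),
)
bike_daily.index.name = "Date"

# Hypothetical cleaned weather rows (2013-04-03 assumed dropped for missing data)
weather = pd.DataFrame({
    "Date": pd.to_datetime(["2013-04-01", "2013-04-02"]),
    "TAVG": [55.0, 53.0],
    "PRCP": [0.0, 0.0],
})

# Move Date back to a column, then inner-join on it; validate guards against duplicate dates
daily = bike_daily.reset_index().merge(weather, on="Date", how="inner", validate="one_to_one")
print(daily)
```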


# Exploratory Data Analysis

# Modeling

# Final Model

# Conclusions

## Limitations

## Recommendations

## Next Steps