# Blue Bikes Visuals
*Finnley Autumn Rogers* | 2024-07-22

Please see README.md for data overview and description of visualization goals.

This notebook is meant to be the container for visualization code and descriptions of the creation process.

## Data Preprocessing



In [210]:
import pandas as pd
import numpy as np
import geopandas as gpd

import folium
import matplotlib as mpl

In [229]:
# read data
bb2019_path = "data/bluebikes_tripdata_2019.csv"
bb2020_path = "data/bluebikes_tripdata_2020.csv"

In [230]:
# bb2019 = pd.read_csv(bb2019_path, nrows= 200)
bb2020 = pd.read_csv(bb2020_path, nrows= 200)

In [231]:
bb2020.head(2)

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,postal code,year,month,birth year,gender
0,1793,2020-11-01 00:00:18.3990,2020-11-01 00:30:12.2630,186,Congress St at Northern Ave,42.3481,-71.03764,186,Congress St at Northern Ave,42.3481,-71.03764,4896,Customer,11214.0,2020,11,,
1,1832,2020-11-01 00:00:34.3330,2020-11-01 00:31:07.2920,186,Congress St at Northern Ave,42.3481,-71.03764,186,Congress St at Northern Ave,42.3481,-71.03764,5630,Customer,11220.0,2020,11,,


Let's start by plotting all the unique bike stations on a map to get a "lay of the land".

In [232]:
# get unique stations
stations = bb2020[['start station name', 'start station latitude', 'start station longitude']].drop_duplicates(subset = ['start station name'])

# renaming columns for brevity
stations.columns = ['start_station', 'latitude', 'longitude']
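
The cell above only looks at start stations, so a station that appears only as a trip destination would be missed. A minimal sketch of one way to cover both ends of a trip, using a tiny hand-made frame (the `trips` data and the `all_stations` name are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical three-trip sample using the same column names as the Blue Bikes CSV.
trips = pd.DataFrame({
    "start station name": ["A St", "A St", "B St"],
    "start station latitude": [42.35, 42.35, 42.36],
    "start station longitude": [-71.05, -71.05, -71.06],
    "end station name": ["B St", "C St", "A St"],
    "end station latitude": [42.36, 42.37, 42.35],
    "end station longitude": [-71.06, -71.07, -71.05],
})

new_names = ["station", "latitude", "longitude"]

# Pull the start and end station columns out separately and give them shared names.
starts = trips[["start station name", "start station latitude", "start station longitude"]].copy()
ends = trips[["end station name", "end station latitude", "end station longitude"]].copy()
starts.columns = new_names
ends.columns = new_names

# Stack both sets and keep the first occurrence of each station name.
all_stations = (
    pd.concat([starts, ends], ignore_index=True)
    .drop_duplicates(subset="station")
    .reset_index(drop=True)
)
```

Here `C St` only ever appears as a destination, and it still ends up in `all_stations`.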

In [233]:
# make geopandas dataframe for folium
mapping_df = gpd.GeoDataFrame(stations)
mapping_df.head(2)

Unnamed: 0,start_station,latitude,longitude
0,Congress St at Northern Ave,42.3481,-71.03764
3,Harvard Square at Mass Ave/ Dunster,42.373268,-71.118579
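
Note that the output above has no `geometry` column: `gpd.GeoDataFrame(stations)` only wraps the existing frame. The markers still work because `plotPoint` reads `latitude` and `longitude` directly. If point geometry is wanted later (for example, to test which zip code polygon contains a station), a sketch along these lines should do it, assuming the coordinates are WGS84 longitude/latitude (`stations_demo` and `points_gdf` are illustrative names):

```python
import geopandas as gpd
import pandas as pd

# The two station rows shown in the output above.
stations_demo = pd.DataFrame({
    "start_station": ["Congress St at Northern Ave", "Harvard Square at Mass Ave/ Dunster"],
    "latitude": [42.3481, 42.373268],
    "longitude": [-71.03764, -71.118579],
})

# Build point geometry from the coordinate columns (x = longitude, y = latitude).
points_gdf = gpd.GeoDataFrame(
    stations_demo,
    geometry=gpd.points_from_xy(stations_demo["longitude"], stations_demo["latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS84 lon/lat
)
```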


In [234]:
# make folium map
station_map = folium.Map(tiles='OpenStreetMap', zoom_start=20)

In [235]:
def plotPoint(point):
    '''
    ## plotPoint

    function to add a given point to a folium map

    ### Parameters

    - point (pandas df row): a row containing a latitude and longitude column

    ### Return

    - None: the marker is added to the global `station_map` as a side effect
    '''

    folium.Marker(location = [point.latitude, point.longitude],
                  popup = point.start_station,
                  icon = folium.Icon(color = 'blue', icon='bicycle', prefix='fa')).add_to(station_map)

In [236]:
mapping_df.apply(plotPoint, axis = 1)

0      None
3      None
4      None
5      None
7      None
       ... 
187    None
191    None
195    None
196    None
197    None
Length: 95, dtype: object

In [237]:
station_map.fit_bounds(station_map.get_bounds())

In [238]:
# save map to outputs
station_map.save("outputs/bluebikes_station_locations.html")

# display inline
station_map

So this is nice, but a little context-less. How well do these stations serve the general population of Boston? 

Answering this takes two steps: gather shape data and population data at the zip code level (the most granular area for which I can get population numbers), then overlay the existing map with a choropleth of Boston zip codes.

Shapefiles are easily accessible from [Mass.gov](https://www.mass.gov/info-details/massgis-data-zip-codes-5-digit-from-here-navteq#downloads-) and population data broken down by zip code is sourced from [Cubit](https://www.massachusetts-demographics.com/zip_codes_by_population).


In [262]:
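# read zip data
# excel dataset so it'll be the worst format conceived for no reason.
# header=5: the sheet's sixth row becomes the header. It appears to hold the statewide
# row, so the zip code and population columns arrive labelled 'Massachusetts' and 7001399.
zip_pop = pd.read_excel("data/mass_population_zip_2024.xlsx", header=5, dtype='str')[['Massachusetts', 7001399]]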

# rename columns
zip_pop.columns = ['zip_code', 'population']

# fix population dtype
zip_pop.population = pd.to_numeric(zip_pop.population)

In [263]:
zip_pop.head(2)

Unnamed: 0,zip_code,population
0,1001,16045
1,1002,22992


In [264]:
zip_pop.dtypes

zip_code      object
population     int64
dtype: object
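
The `zip_code` values above are four characters long (`1001`), whereas Massachusetts ZIP codes are five digits beginning with `0`. Whether that affects the later join depends on how the shapefile stores `POSTCODE`, which isn't confirmed here. If padding turns out to be needed, a minimal sketch (`zip_demo` is an illustrative stand-in for `zip_pop`):

```python
import pandas as pd

# The two rows shown in zip_pop.head(2) above.
zip_demo = pd.DataFrame({"zip_code": ["1001", "1002"], "population": [16045, 22992]})

# Left-pad with zeros to five characters so keys look like '01001'.
zip_demo["zip_code"] = zip_demo["zip_code"].str.zfill(5)
```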

In [265]:
# read shapefiles
shps = gpd.read_file("data/zip_shapefiles/ZIPCODES_NT_POLY.shp")
shps.head(2)

Unnamed: 0,POSTCODE,PC_NAME,PC_TYPE,PA_NAME,PA_FIPS,CITY_TOWN,COUNTY,AREA_SQMI,SHAPE_AREA,SHAPE_LEN,geometry
0,2360,PLYMOUTH,NON UNIQUE,PLYMOUTH,54275,"PLYMOUTH, TOWN OF",PLYMOUTH,103.140309,267132200.0,108508.650013,"MULTIPOLYGON (((267077.975 859381.793, 267123...."
1,1230,GREAT BARRINGTON,NON UNIQUE,GREAT BARRINGTON,26780,"GREAT BARRINGTON, TOWN OF",BERKSHIRE,96.726569,250520700.0,131086.916737,"POLYGON ((43036.16 891528.799, 44309.248 89151..."


In [266]:
# join on postcode
# ensure geodataframe is left dataset
choro_df = shps.merge(zip_pop, left_on='POSTCODE', right_on='zip_code', how = 'left')
choro_df = choro_df[['POSTCODE', 'zip_code', 'population', 'geometry']].dropna()
choro_df['POSTCODE'] = choro_df['POSTCODE'].astype('str')
choro_df.dtypes

POSTCODE        object
zip_code        object
population     float64
geometry      geometry
dtype: object
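
A left join silently leaves `NaN` wherever a `POSTCODE` has no population row, which is also why `population` shows up as `float64` above. It can be worth checking how many shapes failed to match before dropping them. A sketch using pandas' `indicator` option on made-up keys (`shapes_demo`, `pop_demo`, and the `09999` code are hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile attributes and the population table.
shapes_demo = pd.DataFrame({"POSTCODE": ["01001", "01002", "09999"]})
pop_demo = pd.DataFrame({"zip_code": ["01001", "01002"], "population": [16045, 22992]})

# indicator=True adds a `_merge` column saying where each row came from.
checked = shapes_demo.merge(
    pop_demo, left_on="POSTCODE", right_on="zip_code", how="left", indicator=True
)

# Rows present only on the left side had no population match.
unmatched = checked.loc[checked["_merge"] == "left_only", "POSTCODE"].tolist()
```

If `unmatched` covers most of the shapes, the join keys probably disagree in format rather than in content.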

Now we can overlay the existing map of bike stops with the population data.

In [259]:
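# note: this does not copy the map; pop_map and station_map are the same object,
# so anything added to pop_map also shows up on station_map
pop_map = station_map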

In [268]:
folium.Choropleth(
    geo_data = choro_df,
    name = 'choropleth',
    data = choro_df,
    columns = ['population', 'zip_code'],
    key_on="feature.properties.zip_code",
    fill_color="YlGn"
)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

In [240]:
pop_map