Project 1, Group 3 - Places to Bike
Overview
Biking is becoming more prevalent for commuting to work, exercise, travel, and leisure. In this project, our group explored cities that were considered "bike-friendly" based on the Places for Bikes city ratings. Cities were rated on five factors: Ridership, Safety, Network, Acceleration, and Reach. Each factor was scored on a five-point scale and weighted at 20%.

The People for Bikes organization defines the five factors as:

Ridership - reflects how many people in the community ride bikes

Safety - considers fatalities and injuries of people on bikes as well as those walking and driving

Network - evaluates the quality of the bike network -- how completely it connects people to each other and local destinations using comfortable routes

Reach - determines how well a community's low-stress network serves all members of the community

Acceleration - assesses how quickly a community is improving its biking infrastructure and how successful its encouragement programs are at getting people to ride
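
The equal weighting described in the overview can be illustrated with a short calculation. This is a minimal sketch, not the official scoring code: the factor scores below are hypothetical, `overall_score` is an illustrative helper, and the sketch ignores the Bonus column that appears in the data.

```python
# Hypothetical factor scores for one city, each on the five-point scale
factor_scores = {
    "Ridership": 2.0,
    "Safety": 3.0,
    "Network": 2.5,
    "Reach": 3.5,
    "Acceleration": 4.0,
}

# Each of the five factors carries the same 20% weight
WEIGHT = 0.20


def overall_score(scores, weight=WEIGHT):
    """Return the equally weighted total, rounded to one decimal place."""
    return round(sum(weight * value for value in scores.values()), 1)


total = overall_score(factor_scores)
print(total)
```

Because all five weights are equal, the result is the same as the plain average of the five factor scores.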

Some abbreviations to keep in mind:

ACS - U.S. Census American Community Survey
FARS - Fatality Analysis Reporting System
BNA - PlacesForBikes Bike Network Analysis

In [1]:
# Import dependencies
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [2]:
# Store filepath as variable
places_for_bikes = "data/places_for_bikes_results.csv"

In [4]:
# Read data with Pandas
bike_data = pd.read_csv(places_for_bikes, encoding="ISO-8859-1")

# Display the first 5 rows of data
bike_data.head(5)

Unnamed: 0,Places_ID_2019,City,City_Alt,ACS Bike-to-Work Mode Share,Land Area,Population,ACS Target,ACS Normalized Score,ACS Ridership Points,SMS Recreation Riding,...,ACS Bike-to-Work Mode Share Men,ACS Bike-to-Work Mode Share Women,ACS Gap,ACS Tier,ACS_Target,Distance.1,ACS Points,Total Reach Points,Bonus,Total Ponts
0,363,"SPRINGDALE, ARKANSAS","SPRINGDALE, AR",0.30%,41.8,77252,12.60%,1.7,0.1,12.40%,...,0.01,0.0,0.01,1.0,-0.02,0.03,1.0,4.0,0.5,3.0
1,116,"ENID, OKLAHOMA","ENID, OK",0.70%,73.7,50809,11.50%,5.5,0.3,13.00%,...,0.01,0.0,0.01,2.0,-0.05,0.06,1.2,3.8,0.0,2.4
2,108,"DUBUQUE, IOWA","DUBUQUE, IA",0.40%,30.0,58410,20.80%,2.0,0.1,16.50%,...,0.0,0.0,0.0,1.0,-0.02,0.02,1.7,3.8,0.0,2.1
3,357,"SOUTH BEND, INDIANA","SOUTH BEND, IN",1.30%,41.5,101928,12.60%,7.8,0.4,15.80%,...,0.02,0.01,0.02,3.0,-0.6,0.62,3.1,3.7,0.0,2.0
4,1205,"CRESTED BUTTE, COLORADO","CRESTED BUTTE, CO",41.10%,0.8,1385,51.00%,75.6,3.8,18.80%,...,0.36,0.52,-0.16,3.0,-0.6,0.44,3.6,3.6,0.0,2.6


In [5]:
# Rename columns: fix the "Total Ponts" typo and remove the padding around " Population "
bike_data_df = bike_data.rename(columns={"Total Ponts":"Total Points", " Population ":"Population"})
bike_data_df.head()

Unnamed: 0,Places_ID_2019,City,City_Alt,ACS Bike-to-Work Mode Share,Land Area,Population,ACS Target,ACS Normalized Score,ACS Ridership Points,SMS Recreation Riding,...,ACS Bike-to-Work Mode Share Men,ACS Bike-to-Work Mode Share Women,ACS Gap,ACS Tier,ACS_Target,Distance.1,ACS Points,Total Reach Points,Bonus,Total Points
0,363,"SPRINGDALE, ARKANSAS","SPRINGDALE, AR",0.30%,41.8,77252,12.60%,1.7,0.1,12.40%,...,0.01,0.0,0.01,1.0,-0.02,0.03,1.0,4.0,0.5,3.0
1,116,"ENID, OKLAHOMA","ENID, OK",0.70%,73.7,50809,11.50%,5.5,0.3,13.00%,...,0.01,0.0,0.01,2.0,-0.05,0.06,1.2,3.8,0.0,2.4
2,108,"DUBUQUE, IOWA","DUBUQUE, IA",0.40%,30.0,58410,20.80%,2.0,0.1,16.50%,...,0.0,0.0,0.0,1.0,-0.02,0.02,1.7,3.8,0.0,2.1
3,357,"SOUTH BEND, INDIANA","SOUTH BEND, IN",1.30%,41.5,101928,12.60%,7.8,0.4,15.80%,...,0.02,0.01,0.02,3.0,-0.6,0.62,3.1,3.7,0.0,2.0
4,1205,"CRESTED BUTTE, COLORADO","CRESTED BUTTE, CO",41.10%,0.8,1385,51.00%,75.6,3.8,18.80%,...,0.36,0.52,-0.16,3.0,-0.6,0.44,3.6,3.6,0.0,2.6
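
The rename above targets `" Population "` with surrounding spaces, which suggests that some headers in the CSV are padded. A minimal sketch, assuming other headers may be padded the same way, strips whitespace from every column name in one step before fixing individual typos. The `raw` frame is a made-up stand-in for the real file.

```python
import pandas as pd

# Made-up stand-in for the CSV: one padded header and one misspelled header
raw = pd.DataFrame({" Population ": ["77252"], "Total Ponts": [3.0]})

# Strip surrounding whitespace from every column name, then fix the typo
cleaned = raw.rename(columns=str.strip).rename(
    columns={"Total Ponts": "Total Points"}
)

print(list(cleaned.columns))
```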


In [6]:
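# Splitting City column into two new columns - City Name and State
# Work on a copy so bike_data_df is left unchanged
bike_split = bike_data_df.copy()
bike_split[['City Name','State']] = bike_split["City"].str.split(",", n=1, expand=True)
# Remove the space left after the comma (e.g. " ARKANSAS")
bike_split['State'] = bike_split['State'].str.strip()
bike_split.head()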

Unnamed: 0,Places_ID_2019,City,City_Alt,ACS Bike-to-Work Mode Share,Land Area,Population,ACS Target,ACS Normalized Score,ACS Ridership Points,SMS Recreation Riding,...,ACS Gap,ACS Tier,ACS_Target,Distance.1,ACS Points,Total Reach Points,Bonus,Total Points,City Name,State
0,363,"SPRINGDALE, ARKANSAS","SPRINGDALE, AR",0.30%,41.8,77252,12.60%,1.7,0.1,12.40%,...,0.01,1.0,-0.02,0.03,1.0,4.0,0.5,3.0,SPRINGDALE,ARKANSAS
1,116,"ENID, OKLAHOMA","ENID, OK",0.70%,73.7,50809,11.50%,5.5,0.3,13.00%,...,0.01,2.0,-0.05,0.06,1.2,3.8,0.0,2.4,ENID,OKLAHOMA
2,108,"DUBUQUE, IOWA","DUBUQUE, IA",0.40%,30.0,58410,20.80%,2.0,0.1,16.50%,...,0.0,1.0,-0.02,0.02,1.7,3.8,0.0,2.1,DUBUQUE,IOWA
3,357,"SOUTH BEND, INDIANA","SOUTH BEND, IN",1.30%,41.5,101928,12.60%,7.8,0.4,15.80%,...,0.02,3.0,-0.6,0.62,3.1,3.7,0.0,2.0,SOUTH BEND,INDIANA
4,1205,"CRESTED BUTTE, COLORADO","CRESTED BUTTE, CO",41.10%,0.8,1385,51.00%,75.6,3.8,18.80%,...,-0.16,3.0,-0.6,0.44,3.6,3.6,0.0,2.6,CRESTED BUTTE,COLORADO


In [7]:
# Re-organizing columns and keeping columns that are relevant to our research question
bike_df = bike_split[['City Name', 'State', 'Population', 'Total Points']]

bike_df.head(10)

Unnamed: 0,City Name,State,Population,Total Points
0,SPRINGDALE,ARKANSAS,77252,3.0
1,ENID,OKLAHOMA,50809,2.4
2,DUBUQUE,IOWA,58410,2.1
3,SOUTH BEND,INDIANA,101928,2.0
4,CRESTED BUTTE,COLORADO,1385,2.6
5,KALAMAZOO,MICHIGAN,75833,1.9
6,SPRINGFIELD,MISSOURI,165785,1.9
7,SAN JUAN CAPISTRANO,CALIFORNIA,35948,1.9
8,CHARLOTTESVILLE,VIRGINIA,46487,1.8
9,BENTONVILLE,ARKANSAS,44601,3.1


In [8]:
# Sort dataframe based on Total Points scored and Population size for Bike Friendliness
bike_total_points = bike_df.sort_values(["Total Points", "Population"], ascending=[False, False])
bike_total_points.head(10)

Unnamed: 0,City Name,State,Population,Total Points
69,BOULDER,COLORADO,106271,3.7
42,FORT COLLINS,COLORADO,159150,3.6
201,ARLINGTON,VIRGINIA,229534,3.4
244,EUGENE,OREGON,163135,3.4
407,MANHATTAN,NEW YORK,1653877,3.4
153,LAWRENCE,KANSAS,93954,3.3
372,PORTLAND,OREGON,630331,3.3
114,BROOKLYN,NEW YORK,2635121,3.3
297,MINNEAPOLIS,MINNESOTA,411452,3.2
204,MADISON,WISCONSIN,248856,3.2
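
The ties in the output above are not in descending numeric order of population: Arlington (229534) is listed before Manhattan (1653877). That ordering would be consistent with Population being read as text rather than as numbers. The sketch below reproduces it under the assumption that the CSV stores population as padded strings with thousands separators (the exact formatting is a guess), and shows one way to convert the column before sorting. The `sample` frame is illustrative only.

```python
import pandas as pd

# Three cities tied at 3.4 points; Population is stored as padded text
# with thousands separators (an assumption about the raw CSV)
sample = pd.DataFrame({
    "City Name": ["ARLINGTON", "EUGENE", "MANHATTAN"],
    "Population": [" 229,534 ", " 163,135 ", " 1,653,877 "],
    "Total Points": [3.4, 3.4, 3.4],
})

# Sorting the text column compares characters, not magnitudes
text_order = sample.sort_values(
    ["Total Points", "Population"], ascending=[False, False]
)

# Remove separators and padding, then convert to integers before sorting
numeric = sample.assign(
    Population=sample["Population"]
    .str.replace(",", "", regex=False)
    .str.strip()
    .astype(int)
)
numeric_order = numeric.sort_values(
    ["Total Points", "Population"], ascending=[False, False]
)

print(text_order["City Name"].tolist())
print(numeric_order["City Name"].tolist())
```

Checking `bike_df.dtypes` on the real data would confirm whether Population is an `object` column and whether this conversion is needed.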


In [9]:
# Store obesity data filepath as variable
obesity_raw_data = "data/obesity_among_adults.csv"

In [10]:
# Read obesity data with Pandas
obesity_data = pd.read_csv(obesity_raw_data, encoding="ISO-8859-1")

obesity_data.head()

Unnamed: 0,Year,StateAbbr,StateDesc,CityName,GeographicLevel,DataSource,Category,UniqueID,Measure,Data_Value_Unit,...,Data_Value_Footnote,Low_Confidence_Limit,High_Confidence_Limit,PopulationCount,GeoLocation,CategoryID,MeasureId,CityFIPS,TractFIPS,Short_Question_Text
0,2016,CO,Colorado,Boulder,City,BRFSS,Unhealthy Behaviors,807850,Obesity among adults aged >=18 Years,%,...,,14.6,15.1,97385,"(40.0275510494, -105.25151776)",UNHBEH,OBESITY,807850,,Obesity
1,2016,CA,California,Fremont,City,BRFSS,Unhealthy Behaviors,626000,Obesity among adults aged >=18 Years,%,...,,15.6,15.8,214089,"(37.5278685405, -121.984121512)",UNHBEH,OBESITY,626000,,Obesity
2,2016,CA,California,Milpitas,City,BRFSS,Unhealthy Behaviors,647766,Obesity among adults aged >=18 Years,%,...,,15.7,16.2,66790,"(37.433869763, -121.892083025)",UNHBEH,OBESITY,647766,,Obesity
3,2016,CA,California,Irvine,City,BRFSS,Unhealthy Behaviors,636770,Obesity among adults aged >=18 Years,%,...,,16.2,16.5,212375,"(33.6780108904, -117.773633283)",UNHBEH,OBESITY,636770,,Obesity
4,2016,CA,California,San Francisco,City,BRFSS,Unhealthy Behaviors,667000,Obesity among adults aged >=18 Years,%,...,,17.1,17.2,805235,"(37.7559136611, -122.440987876)",UNHBEH,OBESITY,667000,,Obesity


In [11]:
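# Rename columns to match the bike dataframe
# Note: Data_Value is the adult obesity rate in percent (see Data_Value_Unit), not an ordinal rank
obesity_data_rn = obesity_data.rename(columns={"StateDesc":"State", "CityName": "City Name", "Data_Value":"Obesity Rank", "PopulationCount" : "Population"})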

obesity_data_rn.head()

Unnamed: 0,Year,StateAbbr,State,City Name,GeographicLevel,DataSource,Category,UniqueID,Measure,Data_Value_Unit,...,Data_Value_Footnote,Low_Confidence_Limit,High_Confidence_Limit,Population,GeoLocation,CategoryID,MeasureId,CityFIPS,TractFIPS,Short_Question_Text
0,2016,CO,Colorado,Boulder,City,BRFSS,Unhealthy Behaviors,807850,Obesity among adults aged >=18 Years,%,...,,14.6,15.1,97385,"(40.0275510494, -105.25151776)",UNHBEH,OBESITY,807850,,Obesity
1,2016,CA,California,Fremont,City,BRFSS,Unhealthy Behaviors,626000,Obesity among adults aged >=18 Years,%,...,,15.6,15.8,214089,"(37.5278685405, -121.984121512)",UNHBEH,OBESITY,626000,,Obesity
2,2016,CA,California,Milpitas,City,BRFSS,Unhealthy Behaviors,647766,Obesity among adults aged >=18 Years,%,...,,15.7,16.2,66790,"(37.433869763, -121.892083025)",UNHBEH,OBESITY,647766,,Obesity
3,2016,CA,California,Irvine,City,BRFSS,Unhealthy Behaviors,636770,Obesity among adults aged >=18 Years,%,...,,16.2,16.5,212375,"(33.6780108904, -117.773633283)",UNHBEH,OBESITY,636770,,Obesity
4,2016,CA,California,San Francisco,City,BRFSS,Unhealthy Behaviors,667000,Obesity among adults aged >=18 Years,%,...,,17.1,17.2,805235,"(37.7559136611, -122.440987876)",UNHBEH,OBESITY,667000,,Obesity


In [12]:
# Keep the relevant columns and sort by obesity rate, lowest first
obesity_data_df = obesity_data_rn[['City Name', 'State', 'Population', 'Obesity Rank']]
obesity_data_df = obesity_data_df.sort_values(["Obesity Rank"], ascending=[True])
obesity_data_df.head()

Unnamed: 0,City Name,State,Population,Obesity Rank
0,Boulder,Colorado,97385,14.9
1,Fremont,California,214089,15.7
2,Milpitas,California,66790,16.0
3,Irvine,California,212375,16.4
4,San Francisco,California,805235,17.1


In [13]:
# Review the top-scoring bike cities before merging
bike_total_points.head()

Unnamed: 0,City Name,State,Population,Total Points
69,BOULDER,COLORADO,106271,3.7
42,FORT COLLINS,COLORADO,159150,3.6
201,ARLINGTON,VIRGINIA,229534,3.4
244,EUGENE,OREGON,163135,3.4
407,MANHATTAN,NEW YORK,1653877,3.4


In [14]:
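# Right-merge the bike scores onto the obesity data by city name
# Note: city names are upper case in bike_df but title case in obesity_data_df,
# so the keys do not match and the bike columns come back empty
merge_data = pd.merge(bike_df, obesity_data_df, on="City Name", how="right")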

merge_data.head()

Unnamed: 0,City Name,State_x,Population_x,Total Points,State_y,Population_y,Obesity Rank
0,Boulder,,,,Colorado,97385,14.9
1,Fremont,,,,California,214089,15.7
2,Milpitas,,,,California,66790,16.0
3,Irvine,,,,California,212375,16.4
4,San Francisco,,,,California,805235,17.1
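
State_x, Population_x, and Total Points are empty for every row shown, including Boulder, which appears in both datasets. One possible fix, sketched below on a few rows taken from the outputs above, is to build normalized key columns (upper case, stripped of whitespace) and merge on city and state together so that same-named cities in different states are not confused. The `add_keys` helper and the `city_key` and `state_key` column names are illustrative, not part of the original notebook.

```python
import pandas as pd

# Small samples using values shown in the earlier outputs
bike = pd.DataFrame({
    "City Name": ["BOULDER", "FORT COLLINS"],
    "State": ["COLORADO", "COLORADO"],
    "Total Points": [3.7, 3.6],
})
obesity = pd.DataFrame({
    "City Name": ["Boulder", "Fremont"],
    "State": ["Colorado", "California"],
    "Obesity Rank": [14.9, 15.7],
})


def add_keys(df):
    """Add upper-case, whitespace-stripped merge keys for city and state."""
    return df.assign(
        city_key=df["City Name"].str.strip().str.upper(),
        state_key=df["State"].str.strip().str.upper(),
    )


# Keep only the keys and score from the bike side to avoid _x/_y duplicates
bike_keyed = add_keys(bike)[["city_key", "state_key", "Total Points"]]

# indicator=True adds a _merge column showing which rows found a match
merged = pd.merge(
    bike_keyed,
    add_keys(obesity),
    on=["city_key", "state_key"],
    how="right",
    indicator=True,
)

print(merged[["City Name", "State", "Total Points", "Obesity Rank", "_merge"]])
```

Rows marked `right_only` in `_merge` are obesity cities with no bike score, which makes unmatched names easy to inspect.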
