<a href="https://colab.research.google.com/github/georgejordan3/IBM_Capstone/blob/main/Bicycle_Cities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bicycle Cities - Evaluating the Best Cities in the US to Ride a Bicycle

George Jordan <br>
IBM Data Science Professional Certificate Capstone <br>
Last Updated: 2-26-21

<img src="https://www.confluence-denver.com/galleries/Features/2016/Issue_164/bike_lanes_04.jpg">

Credit: [Confluence Denver](https://www.confluence-denver.com/features/denver_bike_lanes_082416.aspx)


## Introduction 
I am a competitive cyclist and moving to Denver was partially motivated by my love for cycling. While I have a strong understanding of the riding experience here, I wonder how my experience compares to other cyclists in various parts of the US. I hope to tell a detailed story of these cities through the eyes of a cyclist and through the lens of data.

In this project, I will examine a list of the highest-ranking bicycle cities in the country and apply my own analysis to gain further insight into the cities and their unique characteristics. I will use analytic tools and machine learning algorithms to see how these cities relate to each other and to identify the supporting factors that have made them accessible by bicycle.

While this project is personally interesting to me, I believe that the insights here could be useful for a variety of business applications. Perhaps there is a company in one of these cities that is considering including provisions to support bike commuters. Maybe the existence of bicycle infrastructure is important to a company and an investigation into potential locations for an office would require such an analysis to foster that kind of culture in the workplace.

I believe that the bicycle is a very powerful tool to not only navigate a city but to also change it, through culture and infrastructure. I hope this project illuminates some of the impact that the bicycle has had on these cities.

## Data 
### PlacesForBikes City Ratings
- [PlacesForBikes City Ratings](https://cityratings.peopleforbikes.org/)

For this project, I will first look at the PlacesForBikes ratings to see their list of top cities for bicycles. They have made their data publicly available, along with an explanation of their ranking methodology. From this data I will select a number of cities to examine further, while also studying the decisions behind the ranking.

### Foursquare
- [Foursquare](https://foursquare.com/)

I will use the Foursquare API to get an understanding of the cities selected from the rankings above. The most obvious query is how many bike shops are within the city limits, but other venue types may be worth exploring as well. This data will be visualized geospatially using mapping libraries in Python.

### Strava
- [Strava](https://www.strava.com/about)

Strava is an app that tracks users' activity files from a variety of sports, including cycling. Strava currently has 55 million users, so there should be no shortage of data to examine in these popular cycling cities. By using the Strava API, I will be able to gain insight into the location and density of the rides taken within each city.

### Zip Codes
- [Zip-Codes.com](https://www.zip-codes.com/)

In order to organize some of this data, I will need quick access to a zip code database for reference. I will use a web scraping tool for this.

## Methodology 


### Using City Ranking

In [1]:
import pandas as pd
import folium
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from geopy.geocoders import Nominatim

For the City Ratings, I saved the files locally and then uploaded the files into the notebook.

In [2]:
# If I were using a file from another hosting source, the !wget command would be appropriate,
# but hosting the file online just to demonstrate that seemed like a waste of time.
pfbr = pd.read_excel("pfbr.xls")

With the data uploaded, getting an idea of the scope of the dataset is an important way to start structuring our analysis.

In [3]:
pfbr.shape

(567, 55)

We'll just be looking at the top three cities.



In [4]:
top_cities = pfbr.head(3)
top_cities

Unnamed: 0,Places_ID_2020,City,City_Alt,ACS Bike-to-Work Mode Share,Land Area,Population,ACS Target,ACS Normalized Score,ACS Ridership Points,SMS Recreation Riding,SMS Points,Community Survey Ridership Score,Total Ridership Points,Average Fatalities All Mode,All Mode Fatality Rate,All Mode Fatality Points,Average Fatalities Bike,Bike Fatality Rate,Bike Fatality Points,All Mode Injuries,All Mode Injury Rate,All Mode Injury Points,Bike Injuries,Bike Injury Rate,Bike Injury Points,All mode safety points,Bike Safety Points,Community Survey Safety Score,Total Safety Points,City Snapshot Points,Community Survey Acceleration Score,Total Acceleration Points,BNA,BNA Points,Community Survey Network Score,Total Network Points,Percent Communities of Concern,Number Underserved Communities,Average BNA,BNA Underserved Communities,BNA Gap,BNA Tier,BNA Target,Distance,BNA Points.1,ACS Bike-to-Work Mode Share Men,ACS Bike-to-Work Mode Share Women,ACS Gap,ACS Tier,ACS_Target,Distance.1,ACS Points,Total Reach Points,Bonus,Points with bonus
0,487,"SAN LUIS OBISPO, CALIFORNIA","SAN LUIS OBISPO, CA",0.093,13.1,47160,0.263,28.6,1.4,0.134,2.2,2.8,2.028725,1.2,0.3,4.0,0.2,1.1,4.0,,,,,,,2.0,2.0,2.7,2.13888,4.7,3.2,4.35786,61.6,4,3.2,3.83402,5.9,2,49.6,55.2,-5.6,,,,,0.1,0.0,0.1,3.0,-0.6,0.7,2.9,2.864338,0.5,3.544765
1,215,"MADISON, WISCONSIN","MADISON, WI",0.061,78.8,252086,0.115,52.4,2.6,0.182,3.0,3.0,2.863614,8.4,0.3,4.0,0.4,0.6,4.0,84.5,3.4,2.0,8.0,11.7,4.0,3.0,4.0,3.1,3.4208,3.2,3.5,3.25176,49.8,3,3.4,3.08294,11.5,22,47.5,46.2,1.3,1.0,-25.0,26.3,2.2,0.1,0.0,0.0,3.0,-0.6,0.6,3.0,2.431013,0.5,3.510025
2,337,"SANTA BARBARA, CALIFORNIA","SANTA BARBARA, CA",0.052,19.5,91325,0.263,19.3,1.0,0.157,2.6,2.8,1.990605,3.4,0.4,4.0,0.0,0.0,5.0,34.0,3.7,2.0,10.0,48.5,2.0,3.0,3.5,2.7,3.13792,3.8,3.4,3.7276,38.2,2,3.0,2.207,16.5,16,33.9,36.6,-2.8,1.0,-25.0,22.2,2.7,0.1,0.0,0.1,3.0,-0.6,0.7,3.0,2.733556,0.5,3.259336
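Taking `head(3)` only yields the top three cities if the spreadsheet is already sorted by final score. A minimal sketch of selecting them explicitly, assuming `Points with bonus` is the final ranking column (the table above is consistent with that, but the source does not state it). The toy frame copies three scores from the output above and adds one hypothetical low-scoring row:

```python
import pandas as pd

# Toy stand-in for the ratings table. Three rows copy values shown above;
# "EXAMPLE CITY, STATE" is a hypothetical row added for illustration.
ratings = pd.DataFrame(
    {
        "City": [
            "EXAMPLE CITY, STATE",
            "SANTA BARBARA, CALIFORNIA",
            "SAN LUIS OBISPO, CALIFORNIA",
            "MADISON, WISCONSIN",
        ],
        "Points with bonus": [1.0, 3.259336, 3.544765, 3.510025],
    }
)

# Assumption: "Points with bonus" is the column the ranking is based on.
top_three = ratings.nlargest(3, "Points with bonus").reset_index(drop=True)
print(top_three)
```

Selecting by score rather than by row position keeps the result stable even if the source file is later re-sorted.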


I will use the address of each city's city hall as the geospatial center from which to find coordinates.

In [5]:
san_luis = '990 Palm St, San Luis Obispo, CA 93401'
madison = '2120 Fish Hatchery Rd, Madison, WI 53713'
santa_barbara = '735 Anacapa St, Santa Barbara, CA 93101'

### Using Foursquare

Using my personal token, I queried Foursquare's API to see what kinds of venues were close to the city halls of these cities. In doing so, I hoped to see whether any trend could be found in areas that have high bike riding scores.

In [6]:
CLIENT_ID = 'AA5IFTXJZJCQ023SACSUMAGZ11WYQ1TWHRRMF0JLQBJAY3PC'
CLIENT_SECRET = 'FGJSLALMRKIDXQKL4HT5KQJSEXUDFOETQSC3B04UHGMY5ZNP'
ACCESS_TOKEN = 'ALDECKAYHJJ52RBIX0CCOHA3BSSEU5VPXGA1PTEVRXJTX0GF'
VERSION = '20210228'  # Date of query
LIMIT = 30
CODE = 'J5K1MCNFFHAY4YP3LPZHQOY4D400AKRCZOE3R1CH5O4HBGT2#_=_'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: AA5IFTXJZJCQ023SACSUMAGZ11WYQ1TWHRRMF0JLQBJAY3PC
CLIENT_SECRET: FGJSLALMRKIDXQKL4HT5KQJSEXUDFOETQSC3B04UHGMY5ZNP
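Printing credentials in a notebook leaves them in the saved output. One common alternative is to read them from environment variables and mask them when printing. A minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not anything Foursquare requires:

```python
import os


def load_credentials(env=None):
    """Read the client ID and secret from environment variables (names are assumed)."""
    env = os.environ if env is None else env
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demonstration with a plain dict of placeholder values instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```

Calling `load_credentials()` with no argument reads from the real environment and fails early with a clear message if either variable is unset.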


#### San Luis Obispo

In [7]:
address = san_luis

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

35.28263142857143 -120.66234418367347


In [8]:
LIMIT = 500
radius = 2000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=AA5IFTXJZJCQ023SACSUMAGZ11WYQ1TWHRRMF0JLQBJAY3PC&client_secret=FGJSLALMRKIDXQKL4HT5KQJSEXUDFOETQSC3B04UHGMY5ZNP&v=20210228&ll=35.28263142857143,-120.66234418367347&radius=2000&limit=500'

In [9]:
results = requests.get(url).json()
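Passing the query as a parameter dictionary avoids hand-formatting the URL and makes it easy to display the request without the secret. A minimal sketch using only the standard library to assemble the same `venues/explore` request as above; `build_explore_request` is a hypothetical helper and the credentials are placeholders:

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_request(client_id, client_secret, version, lat, lng, radius=2000, limit=500):
    """Return the endpoint and query parameters for a venues/explore call."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL, params


url, params = build_explore_request(
    "PLACEHOLDER_ID",
    "PLACEHOLDER_SECRET",
    "20210228",
    35.28263142857143,
    -120.66234418367347,
)

# Build a printable URL that leaves the secret out of the notebook output.
safe_params = {key: value for key, value in params.items() if key != "client_secret"}
print(url + "?" + urlencode(safe_params))
```

The pair can then be passed to `requests.get(url, params=params, timeout=10)`. Calling `raise_for_status()` on the response before `.json()` surfaces HTTP errors directly instead of failing later on a missing `response` key.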

In [10]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [11]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Firestone Grill,BBQ Joint,35.281176,-120.66042
1,Louisa's Place,Breakfast Spot,35.280863,-120.66168
2,Scout Coffee Co.,Coffee Shop,35.278963,-120.662836
3,San Luis Obispo Farmers' Market,Farmers Market,35.27939,-120.663637
4,BarrelHouse Brewing SLO - Taproom,Brewery,35.280131,-120.663279


In [12]:
sl_venues = nearby_venues
sl_venues

Unnamed: 0,name,categories,lat,lng
0,Firestone Grill,BBQ Joint,35.281176,-120.660420
1,Louisa's Place,Breakfast Spot,35.280863,-120.661680
2,Scout Coffee Co.,Coffee Shop,35.278963,-120.662836
3,San Luis Obispo Farmers' Market,Farmers Market,35.279390,-120.663637
4,BarrelHouse Brewing SLO - Taproom,Brewery,35.280131,-120.663279
...,...,...,...,...
95,Thai-riffic,Thai Restaurant,35.270007,-120.670346
96,Mustang Lanes,Bowling Alley,35.300171,-120.658723
97,Mama's Meatball,Italian Restaurant,35.277951,-120.666726
98,San Luis Fish & BBQ,BBQ Joint,35.276328,-120.667009


#### Madison

In [13]:
address = madison

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.04292418483257 -89.40407878799554


In [14]:
LIMIT = 500
radius = 2000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=AA5IFTXJZJCQ023SACSUMAGZ11WYQ1TWHRRMF0JLQBJAY3PC&client_secret=FGJSLALMRKIDXQKL4HT5KQJSEXUDFOETQSC3B04UHGMY5ZNP&v=20210228&ll=43.04292418483257,-89.40407878799554&radius=2000&limit=500'

In [15]:
results = requests.get(url).json()

In [16]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,KWIK TRIP #531,Convenience Store,43.03683,-89.40493
1,El Pastor Restaurant And Bar,Mexican Restaurant,43.043956,-89.394384
2,Lane's Bakery & Coffee,Bakery,43.040666,-89.395176
3,Culver's,Fast Food Restaurant,43.03603,-89.415063
4,Taquería Guadalajara,Mexican Restaurant,43.055595,-89.397383


In [19]:
m_venues = nearby_venues
m_venues

Unnamed: 0,name,categories,lat,lng
0,KWIK TRIP #531,Convenience Store,43.036830,-89.404930
1,El Pastor Restaurant And Bar,Mexican Restaurant,43.043956,-89.394384
2,Lane's Bakery & Coffee,Bakery,43.040666,-89.395176
3,Culver's,Fast Food Restaurant,43.036030,-89.415063
4,Taquería Guadalajara,Mexican Restaurant,43.055595,-89.397383
...,...,...,...,...
83,Delta Beer Lab,Brewery,43.037258,-89.382469
84,Madison Professional Dance Center,Dance Studio,43.027063,-89.395699
85,Big Dane Collective,Gym,43.026362,-89.397767
86,Bernie's Beach,Park,43.056980,-89.389277


#### Santa Barbara

In [20]:
address = santa_barbara

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

34.420073714285714 -119.6975069387755


In [21]:
LIMIT = 500
radius = 2000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=AA5IFTXJZJCQ023SACSUMAGZ11WYQ1TWHRRMF0JLQBJAY3PC&client_secret=FGJSLALMRKIDXQKL4HT5KQJSEXUDFOETQSC3B04UHGMY5ZNP&v=20210228&ll=34.420073714285714,-119.6975069387755&radius=2000&limit=500'

In [22]:
results = requests.get(url).json()

In [23]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,McConnell's Fine Ice Creams,Ice Cream Shop,34.419388,-119.698888
1,Dune Coffee Roasters,Coffee Shop,34.418816,-119.695372
2,Handlebar Coffee,Coffee Shop,34.422274,-119.698528
3,Santa Barbara Certified Farmers Market,Farmers Market,34.419974,-119.69523
4,Blenders in the Grass,Juice Bar,34.419202,-119.698733


In [25]:
sb_venues = nearby_venues
sb_venues

Unnamed: 0,name,categories,lat,lng
0,McConnell's Fine Ice Creams,Ice Cream Shop,34.419388,-119.698888
1,Dune Coffee Roasters,Coffee Shop,34.418816,-119.695372
2,Handlebar Coffee,Coffee Shop,34.422274,-119.698528
3,Santa Barbara Certified Farmers Market,Farmers Market,34.419974,-119.695230
4,Blenders in the Grass,Juice Bar,34.419202,-119.698733
...,...,...,...,...
95,Renaud's Patisserie & Bistro,Bakery,34.425319,-119.705944
96,Wine+Beer,Wine Bar,34.423530,-119.706908
97,South Coast Deli,Deli / Bodega,34.425104,-119.709248
98,Crushcakes Cupcakery & Crushcafe,Cupcake Shop,34.425678,-119.705349
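With one venue table per city, a first check for trends is to count venue categories. A minimal sketch on a toy table built from the first five Santa Barbara rows shown above; `top_categories` and `count_category` are hypothetical helpers, and `Bike Shop` is assumed to be the category label Foursquare uses for bicycle shops:

```python
import pandas as pd


def top_categories(venues, n=5):
    """Return the n most frequent venue categories with their counts."""
    return venues["categories"].value_counts().head(n)


def count_category(venues, category):
    """Count venues whose category exactly matches the given label."""
    return int((venues["categories"] == category).sum())


# Toy table copied from the first five Santa Barbara rows.
sample = pd.DataFrame(
    {
        "name": [
            "McConnell's Fine Ice Creams",
            "Dune Coffee Roasters",
            "Handlebar Coffee",
            "Santa Barbara Certified Farmers Market",
            "Blenders in the Grass",
        ],
        "categories": [
            "Ice Cream Shop",
            "Coffee Shop",
            "Coffee Shop",
            "Farmers Market",
            "Juice Bar",
        ],
    }
)

print(top_categories(sample, n=3))
print(count_category(sample, "Bike Shop"))  # assumed label; none in this toy table
```

Running the same two functions on `sl_venues`, `m_venues`, and `sb_venues` would give a like-for-like comparison across the three cities.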


### Plotting

In [26]:
sl_map = folium.Map(location=[35.28263142857143, -120.66234418367347], zoom_start=12)
sl_map

In [27]:
m_map = folium.Map(location=[43.04292418483257, -89.40407878799554], zoom_start=12)
m_map

In [28]:
sb_map = folium.Map(location=[34.420073714285714, -119.6975069387755], zoom_start=12)
sb_map

### Data Wrangling

### Clustering

## Results

## Discussion

## Conclusion