# Where should I live this year?

## Introduction
I am a bit of a nomad. I have moved from place to place roughly every 6 months for the last 10 years. I am a "professional" mountain biker (I use the term loosely, as I only receive parts and gear, but no paycheck). My life more or less revolves around the sport, so most of the places I have moved to were chosen for their abundance of excellent mountain bike trails. There are two areas where I have spent more time than anywhere else: Southwest Utah and Whistler, British Columbia. Both places are incredible. Southern Utah is at its best in the winter thanks to the mild weather and lack of snow, while Whistler is the place to be in the summer because of the Whistler Bike Park (it's by far the largest bike park in the world).

Unfortunately, the world is currently in a complete shutdown due to the COVID-19 pandemic. As much as I would love to return to Whistler in the summer, I am unsure whether the bike park will be open, so I would like to explore other options in British Columbia.

## Data

I hope to use data science to help me answer the question, "Where should I live this year?" I will be leveraging the Foursquare API and the Trailforks API to gather data for my analysis. Foursquare has a wonderful database of location data, and Trailforks has a database of trail data. I will try to pinpoint a few possible towns that I would like to live in, based on the available nearby amenities, nearby hospitals (mountain biking is quite dangerous), distance to major airports (I love to travel), and of course, how many high-quality trails are in the area.
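One of the criteria above is distance to a major airport. As a minimal sketch of how that distance could be estimated, the great-circle (haversine) formula works on plain latitude/longitude pairs. The function name `haversine_km` and the rough Vancouver International Airport coordinates are assumptions for illustration; the Whistler coordinates are the ones used later in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


whistler = (50.115669, -122.956799)  # coordinates used later in this notebook
yvr_approx = (49.19, -123.18)        # assumption: rough airport location

distance = haversine_km(*whistler, *yvr_approx)
print(round(distance), "km as the crow flies")
```

This is straight-line distance only; actual driving distance on mountain highways will be longer.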

In [327]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import mechanize
import http.cookiejar
import time
import random
import requests

## Web Scraping
I tried getting access to the Trailforks API, but unfortunately they did not respond to my email. Instead, I built a web scraper to get at least the most important data. I set my search query to return only the number of Black Diamond, Double Black Diamond, and Pro Line trails, since these are the types of trails that I enjoy the most. Some areas may have many more trails, but if they are only beginner-style trails, then I won't be very interested!

In [None]:
# Browser
br = mechanize.Browser()

# Cookie Jar
cj = http.cookiejar.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Chrome')]

br.open('https://www.trailforks.com/login/')

br.select_form(nr=1)

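# User credentials: read from environment variables instead of hard-coding them.
# TRAILFORKS_USERNAME and TRAILFORKS_PASSWORD are assumed variable names.
import os
br.form['username-login-loginlen'] = os.environ['TRAILFORKS_USERNAME']
br.form['password-password-lt200'] = os.environ['TRAILFORKS_PASSWORD']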

# Login
br.submit()

In [79]:
titles = []
trails = []
site_nums = []

In [None]:
# total of 37277 pages
for i in range(1, 37278):
    response = br.open('https://www.trailforks.com/tools/trailspreadsheet/?difficulty=5,6,8&cols=title,difficulty,rating,total_checkins,global_rank,distance,alias&rid={}'.format(i))
    html = response.read()
    response.close()
    page_soup = soup(html, 'lxml')
    num_trails = page_soup.find(id='contentTotal')
    if num_trails:
        # keep the last text fragment of the element, which holds the trail count
        for string in num_trails.stripped_strings:
            num_trails = string
        title = page_soup.find('h3')
        for string in title.stripped_strings:
            title = string[:-13]  # removing " Preview Data" from string
        titles.append(title)
        trails.append(num_trails)
        site_nums.append(i)
    # pause for up to one second between requests, including after empty pages
    time.sleep(random.random())
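The scraper trims the page heading with a fixed slice of 13 characters, which assumes every heading ends in " Preview Data". A slightly more defensive sketch, assuming a hypothetical helper named `clean_title`, removes the suffix only when it is actually present:

```python
SUFFIX = " Preview Data"  # 13 characters, matching the [:-13] slice in the scraper


def clean_title(raw_title):
    """Strip the trailing ' Preview Data' marker only if it is there."""
    text = raw_title.strip()
    if text.endswith(SUFFIX):
        return text[:-len(SUFFIX)]
    return text


print(clean_title("Whistler Preview Data"))
print(clean_title("Whistler"))
```

With this guard, a heading that lacks the suffix is kept intact instead of losing its last 13 characters.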

Saving all data from the scraper: `trails` (the number of trails), `site_nums` (the `rid` value used in the Trailforks URL), and `titles`.

In [92]:
df = pd.DataFrame(titles, columns=["column"])
df.to_csv('titles.csv', index=False)
df1 = pd.DataFrame(trails, columns=["column"])
df1.to_csv('trails.csv', index=False)
df2 = pd.DataFrame(site_nums, columns=["column"])
df2.to_csv('site_nums.csv', index=False)

## More scraping
Postal codes for British Columbia

In [328]:
link ='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V'

In [329]:
response = urlopen(link)
html = response.read()
response.close()

In [330]:
page_soup = soup(html, 'lxml')

In [331]:
tds = page_soup.find_all('td')

In [332]:
tds[1]

<td valign="top" width="11.1%"><b>V2A</b><br/><span style="font-size: smaller; line-height: 125%;"><a href="/wiki/Penticton" title="Penticton">Penticton</a></span>
</td>

In [377]:
postal_codes = []
city_names = []

In [378]:
for i in range(0,198):
    count = 0
    for string in tds[i].stripped_strings:
        if count == 0:
            postal_codes.append(string)
            count = 1
        elif count == 1:
            city_names.append(string)
            count = 2
        else:
            continue

Sanity check

In [379]:
len(postal_codes)

198

In [380]:
len(city_names)

198

## Loading Data

In [381]:
titles = pd.read_csv('titles.csv')
trails = pd.read_csv('trails.csv')

In [None]:
titles_list = titles.values.tolist()
titles_list = [item for sublist in titles_list for item in sublist]
titles_list

In [None]:
trails_list = trails.values.tolist()
trails_list = [item for sublist in trails_list for item in sublist]
trails_list

Cleaning up the city names from the list.

In [384]:
for i in range(0,len(city_names)):
    string = city_names[i]
    num = string.find(' (')
    if num > 1:
        city_names[i] = string[:num]
    else:
        continue
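The cleanup above cuts each name at the first " (" so that a parenthetical qualifier is dropped. The same idea can be sketched as a small function, here given the hypothetical name `clean_city_name`; the sample strings are made up for illustration and are not taken from the Wikipedia table.

```python
def clean_city_name(name):
    """Drop a trailing parenthetical qualifier such as ' (North)'."""
    base, _, _ = name.partition(" (")
    # fall back to the original string if nothing is left before the bracket
    return base.strip() or name


print(clean_city_name("Example Town (North)"))
print(clean_city_name("Penticton"))
```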

## Combining Data
Matching the city names from BC postal codes to the search queries from Trailforks.com and creating a dataframe.

In [385]:
matched_cities = []
num_trails = []
postal_code = []

In [386]:
for i in range(0,len(city_names)):
    if city_names[i] in titles_list:
        matched_cities.append(city_names[i])
        index = titles_list.index(city_names[i])
        num_trails.append(trails_list[index])
        postal_code.append(postal_codes[i])
    else:
        #print("** NOT IN **", city_names[i])
        continue
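The matching step is a lookup of each postal-code city in the scraped titles. A minimal sketch of the same logic with a dictionary is shown below; it also collects the cities that did not match, which makes it easier to spot towns that were left out. The variable names ending in `_sample` are illustrative stand-ins for the full lists, and the trail counts are the ones that appear in this notebook's tables.

```python
# Small stand-ins for titles_list, trails_list and city_names
titles_sample = ["Kimberley", "Penticton", "Whistler"]
trails_sample = [15, 50, 216]
cities_sample = ["Kimberley", "Penticton", "Pemberton"]

# Build a title -> trail count lookup; setdefault keeps the first occurrence,
# which mirrors what list.index() returns for duplicated titles.
trail_counts = {}
for title, count in zip(titles_sample, trails_sample):
    trail_counts.setdefault(title, count)

matched = [(city, trail_counts[city]) for city in cities_sample if city in trail_counts]
unmatched = [city for city in cities_sample if city not in trail_counts]

print(matched)
print(unmatched)
```

A dictionary lookup avoids rescanning the whole title list for every city, which matters little at this size but keeps the intent clear.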

Some popular mountain biking towns got left out, so I'm adding them manually.

In [387]:
matched_cities.append('Pemberton')
num_trails.append(77)
postal_code.append('V0N 2L0')
matched_cities.append('Gibsons')
num_trails.append(53)
postal_code.append('V0N 1V0')

In [388]:
# avoid naming the variable `dict`, which would shadow the built-in type
city_data = {'City': matched_cities, 'Number_of_Trails': num_trails, 'Postal_Code': postal_code}
df = pd.DataFrame(city_data)

In [389]:
df['Number_of_Trails'] = df['Number_of_Trails'].astype(str).astype(int)

Dropping rows with 0 trails and also duplicates.

In [390]:
indexNames = df[ df['Number_of_Trails'] == 0 ].index

In [391]:
df.drop(indexNames, inplace=True)

In [392]:
df = df.reset_index(drop=True)

In [393]:
df = df.drop_duplicates(subset='City', keep='first')

In [394]:
df = df.reset_index(drop=True)

In [None]:
df

## Geospatial Coordinates

In [396]:
latitudes = []
longitudes = []

In [397]:
for i in range(0,len(df)):
    g = geocoder.arcgis('{}, British Columbia, Canada'.format(df['Postal_Code'][i]))
    lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    latitudes.append(latitude)
    longitudes.append(longitude)

In [398]:
df['Latitude'] = latitudes
df['Longitude'] = longitudes

Geocoder wasn't entirely accurate, so I ended up fixing a few of the cities by hand. It was fun.

In [402]:
df.at[6, 'Latitude'] = 50.268479
df.at[6, 'Longitude'] = -119.262568
df.at[7, 'Latitude'] = 50.678385
df.at[7, 'Longitude'] = -120.329835
df.at[8, 'Latitude'] = 49.700790
df.at[8, 'Longitude'] = -123.150588
df.at[13, 'Latitude'] = 49.285005
df.at[13, 'Longitude'] = -122.792470
df.at[14, 'Latitude'] = 50.115669
df.at[14, 'Longitude'] = -122.956799
df.at[15, 'Latitude'] = 52.140202
df.at[15, 'Longitude'] = -122.142379
df.at[16, 'Latitude'] = 49.049329
df.at[16, 'Longitude'] = -122.302905
df.at[21, 'Latitude'] = 50.031340
df.at[21, 'Longitude'] = -125.270810
df.at[25, 'Latitude'] = 50.109933
df.at[25, 'Longitude'] = -120.788705
df.at[26, 'Latitude'] = 53.915608
df.at[26, 'Longitude'] = -122.753380
df.at[28, 'Latitude'] = 49.491569
df.at[28, 'Longitude'] = -117.292110
df.at[29, 'Latitude'] = 48.780387
df.at[29, 'Longitude'] = -123.700198
df.at[32, 'Latitude'] = 49.879701
df.at[32, 'Longitude'] = -119.476071
df.at[33, 'Latitude'] = 49.154957
df.at[33, 'Longitude'] = -121.954321
df.at[42, 'Latitude'] = 48.378334
df.at[42, 'Longitude'] = -123.734941
df.at[46, 'Latitude'] = 49.400077
df.at[46, 'Longitude'] = -123.516955
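The corrections above are keyed by row index, which silently points at the wrong town if the row order ever changes. A hedged alternative is to key the overrides by city name. The sketch below uses plain dictionaries as stand-ins for DataFrame rows; `manual_coords` and `rows` are hypothetical names, and the two coordinate pairs are the ones entered above for rows 8 (Squamish) and 14 (Whistler).

```python
# Stand-in rows; in the notebook these would be rows of df
rows = [
    {"City": "Kimberley", "Latitude": 49.691079, "Longitude": -115.952463},
    {"City": "Squamish", "Latitude": None, "Longitude": None},
    {"City": "Whistler", "Latitude": None, "Longitude": None},
]

# Hand-checked coordinates keyed by city name rather than row index
manual_coords = {
    "Squamish": (49.700790, -123.150588),
    "Whistler": (50.115669, -122.956799),
}

for row in rows:
    if row["City"] in manual_coords:
        row["Latitude"], row["Longitude"] = manual_coords[row["City"]]

for row in rows:
    print(row["City"], row["Latitude"], row["Longitude"])
```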

In [524]:
df.head()

Unnamed: 0,City,Number_of_Trails,Postal_Code,Latitude,Longitude
0,Kimberley,15,V1A,49.691079,-115.952463
1,Penticton,50,V2A,49.49101,-119.574217
2,Surrey,14,V4A,49.032073,-122.821241
3,Burnaby,5,V5A,49.266244,-122.931096
4,Powell River,24,V8A,49.85739,-124.46747


In [310]:
g = geocoder.arcgis('British Columbia, Canada')
lat_lng_coords = g.latlng
latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
print('The geographic location of British Columbia, Canada is: Lat: {}, Lon: {}.'.format(latitude,longitude))

The geographic location of British Columbia, Canada is: Lat: 49.049279947775744, Lon: -122.29502008050079.


## Visualizing the data
I've scaled the map markers to show which towns have more or fewer trails.

In [404]:
# create map of BC using latitude and longitude values
map_bc = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, City, Number_of_Trails in zip(df['Latitude'], df['Longitude'], df['City'], df['Number_of_Trails']):
    label = '{}, {}'.format(City, Number_of_Trails)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        #Scaling the radius to the number of trails
        radius=np.sqrt(Number_of_Trails),
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bc)  
    
map_bc
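The marker radius is the square root of the trail count. Because a circle's area grows with the square of its radius, this makes the marker area, rather than its width, proportional to the number of trails. A small sketch of that relationship, using a hypothetical helper `marker_radius` and trail counts from this notebook:

```python
import math


def marker_radius(num_trails):
    """Radius whose squared value (and so the circle's area) tracks the trail count."""
    return math.sqrt(num_trails)


for city, n in [("Delta", 3), ("Penticton", 50), ("Whistler", 216)]:
    r = marker_radius(n)
    print(f"{city}: {n} trails, radius {r:.1f}")
```

With a linear radius, Whistler's marker would be 72 times wider than Delta's; with the square root it is about 8.5 times wider while still covering 72 times the area.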

## Foursquare API

In [361]:
CLIENT_ID = ''
CLIENT_SECRET = ''
VERSION = '20180605' # Foursquare API version

Checking the first city.

In [191]:
city_latitude = df.loc[0, 'Latitude'] # City latitude value
city_longitude = df.loc[0, 'Longitude'] # City longitude value

city_name = df.loc[0, 'City'] # City name

print('Latitude and longitude values of {} are {}, {}.'.format(city_name, 
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Kimberley are 49.69107910900004, -115.95246287899994.


In [362]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 3000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=49.69107910900004,-115.95246287899994&radius=3000&limit=100'
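The request URL is assembled by string formatting. As a sketch of an alternative, the query string can be built from a dictionary with the standard library's `urlencode`, which handles escaping and keeps each parameter labelled. The helper name `build_explore_url` and the placeholder credentials are assumptions; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; real values should never be printed or committed
url_sketch = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                               "20180605", 49.691079, -115.952463, 3000, 100)

# Parse the URL back to confirm the parameters round-trip
query = parse_qs(urlsplit(url_sketch).query)
print(query["ll"], query["radius"], query["limit"])
```

Since the credentials end up in the URL itself, it is worth clearing or redacting any cell output that displays it before sharing the notebook.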

In [None]:
results = requests.get(url).json()
results

In [364]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [365]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Pedal & Tap,American Restaurant,49.685895,-115.983418
1,Shoppers Drug Mart,Pharmacy,49.68439,-115.981893
2,A&W,Fast Food Restaurant,49.684436,-115.981015
3,The Bean Tree Cafe,Café,49.685532,-115.982466
4,Platzl,Plaza,49.685953,-115.9836


In [366]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


Now let's get the results for all of the cities!

In [531]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [None]:
bc_venues = getNearbyVenues(names=df['City'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

In [533]:
bc_venues.groupby('City').count()

City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abbotsford,78,78,78,78,78,78
Burnaby,36,36,36,36,36,36
Campbell River,26,26,26,26,26,26
Chilliwack,78,78,78,78,78,78
Comox,4,4,4,4,4,4
Coquitlam,100,100,100,100,100,100
Cranbrook,3,3,3,3,3,3
Delta,29,29,29,29,29,29
Duncan,38,38,38,38,38,38
Fort St. John,28,28,28,28,28,28


In [534]:
bc_venues.shape

(1195, 7)

In [535]:
bc_venues.head(10)

Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kimberley,49.691079,-115.952463,BigHorn Electric,49.689229,-115.94058,Construction & Landscaping
1,Penticton,49.49101,-119.574217,Theo's Restaurant,49.49315,-119.589925,Restaurant
2,Penticton,49.49101,-119.574217,Earls,49.478106,-119.583512,Restaurant
3,Penticton,49.49101,-119.574217,Wild Scallion,49.501219,-119.59244,Vegetarian / Vegan Restaurant
4,Penticton,49.49101,-119.574217,The Copper Mug Pub,49.485062,-119.587979,Pub
5,Penticton,49.49101,-119.574217,The Pasta Factory,49.499574,-119.593845,Italian Restaurant
6,Penticton,49.49101,-119.574217,Penticton Farmer's Market,49.500265,-119.593121,Farmers Market
7,Penticton,49.49101,-119.574217,BRODO KITCHEN,49.495968,-119.591185,Soup Place
8,Penticton,49.49101,-119.574217,Cannery Brewing Co.,49.482953,-119.594019,Brewery
9,Penticton,49.49101,-119.574217,The Bench Market,49.502949,-119.586857,Coffee Shop


Adding the trails from Trailforks.com as 'venues'.

In [None]:
city_list = []
venue_list = []
for idx in range(df.shape[0]):
    city = df['City'][idx]
    n_trails = df['Number_of_Trails'][idx]
    # add one 'Trail' pseudo-venue per trail so trails count toward category frequencies
    city_list.extend([city] * n_trails)
    venue_list.extend(['Trail'] * n_trails)
trail_data = {'City': city_list, 'Venue Category': venue_list}
df2 = pd.DataFrame(trail_data)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
bc_venues_with_trails = pd.concat([bc_venues, df2], ignore_index=True)
bc_venues_with_trails

In [537]:
print('There are {} unique categories.'.format(len(bc_venues_with_trails['Venue Category'].unique())))

There are 187 unique categories.


## Preparing Data for Clustering
Converting the categorical venue data into one-hot indicator columns so it can be used for clustering.

In [541]:
# one hot encoding
bc_onehot = pd.get_dummies(bc_venues_with_trails[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
bc_onehot['City'] = bc_venues_with_trails['City'] 

# move city column to the first column
fixed_columns = [bc_onehot.columns[-1]] + list(bc_onehot.columns[:-1])
bc_onehot = bc_onehot[fixed_columns]

bc_onehot.head(5)

Unnamed: 0,Yoga Studio,Airport,Airport Terminal,American Restaurant,Apres Ski Bar,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [542]:
bc_grouped = bc_onehot.groupby('City').mean().reset_index()
bc_grouped.head()

Unnamed: 0,City,Yoga Studio,Airport,Airport Terminal,American Restaurant,Apres Ski Bar,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Women's Store
0,Abbotsford,0.0,0.0,0.0,0.0,0.0,0.008696,0.008696,0.0,0.0,...,0.0,0.0,0.008696,0.0,0.0,0.0,0.0,0.0,0.0,0.008696
1,Burnaby,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0
2,Campbell River,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Castlegar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chilliwack,0.0,0.008,0.0,0.008,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.008,0.0,0.0,0.0,0.0,0.0,0.0,0.0
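Taking the mean of one-hot columns per city gives each category's share of that city's rows. A minimal sketch with `collections.Counter` reproduces this for Abbotsford, which has 78 Foursquare venues and 37 trails in this notebook, so 115 rows in total. Only the totals come from the notebook; lumping 77 of the venues under a single "Other" label is a simplification for illustration.

```python
from collections import Counter

# 37 trail rows, one Art Gallery, and the remaining 77 venues lumped together
categories = ["Trail"] * 37 + ["Art Gallery"] + ["Other"] * 77

counts = Counter(categories)
total = sum(counts.values())
frequencies = {category: n / total for category, n in counts.items()}

print(total)
print(round(frequencies["Art Gallery"], 6))  # a single venue out of 115 rows
print(round(frequencies["Trail"], 4))
```

A single venue works out to 1/115, which is the 0.008696 shown for Abbotsford in the table above, and trails make up roughly a third of the city's rows. This is why "Trail" dominates the rankings for towns with many trails.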


Checking the top 5 venues from each city.  Obviously it will be skewed towards trails, but that's OK because it's by far the most important thing to me.

In [None]:
num_top_venues = 5

for city in bc_grouped['City']:
    print("----"+city+"----")
    temp = bc_grouped[bc_grouped['City'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [544]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [545]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
cities_venues_sorted = pd.DataFrame(columns=columns)
cities_venues_sorted['City'] = bc_grouped['City']

for ind in np.arange(bc_grouped.shape[0]):
    cities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bc_grouped.iloc[ind, :], num_top_venues)

cities_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abbotsford,Trail,Coffee Shop,Fast Food Restaurant,Sandwich Place,Restaurant
1,Burnaby,Trail,Burger Joint,Sandwich Place,Café,Park
2,Campbell River,Trail,Fast Food Restaurant,Coffee Shop,Grocery Store,Pharmacy
3,Castlegar,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
4,Chilliwack,Trail,Fast Food Restaurant,Coffee Shop,Restaurant,Sandwich Place
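The column labels are built from a short list of suffixes with a fallback to "th", which is fine for five columns but would produce labels such as "21th" if `num_top_venues` grew. A small sketch of a general ordinal helper, under the hypothetical name `ordinal`, might look like this:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns_sketch = ["City"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 6)]
print(columns_sketch)
```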


## Clustering

In [546]:
# set number of clusters
kclusters = 3

bc_grouped_clustering = bc_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 1, 0, 0, 0, 1, 1, 0])

In [547]:
labels = kmeans.labels_.astype('int64')
labels.shape

(47,)
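To give a feel for what `KMeans` is doing with the category frequencies, here is a minimal pure-Python sketch of the basic assign-and-update loop (Lloyd's algorithm). It is not scikit-learn's implementation, which adds smarter initialisation and other refinements. The function name `kmeans_sketch` and the toy feature vectors (trail share, coffee shop share) are made up for illustration.

```python
import math


def kmeans_sketch(points, initial_centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then move each centroid
    to the mean of the points assigned to it."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: index of the closest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # update step: recompute each centroid, keeping the old one if a cluster is empty
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy data: (trail share, coffee shop share) for four imaginary towns
toy_points = [(0.9, 0.0), (0.8, 0.05), (0.3, 0.2), (0.25, 0.25)]
labels_sketch, centroids_sketch = kmeans_sketch(toy_points, [toy_points[0], toy_points[2]])
print(labels_sketch)
print(centroids_sketch)
```

Towns dominated by trails end up in one group and towns with a more even mix of venues in the other, which is the same kind of split the real clustering produces across all 187 categories.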

In [548]:
# add clustering labels
cities_venues_sorted.insert(0, 'Cluster Labels', labels.astype('int64'))

bc_merged = df

# join the cluster labels and top venues onto df, which holds latitude/longitude for each city
bc_merged = bc_merged.join(cities_venues_sorted.set_index('City'), on='City')

In [549]:
bc_merged.head()

Unnamed: 0,City,Number_of_Trails,Postal_Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Kimberley,15,V1A,49.691079,-115.952463,1,Trail,Construction & Landscaping,Convenience Store,Cosmetics Shop,Fish & Chips Shop
1,Penticton,50,V2A,49.49101,-119.574217,2,Trail,Coffee Shop,Fast Food Restaurant,Grocery Store,Pizza Place
2,Surrey,14,V4A,49.032073,-122.821241,0,Trail,Coffee Shop,Café,Pizza Place,Japanese Restaurant
3,Burnaby,5,V5A,49.266244,-122.931096,0,Trail,Burger Joint,Sandwich Place,Café,Park
4,Powell River,24,V8A,49.85739,-124.46747,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant


In [550]:
bc_merged = bc_merged.dropna(axis=0).copy()  # copy so the assignment below does not act on a view
bc_merged['Cluster Labels'] = bc_merged['Cluster Labels'].astype(int)

In [551]:
bc_merged['Cluster Labels'][0:10]

0    1
1    2
2    0
3    0
4    1
5    2
6    0
7    0
8    2
9    1
Name: Cluster Labels, dtype: int64

## Visualize Clusters

In [552]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=7)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, nt in zip(bc_merged['Latitude'], bc_merged['Longitude'], bc_merged['City'], bc_merged['Cluster Labels'],bc_merged['Number_of_Trails']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) +' Trails: ' + str(nt), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=np.sqrt(nt),
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results

### Big city (purple)
Many of these locations seem to be in areas with bigger cities.

In [562]:
bc_merged.loc[bc_merged['Cluster Labels'] == 0, bc_merged.columns[[0]+[1] + list(range(5, bc_merged.shape[1]))]]

Unnamed: 0,City,Number_of_Trails,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Surrey,14,0,Trail,Coffee Shop,Café,Pizza Place,Japanese Restaurant
3,Burnaby,5,0,Trail,Burger Joint,Sandwich Place,Café,Park
6,Vernon,59,0,Trail,Coffee Shop,Fast Food Restaurant,Grocery Store,Pharmacy
7,Kamloops,40,0,Trail,Hotel,Restaurant,Coffee Shop,Bank
10,Delta,3,0,Trail,Discount Store,Breakfast Spot,Fast Food Restaurant,Pub
13,Coquitlam,60,0,Trail,Coffee Shop,Sushi Restaurant,Vietnamese Restaurant,Sandwich Place
16,Abbotsford,37,0,Trail,Coffee Shop,Fast Food Restaurant,Sandwich Place,Restaurant
19,Ladysmith,4,0,Trail,Construction & Landscaping,Harbor / Marina,Fast Food Restaurant,Hotel
20,Port Moody,3,0,Trail,Beach,Lake,Park,Dog Run
22,Fort St. John,5,0,Trail,Coffee Shop,Fast Food Restaurant,Pizza Place,Bank


### Fishing! (green)
It seems this cluster is dominated by areas with lots of access to fishing.

In [563]:
bc_merged.loc[bc_merged['Cluster Labels'] == 1, bc_merged.columns[[0]+[1] + list(range(5, bc_merged.shape[1]))]]

Unnamed: 0,City,Number_of_Trails,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Kimberley,15,1,Trail,Construction & Landscaping,Convenience Store,Cosmetics Shop,Fish & Chips Shop
4,Powell River,24,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
9,Cranbrook,32,1,Trail,Construction & Landscaping,Home Service,Furniture / Home Store,Dog Run
12,Salmon Arm,11,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
17,North Vancouver,77,1,Trail,Café,Pizza Place,Taco Place,Gluten-free Restaurant
18,Terrace,16,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
23,Quesnel,7,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
24,Courtenay,1,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
31,Castlegar,6,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
35,Trail,2,1,Trail,Fishing Spot,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant


### Rural Areas (red)
This cluster seems to be made up of more rural cities and towns (with the big exception of Victoria). I've spent a good deal of time in Whistler, Squamish, and Nelson (all of which are in this cluster), which leads me to believe there might be other places in this cluster that I would like. In fact, before starting this project, I was looking into Gibsons and Duncan as potential places to live.

In [564]:
bc_merged.loc[bc_merged['Cluster Labels'] == 2, bc_merged.columns[[0]+[1] + list(range(5, bc_merged.shape[1]))]]

Unnamed: 0,City,Number_of_Trails,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Penticton,50,2,Trail,Coffee Shop,Fast Food Restaurant,Grocery Store,Pizza Place
5,Victoria,72,2,Trail,Coffee Shop,Restaurant,Grocery Store,Department Store
8,Squamish,116,2,Trail,Coffee Shop,Hotel,Scenic Lookout,Restaurant
11,Kitimat,3,2,Trail,Construction & Landscaping,Hotel,Beach,Women's Store
14,Whistler,216,2,Trail,Hotel,Outdoors & Recreation,Park,Plaza
15,Williams Lake,48,2,Trail,Convenience Store,Coffee Shop,Paper / Office Supplies Store,Liquor Store
21,Campbell River,41,2,Trail,Fast Food Restaurant,Coffee Shop,Grocery Store,Pharmacy
25,Merritt,24,2,Trail,Café,Convenience Store,Inn,Fast Food Restaurant
27,Qualicum Beach,3,2,Trail,Food Truck,Construction & Landscaping,Cosmetics Shop,Dry Cleaner
28,Nelson,46,2,Trail,Coffee Shop,Restaurant,Pub,Fast Food Restaurant
