## Applied Data Science Capstone Project
### Week 3 - Segmenting and Clustering Neighborhoods in Toronto
### Kevin Spradlin
### July 12 - 15, 2021

## Part 1 - Webscrape Postal Codes, Boroughs, and Neighborhoods

In [1]:
!pip install bs4
!pip install html5lib
#!pip install requests



In [2]:
from bs4 import BeautifulSoup # this module helps in web scrapping.
import requests  # this module helps us to download a web page
import pandas as pd

### Retrieve the page with the Toronto postal code, borough, and neighborhood table

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

data  = requests.get(url).text

#soup = BeautifulSoup(data,"html5lib")
soup = BeautifulSoup(data,"html.parser")


In [4]:
#print(soup.prettify())

#find all html tables in the web page
tables = soup.find_all('table') # in html table is represented by the tag <table>

len(tables)


3

### Find the table with the postal codes.  It's the one with 'Not assigned' in some of its cells.

In [5]:
# find the table with the postal codes
for index,table in enumerate(tables):
    if ("Not assigned" in str(table)):
        table_index = index
print(table_index)

#print(tables[table_index].prettify())


0


### Extract the table's data into a list, then copy it into a _pandas_ dataframe.  You need to simplify some boroughs' names.

In [6]:
# extract the postal codes, boroughs, and neighborhoods into a list
table_contents = []

for row in tables[table_index].findAll("td"):
  cell = {}

  if row.span.text == "Not assigned":
    pass
  else:
    cell["PostalCode"] = row.p.text[:3]
    cell["Borough"] = (row.span.text).split('(')[0]
    cell["Neighborhood"] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
    table_contents.append(cell)

#print(table_contents)


In [7]:
# convert the list with postal codes, boroughs, and neighborhoods into a pandas dataframe
df = pd.DataFrame(table_contents)

#df.Borough.unique()

df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                     'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                     'EtobicokeNorthwest':'Etobicoke Northwest',
                                     'East YorkEast Toronto':'East York/East Toronto',
                                     'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

#df.Borough.unique()


In [8]:
# print the dimensions of the dataframe
df.shape

(103, 3)

### The dataframe has 103 rows (postal codes for boroughs and neighborhoods).

## Part 2 - Get the latitude and longitude of each postal code

In [9]:
!pip install geocoder
import geocoder




### I tried to use the geocoder module, but after 2 minutes it was still running.  Next, I tried using geocoder to get the coordinates for one postal code.  It ran for 2 minutes without getting a result.  So, I decided to just use the csv file (skip down three cells).

In [11]:
# loop through the rows in the dataframe, putting the combined results into a new dataframe
full_table_contents = []


for index, row in df.iterrows():
  #print(row['PostalCode'])

  cell = {}
  cell["PostalCode"] = row["PostalCode"]  
  cell["Borough"] = row["Borough"]  
  cell["Neighborhood"] = row["Neighborhood"]  


  lat_long_coords = None

  # keep querying until you get coordinates for the postal code
  while lat_long_coords is None:
    geo_query = geocoder.google(f"{row['PostalCode']:s}, Toronto, Ontario")
    lat_long_coords = geo_query.latlng


  cell["Latitude"] = lat_long_coords[0]
  cell["Longitude"] =lat_long_coords[1]

  full_table_contents.append(cell)


full_df = pd.DataFrame(full_table_contents)

full_df.head()


In [11]:
# note - tested geocoder with one postal code.  I didn't get a response after 2 minutes.
lat_long_coords = None

postal_code = "M5A"

while lat_long_coords is None:
  geo_query = geocoder.google("{postal_code:s}, Toronto, Ontario")
  lat_long_coords = geo_query.latlng


print(lat_long_coords)
  

### Read the data in the csv file into a dictionary, then combine it and the postal code/borough/neighborhood data into a new _toronto_data_ dataframe.

In [10]:
# read the coordinates from the csv file into a dictionary.
lat_long_coords = {}

geodata = open('Geospatial_Coordinates.csv', 'r')

for curr_line in geodata:
  postal_code, latitude, longitude = curr_line.split(',')

  lat_long_coords[postal_code] = (latitude, longitude[:-1], )


geodata.close()


In [11]:
# loop through the rows in the dataframe, putting the combined postal code/borough/neighborhood/latitude and longitude information into a new dataframe
full_table_contents = []


for index, row in df.iterrows():
  cell = {}
  cell["PostalCode"] = row["PostalCode"]  
  cell["Borough"] = row["Borough"]  
  cell["Neighborhood"] = row["Neighborhood"]  

  if row["PostalCode"] in lat_long_coords:
    cell["Latitude"] = lat_long_coords[row["PostalCode"]][0]
    cell["Longitude"] = lat_long_coords[row["PostalCode"]][1]

  full_table_contents.append(cell)


toronto_data = pd.DataFrame(full_table_contents)

toronto_data.head()


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7532586,-79.3296565
1,M4A,North York,Victoria Village,43.7258823,-79.3155716
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6542599,-79.3606359
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.4647633
4,M7A,Queen's Park,Ontario Provincial Government,43.6623015,-79.3894938


In [12]:
toronto_data.shape

(103, 5)

### The new dataframe has 103 rows, or the same number as the original dataframe.  So no rows were left out.

In [13]:
toronto_data.groupby('Borough').count()

Unnamed: 0_level_0,PostalCode,Neighborhood,Latitude,Longitude
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Central Toronto,9,9,9,9
Downtown Toronto,17,17,17,17
Downtown Toronto Stn A,1,1,1,1
East Toronto,4,4,4,4
East Toronto Business,1,1,1,1
East York,4,4,4,4
East York/East Toronto,1,1,1,1
Etobicoke,11,11,11,11
Etobicoke Northwest,1,1,1,1
Mississauga,1,1,1,1


### Downtown Toronto, North York, and Scarborough have the most neighborhoods.

## Part 3 - Explore and Cluster the Neighborhoods in Toronto

### Create a map of the neighborhoods in Toronto

In [14]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
!pip install sklearn
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install folium
import folium # map rendering library




### First set up an instance of geocoder for Toronto

In [15]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


### Now create the map of Toronto

In [16]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Define the Foursquare API credentials

In [17]:
CLIENT_ID = 'FFLI1QQCW1LJXXOSTNDXAYO32RNNNTMFZL4OOKJGLIHNWJUR' # your Foursquare ID
CLIENT_SECRET = 'LQLDT1TZOE4AK2RRK5EJ2JALLUAIZI5BQYQWFDJS3B25M2XU' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


### Define a function to get venues near a neighborhood

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Gather the venues in each neighborhood into the _toronto_venues_ dataframe

In [19]:
toronto_venues = getNearbyVenues(names = toronto_data['Neighborhood'],
                                 latitudes = toronto_data['Latitude'],
                                 longitudes = toronto_data['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

### Check the size of the _toronto_venues_ dataframe, the number of venues in each neighborhood, and the number of distinct venue categories

In [20]:
print(toronto_venues.shape)
toronto_venues.head()

(2150, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.7532586,-79.3296565,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.7532586,-79.3296565,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.7532586,-79.3296565,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.7258823,-79.3155716,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.7258823,-79.3155716,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [21]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,3,3,3,3,3,3
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Wilson Heights, Downsview North",24,24,24,24,24,24
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",25,25,25,25,25,25
...,...,...,...,...,...,...
Willowdale West,5,5,5,5,5,5
"Willowdale, Newtonbrook",1,1,1,1,1,1
Woburn,3,3,3,3,3,3
Woodbine Heights,7,7,7,7,7,7


In [22]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 278 uniques categories.


### Since there are about the same number of unique categories for all of Toronta as there were for all of Manhattan, I'll include all of Toronto's boroughs in my cluster analysis.

### First, reshape the venue data.

In [23]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(toronto_onehot.columns.values)
cols.pop(cols.index('Neighborhood'))
toronto_onehot = toronto_onehot[['Neighborhood'] + cols]

toronto_onehot.head()


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
toronto_onehot.shape

(2150, 278)

### Next, group rows by neighborhood and by taking the mean of the frequency of occurrence of each category


In [25]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Willowdale West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
96,"Willowdale, Newtonbrook",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
98,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
toronto_grouped.shape

(100, 278)

### Print each neighborhood with its five most common venues

In [27]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0  Latin American Restaurant  0.33
1                     Lounge  0.33
2             Breakfast Spot  0.33
3         Miscellaneous Shop  0.00
4                      Motel  0.00


----Alderwood, Long Branch----
                venue  freq
0         Pizza Place  0.22
1                 Gym  0.11
2      Sandwich Place  0.11
3  Athletics & Sports  0.11
4         Coffee Shop  0.11


----Bathurst Manor, Wilson Heights, Downsview North----
                venue  freq
0                Bank  0.08
1         Coffee Shop  0.08
2       Grocery Store  0.04
3  Chinese Restaurant  0.04
4         Gas Station  0.04


----Bayview Village----
                 venue  freq
0  Japanese Restaurant  0.25
1                 Café  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4          Music Venue  0.00


----Bedford Park, Lawrence Manor East----
                     venue  freq
0           Sandwich Place  0.08
1              Coffee Shop  0.08
2 

### Put this information into a new _pandas_ dataframe.  First write a function that puts each neighborhood's data into descending order.  Then use it to create the new dataframe.

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Lounge,Breakfast Spot,Miscellaneous Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
1,"Alderwood, Long Branch",Pizza Place,Gym,Sandwich Place,Athletics & Sports,Coffee Shop,Pub,Playground,Pharmacy,Performing Arts Venue,Movie Theater
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Grocery Store,Chinese Restaurant,Gas Station,Supermarket,Sandwich Place,Frozen Yogurt Shop,Restaurant,Fried Chicken Joint
3,Bayview Village,Japanese Restaurant,Café,Bank,Chinese Restaurant,Music Venue,Movie Theater,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Coffee Shop,Italian Restaurant,Comfort Food Restaurant,Toy / Game Store,Breakfast Spot,Pizza Place,Butcher,Café,Liquor Store


### Now I can do the cluster modeling

### Run a k-means cluster to sort the data into 5 neighborhoods.

In [30]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], dtype=int32)

### Use these clusters to create a new dataframe with the 10 most common venues

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.shape

(103, 16)

In [32]:
toronto_merged.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.7532586,-79.3296565,1.0,Fast Food Restaurant,Food & Drink Shop,Park,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
1,M4A,North York,Victoria Village,43.7258823,-79.3155716,0.0,Pizza Place,Hockey Arena,Portuguese Restaurant,Coffee Shop,French Restaurant,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6542599,-79.3606359,0.0,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Theater,Café,Yoga Studio,Cosmetics Shop,Spa
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.4647633,0.0,Clothing Store,Furniture / Home Store,Accessories Store,Vietnamese Restaurant,Event Space,Coffee Shop,Boutique,Performing Arts Venue,Park,Mediterranean Restaurant
4,M7A,Queen's Park,Ontario Provincial Government,43.6623015,-79.3894938,0.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Café,Beer Bar,Fried Chicken Joint,Spa,Mexican Restaurant,Smoothie Shop,Burrito Place
5,M9A,Etobicoke,Islington Avenue,43.6678556,-79.5322424,,,,,,,,,,,
6,M1B,Scarborough,"Malvern, Rouge",43.8066863,-79.1943534,0.0,Fast Food Restaurant,Accessories Store,Miscellaneous Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant
7,M3B,North York,Don Mills North,43.7459058,-79.352188,0.0,Caribbean Restaurant,Gym,Café,Athletics & Sports,Japanese Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mexican Restaurant,Mobile Phone Shop,Modern European Restaurant
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.7063972,-79.309937,0.0,Pizza Place,Intersection,Pharmacy,Pet Store,Café,Spa,Bank,Gastropub,Athletics & Sports,Flea Market
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6571618,-79.3789371,0.0,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Japanese Restaurant,Café,Bubble Tea Shop,Theater,Pizza Place,Diner


In [33]:
toronto_merged[['Cluster Labels']].describe()

Unnamed: 0,Cluster Labels
count,100.0
mean,0.3
std,0.758787
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,4.0


### The _toronto_merged_ dataframe has NaN values in the 'Cluster Labels' column.  Need to replace the NaN values with numpy arrays [0].  For some reason, couldn't get fillna() to work, so had to iterate through rows and check each one.

In [34]:
fill_array = np.array([0.0])

for index, row in toronto_merged.iterrows():
  if np.isnan(row['Cluster Labels']):
    print(str(row['Cluster Labels']))
 
    toronto_merged.loc[index, 'Cluster Labels'] = fill_array


toronto_merged[['Cluster Labels']].describe()

nan
nan
nan


Unnamed: 0,Cluster Labels
count,103.0
mean,0.291262
std,0.749262
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,4.0


### Put the information in the new dataframe onto a map.

In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine the clusters, to determine which venues determine the differences between them.

### Cluster 1 - Coffee shops appear to be most common venues in this cluster.

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0.0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Pizza Place,Hockey Arena,Portuguese Restaurant,Coffee Shop,French Restaurant,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
2,Downtown Toronto,0.0,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Theater,Café,Yoga Studio,Cosmetics Shop,Spa
3,North York,0.0,Clothing Store,Furniture / Home Store,Accessories Store,Vietnamese Restaurant,Event Space,Coffee Shop,Boutique,Performing Arts Venue,Park,Mediterranean Restaurant
4,Queen's Park,0.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Café,Beer Bar,Fried Chicken Joint,Spa,Mexican Restaurant,Smoothie Shop,Burrito Place
5,Etobicoke,0.0,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,0.0,Pizza Place,Coffee Shop,Café,Bakery,Pub,Restaurant,Italian Restaurant,Gastropub,Beer Store,Liquor Store
97,Downtown Toronto,0.0,Coffee Shop,Café,Hotel,Gym,Restaurant,Japanese Restaurant,Salad Place,Seafood Restaurant,Bakery,Steakhouse
99,Downtown Toronto,0.0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Yoga Studio,Men's Store,Bubble Tea Shop,Hotel,Smoke Shop
100,East Toronto Business,0.0,Light Rail Station,Spa,Auto Workshop,Comic Shop,Park,Recording Studio,Restaurant,Skate Park,Burrito Place,Farmers Market


### Cluster 2 - Parks are either 1st, 2nd, or 3rd most common venues in this cluster.

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1.0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1.0,Fast Food Restaurant,Food & Drink Shop,Park,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
21,York,1.0,Park,Women's Store,Bar,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
35,East York/East Toronto,1.0,Convenience Store,Park,Accessories Store,Miscellaneous Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
49,North York,1.0,Park,Bakery,Construction & Landscaping,Basketball Court,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
61,Central Toronto,1.0,Photography Studio,Bus Line,Park,Swim School,Accessories Store,Miscellaneous Shop,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
66,North York,1.0,Electronics Store,Convenience Store,Park,Accessories Store,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
68,Central Toronto,1.0,Sushi Restaurant,Jewelry Store,Trail,Park,Accessories Store,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
77,Etobicoke,1.0,Mobile Phone Shop,Park,Sandwich Place,Accessories Store,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Middle Eastern Restaurant
83,Central Toronto,1.0,Tennis Court,Lawyer,Park,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
85,Scarborough,1.0,Playground,Intersection,Park,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Miscellaneous Shop


### Cluster 3 - Pools tend to be most common venues in this cluster.

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2.0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,2.0,Baseball Field,Accessories Store,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Miscellaneous Shop,Museum
98,Etobicoke,2.0,Pool,River,Accessories Store,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Miscellaneous Shop
101,Etobicoke,2.0,Pool,Construction & Landscaping,Baseball Field,Accessories Store,Mobile Phone Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant


### Cluster 4 - Park is 1st most common venue and miscellaneous shops are 2nd or 3rd most common venues in this cluster.

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3.0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,North York,3.0,Park,Airport,Miscellaneous Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Accessories Store
52,North York,3.0,Park,Miscellaneous Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Accessories Store,Museum
64,York,3.0,Park,Miscellaneous Shop,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Accessories Store,Museum


### Cluster 5 - Scarborough is its own cluster

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4.0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,4.0,Playground,Mobile Phone Shop,Movie Theater,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Miscellaneous Shop,Martial Arts School
