# Segmenting and Clustering Toronto Neighborhoods
## Project by Isaias H. 
### June 3, 2020

## Project Description 
In this assignment we go over several techniques learned in the course. First we obtain the data from Wikipedia and load it into a dataframe using Pandas, then wrangle and clean it. Next we read a CSV file containing the latitude and longitude of each postal code, which allows us to build a map using folium. After that comes the "cool" part, where we connect to Foursquare and obtain venue information. Let's dive in and see how this works.

### Import Data
One of the most important parts of using Python and Jupyter Notebooks is the ability to import libraries and packages. Here are the libraries I imported:

In [1]:
import pandas as pd 
import numpy as np
import requests as rq

from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors
#!conda install -c anaconda scikit-learn
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
print("Libraries have been imported")

Libraries have been imported


### Obtain the dataset using Pandas
Reading the table with pd.read_html is easily one of the fastest ways to obtain tabular data; here the table comes from a Wikipedia page.

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Make a copy dataframe
I work on a copy of the dataframe so that the original stays untouched while the copy is altered.

In [3]:
df_copy = df.copy()  # plain assignment (df_copy = df) would only alias the same object
df_copy.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
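
The difference between binding a second name and making a real copy can be seen in a small sketch. The dataframe below is a made-up stand-in, not the Wikipedia table, and the variable names are illustrative only.

```python
import pandas as pd

# Toy dataframe standing in for the real table (values are illustrative)
original = pd.DataFrame({"Borough": ["North York", "Not assigned"]})

alias = original               # second name bound to the very same object
independent = original.copy()  # separate dataframe with its own data

# Writing through the alias also changes what `original` shows
alias.loc[0, "Borough"] = "Changed"

print(original.loc[0, "Borough"])     # reflects the edit made through the alias
print(independent.loc[0, "Borough"])  # the copy keeps the earlier value
```

With `.copy()`, later cleaning steps can be rerun against an untouched original.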


### Drop rows which contain "Not Assigned"

In [4]:
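# Keep only the rows whose borough is assigned; .copy() gives an independent dataframe
df_copy = df[df["Borough"] != "Not assigned"].copy()
# A neighborhood that is missing or "Not assigned" takes the name of its borough
df_copy["Neighborhood"] = df_copy["Neighborhood"].where(df_copy["Neighborhood"].ne("Not assigned")).fillna(df_copy["Borough"])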
df_copy.head(15)

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"
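
The cleaning rule (drop unassigned boroughs, then let an unassigned neighborhood fall back to its borough name) can be checked on a tiny example. This is a sketch on invented rows; the postal code `M9Z` and its values are hypothetical and only serve to exercise the fallback.

```python
import pandas as pd

# Invented rows: one unassigned borough, one normal row, one unassigned neighborhood
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only rows with an assigned borough
cleaned = toy[toy["Borough"] != "Not assigned"].copy()

# Where the neighborhood is unassigned or missing, fall back to the borough name
unassigned = cleaned["Neighborhood"].eq("Not assigned") | cleaned["Neighborhood"].isna()
cleaned.loc[unassigned, "Neighborhood"] = cleaned.loc[unassigned, "Borough"]

print(cleaned)
```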


### .shape gives us the row and column counts of our dataframe

In [5]:
df_copy.shape

(103, 3)

### Geospatial data is provided as a CSV file that can be read directly from this URL

In [6]:
csv_data_given = pd.read_csv("http://cocl.us/Geospatial_data")
csv_data_given.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Geospatial data and df_copy are merged into a new dataframe (merge_data)

In [7]:
# Join on the shared "Postal Code" column; only postal codes present in both tables are kept
merge_data = pd.merge(df_copy, csv_data_given, on="Postal Code")
merge_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
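
Because an inner merge silently drops rows without a partner, it can be worth checking that every postal code found its coordinates. The sketch below uses invented frames (the code `M0X` is hypothetical) and the `indicator` and `validate` options of `pd.merge`; it is one possible check, not part of the original notebook.

```python
import pandas as pd

# Invented stand-ins for df_copy and csv_data_given
left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M0X"],
    "Borough": ["North York", "North York", "Example"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left merge keeps every row of `left`; `indicator` records where each row came from,
# and `validate` raises if a postal code is duplicated on either side
merged = pd.merge(left, right, on="Postal Code", how="left",
                  indicator=True, validate="one_to_one")

# Rows that found no coordinates
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Postal Code", "Borough"]])
```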


### Rename the column Postal Code to PostalCode

In [8]:
merge_data.rename(columns = {'Postal Code': 'PostalCode'}, inplace = True)
merge_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [9]:
print('This data frame has {} boroughs and {} postal code areas.'.format(merge_data['Borough'].nunique(), merge_data.shape[0]))

This data frame has 10 boroughs and 103 postal code areas.


### Getting coordinate information from a city using a geolocator

In [10]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="ont_exp")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### Create a map of Toronto with folium and the merged neighborhood data

In [11]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, borough, neighborhood in zip(merge_data['Latitude'], merge_data['Longitude'], merge_data['Borough'], merge_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='red', fill=True, fill_color='blue',
                        fill_opacity=0.8).add_to(map_toronto)
    
map_toronto

### Credentials (hidden)

In [48]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: CLIENT_ID
CLIENT_SECRET: CLIENT_SECRET


In [13]:
merge_data.loc[0, 'Neighborhood']

'Parkwoods'

### Grab the coordinates and name of the first row in merge_data

In [14]:
park_la = merge_data.loc[0, 'Latitude']
park_lo = merge_data.loc[0, 'Longitude']
park_name = merge_data.loc[0, 'Neighborhood']
print('Latitude and Longitude of {}, are {}, {}'.format(park_name, park_la, park_lo))

Latitude and Longitude of Parkwoods, are 43.7532586, -79.3296565


### Set the request parameters and build the Foursquare request URL

In [49]:
LIMIT = 100
RADIUS = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, park_la, park_lo, RADIUS, LIMIT)
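
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is a minimal sketch: `build_explore_url` is a hypothetical helper, and the credential and version strings are placeholders, since the real values are hidden in this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes the values, so the comma in `ll` becomes %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials and version string; substitute real values before use
example_url = build_explore_url("CLIENT_ID", "CLIENT_SECRET", "YYYYMMDD",
                                43.7532586, -79.3296565, 1000, 100)
print(example_url)
```

Keeping the parameters in a dictionary makes it harder to swap two positional arguments by accident.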

### The response is returned as JSON
Raw JSON is often hard to read and decipher, so the following cells flatten it into a dataframe.

In [16]:
results = rq.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5effc591ad353c77ab3927e2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 28,
  'suggestedBounds': {'ne': {'lat': 43.762258609000014,
    'lng': -79.31721997969855},
   'sw': {'lat': 43.74425859099999, 'lng': -79.34209302030145}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b8991cbf964a520814232e3',
       'name': "Allwyn's Bakery",
       'location': {'address': '81 Underhill drive',
        'lat': 43.75984035203157,
        'lng': -79.32471879917513,
        'labeledLatLngs': [{'label': 'display'

### Define a function that extracts the venue category name, which helps us build a dataframe from the JSON

In [17]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Import json and json_normalize

In [18]:
import json
from pandas import json_normalize  # pandas >= 1.0; on older versions import it from pandas.io.json

### Clean the JSON and structure it as a pandas dataframe

In [19]:
venues = results['response']['groups'][0]['items']
nearby_v = json_normalize(venues)
filter_col = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_v = nearby_v.loc[:, filter_col]
nearby_v['venue.categories'] = nearby_v.apply(get_category_type, axis=1)
nearby_v.columns = [col.split(".")[-1] for col in nearby_v.columns]
nearby_v.head(15)

Unnamed: 0,name,categories,lat,lng
0,Allwyn's Bakery,Caribbean Restaurant,43.75984,-79.324719
1,Brookbanks Park,Park,43.751976,-79.33214
2,Tim Hortons,Café,43.760668,-79.326368
3,Bruno's valu-mart,Grocery Store,43.746143,-79.32463
4,High Street Fish & Chips,Fish & Chips Shop,43.74526,-79.324949
5,A&W,Fast Food Restaurant,43.760643,-79.326865
6,Shoppers Drug Mart,Pharmacy,43.760857,-79.324961
7,Pizza Pizza,Pizza Place,43.760231,-79.325666
8,Food Basics,Supermarket,43.760549,-79.326045
9,Shoppers Drug Mart,Pharmacy,43.745315,-79.3258
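
The flattening step may be easier to follow on a hand-written structure. The two items below are invented and only mimic the shape of the response shown earlier; the sketch assumes pandas 1.0 or newer, where `pd.json_normalize` is available at the top level.

```python
import pandas as pd

# Invented items that imitate the nesting of the Foursquare response
items = [
    {"venue": {"name": "Example Bakery",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.76, "lng": -79.32}}},
    {"venue": {"name": "Example Park",
               "categories": [],
               "location": {"lat": 43.75, "lng": -79.33}}},
]

# Nested keys become dotted column names such as 'venue.location.lat'
flat = pd.json_normalize(items)

# Reduce each category list to the name of its first entry (None when the list is empty)
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted name
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```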


In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_v.shape[0]))

28 venues were returned by Foursquare.


### Make a function that accepts names, latitudes, longitudes, and a radius
It returns the name, latitude, longitude, and category of the venues near each Toronto neighborhood.

In [21]:
def getNearbyVenues(names, latitudes, longitudes, RADIUS=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            RADIUS, 
            LIMIT)
            
        # make the GET request
        results = rq.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_v = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_v.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_v)
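
The function above indexes `categories[0]` directly, which would raise an error for a venue with an empty category list. A defensive variant of the row-building step could look like the sketch below; `venue_rows` is a hypothetical helper and the items are invented, so no request is made.

```python
def venue_rows(name, lat, lng, items):
    """Turn response items into flat tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Invented items in the shape of the Foursquare response
items = [
    {"venue": {"name": "Example Bakery", "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.76, "lng": -79.32}}},
    {"venue": {"name": "Unlabeled Spot", "categories": [],
               "location": {"lat": 43.75, "lng": -79.33}}},
]

rows = venue_rows("Example Hood", 43.7, -79.3, items)
for row in rows:
    print(row)
```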

### Grab the venues in Toronto by calling the function with the merge_data columns

In [22]:
toronto_v = getNearbyVenues(names = merge_data['Neighborhood'], 
                            latitudes = merge_data['Latitude'],
                            longitudes = merge_data['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

### Get the shape of the new venues dataframe

In [23]:
print(toronto_v.shape)
toronto_v.head()

(2129, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


### Group the data based on Neighborhoods and get a count of the values

In [24]:
toronto_v.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",8,8,8,8,8,8
"Bathurst Manor, Wilson Heights, Downsview North",20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",24,24,24,24,24,24
Berczy Park,58,58,58,58,58,58
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",19,19,19,19,19,19
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15


In [25]:
print('There are {} unique categories.'.format(len(toronto_v['Venue Category'].unique())))

There are 268 unique categories.


### One-hot encode the venue categories as dummy variables
#### Average the dummy variables within each Neighborhood

In [26]:
toront_1hot = pd.get_dummies(toronto_v[['Venue Category']], prefix = "", prefix_sep = "")
toront_1hot['Neighborhood'] = toronto_v['Neighborhood']
fix_col = [toront_1hot.columns[-1]]+list(toront_1hot.columns[:-1])
toront_1hot = toront_1hot[fix_col]
neighbor_group = toront_1hot.groupby('Neighborhood').mean().reset_index()
neighbor_group.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [27]:
print(neighbor_group.shape)

(93, 268)
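
Averaging the dummy columns per neighborhood yields the share of each venue category there. The sketch below shows this on invented data. It groups by the neighborhood series directly instead of adding it as a column, which is a design choice of this example: a category that happened to be named `Neighborhood` could then not clash with the grouping key.

```python
import pandas as pd

# Invented venues for two neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Café", "Bank", "Bank", "Bank"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# The mean of a 0/1 column within a group is the category's relative frequency
freq = onehot.groupby(venues["Neighborhood"]).mean()
print(freq)
```

Each row of `freq` sums to 1, because every venue carries exactly one category.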


### Create a loop that displays the most frequent venue categories in each Neighborhood

In [28]:
top_5 = 5
for hood in neighbor_group['Neighborhood']:
    print('====='+hood+"====")
    temp = neighbor_group[neighbor_group['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(top_5))
    print('\n')

=====Agincourt====
                       venue  freq
0             Clothing Store   0.2
1             Breakfast Spot   0.2
2                     Lounge   0.2
3  Latin American Restaurant   0.2
4               Skating Rink   0.2


=====Alderwood, Long Branch====
            venue  freq
0     Pizza Place  0.25
1  Sandwich Place  0.12
2             Gym  0.12
3            Pool  0.12
4    Dance Studio  0.12


=====Bathurst Manor, Wilson Heights, Downsview North====
                       venue  freq
0                       Bank  0.10
1                Coffee Shop  0.10
2  Middle Eastern Restaurant  0.05
3           Sushi Restaurant  0.05
4                Supermarket  0.05


=====Bayview Village====
                 venue  freq
0  Japanese Restaurant  0.25
1                 Café  0.25
2   Chinese Restaurant  0.25
3                 Bank  0.25
4          Yoga Studio  0.00


=====Bedford Park, Lawrence Manor East====
                     venue  freq
0               Restaurant  0.08
1       Ital

### Create a function that sorts the venue categories from most to least frequent

In [29]:
def return_common_v(row, top_5):
    row_cats = row.iloc[1:]
    row_cats_sort = row_cats.sort_values(ascending = False)
    return row_cats_sort.index.values[0:top_5]
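
When a neighborhood has fewer distinct categories than the number of ranks requested, the remaining ranks are filled with categories whose frequency is zero. A variant that keeps only categories that actually occur might look like this; `top_categories` is a hypothetical alternative to the function above, shown on an invented frequency row.

```python
import pandas as pd

def top_categories(freq_row, n):
    """Return up to n category names with a frequency above zero, most frequent first."""
    positive = freq_row[freq_row > 0].sort_values(ascending=False)
    return positive.index.tolist()[:n]

# Invented frequency row for a single neighborhood
row = pd.Series({"Park": 0.5, "Café": 0.25, "Bank": 0.25, "Yoga Studio": 0.0})

print(top_categories(row, 2))
print(top_categories(row, 10))  # never includes the zero-frequency category
```

The trade-off is that the result can be shorter than `n`, so it no longer fits a fixed number of columns without padding.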

### Choose how many top venues to keep and create a dataframe
Using the function above, a dataframe is built with the most common venue categories of each neighborhood.

In [30]:
top_5 = 10

indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']
for ind in np.arange(top_5):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
t_neigh_sort =  pd.DataFrame(columns = columns)
t_neigh_sort['Neighborhood'] = neighbor_group['Neighborhood']

for ind in np.arange(neighbor_group.shape[0]):
    t_neigh_sort.iloc[ind, 1:] = return_common_v(neighbor_group.iloc[ind, :], top_5)
    
t_neigh_sort.head(15)    

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Breakfast Spot,Skating Rink,Clothing Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant
1,"Alderwood, Long Branch",Pizza Place,Dance Studio,Pub,Gym,Coffee Shop,Sandwich Place,Pool,Dessert Shop,Dim Sum Restaurant,Diner
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Frozen Yogurt Shop,Shopping Mall,Bridal Shop,Sandwich Place,Diner,Restaurant,Deli / Bodega,Supermarket
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Women's Store,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Restaurant,Sandwich Place,Italian Restaurant,Thai Restaurant,Pharmacy,Pizza Place,Pub,Café,Butcher
5,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Seafood Restaurant,Beer Bar,Bakery,Café,Vegetarian / Vegan Restaurant,Diner
6,"Birch Cliff, Cliffside West",College Stadium,Skating Rink,General Entertainment,Café,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
7,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Breakfast Spot,Grocery Store,Furniture / Home Store,Convenience Store,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant
8,"Business reply mail Processing Centre, South C...",Light Rail Station,Auto Workshop,Comic Shop,Pizza Place,Recording Studio,Restaurant,Butcher,Burrito Place,Brewery,Skate Park
9,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Harbor / Marina,Airport Terminal,Coffee Shop,Sculpture Garden,Rental Car Location,Boat or Ferry,Bar,Airport Food Court


### Choose the number of clusters; I went with 5, which seemed reasonable for the data that was given

In [31]:
k_clust = 5 
toronto_clust = neighbor_group.drop(columns='Neighborhood')
kmean = KMeans(n_clusters = k_clust, random_state = 0).fit(toronto_clust)
kmean.labels_[0:15]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)
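
One common way to sanity-check a choice of k is to compare the within-cluster sum of squares (inertia) across several values and look for the point where improvements level off. The sketch below does this on synthetic data, which stands in for the neighborhood frequency matrix; it was not run on the Toronto data, so it says nothing about whether 5 is the best value here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix: three loose groups of 20 points in 4 dimensions
features = np.vstack([
    rng.normal(loc=center, scale=0.05, size=(20, 4))
    for center in (0.1, 0.5, 0.9)
])

# Fit k-means for several k and record the inertia of each fit
candidate_ks = range(1, 7)
inertias = []
for k in candidate_ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias.append(model.inertia_)

for k, inertia in zip(candidate_ks, inertias):
    print(k, round(inertia, 3))
```

Inertia shrinks as k grows, so the quantity of interest is where the drop flattens rather than the smallest value.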

### Insert the cluster labels and join the sorted venues (t_neigh_sort) onto merge_data
t_neigh_sort holds, for each neighborhood, the venue categories ranked from 1st to 10th most common.

In [32]:
t_neigh_sort.insert(0, 'Cluster Labels', kmean.labels_)
tor_merge = merge_data
tor_merge = tor_merge.join(t_neigh_sort.set_index('Neighborhood'), on = 'Neighborhood')
tor_merge.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Food & Drink Shop,Park,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Portuguese Restaurant,Coffee Shop,French Restaurant,Hockey Arena,Intersection,Pizza Place,Eastern European Restaurant,Drugstore,Donut Shop,Electronics Store
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Pub,Bakery,Theater,Breakfast Spot,Café,Ice Cream Shop,Spa,Shoe Store
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Accessories Store,Furniture / Home Store,Boutique,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Event Space,Doner Restaurant,Diner
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Diner,Sushi Restaurant,Yoga Studio,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place


### Get rid of NaN values

In [36]:
tor_merge = tor_merge.dropna(axis = 0)


In [37]:
tor_merge.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Food & Drink Shop,Park,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Portuguese Restaurant,Coffee Shop,French Restaurant,Hockey Arena,Intersection,Pizza Place,Eastern European Restaurant,Drugstore,Donut Shop,Electronics Store
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Pub,Bakery,Theater,Breakfast Spot,Café,Ice Cream Shop,Spa,Shoe Store
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Accessories Store,Furniture / Home Store,Boutique,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Event Space,Doner Restaurant,Diner
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Diner,Sushi Restaurant,Yoga Studio,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place


### Inspect the dataframe and check whether Cluster Labels are integers or floats

In [38]:
tor_merge.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 97 entries, 0 to 102
Data columns (total 16 columns):
PostalCode                97 non-null object
Borough                   97 non-null object
Neighborhood              97 non-null object
Latitude                  97 non-null float64
Longitude                 97 non-null float64
Cluster Labels            97 non-null float64
1st Most Common Venue     97 non-null object
2nd Most Common Venue     97 non-null object
3rd Most Common Venue     97 non-null object
4th Most Common Venue     97 non-null object
5th Most Common Venue     97 non-null object
6th Most Common Venue     97 non-null object
7th Most Common Venue     97 non-null object
8th Most Common Venue     97 non-null object
9th Most Common Venue     97 non-null object
10th Most Common Venue    97 non-null object
dtypes: float64(3), object(13)
memory usage: 12.9+ KB


### If Cluster Labels are floats, convert them to integers
This matters because the labels are later used as list indices when picking the cluster colors.

In [39]:
# after dropna there are no missing labels left, so a plain integer cast is safe
tor_merge['Cluster Labels'] = tor_merge['Cluster Labels'].astype(int)
print(tor_merge)

    PostalCode           Borough  \
0          M3A        North York   
1          M4A        North York   
2          M5A  Downtown Toronto   
3          M6A        North York   
4          M7A  Downtown Toronto   
6          M1B       Scarborough   
7          M3B        North York   
8          M4B         East York   
9          M5B  Downtown Toronto   
10         M6B        North York   
12         M1C       Scarborough   
13         M3C        North York   
14         M4C         East York   
15         M5C  Downtown Toronto   
16         M6C              York   
17         M9C         Etobicoke   
18         M1E       Scarborough   
19         M4E      East Toronto   
20         M5E  Downtown Toronto   
21         M6E              York   
22         M1G       Scarborough   
23         M4G         East York   
24         M5G  Downtown Toronto   
25         M6G  Downtown Toronto   
26         M1H       Scarborough   
27         M2H        North York   
28         M3H        North 
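
The labels start out as floats because the join leaves some rows without a match, and a missing value forces an integer column to float. The following sketch reproduces the effect on invented frames and then applies the same drop-and-cast sequence.

```python
import pandas as pd

# Invented frames: neighborhood "C" has no cluster label
left = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [0, 1]})

joined = left.join(labels.set_index("Neighborhood"), on="Neighborhood")

# The unmatched row holds NaN, so the integer labels were upcast to float
print(joined["Cluster Labels"].dtype)

# Drop the unlabeled rows first, then cast back to integers
cleaned = joined.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```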

### Map the clusters and see the results

In [40]:
map_clust = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_clust))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tor_merge['Latitude'], tor_merge['Longitude'], tor_merge['Neighborhood'], tor_merge['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clust)
       
map_clust

## Cluster 1
10 rows fall in this cluster; Park is the most common venue in almost all of them.

In [41]:
tor_merge.loc[tor_merge['Cluster Labels'] == 0, tor_merge.columns[[1] + list(range(5, tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Food & Drink Shop,Park,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
21,York,0,Park,Pool,Women's Store,Greek Restaurant,Gourmet Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop
35,East York,0,Park,Metro Station,Convenience Store,Women's Store,Doner Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Drugstore
61,Central Toronto,0,Park,Bus Line,Swim School,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant,Farmers Market
66,North York,0,Park,Convenience Store,Women's Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Eastern European Restaurant
77,Etobicoke,0,Park,Sandwich Place,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant
83,Central Toronto,0,Park,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
85,Scarborough,0,Park,Playground,Sculpture Garden,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
91,Downtown Toronto,0,Park,Trail,Playground,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
98,Etobicoke,0,Park,River,Smoke Shop,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store


## Cluster 2
83 rows, by far the largest cluster and arguably too many for a single group. Coffee shops are a recurring venue among the rows shown here.

In [42]:
tor_merge.loc[tor_merge['Cluster Labels'] == 1, tor_merge.columns[[1] + list(range(5, tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Portuguese Restaurant,Coffee Shop,French Restaurant,Hockey Arena,Intersection,Pizza Place,Eastern European Restaurant,Drugstore,Donut Shop,Electronics Store
2,Downtown Toronto,1,Coffee Shop,Park,Pub,Bakery,Theater,Breakfast Spot,Café,Ice Cream Shop,Spa,Shoe Store
3,North York,1,Clothing Store,Accessories Store,Furniture / Home Store,Boutique,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Event Space,Doner Restaurant,Diner
4,Downtown Toronto,1,Coffee Shop,Diner,Sushi Restaurant,Yoga Studio,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place
7,North York,1,Gym,Asian Restaurant,Beer Store,Japanese Restaurant,Coffee Shop,Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop,Italian Restaurant
8,East York,1,Pizza Place,Gym / Fitness Center,Gastropub,Fast Food Restaurant,Intersection,Bank,Athletics & Sports,Breakfast Spot,Pet Store,Pharmacy
9,Downtown Toronto,1,Clothing Store,Coffee Shop,Bubble Tea Shop,Japanese Restaurant,Café,Middle Eastern Restaurant,Cosmetics Shop,Fast Food Restaurant,Tea Room,Ramen Restaurant
10,North York,1,Park,Pizza Place,Metro Station,Japanese Restaurant,Pub,Distribution Center,Dim Sum Restaurant,Diner,Discount Store,Dog Run
13,North York,1,Gym,Asian Restaurant,Beer Store,Japanese Restaurant,Coffee Shop,Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop,Italian Restaurant
14,East York,1,Park,Video Store,Skating Rink,Beer Store,Athletics & Sports,Curling Ice,Dance Studio,Pharmacy,Concert Hall,Dim Sum Restaurant


## Cluster 3
1 row 

In [43]:
tor_merge.loc[tor_merge['Cluster Labels'] == 2, tor_merge.columns[[1] + list(range(5, tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,2,Fast Food Restaurant,Dessert Shop,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop


## Cluster 4
1 row

In [44]:
tor_merge.loc[tor_merge['Cluster Labels'] == 3, tor_merge.columns[[1] + list(range(5, tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,3,Bar,Women's Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Field


## Cluster 5
2 rows

In [45]:
tor_merge.loc[tor_merge['Cluster Labels'] == 4, tor_merge.columns[[1] + list(range(5, tor_merge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,4,Baseball Field,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Field
101,Etobicoke,4,Home Service,Baseball Field,Women's Store,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dim Sum Restaurant
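
The row counts quoted for each cluster can be obtained in one step with `value_counts`, which avoids reading them off the individual tables. The labels below are invented for illustration; on the real data the series would be the `Cluster Labels` column.

```python
import pandas as pd

# Invented cluster labels standing in for tor_merge['Cluster Labels']
labels = pd.Series([0, 1, 1, 1, 2, 1, 4, 4], name="Cluster Labels")

# Number of rows per cluster, ordered by label
sizes = labels.value_counts().sort_index()
print(sizes)
```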


# Thank You!