## IBM Data Science Capstone Project Final Notebook
#### Topic : Opening a new Peruvian Restaurant in Toronto

In this notebook, I will be creating clusters to find the most suitable location to open an authentic Peruvian restaurant in Toronto, Canada.

In [133]:
#importing necessary libraries
import requests
import pandas as pd
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [None]:
#to make it easier, we will store this in csv format.
#Export to .CSV
df_postcode.to_csv('Toronto_Postcodes.csv')

In [134]:
import numpy as np

Since the geocoder library is not working for me, I will use the downloaded CSV file of coordinates instead.

In [135]:
#Read CSV file from link and load into dataframe
url_csv = 'http://cocl.us/Geospatial_data'
df_coordinates = pd.read_csv(url_csv)
df_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [136]:
#use the previously cleaned data
df_neighborhoods = pd.read_csv('Toronto_Postcodes.csv',index_col=[0])
df_neighborhoods.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [137]:
# Make sure both dataframes use the same name for the postal code column
df_coordinates.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
df_neighborhoods.rename(columns={'Postcode': 'PostalCode'}, inplace=True)

In [138]:
# Merge both datasets
df_neighborhoods_coordinates = pd.merge(df_neighborhoods, df_coordinates, on='PostalCode')
df_neighborhoods_coordinates.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
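
The inner merge above silently drops any postal code that appears in only one of the two tables. A minimal sketch of how this could be checked, using small made-up frames (the frame contents and the `M9Z` code are illustrative assumptions, not data from this notebook):

```python
import pandas as pd

# Toy stand-ins for df_neighborhoods and df_coordinates (illustrative values only)
left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Hypothetical']})
right = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# how='left' keeps every neighborhood row; indicator=True adds a '_merge' column;
# validate='one_to_one' raises if a postal code is duplicated on either side
checked = pd.merge(left, right, on='PostalCode', how='left',
                   indicator=True, validate='one_to_one')

# Postal codes that found no coordinates
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty list would mean every neighborhood row received coordinates.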


In [139]:
# Check coordinates for a couple of neighborhoods
df_neighborhoods_coordinates[(df_neighborhoods_coordinates['PostalCode']=='M5G') |
                             (df_neighborhoods_coordinates['PostalCode']=='M2H') ]

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
17,M2H,North York,Hillcrest Village,43.803762,-79.363452
57,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


In [140]:
#Export to .CSV
df_neighborhoods_coordinates.to_csv('Toronto_Postcodes_2.csv')

In [141]:
# Read .csv file from above
df = pd.read_csv('Toronto_Postcodes_2.csv', index_col=0)
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [142]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


In [143]:
df.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True)

In [144]:
#count neighborhoods per borough
df.groupby('Borough').count()['Neighborhood']

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           11
Mississauga          1
North York          24
Queen's Park         1
Scarborough         17
West Toronto         6
York                 5
Name: Neighborhood, dtype: int64

In [145]:
df_toronto = df[df['Borough'].str.contains('Toronto')].copy() # explicit copy avoids SettingWithCopyWarning
df_toronto.reset_index(drop=True, inplace=True)
df_toronto.head()


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [146]:
#Check the number of neighborhoods
print(df_toronto.groupby('Borough').count()['Neighborhood'])

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
West Toronto         6
Name: Neighborhood, dtype: int64


In [147]:
#Create list with the Boroughs (to be used later)
boroughs = df_toronto['Borough'].unique().tolist()

In [148]:
#Obtain the coordinates from the dataset itself, just averaging Latitude/Longitude of the current dataset 
lat_toronto = df_toronto['Latitude'].mean()
lon_toronto = df_toronto['Longitude'].mean()
print('The geographical coordinates of Toronto are {}, {}'.format(lat_toronto, lon_toronto))

The geographical coordinates of Toronto are 43.66713498717947, -79.38987324871795


In [149]:
borough_color = {}
for borough in boroughs:
    borough_color[borough]= '#%02X%02X%02X' % tuple(np.random.choice(range(256), size=3)) #Random color
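
Random colors change on every run and can end up nearly identical for two boroughs. As an alternative sketch (not what this notebook does), evenly spaced hues give each borough a distinct, reproducible color. The helper name `distinct_hex_colors` and its default saturation and value are hypothetical choices:

```python
import colorsys

def distinct_hex_colors(labels, saturation=0.75, value=0.9):
    """Map each label to a hex color with evenly spaced hues (deterministic)."""
    palette = {}
    n = len(labels)
    for i, label in enumerate(labels):
        # Hue i/n spreads the labels evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        palette[label] = '#%02X%02X%02X' % (round(r * 255), round(g * 255), round(b * 255))
    return palette

boroughs_example = ['East Toronto', 'Central Toronto', 'Downtown Toronto', 'West Toronto']
borough_color_example = distinct_hex_colors(boroughs_example)
print(borough_color_example)
```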

In [185]:

map_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], 
                                           df_toronto['Longitude'],
                                           df_toronto['Borough'], 
                                           df_toronto['Neighborhood']):
    label_text = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label_text, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=borough_color[borough],
        fill_color=borough_color[borough],
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

## Getting Venues Data using Foursquare

In [151]:
CLIENT_ID = '1OXXYKDWEZFZN4AQZZIWWNQM32MXPCX1LS5DIK2QGSBD0GFE' # your Foursquare ID
CLIENT_SECRET = 'HYZ02P1LWI0TB415NZIP42ETBHL0W5FWEW2HDS4U54LVTWUS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
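
Hard-coding the client ID and secret in a shared notebook exposes them to anyone who reads it. One possible alternative, sketched here, is to read them from environment variables. The helper name and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical:

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names used here are hypothetical; use whatever names
    your own environment defines.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

# Demonstration with a plain dict standing in for os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook itself the call would be `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, with the variables exported before Jupyter is started.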

In [152]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
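
`getNearbyVenues` assumes that every response contains a `groups` list and that every venue has at least one category. A hedged sketch of more defensive versions of its two steps, building the query string and extracting the rows, follows. The helper names `build_explore_url` and `extract_venue_rows` are hypothetical, and the payload shape is only what the function above already relies on:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'  # same endpoint as above

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the request URL; urlencode escapes the parameter values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in 'll' unescaped, as in the original URL
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

def extract_venue_rows(name, lat, lng, payload):
    """Return one tuple per venue, skipping venues with no category or coordinates."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories', [])
        location = venue.get('location', {})
        if not categories or 'lat' not in location or 'lng' not in location:
            continue
        rows.append((name, lat, lng, venue.get('name'),
                     location['lat'], location['lng'], categories[0]['name']))
    return rows

# Made-up payload with one complete venue and one venue lacking a category
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Park', 'location': {'lat': 43.67, 'lng': -79.29},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'No Category', 'location': {'lat': 43.68, 'lng': -79.30},
               'categories': []}},
]}]}}
example_rows = extract_venue_rows('The Beaches', 43.676357, -79.293031, fake_payload)
print(example_rows)
```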

In [155]:
#Get venues for all neighborhoods in our dataset
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                latitudes=df_toronto['Latitude'],
                                longitudes=df_toronto['Longitude'])

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

In [156]:
#Check size of resulting dataframe
toronto_venues.shape

(1616, 7)

In [157]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Glen Stewart Park,43.675278,-79.294647,Park
4,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


In [158]:
#Number of venues per neighborhood
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",91,91,91,91,91,91
Berczy Park,58,58,58,58,58,58
"Brockton, Exhibition Place, Parkdale Village",23,23,23,23,23,23
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",42,42,42,42,42,42
Central Bay Street,62,62,62,62,62,62
"Chinatown, Grange Park, Kensington Market",58,58,58,58,58,58
Christie,17,17,17,17,17,17
Church and Wellesley,77,77,77,77,77,77


In [159]:
#Number of unique venue categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 236 unique categories.


In [160]:
#print out the list of categories
toronto_venues['Venue Category'].unique()[:100]

array(['Trail', 'Health Food Store', 'Pub', 'Park', 'Neighborhood',
       'Coffee Shop', 'Cosmetics Shop', 'Greek Restaurant',
       'Italian Restaurant', 'Ice Cream Shop', 'Yoga Studio', 'Brewery',
       'Fruit & Vegetable Store', 'Dessert Shop', 'Restaurant',
       'Pizza Place', 'Juice Bar', 'Bookstore', 'Bubble Tea Shop',
       'Furniture / Home Store', 'Grocery Store', 'Spa', 'Bakery',
       'Caribbean Restaurant', 'Café', 'Indian Restaurant', 'Lounge',
       'Frozen Yogurt Shop', 'American Restaurant', 'Liquor Store', 'Gym',
       'Fish & Chips Shop', 'Fast Food Restaurant', 'Sushi Restaurant',
       'Pet Store', 'Steakhouse', 'Burrito Place', 'Movie Theater',
       'Sandwich Place', 'Board Shop', 'Food & Drink Shop', 'Fish Market',
       'Gay Bar', 'Seafood Restaurant', 'Cheese Shop',
       'Middle Eastern Restaurant', 'Comfort Food Restaurant',
       'Stationery Store', 'Wine Bar', 'Thai Restaurant',
       'Coworking Space', 'Latin American Restaurant', 'Gastropub

In [161]:
# check if the results contain "Latin American Restaurant"
# note: I switched to the Latin American category because the Asian category I used earlier had too few venues
"Latin American Restaurant" in toronto_venues['Venue Category'].unique()

True

***Analyze Each Neighborhood***

In [162]:
# one hot encoding
to_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
to_onehot['Neighborhoods'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

print(to_onehot.shape)
to_onehot.head()

(1616, 237)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [163]:
to_grouped = to_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(to_grouped.shape)
to_grouped

(39, 237)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,...,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.016129
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.034483,0.0,0.051724,0.017241,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.025974
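
Because each row of the one-hot table contains a single 1, the group mean of a column is that category's share of the neighborhood's venues. The sketch below checks this on a toy table (the values are made up) against `pd.crosstab(..., normalize='index')`, which computes the same shares directly:

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Latin American Restaurant', 'Park', 'Pub'],
})

# Route used in the notebook: one-hot encode, then average per neighborhood
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])
by_mean = onehot.groupby('Neighborhood').mean()

# Equivalent shortcut: row-normalized contingency table
by_crosstab = pd.crosstab(toy['Neighborhood'], toy['Venue Category'], normalize='index')

# Neighborhood A has 4 venues, one of them a Latin American restaurant
print(by_mean.loc['A', 'Latin American Restaurant'])
```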


In [164]:
len(to_grouped[to_grouped["Latin American Restaurant"] > 0])

4

Create a new dataframe that keeps only the Latin American Restaurant column

In [165]:
to_latin = to_grouped[["Neighborhoods","Latin American Restaurant"]]

In [166]:
to_latin.head()

Unnamed: 0,Neighborhoods,Latin American Restaurant
0,"Adelaide, King, Richmond",0.010989
1,Berczy Park,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0


## Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Toronto into 3 clusters.

In [167]:
# set number of clusters
toclusters = 3

to_clustering = to_latin.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=toclusters, random_state=0).fit(to_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
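
K-means label numbers are arbitrary, so label 0 is not necessarily the cluster with the lowest restaurant share. A small sketch of how the labels could be interpreted through `cluster_centers_`, on made-up one-dimensional data that only resembles the frequencies above (`n_init=10` is set explicitly here as an assumption, to keep the result stable across scikit-learn versions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up shares: mostly zeros, two around 1%, two around 2.4%
shares = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                   0.010, 0.011, 0.0238, 0.0244]).reshape(-1, 1)

model = KMeans(n_clusters=3, random_state=0, n_init=10).fit(shares)

# Rank clusters by their center: rank 0 = lowest share, rank 2 = highest
order = np.argsort(model.cluster_centers_.ravel())
rank_of_label = np.empty_like(order)
rank_of_label[order] = np.arange(len(order))
ranked = rank_of_label[model.labels_]

for label, center in enumerate(model.cluster_centers_.ravel()):
    print('label {} -> center {:.4f} (rank {})'.format(label, center, rank_of_label[label]))
```

Reading the ranks instead of the raw labels makes it harder to mix up which cluster has the most and which has the fewest Latin American restaurants.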

In [168]:
# create a new dataframe that includes the cluster label for each neighborhood
to_merged = to_latin.copy()

# add clustering labels
to_merged["Cluster Labels"] = kmeans.labels_

In [169]:
to_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
to_merged.head()

Unnamed: 0,Neighborhood,Latin American Restaurant,Cluster Labels
0,"Adelaide, King, Richmond",0.010989,0
1,Berczy Park,0.0,1
2,"Brockton, Exhibition Place, Parkdale Village",0.0,1
3,Business Reply Mail Processing Centre 969 Eastern,0.0,1
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,1


In [170]:
# join with toronto_venues to add latitude/longitude and venue details for each neighborhood
to_merged = to_merged.join(toronto_venues.set_index("Neighborhood"), on="Neighborhood")

print(to_merged.shape)
to_merged.head()

(1616, 9)


Unnamed: 0,Neighborhood,Latin American Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Four Seasons Centre for the Performing Arts,43.650592,-79.385806,Concert Hall
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Nathan Phillips Square,43.65227,-79.383516,Plaza
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Rosalinda,43.650252,-79.385156,Vegetarian / Vegan Restaurant
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,The Keg Steakhouse + Bar - York Street,43.649987,-79.384103,Restaurant
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Cafe Landwer,43.648753,-79.385367,Café
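
The join above returns 1616 rows because `toronto_venues` has one row per venue, so each neighborhood's cluster label is repeated for every venue. If one row per neighborhood is wanted (for example, one map marker each), a possible sketch is to join against de-duplicated coordinates instead. The toy values below are made up:

```python
import pandas as pd

clusters = pd.DataFrame({'Neighborhood': ['A', 'B'],
                         'Cluster Labels': [0, 1]})
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Neighborhood Latitude': [43.65, 43.65, 43.65, 43.71, 43.71],
    'Neighborhood Longitude': [-79.38, -79.38, -79.38, -79.40, -79.40],
    'Venue': ['v1', 'v2', 'v3', 'v4', 'v5'],
})

# Same pattern as the notebook: one output row per venue
per_venue = clusters.join(venues.set_index('Neighborhood'), on='Neighborhood')

# Alternative: keep only the neighborhood coordinates, one row per neighborhood
coords = venues[['Neighborhood', 'Neighborhood Latitude',
                 'Neighborhood Longitude']].drop_duplicates()
per_neighborhood = clusters.merge(coords, on='Neighborhood', how='left')

print(per_venue.shape, per_neighborhood.shape)
```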


In [171]:
# sort the results by Cluster Labels
print(to_merged.shape)
to_merged.sort_values(["Cluster Labels"], inplace=True)
to_merged

(1616, 9)


Unnamed: 0,Neighborhood,Latin American Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Four Seasons Centre for the Performing Arts,43.650592,-79.385806,Concert Hall
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,First Canadian Place,43.648482,-79.382443,Building
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,Brookfield Place,43.646791,-79.378769,Shopping Mall
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,Pravda Vodka Bar,43.648516,-79.374732,Cocktail Bar
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,Leña,43.651722,-79.379205,Latin American Restaurant
...,...,...,...,...,...,...,...,...,...
34,Studio District,0.024390,2,43.659526,-79.340923,braised,43.660452,-79.343346,American Restaurant
34,Studio District,0.024390,2,43.659526,-79.340923,Leslieville Cheese Market,43.660546,-79.342302,Cheese Shop
34,Studio District,0.024390,2,43.659526,-79.340923,Purple Penguin Cafe,43.660501,-79.342565,Café
34,Studio District,0.024390,2,43.659526,-79.340923,eastside social,43.661289,-79.339155,Comfort Food Restaurant


Visualize the clusters

In [180]:
# create map
map_clusters = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(toclusters)
ys = [i+x+(i*x)**2 for i in range(toclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(to_merged['Neighborhood Latitude'], to_merged['Neighborhood Longitude'], to_merged['Neighborhood'], to_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [181]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

## Examine Clusters

In [182]:
#Cluster 0
to_merged.loc[to_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Latin American Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Four Seasons Centre for the Performing Arts,43.650592,-79.385806,Concert Hall
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,First Canadian Place,43.648482,-79.382443,Building
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,Brookfield Place,43.646791,-79.378769,Shopping Mall
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,Pravda Vodka Bar,43.648516,-79.374732,Cocktail Bar
10,"Commerce Court, Victoria Hotel",0.010000,0,43.648198,-79.379817,Leña,43.651722,-79.379205,Latin American Restaurant
...,...,...,...,...,...,...,...,...,...
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,The Burger's Priest,43.648643,-79.387539,Fast Food Restaurant
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Kojin,43.649398,-79.386091,Colombian Restaurant
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,King Taps,43.648476,-79.382058,Gastropub
0,"Adelaide, King, Richmond",0.010989,0,43.650571,-79.384568,Pilot Coffee Roasters,43.648835,-79.380936,Coffee Shop


In [183]:
#Cluster 1
to_merged.loc[to_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Latin American Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
25,North Toronto West,0.0,1,43.715383,-79.405678,Roots,43.716194,-79.400661,Clothing Store
25,North Toronto West,0.0,1,43.715383,-79.405678,St. Clements - Yonge Parkette,43.712062,-79.404255,Park
25,North Toronto West,0.0,1,43.715383,-79.405678,Second Cup,43.714583,-79.400120,Café
25,North Toronto West,0.0,1,43.715383,-79.405678,Sporting Life,43.716277,-79.400248,Sporting Goods Shop
25,North Toronto West,0.0,1,43.715383,-79.405678,Starbucks,43.715590,-79.400450,Coffee Shop
...,...,...,...,...,...,...,...,...,...
14,"Design Exchange, Toronto Dominion Centre",0.0,1,43.647177,-79.381576,Pilot Coffee Roasters,43.648835,-79.380936,Coffee Shop
14,"Design Exchange, Toronto Dominion Centre",0.0,1,43.647177,-79.381576,Brick Street Bakery,43.648815,-79.380605,Bakery
14,"Design Exchange, Toronto Dominion Centre",0.0,1,43.647177,-79.381576,Mos Mos Coffee,43.648159,-79.378745,Café
14,"Design Exchange, Toronto Dominion Centre",0.0,1,43.647177,-79.381576,Equinox Bay Street,43.648100,-79.379989,Gym


In [184]:
#Cluster 2
to_merged.loc[to_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Latin American Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
30,"Runnymede, Swansea",0.02381,2,43.651571,-79.484450,Book City (Bloor West),43.650211,-79.481220,Bookstore
30,"Runnymede, Swansea",0.02381,2,43.651571,-79.484450,Awai,43.650412,-79.478477,Vegetarian / Vegan Restaurant
30,"Runnymede, Swansea",0.02381,2,43.651571,-79.484450,RBC Royal Bank,43.650142,-79.480274,Bank
30,"Runnymede, Swansea",0.02381,2,43.651571,-79.484450,Max's Market,43.650525,-79.479145,Gourmet Shop
30,"Runnymede, Swansea",0.02381,2,43.651571,-79.484450,Bloom Restaurant,43.650307,-79.479836,Latin American Restaurant
...,...,...,...,...,...,...,...,...,...
34,Studio District,0.02439,2,43.659526,-79.340923,braised,43.660452,-79.343346,American Restaurant
34,Studio District,0.02439,2,43.659526,-79.340923,Leslieville Cheese Market,43.660546,-79.342302,Cheese Shop
34,Studio District,0.02439,2,43.659526,-79.340923,Purple Penguin Cafe,43.660501,-79.342565,Café
34,Studio District,0.02439,2,43.659526,-79.340923,eastside social,43.661289,-79.339155,Comfort Food Restaurant


## Observations

Latin American restaurants show up in only 4 of the 39 neighborhoods. Cluster 1, which includes North Toronto West and the Parkdale area, has none at all. Cluster 0 (Adelaide, King, Richmond and Commerce Court, Victoria Hotel) has a share of about 1% of venues. Cluster 2 (Studio District and Runnymede, Swansea) has the highest share at about 2.4%, which still works out to roughly one Latin American restaurant in each of these neighborhoods. Looking at nearby venues, Cluster 2 might be a good location, as there are still not a lot of Latin American restaurants in these areas and the competition seems to be low. Therefore, this project recommends that the entrepreneur open an authentic Peruvian restaurant in these locations with little to no competition. Nonetheless, if the food is authentic, affordable and tastes good, I am confident that it will have a great following anywhere :)
