# Introduction

In Toronto, Canada, there is an ever-expanding demand for new restaurants as new tech companies move in and residents demand new food options. As one of the fastest-growing cities in Canada, Toronto clearly has a market for a new restaurant despite its already established food industry. Many famous Canadian chains got their start in Toronto, yet of the 82 chains started in Canada, only a handful are pizzerias. This is why a new restaurant looking for a hot start should open there and earn rave reviews from the hordes of foodies who explore the city in search of new food spots. One of Toronto's weakest points for food, however, is its pizza scene, with few truly amazing pizza places to choose from. We must capitalize on the market's lack of exposure to pizza before anyone else does and claim the spot as the top pizza restaurant in Canada.

# Business Problem

Location, location, location. Although people have begun to drive more because of COVID-19, location remains vital to success. The new restaurant must not be so far from the food scene that reaching it becomes a burden for customers, but it also must not be washed out by competing restaurants nearby. Finding that balance means weighing the number of neighboring restaurants around each potential location for this game-changing restaurant. The approach is to first see how many amenities are nearby to attract locals and visiting tourists, then count how many restaurants are in the surrounding area, and use both to choose an exact spot for the new pizzeria. With an estimated 1.8 million new jobs opening within the restaurant industry, it's easy to see why now is the perfect time to open a new restaurant.
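The balance described above can be expressed as a simple score that rewards nearby amenities and penalises nearby competitors. This is only an illustrative sketch: the `location_score` function, the competitor weight of 2 and the candidate counts are hypothetical assumptions, not values from this project.

```python
# Hypothetical scoring sketch: reward nearby amenities, penalise nearby competitors.
# The weight and the candidate counts below are illustrative assumptions only.

def location_score(amenities: int, competitors: int, competitor_weight: float = 2.0) -> float:
    """Higher is better: amenities draw customers, competitors split them."""
    return amenities - competitor_weight * competitors


# made-up candidate spots: (amenity count, competing pizza/Italian count)
candidates = {
    "Spot A": (40, 6),
    "Spot B": (25, 1),
    "Spot C": (10, 0),
}

ranked = sorted(candidates, key=lambda name: location_score(*candidates[name]), reverse=True)
print(ranked)
```

The weight controls how strongly competition counts against a busy area; a larger weight pushes the ranking toward quieter spots.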

# Data Description

In this project we use a list of neighborhoods in the Toronto area (scraped from Wikipedia) to understand the general layout of the city. We then use a geocoder (geopy's Nominatim) together with a table of postal-code coordinates to map where each neighborhood sits. Finally, we use Foursquare to find venue data for existing restaurants in each area, so we can find out whether there are already Italian or pizza restaurants nearby that would compete with a new location.

### Import libraries for data analysis

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import requests # library to handle requests
from pandas import json_normalize # top-level since pandas 1.0; pandas.io.json.json_normalize is deprecated
import json
!pip install geopy
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors


# import k-means from clustering stage
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup

print("installed packages")

installed packages


In [2]:
# matplotlib, k-means and the other analysis libraries were already imported in the cell above
!pip install folium # folium renders the interactive maps
import folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


## Data Overview

### Importing the data
Using BeautifulSoup, we parse the Wikipedia table of Toronto postal codes into a pandas DataFrame.

In [3]:
source_wiki = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source_wiki,'lxml')
print(soup.title)

<title>List of postal codes of Canada: M - Wikipedia</title>


In [4]:
table = soup.find("table")
table_rows = table.tbody.find_all("tr")

res = []
for tr in table_rows:
    td = tr.find_all("td")
    row = [cell.text for cell in td] # use a new name instead of shadowing the loop variable `tr`
    
    # Only keep rows with an assigned borough, skipping rows whose borough is "Not assigned".
    # Note: the cell text still ends with "\n" here, so this exact comparison does not match yet;
    # those rows are removed again after the newlines are stripped below.
    if row != [] and row[1] != "Not assigned":
        # If a row has a borough but a "Not assigned" neighborhood, use the borough name as the neighborhood.
        if "Not assigned" in row[2]: 
            row[2] = row[1]
        res.append(row)

# DataFrame with 3 columns
data = pd.DataFrame(res, columns = ["Postal Code", "Borough", "Neighborhood"])
data.shape

(180, 3)

# Data Cleansing
Fixing the data so that it is fit for sorting and analysis: the scraped cells still carry newline characters, which we remove from every column.

In [5]:
data["Neighborhood"] = data["Neighborhood"].str.replace("\n","")
data["Postal Code"] = data["Postal Code"].str.replace("\n","")
data["Borough"] = data["Borough"].str.replace("\n","")
data

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | Not assigned |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |


This removes every row whose borough is "Not assigned" from the DataFrame.

In [6]:
data1=data[data['Borough']!='Not assigned']
data1.head()
data1.shape

(103, 3)
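A minimal sketch of the same cleaning, using made-up rows shaped like the scraped cells (every value ends in "\n"). It shows why the "Not assigned" comparison in the scraping loop only matches after the newline is removed, and it strips every column in one pass with `Series.str.strip` (which also trims spaces, a slight swap from the `str.replace("\n", "")` used above).

```python
import pandas as pd

# Made-up rows shaped like the scraped cells: each value still ends with "\n".
raw = pd.DataFrame(
    {
        "Postal Code": ["M1A\n", "M3A\n", "M5A\n"],
        "Borough": ["Not assigned\n", "North York\n", "Downtown Toronto\n"],
        "Neighborhood": ["Not assigned\n", "Parkwoods\n", "Regent Park, Harbourfront\n"],
    }
)

# Before cleaning, the exact comparison never matches because of the trailing newline.
print((raw["Borough"] != "Not assigned").sum())  # 3

# Strip surrounding whitespace (including "\n") from every column at once.
clean = raw.apply(lambda col: col.str.strip())

# Keep only rows that have an assigned borough.
assigned = clean[clean["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```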

Postal codes listed more than once are combined into a single row, with their neighborhoods joined by commas. If a neighborhood still has no name, it takes the name of its borough.

In [7]:
data2 = data1.groupby(['Postal Code','Borough'], sort=False).agg(', '.join)
data2.reset_index(inplace=True)

# Replace neighborhood names that are 'Not assigned' with the borough name
data2['Neighborhood'] = np.where(data2['Neighborhood'] == 'Not assigned',data2['Borough'], data2['Neighborhood'])

data2

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |
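The grouping step above can be seen in isolation on a few rows. In this sketch the rows are made up (including the hypothetical postal code `M0X` and "Example Borough"): `M5A` appears twice and is collapsed with `agg(", ".join)`, and `np.where` then falls back to the borough name where the neighborhood is still "Not assigned".

```python
import numpy as np
import pandas as pd

# Made-up rows: M5A is listed twice, and the hypothetical M0X has no neighborhood name.
rows = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M5A", "M3A", "M0X"],
        "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "Example Borough"],
        "Neighborhood": ["Regent Park", "Harbourfront", "Parkwoods", "Not assigned"],
    }
)

# One row per (postal code, borough); neighborhood names are joined with ", ".
grouped = (
    rows.groupby(["Postal Code", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

# Fall back to the borough name where the neighborhood is still "Not assigned".
grouped["Neighborhood"] = np.where(
    grouped["Neighborhood"] == "Not assigned", grouped["Borough"], grouped["Neighborhood"]
)
print(grouped)
```

`sort=False` keeps the postal codes in the order they first appear, matching the Wikipedia table order.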


In [8]:
latnlong = pd.read_csv('https://cocl.us/Geospatial_data')
latnlong.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Merging the two DataFrames on `Postal Code` adds a latitude and longitude to every neighborhood.

In [9]:
data3 = pd.merge(data2,latnlong,on= 'Postal Code')
data3.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
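`pd.merge` defaults to an inner join, so a postal code missing from the coordinates table silently disappears. A small sketch with made-up frames (the code `M0X` is hypothetical) shows the default behaviour and how `indicator=True` on a left merge reveals which codes found no coordinates.

```python
import pandas as pd

# Made-up frames: the hypothetical M0X has no coordinates.
neighborhoods = pd.DataFrame(
    {"Postal Code": ["M3A", "M4A", "M0X"], "Neighborhood": ["Parkwoods", "Victoria Village", "Hypothetical"]}
)
coords = pd.DataFrame(
    {"Postal Code": ["M3A", "M4A"], "Latitude": [43.753259, 43.725882], "Longitude": [-79.329656, -79.315572]}
)

# Default how="inner": a postal code missing from either table is dropped.
inner = pd.merge(neighborhoods, coords, on="Postal Code")

# indicator=True adds a _merge column showing which codes found no coordinates.
check = pd.merge(neighborhoods, coords, on="Postal Code", how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "Postal Code"].tolist()

print(len(inner), missing)  # 2 ['M0X']
```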


Here we geocode "Toronto, Canada" to get the latitude and longitude of the city centre, which we use to centre our searches and maps.

In [10]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.6534817 -79.3839347


Here we define the search term and the search radius (in metres) to look up on Foursquare.

In [57]:
search_query = 'Italian restaurant'
radius = 100000
print(search_query)

Italian restaurant


The Foursquare API request URL is built here from our credentials, the city-centre coordinates and the search parameters.

In [58]:
client_id = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real keys out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 1000 # requested maximum; the API may return fewer results

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(client_id, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=43.6534817,-79.3839347&v=20180604&query=Italian restaurant&radius=100000&limit=1000'
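The URL above is assembled with `str.format`, which leaves the space in "Italian restaurant" unencoded. A sketch of the same request URL built with the standard library's `urllib.parse.urlencode`, which percent-encodes every value; the credentials are placeholders, and the `limit` of 50 is an assumed modest value rather than a documented requirement.

```python
from urllib.parse import urlencode

# Placeholder credentials: substitute your own Foursquare keys.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "ll": "43.6534817,-79.3839347",  # city-centre coordinates from the geocoder
    "v": "20180604",
    "query": "Italian restaurant",
    "radius": 100000,  # metres
    "limit": 50,  # assumption: a modest limit; check the API docs for the actual cap
}

# urlencode percent-encodes each value: the space becomes "+" and the comma "%2C".
url = "https://api.foursquare.com/v2/venues/search?" + urlencode(params)
print(url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` lets requests do this encoding for you.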

Below we send the request and collect the JSON response that will fill our tables and start our analysis.

In [59]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2e26d61db5810ffb346934'},
 'response': {'venues': [{'id': '4e74ce151838f918898efe72',
    'name': 'Roma Italian Restaurant',
    'location': {'address': '6350 Tomken Rd.',
     'crossStreet': 'Tristar',
     'lat': 43.652859135693596,
     'lng': -79.66803991906222,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.652859135693596,
       'lng': -79.66803991906222}],
     'distance': 22882,
     'cc': 'CA',
     'city': 'Mississauga',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['6350 Tomken Rd. (Tristar)',
      'Mississauga ON',
      'Canada']},
    'categories': [{'id': '4bf58dd8d48988d10f941735',
      'name': 'Indian Restaurant',
      'pluralName': 'Indian Restaurants',
      'shortName': 'Indian',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1596860160',
    'hasPerk': False},
   {'id': '4de024f0
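Each venue in the response carries a `distance` field (22882 for Roma Italian Restaurant), which appears to be the distance in metres from the `ll` search point. A haversine sketch, assuming a spherical Earth of radius 6,371 km, reproduces that figure from the two coordinate pairs shown above and confirms the venue lies inside the 100,000 m search radius.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Search centre (geocoded Toronto) and the first venue returned above.
center = (43.6534817, -79.3839347)
roma = (43.652859135693596, -79.66803991906222)

d = haversine_m(*center, *roma)
print(round(d))  # roughly 22.9 km, close to the reported 'distance'
```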

After gathering the data, we transform the JSON venue list into a DataFrame.

In [60]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,...,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",,,,,,,False,4e74ce151838f918898efe72,6350 Tomken Rd.,...,"[6350 Tomken Rd. (Tristar), Mississauga ON, Ca...","[{'label': 'display', 'lat': 43.65285913569359...",43.652859,-79.66804,,,ON,Roma Italian Restaurant,v-1596860160,
1,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",,,,,,,False,4de024f0b0fbe2cfa5fee3c4,,...,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.67656199554484...",43.676562,-79.355699,,,ON,Florentina's Italian Restaurant,v-1596860160,
2,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",,,,,,,False,4b4f6a73f964a520960527e3,24-40 Bradwick Dr,...,[24-40 Bradwick Dr (btw Keele St. & Dufferin S...,"[{'label': 'display', 'lat': 43.8182382, 'lng'...",43.818238,-79.485024,,L4K 1K9,ON,Junnio's Italian Restaurant,v-1596860160,
3,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",,,,,,,False,4b199b46f964a5205be023e3,2625 Weston,...,"[2625 Weston (401), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.71194605276710...",43.711946,-79.53151,,,ON,Jolly II Italian Restaurant,v-1596860160,
4,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",,,,,,,False,4b107754f964a520147123e3,4505 Sheppard Ave E,...,"[4505 Sheppard Ave E, Scarborough ON M1S 1V3, ...","[{'label': 'display', 'lat': 43.788071, 'lng':...",43.788071,-79.265134,,M1S 1V3,ON,Joey Bravo's Italian Restaurant,v-1596860160,
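`json_normalize` flattens nested dictionaries into dotted column names such as `location.lat`, while lists such as `categories` stay as single objects. A sketch on a trimmed, hand-written venue list in the same nested shape (names and coordinates taken from the results above, other fields omitted):

```python
import pandas as pd

# A trimmed, hand-written venue list in the same nested shape as the response above.
venues = [
    {
        "name": "Florentina's Italian Restaurant",
        "categories": [{"name": "Italian Restaurant", "primary": True}],
        "location": {"lat": 43.676562, "lng": -79.355699, "city": "Toronto"},
    },
    {
        "name": "Buda's Italian Restaurant",
        "categories": [],
        "location": {"lat": 43.703068, "lng": -79.646597},
    },
]

# Nested dicts become dotted column names; lists (like categories) stay as objects.
# A key missing from one record (location.city here) becomes NaN in that row.
flat = pd.json_normalize(venues)
print(sorted(flat.columns))
```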


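Understanding the data begins with trimming it: we keep only each venue's name, category and coordinates, and reduce each venue's list of categories to its primary category name. Because the search matches venue names, some results carry a non-Italian category (for example, Roma Italian Restaurant is listed as an Indian Restaurant).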

In [61]:
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
data_filter = dataframe.loc[:, filtered_columns].copy() # copy so the edits below don't modify `dataframe`

# function that extracts the primary (first) category name of a venue
def get_category_type(row):
    try:
        cat_list = row['categories']
    except KeyError:
        cat_list = row['venue.categories']
        
    if len(cat_list) == 0:
        return None
    else:
        return cat_list[0]['name']

# filter the category for each row
data_filter['categories'] = data_filter.apply(get_category_type, axis=1)

# clean column names by keeping only last term
data_filter.columns = [column.split('.')[-1] for column in data_filter.columns]

data_filter

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Roma Italian Restaurant | Indian Restaurant | 43.652859 | -79.66804 |
| 1 | Florentina's Italian Restaurant | Italian Restaurant | 43.676562 | -79.355699 |
| 2 | Junnio's Italian Restaurant | Restaurant | 43.818238 | -79.485024 |
| 3 | Jolly II Italian Restaurant | Italian Restaurant | 43.711946 | -79.53151 |
| 4 | Joey Bravo's Italian Restaurant | American Restaurant | 43.788071 | -79.265134 |
| 5 | Il Porcellino Italian Restaurant And Catering | Food Service | 43.66758 | -79.66792 |
| 6 | Buda's Italian Restaurant |  | 43.703068 | -79.646597 |
| 7 | Mia Italian Restaurant | Italian Restaurant | 43.688605 | -79.672008 |
| 8 | Marchellos italian restaurant | Italian Restaurant | 43.887535 | -79.499824 |
| 9 | Roccos italian restaurant |  | 43.446402 | -79.666352 |
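Because the text search matches venue names rather than categories, only some of these results are direct competitors. A small sketch, using five rows copied from the table above and an assumed definition of "competitor" (primary category `Italian Restaurant` or `Pizza Place`), filters them with `Series.isin`.

```python
import pandas as pd

# Five rows copied from the table above.
results = pd.DataFrame(
    {
        "name": [
            "Roma Italian Restaurant",
            "Florentina's Italian Restaurant",
            "Junnio's Italian Restaurant",
            "Jolly II Italian Restaurant",
            "Joey Bravo's Italian Restaurant",
        ],
        "categories": [
            "Indian Restaurant",
            "Italian Restaurant",
            "Restaurant",
            "Italian Restaurant",
            "American Restaurant",
        ],
    }
)

# Assumed definition of a direct competitor for a pizzeria.
competitor_categories = ["Italian Restaurant", "Pizza Place"]
competitors = results[results["categories"].isin(competitor_categories)]
print(competitors["name"].tolist())
```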


# Data Cleaning 



### Data Exploration
We map the search results to see how the venues are spread out. This gives a good idea of where our competition is situated, which helps us pick a place for our restaurant.

In [62]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=12) # generate map centred on downtown Toronto

# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(data_filter.lat, data_filter.lng, data_filter.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

To better understand the neighborhoods we are looking into, we fetch all the venues near each one (within 500 m by default) so we can associate each location with what's nearby.

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT = 1000):
    
    venue_listing=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
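        response = requests.get(url).json()
        # An error reply (for example an exhausted quota) carries no 'groups': report it and move on
        groups = response.get('response', {}).get('groups', [])
        if not groups:
            print('  no venues returned:', response.get('meta'))
            continue
        results = groups[0]['items']
        
        venue_listing.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results
            if v['venue'].get('categories')]) # skip venues that have no category

    # Naming the columns in the constructor also works when no venues were returned at all
    nearby_venues = pd.DataFrame([item for venue_list in venue_listing for item in venue_list],
                                 columns=['Neighborhood', 
                                          'Neighborhood Latitude', 
                                          'Neighborhood Longitude', 
                                          'Venue', 
                                          'Venue Latitude', 
                                          'Venue Longitude', 
                                          'Venue Category'])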
    
    return(nearby_venues)

In [67]:
toronto_venues = getNearbyVenues(names=data3['Neighborhood'],
                                   latitudes=data3['Latitude'],
                                   longitudes=data3['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

ValueError: Length mismatch: Expected axis has 1 elements, new values have 7 elements
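The error above is raised when pandas is asked to assign seven column names to a frame that has a different number of columns. A sketch of the flattening step on hand-written items (the venue names are hypothetical; the dictionary shape is the one `getNearbyVenues` indexes into) shows two guards: venues without a category are skipped, and passing `columns=` to the constructor gives a 7-column frame even when nothing came back.

```python
import pandas as pd

COLUMNS = [
    "Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
    "Venue", "Venue Latitude", "Venue Longitude", "Venue Category",
]


def flatten_items(name, lat, lng, items):
    """Turn explore-style 'items' into row tuples, skipping venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


# Hand-written items with hypothetical venue names.
items = [
    {"venue": {"name": "Cafe X", "location": {"lat": 43.75, "lng": -79.33},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled", "location": {"lat": 43.76, "lng": -79.32},
               "categories": []}},
]

rows = flatten_items("Parkwoods", 43.753259, -79.329656, items)
frame = pd.DataFrame(rows, columns=COLUMNS)
empty = pd.DataFrame([], columns=COLUMNS)  # still 7 columns when nothing came back
print(frame.shape, empty.shape)  # (1, 7) (0, 7)
```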

With the venues collected, each row pairs a venue with its neighborhood and that neighborhood's coordinates. Here we check the size of the resulting DataFrame and how many venues were returned for each neighborhood.

In [None]:
print(toronto_venues.shape)
toronto_venues.head()

In [None]:
toronto_venues.groupby('Neighborhood').count()

## Machine Learning
Below we use one-hot encoding to see exactly what each neighborhood has to offer: every venue category becomes its own column. The more amenities a neighborhood has, the better the chance that more people will come to our restaurant.

In [21]:
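toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Foursquare data can include a venue category literally called "Neighborhood"; rename it so the
# neighborhood-name column added next does not overwrite it
toronto_onehot = toronto_onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})

# add the neighborhood name as the first column
toronto_onehot.insert(0, 'Neighborhood', toronto_venues['Neighborhood'])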

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
1,"Alderwood, Long Branch",0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
3,Bayview Village,0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.037037,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
5,Berczy Park,0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.017544,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.080000,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
8,"Business reply mail Processing Centre, South C...",0.052632,0.000,0.000000,0.000000,0.0000,0.000,0.000,0.000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.000,0.000000,0.062500,0.0625,0.125,0.125,0.125,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000
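Taking the mean of the one-hot columns per neighborhood turns counts into shares: each value is the fraction of that neighborhood's venues in a category. A sketch on made-up venue rows, including one venue whose category is literally "Neighborhood" to show why that column is renamed before the neighborhood-name column is inserted.

```python
import pandas as pd

# Made-up venue rows; one venue's category is literally "Neighborhood".
venues = pd.DataFrame(
    {
        "Neighborhood": ["Agincourt", "Agincourt", "Agincourt", "Agincourt", "Berczy Park"],
        "Venue Category": ["Lounge", "Lounge", "Neighborhood", "Skating Rink", "Coffee Shop"],
    }
)

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
# Rename the category column that would collide with the neighborhood-name column.
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (venue)"})
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 columns is the share of each neighborhood's venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq.loc["Agincourt", "Lounge"])  # 0.5
```

Each row of `freq` sums to 1, which is what lets k-means compare neighborhoods with very different venue counts.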


Here we print, for each neighborhood, its five most frequent venue categories and their share of the neighborhood's venues.

In [69]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge  0.25
1  Latin American Restaurant  0.25
2             Breakfast Spot  0.25
3               Skating Rink  0.25
4              Metro Station  0.00


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.25
1  Sandwich Place  0.12
2     Coffee Shop  0.12
3            Pool  0.12
4             Pub  0.12


----Bathurst Manor, Wilson Heights, Downsview North----
                venue  freq
0                Bank  0.09
1         Coffee Shop  0.09
2         Pizza Place  0.04
3      Ice Cream Shop  0.04
4  Frozen Yogurt Shop  0.04


----Bayview Village----
                 venue  freq
0                 Café  0.25
1                 Bank  0.25
2   Chinese Restaurant  0.25
3  Japanese Restaurant  0.25
4  Moroccan Restaurant  0.00


----Bedford Park, Lawrence Manor East----
                venue  freq
0  Italian Restaurant  0.11
1      Sandwich Place  0.07
2          Restaurant  0.07
3         Cof

In [24]:
def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then sort the category shares, largest first
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
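The same top-N selection can be written with `Series.nlargest`. A sketch on a single made-up frequency row, with values modelled on the Alderwood, Long Branch output above; the helper name `top_categories` is hypothetical.

```python
import pandas as pd

# Made-up frequency row modelled on the Alderwood, Long Branch output above.
row = pd.Series(
    {"Neighborhood": "Alderwood, Long Branch", "Pizza Place": 0.25, "Coffee Shop": 0.125,
     "Pub": 0.125, "Pool": 0.125, "Gym": 0.0}
)


def top_categories(row: pd.Series, n: int) -> list:
    """Return the n most frequent category names, skipping the name column."""
    shares = row.drop("Neighborhood").astype(float)
    return shares.nlargest(n).index.tolist()


print(top_categories(row, 2))
```

Ties (several categories at 0.125 here) are broken by order of appearance, so the later "most common" columns are somewhat arbitrary for neighborhoods with few venues.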

## Data Analysis

Next we collect the ten most common venue categories for every neighborhood into one DataFrame, which makes the neighborhoods easy to compare.

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions past 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

neigh_venue_sort = pd.DataFrame(columns=columns)
neigh_venue_sort['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neigh_venue_sort.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neigh_venue_sort

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Latin American Restaurant | Lounge | Skating Rink | Breakfast Spot | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant |
| 1 | Alderwood, Long Branch | Pizza Place | Gym | Coffee Shop | Pharmacy | Sandwich Place | Pub | Pool | Women's Store | Diner | Deli / Bodega |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | Coffee Shop | Bank | Middle Eastern Restaurant | Frozen Yogurt Shop | Deli / Bodega | Supermarket | Sushi Restaurant | Restaurant | Shopping Mall | Mobile Phone Shop |
| 3 | Bayview Village | Chinese Restaurant | Café | Bank | Japanese Restaurant | Women's Store | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop |
| 4 | Bedford Park, Lawrence Manor East | Italian Restaurant | Sandwich Place | Coffee Shop | Restaurant | Thai Restaurant | Pub | Café | Indian Restaurant | Sushi Restaurant | Fast Food Restaurant |
| 5 | Berczy Park | Coffee Shop | Farmers Market | Cheese Shop | Cocktail Bar | Restaurant | Beer Bar | Café | Bakery | Seafood Restaurant | Pharmacy |
| 6 | Birch Cliff, Cliffside West | General Entertainment | College Stadium | Café | Skating Rink | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run |
| 7 | Brockton, Parkdale Village, Exhibition Place | Café | Yoga Studio | Bakery | Coffee Shop | Breakfast Spot | Convenience Store | Performing Arts Venue | Pet Store | Climbing Gym | Restaurant |
| 8 | Business reply mail Processing Centre, South C... | Light Rail Station | Gym / Fitness Center | Garden Center | Skate Park | Restaurant | Recording Studio | Pizza Place | Park | Garden | Spa |
| 9 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Lounge | Airport Service | Airport Terminal | Coffee Shop | Harbor / Marina | Plane | Rental Car Location | Sculpture Garden | Boutique | Bar |
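The `indicators` list above covers the first three positions and falls back to "th", which is correct up to 10. A sketch of a general ordinal helper (the name `ordinal` is hypothetical) that also handles 11th to 13th and 21st, 22nd and so on, in case more venue columns are wanted:

```python
def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th (including 11th, 12th, 13th) always take "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], columns[-1])  # 1st Most Common Venue 10th Most Common Venue
```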


Here we run k-means with five clusters on the venue-category frequencies, grouping neighborhoods that have a similar mix of venues.

In [26]:
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

kmeans.labels_[0:10]

array([2, 0, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
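What `KMeans` does can be sketched in plain NumPy with Lloyd's algorithm: assign each row to its nearest centre, move each centre to the mean of its rows, and repeat until the centres stop moving. scikit-learn uses k-means++ initialisation and several restarts (`n_init`); this sketch swaps in a deterministic farthest-point start to stay short, and the data rows are made up.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100):
    """Plain Lloyd's algorithm with a greedy farthest-point initialisation."""
    # Initialise: the first row, then repeatedly the row farthest from the chosen centres.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for each row.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centre to the mean of its rows (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Made-up category-share rows: two "busy downtown" rows and two "park" rows.
X = np.array([
    [0.30, 0.20, 0.00],
    [0.28, 0.22, 0.02],
    [0.00, 0.05, 0.60],
    [0.02, 0.00, 0.55],
])
labels, centers = kmeans(X, k=2)
print(labels)
```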


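Next we add the cluster labels to the sorted-venue table and join it to `data3`, so each neighborhood now has its location, cluster label and ten most common venues.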

In [30]:
neigh_venue_sort.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = data3

# join the venue table (with cluster labels) onto data3 to add postal code, borough, latitude and longitude for each neighborhood
toronto_merged = toronto_merged.join(neigh_venue_sort.set_index('Neighborhood'), on='Neighborhood', how = 'right')

toronto_merged.head() 

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 1 | Convenience Store | Park | Food & Drink Shop | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Women's Store |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 2 | Portuguese Restaurant | Hockey Arena | Coffee Shop | Pizza Place | Financial or Legal Service | French Restaurant | Department Store | Dessert Shop | Dim Sum Restaurant | Diner |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 2 | Coffee Shop | Pub | Bakery | Park | Theater | Breakfast Spot | Café | Farmers Market | Restaurant | Performing Arts Venue |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 2 | Furniture / Home Store | Clothing Store | Coffee Shop | Boutique | Gift Shop | Event Space | Vietnamese Restaurant | Accessories Store | Dessert Shop | Ethiopian Restaurant |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 | Coffee Shop | Diner | Yoga Studio | College Auditorium | Bar | Beer Bar | Smoothie Shop | Sandwich Place | Burrito Place | Café |
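The join key here is the neighborhood name, so a name listed under two postal codes receives the same venue profile twice. That is likely why rows 7 and 13 in the cluster 2 table below are identical: row 7 is M3B (Don Mills), and Don Mills appears twice in the neighborhood list. A sketch with made-up frames reproduces the effect; the second Don Mills code `M3C` is an illustrative assumption.

```python
import pandas as pd

# Made-up frames: "Don Mills" is listed under two postal codes.
locations = pd.DataFrame(
    {"Postal Code": ["M3B", "M3C", "M4A"],
     "Neighborhood": ["Don Mills", "Don Mills", "Victoria Village"]}
)
profiles = pd.DataFrame(
    {"Neighborhood": ["Don Mills", "Victoria Village"],
     "Cluster Labels": [2, 2],
     "1st Most Common Venue": ["Gym", "Portuguese Restaurant"]}
)

# Same pattern as above: join on the name, keeping every row of the right-hand table.
merged = locations.join(profiles.set_index("Neighborhood"), on="Neighborhood", how="right")
print(merged)
```

Both Don Mills postal codes end up with the "Gym" profile, so counting rows per cluster slightly over-weights repeated names.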


Let's compare our two maps, the cluster map and the map of Italian-restaurant search results, to visualize what is going on in each neighborhood. Cluster 2 appears to be the busiest, giving us the best chance of foot traffic to our restaurant.

In [68]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one colour per cluster, evenly spaced along matplotlib's rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)], # k-means labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [33]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=10) # generate map centred on downtown Toronto

# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(data_filter.lat, data_filter.lng, data_filter.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Below we reaffirm what we saw on the maps to pick where to put our restaurant. After seeing where most of the venues are located, we conclude that the second cluster gives us the best chance to grow our business.

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North York | 1 | Convenience Store | Park | Food & Drink Shop | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Women's Store |
| 21 | York | 1 | Park | Pool | Women's Store | Golf Course | Ethiopian Restaurant | Eastern European Restaurant | Dumpling Restaurant | Drugstore | Donut Shop | Doner Restaurant |
| 52 | North York | 1 | Park | Women's Store | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 64 | York | 1 | Park | Women's Store | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 66 | North York | 1 | Convenience Store | Park | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Women's Store |
| 85 | Scarborough | 1 | Park | Playground | Doner Restaurant | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Donut Shop |
| 91 | Downtown Toronto | 1 | Park | Trail | Playground | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant |


In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North York | 2 | Portuguese Restaurant | Hockey Arena | Coffee Shop | Pizza Place | Financial or Legal Service | French Restaurant | Department Store | Dessert Shop | Dim Sum Restaurant | Diner |
| 2 | Downtown Toronto | 2 | Coffee Shop | Pub | Bakery | Park | Theater | Breakfast Spot | Café | Farmers Market | Restaurant | Performing Arts Venue |
| 3 | North York | 2 | Furniture / Home Store | Clothing Store | Coffee Shop | Boutique | Gift Shop | Event Space | Vietnamese Restaurant | Accessories Store | Dessert Shop | Ethiopian Restaurant |
| 4 | Downtown Toronto | 2 | Coffee Shop | Diner | Yoga Studio | College Auditorium | Bar | Beer Bar | Smoothie Shop | Sandwich Place | Burrito Place | Café |
| 7 | North York | 2 | Gym | Coffee Shop | Japanese Restaurant | Beer Store | Restaurant | Café | Athletics & Sports | Bubble Tea Shop | Sandwich Place | Bike Shop |
| 13 | North York | 2 | Gym | Coffee Shop | Japanese Restaurant | Beer Store | Restaurant | Café | Athletics & Sports | Bubble Tea Shop | Sandwich Place | Bike Shop |
| 9 | Downtown Toronto | 2 | Clothing Store | Coffee Shop | Café | Japanese Restaurant | Italian Restaurant | Cosmetics Shop | Bubble Tea Shop | Lingerie Store | Pizza Place | Theater |
| 10 | North York | 2 | Park | Sushi Restaurant | Japanese Restaurant | Pub | Women's Store | Dog Run | Dim Sum Restaurant | Diner | Discount Store | Distribution Center |
| 12 | Scarborough | 2 | Bar | Women's Store | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Donut Shop | Farmers Market |
| 14 | East York | 2 | Park | Skating Rink | Beer Store | Athletics & Sports | Video Store | Dance Studio | Curling Ice | Bus Stop | Dog Run | Diner |


In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Scarborough | 3 | Fast Food Restaurant | Department Store | Event Space | Ethiopian Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Drugstore | Donut Shop | Doner Restaurant |


In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57 | North York | 4 | Food Service | Baseball Field | Women's Store | Doner Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Donut Shop | Dessert Shop |
| 101 | Etobicoke | 4 | Baseball Field | Women's Store | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Farmers Market |
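One way to check competition inside each cluster is to flag neighborhoods whose top venues already include a pizza or Italian restaurant, then count them per cluster. A sketch on five rows copied from the cluster tables above (top four venues only), with an assumed competitor definition:

```python
import pandas as pd

# Five rows copied from the cluster tables above (top four venues only).
merged = pd.DataFrame(
    {
        "Cluster Labels": [1, 2, 2, 2, 3],
        "1st Most Common Venue": ["Convenience Store", "Portuguese Restaurant", "Coffee Shop", "Coffee Shop", "Fast Food Restaurant"],
        "2nd Most Common Venue": ["Park", "Hockey Arena", "Pub", "Diner", "Department Store"],
        "3rd Most Common Venue": ["Food & Drink Shop", "Coffee Shop", "Bakery", "Yoga Studio", "Event Space"],
        "4th Most Common Venue": ["Doner Restaurant", "Pizza Place", "Park", "College Auditorium", "Ethiopian Restaurant"],
    }
)

venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]
competitors = ["Pizza Place", "Italian Restaurant"]  # assumed competitor categories

# Flag neighborhoods whose top venues already include a competitor.
merged["Has Competitor"] = merged[venue_cols].isin(competitors).any(axis=1)

summary = merged.groupby("Cluster Labels").agg(
    neighborhoods=("Has Competitor", "count"),
    with_competitor=("Has Competitor", "sum"),
)
print(summary)
```

Run over the full `toronto_merged` table, the same summary would show how many neighborhoods in cluster 2 are still free of a nearby pizzeria.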
