# THE BATTLE OF NEIGHBORHOODS PROJECT

##### Applied Data Science Capstone Project by Yukun Zhao

## Table of Contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

In this project, we will try to find an optimal location for opening a Chinese restaurant in York Region, Ontario, Canada. 

We will look at population and location data to find a neighborhood in York Region that is suitable for opening the Chinese restaurant. 

We should also look at the competition in each neighborhood. The location data can show whether there is an opportunity to offer something that is not yet available.

## Data <a name="data"></a>

Factors that will influence our decision are:
* number of Chinese households in the city,
* number of existing restaurants in the neighborhood (Chinese or other Asian restaurants),
* number of parks, local event venues, malls.

The following data sources will be needed to obtain the required information:
* 2016 Census of Canada from Statistics Canada,
* locations of malls and plazas in the city of interest,
* list of restaurants and their type and location will be obtained using **Foursquare Places API**.

## Methodology <a name="methodology"></a>

In this project, we will start with finding the city in York Region that has the largest Chinese community.

Next, we will look at the malls and plazas in the city of interest. We will find the location and type (category) of every restaurant within a 1 km radius of the center of each place. We will also identify other venues in the vicinity, such as parks, gyms and dessert places.

The third step in our analysis will be an exploration of the venues. We will find the venue categories and use clustering to group the malls and plazas based on these venue categories. These clusters should be a starting point for the final 'street level' exploration and the search for an optimal venue location by our client.

In the final step, we will focus on the promising areas and look into the types of Chinese cuisine served there. This should give our client an idea of what food to serve.

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 101)
pd.set_option('display.max_columns', None)

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import the clustering algorithms used in the clustering stage
from sklearn.cluster import AgglomerativeClustering, KMeans

!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 1. Find the city with the largest Chinese community

In [2]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,OBJECTID,DAUID,CSDUID,CSDNAME,TOT_VISIBLE_MINORITY_PVT_HH,TOT_VISIBLE_MINORITY_POP,SOUTH_ASIAN,CHINESE,BLACK,FILIPINO,LATIN_AMERICAN,ARAB,SOUTHEAST_ASIAN,WEST_ASIAN,KOREAN,JAPANESE,VISIBLE_MINORITY_NIE,MULTIPLE_VISIBLE_MINORITY,NOT_VISIBLE_MINORITY,SHAPE.STArea(),SHAPE.STLength()
0,1,35191319,3519028,Vaughan,1055.0,440.0,55.0,130.0,15.0,70.0,0.0,0.0,10.0,25.0,125.0,0.0,10.0,0.0,615.0,165700.6,1881.204066
1,2,35191314,3519028,Vaughan,1230.0,490.0,95.0,210.0,0.0,35.0,0.0,0.0,0.0,45.0,45.0,0.0,15.0,40.0,740.0,635367.9,3568.832137
2,3,35191037,3519028,Vaughan,6330.0,2565.0,925.0,505.0,225.0,335.0,105.0,120.0,85.0,135.0,45.0,20.0,45.0,25.0,3765.0,1405759.0,5055.080921
3,4,35190824,3519028,Vaughan,2150.0,455.0,280.0,70.0,35.0,15.0,0.0,0.0,20.0,10.0,0.0,0.0,25.0,10.0,1700.0,646140.7,4272.214232
4,5,35190825,3519028,Vaughan,565.0,100.0,70.0,0.0,0.0,0.0,15.0,0.0,10.0,10.0,0.0,0.0,0.0,0.0,465.0,253793.9,2756.348141


#### clean the census data

In [3]:
# drop the irrelevant columns
column_names = census2016.columns.values
drop_index = [0, 1, 2, 4, 5, 16, 17, 18, 19, 20]
drop_columns = []
for idx in drop_index:
    drop_columns.append(column_names[idx])
census2016.drop(labels = drop_columns, axis = 1, inplace = True)
census2016.head()

Unnamed: 0,CSDNAME,SOUTH_ASIAN,CHINESE,BLACK,FILIPINO,LATIN_AMERICAN,ARAB,SOUTHEAST_ASIAN,WEST_ASIAN,KOREAN,JAPANESE
0,Vaughan,55.0,130.0,15.0,70.0,0.0,0.0,10.0,25.0,125.0,0.0
1,Vaughan,95.0,210.0,0.0,35.0,0.0,0.0,0.0,45.0,45.0,0.0
2,Vaughan,925.0,505.0,225.0,335.0,105.0,120.0,85.0,135.0,45.0,20.0
3,Vaughan,280.0,70.0,35.0,15.0,0.0,0.0,20.0,10.0,0.0,0.0
4,Vaughan,70.0,0.0,0.0,0.0,15.0,0.0,10.0,10.0,0.0,0.0


In [4]:
# rename CSDNAME to CITY
census2016.rename(columns = {'CSDNAME': 'CITY'}, inplace = True)
census2016.head()

Unnamed: 0,CITY,SOUTH_ASIAN,CHINESE,BLACK,FILIPINO,LATIN_AMERICAN,ARAB,SOUTHEAST_ASIAN,WEST_ASIAN,KOREAN,JAPANESE
0,Vaughan,55.0,130.0,15.0,70.0,0.0,0.0,10.0,25.0,125.0,0.0
1,Vaughan,95.0,210.0,0.0,35.0,0.0,0.0,0.0,45.0,45.0,0.0
2,Vaughan,925.0,505.0,225.0,335.0,105.0,120.0,85.0,135.0,45.0,20.0
3,Vaughan,280.0,70.0,35.0,15.0,0.0,0.0,20.0,10.0,0.0,0.0
4,Vaughan,70.0,0.0,0.0,0.0,15.0,0.0,10.0,10.0,0.0,0.0


In [5]:
# group the data by city
census2016_sum = census2016.groupby(by = 'CITY').sum()
census2016_sum

Unnamed: 0_level_0,SOUTH_ASIAN,CHINESE,BLACK,FILIPINO,LATIN_AMERICAN,ARAB,SOUTHEAST_ASIAN,WEST_ASIAN,KOREAN,JAPANESE
CITY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Aurora,1840.0,5575.0,1200.0,935.0,670.0,575.0,300.0,1870.0,595.0,195.0
Chippewas of Georgina Island First Nation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
East Gwillimbury,470.0,690.0,335.0,165.0,100.0,85.0,70.0,255.0,90.0,60.0
Georgina,685.0,670.0,695.0,335.0,375.0,100.0,130.0,185.0,95.0,70.0
King,930.0,445.0,395.0,160.0,170.0,125.0,140.0,320.0,160.0,70.0
Markham,58255.0,147710.0,9700.0,8945.0,1730.0,3240.0,2460.0,7940.0,4350.0,1015.0
Newmarket,3265.0,5565.0,2295.0,1835.0,1225.0,660.0,2275.0,1970.0,695.0,325.0
Richmond Hill,14910.0,56970.0,3850.0,3790.0,1580.0,3565.0,1415.0,20120.0,5485.0,625.0
Vaughan,30655.0,20790.0,8345.0,8645.0,7415.0,4325.0,6840.0,8650.0,5360.0,300.0
Whitchurch-Stouffville,5625.0,5965.0,995.0,1135.0,330.0,435.0,295.0,335.0,190.0,150.0


In [6]:
# sort by number of Chinese
census2016_sum.sort_values(by = ['CHINESE'], axis = 0, ascending = False, inplace = True)
census2016_sum

Unnamed: 0_level_0,SOUTH_ASIAN,CHINESE,BLACK,FILIPINO,LATIN_AMERICAN,ARAB,SOUTHEAST_ASIAN,WEST_ASIAN,KOREAN,JAPANESE
CITY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Markham,58255.0,147710.0,9700.0,8945.0,1730.0,3240.0,2460.0,7940.0,4350.0,1015.0
Richmond Hill,14910.0,56970.0,3850.0,3790.0,1580.0,3565.0,1415.0,20120.0,5485.0,625.0
Vaughan,30655.0,20790.0,8345.0,8645.0,7415.0,4325.0,6840.0,8650.0,5360.0,300.0
Whitchurch-Stouffville,5625.0,5965.0,995.0,1135.0,330.0,435.0,295.0,335.0,190.0,150.0
Aurora,1840.0,5575.0,1200.0,935.0,670.0,575.0,300.0,1870.0,595.0,195.0
Newmarket,3265.0,5565.0,2295.0,1835.0,1225.0,660.0,2275.0,1970.0,695.0,325.0
East Gwillimbury,470.0,690.0,335.0,165.0,100.0,85.0,70.0,255.0,90.0,60.0
Georgina,685.0,670.0,695.0,335.0,375.0,100.0,130.0,185.0,95.0,70.0
King,930.0,445.0,395.0,160.0,170.0,125.0,140.0,320.0,160.0,70.0
Chippewas of Georgina Island First Nation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Markham has the largest Chinese community in York Region by a wide margin. Next, we sort the columns by their Markham values to see how the visible minority groups rank within the city.

In [7]:
# sort the columns by the values in Markham
census2016_sum.sort_values(by = ['Markham'], axis = 1, ascending = False, inplace = True)
census2016_sum

Unnamed: 0_level_0,CHINESE,SOUTH_ASIAN,BLACK,FILIPINO,WEST_ASIAN,KOREAN,ARAB,SOUTHEAST_ASIAN,LATIN_AMERICAN,JAPANESE
CITY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Markham,147710.0,58255.0,9700.0,8945.0,7940.0,4350.0,3240.0,2460.0,1730.0,1015.0
Richmond Hill,56970.0,14910.0,3850.0,3790.0,20120.0,5485.0,3565.0,1415.0,1580.0,625.0
Vaughan,20790.0,30655.0,8345.0,8645.0,8650.0,5360.0,4325.0,6840.0,7415.0,300.0
Whitchurch-Stouffville,5965.0,5625.0,995.0,1135.0,335.0,190.0,435.0,295.0,330.0,150.0
Aurora,5575.0,1840.0,1200.0,935.0,1870.0,595.0,575.0,300.0,670.0,195.0
Newmarket,5565.0,3265.0,2295.0,1835.0,1970.0,695.0,660.0,2275.0,1225.0,325.0
East Gwillimbury,690.0,470.0,335.0,165.0,255.0,90.0,85.0,70.0,100.0,60.0
Georgina,670.0,685.0,695.0,335.0,185.0,95.0,100.0,130.0,375.0,70.0
King,445.0,930.0,395.0,160.0,320.0,160.0,125.0,140.0,170.0,70.0
Chippewas of Georgina Island First Nation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Chinese is by far the largest visible minority group in Markham, followed by South Asian. The numbers for the other communities will be useful when we decide what food to serve. We now look at the neighborhoods in Markham.
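
To put the ranking in proportion, the following minimal sketch computes each group's share in Markham. It uses the counts from the Markham row of the table above. The helper name `group_shares` is illustrative, and the denominator is only the sum of the ten groups listed here, not Markham's total population.

```python
# Counts copied from the Markham row of the table above.
markham_counts = {
    "CHINESE": 147710, "SOUTH_ASIAN": 58255, "BLACK": 9700, "FILIPINO": 8945,
    "WEST_ASIAN": 7940, "KOREAN": 4350, "ARAB": 3240, "SOUTHEAST_ASIAN": 2460,
    "LATIN_AMERICAN": 1730, "JAPANESE": 1015,
}


def group_shares(counts):
    """Return each group's share of the total across the listed groups."""
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


shares = group_shares(markham_counts)
# print the three largest groups with their shares
for group, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{group}: {share:.1%}")
```

Under this assumption, the Chinese community accounts for roughly 60% of the listed visible minority population in Markham.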

### 2. Explore Markham

In [8]:
# load malls and plazas in York Region
# (the client object, types and __iter__ are presumably defined in the hidden Watson Studio cell above)
body2 = client_60fde9a69b664de08697ac05914adba4.get_object(Bucket='capstone-donotdelete-pr-rtwgqykbmttzjy',Key='mall_plaza.xlsx')['Body']
# add missing __iter__ method, so pandas accepts body2 as file-like object
if not hasattr(body2, "__iter__"): body2.__iter__ = types.MethodType( __iter__, body2 )

mall_plaza = pd.read_excel(body2)
mall_plaza.head()

Unnamed: 0,CITY,NAME,ADDRESS
0,Markham,Alderland Center,"7102-7158 Warden Ave, Markham, ON"
1,Markham,CF Markville,"5000 Hwy 7, Markham, ON"
2,Markham,First Markham Place,"3255 Highway 7 East, Markham, ON"
3,Markham,J-Town,"160 Steeles Ave E, Markham, ON"
4,Markham,King Square Shopping Mall,"9390 Woodbine Ave, Markham, ON"


In [9]:
# use geopy to get the geographic coordinates for these places
mall_plaza_latitude = []
mall_plaza_longitude = []
geolocator = Nominatim(user_agent = "markham_explorer")

for idx in mall_plaza.index.values:
    # try the street address first, then fall back to the place name
    location = geolocator.geocode(mall_plaza.loc[idx, 'ADDRESS'])
    if location is None:
        location = geolocator.geocode(mall_plaza.loc[idx, 'NAME'])
    # record NaN when neither lookup succeeds instead of raising an AttributeError
    mall_plaza_latitude.append(location.latitude if location is not None else np.nan)
    mall_plaza_longitude.append(location.longitude if location is not None else np.nan)

# add the coordinates to the dataframe
mall_plaza['LATITUDE'] = mall_plaza_latitude
mall_plaza['LONGITUDE'] = mall_plaza_longitude
mall_plaza

Unnamed: 0,CITY,NAME,ADDRESS,LATITUDE,LONGITUDE
0,Markham,Alderland Center,"7102-7158 Warden Ave, Markham, ON",43.824235,-79.326603
1,Markham,CF Markville,"5000 Hwy 7, Markham, ON",43.865694,-79.291642
2,Markham,First Markham Place,"3255 Highway 7 East, Markham, ON",43.851747,-79.348356
3,Markham,J-Town,"160 Steeles Ave E, Markham, ON",43.801437,-79.405085
4,Markham,King Square Shopping Mall,"9390 Woodbine Ave, Markham, ON",43.872314,-79.365975
5,Markham,Langham Square,"8339 Kennedy Rd, Markham, ON",43.8574,-79.303767
6,Markham,Markham Town Square,"8601 Warden Ave, Markham, ON",43.857494,-79.331529
7,Markham,Market Village,"4390 Steeles Avenue East, Markham, ON",43.824568,-79.303737
8,Markham,Metro Square,"3636 Steeles Ave E, Markham, ON",43.817998,-79.331931
9,Markham,New Kennedy Square,"8360 Kennedy Rd, Markham, ON",43.859604,-79.302949
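
The address-then-name lookup used above can be written as a small reusable helper. This is a sketch, not the notebook's own code: `geocode_with_fallback` and `fake_geocode` are hypothetical names, and the stub geocoder simply pretends that the address lookup fails so the example runs offline. The coordinates are copied from the CF Markville row above.

```python
from collections import namedtuple


def geocode_with_fallback(geocode, queries):
    """Try each query in order; return (lat, lon) of the first hit, else (None, None).

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found.
    """
    for query in queries:
        location = geocode(query)
        if location is not None:
            return location.latitude, location.longitude
    return None, None


# Hypothetical stand-in for a real geocoder: it only knows one place name.
Location = namedtuple("Location", ["latitude", "longitude"])
_known = {"CF Markville": Location(43.865694, -79.291642)}


def fake_geocode(query):
    return _known.get(query)


coords = geocode_with_fallback(fake_geocode, ["5000 Hwy 7, Markham, ON", "CF Markville"])
print(coords)
```

With geopy, the first argument would be `geolocator.geocode`, which returns `None` when a query has no match.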


In [10]:
# see Markham neighborhoods on a map
address = 'Markham, ON'
geolocator = Nominatim(user_agent = "markham_explorer")
location = geolocator.geocode(address)
latitude_markham = location.latitude
longitude_markham = location.longitude

# create map object
map_markham = folium.Map(location=[latitude_markham, longitude_markham], zoom_start=11)

# add markers to map
for lat, lng, label in zip(mall_plaza['LATITUDE'], mall_plaza['LONGITUDE'], mall_plaza['NAME']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_markham)  
    
map_markham

We can tell from the map that all the plazas and malls have parking lots, and they are close to some residential areas.

#### find the venues around these malls or plazas from Foursquare

In [11]:
# The code was removed by Watson Studio for sharing.

In [12]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT = 100, radius = 1000):
    
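    '''
    Query Foursquare for the venues around each mall/plaza.

    names: mall/plaza names,
    latitudes, longitudes: geographic coordinates of the malls/plazas,
    LIMIT: maximum number of venues returned per mall/plaza,
    radius: search radius in meters.

    Returns a dataframe with one row per (mall/plaza, venue) pair.
    '''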
    
    venues_list = []
    count = 0
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['id']) for v in results])
        count += 1

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Mall/plaza', 
                  'Mall/plaza Latitude', 
                  'Mall/plaza Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'Venue ID']
    
    # show the number of malls/plazas explored
    print('\n'+ " ".join([str(count), 'neighborhoods explored.']))
    
    return nearby_venues
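
The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. The sketch below shows one defensive way to flatten a response. The helper `extract_venues`, the `'Uncategorized'` label and the second venue in the sample payload are assumptions made for illustration. The payload only mirrors the keys that `getNearbyVenues` reads, and the first venue is taken from the notebook's results.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category, id) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to a placeholder label when a venue has no category
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category,
                     venue["id"]))
    return rows


# Hypothetical payload with the same nesting that getNearbyVenues expects.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Chatime",
               "location": {"lat": 43.824342, "lng": -79.326062},
               "categories": [{"name": "Bubble Tea Shop"}],
               "id": "526176ea11d2a41c253a55e6"}},
    {"venue": {"name": "Example Venue",  # made-up venue without a category
               "location": {"lat": 43.82, "lng": -79.32},
               "categories": [],
               "id": "example-id"}},
]}]}}

rows = extract_venues(sample_payload)
for row in rows:
    print(row)
```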

In [13]:
markham_venues = getNearbyVenues(names = mall_plaza['NAME'],
                                   latitudes = mall_plaza['LATITUDE'],
                                   longitudes = mall_plaza['LONGITUDE'],
                                  LIMIT = 150, radius = 1000
                                )
print(markham_venues.shape)
markham_venues.head()

Alderland Center
CF Markville
First Markham Place
J-Town
King Square Shopping Mall
Langham Square
Markham Town Square
Market Village
Metro Square
New Kennedy Square
Pacific Mall
Peachtree Center
The Shops on Steeles and 404
Thornhill Square Shopping Center

14 neighborhoods explored.
(623, 8)


Unnamed: 0,Mall/plaza,Mall/plaza Latitude,Mall/plaza Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID
0,Alderland Center,43.824235,-79.326603,T&T Supermarket 大統華超級市場,43.821925,-79.326174,Supermarket,4af0b925f964a52092de21e3
1,Alderland Center,43.824235,-79.326603,Monte Carlo Inn & Suites Downtown Markham,43.828094,-79.325874,Hotel,4e248475d16474063224c3b0
2,Alderland Center,43.824235,-79.326603,Chatime,43.824342,-79.326062,Bubble Tea Shop,526176ea11d2a41c253a55e6
3,Alderland Center,43.824235,-79.326603,Yin Ji Chang Fen 銀記腸粉店,43.820277,-79.325142,Chinese Restaurant,58c1e6e801f4336e21f31b50
4,Alderland Center,43.824235,-79.326603,Asian Legend 味香村,43.818846,-79.331058,Chinese Restaurant,4babe7aaf964a520efd33ae3
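
As a sanity check on the 1000 m search radius, the distance between a mall and a returned venue can be computed with the haversine formula. This is a minimal sketch: the function name `haversine_m` is illustrative, the Earth is treated as a sphere with a mean radius of 6,371 km, and the coordinates are taken from the first row of the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


mall = (43.824235, -79.326603)   # Alderland Center
venue = (43.821925, -79.326174)  # T&T Supermarket
distance = haversine_m(*mall, *venue)
print(f"{distance:.0f} m from the mall; within 1 km: {distance <= 1000}")
```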


#### analyze the venues

In [14]:
# check numbers of venues returned for each neighborhood
markham_venues_grouped = markham_venues.groupby('Mall/plaza', sort = False, as_index = False)[['Venue']].count()
markham_venues_grouped

Unnamed: 0,Mall/plaza,Venue
0,Alderland Center,41
1,CF Markville,68
2,First Markham Place,52
3,J-Town,10
4,King Square Shopping Mall,22
5,Langham Square,46
6,Markham Town Square,49
7,Market Village,58
8,Metro Square,63
9,New Kennedy Square,48


In [15]:
# get the number of unique venue categories
print('There are {} unique venue categories.'.format(len(markham_venues['Venue Category'].unique())))
markham_venues['Venue Category'].unique()

There are 131 unique venue categories.


array(['Supermarket', 'Hotel', 'Bubble Tea Shop', 'Chinese Restaurant',
       'Coffee Shop', 'Dessert Shop', 'Szechuan Restaurant', 'Tea Room',
       'Grocery Store', 'Ice Cream Shop', 'Sporting Goods Shop',
       'Asian Restaurant', 'Korean Restaurant', 'Bank', 'Sandwich Place',
       'Hong Kong Restaurant', 'Dim Sum Restaurant',
       'Xinjiang Restaurant', 'Gas Station', 'Dumpling Restaurant',
       'Pharmacy', 'Ramen Restaurant', 'Fast Food Restaurant',
       'Pizza Place', 'Market', 'Baseball Field', 'Toy / Game Store',
       'Diner', 'Gym / Fitness Center', 'Sushi Restaurant', 'Yoga Studio',
       'Cosmetics Shop', 'New American Restaurant', 'Seafood Restaurant',
       'Chocolate Shop', 'Clothing Store', 'Health Food Store',
       'Shopping Mall', 'Food Court', 'Burger Joint', 'Restaurant',
       'Bakery', 'Community Center', 'Cantonese Restaurant', 'Beer Store',
       'Electronics Store', 'Greek Restaurant', 'Big Box Store',
       'Discount Store', 'Café', 'Fried C

Looking at these categories, we are going to drop any venues that do not affect our decision. These include:

In [16]:
drop_list = ['Bank', 'Gas Station', 'Pharmacy', 'Rental Service', 'Discount Store', 'Shipping Store', 'Intersection', 'Automotive Shop', 'Dry Cleaner',
            'Candy Store', 'Convenience Store', 'Deli ', 'Distribution Center', 'Auto Dealership', 'Bus Station', 'Bus Stop', 'Road', 'Thrift']

# drop every venue whose category contains one of the strings above (case-insensitive)
drop_mask = markham_venues['Venue Category'].str.contains('|'.join(drop_list), case = False)
markham_venues = markham_venues[~drop_mask].reset_index(drop = True)

print('There are {} unique venue categories.'.format(len(markham_venues['Venue Category'].unique())))
markham_venues['Venue Category'].unique()

There are 115 unique venue categories.


array(['Supermarket', 'Hotel', 'Bubble Tea Shop', 'Chinese Restaurant',
       'Coffee Shop', 'Dessert Shop', 'Szechuan Restaurant', 'Tea Room',
       'Grocery Store', 'Ice Cream Shop', 'Sporting Goods Shop',
       'Asian Restaurant', 'Korean Restaurant', 'Sandwich Place',
       'Hong Kong Restaurant', 'Dim Sum Restaurant',
       'Xinjiang Restaurant', 'Dumpling Restaurant', 'Ramen Restaurant',
       'Fast Food Restaurant', 'Pizza Place', 'Market', 'Baseball Field',
       'Toy / Game Store', 'Diner', 'Gym / Fitness Center',
       'Sushi Restaurant', 'Yoga Studio', 'Cosmetics Shop',
       'New American Restaurant', 'Seafood Restaurant', 'Chocolate Shop',
       'Clothing Store', 'Health Food Store', 'Shopping Mall',
       'Food Court', 'Burger Joint', 'Restaurant', 'Bakery',
       'Community Center', 'Cantonese Restaurant', 'Beer Store',
       'Electronics Store', 'Greek Restaurant', 'Big Box Store', 'Café',
       'Fried Chicken Joint', 'Jewelry Store', 'Insurance Office',
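
The filter above relies on case-insensitive substring matching: a venue is dropped when any entry of `drop_list` occurs anywhere in its category name. pandas' `str.contains` interprets the pattern as a regular expression by default, which is equivalent to a plain substring test here because the entries contain no regex metacharacters. The sketch below reproduces that logic in plain Python. The helper `matches_any` and the sample category names are illustrative, not taken from the notebook's data.

```python
def matches_any(category, patterns):
    """True if any pattern occurs in the category name, ignoring case."""
    lowered = category.lower()
    return any(pattern.lower() in lowered for pattern in patterns)


drop_patterns = ["Bank", "Gas Station", "Deli ", "Thrift"]  # subset of drop_list
categories = ["Bank", "Deli / Bodega", "Thrift / Vintage Store",
              "Chinese Restaurant", "Supermarket"]  # illustrative names

kept = [c for c in categories if not matches_any(c, drop_patterns)]
print(kept)
```

Because the match is on substrings, a short entry such as `'Road'` would also remove any category that merely contains that text, so the resulting category list is worth checking after filtering.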
 

We do not need to distinguish between different types of Chinese restaurants at this stage. We relabel every category that we treat as Chinese cuisine for this decision as "Chinese Restaurant".

In [17]:
# work on a copy so the original categories stay available for the final step
markham_venues_mod = markham_venues.copy()

# list of different Chinese restaurant types
chinese_type = ['Chinese Restaurant', 'Szechuan Restaurant', 'Asian Restaurant', 'Hong Kong Restaurant', 'Dim Sum Restaurant', 'Xinjiang Restaurant',
                'Dumpling Restaurant', 'Hotpot Restaurant', 'Noodle House', 'Cantonese Restaurant', 'Shanghai Restaurant', 'Taiwanese Restaurant',
               'BBQ Joint', 'Cha Chaan Teng']
for restaurant in chinese_type:
    markham_venues_mod.loc[markham_venues_mod['Venue Category'].str.contains(restaurant, case = False), 'Venue Category'] = 'Chinese Restaurant'
# count the categories in the relabeled dataframe, not the original one
print('There are {} unique venue categories.'.format(len(markham_venues_mod['Venue Category'].unique())))
markham_venues_mod['Venue Category'].unique()


array(['Supermarket', 'Hotel', 'Bubble Tea Shop', 'Chinese Restaurant',
       'Coffee Shop', 'Dessert Shop', 'Tea Room', 'Grocery Store',
       'Ice Cream Shop', 'Sporting Goods Shop', 'Korean Restaurant',
       'Sandwich Place', 'Ramen Restaurant', 'Fast Food Restaurant',
       'Pizza Place', 'Market', 'Baseball Field', 'Toy / Game Store',
       'Diner', 'Gym / Fitness Center', 'Sushi Restaurant', 'Yoga Studio',
       'Cosmetics Shop', 'New American Restaurant', 'Seafood Restaurant',
       'Chocolate Shop', 'Clothing Store', 'Health Food Store',
       'Shopping Mall', 'Food Court', 'Burger Joint', 'Restaurant',
       'Bakery', 'Community Center', 'Beer Store', 'Electronics Store',
       'Greek Restaurant', 'Big Box Store', 'Café', 'Fried Chicken Joint',
       'Jewelry Store', 'Insurance Office', 'Thai Restaurant', 'Gym',
       'Luggage Store', 'Gift Shop', 'Department Store', 'Park',
       'Halal Restaurant', 'Bookstore', 'Climbing Gym',
       'Furniture / Home Store', '
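
The relabeling step can be sketched in plain Python as follows. The helper `relabel` is a hypothetical name and the pattern list is only a subset of `chinese_type`. The example also shows that a category missing from the list, such as 'Dongbei Restaurant', keeps its own label, which is consistent with the one-hot columns further down.

```python
CHINESE_TYPES = ["Chinese Restaurant", "Szechuan Restaurant",
                 "Dim Sum Restaurant", "Noodle House"]  # subset of chinese_type


def relabel(category, patterns=CHINESE_TYPES, label="Chinese Restaurant"):
    """Return `label` if any pattern occurs in the category, else the category itself."""
    lowered = category.lower()
    if any(pattern.lower() in lowered for pattern in patterns):
        return label
    return category


sample = ["Szechuan Restaurant", "Dim Sum Restaurant",
          "Dongbei Restaurant", "Korean Restaurant"]
relabelled = [relabel(c) for c in sample]
print(relabelled)
```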

In [18]:
markham_venues_mod_grouped = markham_venues_mod.groupby('Mall/plaza', sort = False, as_index = False)[['Venue']].count()
markham_venues_mod_grouped

Unnamed: 0,Mall/plaza,Venue
0,Alderland Center,38
1,CF Markville,62
2,First Markham Place,51
3,J-Town,7
4,King Square Shopping Mall,17
5,Langham Square,38
6,Markham Town Square,43
7,Market Village,55
8,Metro Square,56
9,New Kennedy Square,41


J-Town has only 7 venues. Let us see if we should drop it.

In [19]:
markham_venues_mod[markham_venues_mod['Mall/plaza'] == 'J-Town']

Unnamed: 0,Mall/plaza,Mall/plaza Latitude,Mall/plaza Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID
151,J-Town,43.801437,-79.405085,Cayne's Super Housewares,43.805153,-79.412594,Furniture / Home Store,4ae48161f964a520119b21e3
152,J-Town,43.801437,-79.405085,Ambrosia Natural Foods,43.803955,-79.415948,Health Food Store,4ae48904f964a520659b21e3
153,J-Town,43.801437,-79.405085,Dear Born baby,43.804398,-79.415351,Kids Store,4bf167a0c9ced13ab61681c5
154,J-Town,43.801437,-79.405085,Grandview Park,43.804632,-79.411041,Baseball Field,4be34de421d5a593b2cc1811
155,J-Town,43.801437,-79.405085,Silent Sports,43.804655,-79.412508,Athletics & Sports,4aec6881f964a520c3c621e3
156,J-Town,43.801437,-79.405085,Sticker & More Warehouse Store,43.804855,-79.412684,Arts & Crafts Store,506f3954e4b07fcb1f8c3e82
157,J-Town,43.801437,-79.405085,Evex,43.803899,-79.415773,Luggage Store,4be9823d6295c9b64b458508


None of these venues is a restaurant, so we drop J-Town from the analysis.

In [20]:
markham_venues_mod.drop(index = markham_venues_mod[markham_venues_mod['Mall/plaza'] == 'J-Town'].index, inplace = True)
markham_venues_mod.reset_index(drop = True, inplace = True)

In [21]:
# one hot encoding for the venues in Markham
markham_onehot = pd.get_dummies(markham_venues_mod[['Venue Category']], columns = ['Venue Category'], prefix = "", prefix_sep = "")

# add neighborhood column back to dataframe
markham_onehot['Mall/plaza'] = markham_venues_mod['Mall/plaza'] 

# move neighborhood column to the first column
col = markham_onehot.columns.tolist()
markham_onehot = markham_onehot[col[-1:] + col[:-1]]

markham_onehot.head()

Unnamed: 0,Mall/plaza,Airport,American Restaurant,Bagel Shop,Bakery,Bar,Baseball Field,Beer Store,Big Box Store,Bookstore,Breakfast Spot,Bubble Tea Shop,Burger Joint,Café,Camera Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Coffee Shop,Community Center,Cosmetics Shop,Department Store,Dessert Shop,Diner,Dongbei Restaurant,Electronics Store,Fast Food Restaurant,Field,Flower Shop,Food,Food & Drink Shop,Food Court,Fried Chicken Joint,Furniture / Home Store,Gastropub,German Restaurant,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Health Food Store,Hobby Shop,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Liquor Store,Lounge,Luggage Store,Malay Restaurant,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Miscellaneous Shop,New American Restaurant,Night Market,Optical Shop,Park,Pet Store,Pizza Place,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Shopping Plaza,Snack Place,Spa,Sporting Goods Shop,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Yoga Studio
0,Alderland Center,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,Alderland Center,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alderland Center,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alderland Center,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alderland Center,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### let's find out the frequency of each venue category

In [22]:
markham_grouped = markham_onehot.groupby('Mall/plaza').mean().reset_index()
markham_grouped

Unnamed: 0,Mall/plaza,Airport,American Restaurant,Bagel Shop,Bakery,Bar,Baseball Field,Beer Store,Big Box Store,Bookstore,Breakfast Spot,Bubble Tea Shop,Burger Joint,Café,Camera Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Coffee Shop,Community Center,Cosmetics Shop,Department Store,Dessert Shop,Diner,Dongbei Restaurant,Electronics Store,Fast Food Restaurant,Field,Flower Shop,Food,Food & Drink Shop,Food Court,Fried Chicken Joint,Furniture / Home Store,Gastropub,German Restaurant,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Health Food Store,Hobby Shop,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Insurance Office,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Liquor Store,Lounge,Luggage Store,Malay Restaurant,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Mexican Restaurant,Miscellaneous Shop,New American Restaurant,Night Market,Optical Shop,Park,Pet Store,Pizza Place,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Sandwich Place,Seafood Restaurant,Shopping Mall,Shopping Plaza,Snack Place,Spa,Sporting Goods Shop,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Yoga Studio
0,Alderland Center,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.342105,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.026316,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.052632,0.052632,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0
1,CF Markville,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.080645,0.016129,0.0,0.064516,0.080645,0.016129,0.048387,0.016129,0.0,0.0,0.0,0.048387,0.096774,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.016129,0.016129,0.032258,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.048387,0.0,0.0,0.0,0.0,0.064516,0.0,0.016129,0.016129,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.016129
2,First Markham Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.0,0.0,0.196078,0.0,0.019608,0.0,0.098039,0.0,0.0,0.019608,0.058824,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.039216,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.078431,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.039216,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.039216,0.019608,0.019608,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.019608
3,King Square Shopping Mall,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.058824,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0
4,Langham Square,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.026316,0.052632,0.0,0.026316,0.0,0.026316,0.263158,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.052632,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.026316,0.026316,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.026316,0.026316,0.0,0.0,0.0,0.026316,0.026316,0.026316,0.0,0.0,0.0
5,Market Village,0.0,0.0,0.0,0.072727,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.018182,0.0,0.309091,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.018182,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054545,0.018182,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.018182,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.018182,0.0,0.036364,0.0,0.0,0.0,0.018182,0.0,0.018182,0.036364,0.0,0.0,0.0,0.036364,0.0,0.036364,0.0,0.0,0.0
6,Markham Town Square,0.0,0.0,0.0,0.046512,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.023256,0.0,0.0,0.116279,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.023256,0.0,0.023256,0.023256,0.0,0.023256,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.046512,0.0,0.023256,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.023256,0.023256,0.0,0.023256,0.023256,0.023256,0.0,0.0,0.046512,0.023256,0.0,0.023256,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Metro Square,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.339286,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.017857,0.017857,0.0,0.017857,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.035714,0.0,0.0,0.0,0.017857,0.0,0.035714,0.0,0.035714,0.0,0.017857,0.0,0.017857,0.0,0.0,0.035714,0.035714,0.035714,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0
8,New Kennedy Square,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.02439,0.073171,0.0,0.02439,0.0,0.02439,0.243902,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.02439,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.02439,0.02439,0.0,0.0,0.0,0.02439,0.02439,0.02439,0.0,0.0,0.0
9,Pacific Mall,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.017241,0.0,0.327586,0.0,0.0,0.0,0.017241,0.0,0.017241,0.0,0.017241,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051724,0.017241,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.017241,0.017241,0.017241,0.017241,0.0,0.0,0.0,0.0,0.017241,0.017241,0.017241,0.017241,0.0,0.034483,0.0,0.0,0.0,0.0,0.017241,0.017241,0.034483,0.0,0.0,0.0,0.017241,0.0,0.034483,0.0,0.0,0.0


#### Get the top 15 most common venues in each neighborhood

In [23]:
# define a function that returns the venue categories in descending order of frequency
def return_most_common_venues(row, num_top_venues):
    '''
    row: one row of the grouped dataframe (mall/plaza name followed by category frequencies),
    num_top_venues: number of venue categories returned.
    '''
    # skip the first entry, which holds the mall/plaza name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [24]:
# return a dataframe showing the top 15 most common venues categories
num_top_venues = 15

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Mall/plaza']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_ranked = pd.DataFrame(columns = columns)
venues_ranked['Mall/plaza'] = markham_grouped['Mall/plaza']

for ind in np.arange(markham_grouped.shape[0]):
    venues_ranked.iloc[ind, 1:] = return_most_common_venues(markham_grouped.iloc[ind, :], num_top_venues)

venues_ranked

Unnamed: 0,Mall/plaza,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Alderland Center,Chinese Restaurant,Pizza Place,Tea Room,Sushi Restaurant,Sandwich Place,Bubble Tea Shop,Gym / Fitness Center,Sporting Goods Shop,Dessert Shop,Fast Food Restaurant,Market,Coffee Shop,Ramen Restaurant,Korean Restaurant,Diner
1,CF Markville,Fast Food Restaurant,Coffee Shop,Chinese Restaurant,Restaurant,Clothing Store,Pizza Place,Cosmetics Shop,Electronics Store,Toy / Game Store,Grocery Store,Jewelry Store,Health Food Store,Gym,Fried Chicken Joint,Greek Restaurant
2,First Markham Place,Chinese Restaurant,Coffee Shop,Korean Restaurant,Dessert Shop,Furniture / Home Store,Sandwich Place,Sushi Restaurant,Yoga Studio,Department Store,Burger Joint,Portuguese Restaurant,Mexican Restaurant,Fast Food Restaurant,Liquor Store,Japanese Restaurant
3,King Square Shopping Mall,Japanese Restaurant,Airport,Bubble Tea Shop,Wings Joint,Gym / Fitness Center,Grocery Store,Greek Restaurant,Gift Shop,Field,Coffee Shop,Chinese Restaurant,Sporting Goods Shop,Pizza Place,Hookah Bar,Bakery
4,Langham Square,Chinese Restaurant,Park,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Breakfast Spot,Massage Studio,Cosmetics Shop,Fast Food Restaurant,Caribbean Restaurant,Café,Restaurant,Sandwich Place,Shopping Mall,Seafood Restaurant
5,Market Village,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Coffee Shop,Pizza Place,Pet Store,Park,Gym,Miscellaneous Shop,Optical Shop
6,Markham Town Square,Chinese Restaurant,Grocery Store,Bakery,Shopping Mall,Lake,Pizza Place,Pet Store,Cosmetics Shop,Dessert Shop,Diner,Lounge,Liquor Store,Electronics Store,Pub,Juice Bar
7,Metro Square,Chinese Restaurant,Bubble Tea Shop,Coffee Shop,Shopping Mall,Fast Food Restaurant,Korean Restaurant,Tea Room,Sushi Restaurant,Supermarket,Pizza Place,Sandwich Place,Dessert Shop,Electronics Store,Park,Ramen Restaurant
8,New Kennedy Square,Chinese Restaurant,Bubble Tea Shop,Dessert Shop,Park,Japanese Restaurant,Sandwich Place,Beer Store,Caribbean Restaurant,Café,Indian Restaurant,Restaurant,Seafood Restaurant,Shopping Mall,Breakfast Spot,Cosmetics Shop
9,Pacific Mall,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Optical Shop,Cosmetics Shop,Pet Store,Park,Miscellaneous Shop,Coffee Shop,Korean Restaurant,Dessert Shop
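
Several categories in a row can share the same frequency (King Square Shopping Mall, for example, has many categories at 0.058824), and pandas' default sort is not guaranteed to be stable, so the order among tied categories may vary. The following is a minimal sketch, in plain Python, of a ranking that breaks ties alphabetically. The function name `top_categories` and the example frequencies are hypothetical and not part of the notebook.

```python
def top_categories(frequencies, num_top):
    """Return the num_top category names, highest frequency first.

    Ties are broken alphabetically so the ranking is reproducible.
    """
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:num_top]]


# Hypothetical frequencies for one plaza (not taken from the data above).
example_row = {
    "Chinese Restaurant": 0.30,
    "Coffee Shop": 0.05,
    "Bakery": 0.05,
    "Park": 0.02,
}
top_three = top_categories(example_row, 3)
print(top_three)  # ['Chinese Restaurant', 'Bakery', 'Coffee Shop']
```

With this rule, "Bakery" always precedes "Coffee Shop" when both have the same frequency, regardless of the column order in the input.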


### 3. Clustering using agglomerative clustering

In [25]:
# set number of clusters, best value by observation
kclusters = 5

markham_grouped_clustering = markham_grouped.drop('Mall/plaza', axis = 1)

# run agglomerative clustering
# note: scikit-learn 1.2 renamed the `affinity` parameter to `metric`; newer versions need metric = 'cosine'
tree = AgglomerativeClustering(n_clusters = kclusters, affinity = 'cosine', linkage = 'complete').fit(markham_grouped_clustering)

# add clustering labels to venues_ranked, dropping labels left over from a previous run
if 'Cluster Labels' in venues_ranked.columns:
    venues_ranked.drop(columns = 'Cluster Labels', inplace = True)
venues_ranked.insert(0, 'Cluster Labels', tree.labels_)
venues_ranked

Unnamed: 0,Cluster Labels,Mall/plaza,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,0,Alderland Center,Chinese Restaurant,Pizza Place,Tea Room,Sushi Restaurant,Sandwich Place,Bubble Tea Shop,Gym / Fitness Center,Sporting Goods Shop,Dessert Shop,Fast Food Restaurant,Market,Coffee Shop,Ramen Restaurant,Korean Restaurant,Diner
1,2,CF Markville,Fast Food Restaurant,Coffee Shop,Chinese Restaurant,Restaurant,Clothing Store,Pizza Place,Cosmetics Shop,Electronics Store,Toy / Game Store,Grocery Store,Jewelry Store,Health Food Store,Gym,Fried Chicken Joint,Greek Restaurant
2,0,First Markham Place,Chinese Restaurant,Coffee Shop,Korean Restaurant,Dessert Shop,Furniture / Home Store,Sandwich Place,Sushi Restaurant,Yoga Studio,Department Store,Burger Joint,Portuguese Restaurant,Mexican Restaurant,Fast Food Restaurant,Liquor Store,Japanese Restaurant
3,3,King Square Shopping Mall,Japanese Restaurant,Airport,Bubble Tea Shop,Wings Joint,Gym / Fitness Center,Grocery Store,Greek Restaurant,Gift Shop,Field,Coffee Shop,Chinese Restaurant,Sporting Goods Shop,Pizza Place,Hookah Bar,Bakery
4,0,Langham Square,Chinese Restaurant,Park,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Breakfast Spot,Massage Studio,Cosmetics Shop,Fast Food Restaurant,Caribbean Restaurant,Café,Restaurant,Sandwich Place,Shopping Mall,Seafood Restaurant
5,0,Market Village,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Coffee Shop,Pizza Place,Pet Store,Park,Gym,Miscellaneous Shop,Optical Shop
6,4,Markham Town Square,Chinese Restaurant,Grocery Store,Bakery,Shopping Mall,Lake,Pizza Place,Pet Store,Cosmetics Shop,Dessert Shop,Diner,Lounge,Liquor Store,Electronics Store,Pub,Juice Bar
7,0,Metro Square,Chinese Restaurant,Bubble Tea Shop,Coffee Shop,Shopping Mall,Fast Food Restaurant,Korean Restaurant,Tea Room,Sushi Restaurant,Supermarket,Pizza Place,Sandwich Place,Dessert Shop,Electronics Store,Park,Ramen Restaurant
8,0,New Kennedy Square,Chinese Restaurant,Bubble Tea Shop,Dessert Shop,Park,Japanese Restaurant,Sandwich Place,Beer Store,Caribbean Restaurant,Café,Indian Restaurant,Restaurant,Seafood Restaurant,Shopping Mall,Breakfast Spot,Cosmetics Shop
9,0,Pacific Mall,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Optical Shop,Cosmetics Shop,Pet Store,Park,Miscellaneous Shop,Coffee Shop,Korean Restaurant,Dessert Shop
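
To make the clustering step more concrete, here is a small pure-Python sketch of agglomerative clustering with cosine distance and complete linkage. It only illustrates the idea and is not scikit-learn's implementation; the helper names and the toy frequency vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def complete_linkage(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the largest pairwise cosine
    distance between their members (complete linkage).
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dist = max(
                cosine_distance(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        # a < b always holds, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical rows: [Chinese Restaurant, Coffee Shop, Pizza Place] frequencies
toy_rows = [
    [0.30, 0.05, 0.02],  # plaza A, dominated by Chinese restaurants
    [0.33, 0.04, 0.03],  # plaza B, similar profile to A
    [0.05, 0.30, 0.10],  # plaza C, coffee-heavy
    [0.00, 0.10, 0.40],  # plaza D, pizza-heavy
]
groups = complete_linkage(toy_rows, n_clusters=3)
print(groups)  # [[0, 1], [2], [3]]
```

Plazas A and B point in almost the same direction in category space, so they merge first, while the other two stay separate. Cosine distance compares the mix of venue categories rather than their absolute counts, and the cluster numbers themselves are arbitrary identifiers.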


In [26]:
# drop J-Town
markham_clustered = mall_plaza.set_index('NAME').drop(index = 'J-Town').reset_index().rename(columns = {'NAME': 'Mall/plaza'})

# merge with venues_ranked to add the cluster labels and top venues
markham_clustered = markham_clustered.join(venues_ranked.set_index('Mall/plaza'), on='Mall/plaza')
markham_clustered

Unnamed: 0,Mall/plaza,CITY,ADDRESS,LATITUDE,LONGITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Alderland Center,Markham,"7102-7158 Warden Ave, Markham, ON",43.824235,-79.326603,0,Chinese Restaurant,Pizza Place,Tea Room,Sushi Restaurant,Sandwich Place,Bubble Tea Shop,Gym / Fitness Center,Sporting Goods Shop,Dessert Shop,Fast Food Restaurant,Market,Coffee Shop,Ramen Restaurant,Korean Restaurant,Diner
1,CF Markville,Markham,"5000 Hwy 7, Markham, ON",43.865694,-79.291642,2,Fast Food Restaurant,Coffee Shop,Chinese Restaurant,Restaurant,Clothing Store,Pizza Place,Cosmetics Shop,Electronics Store,Toy / Game Store,Grocery Store,Jewelry Store,Health Food Store,Gym,Fried Chicken Joint,Greek Restaurant
2,First Markham Place,Markham,"3255 Highway 7 East, Markham, ON",43.851747,-79.348356,0,Chinese Restaurant,Coffee Shop,Korean Restaurant,Dessert Shop,Furniture / Home Store,Sandwich Place,Sushi Restaurant,Yoga Studio,Department Store,Burger Joint,Portuguese Restaurant,Mexican Restaurant,Fast Food Restaurant,Liquor Store,Japanese Restaurant
3,King Square Shopping Mall,Markham,"9390 Woodbine Ave, Markham, ON",43.872314,-79.365975,3,Japanese Restaurant,Airport,Bubble Tea Shop,Wings Joint,Gym / Fitness Center,Grocery Store,Greek Restaurant,Gift Shop,Field,Coffee Shop,Chinese Restaurant,Sporting Goods Shop,Pizza Place,Hookah Bar,Bakery
4,Langham Square,Markham,"8339 Kennedy Rd, Markham, ON",43.8574,-79.303767,0,Chinese Restaurant,Park,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Breakfast Spot,Massage Studio,Cosmetics Shop,Fast Food Restaurant,Caribbean Restaurant,Café,Restaurant,Sandwich Place,Shopping Mall,Seafood Restaurant
5,Markham Town Square,Markham,"8601 Warden Ave, Markham, ON",43.857494,-79.331529,4,Chinese Restaurant,Grocery Store,Bakery,Shopping Mall,Lake,Pizza Place,Pet Store,Cosmetics Shop,Dessert Shop,Diner,Lounge,Liquor Store,Electronics Store,Pub,Juice Bar
6,Market Village,Markham,"4390 Steeles Avenue East, Markham, ON",43.824568,-79.303737,0,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Coffee Shop,Pizza Place,Pet Store,Park,Gym,Miscellaneous Shop,Optical Shop
7,Metro Square,Markham,"3636 Steeles Ave E, Markham, ON",43.817998,-79.331931,0,Chinese Restaurant,Bubble Tea Shop,Coffee Shop,Shopping Mall,Fast Food Restaurant,Korean Restaurant,Tea Room,Sushi Restaurant,Supermarket,Pizza Place,Sandwich Place,Dessert Shop,Electronics Store,Park,Ramen Restaurant
8,New Kennedy Square,Markham,"8360 Kennedy Rd, Markham, ON",43.859604,-79.302949,0,Chinese Restaurant,Bubble Tea Shop,Dessert Shop,Park,Japanese Restaurant,Sandwich Place,Beer Store,Caribbean Restaurant,Café,Indian Restaurant,Restaurant,Seafood Restaurant,Shopping Mall,Breakfast Spot,Cosmetics Shop
9,Pacific Mall,Markham,"4300 Steeles Ave E, Markham, ON",43.825488,-79.306128,0,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Optical Shop,Cosmetics Shop,Pet Store,Park,Miscellaneous Shop,Coffee Shop,Korean Restaurant,Dessert Shop


In [27]:
# create map
map_clusters = folium.Map(location=[latitude_markham, longitude_markham], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.gnuplot(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, neigh, cluster in zip(markham_clustered['LATITUDE'], markham_clustered['LONGITUDE'], markham_clustered['Mall/plaza'], markham_clustered['Cluster Labels']):
    label = folium.Popup(str(neigh) + '\n\n' + 'Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster],
        fill = True,
        fill_color = rainbow[cluster],
        fill_opacity = 1.0).add_to(map_clusters)  # opacity must lie between 0 and 1
       
map_clusters

In [28]:
# sort by cluster label
markham_clustered.sort_values(by = ['Cluster Labels', 'Mall/plaza'], axis = 0, ascending = True, inplace = True)
markham_clustered.reset_index(drop = True, inplace = True)
markham_clustered

Unnamed: 0,Mall/plaza,CITY,ADDRESS,LATITUDE,LONGITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Alderland Center,Markham,"7102-7158 Warden Ave, Markham, ON",43.824235,-79.326603,0,Chinese Restaurant,Pizza Place,Tea Room,Sushi Restaurant,Sandwich Place,Bubble Tea Shop,Gym / Fitness Center,Sporting Goods Shop,Dessert Shop,Fast Food Restaurant,Market,Coffee Shop,Ramen Restaurant,Korean Restaurant,Diner
1,First Markham Place,Markham,"3255 Highway 7 East, Markham, ON",43.851747,-79.348356,0,Chinese Restaurant,Coffee Shop,Korean Restaurant,Dessert Shop,Furniture / Home Store,Sandwich Place,Sushi Restaurant,Yoga Studio,Department Store,Burger Joint,Portuguese Restaurant,Mexican Restaurant,Fast Food Restaurant,Liquor Store,Japanese Restaurant
2,Langham Square,Markham,"8339 Kennedy Rd, Markham, ON",43.8574,-79.303767,0,Chinese Restaurant,Park,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Breakfast Spot,Massage Studio,Cosmetics Shop,Fast Food Restaurant,Caribbean Restaurant,Café,Restaurant,Sandwich Place,Shopping Mall,Seafood Restaurant
3,Market Village,Markham,"4390 Steeles Avenue East, Markham, ON",43.824568,-79.303737,0,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Coffee Shop,Pizza Place,Pet Store,Park,Gym,Miscellaneous Shop,Optical Shop
4,Metro Square,Markham,"3636 Steeles Ave E, Markham, ON",43.817998,-79.331931,0,Chinese Restaurant,Bubble Tea Shop,Coffee Shop,Shopping Mall,Fast Food Restaurant,Korean Restaurant,Tea Room,Sushi Restaurant,Supermarket,Pizza Place,Sandwich Place,Dessert Shop,Electronics Store,Park,Ramen Restaurant
5,New Kennedy Square,Markham,"8360 Kennedy Rd, Markham, ON",43.859604,-79.302949,0,Chinese Restaurant,Bubble Tea Shop,Dessert Shop,Park,Japanese Restaurant,Sandwich Place,Beer Store,Caribbean Restaurant,Café,Indian Restaurant,Restaurant,Seafood Restaurant,Shopping Mall,Breakfast Spot,Cosmetics Shop
6,Pacific Mall,Markham,"4300 Steeles Ave E, Markham, ON",43.825488,-79.306128,0,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Optical Shop,Cosmetics Shop,Pet Store,Park,Miscellaneous Shop,Coffee Shop,Korean Restaurant,Dessert Shop
7,Peachtree Center,Markham,"8392 Kennedy Rd, Markham, ON",43.861031,-79.305318,0,Chinese Restaurant,Dessert Shop,Bubble Tea Shop,Japanese Restaurant,Park,Fast Food Restaurant,Gastropub,Massage Studio,Mediterranean Restaurant,Coffee Shop,Italian Restaurant,New American Restaurant,Indian Restaurant,Ice Cream Shop,Video Store
8,The Shops on Steeles and 404,Markham,"2900 Steeles Ave E, Markham, ON",43.813388,-79.357184,0,Chinese Restaurant,Coffee Shop,Bakery,Park,Warehouse Store,Hotel,Diner,Mediterranean Restaurant,Italian Restaurant,Caribbean Restaurant,Sushi Restaurant,Market,Karaoke Bar,Pub,Food & Drink Shop
9,Thornhill Square Shopping Center,Markham,"300 John St, Markham, ON",43.819444,-79.397679,1,Pizza Place,Japanese Restaurant,Breakfast Spot,Coffee Shop,Food & Drink Shop,Sandwich Place,Greek Restaurant,Supermarket,Café,Park,Gym / Fitness Center,Italian Restaurant,Bagel Shop,Halal Restaurant,Electronics Store


We notice that Chinese restaurants do not appear among the top 15 venue categories at label 1, and they are not very common at label 3. \
Although Chinese restaurants rank among the most common venues at labels 2 and 4, other cuisines are not common there. \
Therefore, we keep only label 0 as our final candidates. Label 0 has a variety of cuisines as well as dessert places, and other venues such as stores, gyms and parks are also common.
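
The comparison of labels 1 to 4 can be checked by looking up where "Chinese Restaurant" falls in each plaza's top-15 list. The sketch below uses the lists from the clustered tables in this notebook; the helper `rank_of` is a hypothetical name introduced only for illustration.

```python
# Top-15 lists copied from the clustered tables in this notebook (labels 1-4).
top_venues = {
    ("Thornhill Square Shopping Center", 1): [
        "Pizza Place", "Japanese Restaurant", "Breakfast Spot", "Coffee Shop",
        "Food & Drink Shop", "Sandwich Place", "Greek Restaurant", "Supermarket",
        "Café", "Park", "Gym / Fitness Center", "Italian Restaurant",
        "Bagel Shop", "Halal Restaurant", "Electronics Store",
    ],
    ("CF Markville", 2): [
        "Fast Food Restaurant", "Coffee Shop", "Chinese Restaurant", "Restaurant",
        "Clothing Store", "Pizza Place", "Cosmetics Shop", "Electronics Store",
        "Toy / Game Store", "Grocery Store", "Jewelry Store", "Health Food Store",
        "Gym", "Fried Chicken Joint", "Greek Restaurant",
    ],
    ("King Square Shopping Mall", 3): [
        "Japanese Restaurant", "Airport", "Bubble Tea Shop", "Wings Joint",
        "Gym / Fitness Center", "Grocery Store", "Greek Restaurant", "Gift Shop",
        "Field", "Coffee Shop", "Chinese Restaurant", "Sporting Goods Shop",
        "Pizza Place", "Hookah Bar", "Bakery",
    ],
    ("Markham Town Square", 4): [
        "Chinese Restaurant", "Grocery Store", "Bakery", "Shopping Mall",
        "Lake", "Pizza Place", "Pet Store", "Cosmetics Shop", "Dessert Shop",
        "Diner", "Lounge", "Liquor Store", "Electronics Store", "Pub", "Juice Bar",
    ],
}


def rank_of(category, ranked):
    """Return the 1-based rank of category, or None if it is not listed."""
    return ranked.index(category) + 1 if category in ranked else None


chinese_rank = {
    label: rank_of("Chinese Restaurant", ranked)
    for (_, label), ranked in top_venues.items()
}
print(chinese_rank)  # {1: None, 2: 3, 3: 11, 4: 1}
```

Ranks among categories with equal frequency are not meaningful on their own, so a low rank such as the one at label 3 should be read together with the underlying frequencies.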

In [29]:
final_candidates = markham_clustered[markham_clustered['Cluster Labels'] == 0].reset_index(drop = True)
final_candidates

Unnamed: 0,Mall/plaza,CITY,ADDRESS,LATITUDE,LONGITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Alderland Center,Markham,"7102-7158 Warden Ave, Markham, ON",43.824235,-79.326603,0,Chinese Restaurant,Pizza Place,Tea Room,Sushi Restaurant,Sandwich Place,Bubble Tea Shop,Gym / Fitness Center,Sporting Goods Shop,Dessert Shop,Fast Food Restaurant,Market,Coffee Shop,Ramen Restaurant,Korean Restaurant,Diner
1,First Markham Place,Markham,"3255 Highway 7 East, Markham, ON",43.851747,-79.348356,0,Chinese Restaurant,Coffee Shop,Korean Restaurant,Dessert Shop,Furniture / Home Store,Sandwich Place,Sushi Restaurant,Yoga Studio,Department Store,Burger Joint,Portuguese Restaurant,Mexican Restaurant,Fast Food Restaurant,Liquor Store,Japanese Restaurant
2,Langham Square,Markham,"8339 Kennedy Rd, Markham, ON",43.8574,-79.303767,0,Chinese Restaurant,Park,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Breakfast Spot,Massage Studio,Cosmetics Shop,Fast Food Restaurant,Caribbean Restaurant,Café,Restaurant,Sandwich Place,Shopping Mall,Seafood Restaurant
3,Market Village,Markham,"4390 Steeles Avenue East, Markham, ON",43.824568,-79.303737,0,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Coffee Shop,Pizza Place,Pet Store,Park,Gym,Miscellaneous Shop,Optical Shop
4,Metro Square,Markham,"3636 Steeles Ave E, Markham, ON",43.817998,-79.331931,0,Chinese Restaurant,Bubble Tea Shop,Coffee Shop,Shopping Mall,Fast Food Restaurant,Korean Restaurant,Tea Room,Sushi Restaurant,Supermarket,Pizza Place,Sandwich Place,Dessert Shop,Electronics Store,Park,Ramen Restaurant
5,New Kennedy Square,Markham,"8360 Kennedy Rd, Markham, ON",43.859604,-79.302949,0,Chinese Restaurant,Bubble Tea Shop,Dessert Shop,Park,Japanese Restaurant,Sandwich Place,Beer Store,Caribbean Restaurant,Café,Indian Restaurant,Restaurant,Seafood Restaurant,Shopping Mall,Breakfast Spot,Cosmetics Shop
6,Pacific Mall,Markham,"4300 Steeles Ave E, Markham, ON",43.825488,-79.306128,0,Chinese Restaurant,Bakery,Japanese Restaurant,Vietnamese Restaurant,Tea Room,Bubble Tea Shop,Snack Place,Optical Shop,Cosmetics Shop,Pet Store,Park,Miscellaneous Shop,Coffee Shop,Korean Restaurant,Dessert Shop
7,Peachtree Center,Markham,"8392 Kennedy Rd, Markham, ON",43.861031,-79.305318,0,Chinese Restaurant,Dessert Shop,Bubble Tea Shop,Japanese Restaurant,Park,Fast Food Restaurant,Gastropub,Massage Studio,Mediterranean Restaurant,Coffee Shop,Italian Restaurant,New American Restaurant,Indian Restaurant,Ice Cream Shop,Video Store
8,The Shops on Steeles and 404,Markham,"2900 Steeles Ave E, Markham, ON",43.813388,-79.357184,0,Chinese Restaurant,Coffee Shop,Bakery,Park,Warehouse Store,Hotel,Diner,Mediterranean Restaurant,Italian Restaurant,Caribbean Restaurant,Sushi Restaurant,Market,Karaoke Bar,Pub,Food & Drink Shop


In [30]:
# remove irrelevant columns
final_candidates = final_candidates[final_candidates.columns.tolist()[0:5]]
final_candidates

Unnamed: 0,Mall/plaza,CITY,ADDRESS,LATITUDE,LONGITUDE
0,Alderland Center,Markham,"7102-7158 Warden Ave, Markham, ON",43.824235,-79.326603
1,First Markham Place,Markham,"3255 Highway 7 East, Markham, ON",43.851747,-79.348356
2,Langham Square,Markham,"8339 Kennedy Rd, Markham, ON",43.8574,-79.303767
3,Market Village,Markham,"4390 Steeles Avenue East, Markham, ON",43.824568,-79.303737
4,Metro Square,Markham,"3636 Steeles Ave E, Markham, ON",43.817998,-79.331931
5,New Kennedy Square,Markham,"8360 Kennedy Rd, Markham, ON",43.859604,-79.302949
6,Pacific Mall,Markham,"4300 Steeles Ave E, Markham, ON",43.825488,-79.306128
7,Peachtree Center,Markham,"8392 Kennedy Rd, Markham, ON",43.861031,-79.305318
8,The Shops on Steeles and 404,Markham,"2900 Steeles Ave E, Markham, ON",43.813388,-79.357184


### 4. Analyze the Chinese restaurants at the potential locations

In [31]:
# get the Chinese restaurants located in our final candidates
chinese_cuisine = markham_venues_mod[markham_venues_mod['Venue Category'] == 'Chinese Restaurant'].reset_index(drop = True)

for place in markham_clustered[markham_clustered['Cluster Labels'] != 0]['Mall/plaza'].tolist():
    chinese_cuisine.drop(index = chinese_cuisine[chinese_cuisine['Mall/plaza'] == place].index, inplace = True)
    chinese_cuisine.reset_index(drop = True, inplace = True)
chinese_cuisine.head()

Unnamed: 0,Mall/plaza,Mall/plaza Latitude,Mall/plaza Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID
0,Alderland Center,43.824235,-79.326603,Yin Ji Chang Fen 銀記腸粉店,43.820277,-79.325142,Chinese Restaurant,58c1e6e801f4336e21f31b50
1,Alderland Center,43.824235,-79.326603,Asian Legend 味香村,43.818846,-79.331058,Chinese Restaurant,4babe7aaf964a520efd33ae3
2,Alderland Center,43.824235,-79.326603,九鼎香 Legendary Hot Pot,43.820787,-79.326439,Chinese Restaurant,54f2600d498e6f1f6dc1f8b7
3,Alderland Center,43.824235,-79.326603,Tai Ping Hsiang B.B.Q 太平香便當,43.818845,-79.330906,Chinese Restaurant,4bf03b783a002d7fe1da85a4
4,Alderland Center,43.824235,-79.326603,Keung's Delight 強記雞粥,43.821007,-79.326432,Chinese Restaurant,4b74e269f964a520bbf62de3


In [32]:
# restore each venue's original category from markham_venues (e.g. Szechuan Restaurant)
markham_venues.drop_duplicates(subset = 'Venue ID', inplace = True)
for idx, venue_id in enumerate(chinese_cuisine['Venue ID'].tolist()):
    chinese_cuisine.loc[idx, 'Venue Category'] = markham_venues[markham_venues['Venue ID'] == venue_id]['Venue Category'].values[0]
chinese_cuisine.head()

Unnamed: 0,Mall/plaza,Mall/plaza Latitude,Mall/plaza Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID
0,Alderland Center,43.824235,-79.326603,Yin Ji Chang Fen 銀記腸粉店,43.820277,-79.325142,Chinese Restaurant,58c1e6e801f4336e21f31b50
1,Alderland Center,43.824235,-79.326603,Asian Legend 味香村,43.818846,-79.331058,Chinese Restaurant,4babe7aaf964a520efd33ae3
2,Alderland Center,43.824235,-79.326603,九鼎香 Legendary Hot Pot,43.820787,-79.326439,Szechuan Restaurant,54f2600d498e6f1f6dc1f8b7
3,Alderland Center,43.824235,-79.326603,Tai Ping Hsiang B.B.Q 太平香便當,43.818845,-79.330906,Asian Restaurant,4bf03b783a002d7fe1da85a4
4,Alderland Center,43.824235,-79.326603,Keung's Delight 強記雞粥,43.821007,-79.326432,Hong Kong Restaurant,4b74e269f964a520bbf62de3


In [33]:
# group the cuisine by location and category
chinese_cuisine_count = chinese_cuisine[['Mall/plaza','Venue', 'Venue Category']].groupby(by = ['Mall/plaza', 'Venue Category']).count()
chinese_cuisine_count

Mall/plaza,Venue Category,Venue
Alderland Center,Asian Restaurant,1
Alderland Center,Chinese Restaurant,7
Alderland Center,Dim Sum Restaurant,1
Alderland Center,Dumpling Restaurant,1
Alderland Center,Hong Kong Restaurant,1
Alderland Center,Szechuan Restaurant,1
Alderland Center,Xinjiang Restaurant,1
First Markham Place,Asian Restaurant,1
First Markham Place,BBQ Joint,1
First Markham Place,Cha Chaan Teng,1


#### Create the final result dataframe

In [34]:
# columns: malls and plazas
mall_n_plaza = chinese_cuisine['Mall/plaza'].unique().tolist()
# index: cuisine categories
chinese_cuisine_type = chinese_cuisine['Venue Category'].unique().tolist()
# count matrix, initialized to zero
count = np.zeros((len(chinese_cuisine_type), len(mall_n_plaza)), dtype = int)

# fill in the number of venues for each (category, mall/plaza) pair
for col_idx, mnp in enumerate(mall_n_plaza):
    for category in chinese_cuisine_count.loc[mnp].index.values.tolist():
        count[chinese_cuisine_type.index(category), col_idx] = chinese_cuisine_count.loc[(mnp, category)].values[0]
final_count = pd.DataFrame(data = count, index = chinese_cuisine_type, columns = mall_n_plaza)
final_count

Unnamed: 0,Alderland Center,First Markham Place,Langham Square,Market Village,Metro Square,New Kennedy Square,Pacific Mall,Peachtree Center,The Shops on Steeles and 404
Chinese Restaurant,7,1,4,7,9,4,7,4,2
Szechuan Restaurant,1,0,0,0,1,0,0,0,0
Asian Restaurant,1,1,0,2,3,0,4,0,1
Hong Kong Restaurant,1,1,1,3,1,1,3,1,0
Dim Sum Restaurant,1,0,1,0,1,1,0,1,0
Xinjiang Restaurant,1,0,0,0,1,0,0,0,0
Dumpling Restaurant,1,0,0,1,1,0,1,0,0
BBQ Joint,0,1,0,0,1,0,0,0,1
Shanghai Restaurant,0,3,1,0,0,1,0,1,0
Taiwanese Restaurant,0,1,1,1,0,1,1,1,0
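
The same kind of category-by-plaza count table can be built without nested index lookups. The sketch below uses only the standard library and hypothetical records ("Plaza A", "Plaza B") in place of the rows of `chinese_cuisine`.

```python
from collections import Counter

# Hypothetical (plaza, category) records standing in for chinese_cuisine rows.
records = [
    ("Plaza A", "Chinese Restaurant"),
    ("Plaza A", "Chinese Restaurant"),
    ("Plaza A", "Szechuan Restaurant"),
    ("Plaza B", "Chinese Restaurant"),
    ("Plaza B", "Dim Sum Restaurant"),
]

pair_counts = Counter(records)
plazas = sorted({plaza for plaza, _ in records})
categories = sorted({category for _, category in records})

# Rows are categories, columns are plazas; a missing pair counts as zero.
table = {
    category: [pair_counts[(plaza, category)] for plaza in plazas]
    for category in categories
}

print("Category," + ",".join(plazas))
for category, counts in table.items():
    print(category + "," + ",".join(str(c) for c in counts))
```

In pandas, `pd.crosstab(chinese_cuisine['Venue Category'], chinese_cuisine['Mall/plaza'])` should give an equivalent table in one call, although the row and column order may differ from `final_count`.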


## Results <a name="results"></a>

In [35]:
final_candidates

Unnamed: 0,Mall/plaza,CITY,ADDRESS,LATITUDE,LONGITUDE
0,Alderland Center,Markham,"7102-7158 Warden Ave, Markham, ON",43.824235,-79.326603
1,First Markham Place,Markham,"3255 Highway 7 East, Markham, ON",43.851747,-79.348356
2,Langham Square,Markham,"8339 Kennedy Rd, Markham, ON",43.8574,-79.303767
3,Market Village,Markham,"4390 Steeles Avenue East, Markham, ON",43.824568,-79.303737
4,Metro Square,Markham,"3636 Steeles Ave E, Markham, ON",43.817998,-79.331931
5,New Kennedy Square,Markham,"8360 Kennedy Rd, Markham, ON",43.859604,-79.302949
6,Pacific Mall,Markham,"4300 Steeles Ave E, Markham, ON",43.825488,-79.306128
7,Peachtree Center,Markham,"8392 Kennedy Rd, Markham, ON",43.861031,-79.305318
8,The Shops on Steeles and 404,Markham,"2900 Steeles Ave E, Markham, ON",43.813388,-79.357184


In [36]:
final_count

Unnamed: 0,Alderland Center,First Markham Place,Langham Square,Market Village,Metro Square,New Kennedy Square,Pacific Mall,Peachtree Center,The Shops on Steeles and 404
Chinese Restaurant,7,1,4,7,9,4,7,4,2
Szechuan Restaurant,1,0,0,0,1,0,0,0,0
Asian Restaurant,1,1,0,2,3,0,4,0,1
Hong Kong Restaurant,1,1,1,3,1,1,3,1,0
Dim Sum Restaurant,1,0,1,0,1,1,0,1,0
Xinjiang Restaurant,1,0,0,0,1,0,0,0,0
Dumpling Restaurant,1,0,0,1,1,0,1,0,0
BBQ Joint,0,1,0,0,1,0,0,0,1
Shanghai Restaurant,0,3,1,0,0,1,0,1,0
Taiwanese Restaurant,0,1,1,1,0,1,1,1,0


We have determined a list of nine potential areas for the restaurant. The final decision will be made by our client depending on the availability of space and prices at these locations. We have also collected information on the types of Chinese cuisine served at these locations. Although some Foursquare data are inaccurate, and our client might need to check the menus at these restaurants, there is adequate information to decide on the type of Chinese food to serve.

## Discussion <a name="discussion"></a>

Our analysis shows that Markham has the largest Chinese community in York Region. With this large potential customer base, it is unsurprising that many types of Chinese restaurant have opened in the city. Using Foursquare search and clustering, we narrowed the search to nine candidate locations for the restaurant. As shown above, Chinese restaurant is the most common venue category at these locations. Although Chinese restaurants are common, the other restaurants nearby would help the business: customers who are initially attracted by those restaurants still bring foot traffic to the area. In addition, there are stores, parks and dessert places in the vicinity, which also attract customers. Lastly, each location is either a mall or a plaza close to major roads, so customers can easily get there by different means, and those who drive will find ample parking. Malls and plazas also have large commercial spaces, which is an important consideration when opening a restaurant.

Our analysis also shows that many different Chinese cuisines are served at these potential locations. Unfortunately, many restaurants are labelled simply as "Chinese Restaurant", and our client will have to look at the menus to determine the actual cuisines served. Nevertheless, our analysis provides insight into the cuisine our client should serve. For example, only two restaurants in the potential locations are labelled as serving Szechuan cuisine. Other types of Chinese cuisine are also uncommon in these areas. This means that there is an opportunity worth considering for our client.
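
As a rough check of which cuisines are least represented, the rows of `final_count` displayed in the Results section can be summed across the nine locations. This is a sketch under two assumptions: only the ten displayed rows are included, and a venue that lies within the search radius of more than one plaza may be counted more than once.

```python
# Counts copied from the rows of final_count displayed in the Results section
# (column order: the nine candidate malls and plazas).
final_count_rows = {
    "Chinese Restaurant": [7, 1, 4, 7, 9, 4, 7, 4, 2],
    "Szechuan Restaurant": [1, 0, 0, 0, 1, 0, 0, 0, 0],
    "Asian Restaurant": [1, 1, 0, 2, 3, 0, 4, 0, 1],
    "Hong Kong Restaurant": [1, 1, 1, 3, 1, 1, 3, 1, 0],
    "Dim Sum Restaurant": [1, 0, 1, 0, 1, 1, 0, 1, 0],
    "Xinjiang Restaurant": [1, 0, 0, 0, 1, 0, 0, 0, 0],
    "Dumpling Restaurant": [1, 0, 0, 1, 1, 0, 1, 0, 0],
    "BBQ Joint": [0, 1, 0, 0, 1, 0, 0, 0, 1],
    "Shanghai Restaurant": [0, 3, 1, 0, 0, 1, 0, 1, 0],
    "Taiwanese Restaurant": [0, 1, 1, 1, 0, 1, 1, 1, 0],
}

# Total listings per category, sorted from rarest to most common.
totals = {category: sum(counts) for category, counts in final_count_rows.items()}
rarest_first = sorted(totals.items(), key=lambda item: (item[1], item[0]))

for category, total in rarest_first:
    print(f"{category}: {total}")
```

Szechuan and Xinjiang restaurants come out rarest with two listings each, compared with 45 listings under the generic "Chinese Restaurant" label.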

The purpose of this analysis is only to provide information on areas that are close to residential areas and other venues such as parks. The recommended locations should therefore be considered only as a starting point for more detailed analysis, which could eventually result in a location that satisfies more specific requirements. Commercial spaces in condo buildings could also be worth considering, but they may come with more constraints, such as noise limits and smaller floor space. This is something for our client to consider.

## Conclusion <a name="conclusion"></a>

The purpose of this study is to identify locations that are suitable for opening a Chinese restaurant in York Region, Ontario, Canada. The results will aid our client in narrowing down the search for an optimal location. By evaluating the demographic data of York Region, we identified Markham as the city of interest. Using the locations of malls and plazas as the center points for the Foursquare search saved us the time of collecting neighborhood information in Markham, because malls and plazas are located close to residential areas. The Foursquare data provide an extensive collection of venues, and clustering based on these venues produces a list of potential locations. Furthermore, information on the types of Chinese cuisine was collected and can be used as a starting point for final exploration by our client.

The final decision on an optimal restaurant location will be made by our client based on specific requirements, such as finances. We have taken into consideration factors such as ease of parking and proximity to residential areas and parks. Our client should evaluate real estate availability, prices, and the social and economic dynamics of every potential location before making a final decision.