In [1]:
# The code was removed by Watson Studio for sharing.

# IBM Data Science Professional Certificate
# Applied Data Science Capstone

# Finding optimal locations to open restaurant/grocery businesses.

<hr/>

## 1. Introduction
An international grocery and restaurant chain is planning to open business locations in the city of Toronto. The chain wants to identify the locations with the greatest business potential and needs business intelligence to shape a strategy for establishing them.
In the Week 3 assignment we noted that the city of Toronto has 140 postal zip codes assigned to 103 different boroughs. This project analyzes population demographics, financial and household data for those neighborhoods and clusters them by similarity. It also identifies existing venues that create competition (e.g. restaurants, grocery stores) and nearby venues that add new business opportunities.


## 2. Data

### Data Sources

#### Source #1: City of Toronto’s Open Data Catalogue
URL: https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/
The data from the Open Data Catalogue is used to cluster neighborhoods by similar characteristics. This helps the business group neighborhoods when forming custom strategies for its targeted neighborhoods. The same data is also used to find the optimum business locations.

#### Source #2: Datasets used in the Week 3 Assignment, Neighborhood Segmentation and Clustering
This neighbourhood and postal code dataset was created by scraping the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. It is joined with the location dataset provided in the Week 3 assignment (https://cocl.us/Geospatial_data).

This source had some gaps when mapping city-designated neighbourhoods in Toronto to geographic coordinates. Therefore, a second web-scraping attempt was made across a number of Wikipedia pages, starting from https://en.wikipedia.org/wiki/List_of_city-designated_neighbourhoods_in_Toronto. The coordinates were extracted from each neighbourhood's Wikipedia page; for a few neighbourhoods they had to be entered manually after referring to other sources.

#### Source #3: Foursquare APIs location data
URL: https://developer.foursquare.com 
The Foursquare dataset is used to identify competing business locations in each neighborhood (e.g. grocery stores, restaurants) as well as venues that add new business opportunities (e.g. schools, offices, attractions, shopping malls).


### 2.1 Loading Datasets

In [2]:
import pandas as pd
import numpy as np

In [3]:
# The code was removed by Watson Studio for sharing.

#### Source1: Neighbourhood Profile Data

In [4]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Category,Topic,Data Source,Characteristic,City of Toronto,Agincourt North,Agincourt South-Malvern West,Alderwood,Annex,Banbury-Don Mills,...,Willowdale West,Willowridge-Martingrove-Richview,Woburn,Woodbine Corridor,Woodbine-Lumsden,Wychwood,Yonge-Eglinton,Yonge-St.Clair,York University Heights,Yorkdale-Glen Park
0,Neighbourhood Information,Neighbourhood Information,City of Toronto,Neighbourhood Number,,129,128,20,95,42,...,37,7,137,64,60,94,100,97,27,31
1,Neighbourhood Information,Neighbourhood Information,City of Toronto,TSNS2020 Designation,,No Designation,No Designation,No Designation,No Designation,No Designation,...,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,Emerging Neighbourhood
2,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2016",2731571.0,29113,23757,12054,30526,27695,...,16936,22156,53485,12541,7865,14349,11817,12528,27593,14804
3,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2011",2615060.0,30279,21988,11904,29177,26918,...,15004,21343,53350,11703,7826,13986,10578,11652,27713,14687
4,Population,Population and dwellings,Census Profile 98-316-X2016001,Population Change 2011-2016,0.045,-0.039,0.08,0.013,0.046,0.029,...,0.129,0.038,0.003,0.072,0.005,0.026,0.117,0.075,-0.004,0.008


#### Source2: Geo Location Data

In [5]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
5,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
6,M6A,North York,Lawrence Heights,43.718518,-79.464763


#### Location Data From Web Scraping

In [6]:
from bs4 import BeautifulSoup
import requests
import nltk
import re

In [7]:
designated_neighbourhoods = r'https://en.wikipedia.org/wiki/List_of_city-designated_neighbourhoods_in_Toronto'
source = requests.get(designated_neighbourhoods).text
soup = BeautifulSoup(source, 'lxml')
tableStr = soup.find('table').prettify() # Returns HTML table tag as a string
df = pd.read_html(tableStr, match='str', header=0) # Returns a list of dataframes built from the table tags in the input string. header=0 denotes that the first row contains the column labels.
NeighbourhoodInfo = df[0] # First dataframe in the list
NeighbourhoodInfo.columns = ['NeighbourhoodId', 'Neighbourhood', 'Former city/borough', 'Neighbourhoods covered', 'Map']
NeighbourhoodInfo.drop(columns=['Map'], inplace=True)
NeighbourhoodInfo.head()

Unnamed: 0,NeighbourhoodId,Neighbourhood,Former city/borough,Neighbourhoods covered
0,129,Agincourt North,Scarborough,Agincourt and Brimwood
1,128,Agincourt South-Malvern West,Scarborough,Agincourt and Malvern
2,20,Alderwood,Etobicoke,Alderwood
3,95,Annex,Old City of Toronto,The Annex and Seaton Village
4,42,Banbury-Don Mills,North York,Don Mills


In [8]:
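def GetCoordinates(Neigbourhood):
    # Download the neighbourhood's Wikipedia page and search its text for decimal coordinates
    targetPage = r'https://en.wikipedia.org/wiki/{}'.format(Neigbourhood)
    source = requests.get(targetPage).text
    soup = BeautifulSoup(source, 'lxml')
    text = soup.getText()
    # Escape the decimal point so that '.' matches only a literal dot
    coordinates = re.search(r'(\d{2}\.\d{3,8})°N (\d{2}\.\d{3,7})°W', text)
    if coordinates is None:
        Coordinate = (Neigbourhood, 0, 0) # Placeholder when no coordinates are found
    else:
        Coordinate = (Neigbourhood, coordinates.group(1), coordinates.group(2))
    return Coordinate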
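
The coordinate pattern can be tried offline on a sample string before any pages are downloaded. The sketch below is illustrative only: `parse_coordinates` is a hypothetical helper, not part of the notebook, and it assumes the page text contains decimal coordinates in the `43.7925°N 79.28389°W` form that the regular expression targets. It escapes the decimal point, converts the values to floats, and returns `None` instead of a `(0, 0)` placeholder when nothing matches.

```python
import re

# Hypothetical helper: same idea as GetCoordinates, but it works on text
# that has already been downloaded, so it can be tested without network access.
COORD_PATTERN = re.compile(r'(\d{2}\.\d{3,8})°N (\d{2}\.\d{3,7})°W')

def parse_coordinates(text):
    """Return (latitude, longitude) as floats, or None when no match is found."""
    match = COORD_PATTERN.search(text)
    if match is None:
        return None
    latitude = float(match.group(1))
    longitude = -float(match.group(2))  # western longitudes are negative
    return latitude, longitude

# Made-up sample text in the style of a Wikipedia coordinates line
sample = "Coordinates: 43°47′33″N 79°17′02″W / 43.7925°N 79.28389°W"
print(parse_coordinates(sample))
print(parse_coordinates("no coordinates here"))
```

Returning `None` keeps missing values distinguishable from a genuine coordinate, at the cost of an explicit check by the caller.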

In [9]:
NeighbourhoodInfo['out'] = NeighbourhoodInfo.Neighbourhood.apply(GetCoordinates)
NeighbourhoodInfo[['tag','Latitude','Longitude']] = pd.DataFrame(NeighbourhoodInfo.out.values.tolist(), index= NeighbourhoodInfo.index)
NeighbourhoodInfo.drop(columns=['out','tag'], inplace=True)
NeighbourhoodInfo[['Longitude', 'Latitude']] = NeighbourhoodInfo[['Longitude', 'Latitude']].astype('float')
NeighbourhoodInfo['Longitude']=-NeighbourhoodInfo['Longitude']
NeighbourhoodInfo.sort_values('NeighbourhoodId')
NeighbourhoodInfo.loc[NeighbourhoodInfo.Neighbourhood=='Eglinton East','NeighbourhoodId']=138 # Data Error in Wikipedia
NeighbourhoodInfo.head()

Unnamed: 0,NeighbourhoodId,Neighbourhood,Former city/borough,Neighbourhoods covered,Latitude,Longitude
0,129,Agincourt North,Scarborough,Agincourt and Brimwood,43.7925,-79.28389
1,128,Agincourt South-Malvern West,Scarborough,Agincourt and Malvern,0.0,-0.0
2,20,Alderwood,Etobicoke,Alderwood,0.0,-0.0
3,95,Annex,Old City of Toronto,The Annex and Seaton Village,0.0,-0.0
4,42,Banbury-Don Mills,North York,Don Mills,43.73722,-79.34333
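
Rows showing `0.0` and `-0.0` are the placeholders returned when no coordinates were found on a page. A minimal sketch of listing those neighbourhoods for manual entry follows; the toy frame copies three of the rows displayed above, and the names `scraped` and `needs_manual_entry` are illustrative, not taken from the notebook.

```python
import pandas as pd

# Toy frame mimicking the scraped output; values copied from the displayed rows.
scraped = pd.DataFrame({
    'NeighbourhoodId': [129, 128, 20],
    'Neighbourhood': ['Agincourt North', 'Agincourt South-Malvern West', 'Alderwood'],
    'Latitude': [43.7925, 0.0, 0.0],
    'Longitude': [-79.28389, -0.0, -0.0],
})

# A (0, 0) pair marks a page where the regular expression found nothing.
missing = scraped[(scraped['Latitude'] == 0) & (scraped['Longitude'] == 0)]
needs_manual_entry = missing['Neighbourhood'].tolist()
print(needs_manual_entry)
```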


In [10]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighbourhood,NeighbourhoodId,Designation,AlternateNeighbourhoodName,Latitude,Longitude,Former city/borough,Neighbourhoods covered
0,City of Toronto,0,,Toronto,43.7,-79.4,,
1,West Humber-Clairville,1,No Designation,West Humber-Clairville,43.742,-79.617,Etobicoke,
2,Mount Olive-Silverstone-Jamestown,2,NIA,Mount Olive-Silverstone-Jamestown,43.73972,-79.58028,Etobicoke,Smithfield
3,Thistletown-Beaumond Heights,3,NIA,Thistletown-Beaumond Heights,43.73722,-79.56528,Etobicoke,
4,Rexdale-Kipling,4,No Designation,Rexdale-Kipling,43.72194,-79.57194,Etobicoke,Rexdale


### 2.2 Data Preparation

In [11]:
# Unique Data Categories
source1.Category.unique()

array(['Neighbourhood Information', 'Population',
       'Families, households and marital status', 'Language', 'Income',
       'Immigration and citizenship', 'Aboriginal peoples',
       'Visible minority', 'Ethnic origin', 'Housing', 'Education',
       'Labour', 'Journey to work', 'Language of work', 'Mobility'], dtype=object)

In [12]:
# List of columns in the dataset 
source1.columns

Index(['Category', 'Topic', 'Data Source', 'Characteristic', 'City of Toronto',
       'Agincourt North', 'Agincourt South-Malvern West', 'Alderwood', 'Annex',
       'Banbury-Don Mills',
       ...
       'Willowdale West', 'Willowridge-Martingrove-Richview', 'Woburn',
       'Woodbine Corridor', 'Woodbine-Lumsden', 'Wychwood', 'Yonge-Eglinton',
       'Yonge-St.Clair', 'York University Heights', 'Yorkdale-Glen Park'],
      dtype='object', length=145)

#### Create Neighbourhood List

In [13]:
NeighbourhoodList = list(source1.columns[4:].values)
NeighbourhoodInformation = source1[source1.Category=='Neighbourhood Information'][NeighbourhoodList].transpose()
NeighbourhoodInformation = NeighbourhoodInformation.reset_index()
NeighbourhoodInformation.columns = ['Neighbourhood', 'NeighbourhoodId', 'TSNS2020Designation']
NeighbourhoodInformation.loc[NeighbourhoodInformation.Neighbourhood=='City of Toronto', 'NeighbourhoodId'] = 0
NeighbourhoodInformation['NeighbourhoodId'] = NeighbourhoodInformation['NeighbourhoodId'].astype('int')
NeighbourhoodInformation.head()

Unnamed: 0,Neighbourhood,NeighbourhoodId,TSNS2020Designation
0,City of Toronto,0,
1,Agincourt North,129,No Designation
2,Agincourt South-Malvern West,128,No Designation
3,Alderwood,20,No Designation
4,Annex,95,No Designation


In [14]:
NeighbourhoodInformation = NeighbourhoodInformation.merge(NeighbourhoodCoordinateInfo[['NeighbourhoodId',  'AlternateNeighbourhoodName', 'Latitude', 'Longitude', 'Former city/borough', 'Neighbourhoods covered']], on='NeighbourhoodId',how='left')
NeighbourhoodInformation.head()

Unnamed: 0,Neighbourhood,NeighbourhoodId,TSNS2020Designation,AlternateNeighbourhoodName,Latitude,Longitude,Former city/borough,Neighbourhoods covered
0,City of Toronto,0,,Toronto,43.7,-79.4,,
1,Agincourt North,129,No Designation,Agincourt North,43.7925,-79.28389,Scarborough,Agincourt and Brimwood
2,Agincourt South-Malvern West,128,No Designation,Agincourt,43.7925,-79.283889,Scarborough,Agincourt and Malvern
3,Alderwood,20,No Designation,Alderwood,43.6075,-79.54028,Etobicoke,Alderwood
4,Annex,95,No Designation,The Annex,43.67,-79.404,Old City of Toronto,The Annex and Seaton Village


### Map view

In [None]:
!conda install -c conda-forge folium=0.5.0 --yes # install folium. Uncomment and run if not installed

Fetching package metadata ...

In [None]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

#### Custom function to mark a set of location data on a Folium map

In [None]:
# Mark locations on map and returns a  "folium.folium.Map" object
def MarkLocationsOnMap(c_lat, c_lon, VenueData, kclusters, l_Latitude, l_Longitude, l_Name, l_Cluster, l_ClusterLabel, zoom= 10, color= None, map_clusters=None):
    
    if map_clusters is None:
        map_clusters = folium.Map(location=[c_lat, c_lon], zoom_start=zoom)

    # set color scheme for the clusters
    x = np.arange(kclusters)
    ys = [i+x+(i*x)**2 for i in range(kclusters)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
    markers_colors = []
    for lat, lon, poi, cluster, cluster_label in zip(VenueData[l_Latitude], VenueData[l_Longitude], VenueData[l_Name], VenueData[l_Cluster], VenueData[l_ClusterLabel]):
        label = folium.Popup(str(poi) + '  (' + str(cluster_label) + ')', parse_html=True)
        try:
            if color is None:
                colorx = rainbow[cluster-1]
                fill_colorx=rainbow[cluster-1]
            else:
                colorx = color
                fill_colorx = color
        except:
            colorx = '#000000'
            fill_colorx = '#FFFFFF'            
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=colorx,
            fill=True,
            fill_color=fill_colorx,
            fill_opacity=0.6).add_to(map_clusters)
    
    return map_clusters

In [None]:
# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]
kclusters = 140 # Number of neighbourhoods

NeighbourhoodInformation['Latitude'], NeighbourhoodInformation['Longitude'], NeighbourhoodInformation['Neighbourhood'], NeighbourhoodInformation['NeighbourhoodId']

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 15, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId>0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 15, color= None, map_clusters=map_clustersx)

#### Extracting Neighbourhood data

In [None]:
# List of data fields to extract
SourseDataFields = ['Topic', 'Characteristic'] + NeighbourhoodList

In [None]:
def ExtractData(Source, Category, ColumnsList=None):
    DataSet = Source[Source.Category==Category][SourseDataFields].transpose()
    DataSet.columns = DataSet.loc['Characteristic']
    DataSet = DataSet.drop(index=['Topic', 'Characteristic'])
    if ColumnsList is not None:
        DataSet = DataSet[ColumnsList]
    return DataSet
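
The filter, transpose and relabel steps in `ExtractData` can be followed on a small frame. This is a sketch under stated assumptions: `extract_category` is a hypothetical variant that takes the field list as an argument instead of reading a global, and the toy table contains only two neighbourhoods, with a placeholder row for the `Income` category.

```python
import pandas as pd

# Toy stand-in for the neighbourhood profile table.
# The population figures are copied from the rows displayed earlier; the Income row is a placeholder.
toy = pd.DataFrame({
    'Category': ['Population', 'Population', 'Income'],
    'Topic': ['Population and dwellings', 'Population and dwellings', 'Placeholder topic'],
    'Characteristic': ['Population, 2016', 'Population, 2011', 'Placeholder measure'],
    'Alderwood': [12054, 11904, 1],
    'Annex': [30526, 29177, 2],
})

def extract_category(source, category, fields):
    """Keep one category, then turn neighbourhoods into rows and characteristics into columns."""
    subset = source[source['Category'] == category][fields].transpose()
    subset.columns = subset.loc['Characteristic']
    return subset.drop(index=['Topic', 'Characteristic'])

population = extract_category(toy, 'Population', ['Topic', 'Characteristic', 'Alderwood', 'Annex'])
print(population)
```

After the transpose the values have `object` dtype, which is why a later numeric conversion is needed.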

##### Population

In [None]:
Population_col=['Population, 2016', 'Total private dwellings', 'Population density per square kilometre', 'Land area in square kilometres']
AgeCategory_col = ['Children (0-14 years)', 'Youth (15-24 years)', 'Working Age (25-54 years)', 'Pre-retirement (55-64 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']

ColumnsList = Population_col + AgeCategory_col
Population = ExtractData(source1, 'Population', ColumnsList)
Population.head()

##### Income

In [None]:
ColumnsList = ["  Number of total income recipients aged 15 years and over in private households", "Total income: Aggregate amount ($'000)", "  Average after-tax income of households in 2015 ($)"]
Income = ExtractData(source1, 'Income', ColumnsList)
Income.head()

##### EthnicOrigin

In [None]:
ColumnsList = ['  North American Aboriginal origins', '  Other North American origins', '  European origins', '  Latin; Central and South American origins', '  African origins', '  Asian origins', '  Oceania origins']
EthnicOrigin = ExtractData(source1, 'Ethnic origin', ColumnsList)
EthnicOrigin.head()

##### Household

In [None]:
ColumnsList = [' Average household size', '  Married or living common law', 'Persons living alone (total)']
Household = ExtractData(source1, 'Families, households and marital status', ColumnsList)
Household.head()

##### Labour

In [None]:
ColumnsList = ['Employment rate']
Labour = ExtractData(source1, 'Labour', ColumnsList)
Labour.head()

##### Safety

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
Safety = pd.read_excel(streaming_body_1,sheet_name ='RawData-Ref Period 2011', header=1)
Safety = Safety.set_index('Neighbourhood Id')[['Total Major Crime Incidents', 'Break & Enters', 'Robberies', 'Thefts']]
Safety.head()

##### Economics

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
Economics = pd.read_excel(streaming_body_2,sheet_name ='RawData-Ref Period 2011', header=1)
Economics = Economics.set_index('Neighbourhood Id')[['Businesses', 'Debt Risk Score', 'Home Prices', 'Local Employment']]
Economics.head()

##### Transportation

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
Transportation = pd.read_excel(streaming_body_3,sheet_name ='RawData-Ref Period 2011', header=0)
Transportation = Transportation.set_index('Neighbourhood Id')[['TTC Stops']]
Transportation.head()

### 2.3 Creating the Base Dataset for further analysis and modeling

In [None]:
DataCategories = ['Population', 'Income', 'EthnicOrigin', 'Household', 'Labour', 'Safety', 'Economics', 'Transportation']
PopulationVariables = list(Population.columns.values)
IncomeVariables = list(Income.columns.values)
EthnicOriginVariables = list(EthnicOrigin.columns.values)
HouseholdVariables = list(Household.columns.values)
LabourVariables = list(Labour.columns.values)
SafetyVariables = list(Safety.columns.values)
EconomicsVariables = list(Economics.columns.values)
TransportationVariables = list(Transportation.columns.values)

AllVariables = PopulationVariables+IncomeVariables+EthnicOriginVariables+HouseholdVariables+LabourVariables+SafetyVariables+EconomicsVariables+TransportationVariables

In [None]:
NeighbourhoodDataSet = NeighbourhoodInformation.merge(Population, left_on='Neighbourhood', right_index=True, how='left')
# Merge with rest of the data sets
NeighbourhoodDataSet = NeighbourhoodDataSet.merge(Income, left_on='Neighbourhood', right_index=True, how='left')
NeighbourhoodDataSet = NeighbourhoodDataSet.merge(EthnicOrigin, left_on='Neighbourhood', right_index=True, how='left')
NeighbourhoodDataSet = NeighbourhoodDataSet.merge(Household, left_on='Neighbourhood', right_index=True, how='left')
NeighbourhoodDataSet = NeighbourhoodDataSet.merge(Labour, left_on='Neighbourhood', right_index=True, how='left')

NeighbourhoodDataSet = NeighbourhoodDataSet.merge(Safety, left_on='NeighbourhoodId', right_index=True, how='left')
NeighbourhoodDataSet = NeighbourhoodDataSet.merge(Economics, left_on='NeighbourhoodId', right_index=True, how='left')
NeighbourhoodDataSet = NeighbourhoodDataSet.merge(Transportation, left_on='NeighbourhoodId', right_index=True, how='left')

In [None]:
#Convert all variables to numeric
NeighbourhoodDataSet[AllVariables] = NeighbourhoodDataSet[AllVariables].astype('float')
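
`astype('float')` raises an error if any cell holds text that is not a plain number. Whether the source files contain such cells is not shown here, so the following is only a defensive sketch: `to_numeric_frame` is a hypothetical helper and the sample values are made up. It strips thousands separators and turns anything unparseable into `NaN`.

```python
import pandas as pd

# Made-up values: one with a thousands separator, one placeholder string
raw = pd.DataFrame({'Population, 2016': ['29,113', '23757', 'n/a'],
                    'Employment rate': ['50.0', '53.2', '62.4']})

def to_numeric_frame(frame):
    """Strip thousands separators, then coerce; unparseable cells become NaN."""
    cleaned = frame.apply(lambda col: col.astype(str).str.replace(',', '', regex=False))
    return cleaned.apply(pd.to_numeric, errors='coerce')

numeric = to_numeric_frame(raw)
print(numeric.dtypes)
print(numeric.isna().sum())
```

Counting the `NaN` values afterwards shows how many cells were coerced, which is worth inspecting before clustering.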


In [None]:
# Construct Combined Fields
NeighbourhoodDataSet['BreakEntersRobberiesThefts'] = NeighbourhoodDataSet['Break & Enters']+NeighbourhoodDataSet['Robberies']+NeighbourhoodDataSet['Thefts']
NeighbourhoodDataSet['BuisnessTargetPopulation'] = NeighbourhoodDataSet['Youth (15-24 years)']+NeighbourhoodDataSet['Working Age (25-54 years)']+NeighbourhoodDataSet['Pre-retirement (55-64 years)']
NeighbourhoodDataSet['Diversity'] = (NeighbourhoodDataSet['\xa0 European origins']+NeighbourhoodDataSet['\xa0 Latin; Central and South American origins']+NeighbourhoodDataSet['\xa0 African origins']+NeighbourhoodDataSet['\xa0 Asian origins']+NeighbourhoodDataSet['\xa0 Oceania origins']) / (NeighbourhoodDataSet['\xa0 North American Aboriginal origins']+NeighbourhoodDataSet['\xa0 Other North American origins'])

CombinedVariables = ['BreakEntersRobberiesThefts', 'BuisnessTargetPopulation', 'Diversity']

In [None]:
# Separate city totals from neighbourhood data
CityDataSet = NeighbourhoodDataSet[NeighbourhoodDataSet.NeighbourhoodId==0]
NeighbourhoodDataSet = NeighbourhoodDataSet[NeighbourhoodDataSet.NeighbourhoodId>0]

In [None]:
NeighbourhoodDataSet.head()

In [None]:
CityDataSet.head()

#### Saving the dataset

In [None]:
# The code was removed by Watson Studio for sharing.

<hr/>

### 2.4 Exploring the area using Foursquare API (Source3: Venue dataset)

In [None]:
import json
import requests 
from pandas.io.json import json_normalize

#### Foursquare Credentials and URL

In [None]:
# The code was removed by Watson Studio for sharing.

#### API call returning the JSON response: GetVenues(latitude, longitude, radius, limit, query)
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
def GetCategories(categoriesJSON):
    try:
        category = json_normalize(categoriesJSON[0])
        category = category[category.primary==True]['shortName'].values[0]
    except:
        category = None
    return category
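
The same lookup can be written without pandas. The sketch below is an alternative, not the notebook's method: `primary_category` is a hypothetical name, the field names `primary` and `shortName` are the ones read by `GetCategories`, and the sample record is invented. Unlike `GetCategories`, which inspects only the first entry, it scans every entry in the list.

```python
def primary_category(categories):
    """Return the shortName of the first category flagged as primary, or None."""
    for category in categories or []:
        if category.get('primary'):
            return category.get('shortName')
    return None

# Invented record in the shape the code above expects
sample_categories = [
    {'name': 'Italian Restaurant', 'shortName': 'Italian', 'primary': True},
]
print(primary_category(sample_categories))
print(primary_category([]))
```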

In [None]:
def GetVenuesData(NeighbourhoodId, latitude, longitude, radius, limit, query):
    print('Working on Neighbourhood {}'.format(NeighbourhoodId))
    results = GetVenues(latitude, longitude, radius, limit, query)
    venues = results['response']['groups'][0]['items']
    VenuesDataSet = json_normalize(venues) # Convert JSON to a pandas dataframe
    try:
        VenuesDataSet['venue.category']=VenuesDataSet['venue.categories'].apply(GetCategories)
    except:
        VenuesDataSet['venue.category'] = None
    VenuesDataSet['NeighbourhoodId']=NeighbourhoodId
    VenuesDataSet['query']=query
    VenuesDataSet = VenuesDataSet.loc[:,VenuesDataSet.columns.isin(['NeighbourhoodId', 'query', 'venue.id', 'venue.name', 'venue.category', 'venue.location.lat', 'venue.location.lng', 'venue.location.postalCode', 'venue.photos.count'])]
    return VenuesDataSet

In [None]:
# Test (Change format when needed)
query = 'Restaurant'

Neighbourhood = 'Broadview North'
d = NeighbourhoodDataSet[NeighbourhoodDataSet.Neighbourhood==Neighbourhood][['Neighbourhood', 'NeighbourhoodId', 'Latitude', 'Longitude', 'Land area in square kilometres']].values[0]
limit = 20
NeighbourhoodId = d[1]
latitude =  d[2]
longitude = d[3]
radius = np.sqrt(d[4]/np.pi)*1000 # Radius in meters, approximated from the neighbourhood area
radius = int(np.ceil(radius/100)*100) 
print(NeighbourhoodId, latitude, longitude, radius, limit, query)
Restaurants = GetVenuesData(NeighbourhoodId, latitude, longitude, radius, limit, query)
Restaurants
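
The search radius in the test cell treats each neighbourhood as a circle with the same area, so that r = sqrt(A / π), and then rounds up to the next 100 m. A small sketch of that calculation as a reusable function follows; `search_radius_m` and its `step_m` parameter are hypothetical names introduced here.

```python
import math

def search_radius_m(area_km2, step_m=100):
    """Radius in meters of a circle with the given area, rounded up to a multiple of step_m."""
    radius_m = math.sqrt(area_km2 / math.pi) * 1000
    return int(math.ceil(radius_m / step_m) * step_m)

print(search_radius_m(math.pi))  # circle of radius 1 km
print(search_radius_m(2.0))
```

Rounding up rather than to the nearest step errs on the side of a slightly larger search area, so venues near the edge of a neighbourhood are less likely to be missed.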

#### Get data from venues of interest

##### Restaurant

##### Grocery Store

##### Fun

##### Shopping

##### Parking

##### Hotel

#### Combining Venue DataSets

#### Source3: Load pre extracted venue dataset
A dataset previously extracted with the code above was used in the data analysis.

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
VenueDataSet = source3

In [None]:
VenueTypes = VenueDataSet['query'].unique()
VenueDataSet['VenueType']=None

for n in range(len(VenueTypes)):
    VenueDataSet.loc[VenueDataSet['query']==VenueTypes[n],'VenueType']= n+1

#VenueDataSet.loc[:,'VenueType'] = n+1
VenueDataSet.head()

#### Mark Neighbourhood venues on the map

In [None]:
def MapNeighbourhood(SelectedNeighbourhoodId=None, SelectedNeighbourhoodName=None, NeighbourhoodDataSet=None, VenueDataSet=None, zoom= 10):
    if SelectedNeighbourhoodId != None:
        SelectedNeighbourhood = NeighbourhoodDataSet[NeighbourhoodDataSet.NeighbourhoodId== SelectedNeighbourhoodId]
    elif SelectedNeighbourhoodName != None:
        SelectedNeighbourhood = NeighbourhoodDataSet[NeighbourhoodDataSet.Neighbourhood== SelectedNeighbourhoodName]
    else:
        return None

    # Number of clusters
    kclusters = len(VenueDataSet['VenueType'].unique()) # Number of categories

    # Set map center to the selected neighbourhood
    c_lat = SelectedNeighbourhood['Latitude'].values[0]
    c_lon = SelectedNeighbourhood['Longitude'].values[0]

    # Create the map plot
    zoom= zoom
    map_clusters_x = MarkLocationsOnMap(c_lat, c_lon, SelectedNeighbourhood, len(SelectedNeighbourhood.index), l_Latitude='Latitude', l_Longitude='Longitude', l_Name='Neighbourhood', l_Cluster='NeighbourhoodId', l_ClusterLabel='NeighbourhoodId', zoom= zoom, color='#000000')
    MarkLocationsOnMap(c_lat, c_lon, VenueDataSet[VenueDataSet.NeighbourhoodId==SelectedNeighbourhood.NeighbourhoodId.values[0]], kclusters, l_Latitude='venue.location.lat', l_Longitude='venue.location.lng', l_Name='venue.name', l_Cluster='VenueType', l_ClusterLabel='venue.category', zoom= zoom, map_clusters=map_clusters_x)
    
    return map_clusters_x

In [None]:
MapNeighbourhood(SelectedNeighbourhoodId=None, SelectedNeighbourhoodName='University', NeighbourhoodDataSet=NeighbourhoodDataSet, VenueDataSet=VenueDataSet, zoom= 15)

####  Counts of venues of interest by neighbourhood

In [None]:
VenuesCount = pd.pivot_table(VenueDataSet, index='NeighbourhoodId', columns='query', aggfunc='count', values='venue.id', margins=True)
VenuesCount = VenuesCount.fillna(0).astype('int')
VenueCountsVariables = list(VenuesCount.columns.values[:-1]) #Remove 'All'
VenuesCount.sort_values('All', ascending=False).head()
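
To see what the pivot produces, here is the same call on a four-row toy table. The venue records are made up; only the column names `NeighbourhoodId`, `query` and `venue.id` follow the notebook.

```python
import pandas as pd

# Made-up venue records: three venues in neighbourhood 1, one in neighbourhood 2
venues = pd.DataFrame({
    'NeighbourhoodId': [1, 1, 1, 2],
    'query': ['Restaurant', 'Restaurant', 'Parking', 'Restaurant'],
    'venue.id': ['a', 'b', 'c', 'd'],
})

# One row per neighbourhood, one column per query; margins=True adds the 'All' totals
counts = pd.pivot_table(venues, index='NeighbourhoodId', columns='query',
                        values='venue.id', aggfunc='count', margins=True)
counts = counts.fillna(0).astype('int')
print(counts)
```

Combinations with no venues come out as `NaN` from the pivot, which is why the `fillna(0)` step precedes the integer conversion.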

In [None]:
VenuesCount['SupportiveVenues'] = VenuesCount['Parking'] + VenuesCount['Shopping'] + VenuesCount['Fun'] + VenuesCount['Hotel']
VenuesCount['CompetitiveVenues'] =  VenuesCount['Grocery Store'] + VenuesCount['Restaurant']

CombinedVariables = CombinedVariables+['SupportiveVenues', 'CompetitiveVenues']

In [None]:
NeighbourhoodDataSetToModel = NeighbourhoodDataSet.merge(VenuesCount[VenuesCount.index!='All'].drop(columns='All'), left_on='NeighbourhoodId', right_index=True, how='left')
NeighbourhoodDataSetToModel.head()

In [None]:
NeighbourhoodDataSetToModel['NeighbourhoodLabel'] = "["+ NeighbourhoodDataSetToModel['NeighbourhoodId'].astype('str')+"] "+NeighbourhoodDataSetToModel['Neighbourhood']
NeighbourhoodDataSetToModel.head()

<hr/>

## 3. Exploratory Data Analysis

In [None]:
NeighbourhoodDataSetToModel_stat=NeighbourhoodDataSetToModel.describe()
NeighbourhoodDataSetToModel_stat

In [None]:
import matplotlib.pyplot as plt

In [None]:
DataFields = ['Land area in square kilometres', 'Population, 2016', 'Employment rate', 'Businesses','Debt Risk Score', 'Home Prices', 'Local Employment', '  Number of total income recipients aged 15 years and over in private households', "Total income: Aggregate amount ($'000)",  '  Average after-tax income of households in 2015 ($)', ' Average household size', 'Total Major Crime Incidents', 'BreakEntersRobberiesThefts', 
'\xa0 North American Aboriginal origins',  '\xa0 Other North American origins', '\xa0 European origins', '\xa0 Latin; Central and South American origins', '\xa0 African origins', '\xa0 Asian origins', '\xa0 Oceania origins']
X=7
Y=3
FigX=25 
FigY=25


In [None]:
fig, axes = plt.subplots(X, Y,  figsize=(FigX, FigY))
plt.subplots_adjust(wspace=0.3, hspace=0.6)

for i in range(X):
    for j in range(Y):        
        n = Y*i+j
        try:
            NeighbourhoodDataSetToModel[DataFields[n]].plot(kind='hist', rot=90, ax=axes[i,j]); axes[i,j].set_title(DataFields[n]);
        except:
            pass

In [None]:
fig, axes = plt.subplots(X, Y,  figsize=(FigX, FigY))
plt.subplots_adjust(wspace=0.3, hspace=0.6)

for i in range(X):
    for j in range(Y):        
        n = Y*i+j
        try:
            NeighbourhoodDataSetToModel[DataFields[n]].plot(kind='kde', rot=90, ax=axes[i,j]); axes[i,j].set_title(DataFields[n]);
        except:
            pass

In [None]:
fig, axes = plt.subplots(X, Y,  figsize=(FigX, FigY))
plt.subplots_adjust(wspace=0.3, hspace=0.6)

for i in range(X):
    for j in range(Y):        
        n = Y*i+j
        try:
            NeighbourhoodDataSetToModel[DataFields[n]].plot(kind='box', rot=0, ax=axes[i,j]); axes[i,j].set_title(DataFields[n]);
        except:
            pass

In [None]:
fig, axes = plt.subplots(X, Y,  figsize=(FigX, FigY))
plt.subplots_adjust(wspace=0.3, hspace=0.6)

TopN = 14

for i in range(X):
    for j in range(Y):        
        n = Y*i+j
        try:
            NeighbourhoodDataSetToModel.sort_values(DataFields[n], ascending=False).head(TopN).plot(x='NeighbourhoodId', y=DataFields[n], kind='bar', legend=False, rot=90, ax=axes[i,j]); axes[i,j].set_title(DataFields[n]);
        except:
            pass

#### Distribution of Origins for selected neighbourhoods

In [None]:
Neighbourhoods = ['Bayview Village', 'Waterfront Communities-The Island', 'Niagara', 'Banbury-Don Mills', 'Willowdale East'] + ['Oakridge', 'Elms-Old Rexdale', 'Beechborough-Greenbrook', 'Blake-Jones', 'Rustic']

X=2
Y=5
FigX = 20
FigY = 10

fig, axes = plt.subplots(X, Y,  figsize=(FigX, FigY))
plt.subplots_adjust(wspace=0.3, hspace=1.8)

for i in range(X):
    for j in range(Y):        
        n = Y*i+j
        #print(i, j , n)
        SelectedNeighbourhood = NeighbourhoodDataSetToModel[NeighbourhoodDataSetToModel.Neighbourhood==Neighbourhoods[n]][['\xa0 North American Aboriginal origins',  '\xa0 Other North American origins', '\xa0 European origins', '\xa0 Latin; Central and South American origins', '\xa0 African origins', '\xa0 Asian origins', '\xa0 Oceania origins']].transpose()
        SelectedNeighbourhood.plot(kind='bar', legend=False, rot=90, ax=axes[i,j]); axes[i,j].set_title(Neighbourhoods[n]);
        

#### Distribution of population by age for selected neighbourhoods

In [None]:
Neighbourhoods = ['Bayview Village', 'Waterfront Communities-The Island', 'Niagara', 'Banbury-Don Mills', 'Willowdale East'] + ['Oakridge', 'Elms-Old Rexdale', 'Beechborough-Greenbrook', 'Blake-Jones', 'Rustic']

X=2
Y=5
FigX = 20
FigY = 10

fig, axes = plt.subplots(X, Y,  figsize=(FigX, FigY))
plt.subplots_adjust(wspace=0.3, hspace=1.8)

for i in range(X):
    for j in range(Y):        
        n = Y*i+j
        #print(i, j , n)
        SelectedNeighbourhood = NeighbourhoodDataSetToModel[NeighbourhoodDataSetToModel.Neighbourhood==Neighbourhoods[n]][['Children (0-14 years)', 'Youth (15-24 years)', 'Working Age (25-54 years)', 'Pre-retirement (55-64 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']
].transpose()
        SelectedNeighbourhood.plot(kind='bar', legend=False, rot=90, ax=axes[i,j]); axes[i,j].set_title(Neighbourhoods[n]);
        

## 4. Clustering Neighbourhoods

In [None]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

### 4.1 Finding best number of clusters (k) using "Sum of squared distances" 
Reference: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html


In [None]:
from sklearn import metrics
from scipy.spatial.distance import cdist

In [None]:
def SumOfSquaredDistances(Dataset, VariableList, max_k = 10):
    ssqd=[]
    k_list = []
    for k in range(1, max_k):
        kmeans = KMeans(n_clusters=k, max_iter=1000).fit(Dataset[VariableList]) # Use the variable list passed to the function
        k_list.append(k)
        ssqd.append(kmeans.inertia_)
    k_list=np.array(k_list)
    ssqd=np.array(ssqd)
    return k_list, ssqd/np.max(ssqd)

In [None]:
plt.figure()
#Population
AllVariables = ['Population, 2016', 'BuisnessTargetPopulation', 'Diversity',  'Population density per square kilometre', 'Total private dwellings', ' Average household size'] 
k_list, ssqd = SumOfSquaredDistances(NeighbourhoodDataSetToModel, AllVariables)
plt.plot(k_list, ssqd, "o-", label="Population")

#Economy
AllVariables = ['Employment rate', 'Businesses','Debt Risk Score', 'Home Prices', 'Local Employment', '  Number of total income recipients aged 15 years and over in private households', "Total income: Aggregate amount ($'000)",  '  Average after-tax income of households in 2015 ($)'] 
k_list, ssqd = SumOfSquaredDistances(NeighbourhoodDataSetToModel, AllVariables)
plt.plot(k_list, ssqd, "o-", label="Economy")

#Locality
AllVariables = ['Population density per square kilometre', 'Land area in square kilometres', 'Businesses', 'TTC Stops', 'SupportiveVenues', 'CompetitiveVenues', 'Grocery Store', 'Restaurant', 'Diversity'] 
k_list, ssqd = SumOfSquaredDistances(NeighbourhoodDataSetToModel, AllVariables)
plt.plot(k_list, ssqd, "o-", label="Locality")
                                     
#Safety
AllVariables = [ 'Total Major Crime Incidents',  'Break & Enters',  'Robberies',  'Thefts', 'BreakEntersRobberiesThefts'] 
k_list, ssqd = SumOfSquaredDistances(NeighbourhoodDataSetToModel, AllVariables)
plt.plot(k_list, ssqd, "o-", label="Safety")
                                     
plt.xlabel("Number of clusters (k)")
plt.ylabel("Sum of squared distances (Normalized)")
plt.legend()
plt.show()

In [None]:
def ClusterNeighbourhoods(NeighbourhoodDataSetToModel, VariableList, ClusterColumn='Cluster Labels', kclusters=4):

    # run k-means clustering on the selected variables
    kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NeighbourhoodDataSetToModel[VariableList])

    # store the cluster label of each row in ClusterColumn (modifies the dataframe in place)
    NeighbourhoodDataSetToModel[ClusterColumn] = kmeans.labels_
   
    return NeighbourhoodDataSetToModel

##### Cluster by: All variables
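
Once cluster labels are stored in a column, a per-cluster summary can help interpret what each cluster represents. The following is a minimal sketch on a made-up DataFrame. The data values and the `Population` and `Businesses` column names are hypothetical and are not taken from the Toronto dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D", "E", "F"],
    "Population": [1000, 1100, 950, 9000, 9500, 8800],
    "Businesses": [10, 12, 9, 200, 220, 210],
})
features = ["Population", "Businesses"]

# Same pattern as ClusterNeighbourhoods: fit k-means and store the labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy[features])
toy["Cluster Labels"] = kmeans.labels_

# Mean of each feature and number of neighbourhoods per cluster.
profile = toy.groupby("Cluster Labels")[features].mean()
profile["Size"] = toy.groupby("Cluster Labels").size()
print(profile)
```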

In [None]:
# set number of clusters
kclusters = 4

#Variable list
AllVariables = PopulationVariables + IncomeVariables + EthnicOriginVariables + HouseholdVariables + LabourVariables + SafetyVariables + EconomicsVariables + TransportationVariables + VenueCountsVariables + CombinedVariables

NeighbourhoodDataSetToModel_x = ClusterNeighbourhoods(NeighbourhoodDataSetToModel, AllVariables, ClusterColumn='Cluster Labels', kclusters=kclusters)

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_x, kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'Cluster Labels', 'NeighbourhoodId', zoom= 10, color= None, map_clusters=map_clustersx)

#### Cluster 1 (Population)

In [None]:
kclusters = 4

#Variable list
AllVariables = ['Population, 2016', 'BuisnessTargetPopulation', 'Diversity',  'Population density per square kilometre', 'Total private dwellings', ' Average household size'] 

NeighbourhoodDataSetToModel_x = ClusterNeighbourhoods(NeighbourhoodDataSetToModel, AllVariables, ClusterColumn='Population_Cluster', kclusters=kclusters)

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_x, kclusters, 'Latitude', 'Longitude', 'NeighbourhoodLabel', 'Population_Cluster', 'Population_Cluster', zoom= 10, color= None, map_clusters=map_clustersx)

#### Cluster 2 (Economy)

In [None]:
kclusters = 4

#Variable list
AllVariables = ['Employment rate', 'Businesses','Debt Risk Score', 'Home Prices', 'Local Employment', '  Number of total income recipients aged 15 years and over in private households', "Total income: Aggregate amount ($'000)",  '  Average after-tax income of households in 2015 ($)'] 

NeighbourhoodDataSetToModel_x = ClusterNeighbourhoods(NeighbourhoodDataSetToModel, AllVariables, ClusterColumn='Economy_Cluster', kclusters=kclusters)

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_x, kclusters, 'Latitude', 'Longitude', 'NeighbourhoodLabel', 'Economy_Cluster', 'Economy_Cluster', zoom= 10, color= None, map_clusters=map_clustersx)

#### Cluster 3 (Locality)

In [None]:
kclusters = 4

#Variable list
AllVariables = ['Population density per square kilometre', 'Land area in square kilometres', 'Businesses', 'TTC Stops', 'SupportiveVenues', 'CompetitiveVenues', 'Grocery Store', 'Restaurant', 'Diversity'] 

NeighbourhoodDataSetToModel_x = ClusterNeighbourhoods(NeighbourhoodDataSetToModel, AllVariables, ClusterColumn='Locality_Cluster', kclusters=kclusters)

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_x, kclusters, 'Latitude', 'Longitude', 'NeighbourhoodLabel', 'Locality_Cluster', 'Locality_Cluster', zoom= 10, color= None, map_clusters=map_clustersx)

#### Cluster 4 (Safety)

In [None]:
kclusters = 4

#Variable list
AllVariables = [ 'Total Major Crime Incidents',  'Break & Enters',  'Robberies',  'Thefts', 'BreakEntersRobberiesThefts'] 

NeighbourhoodDataSetToModel_x = ClusterNeighbourhoods(NeighbourhoodDataSetToModel, AllVariables, ClusterColumn='Safty_Cluster', kclusters=kclusters)

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_x, kclusters, 'Latitude', 'Longitude', 'NeighbourhoodLabel', 'Safty_Cluster', 'Safty_Cluster', zoom= 10, color= None, map_clusters=map_clustersx)

In [None]:
AllVariables

##### Cluster by: SafetyVariables + EconomicsVariables + TransportationVariables + VenueCountsVariables

In [None]:
kclusters = 4

AllVariables = SafetyVariables + EconomicsVariables + TransportationVariables + VenueCountsVariables

NeighbourhoodDataSetToModel_x = ClusterNeighbourhoods(NeighbourhoodDataSetToModel, AllVariables, ClusterColumn='Cluster Labels', kclusters=kclusters)

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_x, kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'Cluster Labels', 'NeighbourhoodId', zoom= 10, color= None, map_clusters=map_clustersx)

## 5. Rank Neighbourhoods
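
The scoring in this section ranks each variable separately, adds the ranks of the favourable variables, and subtracts twice the ranks of the unfavourable ones, so the lowest score receives overall rank 1. The toy example below sketches that arithmetic. The three neighbourhoods and the `Population` and `Crime` columns are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C"],
    "Population": [300, 200, 100],  # higher is better (positive variable)
    "Crime": [5, 20, 10],           # higher is worse (negative variable)
})
positive = ["Population"]
negative = ["Crime"]

# Rank 1 goes to the largest value of each variable (ascending=False).
for var in positive + negative:
    toy["Rank_" + var] = toy[var].rank(ascending=False).astype(int)

# Positive ranks add to the score; negative ranks subtract with weight 2.
toy["Score"] = (
    toy[["Rank_" + v for v in positive]].sum(axis=1)
    - 2 * toy[["Rank_" + v for v in negative]].sum(axis=1)
)

# The lowest score gets overall rank 1.
toy["Rank"] = toy["Score"].rank(ascending=True).astype(int)
print(toy.sort_values("Rank"))
```

Here neighbourhood A has the largest population and the least crime, so it receives the lowest score and overall rank 1.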

In [None]:
def RankDataSet(NeighbourhoodDataSetToModel, RankVariables):
    for var in RankVariables:
        NeighbourhoodDataSetToModel['Rank_{}'.format(var)] = NeighbourhoodDataSetToModel[var].rank(ascending=False).astype('int')
    return NeighbourhoodDataSetToModel

In [None]:
AllVariables = PopulationVariables + IncomeVariables + EthnicOriginVariables + HouseholdVariables + LabourVariables + SafetyVariables + EconomicsVariables + TransportationVariables + VenueCountsVariables + CombinedVariables
RankVariablesPositive = ['Population, 2016', 'BuisnessTargetPopulation', 'Diversity', 'SupportiveVenues', '  Average after-tax income of households in 2015 ($)', ' Average household size', 'Debt Risk Score',  'Employment rate', 'Local Employment', 'Businesses', '  Married or living common law', 'Persons living alone (total)']
RankVariablesNegative = ['CompetitiveVenues', 'Home Prices', 'BreakEntersRobberiesThefts', 'Total Major Crime Incidents']
RankVariablesUnsure = ['Grocery Store', 'Restaurant']
RankVariables = RankVariablesPositive + RankVariablesNegative + RankVariablesUnsure

In [None]:
NeighbourhoodDataSetToModel_Ranked = RankDataSet(NeighbourhoodDataSetToModel, RankVariables)

In [None]:
# Ranks of positive variables add to the score; ranks of negative variables subtract with double weight.
# The lowest score receives Rank 1 in the next cell.
NeighbourhoodDataSetToModel_Ranked['Score'] = NeighbourhoodDataSetToModel_Ranked[['Rank_{}'.format(var) for var in RankVariablesPositive]].sum(axis=1) - \
NeighbourhoodDataSetToModel_Ranked[['Rank_{}'.format(var) for var in RankVariablesNegative]].sum(axis=1)*2

In [None]:
#NeighbourhoodDataSetToModel_Ranked[['Neighbourhood', 'NeighbourhoodId', 'Population, 2016', 'Rank_Population, 2016']]
NeighbourhoodDataSetToModel_Ranked['Rank'] = NeighbourhoodDataSetToModel_Ranked['Score'].rank(ascending=True).astype('int')
NeighbourhoodDataSetToModel_Ranked[['Neighbourhood', 'NeighbourhoodId', 'Population, 2016', 'Score', 'Rank']].sort_values(by=['Rank']).head()

In [None]:
NeighbourhoodDataSetToModel_Ranked[['Neighbourhood', 'NeighbourhoodId', 'Population, 2016', 'Score', 'Rank']].sort_values(by=['Rank']).tail()

In [None]:
kclusters = 140

# Set map center to Toronto city center
c_lat = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Latitude'].values[0]
c_lon = NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0]['Longitude'].values[0]

map_clustersx= MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodInformation[NeighbourhoodInformation.NeighbourhoodId==0], kclusters, 'Latitude', 'Longitude', 'Neighbourhood', 'NeighbourhoodId', 'NeighbourhoodId', zoom= 10, color= '#000000', map_clusters=None)
MarkLocationsOnMap(c_lat, c_lon, NeighbourhoodDataSetToModel_Ranked, kclusters, 'Latitude', 'Longitude', 'NeighbourhoodLabel', 'Rank', 'Rank', zoom= 10, color= None, map_clusters=map_clustersx)

## 6. Explore High-Ranked Neighbourhoods

In [None]:
# Bayview Village
MapNeighbourhood(SelectedNeighbourhoodId=52, SelectedNeighbourhoodName=None, NeighbourhoodDataSet=NeighbourhoodDataSet, VenueDataSet=VenueDataSet, zoom= 15)

In [None]:
# Henry Farm
MapNeighbourhood(SelectedNeighbourhoodId=53, SelectedNeighbourhoodName=None, NeighbourhoodDataSet=NeighbourhoodDataSet, VenueDataSet=VenueDataSet, zoom= 15)

In [None]:
# Niagara
MapNeighbourhood(SelectedNeighbourhoodId=88, SelectedNeighbourhoodName=None, NeighbourhoodDataSet=NeighbourhoodDataSet, VenueDataSet=VenueDataSet, zoom= 15)

In [None]:
# Blake-Jones
MapNeighbourhood(SelectedNeighbourhoodId=69, SelectedNeighbourhoodName=None, NeighbourhoodDataSet=NeighbourhoodDataSet, VenueDataSet=VenueDataSet, zoom= 15)

In [None]:
NeighbourhoodDataSetToModel_Ranked[['Neighbourhood', 'NeighbourhoodId', 'Population, 2016', 'Rank','Population_Cluster',
       'Economy_Cluster', 'Locality_Cluster', 'Safty_Cluster',
       'Rank_Population, 2016', 'Rank_BuisnessTargetPopulation',
       'Rank_Diversity', 'Rank_SupportiveVenues',
       'Rank_  Average after-tax income of households in 2015 ($)',
       'Rank_ Average household size', 'Rank_Debt Risk Score',
       'Rank_Employment rate', 'Rank_Local Employment', 'Rank_Businesses',
       'Rank_  Married or living common law',
       'Rank_Persons living alone (total)', 'Rank_CompetitiveVenues',
       'Rank_Home Prices', 'Rank_BreakEntersRobberiesThefts',
       'Rank_Total Major Crime Incidents', 'Rank_Grocery Store',
       'Rank_Restaurant']].sort_values(by=['Rank'])

<hr/>

<i>This notebook was created by Sumudu Tennakoon <a href='https://github.com/sptennak/Coursera_Capstone'>[GitHub Link]</a> </i>

<i> Last Update: 2018-01-03 </i>

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/