<h1 align=center><font size = 5>Analyzing the neighborhoods in the city of Toronto</font></h1>
 

<h1 align=center><font size = 2>Bhuvanesh Selvakumar</font></h1>


### Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
<font size = 2>

1. [Introduction](#1)<br> 
    1.1. [Problem Statement](#1.1)<br>
    1.2. [Methodology](#1.2)<br><br>
    
2. [Data Mining & Wrangling](#2)<br>
    2.1. [Importing the necessary libraries for analysis](#2.1)<br>
    2.2. [Importing open-source data from Wikipedia on Toronto's neighborhoods and demography](#2.2)<br>
    2.3. [Import Toronto's demography dataset from Wikipedia, extract and clean the data](#2.3)<br>
    2.4. [Query location data from Foursquare API, and generate a map of Toronto using neighborhood dataset](#2.4)<br>
    2.5. [Add Latitude, Longitude and Address of the neighborhoods to the demography dataframe](#2.5)<br><br>

3. [Analyze the neighborhoods using Foursquare API](#3)<br>
    3.1. [Input user credentials to access Foursquare API](#3.1)<br>
    3.2. [Define a function to query, sort data from Categories list and transfer them into a dataframe](#3.2)<br>
    3.3. [Apply one-hot encoding](#3.3)<br>
    3.4. [Identify neighborhoods with active Tamil population and evaluate if they have any Indian restaurants](#3.4)<br>
    3.5. [Implement k-means clustering to develop neighborhood clusters with Indian restaurants, and detailed analysis](#3.5)<br><br>

4. [Conclusions and Recommendations](#4)<br>
    </font>
    </div>
    
    
   


### 1. Introduction <a class="anchor" id="1"></a>


> The Tamil community (native to Southern India, Sri Lanka and Malaysia) is an important part of the Indian diaspora in Canada and offers an ethnic cuisine that is distinct from mainstream Indian cuisine. Rice is the staple, and the cooking relies heavily on ground spices and root herbs. While many Indian restaurants in Canada include popular South Indian dishes on their menus, traditional recipes are often left out because of the small target population and their strong flavor and spice variations. However, with the growing Tamil population in Toronto, coupled with the community's interest in expanding its commercial and cultural outreach, the demand for, and the potential success of, a dedicated South Indian restaurant is very high.



#### 1.1. Problem Statement  <a class="anchor" id="1.1"></a>

> An investor is considering opening a South Indian restaurant in Toronto that offers authentic Tamil recipes alongside a popular Indian menu. The primary target customer base is the Indian diaspora in Canada, given its growing population and expectations in Toronto. The investor wants to identify neighborhoods where Tamil is the leading secondary ethnic group as potential locations for the restaurant, with limited or no competition from other Indian restaurants.


#### 1.2. Methodology  <a class="anchor" id="1.2"></a>

> 1. Data Mining - Data on Toronto's demography, neighborhoods and geo-locations were queried from the following websites:<br>
>       a. Neighborhoods: https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1008658788 <br>
>       b. Demography: https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods <br>
>       c. Activities: FourSquare API<br><br>
> The "read_html" function was used to mine data on Toronto's neighborhoods, demography and postal codes from Wikipedia. The postal code data was then passed to a geocoder to query the latitude and longitude for each postal code in Toronto. The Foursquare API is used to generate insights on the neighborhoods, which are presented on a Folium map for visualization. <br>

> 2. Data Cleaning - Sort and organize the queried raw data into a structured dataframe. Concatenate columns and segregate data as necessary and required. Create a color trend for the data in the dataframe based on ethnicity and neighborhood <br><br>
> The raw dataframe on Toronto's demography included many columns that were not needed for this project; these were excluded using the "drop" function. The dataframe also presented the secondary ethnicity of each neighborhood in the column "Ethnicity", where the ethnicity and its share of the population were printed as a single string (for example, "Bengali (18.1%)"). This limited the flexibility of using the data for further analysis, so the "split" and "strip" functions were used to extract the two parts and tabulate them in separate columns. Using the geocoder to query the latitude and longitude of the neighborhoods initially gave inaccurate results, as the output included places in the US and UK with similar neighborhood names. To localize the search to Toronto, a new column (Name-ccat) was created that lists each neighborhood name concatenated with the city name (x, Toronto). <br>

> 3. Data Visualization - Generate map of Toronto and visualize different neighborhoods including their activities, boroughs, ethnicities etc. <br><br>
> The neighborhoods are plotted on a map of Toronto using the "folium" library. To color-code the neighborhoods by secondary ethnicity and by suburb region (Markham, Scarborough etc.), the "rgb2hex" function was used to generate a color for each neighborhood under each criterion; the colors are stored in the columns [Color_ethnicity, Color_FM].

> 4. Data Processing - Apply filters to identify neighborhood clusters (one hot encoding, k-means clustering) with high frequency of Indian restaurants and Tamil population (data indexing)<br><br>
> The venue search feature of the Foursquare API was used to generate a list of venue categories for each neighborhood and to count the venues in each category. The output of this query is stored in the dataframe "Toronto_venues".

> 5. Data Analysis - Integrate tabular data (dataframes) and visual analysis (color-coded maps) to identify the neighborhoods with a high probability of success for a South Indian restaurant, and present a summary of recommendations. Neighborhoods with a high Tamil population and a low frequency of Indian restaurants are deemed ideal for investment. Rank the top 5 neighborhoods for opening the South Indian restaurant based on the ethnic composition of the neighborhoods.<br><br>
> Analysis of the results began with one-hot encoding using the "get_dummies" function, followed by averaging to obtain the mean frequency of each venue category in every neighborhood. The top 5 venue categories for each neighborhood were briefly reviewed to study what is popular there and to identify the neighborhoods with a high frequency of Indian restaurants. Next, filters were applied to identify neighborhoods with both a high frequency of Indian restaurants and a Tamil population. The search was then refined to identify neighborhoods that have a sizable Tamil population but no Indian restaurant, as these are considered the opportune neighborhoods for starting a new South Indian restaurant without local competition from other Indian restaurants. Next, k-means clustering was implemented using the frequency of Indian restaurants as the grouping criterion, organizing the data into 3 clusters. "Cluster 0" lists neighborhoods with a low-to-medium frequency (0.02 - 0.0667) of Indian restaurants, "Cluster 1" includes the high-frequency neighborhoods (0.1 - 0.2727), and "Cluster 2" contains neighborhoods with a low frequency (0 - 0.01786). Surprisingly, most of the Cluster 1 neighborhoods were concentrated in Central and South-Western Toronto, mostly around the University of Toronto, suggesting university students as the likely target population for the restaurants in these neighborhoods. The neighborhoods in the Eastern part of the city, while having a large resident Tamil population, have a low frequency of Indian restaurants. Together, the visual and tabular data help triangulate and recommend the following 5 neighborhoods for starting a South Indian restaurant: Rouge Hill, Malvern, Scarborough Village, Morningside, Scarborough City Centre.
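
The clustering step described in the methodology can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `KMeans` with three clusters applied to a single feature (the Indian-restaurant frequency). The frequency values and the `random_state` are made up for demonstration and are not the notebook's data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical Indian-restaurant frequencies for nine neighborhoods (illustrative only)
indian_freq = np.array([0.0, 0.0, 0.01, 0.02, 0.05, 0.06, 0.10, 0.20, 0.27])

# KMeans expects a 2-D array of shape (n_samples, n_features)
X = indian_freq.reshape(-1, 1)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Cluster numbering is arbitrary, so report each cluster's frequency range
for cluster in sorted(set(labels.tolist())):
    members = indian_freq[labels == cluster]
    print(f"Cluster {cluster}: {members.min():.4f} - {members.max():.4f} "
          f"({len(members)} neighborhoods)")
```

Because k-means assigns cluster numbers arbitrarily, it is the printed frequency range of each cluster, not its number, that tells which group is "low", "medium" or "high".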




### 2. Data Mining & Wrangling <a class="anchor" id="2"></a>

#### 2.1. Importing the necessary libraries for analysis <a class="anchor" id="2.1"></a>


In [1]:
# Importing the necessary libraries for the project

import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import requests
import random
import json
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
from sklearn import preprocessing
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors

#!conda install -c conda-forge geopy --yes
!pip install folium
!pip install geocoder

import folium




#### 2.2. Importing open-source data from Wikipedia on Toronto's neighborhoods and demography <a class="anchor" id="2.2"></a>

In [2]:
# Link to wikipedia neighborhoods
#url_nei = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
url_nei = "https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1008658788"
# read_html returns a list of dataframes; the first table on the page holds the postal codes
wiki_Toronto_nei = pd.read_html(url_nei)[0]
wiki_Toronto_nei

#Count the number of boroughs                               
wiki_Toronto_borough = pd.unique(wiki_Toronto_nei['Borough'])
wiki_Toronto_borough_count = wiki_Toronto_nei['Borough'].value_counts()
wiki_Toronto_borough_count

#Reorganize the dataframe and sort the rows by borough in ascending order
wiki_Toronto_nei = wiki_Toronto_nei.sort_values(by=['Borough'], ascending=True).reset_index(drop=True)
wiki_Toronto_nei.index.name = 'index'

#wiki_Toronto_nei
#wiki_Toronto_nei[wiki_Toronto_nei['Postal Code'] == 'M5A']


In [3]:
wiki_Toronto_nei.shape
print('There are {} rows and {} columns in the Toronto neighborhood dataframe'.format(wiki_Toronto_nei.shape[0],
                                                                                       wiki_Toronto_nei.shape[1]))

There are 180 rows and 3 columns in the Toronto neighborhood dataframe


In [4]:
#Get latitudes and longitudes from the postal codes and integrate them with the neighborhood dataframe

#!pip install pgeocode
import pgeocode
Toronto_geocoder = pgeocode.Nominatim('ca')

# Pass the postal codes as a plain list; the result keeps the input order
Toronto_boroughs_LL = Toronto_geocoder.query_postal_code(wiki_Toronto_nei['Postal Code'].tolist())[['postal_code',
                                                                                                    'latitude',
                                                                                                    'longitude']]
Toronto_boroughs_LL
wiki_Toronto_nei[['latitude','longitude']] = Toronto_boroughs_LL[['latitude','longitude']]
#wiki_Toronto_nei
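
The cell above attaches the coordinates by index position, which works as long as both dataframes share the same row order. A more defensive alternative is to join on the postal code itself. The sketch below assumes pandas' `merge` with a left join; the dataframes, borough labels and coordinates are toy values, not the notebook's data.

```python
import pandas as pd

# Toy neighborhood table (values are made up for illustration)
neighborhoods = pd.DataFrame({
    "Postal Code": ["M1A", "M2B", "M3C"],
    "Borough": ["X", "Y", "Z"],
})

# Coordinates arrive in a different order and with one code missing
coords = pd.DataFrame({
    "postal_code": ["M3C", "M1A"],
    "latitude": [43.3, 43.1],
    "longitude": [-79.3, -79.1],
})

# A left join keeps every neighborhood row and matches on the key, not the position
merged = (
    neighborhoods
    .merge(coords, how="left", left_on="Postal Code", right_on="postal_code")
    .drop(columns="postal_code")
)
print(merged)
```

Rows without a match keep `NaN` coordinates, so they can still be removed later with `dropna()`.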

#### 2.3. Import Toronto's demography dataset from Wikipedia, extract and clean the data <a class="anchor" id="2.3"></a>



In [5]:


#!pip install wikipedia #Uncomment this line for the first installation of the Wikipedia library
import wikipedia as wp
import pandas as pd
import re


def extract_ethnicity(row): 
    # Pull the number inside the parentheses, e.g. "Bengali (18.1%)" -> 18.1
    y = wiki_Toronto_dem['Ethnicity'][row]
    y = re.search(r'\(([^)]+)', y).group(1)
    y = float(y.strip('%'))

    return y


## Query data of Toronto demography from Wikipedia

url_dem = "https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods"

# Fetch the page once, then stack tables 1 through 5 into a single dataframe
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
dem_tables = pd.read_html(url_dem)
wiki_Toronto_dem = pd.concat(dem_tables[1:6], ignore_index=False)

wiki_Toronto_dem.rename(columns={"Second most common language (after English) by name" : "Ethnicity"}, inplace = True)
wiki_Toronto_dem.reset_index(inplace=True)

#----------------------------------------------------------------------------------------------------------------------


# Creating a new column and remove unused columns

wiki_Toronto_dem['Ethnicity Percentage (%)'] = ''

# Drop the unused columns once; no loop over the rows is needed
unused_columns = ['Map','Census Tracts','Second most common language (after English) by percentage']
if set(unused_columns).issubset(wiki_Toronto_dem.columns):
    wiki_Toronto_dem.drop(columns=unused_columns, inplace=True)
    
# Remove incomplete rows, discard the old 'index' column and renumber the rows from 0
wiki_Toronto_dem = wiki_Toronto_dem.dropna().drop(columns=['index']).reset_index(drop=True)
wiki_Toronto_dem

#-----------------------------------------------------------------------------------------------------------------------


# Extract the ethnicity percentage and save it in the 'Ethnicity Percentage (%)' column

out = []
for i in range(len(wiki_Toronto_dem['Ethnicity'])):
    out.append(extract_ethnicity(i))

wiki_Toronto_dem['Ethnicity Percentage (%)'] = out
# Take the text before the parenthesis and trim surrounding whitespace in one vectorized step
wiki_Toronto_dem['Ethnicity_new'] = wiki_Toronto_dem['Ethnicity'].str.split('(').str[0].str.strip()
        
wiki_Toronto_dem
#wiki_Toronto_dem['Ethnicity_new'].value_counts()


#------------------------------------------------------------------------------------------------------------------------

#Create a new column listing the neighborhood names concatenated with "Toronto" to make the address search easier

temp = wiki_Toronto_dem['Name'] + ', Toronto'
wiki_Toronto_dem.insert(1,'Name-ccat',temp)
wiki_Toronto_dem


#------------------------------------------------------------------------------------------------------------------------

#Clean the neighborhood names (rows 1 and 55): keep only the part before the '/'

pd.set_option('display.max_rows', 500)
wiki_Toronto_dem.loc[1, 'Name'] = wiki_Toronto_dem.loc[1, 'Name'].split('/')[0]
wiki_Toronto_dem.loc[55, 'Name'] = wiki_Toronto_dem.loc[55, 'Name'].split('/')[0]

wiki_Toronto_dem.head()


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Unnamed: 0,Name,Name-ccat,FM,Population,Land area (km2),Density (people/km2),% Change in Population since 2001,Average Income,Transit Commuting %,% Renters,Ethnicity,Ethnicity Percentage (%),Ethnicity_new
0,Crescent Town,"Crescent Town, Toronto",EY,8157,0.4,20393,-10.0,23021,24.5,20.3,Bengali (18.1%),18.1,Bengali
1,Governor's Bridge,"Governor's Bridge/Bennington Heights, Toronto",EY,2112,1.87,1129,4.0,129904,7.1,13.3,Polish (1.4%),1.4,Polish
2,Leaside,"Leaside, Toronto",EY,13876,2.81,4938,3.0,82670,9.7,10.5,Bulgarian (0.4%),0.4,Bulgarian
3,O'Connor–Parkview,"O'Connor–Parkview, Toronto",EY,17740,4.94,3591,-6.1,33517,15.8,19.4,Urdu (3.2%),3.2,Urdu
4,Old East York,"Old East York, Toronto",EY,52220,7.94,6577,-4.6,33172,22.0,19.1,Greek (4.3%),4.3,Greek
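
As the sample rows show, the "Ethnicity" strings follow the pattern `name (percentage%)`. A single regular expression can split both parts at once. The sketch below uses only the standard library; the helper name `parse_ethnicity` is hypothetical and not part of the notebook.

```python
import re

# Matches strings such as "Bengali (18.1%)": a name, then a percentage in parentheses
ETHNICITY_PATTERN = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<pct>\d+(?:\.\d+)?)\s*%\)\s*$")

def parse_ethnicity(text):
    """Return (name, percentage), or (None, None) if the text does not match."""
    match = ETHNICITY_PATTERN.match(text)
    if match is None:
        return None, None
    return match.group("name"), float(match.group("pct"))

print(parse_ethnicity("Bengali (18.1%)"))
print(parse_ethnicity("Polish (1.4%)"))
print(parse_ethnicity("no percentage here"))
```

Returning `(None, None)` for unexpected strings avoids the `AttributeError` that a bare `re.search(...).group(1)` raises when nothing matches.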


#### 2.4. Query location data from Foursquare API, and generate a map of Toronto using neighborhood dataset <a class="anchor" id="2.4"></a>



In [6]:

# Use Nominatim function to generate map of Toronto

address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="Toronto_explorer")
Toronto_location = geolocator.geocode(address, timeout = None)
Toronto_latitude = Toronto_location.latitude
Toronto_longitude = Toronto_location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(Toronto_latitude, Toronto_longitude))


# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=11)

wiki_Toronto_nei = wiki_Toronto_nei.dropna()


# add markers to map
for lat, lng, label in zip(wiki_Toronto_nei['latitude'], wiki_Toronto_nei['longitude'], wiki_Toronto_nei['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### 2.5. Add Latitude, Longitude and Address of the neighborhoods to the demography dataframe <a class="anchor" id="2.5"></a>

In [7]:

# Adding address to the Dataframe 

Toronto_locator =  Nominatim(user_agent="FourSquare_Toronto")
#Toronto_locator.geocode(Toronto_dem_LL[i])
wiki_Toronto_dem['Address'] = ''
wiki_Toronto_dem['Latitude'] = ''
wiki_Toronto_dem['Longitude'] = ''
    
for i in range(len(wiki_Toronto_dem['Name-ccat'])):
    # Geocode once per neighborhood and reuse the result
    location = Toronto_locator.geocode(wiki_Toronto_dem['Name-ccat'][i])
    if location is not None:
        wiki_Toronto_dem.loc[i, 'Address'] = location[0]
    else:
        # Mark neighborhoods that could not be geocoded
        wiki_Toronto_dem.loc[i, 'Address'] = 0

#wiki_Toronto_dem



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [8]:
    
#Making a temporary copy of the Dataframe for ease of testing the code
#Keep only the neighborhoods that were geocoded successfully

x = wiki_Toronto_dem[wiki_Toronto_dem['Address'] != 0].copy()
x.reset_index(drop=True, inplace=True)
x

def find_LL(row):
    # Geocode once and return the (latitude, longitude) pair
    location = Toronto_locator.geocode(x['Name-ccat'][row], timeout=None)
    return location[1][0], location[1][1]


for i in range(len(x['Name'])):
    latitude, longitude = find_LL(i)
    x.loc[i, 'Latitude'] = latitude
    x.loc[i, 'Longitude'] = longitude

wiki_Toronto_dem = x



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)
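
Each geocoding call is a network request, so it helps to look up every neighborhood exactly once and keep the address and coordinates together. The sketch below shows one way to do that. The helper `geocode_neighborhoods` is hypothetical, and `fake_geocode` is a stand-in so the example runs offline; in the notebook one would pass `Toronto_locator.geocode` instead.

```python
from collections import namedtuple

def geocode_neighborhoods(names, geocode):
    """Geocode each name once; map it to (address, lat, lon), or None on a miss."""
    results = {}
    for name in names:
        location = geocode(name)  # single lookup per name
        if location is None:
            results[name] = None
        else:
            results[name] = (location.address, location.latitude, location.longitude)
    return results

# Stand-in for a geopy Location, which exposes the same three attributes
FakeLocation = namedtuple("FakeLocation", ["address", "latitude", "longitude"])

def fake_geocode(query):
    # Offline substitute for Nominatim(...).geocode; knows one made-up place
    known = {
        "Somewhere, Toronto": FakeLocation("Somewhere, Toronto, Ontario, Canada", 43.7, -79.4),
    }
    return known.get(query)

lookup = geocode_neighborhoods(["Somewhere, Toronto", "Nowhere, Toronto"], fake_geocode)
print(lookup)
```

For real requests, geopy also provides a `RateLimiter` wrapper in `geopy.extra.rate_limiter` that can space out calls to the Nominatim service.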


In [9]:
# Color code the neighborhoods based on the leading secondary ethnicity

neigh_labels = wiki_Toronto_dem['Ethnicity_new'].unique()
color_array = cm.rainbow(np.linspace(0, 1, len(neigh_labels)))
color_array = [colors.rgb2hex(i) for i in color_array]
#color_array

for i in range(len(neigh_labels)):
    wiki_Toronto_dem.loc[wiki_Toronto_dem['Ethnicity_new'] == neigh_labels[i] , "Color_ethnicity"] = color_array[i]
    
#wiki_Toronto_dem


#-----------------------------------------------------------------------------------------------------------------


# Color code the neighborhoods based on the region code in the 'FM' column

neigh_labels = wiki_Toronto_dem['FM'].unique()
color_array = cm.rainbow(np.linspace(0, 1, len(neigh_labels)))
color_array = [colors.rgb2hex(i) for i in color_array]
#color_array

for i in range(len(neigh_labels)):
    wiki_Toronto_dem.loc[wiki_Toronto_dem['FM'] == neigh_labels[i] , "Color_FM"] = color_array[i]
    
#wiki_Toronto_dem
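
The color-coding idea above is "one distinct color per unique label". The sketch below illustrates it with the standard library only. It deliberately swaps matplotlib's `cm.rainbow` and `rgb2hex` for `colorsys`, so the exact colors will differ from the notebook's, and `label_colors` is a hypothetical helper name.

```python
import colorsys

def label_colors(labels):
    """Map each distinct label to a hex color spread evenly around the hue wheel."""
    unique = list(dict.fromkeys(labels))  # de-duplicate, keeping first-seen order
    palette = {}
    for i, label in enumerate(unique):
        hue = i / len(unique)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 0.9)
        palette[label] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette

# Region codes here are only sample labels
palette = label_colors(["EY", "E", "EY", "S"])
print(palette)
```

With a dictionary like this, a dataframe column can be colored in one step, for example `df['Color_FM'] = df['FM'].map(palette)`, instead of looping over the labels.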


In [10]:

# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=11)

wiki_Toronto_nei = wiki_Toronto_nei.dropna()


# add markers to map
for lat, lng, label, area, color_ethnicity, color_FM in zip(wiki_Toronto_dem['Latitude'],
                                                      wiki_Toronto_dem['Longitude'],
                                                      wiki_Toronto_dem['Ethnicity_new'],
                                                      wiki_Toronto_dem['Name'],                                                            
                                                      wiki_Toronto_dem['Color_ethnicity'],
                                                      wiki_Toronto_dem['Color_FM']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color_FM,
        fill=True,
        fill_color=color_FM,
        fill_opacity=1,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

### 3. Analyze the neighborhoods using Foursquare API <a class="anchor" id="3"></a>

#### 3.1. Input user credentials to access Foursquare API <a class="anchor" id="3.1"></a>

In [11]:

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted; never publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
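
API credentials are usually kept out of the notebook itself. One common pattern, sketched below, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions for illustration, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables (names are assumed)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a fake environment so the example runs anywhere
fake_env = {"FOURSQUARE_CLIENT_ID": "abcdefgh", "FOURSQUARE_CLIENT_SECRET": "12345678"}
client_id, client_secret = load_foursquare_credentials(fake_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read from the real environment.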


#### 3.2. Define a function to query, sort data from Categories list and transfer them into a dataframe <a class="anchor" id="3.2"></a>



In [12]:

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # Some result layouts nest the categories under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
#results['response']['groups'][0]['items']


In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)



# Query nearby venues for every neighborhood in the demography dataframe
Toronto_venues = getNearbyVenues(names=wiki_Toronto_dem['Name-ccat'],
                                   latitudes=wiki_Toronto_dem['Latitude'],
                                   longitudes=wiki_Toronto_dem['Longitude']
                                  )

#Toronto_venues

Crescent Town, Toronto
Governor's Bridge/Bennington Heights, Toronto
Leaside, Toronto
O'Connor–Parkview, Toronto
Old East York, Toronto
Thorncliffe Park, Toronto
Alderwood, Toronto
Centennial, Toronto
Clairville, Toronto
Eatonville, Toronto
Humber Heights, Toronto
Humberwood, Toronto
Humber Valley Village, Toronto
Islington – Six Points, Toronto
Kingsview Village, Toronto
Long Branch, Toronto
Markland Wood, Toronto
Mimico, Toronto
New Toronto, Toronto
Princess Gardens, Toronto
Agincourt, Toronto
Alexandra Park, Toronto
Allenby, Toronto
Amesbury, Toronto
Armour Heights, Toronto
Banbury, Toronto
Bathurst Manor, Toronto
Bay Street Corridor, Toronto
Bayview Village, Toronto
Bayview Woods – Steeles, Toronto
Bedford Park, Toronto
Bendale, Toronto
Birch Cliff, Toronto
Bloor West Village, Toronto
Bracondale Hill, Toronto
Branson, Toronto
Bridle Path, Toronto
Brockton, Toronto
Cabbagetown, Toronto
Caribou Park, Toronto
Carleton Village, Toronto
Casa Loma, Toronto
Chaplin Estates, Toronto
Christ
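
The function above assumes that every response contains at least one group and that every venue has at least one category. A more tolerant parser could look like the sketch below. It relies only on the response layout already used in `getNearbyVenues` (`response` → `groups` → `items` → `venue`). The helper name `extract_venues` and the sample payload are made up for illustration.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"] if categories else None,  # None when uncategorized
        ))
    return rows

# Made-up payload: one complete venue and one without categories
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 43.70, "lng": -79.40},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 43.71, "lng": -79.41},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Somewhere, Toronto", 43.7, -79.4)
print(rows)
print(extract_venues({}, "Nowhere, Toronto", 0.0, 0.0))
```

Rows built this way can be passed to `pd.DataFrame` exactly as the original list of tuples is.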

In [14]:
#print(Toronto_venues.shape)
Toronto_venues.groupby('Neighborhood').count()


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Agincourt, Toronto",12,12,12,12,12,12
"Alderwood, Toronto",7,7,7,7,7,7
"Alexandra Park, Toronto",100,100,100,100,100,100
"Allenby, Toronto",7,7,7,7,7,7
"Amesbury, Toronto",6,6,6,6,6,6
"Armour Heights, Toronto",3,3,3,3,3,3
"Banbury, Toronto",6,6,6,6,6,6
"Bathurst Manor, Toronto",4,4,4,4,4,4
"Bay Street Corridor, Toronto",100,100,100,100,100,100
"Bayview Village, Toronto",13,13,13,13,13,13


#### 3.3. Apply one-hot encoding <a class="anchor" id="3.3"></a>

In [15]:
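# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# drop the venue-category column named 'Neighborhood', if present, so it does not
# clash with the neighborhood label column added below
Toronto_onehot = Toronto_onehot.drop(['Neighborhood'], axis=1, errors='ignore')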

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 


# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Agincourt, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Alexandra Park, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01
3,"Allenby, Toronto",0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Amesbury, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Armour Heights, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Banbury, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Bathurst Manor, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Bay Street Corridor, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01
9,"Bayview Village, Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
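
The values in the grouped table are easier to interpret with a small example: averaging one-hot indicator columns within a neighborhood gives each category's share of that neighborhood's venues. The sketch below uses a toy dataframe with made-up neighborhoods and categories, and passes `dtype=float` to `get_dummies` so the indicators average cleanly.

```python
import pandas as pd

# Toy venue list (neighborhoods and categories are made up for illustration)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Indian Restaurant", "Park", "Park"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of each indicator column is the category's share of the venues
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

In this toy data, neighborhood "A" has four venues, two of which are cafes, so its "Cafe" frequency is 0.5. Frequencies in the real table read the same way.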


In [16]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

# Sort the venues in descending order 
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# ==================================================================================================================


num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Toronto_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
Toronto_neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']
for ind in np.arange(Toronto_grouped.shape[0]):
    Toronto_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

Toronto_neighborhoods_venues_sorted.head()

----Agincourt, Toronto----
                  venue  freq
0    Chinese Restaurant  0.25
1      Asian Restaurant  0.08
2          Intersection  0.08
3            Food Court  0.08
4  Cantonese Restaurant  0.08


----Alderwood, Toronto----
            venue  freq
0     Pizza Place  0.29
1             Gym  0.14
2      Playground  0.14
3             Pub  0.14
4  Sandwich Place  0.14


----Alexandra Park, Toronto----
                    venue  freq
0                     Bar  0.11
1  Furniture / Home Store  0.05
2                    Café  0.05
3    Caribbean Restaurant  0.04
4     Arts & Crafts Store  0.02


----Allenby, Toronto----
                  venue  freq
0         Big Box Store  0.14
1  Fast Food Restaurant  0.14
2    African Restaurant  0.14
3          Intersection  0.14
4     Fish & Chips Shop  0.14


----Amesbury, Toronto----
          venue  freq
0          Bank  0.17
1   Coffee Shop  0.17
2  Intersection  0.17
3          Park  0.17
4   Gas Station  0.17


----Armour Heights, Toron

                 venue  freq
0              Brewery  0.14
1            BBQ Joint  0.14
2  Sporting Goods Shop  0.07
3          Coffee Shop  0.07
4         Gourmet Shop  0.07


----Henry Farm, Toronto----
          venue  freq
0        Lawyer  0.25
1  Tennis Court  0.25
2          Park  0.25
3  Intersection  0.25
4    Non-Profit  0.00


----High Park North, Toronto----
            venue  freq
0            Park   0.4
1  Baseball Field   0.2
2    Tennis Court   0.2
3  Mattress Store   0.2
4             ATM   0.0


----Highland Creek, Toronto----
                  venue  freq
0           IT Services   0.5
1                   ATM   0.0
2  Other Great Outdoors   0.0
3       Organic Grocery   0.0
4          Optical Shop   0.0


----Hillcrest, Toronto----
                venue  freq
0  Italian Restaurant  0.07
1          Restaurant  0.05
2  Mexican Restaurant  0.05
3                Café  0.05
4    Sushi Restaurant  0.05


----Hoggs Hollow, Toronto----
                 venue  freq
0            

4   Train Station  0.17


----Scarborough Junction, Toronto----
            venue  freq
0      Restaurant  0.25
1  Sandwich Place  0.25
2   Train Station  0.25
3    Soccer Field  0.25
4             ATM  0.00


----Scarborough Village, Toronto----
                  venue  freq
0           Coffee Shop   0.2
1    Chinese Restaurant   0.1
2           Supermarket   0.1
3  Fast Food Restaurant   0.1
4         Shopping Mall   0.1


----Seaton Village, Toronto----
                           venue  freq
0                  Grocery Store  0.10
1                          Diner  0.07
2                    Coffee Shop  0.07
3  Vegetarian / Vegan Restaurant  0.07
4                           Café  0.07


----Silverthorn, Toronto----
                       venue  freq
0                     Bakery  0.14
1                Gas Station  0.07
2  Latin American Restaurant  0.07
3                Pizza Place  0.07
4             Discount Store  0.07


----Smithfield, Toronto----
            venue  freq
0     Coff

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Agincourt, Toronto",Chinese Restaurant,Korean Restaurant,Intersection,Cantonese Restaurant,Asian Restaurant,Train Station,Coffee Shop,Food Court,Vietnamese Restaurant,Hong Kong Restaurant
1,"Alderwood, Toronto",Pizza Place,Playground,Pub,Gym,Coffee Shop,Sandwich Place,Cuban Restaurant,Cupcake Shop,Dumpling Restaurant,Eastern European Restaurant
2,"Alexandra Park, Toronto",Bar,Furniture / Home Store,Café,Caribbean Restaurant,Boutique,Pizza Place,Italian Restaurant,Bakery,Liquor Store,Poutine Place
3,"Allenby, Toronto",Fish & Chips Shop,African Restaurant,Café,Fast Food Restaurant,Intersection,Big Box Store,Restaurant,Filipino Restaurant,Event Space,Falafel Restaurant
4,"Amesbury, Toronto",Athletics & Sports,Intersection,Park,Gas Station,Bank,Coffee Shop,Yoga Studio,Field,Ethiopian Restaurant,Event Space
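The per-neighborhood frequency listings and the ranked table above can be derived by counting venue categories within each neighborhood. The following is a minimal sketch using only the standard library; the `venues` pairs and the `top_venues` helper are hypothetical stand-ins, not the notebook's actual data or function.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs standing in for venue query results.
venues = [
    ("Alderwood, Toronto", "Pizza Place"),
    ("Alderwood, Toronto", "Pizza Place"),
    ("Alderwood, Toronto", "Gym"),
    ("Amesbury, Toronto", "Bank"),
    ("Amesbury, Toronto", "Park"),
    ("Amesbury, Toronto", "Park"),
    ("Amesbury, Toronto", "Park"),
]


def top_venues(pairs, n=2):
    """Return {neighborhood: [(category, relative frequency), ...]} for the n most common categories."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    result = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        result[neighborhood] = [
            (category, round(count / total, 2))
            for category, count in counter.most_common(n)
        ]
    return result


ranking = top_venues(venues)
for neighborhood, top in ranking.items():
    print(f"----{neighborhood}----")
    for category, freq in top:
        print(f"{category:<15}{freq}")
```

The relative frequency here is the share of a neighborhood's venues that fall into a category, which is what the mean of a one-hot encoded column represents.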


#### 3.4. Identify neighborhoods with active Tamil population and evaluate if they have any Indian restaurants <a class="anchor" id="3.4"></a>

In [17]:
#Narrow down on neighborhoods with active Indian restaurants

Toronto_neighborhoods_venues_sorted
len(Toronto_grouped[Toronto_grouped["Indian Restaurant"] > 0])
Indian_restaurant_nei = Toronto_grouped[["Neighborhood","Indian Restaurant"]]
x=wiki_Toronto_dem.sort_values(['Name'], ascending=True).reset_index()
x.drop(['index'], axis=1, inplace=True)
x

Indian_restaurant_nei
len(Indian_restaurant_nei)
len(x)

y = x[x['Name-ccat'].isin(Indian_restaurant_nei['Neighborhood'])].reset_index()
y.drop(['index'], axis=1, inplace=True)
y

z = pd.DataFrame(Indian_restaurant_nei['Neighborhood'])

z['Indian restaurant freq'] = Indian_restaurant_nei['Indian Restaurant']
z['Ethnicity'] = y['Ethnicity']
z['Ethnicity Percentage (%)'] = y['Ethnicity Percentage (%)']
z['Ethnicity_new'] = y['Ethnicity_new']
z['Latitude'] = y['Latitude']
z['Longitude'] = y['Longitude']
z['Color_ethnicity'] = y['Color_ethnicity']
z['Color_FM'] = y['Color_FM']
z

#z['Indian restaurant freq']
z1 = z[z['Ethnicity_new'] == 'Tamil'].sort_values(by = ['Ethnicity Percentage (%)'], ascending=True)
z1.reset_index(inplace=True)
z1.drop(['index'], axis=1, inplace=True)
z1

#Identify neighborhoods which don't have any Indian restaurants
z2 = z1[z1['Indian restaurant freq'] == 0].sort_values(by = ['Ethnicity Percentage (%)'], ascending=False)
z2.reset_index(inplace=True)
z2.drop(['index'], axis=1, inplace=True)
z2.head()
#z2.sort_values(by = ['Ethnicity Percentage (%)'], ascending=False)

Unnamed: 0,Neighborhood,Indian restaurant freq,Ethnicity,Ethnicity Percentage (%),Ethnicity_new,Latitude,Longitude,Color_ethnicity,Color_FM
0,"Rouge Hill, Toronto",0.0,Tamil (15.6%),15.6,Tamil,43.8049,-79.1658,#a6f89d,#4df3ce
1,"Malvern, Toronto",0.0,Tamil (12.2%),12.2,Tamil,43.8092,-79.2217,#a6f89d,#4df3ce
2,"Scarborough Village, Toronto",0.0,Tamil (11.4%),11.4,Tamil,43.7437,-79.2116,#a6f89d,#4df3ce
3,"Morningside, Toronto",0.0,Tamil (10.8%),10.8,Tamil,43.7826,-79.205,#a6f89d,#4df3ce
4,"Scarborough City Centre, Toronto",0.0,Tamil (10.3%),10.3,Tamil,43.717,-79.2547,#a6f89d,#4df3ce
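The cell above copies demography columns into `z` one at a time, which relies on both frames having the same row order and index. As a hedged alternative, the same selection could be expressed as a key-based merge. The miniature `freq` and `demo` frames below are assumptions for illustration (the Hillcrest frequency is made up); only the column names and the two Tamil percentages come from the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the restaurant-frequency and demography tables.
freq = pd.DataFrame({
    "Neighborhood": ["Malvern, Toronto", "Rouge Hill, Toronto", "Hillcrest, Toronto"],
    "Indian Restaurant": [0.0, 0.0, 0.05],
})
demo = pd.DataFrame({
    "Name-ccat": ["Rouge Hill, Toronto", "Malvern, Toronto"],
    "Ethnicity_new": ["Tamil", "Tamil"],
    "Ethnicity Percentage (%)": [15.6, 12.2],
})

# Join on the neighborhood name so row order and missing rows cannot misalign the columns.
merged = freq.merge(demo, left_on="Neighborhood", right_on="Name-ccat", how="inner")

# Tamil neighborhoods with no Indian restaurant, largest Tamil share first.
candidates = (
    merged[(merged["Ethnicity_new"] == "Tamil") & (merged["Indian Restaurant"] == 0)]
    .sort_values("Ethnicity Percentage (%)", ascending=False)
    .reset_index(drop=True)
)
print(candidates[["Neighborhood", "Ethnicity Percentage (%)"]])
```

With `how="inner"`, neighborhoods missing from either table are dropped explicitly rather than silently receiving values from a different row.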


In [18]:
#Generate a range of colors based on the % of Tamils in each of the above neighborhoods

'''import colorsys
N = 5
HSV_tuples = [(x*1.0/N, 0.5, 0.5) for x in range(N)]
RGB_tuples = map(lambda x: colorsys.hsv_to_rgb(*x), HSV_tuples)

HSV_tuples = [(0, 1-x/100, 0) for x in z2['Ethnicity Percentage (%)']]
RGB_tuples = map(lambda x: colorsys.hsv_to_rgb(*x), HSV_tuples)

print(HSV_tuples)

from sklearn import preprocessing
x_array = np.array(z2['Ethnicity Percentage (%)'])
x_array = x_array/100
x_array

normalized_arr = preprocessing.normalize([x_array])
print(normalized_arr)

x_array = ((x_array-min(x_array))/(max(x_array) - min(x_array)))
x_array
RGB_x_array = map(lambda x: colorsys.hsv_to_rgb(*x), x_array)
RGB_x_array

r1, g1, b1 = [1,0,0]
r2, g2, b2 = [0,1,0]
rdelta, gdelta, bdelta = (r2-r1)/steps, (g2-g1)/steps, (b2-b1)/steps
for step in len(z2['Ethnicity Percentage (%)']):
    r1 += rdelta
    g1 += gdelta
    b1 += bdelta
    output.append((r1, g1, b1))'''
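The disabled code above experiments with mapping each neighborhood's Tamil percentage to a color. A minimal runnable sketch of that idea, assuming a linear red-to-green interpolation over min-max normalized values, is shown here. The helper names `lerp_color` and `gradient` are hypothetical; the percentages are the five values listed in the table of section 3.4.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples (components in 0..1) and return a hex string."""
    rgb = [s + (e - s) * t for s, e in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


def gradient(values, start=(1.0, 0.0, 0.0), end=(0.0, 1.0, 0.0)):
    """Map each value to a color between start (lowest value) and end (highest value)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [lerp_color(start, end, (v - lo) / span) for v in values]


percentages = [15.6, 12.2, 11.4, 10.8, 10.3]
colors = gradient(percentages)
for pct, color in zip(percentages, colors):
    print(pct, color)
```

The resulting hex strings can be passed directly as the `color` and `fill_color` arguments of a folium marker.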
    

#### 3.5. Implement k-means clustering to develop neighborhood clusters with Indian restaurants, and detailed analysis <a class="anchor" id="3.5"></a>

In [19]:
#wiki_Toronto_dem
Toronto_grouped

# set number of clusters
kclusters = 3

#Indian_restaurant_nei_clustering = Indian_restaurant_nei.drop('Neighborhood', 1)
Indian_restaurant_nei_clustering = pd.DataFrame(z['Indian restaurant freq'])
Indian_restaurant_nei_clustering
Indian_restaurant_nei

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1)
kmeans.fit_transform(Indian_restaurant_nei_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100]

z['Cluster label'] = kmeans.labels_
Indian_restaurant_nei['Cluster label'] = kmeans.labels_
Indian_restaurant_nei

Indian_restaurant_nei_final = Indian_restaurant_nei.join(Toronto_venues.set_index("Neighborhood"), on="Neighborhood")

print(Indian_restaurant_nei_final.shape)
Indian_restaurant_nei_final.head()
#Indian_restaurant_nei_final['Cluster label']
#z[z['Cluster label'] == 2]

z.head()



(3527, 9)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
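The warning above is raised when a column is assigned to a frame that pandas considers a selection from another frame. One common way to avoid it is to take an explicit copy when creating the subset. The sketch below uses a hypothetical miniature frame and placeholder labels, not the notebook's data or real k-means output.

```python
import pandas as pd

# Hypothetical miniature of the one-hot-averaged frame.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Indian Restaurant": [0.0, 0.05, 0.2],
    "Park": [0.1, 0.0, 0.3],
})

# .copy() makes the selection an independent DataFrame, so adding a column is unambiguous.
subset = grouped[["Neighborhood", "Indian Restaurant"]].copy()
subset["Cluster label"] = [2, 0, 1]  # placeholder labels, not real k-means output

print(subset)
```

The original frame is left untouched, so later cells that read it are unaffected by the new column.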


Unnamed: 0,Neighborhood,Indian restaurant freq,Ethnicity,Ethnicity Percentage (%),Ethnicity_new,Latitude,Longitude,Color_ethnicity,Color_FM,Cluster label
0,"Agincourt, Toronto",0.0,Cantonese (19.3%),19.3,Cantonese,43.7854,-79.2785,#58f8c9,#4df3ce,2
1,"Alderwood, Toronto",0.0,Polish (6.2%),6.2,Polish,43.6017,-79.5452,#6e1cff,#1996f3,2
2,"Alexandra Park, Toronto",0.0,Cantonese (17.9%),17.9,Cantonese,43.6508,-79.4043,#58f8c9,#b2f396,2
3,"Allenby, Toronto",0.0,Russian (1.4%),1.4,Russian,43.7114,-79.5534,#6dfdbf,#b2f396,2
4,"Amesbury, Toronto",0.0,Spanish (6.1%),6.1,Spanish,43.7062,-79.4835,#08bee9,#ff964f,2
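Because the clustering uses a single feature (the Indian restaurant frequency), k-means effectively splits the number line into `kclusters` intervals. The following is a minimal pure-Python sketch of Lloyd's algorithm in one dimension to illustrate that behavior. It is not scikit-learn's implementation: the function `kmeans_1d`, its deterministic initialization, and the sample frequencies are all assumptions for illustration, and it expects `k >= 2`.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (labels, centroids). Requires k >= 2."""
    ordered = sorted(values)
    # Deterministic initialization: k evenly spaced order statistics.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for each value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical frequencies: none, moderate, and high concentration of Indian restaurants.
freqs = [0.0, 0.0, 0.01, 0.04, 0.05, 0.06, 0.20, 0.25]
labels, centroids = kmeans_1d(freqs, k=3)
print(labels)
print(centroids)
```

As with scikit-learn, the numeric label assigned to each group is arbitrary; only the grouping itself carries meaning.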


In [20]:
# Print map of Toronto displaying the neighborhood clusters based on the frequency of Indian restaurants

map_clusters = folium.Map(location=[Toronto_latitude, Toronto_longitude],zoom_start=14)

# add markers to the map
markers_colors={}
markers_colors[0] = 'red'
markers_colors[1] = 'blue'
markers_colors[2] = 'green'
markers_colors[3] = 'yellow'
markers_colors[4] = 'cyan'
markers_colors[5] = 'black'
for lat, lon, cluster, label in zip(z['Latitude'],
                             z['Longitude'],
                             z['Cluster label'],
                             z['Cluster label']):
    
    label = folium.Popup('Cluster no: {}'.format(str(label)), parse_html=True)
    folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup = label,
            color =markers_colors[cluster],
            fill = True,
            fill_color=markers_colors[cluster],
            fill_opacity=1
            ).add_to(map_clusters)
    
map_clusters

In [21]:
z[z['Cluster label'] == 2].max()

# min = 0.02, max = 0.066667 (cluster 0)
# min = 0.1, max = 0.272727 (cluster 1)
# min = 0, max = 0.0178571 (cluster 2)

Neighborhood                York University Heights, Toronto
Indian restaurant freq                             0.0178571
Ethnicity                                  Vietnamese (6.9%)
Ethnicity Percentage (%)                                31.4
Ethnicity_new                                     Vietnamese
Latitude                                             43.8232
Longitude                                           -79.1305
Color_ethnicity                                      #ffa95b
Color_FM                                             #ff964f
Cluster label                                              2
dtype: object
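Note that `DataFrame.max()` is evaluated per column, so the values in the output above do not necessarily come from the same row. The per-cluster ranges recorded in the comments can be obtained in one step with a group-by. This is a sketch under assumptions: the frame `df` is a hypothetical miniature that contains only the six boundary values noted in the comments.

```python
import pandas as pd

# Boundary frequencies taken from the comments above, paired with their cluster labels.
df = pd.DataFrame({
    "Indian restaurant freq": [0.0, 0.0178571, 0.02, 0.066667, 0.1, 0.272727],
    "Cluster label": [2, 2, 0, 0, 1, 1],
})

# One row per cluster with the smallest and largest frequency it contains.
ranges = df.groupby("Cluster label")["Indian restaurant freq"].agg(["min", "max"])
print(ranges)
```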

In [22]:
# Print map of Toronto displaying the neighborhood clusters based on boroughs and Tamil population demography. Mark neighborhoods with high prospects.


# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=11)

wiki_Toronto_nei = wiki_Toronto_nei.dropna()


# add markers to map
for lat, lng, label, area, color_ethnicity, color_FM in zip(wiki_Toronto_dem['Latitude'],
                                                      wiki_Toronto_dem['Longitude'],
                                                      wiki_Toronto_dem['Ethnicity_new'],
                                                      wiki_Toronto_dem['Name'],                                                            
                                                      wiki_Toronto_dem['Color_ethnicity'],
                                                      wiki_Toronto_dem['Color_FM']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color_FM,
        fill=True,
        fill_color=color_FM,
        fill_opacity=1,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto


z1
z2 = z1[z1['Indian restaurant freq'] == 0]
z2


# Overlay the Tamil neighborhoods that have no Indian restaurants (black outline)

for lat, lng, label, neighborhood in zip(z2['Latitude'],
                                         z2['Longitude'],
                                         z2['Ethnicity'],
                                         z2['Neighborhood']):
    label = folium.Popup('Location: {}, Ethnicity: {}'.format(neighborhood, label), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0).add_to(map_Toronto) 
     
    
map_Toronto


In [23]:
# First five rows of z2: Tamil neighborhoods without any Indian restaurant (z2 was rebuilt from z1 above, so rows are in ascending order of Tamil share)

z2.head()

Unnamed: 0,Neighborhood,Indian restaurant freq,Ethnicity,Ethnicity Percentage (%),Ethnicity_new,Latitude,Longitude,Color_ethnicity,Color_FM
0,"Cliffcrest, Toronto",0.0,Tamil (1.5%),1.5,Tamil,43.7218,-79.2362,#a6f89d,#4df3ce
2,"Bendale, Toronto",0.0,Tamil (3.7%),3.7,Tamil,43.7535,-79.2553,#a6f89d,#4df3ce
3,"Scarborough Junction, Toronto",0.0,Tamil (4.2%),4.2,Tamil,43.716,-79.2607,#a6f89d,#4df3ce
4,"Highland Creek, Toronto",0.0,Tamil (5.1%),5.1,Tamil,43.7901,-79.1733,#a6f89d,#4df3ce
7,"Scarborough City Centre, Toronto",0.0,Tamil (10.3%),10.3,Tamil,43.717,-79.2547,#a6f89d,#4df3ce


In [24]:
test = z.loc[z['Cluster label'] == 0]
test
wiki_Toronto_dem
Toronto_neighborhoods_venues_sorted
z

Toronto_merged = z.join(Toronto_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
Toronto_merged = Toronto_merged.dropna().reset_index(drop=True)
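After `dropna().reset_index(drop=True)`, the rows of `Toronto_merged` are renumbered and no longer share index labels with `z`. pandas aligns a boolean Series by index label, so a mask built from `z` may select unintended rows. A hedged sketch of filtering on the merged frame's own column follows; the frames `z_demo` and `ranks` are hypothetical miniatures.

```python
import pandas as pd

z_demo = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster label": [0, 2, 0, 1],
})
ranks = pd.DataFrame({
    "Neighborhood": ["A", "C", "D"],  # "B" has no venue ranking
    "1st Most Common Venue": ["Park", "Bank", "Pub"],
})

merged = z_demo.join(ranks.set_index("Neighborhood"), on="Neighborhood")
merged = merged.dropna().reset_index(drop=True)  # "B" is dropped, rows are renumbered

# Build the mask from merged itself so it shares merged's index.
cluster0 = merged.loc[
    merged["Cluster label"] == 0,
    ["Neighborhood", "1st Most Common Venue"],
]
print(cluster0)
```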


### Analyze Cluster #0

In [25]:
#Analyze Cluster #0:

test = Toronto_merged.loc[Toronto_merged['Cluster label'] == 0, Toronto_merged.columns[[1] + list(range(19, Toronto_merged.shape[1]))]]
test

#, z.columns[[1] + list(range(19, z.shape[1]))]\

#Find the top 3 places in each ranking list:

"""for column in test.columns[1:]:
    print('------------{}------------'.format(column))
    print(test[column].value_counts().head(3))
    print('\n')"""

#Extract columns 
x = test.iloc[:,1:]
x.mode().iloc[0,:]    

#Query the list of all venue categories
activities = Toronto_grouped.columns
activities

Cluster1_all = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False)
Cluster1_top10 = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False).head(10)
Cluster1_bottom10 = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False).tail(10)
Cluster1_top10


print('----------- CLUSTER#1 TOP 10 -----------')
#print('\n')
print(Cluster1_top10)
print('\n')


print('----------- CLUSTER#1 BOTTOM 10 -----------')
#print('\n')
print(Cluster1_bottom10)

# Count the frequency of occurrence of every venue category
Cluster1_all
y = pd.DataFrame(Cluster1_all,columns =['Freq'])
y.index.name = 'Event'
y

# Filter the popular cuisine (restaurants)
filter_restaurants = y[y.index.str.contains('Rest')]
filter_restaurants

# Filter the popular shops 
filter_shops = y[y.index.str.contains('Shop|Store')]  # regex alternation matches either keyword
filter_shops

filter_restaurants
#Cluster1_all
y


----------- CLUSTER#1 TOP 10 -----------
Indian Restaurant      3
Restaurant             2
Liquor Store           1
Japanese Restaurant    1
Supermarket            1
Bank                   1
Dessert Shop           1
Café                   1
Scenic Lookout         1
Beer Store             1
dtype: int64


----------- CLUSTER#1 BOTTOM 10 -----------
Scenic Lookout         1
Beer Store             1
Juice Bar              1
Pub                    1
Event Space            1
Museum                 1
Baby Store             1
Sushi Restaurant       1
Bus Stop               1
Fried Chicken Joint    1
dtype: int64


Event,Freq
Indian Restaurant,3
Restaurant,2
Liquor Store,1
Japanese Restaurant,1
Supermarket,1
Bank,1
Dessert Shop,1
Café,1
Scenic Lookout,1
Beer Store,1
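The frequency table above counts how often each venue category appears anywhere in the ranked venue columns of a cluster. An equivalent and arguably more direct formulation, shown here as a sketch on hypothetical data, flattens all rank columns with `stack()` before counting.

```python
import pandas as pd

# Hypothetical ranking columns for three neighborhoods.
ranks = pd.DataFrame({
    "1st Most Common Venue": ["Indian Restaurant", "Restaurant", "Indian Restaurant"],
    "2nd Most Common Venue": ["Café", "Indian Restaurant", "Bank"],
})

# stack() flattens every cell into one Series, so value_counts() tallies all rank columns at once.
counts = ranks.stack().value_counts()

# Same shape as the notebook's table: categories as the index, one 'Freq' column.
freq_table = counts.rename("Freq").to_frame()
freq_table.index.name = "Event"
print(freq_table)
```

Note that this tally treats a 1st-place and a 10th-place appearance equally, which is also true of the notebook's approach.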


#### Analyzing cluster #0, we learn that:
1. Most of the neighborhoods in Central/South-Western Toronto have a healthy concentration of Indian restaurants.
2. The majority of the Indian restaurants fall within this cluster and are located in the university district surrounding the University of Toronto.
3. The target population is possibly international students from India.

### Analyze Cluster #1

In [26]:


test = Toronto_merged.loc[Toronto_merged['Cluster label'] == 1, Toronto_merged.columns[[1] + list(range(19, Toronto_merged.shape[1]))]]
test

#Find the top 3 places in each ranking list
"""for column in test.columns[1:]:
    print('------------{}------------'.format(column))
    print(test[column].value_counts().head(3))
    print('\n')"""
    
    

#Extract columns 
x = test.iloc[:,1:]
x.mode().iloc[0,:]    

#Query the list of all venue categories
activities = Toronto_grouped.columns
activities

Cluster1_all = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False)
Cluster1_top10 = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False).head(10)
Cluster1_bottom10 = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False).tail(10)
Cluster1_top10


print('----------- CLUSTER#1 TOP 10 -----------')
#print('\n')
print(Cluster1_top10)
print('\n')


print('----------- CLUSTER#1 BOTTOM 10 -----------')
#print('\n')
print(Cluster1_bottom10)
print('\n')


# Count the frequency of occurrence of every venue category
Cluster1_all
y = pd.DataFrame(Cluster1_all,columns =['Freq'])
y.index.name = 'Event'
y

# Filter the popular cuisine (restaurants)
filter_restaurants = y[y.index.str.contains('Rest')]
filter_restaurants

# Filter the popular shops 
filter_shops = y[y.index.str.contains('Shop|Store')]  # regex alternation matches either keyword
#filter_shops

filter_restaurants
#Cluster1_all


----------- CLUSTER#1 TOP 10 -----------
Electronics Store       2
Beer Store              1
Food & Drink Shop       1
Ethiopian Restaurant    1
Restaurant              1
Discount Store          1
dtype: int64


----------- CLUSTER#1 BOTTOM 10 -----------
Electronics Store       2
Beer Store              1
Food & Drink Shop       1
Ethiopian Restaurant    1
Restaurant              1
Discount Store          1
dtype: int64




Event,Freq
Ethiopian Restaurant,1
Restaurant,1


#### Analyzing cluster #1, we learn that:
1. The blue markers represent the neighborhoods with the highest frequency of Indian restaurants and are sparsely distributed across the city.
2. Contrary to expectation, only 2 high-frequency markers (Scarborough East, Eglinton) are in the eastern part of the city, which has the largest Tamil population.
3. It can likely be inferred that the existing restaurants primarily target the larger population base, which may be more interested in popular Indian recipes than in traditional South Indian recipes.

### Analyze Cluster #2


In [27]:
#Analyze Cluster #2

test = Toronto_merged.loc[Toronto_merged['Cluster label'] == 2, Toronto_merged.columns[[1] + list(range(19, Toronto_merged.shape[1]))]]
test

#Find the top 3 places in each ranking list
"""for column in test.columns[1:]:
    print('------------{}------------'.format(column))
    print(test[column].value_counts().head(3))
    print('\n')"""
    
    

#Extract columns 
x = test.iloc[:,1:]
x.mode().iloc[0,:]    

#Query the list of all venue categories
activities = Toronto_grouped.columns
activities

Cluster1_all = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False)
Cluster1_top10 = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False).head(10)
Cluster1_bottom10 = x.apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False).tail(10)
Cluster1_top10


print('----------- CLUSTER#1 TOP 10 -----------')
#print('\n')
print(Cluster1_top10)
#print('\n')


print('----------- CLUSTER#1 BOTTOM 10 -----------')
#print('\n')
print(Cluster1_bottom10)

# Count the frequency of occurrence of every venue category
Cluster1_all
y = pd.DataFrame(Cluster1_all,columns =['Freq'])
y.index.name = 'Event'
y

# Filter the popular cuisine (restaurants)
filter_restaurants = y[y.index.str.contains('Rest')]
filter_restaurants

# Filter the popular shops 
filter_shops = y[y.index.str.contains('Shop|Store')]  # regex alternation matches either keyword
filter_shops

filter_restaurants
#Cluster1_all

----------- CLUSTER#1 TOP 10 -----------
Fast Food Restaurant    19
Farmers Market          16
Filipino Restaurant     10
Field                    8
Event Space              8
Falafel Restaurant       7
Grocery Store            4
Park                     3
Coffee Shop              3
Fish & Chips Shop        3
dtype: int64
----------- CLUSTER#1 BOTTOM 10 -----------
Burger Joint                 1
Electronics Store            1
Music Venue                  1
Middle Eastern Restaurant    1
Gourmet Shop                 1
Taco Place                   1
Dessert Shop                 1
Poutine Place                1
Scenic Lookout               1
Food & Drink Shop            1
dtype: int64


Event,Freq
Fast Food Restaurant,19
Filipino Restaurant,10
Falafel Restaurant,7
Ethiopian Restaurant,3
Italian Restaurant,2
Korean Restaurant,2
Comfort Food Restaurant,1
Japanese Restaurant,1
Caribbean Restaurant,1
Eastern European Restaurant,1


#### Analyzing cluster #2, we learn that:
1. Most of the neighborhoods in Toronto don't have any Indian restaurants (green markers), suggesting that Indian food is a niche cuisine rather than a mainstream food preference.
2. Surprisingly, Eastern Toronto is densely populated with these green markers, indicating a lack of Indian/South Indian restaurants near the Tamil community.
3. Flipping the green markers to red/blue would let the Tamil community benefit from Indian restaurants without a longer commute to Central/South-Western Toronto.


### 4. Conclusions and Recommendations <a class="anchor" id="4"></a>

1. Based on the above analysis, the top 5 locations for starting a dedicated South Indian restaurant, with their corresponding Tamil population shares, are:<br>
    a. Rouge Hill (15.6%)<br>
    b. Malvern (12.2%)<br>
    c. Scarborough Village (11.4%)<br>
    d. Morningside (10.8%)<br>
    e. Scarborough City Centre (10.3%)<br><br>
    
2. These neighborhoods offer a competitive advantage because they don't have any Indian restaurants, which increases the probability of success.<br>

3. Most of the Tamil population is concentrated in Scarborough, which surprisingly has a limited choice of Indian restaurants.<br>

4. The taxation, rules and business guidelines applicable to Scarborough would be in effect for the recommended locations.<br><br><br>

The following map highlights the recommended locations (marked with a black border) for starting a dedicated South Indian restaurant in Toronto:

In [28]:
# Print map of Toronto displaying the neighborhood clusters based on boroughs and Tamil population demography. Mark neighborhoods with high prospects.


# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=11)

wiki_Toronto_nei = wiki_Toronto_nei.dropna()


# add markers to map
for lat, lng, label, area, color_ethnicity, color_FM in zip(wiki_Toronto_dem['Latitude'],
                                                      wiki_Toronto_dem['Longitude'],
                                                      wiki_Toronto_dem['Ethnicity_new'],
                                                      wiki_Toronto_dem['Name'],                                                            
                                                      wiki_Toronto_dem['Color_ethnicity'],
                                                      wiki_Toronto_dem['Color_FM']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color_FM,
        fill=True,
        fill_color=color_FM,
        fill_opacity=1,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto


z1
z2 = z1[z1['Indian restaurant freq'] == 0]
z2


# Overlay the Tamil neighborhoods that have no Indian restaurants (black outline)

for lat, lng, label, neighborhood in zip(z2['Latitude'],
                                         z2['Longitude'],
                                         z2['Ethnicity'],
                                         z2['Neighborhood']):
    label = folium.Popup('Location: {}, Ethnicity: {}'.format(neighborhood, label), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0).add_to(map_Toronto) 
     
    
map_Toronto
