# The Battle of Neighborhoods (Part 1)

## Introduction/Business Problem

### Background

Gordon Ramsey, an internationally renowned, multi-Michelin-starred chef, has opened a string of award-winning restaurants across the globe and is now ready to open a new Italian restaurant in Toronto. To find the best location for it, he asked me to use my data science knowledge to provide insights and a recommendation. <br>
He specifically wants to know:<br>
* The floating population in Toronto
* Restaurants that could compete with his new restaurant
* Whether there are markets or farms near the location that can supply affordable, fresh ingredients quickly
* The average income of residents in each neighborhood (because most of his dishes are expensive)
* The characteristics of residents in each neighborhood (race, dining culture, etc.)
* Whether there is a school or university in each neighborhood

### Business Problem

The challenge is to find the best location for Ramsey's new Italian restaurant in Toronto. The location should have high demand for Ramsey's luxurious dishes. In addition, it should be close enough to markets or farms that fresh ingredients can be delivered to the restaurant quickly.

### Target Audience

Besides my client Gordon Ramsey, the target audience includes the stakeholders and employees of Ramsey's food company. Anyone who is interested in the food business is also part of the target audience.


## Data Description

I will use the following data to solve the business problem:
* Boroughs and neighborhoods in Toronto, including their latitude and longitude
* Floating population data for each borough
* Locations of markets/farms
* Average income of residents in each neighborhood
* Locations of universities
* Proportion of each race in each neighborhood

### How to use data to solve the problem

I will collect data as follows:
* Foursquare and geopy will be used to retrieve geodata (latitude and longitude)
* The address and geodata for markets/farms will be obtained with geopy and Nominatim
* BeautifulSoup will be used to retrieve data from URLs (for the floating population and average income data)
* The locations of universities and the proportion of each race will be read from CSV files
* Folium will be used to draw maps of Toronto and its neighborhoods

#### Successful collection of the data will allow us to answer the following questions:
* Which locations have a high floating population?
* Which neighborhoods are possible candidates?
* What are the characteristics of the residents in each neighborhood?
* What is the distance between each candidate neighborhood and the nearest markets/farms?
* Who are the possible competitors?
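
The distance question above could be answered with a great-circle distance between each candidate neighborhood and each market or farm. The block below is a minimal sketch, not part of the original notebook: `haversine_km` is a hypothetical helper name, and the Earth radius of 6371 km is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    # haversine formula
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example: distance between the coordinates of two neighborhoods
distance = haversine_km(43.785353, -79.278549, 43.601717, -79.545232)
print(round(distance, 1), "km")
```

This is a straight-line distance over the Earth's surface, so it is only a rough proxy for the actual delivery route.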

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import requests # handle request
from bs4 import BeautifulSoup # use for getting data from URL

import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (requires pandas 1.0 or later)

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print("Libraries imported Success!")

Libraries imported Success!


### Load data from URL

In [2]:
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
req = requests.get(URL)

soup = BeautifulSoup(req.content, "html5lib")
table = soup.find('div', {'id':'container'})
# print(soup.prettify())

### Prepare Dataset

In [3]:
lst = []
n = 0

all_tds = soup.find_all("td")               # find every <td>...</td> element
all_tds_lst = [x.get_text().strip("\n") for x in all_tds]  # convert the bs4 result into a list of strings

# group the cells into rows of three: postal code, borough, neighborhood
for element in all_tds_lst:
    if element == "": 
        break # stop at the first empty cell, which lies beyond the last table row
    if n == 0:
        lst.append([]) # start a new row
    lst[-1].append(element)
    n = (n + 1) % 3 # wrap around after the third column
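
The grouping step above turns a flat list of table cells into rows of a fixed width. As a minimal sketch of the same idea, the hypothetical helper `chunk_rows` below slices the list directly. Unlike the loop above, it does not stop at the first empty cell, so the list would need to be trimmed first.

```python
def chunk_rows(cells, width):
    """Split a flat list of cell strings into consecutive rows of `width` cells."""
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Cell values taken from the first two rows of the postal code table
cells = ["M3A", "North York", "Parkwoods",
         "M4A", "North York", "Victoria Village"]
rows = chunk_rows(cells, 3)
print(rows)
```

If the number of cells is not a multiple of the width, the last row is shorter than the others, which is a quick sign that the scrape picked up stray cells.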

In [4]:
# create pandas DF
df = pd.DataFrame(data = lst, columns = ['PostalCode', 'Borough', 'Neighborhood'])
# drop rows whose neighborhood is not assigned
df = df[~df.Neighborhood.str.contains("Not assigned")]
df = df.reset_index(drop=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [5]:
df.shape

(103, 3)

#### Get latitude and longitude for each postal code

In [6]:
df_latlng = pd.read_csv("http://cocl.us/Geospatial_data")
df_latlng.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
df_latlng.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [7]:
toronto_df = pd.merge(df, df_latlng, on='PostalCode')
toronto_df = toronto_df.drop_duplicates(subset=['Borough', 'Neighborhood'], keep=False)
toronto_df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
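
The merge cell above calls `drop_duplicates` with `keep=False`, which removes every copy of a duplicated borough/neighborhood pair, including the first one. The sketch below uses invented toy rows to show the difference from `keep='first'`; whether dropping all copies is the intended behavior is a judgment call for the analysis.

```python
import pandas as pd

# Toy data (made up for illustration): one pair appears twice
toy = pd.DataFrame({
    "Borough": ["North York", "North York", "Etobicoke"],
    "Neighborhood": ["Don Mills", "Don Mills", "Alderwood"],
})

# keep=False drops every row that has a duplicate
dropped_all = toy.drop_duplicates(subset=["Borough", "Neighborhood"], keep=False)

# keep='first' retains one row per pair
kept_first = toy.drop_duplicates(subset=["Borough", "Neighborhood"], keep="first")

print(len(dropped_all), len(kept_first))
```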


#### Get geodata for Toronto by geopy, Nominatim

In [8]:
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Map Toronto and mark each neighborhood with a circle using Folium

In [9]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Collect demographic data of Toronto neighborhood from URL

In [10]:
demograph_URL = 'https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods'
r = requests.get(demograph_URL)

soup = BeautifulSoup(r.content, "html5lib")
table = soup.find('div', {'id':'container'})
#print(soup.prettify())

In [11]:
lst_2 = []
n = 0

all_tds = soup.find_all("td")   
all_tds_lst = [x.get_text().strip('\n') for x in all_tds] 
# retrieve the table data that we are interested in
start_idx = all_tds_lst.index('Agincourt')
end_idx = all_tds_lst.index('General outline\nDemographics\nName\nFlag\nCoat of arms\nSister cities\nNotable Torontonians')
demographic_lst = all_tds_lst[start_idx:end_idx]
demographic_lst = [x for x in demographic_lst if x != '']

# separate into 12 columns
for element in demographic_lst:
    if n == 0:
        lst_2.append([]) # start a new row
    lst_2[-1].append(element)
    n = (n + 1) % 12 # wrap around after the 12th column
#print(lst_2)

In [12]:
# Convert lst into DataFrame
cols = ['Neighborhood', 'Borough', 'CensusTracts', 'Population', 
        'LandArea', 'Density (people/km2)', 'PopulationChange %', 'AvgIncome',
       'TransitCommuting %', 'Renters %', '2nd common language (name)', '2nd common language %']
demograph_df_all = pd.DataFrame(data = lst_2, columns=cols)
demograph_df_all

Unnamed: 0,Neighborhood,Borough,CensusTracts,Population,LandArea,Density (people/km2),PopulationChange %,AvgIncome,TransitCommuting %,Renters %,2nd common language (name),2nd common language %
0,Agincourt,S,"0377.01, 0377.02, 0377.03, 0377.04, 0378.02, 0...",44577,12.45,3580,4.6,25750,11.1,5.9,Cantonese (19.3%),19.3% Cantonese
1,Alderwood,E,"0211.00, 0212.00",11656,4.94,2360,-4.0,35239,8.8,8.5,Polish (6.2%),06.2% Polish
2,Alexandra Park,OCoT,0039.00,4355,0.32,13609,0.0,19687,13.8,28.0,Cantonese (17.9%),17.9% Cantonese
3,Allenby,OCoT,0140.00,2513,0.58,4333,-1.0,245592,5.2,3.4,Russian (1.4%),01.4% Russian
4,Amesbury,NY,"0280.00, 0281.01, 0281.02",17318,3.51,4934,1.1,27546,16.4,19.7,Spanish (6.1%),06.1% Spanish
...,...,...,...,...,...,...,...,...,...,...,...,...
169,Woburn,S,"0356.00, 0357.01, 0357.02, 0363.07, 0364.01, 0...",48507,13.34,3636,-1.5,26190,13.3,16.0,Gujarati (9.1%),09.1% Gujarati
170,Wychwood,OCoT,0116.00,4182,0.68,6150,-2.0,53613,17.1,20.1,Portuguese (2.7%),02.7% Portuguese
171,York Mills,NY,"0273.01, 0273.02, 0274.01, 0274.02",17564,7.29,2409,2.0,92099,10.0,11.8,Korean (4.0%),04.0% Korean
172,York University Heights,NY,"0311.02, 0311.03, 0311.04, 0311.05, 0311.06",26140,13.21,1979,-1.2,24432,15.2,20.4,Italian (6.6%),06.6% Italian


In [13]:
# filter columns that we are interested in
demo_df = demograph_df_all.drop(['CensusTracts', 'Borough', 'LandArea', 'TransitCommuting %', 'Renters %', 
                   '2nd common language (name)', '2nd common language %'], axis=1)
demo_df

Unnamed: 0,Neighborhood,Population,Density (people/km2),PopulationChange %,AvgIncome
0,Agincourt,44577,3580,4.6,25750
1,Alderwood,11656,2360,-4.0,35239
2,Alexandra Park,4355,13609,0.0,19687
3,Allenby,2513,4333,-1.0,245592
4,Amesbury,17318,4934,1.1,27546
...,...,...,...,...,...
169,Woburn,48507,3636,-1.5,26190
170,Wychwood,4182,6150,-2.0,53613
171,York Mills,17564,2409,2.0,92099
172,York University Heights,26140,1979,-1.2,24432


### Get latitude and longitude for each neighborhood

In [67]:
lat_lng_list = []
not_avail = [] # neighborhoods that the geopy geolocator cannot resolve

geolocator = Nominatim(user_agent='myGeocoder')
city = "Toronto"
country = "Canada"
for neighborhood in demo_df['Neighborhood']:
    location = geolocator.geocode(neighborhood + ',' + city + ',' + country)
    if location is None:
        # geocode() returns None when no match is found
        not_avail.append(neighborhood)
    else:
        # only store coordinates for neighborhoods that were resolved,
        # so that lat_lng_list stays aligned with the remaining neighborhoods
        lat_lng_list.append([location.latitude, location.longitude])
        #print('{}: Latitude = {}, Longitude = {}'.format(neighborhood, location.latitude, location.longitude))

# print(lat_lng_list)
# print(not_avail)

[[43.7853531, -79.2785494], [43.6017173, -79.5452325], [43.650786999999994, -79.40431814731767], [43.7113509, -79.5534236], [43.7061619, -79.48349185404643], [43.7439436, -79.4308512], [43.7427961, -79.3699566407258], [43.76389295, -79.45636693710946], [43.6673421, -79.3884571], [43.7691966, -79.3766617], [43.7981268, -79.3829726], [43.7373876, -79.4109253], [43.7535196, -79.2553355], [43.6918051, -79.2644935], [43.6493184, -79.4844358], [43.6761954, -79.4280155], [43.771426, -79.44728711605052], [43.7381512, -79.3725113], [43.6509173, -79.4400216], [43.6644734, -79.3669861], [43.7208504, -79.4152744], [43.6707006, -79.4532993], [43.6781015, -79.409415775], [43.7874914, -79.1507681], [43.7025981, -79.4032704], [43.6671385, -79.4227656], [43.6655242, -79.3838011], [43.7088231, -79.2959856], [43.7218363, -79.2362138], [43.7111699, -79.2481769], [43.6573699, -79.3565129], [43.695403, -79.293099], [43.8306384, -79.224609], [43.6715454, -79.4483222], [43.697936, -79.3972908], [43.68809, -79

In [76]:
# drop the neighborhoods that the geolocator could not resolve from demo_df
demo_df = demo_df[~demo_df['Neighborhood'].isin(not_avail)]

# create a new DF with the latitude and longitude of each neighborhood
coords = [pair for pair in lat_lng_list if pair] # skip empty entries left by failed lookups
lat_lng_df = pd.DataFrame({'Neighborhood': demo_df['Neighborhood'].tolist(),
                           'Latitude': [pair[0] for pair in coords],
                           'Longitude': [pair[1] for pair in coords]})
# now, merge lat_lng_df and demo_df by 'Neighborhood'
complete_tor_df = pd.merge(lat_lng_df, demo_df, on='Neighborhood')
complete_tor_df

Unnamed: 0,Neighborhood,Latitude,Longitude,Population,Density (people/km2),PopulationChange %,AvgIncome
0,Agincourt,43.785353,-79.278549,44577,3580,4.6,25750
1,Alderwood,43.601717,-79.545232,11656,2360,-4.0,35239
2,Alexandra Park,43.650787,-79.404318,4355,13609,0.0,19687
3,Allenby,43.711351,-79.553424,2513,4333,-1.0,245592
4,Amesbury,43.706162,-79.483492,17318,4934,1.1,27546
...,...,...,...,...,...,...,...
159,Woburn,43.759824,-79.225291,48507,3636,-1.5,26190
160,Wychwood,43.682122,-79.423839,4182,6150,-2.0,53613
161,York Mills,43.744039,-79.406657,17564,2409,2.0,92099
162,York University Heights,43.758781,-79.519434,26140,1979,-1.2,24432


### Use Foursquare to get the top venues near each neighborhood

In [15]:
import os

# Load the Foursquare credentials from environment variables rather than hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption;
# use whatever names your own setup defines.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [85]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [88]:
# # Clean the json and structure it into a pandas dataframe
# venues = results['response']['groups'][0]['items']
    
# nearby_venues = json_normalize(venues) # flatten JSON

# # filter columns
# filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
# nearby_venues =nearby_venues.loc[:, filtered_columns]

# # filter the category for each row
# nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# # clean columns
# nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# nearby_venues.head()

In [89]:
def getVenues(names, latitudes, longitudes, rad=500, lim=100):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the request URL for the Foursquare explore endpoint
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad, 
            lim)
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
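
The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. The block below is a minimal offline sketch of a more defensive flattening step. `flatten_items` is a hypothetical helper, and `sample_items` is a hand-written payload containing only the fields the function above reads. The second venue is invented.

```python
def flatten_items(name, lat, lng, items):
    """Turn a list of explore 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written sample payload (assumed shape, not a real API response)
sample_items = [
    {"venue": {"name": "Tim Hortons",
               "location": {"lat": 43.785637, "lng": -79.279215},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Venue",
               "location": {"lat": 43.786, "lng": -79.278},
               "categories": []}},
]

rows = flatten_items("Agincourt", 43.785353, -79.278549, sample_items)
for row in rows:
    print(row)
```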

In [91]:
venues_df = getVenues(complete_tor_df['Neighborhood'], complete_tor_df['Latitude'], complete_tor_df['Longitude'])

In [92]:
venues_df

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt,43.785353,-79.278549,Tim Hortons,43.785637,-79.279215,Coffee Shop
1,Agincourt,43.785353,-79.278549,One2 Snacks,43.787048,-79.276658,Asian Restaurant
2,Agincourt,43.785353,-79.278549,In Cheon House Korean & Japanese Restaurant 인천관,43.786468,-79.275693,Korean Restaurant
3,Agincourt,43.785353,-79.278549,Beef Noodle Restaurant 老李牛肉麵,43.785937,-79.276031,Chinese Restaurant
4,Agincourt,43.785353,-79.278549,Congee King,43.785908,-79.276042,Chinese Restaurant
...,...,...,...,...,...,...,...
3630,Yorkville,43.671386,-79.390168,Yorkville Village,43.671096,-79.394417,Shopping Mall
3631,Yorkville,43.671386,-79.390168,The One Eighty,43.668575,-79.388210,American Restaurant
3632,Yorkville,43.671386,-79.390168,Whole Hearth Bakery & Cafe,43.672005,-79.395405,Bakery
3633,Yorkville,43.671386,-79.390168,Prairie Girl Bakery,43.669144,-79.394196,Cupcake Shop


In [93]:
# check how many venues were returned for each neighborhood
venues_df.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,16,16,16,16,16,16
Alderwood,8,8,8,8,8,8
Alexandra Park,100,100,100,100,100,100
Allenby,8,8,8,8,8,8
Amesbury,6,6,6,6,6,6
...,...,...,...,...,...,...
Woburn,22,22,22,22,22,22
Wychwood,56,56,56,56,56,56
York Mills,14,14,14,14,14,14
York University Heights,19,19,19,19,19,19


In [95]:
# find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(venues_df['Venue Category'].unique())))

There are 321 unique categories.


### Analyze Each Neighborhood

In [103]:
# one hot encoding (convert venue category variables into binary variable)
venues_onehot = pd.get_dummies(venues_df[['Venue Category']], prefix='', prefix_sep='')

# add neighborhood column back to dataframe
venues_onehot['Neighborhood'] = venues_df['Neighborhood'] 

# move the 'Neighborhood' column to the front without hard-coding its position
fixed_columns = ['Neighborhood'] + [col for col in venues_onehot.columns if col != 'Neighborhood']
venues_onehot = venues_onehot[fixed_columns]
venues_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [104]:
venues_onehot.shape

(3635, 321)

In [106]:
venues_grouped = venues_onehot.groupby('Neighborhood').mean().reset_index()
venues_grouped.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alexandra Park,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,...,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01
3,Allenby,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Amesbury,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [107]:
venues_grouped.shape

(159, 321)
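
Each value in the grouped table is the share of a neighborhood's venues that fall into a category. For example, Agincourt returned 16 venues, so its single Vietnamese restaurant appears as 1/16 = 0.0625. The sketch below reproduces the one-hot and group-mean steps on invented toy data.

```python
import pandas as pd

# Toy data (made up for illustration)
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Bakery", "Park", "Park", "Park"],
})

# one indicator column per category, one row per venue
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# the mean of the indicators is the frequency of each category per neighborhood
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Because every venue has exactly one category here, the frequencies in each row sum to 1.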

In [118]:
num_top_venues = 5

for hood in venues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = venues_grouped[venues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.19
1  Cantonese Restaurant  0.12
2  Hong Kong Restaurant  0.06
3                Bakery  0.06
4           Coffee Shop  0.06


----Alderwood----
          venue  freq
0   Pizza Place  0.25
1          Pool  0.12
2  Skating Rink  0.12
3           Gym  0.12
4   Coffee Shop  0.12


----Alexandra Park----
                           venue  freq
0                            Bar  0.10
1           Caribbean Restaurant  0.05
2                           Café  0.04
3         Furniture / Home Store  0.04
4  Vegetarian / Vegan Restaurant  0.03


----Allenby----
                venue  freq
0          Restaurant  0.12
1   Fish & Chips Shop  0.12
2  African Restaurant  0.12
3           Bookstore  0.12
4                Café  0.12


----Amesbury----
                venue  freq
0         Coffee Shop  0.17
1        Intersection  0.17
2                Park  0.17
3  Athletics & Sports  0.17
4         Gas Station  0.17


----Armour Heigh

### Let's put that into a pandas dataframe
#### First, let's write a function to sort the venues in descending order.

In [132]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [145]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns list according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = venues_grouped['Neighborhood']

# now fill the data of the new data frame
for ind in np.arange(venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Cantonese Restaurant,Hong Kong Restaurant,Shopping Mall,Rental Car Location,Train Station,Korean Restaurant,Coffee Shop,Butcher,Vietnamese Restaurant
1,Alderwood,Pizza Place,Pub,Gym,Skating Rink,Coffee Shop,Sandwich Place,Pool,Farm,Electronics Store,Doctor's Office
2,Alexandra Park,Bar,Caribbean Restaurant,Furniture / Home Store,Café,Vegetarian / Vegan Restaurant,Poutine Place,Art Gallery,Park,Coffee Shop,Italian Restaurant
3,Allenby,Bookstore,Café,African Restaurant,Fish & Chips Shop,Big Box Store,Fast Food Restaurant,Intersection,Restaurant,Falafel Restaurant,Eastern European Restaurant
4,Amesbury,Intersection,Gas Station,Athletics & Sports,Park,Coffee Shop,Bank,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
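
Neighborhoods with fewer than ten venue categories still get ten entries in the table above, so the trailing columns are filled with categories whose frequency is zero. Amesbury, for example, returned only 6 venues. The block below is a minimal sketch of a variant that keeps only categories that actually occur and breaks ties alphabetically. `top_venues_nonzero` is a hypothetical helper, not the notebook's function, and the toy row is invented.

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Return up to num_top_venues category names whose frequency is above zero."""
    freqs = row.iloc[1:].astype(float)   # skip the 'Neighborhood' entry
    freqs = freqs[freqs > 0]             # discard categories that never occur
    # sort by descending frequency, then by name so ties are deterministic
    ordered = sorted(freqs.index, key=lambda name: (-freqs[name], name))
    return ordered[:num_top_venues]


# Toy row (made up for illustration)
toy_row = pd.Series({"Neighborhood": "Toyville",
                     "Bakery": 0.25, "Coffee Shop": 0.5, "Park": 0.25, "Zoo": 0.0})
top = top_venues_nonzero(toy_row, 10)
print(top)
```

To store the result in a fixed-width dataframe, the shorter lists would need to be padded, for instance with `None`.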


## Cluster Neighborhoods
#### Run k-means to cluster the neighborhood into 5 clusters
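
The cell that fits the model is not shown here, although later cells rely on `kmeans` and `kclusters`. The block below is a minimal sketch of what such a fit might look like, run on an invented toy stand-in for `venues_grouped`. The `random_state` and `n_init` values are assumptions, and the toy example uses 2 clusters rather than 5 to stay small.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for venues_grouped: one row per neighborhood,
# one frequency column per venue category (values made up)
toy_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Coffee Shop": [0.9, 0.8, 0.1, 0.0],
    "Park": [0.1, 0.2, 0.9, 1.0],
})

kclusters = 2  # the heading above uses 5; 2 suits this tiny example

# cluster on the frequency columns only
features = toy_grouped.drop(columns="Neighborhood")
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(features)

labels = kmeans.labels_
print(dict(zip(toy_grouped["Neighborhood"], labels)))
```

In the notebook, the same call would presumably be applied to `venues_grouped.drop(columns='Neighborhood')` with `kclusters = 5`.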

In [193]:
# add clustering labels (drop any labels left over from a previous run first)
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join complete_tor_df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
toronto_merged = complete_tor_df.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop the neighborhoods that have no cluster label (NaN)
toronto_merged.dropna(subset=['Cluster Labels'], axis = 0, inplace=True)
toronto_merged.reset_index(inplace=True)
toronto_merged # check the last columns!

Unnamed: 0,index,Neighborhood,Latitude,Longitude,Population,Density (people/km2),PopulationChange %,AvgIncome,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Agincourt,43.785353,-79.278549,44577,3580,4.6,25750,0.0,Chinese Restaurant,Cantonese Restaurant,Hong Kong Restaurant,Shopping Mall,Rental Car Location,Train Station,Korean Restaurant,Coffee Shop,Butcher,Vietnamese Restaurant
1,1,Alderwood,43.601717,-79.545232,11656,2360,-4.0,35239,0.0,Pizza Place,Pub,Gym,Skating Rink,Coffee Shop,Sandwich Place,Pool,Farm,Electronics Store,Doctor's Office
2,2,Alexandra Park,43.650787,-79.404318,4355,13609,0.0,19687,0.0,Bar,Caribbean Restaurant,Furniture / Home Store,Café,Vegetarian / Vegan Restaurant,Poutine Place,Art Gallery,Park,Coffee Shop,Italian Restaurant
3,3,Allenby,43.711351,-79.553424,2513,4333,-1.0,245592,0.0,Bookstore,Café,African Restaurant,Fish & Chips Shop,Big Box Store,Fast Food Restaurant,Intersection,Restaurant,Falafel Restaurant,Eastern European Restaurant
4,4,Amesbury,43.706162,-79.483492,17318,4934,1.1,27546,1.0,Intersection,Gas Station,Athletics & Sports,Park,Coffee Shop,Bank,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,159,Woburn,43.759824,-79.225291,48507,3636,-1.5,26190,0.0,Fast Food Restaurant,Discount Store,Bank,Coffee Shop,Big Box Store,Beer Store,Toy / Game Store,Department Store,Gym,Sandwich Place
155,160,Wychwood,43.682122,-79.423839,4182,6150,-2.0,53613,0.0,Italian Restaurant,Restaurant,Coffee Shop,Ice Cream Shop,Pizza Place,Bakery,Convenience Store,Café,Sushi Restaurant,Burger Joint
156,161,York Mills,43.744039,-79.406657,17564,2409,2.0,92099,0.0,Coffee Shop,Gym,Thai Restaurant,Optical Shop,Business Service,French Restaurant,Pub,Indian Restaurant,Sandwich Place,Restaurant
157,162,York University Heights,43.758781,-79.519434,26140,1979,-1.2,24432,0.0,Pizza Place,Fast Food Restaurant,Discount Store,Grocery Store,Coffee Shop,Sandwich Place,Caribbean Restaurant,Liquor Store,Gym / Fitness Center,Fried Chicken Joint


In [None]:
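# Visualize the k-means clustering: plot each neighborhood by location, colored by cluster label
import matplotlib.pyplot as plt
plt.scatter(toronto_merged['Longitude'], toronto_merged['Latitude'], c=toronto_merged['Cluster Labels'], alpha=0.5)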


In [194]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    cluster = int(cluster) # labels become floats after the join, so convert back to int
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters 
#### Now we can examine each cluster and determine the venue categories that distinguish it. Based on the defining categories, we can then assign a name to each cluster.
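
Besides the venue categories, the clusters can also be compared on the demographic columns. Because those values were scraped as text, they need to be converted to numbers before aggregating. The block below is a minimal sketch on invented toy rows. The column names follow the dataframe used in this notebook.

```python
import pandas as pd

# Toy stand-in for toronto_merged (neighborhood names and labels made up)
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "AvgIncome": ["25750", "35239", "92319"],  # scraped values arrive as text
    "Cluster Labels": [0.0, 0.0, 1.0],
})

# convert the text column to numbers; unparseable entries become NaN
toy_merged["AvgIncome"] = pd.to_numeric(toy_merged["AvgIncome"], errors="coerce")

# neighborhood count and mean income per cluster
summary = toy_merged.groupby("Cluster Labels")["AvgIncome"].agg(["count", "mean"])
print(summary)
```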

#### Cluster 0

In [195]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, 
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Density (people/km2),PopulationChange %,AvgIncome,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,3580,4.6,25750,0.0,Chinese Restaurant,Cantonese Restaurant,Hong Kong Restaurant,Shopping Mall,Rental Car Location,Train Station,Korean Restaurant,Coffee Shop,Butcher,Vietnamese Restaurant
1,Alderwood,2360,-4.0,35239,0.0,Pizza Place,Pub,Gym,Skating Rink,Coffee Shop,Sandwich Place,Pool,Farm,Electronics Store,Doctor's Office
2,Alexandra Park,13609,0.0,19687,0.0,Bar,Caribbean Restaurant,Furniture / Home Store,Café,Vegetarian / Vegan Restaurant,Poutine Place,Art Gallery,Park,Coffee Shop,Italian Restaurant
3,Allenby,4333,-1.0,245592,0.0,Bookstore,Café,African Restaurant,Fish & Chips Shop,Big Box Store,Fast Food Restaurant,Intersection,Restaurant,Falafel Restaurant,Eastern European Restaurant
5,Armour Heights,1914,2.0,116651,0.0,Deli / Bodega,Market,Pharmacy,Farm,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,Woburn,3636,-1.5,26190,0.0,Fast Food Restaurant,Discount Store,Bank,Coffee Shop,Big Box Store,Beer Store,Toy / Game Store,Department Store,Gym,Sandwich Place
155,Wychwood,6150,-2.0,53613,0.0,Italian Restaurant,Restaurant,Coffee Shop,Ice Cream Shop,Pizza Place,Bakery,Convenience Store,Café,Sushi Restaurant,Burger Joint
156,York Mills,2409,2.0,92099,0.0,Coffee Shop,Gym,Thai Restaurant,Optical Shop,Business Service,French Restaurant,Pub,Indian Restaurant,Sandwich Place,Restaurant
157,York University Heights,1979,-1.2,24432,0.0,Pizza Place,Fast Food Restaurant,Discount Store,Grocery Store,Coffee Shop,Sandwich Place,Caribbean Restaurant,Liquor Store,Gym / Fitness Center,Fried Chicken Joint


#### Cluster 1

In [196]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, 
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Density (people/km2),PopulationChange %,AvgIncome,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Amesbury,4934,1.1,27546,1.0,Intersection,Gas Station,Athletics & Sports,Park,Coffee Shop,Bank,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
6,Banbury,2442,5.0,92319,1.0,Tennis Court,Auto Garage,Park,Yoga Studio,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
7,Bathurst Manor,3187,12.3,34169,1.0,Convenience Store,Playground,Baseball Field,Park,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
15,Bracondale Hill,8618,-3.0,41605,1.0,Park,Art Gallery,Bar,Coffee Shop,Bakery,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
20,Carleton Village,8843,-4.0,23301,1.0,Jewelry Store,Park,Coffee Shop,Dog Run,Yoga Studio,Event Space,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
30,Crescent Town,20393,-10.0,23021,1.0,Convenience Store,Park,Golf Course,Metro Station,Costume Shop,Creperie,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
31,Davenport,8870,-6.9,28335,1.0,Convenience Store,Music Venue,Park,Coffee Shop,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
38,Dovercourt Park,9767,-9.2,28311,1.0,Coffee Shop,Park,Bar,Café,Brazilian Restaurant,Fish Market,Dumpling Restaurant,Flea Market,Eastern European Restaurant,Electronics Store
51,Forest Hill,5530,-0.2,101631,1.0,Playground,Mediterranean Restaurant,Bank,Park,Yoga Studio,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
61,Henry Farm,3066,-6.0,56395,1.0,Women's Store,Tennis Court,Lawyer,Park,Yoga Studio,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School


#### Cluster 2

In [197]:
# Neighborhoods in cluster 2: keep column 1 (Neighborhood) and columns 5 onward
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2,
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Density (people/km2),PopulationChange %,AvgIncome,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Centennial,2544,0.5,34867,2.0,Fish & Chips Shop,Park,Yoga Studio,Event Space,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
46,Eringate,3282,-3.4,34789,2.0,Park,Yoga Studio,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
54,Governor's Bridge/Bennington Heights,1129,4.0,129904,2.0,Park,Trail,Yoga Studio,Event Space,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
75,Kingsview Village,4013,-6.2,32004,2.0,Park,Yoga Studio,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
104,Old Mill/Baby Point,3748,1.0,110372,2.0,Park,River,Yoga Studio,Ethiopian Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
108,Pelmo Park,2001,14.0,32002,2.0,Park,Coffee Shop,Yoga Studio,Falafel Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
111,Port Union,2310,-1.7,48117,2.0,Park,Yoga Studio,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
116,Rosedale,2821,4.8,213941,2.0,Park,Playground,Bike Trail,Yoga Studio,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
117,Rouge,791,175.0,29230,2.0,Park,Fast Food Restaurant,Yoga Studio,Event Space,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant


#### Cluster 3

In [198]:
# Neighborhoods in cluster 3: keep column 1 (Neighborhood) and columns 5 onward
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3,
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Density (people/km2),PopulationChange %,AvgIncome,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Bayview Woods – Steeles,3267,-1.5,41485,3.0,Dog Run,Trail,Yoga Studio,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
148,West Rouge,2090,-1.8,44605,3.0,Food Truck,Trail,Yoga Studio,Falafel Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant


#### Cluster 4

In [199]:
# Neighborhoods in cluster 4: keep column 1 (Neighborhood) and columns 5 onward
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4,
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Density (people/km2),PopulationChange %,AvgIncome,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
87,Lytton Park,5073,5.0,127356,4.0,Garden,Playground,Falafel Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
128,Steeles,5464,1.9,26660,4.0,Playground,Yoga Studio,Falafel Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
130,Sunnylea,3366,-1.1,51398,4.0,Deli / Bodega,Playground,Yoga Studio,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space
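
The per-cluster tables above are easier to compare once they are reduced to a few numbers per cluster. The following is a minimal sketch of one way to do that, not part of the original analysis: it counts neighborhoods and averages density and income for each cluster label. The `sample` DataFrame is a stand-in built from a few rows copied from the cluster 3 and cluster 4 outputs, and `summarize_clusters` is a hypothetical helper name; in the notebook, the full `toronto_merged` DataFrame could be passed in instead, assuming it has the same column names.

```python
import pandas as pd

# Stand-in for toronto_merged: rows copied from the cluster 3 and 4 tables above.
# (Assumption: the real DataFrame uses these same column names.)
sample = pd.DataFrame(
    {
        "Neighborhood": [
            "Bayview Woods – Steeles",
            "West Rouge",
            "Lytton Park",
            "Steeles",
            "Sunnylea",
        ],
        "Density (people/km2)": [3267, 2090, 5073, 5464, 3366],
        "AvgIncome": [41485, 44605, 127356, 26660, 51398],
        "Cluster Labels": [3.0, 3.0, 4.0, 4.0, 4.0],
    }
)


def summarize_clusters(df):
    """Hypothetical helper: neighborhood count and mean density/income per cluster."""
    return (
        df.groupby("Cluster Labels")
        .agg(
            neighborhoods=("Neighborhood", "count"),
            mean_density=("Density (people/km2)", "mean"),
            mean_income=("AvgIncome", "mean"),
        )
        # Highest-income clusters first, since income matters for this business problem
        .sort_values("mean_income", ascending=False)
    )


summary = summarize_clusters(sample)
print(summary)
```

Sorting by mean income is a choice made for this sketch because the client's menu is expensive; sorting by density or by the number of neighborhoods would work the same way.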
