### Capstone Project - The Battle of Neighbourhood
#### Executive Summary

#### Objective
ABC Limited is a chain of vegan restaurant and want to expand the foot print in Toronto. Find best location for setting up a vegan restaurant either in Toronto.
#### Goals
Find best location for setting up a vegan restaurant either in Toronto backed by the data.
#### Solution
Using four square api gather data on neighbourhood in Toronto. Perform analysis and inference. Provide recommended location and risk matrix to business.



In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import json # library to handle JSON files
import requests # library to handle requests
import csv
import sys
trace = False

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
from sklearn.cluster import KMeans # import k-means from clustering stage
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

#### Step two
- Scrap the data from Wikipedia

In [2]:
wiki_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(wiki_url,'lxml')
#print(soup.prettify())

df = soup.find('table',{'class':'wikitable sortable'})
#print(df)


#### Step three
- Get the features from Wikipedia table

In [3]:
feature_names = []

header_row = df.find('tr')
for header in header_row.find_all('th'):
    feature_name = ' '.join(header.find_all(text=True))
    feature_name = feature_name.replace('\n', '')
    feature_names.append(feature_name)

print(feature_names)

['Postcode', 'Borough', 'Neighbourhood']


- Extract the table

In [4]:
samples = []
sample_rows = df.find_all('tr')[1:]
for sample_row in sample_rows:
    features = []
    for feature_col in sample_row.find_all('td'):
        feature_value = ''
        text = feature_col.string
        if text:
            if trace:
                features.append('T = {}'.format(text))
            else:
                features.append(text)
            continue
        
        for child in feature_col.children:
            if child.name == 'span':
                if child.has_attr('class'):
                    if child['class'] == 'display:none':
                        continue
                if child.find_all(has_coords):
                    feature_value = get_coords(child)
                    if feature_value:
                        break
                    else:
                        continue
            if child.name == 'sup':
                continue
            if child.name == 'a':
                if child.string[0] == '[':
                    continue            
            if child.name == 'a':
                if trace:
                    feature_value = 'A = {}'.format(child.string)
                else:
                    feature_value = child.string
                break
            if child.name == 'font':
                if trace:
                    feature_value = 'F = {}'.format(child.string)
                else:
                    feature_value = child.string
                break
            try:
                # feature_value = '' for any tags not covered above
                content = child.contents
            except AttributeError:
                # Handle whitespace between child tags, treated as a child string
                if child.isspace():
                    continue
                if trace:
                    feature_value = 'E = {}'.format(child)
                else:
                    feature_value = child
                break
        features.append(feature_value)
    samples.append(dict(zip(feature_names, features)))

data = pd.DataFrame(samples)
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(data['Borough'].unique()),
        data.shape[0]
    )
)

The dataframe has 12 boroughs and 288 neighborhoods.


#### Data Cleansing
- Drop the Not assigned Borough
- assign Borough to Not assigned neighbourhood 

In [5]:
data = data.drop(data[data.Borough =='Not assigned'].index)
#data.Neighbourhood = np.where(data.Neighbourhood == 'Not assigned', data.Borough, data.Neighbourhood)
data.Neighbourhood = data['Neighbourhood'].str.replace('\n', '')
data.Borough = data['Borough'].str.replace('\n', '')
data.Neighbourhood[data.Neighbourhood == 'Not assigned'] = data.Borough
print(data[data.Postcode=='M7A'])

        Borough Neighbourhood Postcode
8  Queen's Park  Queen's Park      M7A


#### Change the dimension for the same Borough by creating list of Neighbourhood

In [6]:
p_table = data.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(lambda x: "%s" % ', '.join(x))
p_table = p_table.reset_index() 
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(p_table['Borough'].unique()),
        p_table.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


In [7]:
p_table.shape

(103, 3)

In [8]:
p_table['Borough'].unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

In [9]:
p_table.head(20)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [10]:
lat_long_data = pd.read_csv('https://cocl.us/Geospatial_data')

In [11]:
p_table = pd.merge(p_table,lat_long_data, left_on='Postcode', right_on='Postal Code', how='left')

In [12]:
p_table_lat_long = p_table.drop('Postal Code', axis = 1)

In [13]:
p_table_lat_long.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


### The following exploration is done with Borough containing word "Toronto" and similar analysis is conducted as with newyork data

#### Let get the geographical coordinate of Toronto using geopy library

In [14]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.653963, -79.387207.


#### Visualize Toronto on map using folium library

In [15]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(p_table_lat_long['Latitude'], p_table_lat_long['Longitude'], p_table_lat_long['Borough'], p_table_lat_long['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Initialize foursquare api

In [16]:
{
    "tags": [
        "removecell",
    ]
}
CLIENT_ID = 'KYPPXH3M11Z54HYIU3TGQGVIVJJEGINYR1NIJUYHD0WDVVJI' # your Foursquare ID
CLIENT_SECRET = 'FQX4PSHDFT5GB35KGUFGC0DN1PVXATY4EVYYQVG0MPN4CPTQ' # your Foursquare Secret
VERSION = '20190501' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KYPPXH3M11Z54HYIU3TGQGVIVJJEGINYR1NIJUYHD0WDVVJI
CLIENT_SECRET:FQX4PSHDFT5GB35KGUFGC0DN1PVXATY4EVYYQVG0MPN4CPTQ


## Extract Borough which Contains word "Toronto"

In [17]:
Toronto_data = p_table_lat_long[p_table_lat_long['Borough'].str.contains('Toronto')].reset_index(drop=True)
Toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


#### Create map with borough containing Toronto

In [18]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="TO_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [19]:
# create map of Manhattan using latitude and longitude values
map_Boro_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Toronto_data['Latitude'], Toronto_data['Longitude'], Toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Boro_Toronto)  
    
map_Boro_Toronto

#### Extract the first Neighbourhood and It's coordinate

In [20]:
Toronto_data.loc[0, 'Neighbourhood']

'The Beaches'

In [21]:
neighbourhood_latitude = Toronto_data.loc[0, 'Latitude'] # neighbourhood latitude value
neighbourhood_longitude = Toronto_data.loc[0, 'Longitude'] # neighbourhood longitude value

neighbourhood_name = Toronto_data.loc[0, 'Neighbourhood'] # neighbourhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


### Get the Venues for Vegetarian/Vegan restaurant using FourSquare API
#### The Category ID for vegetarian/Vegan restaurant is mentioned on https://developer.foursquare.com/docs/resources/categories
Category ID Vegetarian / Vegan Restaurant
4bf58dd8d48988d1d3941735

We are now ready to get the 15 recommendation within 1000 meters of location using four square API

In [22]:
# Configure additional Search parameters
categoryId = '4bf58dd8d48988d1d3941735' # Vegetarian/Vegan Restaurant
radius = 1000
limit = 15

# Get the restaurant in The beaches area
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    neighbourhood_latitude,
    neighbourhood_longitude,
    VERSION,
    categoryId,
    radius,
    limit)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d10532a86bc49002cbfde58'},
 'response': {'confident': True,
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vegetarian_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d1d3941735',
      'name': 'Vegetarian / Vegan Restaurant',
      'pluralName': 'Vegetarian / Vegan Restaurants',
      'primary': True,
      'shortName': 'Vegetarian / Vegan'}],
    'hasPerk': False,
    'id': '4f5a855be4b0a4baa1ae0063',
    'location': {'address': '2188 Queen Street E',
     'cc': 'CA',
     'city': 'Toronto',
     'country': 'Canada',
     'crossStreet': 'Balsam Ave',
     'distance': 519,
     'formattedAddress': ['2188 Queen Street E (Balsam Ave)',
      'Toronto ON M431E6',
      'Canada'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.672113947269565,
       'lng': -79.29033140068843}],
     'lat': 43.672113947269565,
     'lng': -79.29033140068843,
     'postalCode': 'M431E6',
     'state':

The requests returns a JSON object which can then be queried for the restaurant details required. A sample restaurnt from the results returned is shown below:

```json
{'meta': {'code': 200, 'requestId': '5d0fa3ed9ba3e5002cff9b1c'},
 'response': {'confident': True,
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vegetarian_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d1d3941735',
      'name': 'Vegetarian / Vegan Restaurant',
      'pluralName': 'Vegetarian / Vegan Restaurants',
      'primary': True,
      'shortName': 'Vegetarian / Vegan'}],
    'hasPerk': False,
    'id': '4f5a855be4b0a4baa1ae0063',
    'location': {'address': '2188 Queen Street E',
     'cc': 'CA',
     'city': 'Toronto',
     'country': 'Canada',
     'crossStreet': 'Balsam Ave',
     'distance': 519,
     'formattedAddress': ['2188 Queen Street E (Balsam Ave)',
      'Toronto ON M431E6',
      'Canada'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.672113947269565,
       'lng': -79.29033140068843}],
     'lat': 43.672113947269565,
     'lng': -79.29033140068843,
     'postalCode': 'M431E6',
     'state': 'ON'},
    'name': "Tori's Bakeshop",
    'referralId': 'v-1561306093',
    'venuePage': {'id': '43778861'}}]}}
 ```

```
From this JSON the following attributes are extraced and added to the Dataframe:

Restaurant ID
Restaurant Category Name
Restaurant Category ID
Restaurant Nest_name
Restaurant Address
Restaurant Postalcode
Restaurant City
Restaurant Latitude
Restaurant Longitude
Venue Name
Venue Latitude
Venue Longitude
```

## Neighbourhood Exploration

In [23]:
def getNearbyVenues(names, latitudes, longitudes, categoryId='4bf58dd8d48988d1d3941735', radius=1000, limit=100):
    rest_cols = ['Neighbourhood',
                 'id',
                 'category', 
                 'categoryID', 
                 'name', 
                 'address',
                 'postalcode',
                 'city',
                 'latitude',
                 'longitude', 
                 ]
    df_rest = pd.DataFrame(columns=rest_cols)
    venues_list = []
    for name_neigh, lat, long in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            lat,
            long,
            VERSION,
            categoryId,
            radius,
            limit)
            
        # make the GET request
        results = requests.get(url).json()['response']['venues']
        #print(results)
        try:
            venues_list.append([(
                name_neigh,
                v['id'],
                v['categories'][0]['pluralName'],
                v['categories'][0]['id'],
                v['name'],
                v['location']['formattedAddress'][0],
                v['location']['formattedAddress'][1],
                v['location']['city'],
                v['location']['lat'],
                v['location']['lng']) for v in results])
        except:
            continue
    
    #print(venues_list)
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = rest_cols
    
    return (nearby_venues) 

### Get all the nearby Vegetarian/Vegan restaurant in Borough

In [24]:
Toronto_venues = getNearbyVenues(names=Toronto_data['Neighbourhood'],
                                   latitudes=Toronto_data['Latitude'],
                                   longitudes=Toronto_data['Longitude']
                                  )

In [25]:
Toronto_venues

Unnamed: 0,Neighbourhood,id,category,categoryID,name,address,postalcode,city,latitude,longitude
0,The Beaches,4f5a855be4b0a4baa1ae0063,Vegetarian / Vegan Restaurants,4bf58dd8d48988d1d3941735,Tori's Bakeshop,2188 Queen Street E (Balsam Ave),Toronto ON M431E6,Toronto,43.672114,-79.290331
1,The Beaches,4bd36093046076b0ecf17571,Bars,4bf58dd8d48988d116941735,Castro's Lounge,2116 Queen Street E (Wineva Ave.),Toronto ON M4E 1E2,Toronto,43.671104,-79.295107
2,"The Danforth West, Riverdale",4ad9fe24f964a520ed1c21e3,Breakfast Spots,4bf58dd8d48988d143941735,Mocha Mocha,489 Danforth Ave.,Toronto ON,Toronto,43.678078,-79.349459
3,"The Danforth West, Riverdale",4feb68ece4b07864fce4d4e7,Vegetarian / Vegan Restaurants,4bf58dd8d48988d1d3941735,Vegetarian Cafe in the Big Carrot,348 Danforth Ave,Toronto ON,Toronto,43.677874,-79.352939
4,"The Danforth West, Riverdale",5834707203cf257cba83f9c0,Vegetarian / Vegan Restaurants,4bf58dd8d48988d1d3941735,Green Earth Vegan Cuisine,804 Danforth Ave,Toronto ON M4J 1L2,Toronto,43.679713,-79.341331
5,"The Danforth West, Riverdale",4b6b6e30f964a5204c082ce3,Pizza Places,4bf58dd8d48988d1ca941735,Magic Oven,1450 Danforth Ave. (at Monarch Park Ave.),Toronto ON M4J 1N4,Toronto,43.679637,-79.341752
6,"The Beaches West, India Bazaar",4ae0c7a8f964a520638221e3,Indian Restaurants,4bf58dd8d48988d10f941735,Udupi Palace,1460 Gerrard St E (Coxwell Ave),Toronto ON M4L 2A3‎,Toronto,43.67248,-79.321275
7,"The Beaches West, India Bazaar",53d9133e498ef675684a0d50,Vegetarian / Vegan Restaurants,4bf58dd8d48988d1d3941735,The Social Gardener,1326 Gerrard St E (btwn Glenside and Highfield),Toronto ON,Toronto,43.671493,-79.325764
8,"The Beaches West, India Bazaar",538209be498ede9f52c34f7c,Food Trucks,4bf58dd8d48988d1cb941735,Portobello Burger,Toronto ON,Canada,Toronto,43.663849,-79.31411
9,Davisville North,521e0c6c04939a8ad55d93d3,Vegetarian / Vegan Restaurants,4bf58dd8d48988d1d3941735,Fresh,90 Eglinton Avenue East (Yonge & Eglinton),Toronto ON M4P 1A6,Toronto,43.707324,-79.395649


In [26]:
Toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,id,category,categoryID,name,address,postalcode,city,latitude,longitude
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Business Reply Mail Processing Centre 969 Eastern,3,3,3,3,3,3,3,3,3
"Cabbagetown, St. James Town",1,1,1,1,1,1,1,1,1
Central Bay Street,34,34,34,34,34,34,34,34,34
Church and Wellesley,9,9,9,9,9,9,9,9,9
Davisville,6,6,6,6,6,6,6,6,6
Davisville North,5,5,5,5,5,5,5,5,5
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",3,3,3,3,3,3,3,3,3
"Dovercourt Village, Dufferin",1,1,1,1,1,1,1,1,1
"Forest Hill North, Forest Hill West",3,3,3,3,3,3,3,3,3
"Harbourfront, Regent Park",2,2,2,2,2,2,2,2,2


In [27]:
print('There are {} uniques categories.'.format(len(Toronto_venues['category'].unique())))

There are 13 uniques categories.


## Analyze Each Neighbourhood

In [28]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighbourhood'] = Toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Neighbourhood,Bakeries,Bars,Breakfast Spots,Comfort Food Restaurants,Fast Food Restaurants,Food Trucks,Indian Restaurants,Juice Bars,Pizza Places,Salad Places,Soup Places,Vegetarian / Vegan Restaurants,Vietnamese Restaurants
0,The Beaches,0,0,0,0,0,0,0,0,0,0,0,1,0
1,The Beaches,0,1,0,0,0,0,0,0,0,0,0,0,0
2,"The Danforth West, Riverdale",0,0,1,0,0,0,0,0,0,0,0,0,0
3,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,0,0,1,0
4,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,0,0,1,0


In [29]:
Toronto_onehot.shape

(86, 14)

### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [30]:
Toronto_grouped = Toronto_onehot.groupby('Neighbourhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighbourhood,Bakeries,Bars,Breakfast Spots,Comfort Food Restaurants,Fast Food Restaurants,Food Trucks,Indian Restaurants,Juice Bars,Pizza Places,Salad Places,Soup Places,Vegetarian / Vegan Restaurants,Vietnamese Restaurants
0,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.666667,0.0
1,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,Central Bay Street,0.058824,0.0,0.029412,0.0,0.029412,0.0,0.058824,0.029412,0.0,0.058824,0.029412,0.676471,0.029412
3,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.777778,0.0
4,Davisville,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.833333,0.0
5,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
6,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
7,"Dovercourt Village, Dufferin",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
8,"Forest Hill North, Forest Hill West",0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0
9,"Harbourfront, Regent Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


Let's confirm the size of new dataframe

In [31]:
Toronto_grouped.shape

(19, 14)

Let's print each neighborhood along with the top 5 most common venues

In [32]:
num_top_venues = 5

for hood in Toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Business Reply Mail Processing Centre 969 Eastern----
                            venue  freq
0  Vegetarian / Vegan Restaurants  0.67
1                     Food Trucks  0.33
2                        Bakeries  0.00
3                            Bars  0.00
4                 Breakfast Spots  0.00


----Cabbagetown, St. James Town----
                            venue  freq
0  Vegetarian / Vegan Restaurants   1.0
1                        Bakeries   0.0
2                            Bars   0.0
3                 Breakfast Spots   0.0
4        Comfort Food Restaurants   0.0


----Central Bay Street----
                            venue  freq
0  Vegetarian / Vegan Restaurants  0.68
1                        Bakeries  0.06
2              Indian Restaurants  0.06
3                    Salad Places  0.06
4                 Breakfast Spots  0.03


----Church and Wellesley----
                            venue  freq
0  Vegetarian / Vegan Restaurants  0.78
1                    Salad Places  0.11
2   

#### Let's put that into a *pandas* dataframe
First, let's write a function to sort the venues in descending order.

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = Toronto_grouped['Neighbourhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Business Reply Mail Processing Centre 969 Eastern,Vegetarian / Vegan Restaurants,Food Trucks,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Fast Food Restaurants,Comfort Food Restaurants
1,"Cabbagetown, St. James Town",Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
2,Central Bay Street,Vegetarian / Vegan Restaurants,Salad Places,Indian Restaurants,Bakeries,Vietnamese Restaurants,Soup Places,Juice Bars,Fast Food Restaurants,Breakfast Spots,Pizza Places
3,Church and Wellesley,Vegetarian / Vegan Restaurants,Soup Places,Salad Places,Vietnamese Restaurants,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
4,Davisville,Vegetarian / Vegan Restaurants,Comfort Food Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
5,Davisville North,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
6,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
7,"Dovercourt Village, Dufferin",Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
8,"Forest Hill North, Forest Hill West",Vegetarian / Vegan Restaurants,Comfort Food Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
9,"Harbourfront, Regent Park",Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants


## Cluster Neighborhoods
Run *k*-means to cluster the neighborhood into 6 clusters.

In [35]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 3, 3, 3, 0, 0, 0, 3, 0], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [36]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = Toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

Toronto_merged.head() # check the last columns!
#Toronto_merged.fillna(0,inplace=True)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,4.0,Vegetarian / Vegan Restaurants,Bars,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1.0,Vegetarian / Vegan Restaurants,Pizza Places,Breakfast Spots,Vietnamese Restaurants,Soup Places,Salad Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,2.0,Vegetarian / Vegan Restaurants,Indian Restaurants,Food Trucks,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Fast Food Restaurants,Comfort Food Restaurants
3,M4M,East Toronto,Studio District,43.659526,-79.340923,,,,,,,,,,,
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,,,,,,,,,,,


Finally, let's visualize the resulting clusters

In [37]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighbourhood'], Toronto_merged['Cluster Labels'].fillna(0).astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

##  Examine Clusters

### Cluster 1

In [38]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Central Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
6,Central Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
8,Central Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
9,Central Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
10,Downtown Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
11,Downtown Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
13,Downtown Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
22,Central Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
31,West Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
36,West Toronto,0.0,Vegetarian / Vegan Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants


### Cluster 2 - Park

In [39]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,East Toronto,1.0,Vegetarian / Vegan Restaurants,Pizza Places,Breakfast Spots,Vietnamese Restaurants,Soup Places,Salad Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants


### Cluster 3

In [40]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,East Toronto,2.0,Vegetarian / Vegan Restaurants,Indian Restaurants,Food Trucks,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Fast Food Restaurants,Comfort Food Restaurants


### Cluster 4

In [41]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Central Toronto,3.0,Vegetarian / Vegan Restaurants,Comfort Food Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
12,Downtown Toronto,3.0,Vegetarian / Vegan Restaurants,Soup Places,Salad Places,Vietnamese Restaurants,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants,Comfort Food Restaurants
17,Downtown Toronto,3.0,Vegetarian / Vegan Restaurants,Salad Places,Indian Restaurants,Bakeries,Vietnamese Restaurants,Soup Places,Juice Bars,Fast Food Restaurants,Breakfast Spots,Pizza Places
23,Central Toronto,3.0,Vegetarian / Vegan Restaurants,Comfort Food Restaurants,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
34,West Toronto,3.0,Vegetarian / Vegan Restaurants,Bakeries,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants
37,East Toronto,3.0,Vegetarian / Vegan Restaurants,Food Trucks,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Fast Food Restaurants,Comfort Food Restaurants


### Cluster 5

In [42]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,4.0,Vegetarian / Vegan Restaurants,Bars,Vietnamese Restaurants,Soup Places,Salad Places,Pizza Places,Juice Bars,Indian Restaurants,Food Trucks,Fast Food Restaurants


### Conclusion
#### After Analyzing the cluster, The recommendation to starting a vegan restuarnt will be East Toronto