# Table of contents
> ### Introduction
> ### Problem statement
> ### Methodology
> ### Results and Discussion
> ### Conclusion









## Introduction

The strength and vitality of the many neighbourhoods that make up Toronto, Ontario, Canada have earned the city its unofficial nickname of "the city of neighbourhoods." There are over 140 neighbourhoods officially recognized by the City of Toronto and upwards of 240 official and unofficial neighbourhoods within city limits. Before 1998, Toronto was a much smaller municipality and formed the core of Metropolitan Toronto. When the city amalgamated that year, Toronto grew to encompass the former municipalities of York, East York, North York, Etobicoke, and Scarborough. Each of these former municipalities still maintains, to a certain degree, its own distinct identity, and the names of these municipalities are still used by their residents, sometimes for disambiguation purposes, as amalgamation resulted in duplicated street names. The area known as Toronto before the amalgamation is sometimes called the "old" City of Toronto, the Central District or simply "Downtown". The "inner ring" suburbs of York and East York are older, predominantly middle-income areas, and ethnically diverse. Much of the housing stock in these areas consists of pre-World War II single-family houses and post-war high-rises. Many of the neighbourhoods in these areas were built up as streetcar suburbs and contain many dense and mixed-use streets, some of which are one-way. They share many characteristics with sections of the "old" city outside the downtown core.

## Problem statement
The idea is to find neighborhoods in Toronto, Canada, that have all the basic necessity shops within a few kilometers of where a person lives. People who are new to the city, or who are moving to Toronto from another city, need a place to live, and it can be difficult for them to find a neighborhood that covers all their necessities. The aim of the project is to divide the city's neighborhoods into categories according to the shops and facilities available in each of them. The Foursquare API is used to find the venues near each neighborhood and to retrieve the categories, and the count of shops in each category, for each neighborhood.

## Methodology

### Clustering
K-means clustering with 5 clusters was used on the dataset. The clustering features were the 7 categories retrieved in the previous step. The frequency of occurrence of each category determined the clusters of neighborhoods. Clusters with a high frequency of occurrence of these categories are better. These clusters help in recognizing neighborhoods that have the needed categories of shops.
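
As a rough illustration of how k-means groups neighborhoods by category frequency, the plain-Python sketch below runs the two alternating steps of the algorithm (assign each point to its nearest centroid, then recompute the centroids). The two-feature frequency vectors, the choice of two clusters and the starting centroids are hypothetical and only keep the example small; the notebook itself relies on scikit-learn's `KMeans`.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical frequency vectors: (grocery share, pharmacy share) per neighborhood
freqs = [(0.30, 0.10), (0.28, 0.12), (0.02, 0.00), (0.00, 0.03)]
labels, centres = kmeans(freqs, centroids=[freqs[0], freqs[2]])
print(labels)
```

Neighborhoods with similar category shares end up with the same label, which is the property the clustering step depends on.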

### Data
Data on the boroughs and neighborhoods of Toronto is retrieved from Wikipedia (https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto). The data is available there as tables with the postal codes and names of the neighborhoods in each borough. Geospatial data is used to retrieve the latitude and longitude of each neighborhood. The Foursquare API is then used to retrieve the venues near each neighborhood.

Wikipedia Data: Columns Retrieved: Borough, Postal Code, Neighborhoods

Foursquare Data: Latitude, Longitude, Venues, Category

### Libraries used
Pandas: For creating and manipulating dataframes.<br>
Folium: Python visualization library used to show the distribution of the neighborhood clusters on an interactive Leaflet map.<br>
Scikit Learn: For importing k-means clustering.<br>
JSON: Library to handle JSON files.<br>
XML: To separate data from presentation; XML stores data in plain text format.<br>
Geocoder: To retrieve location data.<br>
Beautiful Soup and Requests: To scrape web pages and to handle HTTP requests.<br>
Matplotlib: Python plotting module.


# Import of all necessary libraries

In [None]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

# BeautifulSoup Object to make request to website

In [None]:
URL = 'http://en.turkcewiki.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'lxml')

# Table tags

In [None]:
tble = soup.find('table')
print(len(tble))

2


# Initialise lists for the table columns

In [None]:
postal_codes=[]
boroughs = []
neighs = []

# Web Scraping from the table

In [None]:
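for neigh in tble.find_all('td'):
  sp = neigh.find('span')
  # Skip cells without a <span> and cells whose postal code is not assigned
  if sp is not None and sp.text != 'Not assigned':
    postal_codes.append(neigh.find('b').text)
    # The code assumes cell text of the form "Borough(Neighbourhood A / Neighbourhood B)"
    split_both = sp.text.split("(")
    hoods = split_both[1].split(")")[0]
    boroughs.append(split_both[0])
    # Separate multiple neighbourhoods with commas instead of slashes
    neighs.append(hoods.replace("/", ","))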
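
The string handling in the loop above can be tried in isolation. The sketch below uses a hypothetical helper, `parse_cell`, and sample strings whose shape (a borough followed by neighbourhoods in parentheses, separated by slashes) is inferred from the scraping code rather than checked against the page. It also shows one way to avoid an `IndexError` when a cell has no parentheses.

```python
def parse_cell(text):
    """Split 'Borough(Hood A / Hood B)' into a borough and a comma-separated neighbourhood string."""
    # partition() never raises: without "(" the whole text becomes the borough
    borough, _, rest = text.partition("(")
    hoods = rest.split(")")[0].replace("/", ",")
    return borough.strip(), hoods.strip()

# Sample strings are assumptions about the page's cell format
print(parse_cell("North York(Parkwoods)"))
print(parse_cell("Downtown Toronto(Regent Park / Harbourfront)"))
print(parse_cell("Not assigned"))
```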

In [None]:
print(len(postal_codes))
print(len(boroughs))
print(len(neighs))

103
103
103


In [None]:
df = pd.DataFrame(
columns=['PostalCode','Borough','Neighbourhood'])
df

Unnamed: 0,PostalCode,Borough,Neighbourhood


In [None]:
df['PostalCode'] =postal_codes
df['Borough'] = boroughs
df['Neighbourhood'] = neighs

df

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,..."


In [None]:
len(df['PostalCode'].unique())

103

# GeoSpatial Data of Latitude and Longitude

In [None]:
from io import StringIO 
url = 'http://cocl.us/Geospatial_data'
s=requests.get(url).content
c=pd.read_csv(StringIO(s.decode('utf-8')))
c

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


# Merging tables

In [None]:
df2= df.merge(c, left_on='PostalCode',right_on = 'Postal Code', how='left')
df2.drop(columns=['Postal Code'],inplace=True)
df2

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,...",43.636258,-79.498509


# Examine the North York Borough of Toronto

In [None]:
df3 = df2[df2['Borough']=='North York']
df3

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
7,M3B,North York,Don Mills,43.745906,-79.352188
10,M6B,North York,Glencairn,43.709577,-79.445073
13,M3C,North York,Don Mills,43.7259,-79.340923
27,M2H,North York,Hillcrest Village,43.803762,-79.363452
28,M3H,North York,"Bathurst Manor , Wilson Heights , Downsview North",43.754328,-79.442259
33,M2J,North York,"Fairview , Henry Farm , Oriole",43.778517,-79.346556
34,M3J,North York,"Northwood Park , York University",43.76798,-79.487262


### Import libraries

In [None]:
from geopy.geocoders import Nominatim 
import folium
import json
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

# Getting Longitude and latitude of Toronto City

In [None]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


# Visualisation of all the neighborhoods of Toronto city

In [None]:
map_north_york = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(df2['Latitude'],df2['Longitude'], df2['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_north_york)  
    
map_north_york

# Adding Foursquare credentials

In [None]:
CLIENT_ID = 'YOUR_CLIENT_ID'  # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # replace with your own Foursquare client secret
VERSION = '20210515'  # Foursquare API version date
LIMIT = 100  # maximum number of venues returned per request


# Latitude and longitude of the first neighbourhood

In [None]:
neighborhood_latitude = df3.loc[0, 'Latitude']
neighborhood_longitude = df3.loc[0, 'Longitude']

neighborhood_name = df3.loc[0, 'Neighbourhood'] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


# Request to the Foursquare API to explore venues near Parkwoods

In [None]:
LIMIT = 100   # maximum number of venues to return
radius = 500  # search radius in metres
AUTH = 'YOUR_OAUTH_TOKEN'  # replace with your own Foursquare OAuth token
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&oauth_token={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    AUTH,
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
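
A long format string makes it easy to swap two positional arguments by accident. As a hedged alternative, the sketch below builds the same kind of explore URL from a dictionary with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are hypothetical, and the `oauth_token` parameter from the cell above is left out for brevity.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret, version, radius=500, limit=100):
    """Assemble the explore URL from named parameters so each value is escaped explicitly."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search centre
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; the coordinates are those of Parkwoods printed above
url = build_explore_url(43.7532586, -79.3296565, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20210515")
print(url)
```

With `requests`, the same dictionary can also be passed directly as `requests.get(base_url, params=params)`, which performs the encoding itself.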

In [None]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '609e95c2112cbf088820e65e'},
 'notifications': [{'item': {'unreadCount': 0}, 'type': 'notificationTray'}],
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4e8d9dcdd5fbbbb6b3003c7b-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d163941735',
         'name': 'Park',
         'pluralName': 'Parks',
         'primary': True,
         'shortName': 'Park'}],
       'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'location': {'address': 'Toronto',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'distance': 245,
        'formattedAddress': ['Toronto', 'Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          

# Retrieving categories of venues obtained

In [None]:
# From here on df3 refers to all Toronto neighbourhoods, not only North York
df3 = df2

In [None]:
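def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # Flattened responses store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']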
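
To make the behaviour of the helper concrete, the sketch below repeats it and applies it to two hypothetical rows. Plain dictionaries stand in for dataframe rows here, which works because the function only indexes by key; the venue names and category lists are made-up samples shaped like the Foursquare response.

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows, shaped like the flattened Foursquare response
row_with_category = {'venue.name': 'Brookbanks Park',
                     'venue.categories': [{'name': 'Park', 'primary': True}]}
row_without_category = {'venue.name': 'Unnamed venue', 'venue.categories': []}

print(get_category_type(row_with_category))     # Park
print(get_category_type(row_without_category))  # None
```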

# Cleaning response and generating Dataframe

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) 

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Careful & Reliable Painting,Construction & Landscaping,43.752622,-79.331957
2,649 Variety,Convenience Store,43.754513,-79.331942
3,Towns On The Ravine,Hotel,43.754754,-79.332552
4,Sun Life,Construction & Landscaping,43.75476,-79.332783


In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

10 venues were returned by Foursquare.
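
The `radius` of 500 passed to the API is in metres, so every returned venue should lie within roughly that distance of the neighbourhood's coordinates. As a sanity check, the sketch below computes the great-circle (haversine) distance between the Parkwoods coordinates and the first venue in the table. The function name `haversine_m` and the Earth radius constant are assumptions of this example, not part of the notebook.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    r = 6371000  # mean Earth radius in metres (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Parkwoods centre and Brookbanks Park, taken from the outputs above
d = haversine_m(43.7532586, -79.3296565, 43.751976, -79.33214)
print(round(d))
```

The result comes out at about 245 m, in line with the `distance` field of the first item in the raw response and well inside the 500 m radius.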


# Getting venues near all the neighbourhoods of Toronto

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for the venues within `radius` metres of each neighbourhood."""
    AUTH = 'YOUR_OAUTH_TOKEN'  # replace with your own Foursquare OAuth token
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # Build the explore request for this neighbourhood
        url = 'https://api.foursquare.com/v2/venues/explore?oauth_token={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            AUTH,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # Keep only the list of recommended items from the response
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # One tuple per venue: neighbourhood details followed by venue details
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighbourhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
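
The function above indexes straight into the response, so a failed request or a venue without categories would raise an exception partway through the loop. The sketch below shows one possible defensive variant of the extraction step. The helper name `extract_rows` and both sample payloads are hypothetical; the payload shape follows the response printed earlier.

```python
def extract_rows(name, lat, lng, payload):
    """Turn one explore response into flat rows, tolerating errors and missing categories."""
    if payload.get('meta', {}).get('code') != 200:
        return []  # failed request: contribute no rows instead of raising
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0]['items'] if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hypothetical payloads, shaped like the response shown earlier
ok = {'meta': {'code': 200},
      'response': {'groups': [{'items': [
          {'venue': {'name': 'Brookbanks Park',
                     'location': {'lat': 43.751976, 'lng': -79.33214},
                     'categories': [{'name': 'Park'}]}},
          {'venue': {'name': 'Uncategorised venue',
                     'location': {'lat': 43.7530, 'lng': -79.3300},
                     'categories': []}}]}]}}
failed = {'meta': {'code': 429}, 'response': {}}

rows = extract_rows('Parkwoods', 43.753259, -79.329656, ok)
print(rows)
print(extract_rows('Parkwoods', 43.753259, -79.329656, failed))
```

Rows with a `None` category can then be dropped or kept before building the dataframe, depending on how uncategorised venues should be treated.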

# Generating dataframe of all neighbourhoods

In [None]:
north_york_venues = getNearbyVenues(names=df3['Neighbourhood'],
                                   latitudes=df3['Latitude'],
                                   longitudes=df3['Longitude'])
north_york_venues

Parkwoods
Victoria Village
Regent Park , Harbourfront
Lawrence Manor , Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern , Rouge
Don Mills
Parkview Hill , Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park , Princess Gardens , Martin Grove , Islington , Cloverdale
Rouge Hill , Port Union , Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate , Bloordale Gardens , Old Burnhamthorpe , Markland Wood
Guildwood , Morningside , West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor , Wilson Heights , Downsview North
Thorncliffe Park
Richmond , Adelaide , King
Dufferin , Dovercourt Village
Scarborough Village
Fairview , Henry Farm , Oriole
Northwood Park , York University
The Danforth  East
Harbourfront East , Union Station , Toronto Islands
Little Portugal , Trinity
Kennedy Park , Ionview , East Birchmount Park
Bayview Village
Downsview
T

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Careful & Reliable Painting,43.752622,-79.331957,Construction & Landscaping
2,Parkwoods,43.753259,-79.329656,649 Variety,43.754513,-79.331942,Convenience Store
3,Parkwoods,43.753259,-79.329656,Towns On The Ravine,43.754754,-79.332552,Hotel
4,Parkwoods,43.753259,-79.329656,Sun Life,43.754760,-79.332783,Construction & Landscaping
...,...,...,...,...,...,...,...
3100,"Mimico NW , The Queensway West , South of Bloo...",43.628841,-79.520999,Koala Tan Tanning Salon & Sunless Spa,43.631370,-79.519006,Tanning Salon
3101,"Mimico NW , The Queensway West , South of Bloo...",43.628841,-79.520999,Once Upon A Child,43.631075,-79.518290,Kids Store
3102,"Mimico NW , The Queensway West , South of Bloo...",43.628841,-79.520999,Value Village,43.631269,-79.518238,Thrift / Vintage Store
3103,"Mimico NW , The Queensway West , South of Bloo...",43.628841,-79.520999,Kingsway Boxing Club,43.627254,-79.526684,Gym


In [None]:
north_york_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Careful & Reliable Painting,43.752622,-79.331957,Construction & Landscaping
2,Parkwoods,43.753259,-79.329656,649 Variety,43.754513,-79.331942,Convenience Store
3,Parkwoods,43.753259,-79.329656,Towns On The Ravine,43.754754,-79.332552,Hotel
4,Parkwoods,43.753259,-79.329656,Sun Life,43.75476,-79.332783,Construction & Landscaping


# Getting count of venues near neighbourhoods

In [None]:
north_york_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,7,7,7,7,7,7
"Alderwood , Long Branch",12,12,12,12,12,12
"Bathurst Manor , Wilson Heights , Downsview North",34,34,34,34,34,34
Bayview Village,6,6,6,6,6,6
"Bedford Park , Lawrence Manor East",54,54,54,54,54,54
...,...,...,...,...,...,...
"Willowdale , Newtonbrook",3,3,3,3,3,3
Woburn,4,4,4,4,4,4
Woodbine Heights,16,16,16,16,16,16
"York Mills , Silver Hills",3,3,3,3,3,3


# Unique categories of venues

In [None]:
print('There are {} unique categories.'.format(len(north_york_venues['Venue Category'].unique())))

There are 322 unique categories.


# One hot encoding categories for clustering

In [None]:
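north_york_onehot = pd.get_dummies(north_york_venues[['Venue Category']], prefix="", prefix_sep="")

# Note: the venue categories appear to include one literally named 'Neighborhood', so this
# assignment overwrites that dummy column instead of appending a new one (the shape printed
# below has 322 columns, the same as the number of unique categories)
north_york_onehot['Neighborhood'] = north_york_venues['Neighborhood'] 

# Move the last column to the front; because of the overwrite noted above, the column
# moved here is the last category ('Yoga Studio') rather than 'Neighborhood'
fixed_columns = [north_york_onehot.columns[-1]] + list(north_york_onehot.columns[:-1])
north_york_onehot = north_york_onehot[fixed_columns]

north_york_onehot.head()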

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Bath House,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,...,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Watch Shop,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
north_york_onehot.shape

(3105, 322)

# Mean of neighbourhood categories

In [None]:
north_york_grouped = north_york_onehot.groupby('Neighborhood').mean().reset_index()
north_york_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Bath House,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Bike Shop,Bistro,...,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Watch Shop,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
1,"Alderwood , Long Branch",0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
2,"Bathurst Manor , Wilson Heights , Downsview North",0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
3,Bayview Village,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
4,"Bedford Park , Lawrence Manor East",0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.092593,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.018519
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,"Willowdale , Newtonbrook",0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
93,Woburn,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
94,Woodbine Heights,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.062500,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,...,0.0,0.0,0.062500,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
95,"York Mills , Silver Hills",0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
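
Taking the mean of one-hot columns within a group gives the share of each category among that neighbourhood's venues, which is why the values above are fractions such as 0.083333 (1 of 12). The sketch below reproduces that idea with plain counters. The venue list is a toy sample: the Parkwoods rows echo only the five rows shown by `head()` earlier, and 'Toy Town' is a made-up neighbourhood.

```python
from collections import Counter, defaultdict

# Toy (neighbourhood, category) pairs; not the full dataset
venues = [
    ('Parkwoods', 'Park'),
    ('Parkwoods', 'Construction & Landscaping'),
    ('Parkwoods', 'Convenience Store'),
    ('Parkwoods', 'Hotel'),
    ('Parkwoods', 'Construction & Landscaping'),
    ('Toy Town', 'Coffee Shop'),
    ('Toy Town', 'Coffee Shop'),
]

# Count categories per neighbourhood (equivalent to summing the one-hot columns)
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Mean of a 0/1 column within a group = number of ones / group size
freq = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}
print(freq['Parkwoods']['Construction & Landscaping'])  # 2 of 5 venues
```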


## Results and Discussion

The clustering yields 5 clusters of neighborhoods, each with different properties and characteristics. The red and purple clusters contain more neighborhoods than the other clusters. The red cluster lies mostly on the airport side of the city, which seems less populated. The purple neighborhoods are near the University of Toronto and the beach side, which is denser than the other sides. The yellow cluster consists of neighborhoods that are very far from the main city area. The sea blue cluster contains only one neighborhood, which lies inside the city region. The cyan cluster lies close to the border of the city.

The single neighborhood in the sea blue cluster is a very deserted area. It does not have all the necessary facilities, which makes it a very weak candidate. The cyan cluster is at the very edge of the city, which explains its lack of amenities, so it is not a good choice either. The yellow cluster has very similar properties to the cyan cluster, so it is also a very bad candidate. That leaves two clusters, red and purple. The red cluster has no ATMs. The purple cluster has a few ATMs but is scarce in terms of gyms and shopping malls. The red cluster is very scattered, while the purple cluster is very dense. The choice of neighborhood therefore depends on distance, preferred area, and which facilities matter more than others.

# Getting top 5 venues of each neighbourhood and their frequencies

In [None]:
num_top_venues = 5

for hood in north_york_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = north_york_grouped[north_york_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
             venue  freq
0   Breakfast Spot  0.14
1           Lounge  0.14
2  Fireworks Store  0.14
3   Hardware Store  0.14
4     Skating Rink  0.14


----Alderwood , Long Branch----
            venue  freq
0     Pizza Place  0.17
1        Pharmacy  0.08
2     Coffee Shop  0.08
3  Sandwich Place  0.08
4      Playground  0.08


----Bathurst Manor , Wilson Heights , Downsview North----
               venue  freq
0  Mobile Phone Shop  0.06
1     Ice Cream Shop  0.06
2           Pharmacy  0.06
3               Bank  0.06
4        Coffee Shop  0.06


----Bayview Village----
                 venue  freq
0   Chinese Restaurant  0.17
1                 Bank  0.17
2                  Spa  0.17
3  Japanese Restaurant  0.17
4                 Café  0.17


----Bedford Park , Lawrence Manor East----
                venue  freq
0                 Spa  0.09
1         Pizza Place  0.06
2  Italian Restaurant  0.06
3           Juice Bar  0.04
4      Massage Studio  0.04


----Berczy Park--

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Neighbourhoods with their most common 10 venues

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = north_york_grouped['Neighborhood']

for ind in np.arange(north_york_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_york_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Hardware Store,Breakfast Spot,Skating Rink,Fireworks Store,Lounge,Latin American Restaurant,Clothing Store,Empanada Restaurant,Drugstore,Dry Cleaner
1,"Alderwood , Long Branch",Pizza Place,Skating Rink,Gym,Pharmacy,Sandwich Place,Athletics & Sports,Playground,Coffee Shop,Pub,Pool
2,"Bathurst Manor , Wilson Heights , Downsview North",Ice Cream Shop,Pharmacy,Bank,Coffee Shop,Spa,Mobile Phone Shop,Grocery Store,Sandwich Place,Sushi Restaurant,Supermarket
3,Bayview Village,Café,Japanese Restaurant,Bank,Gym,Spa,Chinese Restaurant,Women's Store,Eastern European Restaurant,Donut Shop,Drugstore
4,"Bedford Park , Lawrence Manor East",Spa,Italian Restaurant,Pizza Place,Boutique,Restaurant,Massage Studio,Business Service,Sushi Restaurant,Mobile Phone Shop,Sandwich Place
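
The column names above are built by looking up a suffix in a three-item list and falling back to "th" when the lookup fails. That works for the first ten positions, but it would label position 21 as "21th". As a small sketch of a more general approach, the hypothetical helper `ordinal` below handles the teens and the repeating pattern explicitly.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column layout as in the cell above, for the top 10 venues
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[10])
```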


# K-means clustering with 3 clusters

In [None]:
kclusters = 3

north_york_grouped_clustering = north_york_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_york_grouped_clustering)

kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0], dtype=int32)

# Merge tables

In [None]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

north_york_merged = df3

north_york_merged = north_york_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

north_york_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Construction & Landscaping,Convenience Store,Park,Hotel,Fireworks Store,BBQ Joint,Bus Stop,Food & Drink Shop,Electronics Store,Donut Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Portuguese Restaurant,Pizza Place,Hockey Arena,Bridal Shop,Financial or Legal Service,Intersection,Coffee Shop,Eastern European Restaurant,Dog Run,Doner Restaurant
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Pub,Theater,Art Gallery,Bakery,Park,Furniture / Home Store,Café,Performing Arts Venue,Breakfast Spot
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Furniture / Home Store,Accessories Store,Home Service,Women's Store,Coffee Shop,Medical Center,Boutique,Lighting Store,Miscellaneous Shop
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,1.0,Coffee Shop,Sandwich Place,Chinese Restaurant,Café,Pizza Place,Fast Food Restaurant,Salad Place,Sushi Restaurant,Burger Joint,Bank
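
The `Cluster Labels` column shows `1.0` rather than `1`. A likely reason is that `join` leaves `NaN` for every left-hand row with no match on the right, and pandas then stores the whole column as floats. The sketch below reproduces that behaviour and shows one way to get integer labels back. The frames `left` and `right`, the row "Unmatched Area" and the label values are hypothetical stand-ins, not the notebook's data:

```python
import pandas as pd

# Hypothetical stand-in for df3 (note the 'Neighbourhood' spelling)
left = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Unmatched Area"],
})

# Hypothetical stand-in for neighborhoods_venues_sorted
right = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Victoria Village"],
    "Cluster Labels": [1, 0],
})

# Left join on the neighbourhood name; the unmatched row gets NaN
merged = left.join(right.set_index("Neighborhood"), on="Neighbourhood")
print(merged["Cluster Labels"].dtype)  # float dtype caused by the NaN

# Drop rows without a label, then restore integer labels
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```

Integer labels matter later on, because a float such as `1.0` cannot be used directly as a list index when picking a marker colour.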


# Map of clusters

In [None]:
# Base map centred on the city
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Build one hex colour per cluster
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Rows without a match in the join have NaN as their label and cannot be coloured
plot_df = north_york_merged.dropna(subset=['Cluster Labels'])

# Add one circle marker per neighbourhood, coloured by cluster
for lat, lon, poi, cluster in zip(plot_df['Latitude'], plot_df['Longitude'], plot_df['Neighbourhood'], plot_df['Cluster Labels']):
    cluster = int(cluster)  # labels are floats after the join; list indices must be ints
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

# Examining data in each cluster

In [None]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 0, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,York,0.0,Park,Women's Store,Spa,Miscellaneous Shop,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant
35,East YorkEast Toronto,0.0,Convenience Store,Film Studio,Park,Metro Station,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner
45,North York,0.0,Martial Arts School,Cafeteria,Park,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant
49,North York,0.0,Construction & Landscaping,Park,Bakery,Massage Studio,Falafel Restaurant,Eastern European Restaurant,Doner Restaurant,Donut Shop,Drugstore,Farmers Market
52,North York,0.0,Home Service,Gym,Park,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner
64,York,0.0,Electronics Store,Convenience Store,Park,Jewelry Store,Women's Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner
66,North York,0.0,Construction & Landscaping,Park,Electronics Store,Flower Shop,Convenience Store,Falafel Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner
68,Central Toronto,0.0,Health & Beauty Service,Park,Sushi Restaurant,Trail,Bus Line,Jewelry Store,Eastern European Restaurant,Doner Restaurant,Donut Shop,Farm
83,Central Toronto,0.0,Tennis Court,Park,Playground,Beer Store,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner
85,Scarborough,0.0,Playground,Park,Intersection,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner


In [None]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 1, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1.0,Construction & Landscaping,Convenience Store,Park,Hotel,Fireworks Store,BBQ Joint,Bus Stop,Food & Drink Shop,Electronics Store,Donut Shop
1,North York,1.0,Portuguese Restaurant,Pizza Place,Hockey Arena,Bridal Shop,Financial or Legal Service,Intersection,Coffee Shop,Eastern European Restaurant,Dog Run,Doner Restaurant
2,Downtown Toronto,1.0,Coffee Shop,Pub,Theater,Art Gallery,Bakery,Park,Furniture / Home Store,Café,Performing Arts Venue,Breakfast Spot
3,North York,1.0,Clothing Store,Furniture / Home Store,Accessories Store,Home Service,Women's Store,Coffee Shop,Medical Center,Boutique,Lighting Store,Miscellaneous Shop
4,Queen's Park,1.0,Coffee Shop,Sandwich Place,Chinese Restaurant,Café,Pizza Place,Fast Food Restaurant,Salad Place,Sushi Restaurant,Burger Joint,Bank
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,1.0,Coffee Shop,Café,Hotel,Gym,Restaurant,Japanese Restaurant,Steakhouse,Seafood Restaurant,American Restaurant,Deli / Bodega
99,Downtown Toronto,1.0,Coffee Shop,Gay Bar,Sushi Restaurant,Japanese Restaurant,Yoga Studio,Burger Joint,Burrito Place,Theater,Bubble Tea Shop,Restaurant
100,East TorontoBusiness reply mail Processing Cen...,1.0,Light Rail Station,Gym / Fitness Center,Fast Food Restaurant,Brewery,Spa,Farmers Market,Smoke Shop,Martial Arts School,Skate Park,Yoga Studio
101,Etobicoke,1.0,Construction & Landscaping,Park,Home Service,Business Service,Locksmith,Women's Store,Empanada Restaurant,Donut Shop,Drugstore,Dry Cleaner


In [None]:
north_york_merged.loc[north_york_merged['Cluster Labels'] == 2, north_york_merged.columns[[1] + list(range(5, north_york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,2.0,Women's Store,Health Food Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store


## Conclusion

Overall, the project helps a person choose a neighborhood to live in by grouping Toronto's neighborhoods according to the shops and facilities available nearby. It may also help shop owners and business people judge what kinds of shops an area needs. If the basic needs of the people living in a neighborhood can be identified, then a single location offering all of those facilities could be built there and would be likely to attract steady business. In this way, the project gives each group of stakeholders a starting point for solving their problem.