<a href="https://colab.research.google.com/github/quduyn/capstone/blob/master/Explore_and_Cluster_Toronto_Neighborhood_q1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align=center><font size = 8>Cluster and Explore Toronto Neighborhood</font></h1>

**In this document, you will find the process and results of exploring, segmenting, and clustering the neighborhoods of the city of Toronto. 
The analysis is structured in three parts, as follows:**

1. Load and pre-process data
2. Geocode the data
3. Visualise and analyse
 





# **Part 1. Load and pre-process data**

## **1. Install necessary packages**

Install 'opencage' for geocoding; at the time of writing it is free to use and has been more stable than Google's geocoder

In [1]:
pip install opencage

Collecting opencage
  Downloading https://files.pythonhosted.org/packages/6d/f2/ed48d7e2fbd06f0ac8dbd511fecc233b68b523daccaae9fb1e6e56b240d4/opencage-1.2-py3-none-any.whl
Installing collected packages: opencage
Successfully installed opencage-1.2


Install 'requests' for web scraping, to retrieve the Toronto postcode table from Wikipedia

In [60]:
pip install requests



Install 'beautifulsoup4' for parsing HTML and XML

In [3]:
pip install beautifulsoup4



## **2. Scrape Toronto postcode data from Wikipedia into a pandas dataframe**

In [0]:
# import the libraries used to fetch the page, organise the data, and cluster it
import requests
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup

from opencage.geocoder import OpenCageGeocode

# fetch a fixed revision (oldid) of the Wikipedia page so the table layout does not change
website_url = requests.get('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=945633050').text

soup = BeautifulSoup(website_url,'lxml')

# the postcode table is the sortable wikitable on the page
My_table = soup.find('table', class_ = 'wikitable sortable')

A=[]  # postcodes
B=[]  # boroughs
C=[]  # neighborhoods

# keep only rows with exactly three cells; .strip() removes newline \n characters
for row in My_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True).strip())
        B.append(cells[1].find(text=True).strip())
        C.append(cells[2].find(text=True).strip())

dfinitial=pd.DataFrame(A,columns=['Postcode'])
dfinitial['Borough']=B
dfinitial['Neighborhood']=C

# drop postcodes that have no borough assigned
df1 = dfinitial[dfinitial['Borough']!="Not assigned"]

# one row per postcode: neighborhoods sharing a postcode are joined with commas
df2 = df1.groupby(['Postcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
# a neighborhood that is not assigned takes the name of its borough
df2.loc[df2['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df2['Borough']
df2
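
The cleaning steps in the cell above (drop unassigned boroughs, merge neighborhoods that share a postcode, fall back to the borough name) can be tried on a small hand-made table without requesting the Wikipedia page. This is only a sketch: the four rows below are invented for illustration and the names `raw`, `assigned`, and `clean` are not part of the notebook.

```python
import pandas as pd

# Toy rows (made up for illustration) that mimic the scraped table
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# 1. drop postcodes that have no borough
assigned = raw[raw['Borough'] != 'Not assigned']

# 2. one row per postcode, neighborhoods joined with commas
clean = (assigned.groupby(['Postcode', 'Borough'])['Neighborhood']
         .apply(', '.join)
         .reset_index())

# 3. a missing neighborhood falls back to the borough name
mask = clean['Neighborhood'] == 'Not assigned'
clean.loc[mask, 'Neighborhood'] = clean.loc[mask, 'Borough']

print(clean)
```

Running the same three steps on a toy table makes it easy to confirm that shared postcodes collapse into a single row before the real data is geocoded.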

#### **Check the shape of the dataframe**

In [62]:
df2.shape

(39, 5)

# **Part 2. Geocode the postcodes**

## **3. Geocode each postcode using OpenCage**

In [0]:
key = 'f8a232db3569416ba2c68a30b17448df' # get api key from:  https://opencagedata.com
	
geocoder = OpenCageGeocode(key)

list_lat = []   # create empty lists
list_long = []
	
for index, row in df2.iterrows(): # iterate over rows in dataframe

    postcode=row['Postcode']
    City = 'Toronto'
    State ='Ontario'  
    Country='Canada'
    query = str(postcode)+','+str(City)+','+str(State)+','+str(Country)

    results = geocoder.geocode(query)   
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_long.append(long)
	
# create new columns from lists    

df2['latitude'] = list_lat   
df2['longitude'] = list_long



To keep the example small and the runtime short, we segment and cluster only the boroughs whose names contain "Toronto".

In [0]:
df2= df2[df2['Borough'].str.contains("Toronto")]
df2

# **Part 3. Visualisation and analysis**

### **Use the geopy library to get the latitude and longitude values of Toronto.**

In [27]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### **Create a map of Toronto with neighborhoods superimposed on top.**

In [28]:
import folium # map rendering library

# create map of Toronto using latitude and longitude values
map_canada = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df2['latitude'], df2['longitude'], df2['Borough'], df2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_canada)  
    
map_canada

## **4. Get top 100 Toronto venues**

### **Create the GET request URL and store it in `url`.**

In [30]:
CLIENT_ID = 'TJINGYMOFENQHSUYFL04LTORYAK2TJRAZDWXFM2MUW1EHZME' # your Foursquare ID
CLIENT_SECRET = '53ZN22E1MYTTZFBRDRQLBL3WD5GEEOSKDCGR0FEEKP5VZ30O' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TJINGYMOFENQHSUYFL04LTORYAK2TJRAZDWXFM2MUW1EHZME
CLIENT_SECRET:53ZN22E1MYTTZFBRDRQLBL3WD5GEEOSKDCGR0FEEKP5VZ30O


In [31]:


LIMIT = 100 # maximum number of venues returned by the Foursquare API
radius = 500 # search radius in metres

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=TJINGYMOFENQHSUYFL04LTORYAK2TJRAZDWXFM2MUW1EHZME&client_secret=53ZN22E1MYTTZFBRDRQLBL3WD5GEEOSKDCGR0FEEKP5VZ30O&v=20180605&ll=43.6534817,-79.3839347&radius=500&limit=100'
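
The same URL can be assembled from a parameter dictionary, which avoids long positional `format` calls and keeps credentials out of the notebook. The sketch below is one possible way to do this, not the approach used above: the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions, and placeholder strings are used when they are not set.

```python
import os
from urllib.parse import urlencode

# Assumed environment variable names; placeholders are used when they are unset
client_id = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')

params = {
    'client_id': client_id,
    'client_secret': client_secret,
    'v': '20180605',                 # API version date used in this notebook
    'll': '43.6534817,-79.3839347',  # latitude,longitude of Toronto from the geopy step
    'radius': 500,
    'limit': 100,
}

# urlencode percent-encodes the values (the comma in 'll' becomes %2C)
explore_url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
```

`requests.get` also accepts such a dictionary through its `params` argument, in which case the query string does not have to be built by hand.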

### **Send the GET request and examine the results**

In [32]:
import requests # library to handle requests

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e8247be60ba08001bfe2a02'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5227bb01498e17bf485e6202-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/neighborhood_',
          'suffix': '.png'},
         'id': '4f2a25ac4b909258e854f55f',
         'name': 'Neighborhood',
         'pluralName': 'Neighborhoods',
         'primary': True,
         'shortName': 'Neighborhood'}],
       'id': '5227bb01498e17bf485e6202',
       'location': {'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'distance': 113,
        'formattedAddress': ['Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng': -79.38529600606677}],
        'lat': 43.6532

### **Define a function (borrowed from the Foursquare lab) that extracts the category name from each entry under the 'items' key**

In [0]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### **Clean the JSON and structure it into a pandas dataframe.**

In [35]:
import pandas as pd # pd.json_normalize (pandas 1.0+) transforms JSON into a pandas dataframe

venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198
3,Japango,Sushi Restaurant,43.655268,-79.385165
4,Indigo,Bookstore,43.653515,-79.380696
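
To see what the flattening step does without calling the API, the sketch below applies the same idea to two hand-written items. It assumes pandas 1.0 or later for `pd.json_normalize`; the items only imitate the structure of `results['response']['groups'][0]['items']` and the venue names are invented.

```python
import pandas as pd

# Two hand-written items shaped like the entries returned by the explore endpoint
items = [
    {'venue': {'name': 'Venue A',
               'categories': [{'name': 'Plaza'}],
               'location': {'lat': 43.65, 'lng': -79.38}}},
    {'venue': {'name': 'Venue B',
               'categories': [],
               'location': {'lat': 43.66, 'lng': -79.39}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names

# keep the first category name, or None when the category list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# select the four columns of interest and keep only the last part of each name
flat = flat[['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']]
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```

The list under `categories` is left intact by `json_normalize`, which is why a separate step is needed to pull out the category name.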


Check the number of venues returned by Foursquare

In [36]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


## **5. Explore Neighborhoods in Toronto**

### **Create a function to repeat the same process for all the neighborhoods in Toronto**

In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=750):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
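
The list comprehension in `getNearbyVenues` indexes `categories[0]`, so a venue returned without any category would raise an `IndexError`. The sketch below shows one way the row-building step could tolerate that case. The helper name `venue_rows` and the sample items are hypothetical and serve only to illustrate the idea.

```python
def venue_rows(name, lat, lng, items):
    """Build one tuple per venue; a venue without a category gets None instead of raising."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written items for illustration (not real API output)
sample_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.68, 'lng': -79.29},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.67, 'lng': -79.30},
               'categories': []}},
]

rows = venue_rows('The Beaches', 43.6784, -79.2941, sample_items)
print(rows)
```

The tuples have the same seven fields, in the same order, as the columns assigned in `getNearbyVenues`, so they could be passed to `pd.DataFrame` in the same way.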

### **Run the above function on each neighborhood and create a new dataframe called Toronto_venues.**

In [38]:
Toronto_venues = getNearbyVenues(names=df2['Neighborhood'],
                                   latitudes=df2['latitude'],
                                   longitudes=df2['longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

### **Check the size of the resulting dataframe**

In [39]:
print(Toronto_venues.shape)
Toronto_venues.head()

(2727, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.6784,-79.2941,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.6784,-79.2941,Beaches Bake Shop,43.680363,-79.289692,Bakery
2,The Beaches,43.6784,-79.2941,The Beech Tree,43.680493,-79.288846,Gastropub
3,The Beaches,43.6784,-79.2941,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
4,The Beaches,43.6784,-79.2941,The Feathers Pub,43.680501,-79.287522,Pub


### **Check how many venues were returned for each neighborhood group (grouped by postcode)**

In [40]:
Toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"Brockton, Exhibition Place, Parkdale Village",69,69,69,69,69,69
Business Reply Mail Processing Centre 969 Eastern,37,37,37,37,37,37
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",100,100,100,100,100,100
"Cabbagetown, St. James Town",75,75,75,75,75,75
Central Bay Street,100,100,100,100,100,100
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,95,95,95,95,95,95
Church and Wellesley,100,100,100,100,100,100


### **Check how many unique categories can be curated from all the returned venues**

In [41]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 273 unique categories.


## **6. Analyze Each Neighborhood**

In [42]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

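# add neighborhood column back to dataframe
# (caution: if a venue category is itself named 'Neighborhood', this overwrites that one-hot column instead of appending a new one)
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood'] 

# move the last column to the front (it is 'Neighborhood' only when the assignment above appended a new column)
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])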
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,...,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tree,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
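
Foursquare has a venue category that is literally called "Neighborhood" (it appears in the first results table above), and the displayed columns start with "Yoga Studio" rather than "Neighborhood". This suggests that the label column collided with a one-hot column of the same name. The sketch below shows one way to avoid such a collision on invented toy data; dropping the conflicting category is an assumption about what is wanted, and renaming it would be an equally valid choice.

```python
import pandas as pd

# Toy venues (invented); one of them has the category 'Neighborhood'
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Neighborhood', 'Yoga Studio'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
print(list(onehot.columns))  # one of the category columns is named 'Neighborhood'

# Drop the colliding category column, then insert the label column at position 0
onehot = onehot.drop(columns='Neighborhood', errors='ignore')
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(list(onehot.columns))
```

With `insert(0, ...)` the label column always ends up first, so no separate column reordering step is needed.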


### **Examine the new dataframe size.**

In [43]:
Toronto_onehot.shape


(2727, 273)

### **Group rows by neighborhood, taking the mean of the frequency of occurrence of each category**

In [44]:
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,...,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tram Station,Tree,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,...,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,...,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.014493,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.014493,0.0,0.014493,0.014493,0.0,0.0,0.0,0.0,0.0,0.043478,0.014493,0.043478,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.028986,0.0,0.0,0.0,0.0,0.0,0.0,0.028986,0.043478,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,...,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,...,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.026667,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.013333,0.0,0.0,0.0,0.026667,0.0,0.0,0.013333,0.013333,0.0,...,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.01,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.02,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.01,0.0,0.04,0.01,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.010526,0.0,0.010526,0.0,...,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.021053,0.010526,0.0,0.0,0.0
9,Church and Wellesley,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,...,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0
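
Because every venue row contains a single 1 in its category column, the mean of a column within a neighborhood is the share of that neighborhood's venues belonging to the category. The toy example below, with invented values, illustrates this.

```python
import pandas as pd

# Six toy venue rows (invented): four in neighborhood A, two in B
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Bakery':       [1, 1, 0, 0, 0, 1],
    'Park':         [0, 0, 1, 0, 1, 0],
    'Pub':          [0, 0, 0, 1, 0, 0],
})

grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)

# each row of frequencies adds up to 1 because every venue has exactly one category
print(grouped[['Bakery', 'Park', 'Pub']].sum(axis=1))
```

This is why a neighborhood with 100 venues shows frequencies in steps of 0.01, while one with 69 venues shows steps of about 0.014493.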


### **Confirm new size of the data set**

In [45]:
Toronto_grouped.shape

(39, 273)

### **Print each neighborhood along with the top 5 most common venues**

In [46]:
num_top_venues = 5

for hood in Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.08
1                Hotel  0.06
2           Restaurant  0.06
3                 Café  0.05
4  Japanese Restaurant  0.03


----Berczy Park----
         venue  freq
0  Coffee Shop  0.11
1        Hotel  0.06
2   Restaurant  0.06
3         Café  0.05
4         Park  0.03


----Brockton, Exhibition Place, Parkdale Village----
                venue  freq
0                Café  0.06
1  Tibetan Restaurant  0.04
2                 Bar  0.04
3              Bakery  0.04
4           Gift Shop  0.03


----Business Reply Mail Processing Centre 969 Eastern----
                  venue  freq
0            Restaurant  0.14
1  Gym / Fitness Center  0.08
2           Coffee Shop  0.05
3    Italian Restaurant  0.05
4   Sporting Goods Shop  0.05


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
                venue  freq
0  Italian Restaurant  0.08
1        

### **Put the result into a pandas dataframe**

#### Write a function to sort the venues in descending order.

In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Create the new dataframe and display the top 10 venues for each neighborhood.

In [48]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Hotel,Restaurant,Café,Gastropub,Seafood Restaurant,Japanese Restaurant,Asian Restaurant,Bar,Cosmetics Shop
1,Berczy Park,Coffee Shop,Restaurant,Hotel,Café,Japanese Restaurant,Park,Italian Restaurant,Plaza,Sandwich Place,Beer Bar
2,"Brockton, Exhibition Place, Parkdale Village",Café,Bakery,Bar,Tibetan Restaurant,Diner,Thrift / Vintage Store,Park,Pharmacy,Coffee Shop,Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Restaurant,Gym / Fitness Center,Italian Restaurant,Sporting Goods Shop,Coffee Shop,Japanese Restaurant,Juice Bar,Sushi Restaurant,Big Box Store,Bookstore
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Italian Restaurant,Coffee Shop,Restaurant,Hotel,Yoga Studio,Sushi Restaurant,Beer Bar,Speakeasy,Spa,Sandwich Place
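
The column labels above are built from a three-element suffix list with a fallback to "th". That works for ten venues, but it would produce labels such as "21th" if `num_top_venues` were raised above 20. A small helper like the following, whose name `ordinal` is hypothetical, handles the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```

For the first ten positions this yields the same labels as the original loop.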


## **7. Cluster Neighborhoods**

### **Run k-means to cluster the neighborhoods into 5 clusters**

In [49]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
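
The first ten labels all belong to cluster 1, so it is worth checking how the neighborhoods are spread over the clusters before interpreting them. The sketch below counts the members of each cluster with NumPy. The `labels` array is a hypothetical stand-in; in the notebook, `kmeans.labels_` would be used in its place.

```python
import numpy as np

# Hypothetical label vector standing in for kmeans.labels_
labels = np.array([1, 1, 1, 0, 1, 2, 1, 3, 1, 4])

cluster_ids, sizes = np.unique(labels, return_counts=True)
for cluster_id, size in zip(cluster_ids, sizes):
    print('cluster {}: {} neighborhoods'.format(cluster_id, size))
```

A result where one cluster holds most neighborhoods and the rest hold one each usually indicates that the chosen features or the value of `kclusters` separate only a few outliers.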

### **Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.**

In [0]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = df2
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
Toronto_merged=Toronto_merged.dropna(subset=['Cluster Labels'])


In [52]:
#Toronto_merged=Toronto_merged.astype({'Cluster Labels': 'int32'})


Toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,The Beaches,43.6784,-79.2941,1,Pub,Coffee Shop,Caribbean Restaurant,Flower Shop,Bakery,Gastropub,Shoe Store,Trail,Sandwich Place,Furniture / Home Store
41,M4K,East Toronto,"The Danforth West, Riverdale",43.6803,-79.3538,1,Greek Restaurant,Coffee Shop,Pub,Café,Italian Restaurant,Fast Food Restaurant,Park,Grocery Store,Yoga Studio,Bakery
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.67276,-79.304058,1,Coffee Shop,Pizza Place,Japanese Restaurant,BBQ Joint,Café,Pub,Tea Room,Bar,Bakery,Park
43,M4M,East Toronto,Studio District,43.6561,-79.3406,1,Café,Coffee Shop,Bar,American Restaurant,Sandwich Place,Bakery,Brewery,Gastropub,Bookstore,Discount Store
44,M4N,Central Toronto,Lawrence Park,43.7301,-79.3935,0,Pool,Photography Studio,Bus Line,Business Service,Park,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
45,M4P,Central Toronto,Davisville North,43.7135,-79.3887,1,Park,Pizza Place,Sushi Restaurant,Food & Drink Shop,Japanese Restaurant,Deli / Bodega,Breakfast Spot,Indoor Play Area,Brewery,Sandwich Place
46,M4R,Central Toronto,North Toronto West,43.7143,-79.4065,1,Clothing Store,Sporting Goods Shop,Coffee Shop,Café,Italian Restaurant,Bakery,Restaurant,Diner,Dessert Shop,Yoga Studio
47,M4S,Central Toronto,Davisville,43.702,-79.3853,1,Dessert Shop,Coffee Shop,Gym,Italian Restaurant,Café,Trail,Sandwich Place,Pizza Place,Restaurant,Cemetery
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.6899,-79.3853,1,Coffee Shop,Grocery Store,Pharmacy,Thai Restaurant,Café,Park,Gym,Cantonese Restaurant,Deli / Bodega,Sandwich Place
49,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.6861,-79.4025,1,Coffee Shop,Italian Restaurant,Sushi Restaurant,Pharmacy,Bank,Skating Rink,Restaurant,Pub,Gym,Pizza Place


### **Visualize the resulting clusters**

In [54]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Toronto_merged['latitude'], Toronto_merged['longitude'], Toronto_merged['Neighborhood'],Toronto_merged['Cluster Labels']):
    cluster = int(cluster)  # labels can turn into floats after the join; list indexing needs an int
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],  # index -1 (cluster 0) wraps around to the last color
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## **8. Examine Clusters**

### **Cluster 1 (label 0)**

In [55]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
44,Central Toronto,0,Pool,Photography Studio,Bus Line,Business Service,Park,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


### **Cluster 2 (label 1)**

In [56]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,East Toronto,1,Pub,Coffee Shop,Caribbean Restaurant,Flower Shop,Bakery,Gastropub,Shoe Store,Trail,Sandwich Place,Furniture / Home Store
41,East Toronto,1,Greek Restaurant,Coffee Shop,Pub,Café,Italian Restaurant,Fast Food Restaurant,Park,Grocery Store,Yoga Studio,Bakery
42,East Toronto,1,Coffee Shop,Pizza Place,Japanese Restaurant,BBQ Joint,Café,Pub,Tea Room,Bar,Bakery,Park
43,East Toronto,1,Café,Coffee Shop,Bar,American Restaurant,Sandwich Place,Bakery,Brewery,Gastropub,Bookstore,Discount Store
45,Central Toronto,1,Park,Pizza Place,Sushi Restaurant,Food & Drink Shop,Japanese Restaurant,Deli / Bodega,Breakfast Spot,Indoor Play Area,Brewery,Sandwich Place
46,Central Toronto,1,Clothing Store,Sporting Goods Shop,Coffee Shop,Café,Italian Restaurant,Bakery,Restaurant,Diner,Dessert Shop,Yoga Studio
47,Central Toronto,1,Dessert Shop,Coffee Shop,Gym,Italian Restaurant,Café,Trail,Sandwich Place,Pizza Place,Restaurant,Cemetery
48,Central Toronto,1,Coffee Shop,Grocery Store,Pharmacy,Thai Restaurant,Café,Park,Gym,Cantonese Restaurant,Deli / Bodega,Sandwich Place
49,Central Toronto,1,Coffee Shop,Italian Restaurant,Sushi Restaurant,Pharmacy,Bank,Skating Rink,Restaurant,Pub,Gym,Pizza Place
51,Downtown Toronto,1,Coffee Shop,Park,Pizza Place,Grocery Store,Restaurant,Café,Breakfast Spot,Bakery,Pub,Beer Store


### **Cluster 3 (label 2)**

In [57]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Central Toronto,2,Bank,Playground,Pharmacy,Garden,Café,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


### **Cluster 4 (label 3)**

In [58]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,Central Toronto,3,Gym / Fitness Center,Jewelry Store,Park,Sushi Restaurant,Trail,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


### **Cluster 5 (label 4)**

In [59]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,Downtown Toronto,4,Park,Trail,Historic Site,Candy Store,Skating Rink,Café,Farmers Market,Flower Shop,Food Truck,Playground
