<h1>Clustering Toronto's Neighborhoods</h1>

*IBM Data Science Course 9 Week 3 Project*

<h2>PART 1</h2>

Directions: Scrape the table of postal codes from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and transform the data into a pandas dataframe.

In [90]:
#Import the 'requests' library and download the HTML of the page listing Toronto's postal codes.
import requests
page_html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

#Import the 'BeautifulSoup' library
from bs4 import BeautifulSoup

#Parse the page's HTML into 'soup'
soup = BeautifulSoup(page_html,'lxml')

#prettify() returns the HTML with indentation added for readability
#print(soup.prettify()) This is commented out to reduce the excessively long printout

In [91]:
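#Select the postal code table on the page
TNTable = soup.find('table',{'class':'wikitable sortable'})

#Collect the text of every cell, one inner list per table row.
#The inner list is named 'row' rather than 'list' so the built-in list() stays usable in later cells.
longlist = []
for entries in TNTable.find_all('tr'):
    columns = entries.find_all('td')
    row = []
    for column in columns:
        row.append(column.text)
    longlist.append(row)
#print(longlist)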


In [93]:
#Import pandas library
import pandas as pd

# Creates dataframe 'df' with column labels "Postal Code", "Borough", and "Neighborhood" instead of "0", "1", and "2".
df = pd.DataFrame(longlist)
df.columns = ['Postal Code', 'Borough', 'Neighborhood']

#Deletes the first row, which holds only "None" entries because the table's header row has no <td> cells
df = df.drop(0)

#Gets names of indexes in column "Borough" that have value "Not assigned"
indexNames = df[ df['Borough'] == 'Not assigned' ].index
 
#Deletes these row indexes from dataFrame
df.drop(indexNames , inplace=True)

#Removes newline characters ("\n") from the ends of entries in the Neighborhood column
df.Neighborhood = [x.strip('\n') for x in df.Neighborhood]

#Optional, left disabled: merge neighborhoods that share the same Postal Code and Borough into one row
#df_neigh = df.groupby(['Postal Code','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
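
The disabled `groupby` line in the cell above would collapse rows that share a postal code and borough into a single row. A minimal sketch of that step on a hand-typed sample (the rows are copied from the output above; `sample` and `combined` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hand-typed sample mirroring the first rows of df shown above (illustrative only)
sample = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A', 'M5A', 'M6A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto',
                'Downtown Toronto', 'North York'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Harbourfront',
                     'Regent Park', 'Lawrence Heights'],
})

# Join the neighborhood names of every (postal code, borough) pair into one string
combined = (
    sample.groupby(['Postal Code', 'Borough'])['Neighborhood']
    .apply(', '.join)
    .reset_index()
)
print(combined)
```

The rest of this notebook keeps one row per neighborhood, so the grouped frame is optional here.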


<h2>PART 2</h2>

Directions: Use the Geocoder package or the .csv file found at http://cocl.us/Geospatial_data to create a dataframe containing Postal Code, Borough, Neighborhood, Latitude, and Longitude.

In [94]:
#Load latitude and longitude data from the specified file
df_lat_long=pd.read_csv('http://cocl.us/Geospatial_data')

#Merge Toronto's Neighborhood dataframe with Lat-Long dataframe on their shared 'Postal Code' column.
df_all = pd.merge(df, df_lat_long, on='Postal Code')
df_all.head()


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763
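
An inner merge silently drops any postal code that has no coordinates. The sketch below shows one way to check for that with a left merge and pandas' `indicator` option. The two small frames are stand-ins for `df` and `df_lat_long`, and the `M9Z` row is invented purely to produce an unmatched code.

```python
import pandas as pd

# Illustrative stand-ins for df and df_lat_long; the 'M9Z' row is made up
neighborhoods = pd.DataFrame({
    'Postal Code': ['M3A', 'M5A', 'M5A', 'M9Z'],
    'Neighborhood': ['Parkwoods', 'Harbourfront', 'Regent Park', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M5A'],
    'Latitude': [43.753259, 43.65426],
    'Longitude': [-79.329656, -79.360636],
})

# how='left' keeps every neighborhood; indicator=True adds a '_merge' column
# that records whether a matching postal code was found in coords
merged = pd.merge(neighborhoods, coords, on='Postal Code',
                  how='left', validate='many_to_one', indicator=True)

unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['Postal Code', 'Neighborhood']])
```

If `unmatched` is empty for the real data, the default inner merge and the left merge give the same rows.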


## PART 3

#### Step 1: Import all necessary libraries.

In [50]:
import numpy as np # library to handle data in a vectorized manner

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         560 KB

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-

#### Step 2: Define an instance of the geocoder - Toronto, Canada.

In [56]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="canada_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


#### Step 3: Create a map of Toronto with neighborhoods superimposed on top.

In [52]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_all['Latitude'], df_all['Longitude'], df_all['Borough'], df_all['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Step 4: Take a closer look at a particular borough in the dataframe.
Slice the original dataframe to take a closer look at the Scarborough borough and its latitudes and longitudes.

In [95]:
scarborough_data = df_all[df_all['Borough'] == 'Scarborough'].reset_index(drop=True)
scarborough_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Rouge,43.806686,-79.194353
1,M1B,Scarborough,Malvern,43.806686,-79.194353
2,M1C,Scarborough,Highland Creek,43.784535,-79.160497
3,M1C,Scarborough,Rouge Hill,43.784535,-79.160497
4,M1C,Scarborough,Port Union,43.784535,-79.160497


In [96]:
#Get geographical location of Scarborough
address = 'Scarborough, Canada'

geolocator = Nominatim(user_agent="canada_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough are 43.773077, -79.257774.


#### Step 5: Visualize neighborhoods in Scarborough.

In [97]:
#create map of Scarborough using latitude and longitude values
map_scarborough = folium.Map(location=[latitude, longitude], zoom_start=11)

#add markers to map
for lat, lng, label in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Neighborhood']):
   label = folium.Popup(label, parse_html=True)
   folium.CircleMarker(
       [lat, lng],
       radius=5,
       popup=label,
       color='blue',
       fill=True,
       fill_color='#3186cc',
       fill_opacity=0.7,
       parse_html=False).add_to(map_scarborough) 

map_scarborough

#### Step 6: Define Foursquare Credentials and Version

In [63]:
CLIENT_ID = '40MJ1SO4QAKBDDYWERBA1J0UQX1F0VIKG5C10LKDEEGN42OD' # your Foursquare ID
CLIENT_SECRET = 'T24RCSCEXRFNRI2FOH5YABRE5WB0H0QTAKWCS5RSQ1BNWFQC' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 40MJ1SO4QAKBDDYWERBA1J0UQX1F0VIKG5C10LKDEEGN42OD
CLIENT_SECRET:T24RCSCEXRFNRI2FOH5YABRE5WB0H0QTAKWCS5RSQ1BNWFQC
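
Hard-coding and printing the client ID and secret exposes them to anyone who can read the notebook. One alternative, sketched here as an assumption rather than anything the course prescribes, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are made up for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names are assumptions chosen for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None

def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a fake environment mapping instead of real credentials
fake_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
client_id, client_secret = load_foursquare_credentials(fake_env)
print('CLIENT_ID: ' + mask(client_id))
```

In the notebook itself the call would be `load_foursquare_credentials()` with no argument, so the values come from the real environment.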


#### Step 7: Evaluate venues in Scarborough
Explore the first neighborhood in the Scarborough dataframe.

In [98]:
scarborough_data.loc[0, 'Neighborhood']

'Rouge'

In [99]:
#Store the coordinates and name of the first neighborhood

neighborhood_latitude = scarborough_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = scarborough_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = scarborough_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))


Latitude and longitude values of Rouge are 43.806686299999996, -79.19435340000001.


Get up to 50 venues in Rouge within a radius of 750 meters.

In [102]:
LIMIT = 50
radius = 750
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#url 

results = requests.get(url).json()

# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues


Unnamed: 0,name,categories,lat,lng
0,Images Salon & Spa,Spa,43.802283,-79.198565
1,Wendy's,Fast Food Restaurant,43.802008,-79.19808
2,Wendy's,Fast Food Restaurant,43.807448,-79.199056
3,Staples Morningside,Paper / Office Supplies Store,43.800285,-79.196607
4,Tim Hortons,Coffee Shop,43.802,-79.198169
5,Lee Valley,Hobby Shop,43.803161,-79.199681
6,FASTSIGNS,Business Service,43.807882,-79.201968
7,Bus Stop: 85 & 116,Bus Station,43.802198,-79.199389
8,Tim Hortons / Esso,Coffee Shop,43.80166,-79.199133
9,Mr Jerk,African Restaurant,43.801262,-79.199758
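
The request URL above is assembled with one long format string. A sketch of an alternative that builds the same query from a dictionary using the standard library's `urlencode`, which also escapes special characters. `build_explore_url` is a hypothetical helper name, and the ID and secret passed to it are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; coordinates are the Rouge values used above
example_url = build_explore_url('ID', 'SECRET', '20180605',
                                43.806686, -79.194353, 750, 50)
print(example_url)

# Decoding the query string recovers the original values
decoded = parse_qs(urlsplit(example_url).query)
print(decoded['ll'])
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which avoids building the string by hand.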


#### Step 8: Explore all neighborhoods in Scarborough

In [114]:
# Create a function to repeat the same process to all the neighborhoods in Scarborough.
def getNearbyVenues(names, latitudes, longitudes, radius=750):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
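
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue with an empty category list. A minimal sketch of a more tolerant extraction step follows. `venue_rows` is a hypothetical helper and `sample_items` is a made-up payload shaped like `results['response']['groups'][0]['items']`.

```python
def venue_rows(name, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to None when the venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Made-up payload; the second venue deliberately has no category
sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.802008, 'lng': -79.19808},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unlabelled Venue',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]
rows = venue_rows('Rouge', 43.806686, -79.194353, sample_items)
for row in rows:
    print(row)
```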

In [115]:
#Collect nearby venues for every neighborhood in Scarborough (the function prints each name as it goes)
scarborough_venues = getNearbyVenues(names=scarborough_data['Neighborhood'],
                                   latitudes=scarborough_data['Latitude'],
                                   longitudes=scarborough_data['Longitude']
                                  )

Rouge
Malvern
Highland Creek
Rouge Hill
Port Union
Guildwood
Morningside
West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park
Ionview
Kennedy Park
Clairlea
Golden Mile
Oakridge
Cliffcrest
Cliffside
Scarborough Village West
Birch Cliff
Cliffside West
Dorset Park
Scarborough Town Centre
Wexford Heights
Maryvale
Wexford
Agincourt
Clarks Corners
Sullivan
Tam O'Shanter
Agincourt North
L'Amoreaux East
Milliken
Steeles East
L'Amoreaux West
Upper Rouge


In [116]:
#Check the size of the resulting dataframe
print(scarborough_venues.shape)
scarborough_venues.head()

(485, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rouge,43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
1,Rouge,43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
2,Rouge,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
3,Rouge,43.806686,-79.194353,Staples Morningside,43.800285,-79.196607,Paper / Office Supplies Store
4,Rouge,43.806686,-79.194353,Tim Hortons,43.802,-79.198169,Coffee Shop


In [117]:
#Check how many venues were returned for each neighborhood
scarborough_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,13,13,13,13,13,13
Agincourt North,19,19,19,19,19,19
Birch Cliff,8,8,8,8,8,8
Cedarbrae,22,22,22,22,22,22
Clairlea,17,17,17,17,17,17
Clarks Corners,20,20,20,20,20,20
Cliffcrest,4,4,4,4,4,4
Cliffside,4,4,4,4,4,4
Cliffside West,8,8,8,8,8,8
Dorset Park,15,15,15,15,15,15


In [120]:
print('There are {} unique categories.'.format(len(scarborough_venues['Venue Category'].unique())))

There are 81 unique categories.


#### Step 9: Analyze each neighborhood.

In [121]:
# one hot encoding
scarborough_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarborough_onehot.columns[-1]] + scarborough_onehot.columns[:-1].tolist()
scarborough_onehot = scarborough_onehot[fixed_columns]

scarborough_onehot.head()

In [122]:
scarborough_onehot.shape

(485, 82)

In [124]:
#Group rows by neighborhood and take the mean of each one-hot column, which gives the frequency of each category
scarborough_grouped = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_grouped

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Beer Store,Breakfast Spot,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Convenience Store,Department Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Fish Market,Flower Shop,Fried Chicken Joint,Furniture / Home Store,Gas Station,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hobby Shop,Hockey Arena,Indian Restaurant,Intersection,Italian Restaurant,Korean Restaurant,Latin American Restaurant,Light Rail Station,Lounge,Malay Restaurant,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Motorcycle Shop,Music Store,Noodle House,Other Great Outdoors,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Pool Hall,Rental Car Location,Restaurant,Sandwich Place,Seafood Restaurant,Shanghai Restaurant,Shop & Service,Shopping Mall,Skating Rink,Soccer Field,Spa,Sports Bar,Supermarket,Sushi Restaurant,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant,Wings Joint,Yoga Studio
0,Agincourt,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0
1,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.105263,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.052632,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Birch Cliff,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.090909,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455
4,Clairlea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.117647,0.058824,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.058824,0.0,0.117647,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Clarks Corners,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.15,0.1,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0
6,Cliffcrest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
7,Cliffside,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
8,Cliffside West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
9,Dorset Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.133333,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0


In [125]:
#Confirm size
scarborough_grouped.shape

(36, 82)
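
Why the mean of one-hot columns is a frequency: each venue contributes a single 1 in its category column, so the column mean within a neighborhood is the share of that neighborhood's venues in the category. A small sketch on invented data (`toy`, `onehot` and `grouped` are illustrative names):

```python
import pandas as pd

# Invented venues: neighborhood A has 4 venues, B has 2
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bank', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Mean per neighborhood = fraction of that neighborhood's venues in each category
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```

Each row of `grouped` sums to 1, which is why neighborhoods with very different venue counts can still be compared.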

In [128]:
#Print the top 5 venue categories for each Scarborough neighborhood
num_top_venues = 5

for hood in scarborough_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = scarborough_grouped[scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                 venue  freq
0            Pool Hall  0.08
1               Lounge  0.08
2       Sandwich Place  0.08
3   Seafood Restaurant  0.08
4  Shanghai Restaurant  0.08


----Agincourt North----
                  venue  freq
0             BBQ Joint  0.11
1  Fast Food Restaurant  0.11
2           Pizza Place  0.11
3    Chinese Restaurant  0.11
4            Hobby Shop  0.05


----Birch Cliff----
                   venue  freq
0        College Stadium  0.12
1                   Bank  0.12
2                   Café  0.12
3  General Entertainment  0.12
4           Skating Rink  0.12


----Cedarbrae----
                  venue  freq
0     Indian Restaurant  0.09
1                Bakery  0.09
2           Coffee Shop  0.09
3           Yoga Studio  0.05
4  Caribbean Restaurant  0.05


----Clairlea----
          venue  freq
0  Intersection  0.18
1         Diner  0.12
2   Coffee Shop  0.12
3        Bakery  0.12
4      Bus Line  0.12


----Clarks Corners----
                ve

#### Step 10: Create new dataframe and display the top 10 venues for each neighborhood in descending order.

In [129]:
#Return the num_top_venues most frequent categories in a row, in descending order of frequency
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#Create new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third have no entry in 'indicators' and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarborough_grouped['Neighborhood']

for ind in np.arange(scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Shanghai Restaurant,Breakfast Spot,Sandwich Place,Seafood Restaurant,Motorcycle Shop,Discount Store,Badminton Court,Lounge,Supermarket,Sushi Restaurant
1,Agincourt North,Pizza Place,Fast Food Restaurant,BBQ Joint,Chinese Restaurant,Hobby Shop,Fried Chicken Joint,Korean Restaurant,Malay Restaurant,Coffee Shop,Noodle House
2,Birch Cliff,Café,General Entertainment,College Stadium,Diner,Discount Store,Skating Rink,Bank,Thai Restaurant,Gas Station,Furniture / Home Store
3,Cedarbrae,Indian Restaurant,Bakery,Coffee Shop,Yoga Studio,Burger Joint,Gym / Fitness Center,Hakka Restaurant,Fried Chicken Joint,Flower Shop,Lounge
4,Clairlea,Intersection,Coffee Shop,Diner,Bakery,Bus Line,Park,Convenience Store,Soccer Field,Metro Station,Bus Station
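
The column labels above rely on a lookup list plus a fallback to `th`, which is correct for the ten positions used here but would produce labels such as `21th` for a longer ranking. A sketch of a general suffix helper follows. `ordinal` and `column_labels` are hypothetical names introduced for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th through 20th, 110th through 120th, and so on
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same labels as the notebook's columns for num_top_venues = 10
column_labels = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(column_labels)
```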


#### Step 11: Cluster neighborhoods

In [131]:
# Run k-means to cluster the neighborhood into k = 6 clusters

kclusters = 6

scarborough_grouped_clustering = scarborough_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarborough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 5, 4, 1, 4, 3, 3, 5, 4], dtype=int32)
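
To make the clustering step less of a black box, here is a bare-bones sketch of the basic k-means loop (Lloyd's algorithm) in NumPy: assign each row to its nearest center, then move each center to the mean of its rows. This is not scikit-learn's implementation, which adds k-means++ initialization among other refinements. `simple_kmeans` and the toy vectors are invented for illustration.

```python
import numpy as np

def simple_kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    # start from k distinct rows picked at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its members; keep it in place if it has none
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

# Toy frequency vectors: two rows dominated by the first category, two by the second
toy = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centers = simple_kmeans(toy, k=2)
print(labels)
```

The label numbers themselves are arbitrary. Only which rows share a label carries meaning, and the same holds for the `kmeans.labels_` array above.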

#### Step 12: Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [132]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

scarborough_merged = scarborough_data

# merge neighborhoods_venues_sorted with scarborough_data to add the cluster label and top venues to each neighborhood
scarborough_merged = scarborough_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scarborough_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,Rouge,43.806686,-79.194353,4.0,Coffee Shop,Fast Food Restaurant,Hobby Shop,Spa,Bus Station,Business Service,Paper / Office Supplies Store,African Restaurant,Sports Bar,Greek Restaurant
1,M1B,Scarborough,Malvern,43.806686,-79.194353,4.0,Coffee Shop,Fast Food Restaurant,Hobby Shop,Spa,Bus Station,Business Service,Paper / Office Supplies Store,African Restaurant,Sports Bar,Greek Restaurant
2,M1C,Scarborough,Highland Creek,43.784535,-79.160497,0.0,Breakfast Spot,Italian Restaurant,Bar,Burger Joint,Yoga Studio,Department Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant
3,M1C,Scarborough,Rouge Hill,43.784535,-79.160497,0.0,Breakfast Spot,Italian Restaurant,Bar,Burger Joint,Yoga Studio,Department Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant
4,M1C,Scarborough,Port Union,43.784535,-79.160497,0.0,Breakfast Spot,Italian Restaurant,Bar,Burger Joint,Yoga Studio,Department Store,Diner,Discount Store,Electronics Store,Fast Food Restaurant
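
The cluster labels appear as `4.0` and `0.0` rather than integers. A likely cause, given that 37 neighborhoods were queried but the grouped frame has 36 rows, is that at least one neighborhood has no match in the join and receives `NaN`, which forces the column to a float type. The sketch below reproduces the effect on invented data and shows one way to restore integer labels. `left`, `label_frame`, `joined` and `cleaned` are illustrative names.

```python
import pandas as pd

# Invented data: 'No Venues' has no row in the label frame
left = pd.DataFrame({'Neighborhood': ['Rouge', 'Malvern', 'No Venues']})
label_frame = pd.DataFrame({'Neighborhood': ['Rouge', 'Malvern'],
                            'Cluster Labels': [4, 4]})

joined = left.join(label_frame.set_index('Neighborhood'), on='Neighborhood')
print(joined['Cluster Labels'].dtype)  # float64: NaN cannot live in a plain int column

# Drop unmatched neighborhoods, then cast the labels back to integers
cleaned = joined.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

Integer labels matter in the next step, because Python lists cannot be indexed with floats.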


#### Step 13: Visualize clusters

In [133]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# Cluster Labels are floats after the join (NaN marks a neighborhood without a cluster),
# so skip missing labels and cast the rest to int before indexing the color list
markers_colors = []
for lat, lon, poi, cluster in zip(scarborough_merged['Latitude'], scarborough_merged['Longitude'], scarborough_merged['Neighborhood'], scarborough_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Step 14: Examine Clusters

In [135]:
#Cluster 1 (label 0): show the borough, cluster label, and top venues
cluster_columns = [1] + [i for i in range(5, scarborough_merged.shape[1])]
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 0, scarborough_merged.columns[cluster_columns]]