<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>
<h2 align=center><font size = 4.5>Peer-Graded Assignment</font></h2>




In this assignment, you will be required to explore, segment, and cluster the neighborhoods in the city of Toronto. However, unlike New York, the neighborhood data is not readily available on the internet. What is interesting about the field of data science is that each project can be challenging in its unique way, so you need to learn to be agile and refine the skill to learn new libraries and tools quickly depending on the project.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas  dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did to the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

Your submission will be a link to your Jupyter Notebook on your Github repository.


## Part 1, Getting the Data

1.  Start by creating a new Notebook for this assignment.
2.  Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas  dataframe like the one shown below:


### Downloading all the dependencies we will need


In [496]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes     # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim        # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

#!conda install -c conda-forge beautifulsoup4 --yes   # Install BeautifulSoup4
from bs4 import BeautifulSoup as bs                # Import BeautifulSoup4

# Install lxml
#!conda install -c conda-forge lxml --yes
from lxml import etree

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Check if the website allows to legally scrape its contents

In [497]:
# get the response in the form of html
wikiurl = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
table_class = "wikitable sortable jquery-tablesorter"

response = requests.get(wikiurl)       # Sending a GET request to the Wikipedia URL whose table needs to be scraped, and storing the HTML response in a variable
print(response.status_code)            # Code 200 means it is OK to scrape the webpage, (i.e.: The page allows itself to be scraped from outside.)

200


### Now, that the website gave us OK to scrape - scrape the data you need from it

In [498]:
# Parse the data from the HTML into a beautifulsoup object
soup = bs(response.text, 'html.parser')
toronto = soup.find('table', {'class':"wikitable"})

#toronto

### Convert the HTML data from the website into a Pandas Dataframe

In [499]:
# Convert the Wikipedia table into a Python Dataframe 
Toronto = pd.read_html(str(toronto))

# convert list to dataframe, and rename the column 'Neighbourhood' to 'Neighborhood'
Toronto = pd.DataFrame(Toronto[0])
Toronto = Toronto.rename(columns = {'Neighbourhood': 'Neighborhood'})

Toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Remove the rows with 'Not assigned' values

In [500]:
# Get names of indexes for which column Borough has value 'Not assigned'
indexNames = Toronto[Toronto['Borough'] == 'Not assigned'].index

# Delete these row indexes from dataFrame
Toronto.drop(indexNames, inplace=True)

# Reset the rows' indecies, so that they are in order, (i.e., don't have gaps in numbers)
Toronto = Toronto.reset_index(drop=True)

Toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


## Part 2, Get the Latitude and the Longitude coordinates of each neighborhood

In [501]:
# Download the Latitude and Longitude file
!wget -O Geospatial_Coordinates.csv https://cocl.us/Geospatial_data


--2020-12-27 19:00:08--  https://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 169.63.96.194, 169.63.96.176
Connecting to cocl.us (cocl.us)|169.63.96.194|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-12-27 19:00:08--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.29.197
Connecting to ibm.box.com (ibm.box.com)|107.152.29.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-12-27 19:00:09--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Reusing existing connection to ibm.box.com:443.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.ent.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]


In [502]:
# View the file
geo_df = pd.read_csv('Geospatial_Coordinates.csv')

geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [503]:
# Merge two dataframes, Toronto and geo_df, into one, to assign the corresponding Latitude and Longitude values to the Boroughs
Toronto_LL = pd.merge(Toronto, geo_df, on='Postal Code')

Toronto_LL.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [504]:
#Check the number of the Boroughs and Neighborhoods
print('The Dataframe has {} boroughs and {} neighborhoods.'.format(
        len(Toronto_LL['Borough'].unique()),
        Toronto_LL.shape[0]
    )
)

The Dataframe has 10 boroughs and 103 neighborhoods.


## Part 3, Explore and cluster the neighborhoods in Toronto

You can decide to work with only boroughs that contain the word Toronto, and then, replicate the same analysis we did to the New York City data. It is up to you. 
Just make sure to:

   1. Add enough Markdown cells to explain what you decided to do, and to report any observations you make. 
   2. Generate maps to visualize your neighborhoods, and how they cluster together. 

In [505]:
# Use geopy library to get the latitude and longitude values of Toronto.

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Toronto are 43.6534817, -79.3839347.


### Create a map of Toronto with all the neighborhoods in the dataframe superimposed on top it

In [506]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(Toronto_LL['Latitude'], Toronto_LL['Longitude'], Toronto_LL['Borough'], Toronto_LL['Neighborhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Let's choose the Downtown Area for our exploration

In [507]:
toronto_dt_data = Toronto_LL[Toronto_LL['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
toronto_dt_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


### Get the Geografical Loaction of Downtown Toronto

In [508]:
address = 'Downtown Toronto, ON'

geolocator = Nominatim(user_agent="toronto_dt_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


### Visualize Downtown Toronto neighborhoods

In [509]:
# create map of Manhattan using latitude and longitude values
map_toronto_dt = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(toronto_dt_data['Latitude'], toronto_dt_data['Longitude'], toronto_dt_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_dt)  

map_toronto_dt 

### Use Foursquare API to explore Downtown Toronoto

#### Define the Foursquare Credentials and Version

In [510]:
CLIENT_ID = 'XNOOOVT4NZM255RI0F03C10GNC3VXLATKRAVPUIBURQOTGF5' # your Foursquare ID
CLIENT_SECRET = 'NIPEULT0IYGUQ2P2G152U5GDZR1UUD1PB102XMBAF4RGY31S' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: XNOOOVT4NZM255RI0F03C10GNC3VXLATKRAVPUIBURQOTGF5
CLIENT_SECRET:NIPEULT0IYGUQ2P2G152U5GDZR1UUD1PB102XMBAF4RGY31S


### Explore a random neighborhood in the Toronoto Downtown Area

Let's pick University of Toronto, Harbord

In [511]:
toronto_dt_data.loc[11, 'Neighborhood']

'University of Toronto, Harbord'

### Get the neighborhood's geographical coordinates

In [512]:
university_latitude = toronto_dt_data.loc[11, 'Latitude'] # University of Toronto, Harbord latitude value
university_longitude = toronto_dt_data.loc[11, 'Longitude'] # University of Toronto, Harbord longitude value

neighborhood_name = toronto_dt_data.loc[11, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               university_latitude, 
                                                               university_longitude))

Latitude and longitude values of University of Toronto, Harbord are 43.6626956, -79.4000493.


### 1. Get 10 different venues around the University of Toronto within 500 meters

In [513]:
# First, create the GET request URL. Name the URL url
LIMIT = 10      # limit of number of venues returned by Foursquare API

radius = 500     # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    university_latitude, 
    university_longitude, 
    radius, 
    LIMIT)

url             

'https://api.foursquare.com/v2/venues/explore?&client_id=XNOOOVT4NZM255RI0F03C10GNC3VXLATKRAVPUIBURQOTGF5&client_secret=NIPEULT0IYGUQ2P2G152U5GDZR1UUD1PB102XMBAF4RGY31S&v=20180605&ll=43.6626956,-79.4000493&radius=500&limit=10'

## 2. Send the GET request, and examine the resutls

In [514]:
results = requests.get(url).json()
#results

### 3. Extract the Categories the venues fall under

In [515]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### 4. Clean the json, and sput it into a pandas dataframe

In [516]:
venues = results['response']['groups'][0]['items']
    
nearby_ven = json_normalize(venues) # Flatten JSON

# Filter columns
filtered_col = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_ven = nearby_ven.loc[:, filtered_col]

# filter the category for each row
nearby_ven['venue.categories'] = nearby_ven.apply(get_category_type, axis=1)

# clean columns
nearby_ven.columns = [col.split(".")[-1] for col in nearby_ven.columns]

nearby_ven.head(10)

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Yasu,Japanese Restaurant,43.662837,-79.403217
1,Piano Piano,Italian Restaurant,43.662949,-79.402898
2,Rasa,Restaurant,43.662757,-79.403988
3,The Dessert Kitchen,Dessert Shop,43.662823,-79.402746
4,Almond Butterfly,Bakery,43.662836,-79.403365
5,Her Father's Cider Bar + Kitchen,Beer Bar,43.662448,-79.404703
6,Harbord Bakery & Calandria,Bakery,43.662519,-79.404443
7,Harbord House,Bar,43.662466,-79.40541
8,Sivananda Yoga Centre,Yoga Studio,43.662754,-79.402951
9,Akai Sushi,Sushi Restaurant,43.66247,-79.404946


### Total number of venues

In [517]:
print(' A total of {} venues were returned by Foursquare.'.format(nearby_ven.shape[0]))

 A total of 10 venues were returned by Foursquare.


### 5. Create a function to apply search to all the neighborhoods in the Downtown Toronto Area

In [518]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### 6. Run the above function on each neighborhood, and store the results in a new dataframe called toronto_dt_venues

In [519]:
toronto_dt_venues = getNearbyVenues(names=toronto_dt_data['Neighborhood'],
                                   latitudes=toronto_dt_data['Latitude'],
                                   longitudes=toronto_dt_data['Longitude']
                                  )
toronto_dt_venues = toronto_dt_venues.rename(columns = {'Neighbourhood': 'Neighborhood', 'Neighbourhood Latitude': 'Neighborhood Latitude', 
                                                        'Neighbourhood Longitude': 'Neighborhood Longitude'})


Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


### 7. Check the size of the resulting dataframe

In [520]:
print(toronto_dt_venues.shape)
toronto_dt_venues.head()

(184, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


### 8. Check how many venues were returned for each neighborhood

In [521]:
toronto_dt_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,10,10,10,10,10,10
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",10,10,10,10,10,10
Central Bay Street,10,10,10,10,10,10
Christie,10,10,10,10,10,10
Church and Wellesley,10,10,10,10,10,10
"Commerce Court, Victoria Hotel",10,10,10,10,10,10
"First Canadian Place, Underground city",10,10,10,10,10,10
"Garden District, Ryerson",10,10,10,10,10,10
"Harbourfront East, Union Station, Toronto Islands",10,10,10,10,10,10
"Kensington Market, Chinatown, Grange Park",10,10,10,10,10,10


### 9. How many unique categories of venues are there in the dataframe?

In [522]:
print('There are {} uniques categories.'.format(len(toronto_dt_venues['Venue Category'].unique())))

There are 77 uniques categories.


### 10. Analyze Each Neighborhood

In [523]:
# one-hot encoding
toronto_dt_onehot = pd.get_dummies(toronto_dt_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
toronto_dt_onehot['Neighborhood'] = toronto_dt_venues['Neighborhood'] 

# Set the neighborhood column as the first column
fixed_columns = [toronto_dt_onehot.columns[-1]] + list(toronto_dt_onehot.columns[:-1])
toronto_dt_onehot = toronto_dt_onehot[fixed_columns]

toronto_dt_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,Arts & Crafts Store,Asian Restaurant,Bakery,Bar,Beer Bar,Bookstore,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Dessert Shop,Diner,Distribution Center,Farmers Market,Food Truck,Fountain,Gastropub,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Museum,Music Venue,Neighborhood,Organic Grocery,Park,Performing Arts Venue,Pizza Place,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


### 11. What is the size of the 'onehot' dataframe?

In [524]:
toronto_dt_onehot.shape

(184, 77)

### 12. Group by neighborhood

In [525]:
toronto_grouped = toronto_dt_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,Arts & Crafts Store,Asian Restaurant,Bakery,Bar,Beer Bar,Bookstore,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Dessert Shop,Diner,Distribution Center,Farmers Market,Food Truck,Fountain,Gastropub,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Museum,Music Venue,Organic Grocery,Park,Performing Arts Venue,Pizza Place,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.1,0.1,0.1,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [526]:
# Confirm new shape
toronto_grouped.shape

(19, 77)

### 13. Print each neighborhood with top 5 venues

In [527]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                           venue  freq
0                 Farmers Market   0.2
1  Vegetarian / Vegan Restaurant   0.1
2                         Museum   0.1
3                   Concert Hall   0.1
4                           Park   0.1


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0     Airport Terminal   0.2
1       Airport Lounge   0.2
2      Harbor / Marina   0.1
3  Rental Car Location   0.1
4              Airport   0.1


----Central Bay Street----
                       venue  freq
0                Coffee Shop   0.4
1  Middle Eastern Restaurant   0.1
2                        Spa   0.1
3         Italian Restaurant   0.1
4                       Park   0.1


----Christie----
                venue  freq
0       Grocery Store   0.3
1                Café   0.3
2  Italian Restaurant   0.1
3         Candy Store   0.1
4          Restaurant   0.1


----Church and Wellesley---

### 14. Put the venues into a pandas dataframe

In [528]:
## Sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


## Create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
hoods_venues_sorted = pd.DataFrame(columns=columns)
hoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    hoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

hoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Farmers Market,Vegetarian / Vegan Restaurant,Liquor Store,Cocktail Bar,Museum,Restaurant,Concert Hall,Park,Fountain,Dance Studio
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Terminal,Coffee Shop,Rental Car Location,Airport,Airport Food Court,Airport Gate,Harbor / Marina,Comic Shop,Concert Hall
2,Central Bay Street,Coffee Shop,Spa,Modern European Restaurant,Middle Eastern Restaurant,Sushi Restaurant,Italian Restaurant,Park,Cosmetics Shop,Clothing Store,Cocktail Bar
3,Christie,Café,Grocery Store,Candy Store,Coffee Shop,Italian Restaurant,Restaurant,Dance Studio,Cocktail Bar,Comic Shop,Concert Hall
4,Church and Wellesley,Beer Bar,Restaurant,Theme Restaurant,Park,Mexican Restaurant,Dance Studio,Bubble Tea Shop,Breakfast Spot,Ramen Restaurant,Bookstore


### 15. Cluster Neighborhoods

Run _k_-means to cluster the neighborhoods into 5 clusters

In [529]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 0, 1, 2, 1, 1, 3, 3, 1], dtype=int32)

### 16. Create a new dataframe that includes the clusters, as well as, the top 10 venues for each neighborhood

In [530]:
# label clusters
hoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge toronto_grouped with toronto_dt_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_dt_data.join(hoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head(20) # check the last columns

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Gym / Fitness Center,Bakery,Coffee Shop,Breakfast Spot,Restaurant,Park,Spa,Chocolate Shop,Distribution Center,Historic Site
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Yoga Studio,Italian Restaurant,Portuguese Restaurant,Mexican Restaurant,Beer Bar,Park,Creperie,Sushi Restaurant,Dessert Shop
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3,Pizza Place,Comic Shop,Music Venue,Burrito Place,Plaza,Burger Joint,Clothing Store,Ramen Restaurant,Café,Theater
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Japanese Restaurant,Coffee Shop,Gastropub,Restaurant,Middle Eastern Restaurant,Cosmetics Shop,Creperie,Food Truck,Diner,Dessert Shop
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,3,Farmers Market,Vegetarian / Vegan Restaurant,Liquor Store,Cocktail Bar,Museum,Restaurant,Concert Hall,Park,Fountain,Dance Studio
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Spa,Modern European Restaurant,Middle Eastern Restaurant,Sushi Restaurant,Italian Restaurant,Park,Cosmetics Shop,Clothing Store,Cocktail Bar
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,1,Café,Grocery Store,Candy Store,Coffee Shop,Italian Restaurant,Restaurant,Dance Studio,Cocktail Bar,Comic Shop,Concert Hall
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,3,Vegetarian / Vegan Restaurant,Steakhouse,Plaza,Concert Hall,Restaurant,Asian Restaurant,Speakeasy,Café,Hotel,Dessert Shop
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,3,Performing Arts Venue,Hotel,Plaza,Lake,Salad Place,Sporting Goods Shop,Skating Rink,Dessert Shop,Park,Dance Studio
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,1,Café,Restaurant,Gym / Fitness Center,Bakery,Gym,Coffee Shop,Pub,Beer Bar,Dessert Shop,Diner


### 17. Visualize the clusters

In [531]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 18. Examine Clusters

#### Cluster 1

In [532]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,0,Coffee Shop,Yoga Studio,Italian Restaurant,Portuguese Restaurant,Mexican Restaurant,Beer Bar,Park,Creperie,Sushi Restaurant,Dessert Shop
3,Downtown Toronto,0,Japanese Restaurant,Coffee Shop,Gastropub,Restaurant,Middle Eastern Restaurant,Cosmetics Shop,Creperie,Food Truck,Diner,Dessert Shop
5,Downtown Toronto,0,Coffee Shop,Spa,Modern European Restaurant,Middle Eastern Restaurant,Sushi Restaurant,Italian Restaurant,Park,Cosmetics Shop,Clothing Store,Cocktail Bar


#### Cluster 2

In [533]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,1,Café,Grocery Store,Candy Store,Coffee Shop,Italian Restaurant,Restaurant,Dance Studio,Cocktail Bar,Comic Shop,Concert Hall
9,Downtown Toronto,1,Café,Restaurant,Gym / Fitness Center,Bakery,Gym,Coffee Shop,Pub,Beer Bar,Dessert Shop,Diner
10,Downtown Toronto,1,Café,Bakery,Coffee Shop,Museum,Pub,Restaurant,Gym / Fitness Center,Gym,Cocktail Bar,Gastropub
12,Downtown Toronto,1,Café,Dessert Shop,Arts & Crafts Store,Organic Grocery,Mexican Restaurant,Bakery,Cocktail Bar,Farmers Market,Distribution Center,Diner
16,Downtown Toronto,1,Café,Butcher,Diner,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Bakery,Jewelry Store,Restaurant,Vegetarian / Vegan Restaurant
17,Downtown Toronto,1,Restaurant,Café,Gym / Fitness Center,Gym,Coffee Shop,Steakhouse,Bakery,Creperie,Cocktail Bar,Comic Shop


#### Cluster 3

In [534]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Gym / Fitness Center,Bakery,Coffee Shop,Breakfast Spot,Restaurant,Park,Spa,Chocolate Shop,Distribution Center,Historic Site
11,Downtown Toronto,2,Bakery,Yoga Studio,Italian Restaurant,Dessert Shop,Restaurant,Beer Bar,Japanese Restaurant,Bar,Sushi Restaurant,Diner
18,Downtown Toronto,2,Beer Bar,Restaurant,Theme Restaurant,Park,Mexican Restaurant,Dance Studio,Bubble Tea Shop,Breakfast Spot,Ramen Restaurant,Bookstore


#### Cluster 4

In [535]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,3,Pizza Place,Comic Shop,Music Venue,Burrito Place,Plaza,Burger Joint,Clothing Store,Ramen Restaurant,Café,Theater
4,Downtown Toronto,3,Farmers Market,Vegetarian / Vegan Restaurant,Liquor Store,Cocktail Bar,Museum,Restaurant,Concert Hall,Park,Fountain,Dance Studio
7,Downtown Toronto,3,Vegetarian / Vegan Restaurant,Steakhouse,Plaza,Concert Hall,Restaurant,Asian Restaurant,Speakeasy,Café,Hotel,Dessert Shop
8,Downtown Toronto,3,Performing Arts Venue,Hotel,Plaza,Lake,Salad Place,Sporting Goods Shop,Skating Rink,Dessert Shop,Park,Dance Studio
13,Downtown Toronto,3,Airport Lounge,Airport Terminal,Coffee Shop,Rental Car Location,Airport,Airport Food Court,Airport Gate,Harbor / Marina,Comic Shop,Concert Hall
15,Downtown Toronto,3,Vegetarian / Vegan Restaurant,Restaurant,Food Truck,Concert Hall,Cocktail Bar,Museum,Park,Fountain,Thai Restaurant,Tailor Shop


#### Cluster 5

In [536]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,4,Park,Playground,Trail,Airport Food Court,Airport Gate,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall


# Lab Complete