<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Introduction

In this lab, I scrape Toronto's postal codes from Wikipedia. After geocoding the data, I use the Foursquare API to explore Toronto's neighborhoods. I use the **explore** function to get the most common venue categories in each neighborhood, and then use these categories as features to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, I use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Transform Toronto Postal Codes from Wikipedia</a>


Let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 1. Download and Transform Toronto Postal Codes from Wikipedia

#### Scrape

This step uses the pandas `read_html` method (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.read_html.html), which returns a list of all tables found on the page.
The first table in that list is the table of postal codes.

In [84]:
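try:
    # read_html returns a list of DataFrames; the first one is the postal code table
    toronto_post_codes = pd.read_html('http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
except Exception as err:
    # report the failure, then re-raise so later cells do not run without the table
    print("Pandas read html failed!", err)
    raise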

In [85]:
toronto_post_codes.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


In [86]:
toronto_post_codes.shape

(288, 3)
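
Because the scrape relies on the postal code table being the first table on the page, a quick schema check can catch a changed page layout before the transform step. The following is a minimal sketch, assuming the column names shown in the `head()` output above; `has_expected_columns` is a hypothetical helper, and the small frame only stands in for the scraped table.

```python
import pandas as pd

# Column names as they appear in the head() output of the scraped table.
EXPECTED_COLUMNS = ["Postcode", "Borough", "Neighbourhood"]


def has_expected_columns(df, expected=EXPECTED_COLUMNS):
    """Return True when df has exactly the expected column names, in order."""
    return list(df.columns) == list(expected)


# Stand-in for the scraped table (rows copied from the output above).
sample = pd.DataFrame(
    {
        "Postcode": ["M1A", "M3A", "M4A"],
        "Borough": ["Not assigned", "North York", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
    }
)

schema_ok = has_expected_columns(sample)
# A frame with a different header, as if the wrong table had been picked up.
wrong_table_ok = has_expected_columns(sample.rename(columns={"Postcode": "Code"}))
print(schema_ok, wrong_table_ok)
```

In the notebook, the same check could be applied to `toronto_post_codes` right after the scrape.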

#### Transform
Drop the rows whose borough is not assigned (77 rows).

Rename the column `Postcode` to `PostalCode` and `Neighbourhood` to `Neighborhood`.

Where a neighbourhood is not assigned but its borough is (Queen's Park), use the borough name as the neighborhood.

In [87]:
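# keep only rows with an assigned borough; copy so the column assignment below does not act on a view
toronto_post_codes = toronto_post_codes[toronto_post_codes['Borough'] != 'Not assigned'].copy()
toronto_post_codes.columns = ['PostalCode', 'Borough', 'Neighborhood']
# fall back to the borough name where the neighborhood is not assigned
toronto_post_codes['Neighborhood'] = toronto_post_codes.apply(lambda row: row['Borough'] if row['Neighborhood']=='Not assigned' else row['Neighborhood'], axis = 1)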

In [88]:
toronto_post_codes.shape

(211, 3)

In [89]:
toronto_post_codes.head(10)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
| 7 | M6A | North York | Lawrence Manor |
| 8 | M7A | Queen's Park | Queen's Park |
| 10 | M9A | Etobicoke | Islington Avenue |
| 11 | M1B | Scarborough | Rouge |
| 12 | M1B | Scarborough | Malvern |
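
The three transform steps can also be written without a row-wise `apply`. The sketch below is one possible vectorized variant, not the notebook's own code: it uses `Series.mask` for the borough fallback and runs on a small hand-made frame (`raw`, an assumption for illustration) rather than the scraped table.

```python
import pandas as pd

# Toy stand-in for the scraped table; values follow the outputs shown above.
raw = pd.DataFrame(
    {
        "Postcode": ["M1A", "M3A", "M5A", "M7A"],
        "Borough": ["Not assigned", "North York", "Downtown Toronto", "Queen's Park"],
        "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront", "Not assigned"],
    }
)

# 1. Keep only rows with an assigned borough; copy so later writes do not touch `raw`.
cleaned = raw[raw["Borough"] != "Not assigned"].copy()
dropped = len(raw) - len(cleaned)

# 2. Rename the columns.
cleaned.columns = ["PostalCode", "Borough", "Neighborhood"]

# 3. Where the neighborhood is unassigned, replace it with the borough name.
cleaned["Neighborhood"] = cleaned["Neighborhood"].mask(
    cleaned["Neighborhood"] == "Not assigned", cleaned["Borough"]
)

print(dropped)
print(cleaned)
```

Comparing the row count before and after the filter, as `dropped` does here, is the same arithmetic behind the 77 dropped rows in the notebook (288 rows before, 211 after).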


Combine rows that share a postal code and borough into a single row, collecting their neighborhoods into a list, and reset the index.

#### postal_codes is the desired dataframe

In [90]:
postal_codes = toronto_post_codes.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(list).reset_index()

In [91]:
postal_codes.head(10)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | [Rouge, Malvern] |
| 1 | M1C | Scarborough | [Highland Creek, Rouge Hill, Port Union] |
| 2 | M1E | Scarborough | [Guildwood, Morningside, West Hill] |
| 3 | M1G | Scarborough | [Woburn] |
| 4 | M1H | Scarborough | [Cedarbrae] |
| 5 | M1J | Scarborough | [Scarborough Village] |
| 6 | M1K | Scarborough | [East Birchmount Park, Ionview, Kennedy Park] |
| 7 | M1L | Scarborough | [Clairlea, Golden Mile, Oakridge] |
| 8 | M1M | Scarborough | [Cliffcrest, Cliffside, Scarborough Village West] |
| 9 | M1N | Scarborough | [Birch Cliff, Cliffside West] |


In [92]:
postal_codes.shape

(103, 3)
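
To make the grouping step concrete, here is a small sketch on hand-made rows (the `rows` frame is an assumption for illustration). It shows the same `groupby(...).apply(list)` pattern used above, plus an optional variant that joins the names into one comma-separated string, which may be easier to display or export than a list column.

```python
import pandas as pd

# A few rows in the shape of the cleaned table (one row per neighborhood).
rows = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1B", "M1G", "M5A", "M5A"],
        "Borough": [
            "Scarborough",
            "Scarborough",
            "Scarborough",
            "Downtown Toronto",
            "Downtown Toronto",
        ],
        "Neighborhood": ["Rouge", "Malvern", "Woburn", "Harbourfront", "Regent Park"],
    }
)

grouped = rows.groupby(["PostalCode", "Borough"])["Neighborhood"]

# Same pattern as the notebook: one row per postal code, neighborhoods as a list.
as_lists = grouped.apply(list).reset_index()

# Optional variant: neighborhoods joined into a single string.
as_text = grouped.agg(", ".join).reset_index()

print(as_lists)
print(as_text)
```

Note that `groupby` sorts the result by the grouping keys by default, which is why the notebook's output starts at `M1B` rather than `M3A`.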