# Segmenting and Clustering Neighborhoods in Toronto

## Introduction

In this lab, we explore neighborhoods in Toronto, Canada, using the Foursquare API, and segment them by their most common venues into clusters with k-means clustering.
The results are then visualized with Folium.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Toronto</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

## 1. Download and Explore Dataset

Scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M to obtain the table of postal codes, then transform that data into a pandas dataframe.

In [1]:
#Download data
!wget https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M -O canada_postal_code.xml

--2019-05-20 05:53:33--  https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
Resolving en.wikipedia.org (en.wikipedia.org)... 103.102.166.224
Connecting to en.wikipedia.org (en.wikipedia.org)|103.102.166.224|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 79026 (77K) [text/html]
Saving to: ‘canada_postal_code.xml’


2019-05-20 05:53:34 (552 KB/s) - ‘canada_postal_code.xml’ saved [79026/79026]



In [2]:
#Install the BeautifulSoup library to parse the HTML page downloaded above (saved with an .xml extension)
!pip install bs4



In [3]:
#Import the required libraries
from bs4 import BeautifulSoup
import pandas as pd
import requests

In [4]:
#Load and parse the downloaded page (the file has an .xml extension but contains HTML)
with open('canada_postal_code.xml', encoding='utf-8') as f:
    soup=BeautifulSoup(f,'html.parser')

In [5]:
#Extract data from postal table
rows=soup.table.find_all('tr')  #look up the table rows once instead of on every iteration
L=[]
for row in rows[1:]:  #skip the header row
    L.append([ x.rstrip('\n') for x in row.strings if x.rstrip('\n') != '' ])
print("Let's see first 5 values")
L[:5]

Let's see first 5 values


[['M1A', 'Not assigned', 'Not assigned'],
 ['M2A', 'Not assigned', 'Not assigned'],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Harbourfront']]
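
The same row-by-row extraction can be tried offline on a small table. The following is only a sketch: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, and both `SAMPLE_HTML` and the `TableRowParser` class are hypothetical stand-ins, not part of the notebook or of the real Wikipedia page.

```python
from html.parser import HTMLParser

# Hypothetical miniature of the postal code table; not the real page content.
SAMPLE_HTML = """
<table>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the non-empty text fragments of each <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # fields of the row being parsed, None outside a row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.rows.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Unlike the notebook's rstrip('\n'), this strips all surrounding whitespace.
        text = data.strip()
        if self._current is not None and text:
            self._current.append(text)


parser = TableRowParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows[1:]  # skip the header row, as the notebook loop does
print(rows)
```

Each inner list has the same three-field shape as the entries of `L`, so it could be passed to `pd.DataFrame` in the same way.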

#### Create a dataframe from the data above that consists of three columns: PostCode, Borough, and Neighborhood

In [6]:
#Import our lists to a dataframe
columns=['PostCode','Borough','Neighborhood']
df=pd.DataFrame(L,columns=columns)
df.head()

|   | PostCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
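
Because the rows come from scraped HTML, a row with a different number of fields than the three column names can either raise an error or leave missing values when the dataframe is built. A quick check before calling `pd.DataFrame` may help; the sketch below is an assumption about how one might do it, and `split_by_width` and the sample rows are hypothetical.

```python
EXPECTED_WIDTH = 3  # PostCode, Borough, Neighborhood


def split_by_width(rows, width=EXPECTED_WIDTH):
    """Return (good, bad): rows with exactly `width` fields, and all other rows."""
    good = [r for r in rows if len(r) == width]
    bad = [r for r in rows if len(r) != width]
    return good, bad


# Hypothetical sample; the second row is deliberately malformed.
sample = [
    ['M3A', 'North York', 'Parkwoods'],
    ['M4A', 'North York'],
]
good, bad = split_by_width(sample)
print(len(good), "usable rows,", len(bad), "rows to inspect")
```

Applied to the notebook's list, `split_by_width(L)` would return the rows that are safe to load and the ones worth inspecting by hand.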


#### Drop rows whose Borough is "Not assigned"

In [7]:
#Drop rows with Borough == 'Not assigned'
df=df[df['Borough'] != 'Not assigned']
df.head()

|   | PostCode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |


#### Group neighborhoods by PostCode

In [8]:
df=df.groupby(['PostCode','Borough'],as_index=True)['Neighborhood'].apply(', '.join).reset_index()
df.head()

|   | PostCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
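
To see what the grouping step does in isolation, here is a minimal sketch on a toy dataframe. The three rows are illustrative only, and `agg` is used here in place of the notebook's `apply`; for a plain string join the two should give the same result.

```python
import pandas as pd

# Toy rows mirroring the notebook's columns; the values are illustrative only.
toy = pd.DataFrame(
    [
        ['M5A', 'Downtown Toronto', 'Harbourfront'],
        ['M5A', 'Downtown Toronto', 'Regent Park'],
        ['M3A', 'North York', 'Parkwoods'],
    ],
    columns=['PostCode', 'Borough', 'Neighborhood'],
)

# One row per (PostCode, Borough); neighborhoods sharing a code are comma-joined.
grouped = (
    toy.groupby(['PostCode', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(grouped)
```

The two `M5A` rows collapse into a single row, and the result is sorted by the grouping keys, which is why the notebook's output starts at `M1B`.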


#### Replace "Not assigned" Neighborhood values with the Borough name

In [12]:
#Use .loc so the assignment modifies df itself rather than a temporary copy
df.loc[df.Neighborhood == 'Not assigned', 'Neighborhood'] = df.Borough

#Verify result
try:
    df.set_index('Neighborhood').loc["Not assigned"]
except KeyError:
    print("There is no Not assigned neighborhood anymore\n")
    
print("The value of Neighborhood for Queen's Park borough now is: %s" % df.set_index('Borough').loc["Queen's Park"].Neighborhood)

There is no Not assigned neighborhood anymore

The value of Neighborhood for Queen's Park borough now is: Queen's Park
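
The replacement step can also be written with `Series.where`, which builds the new column in one expression instead of assigning into a selection. This is a sketch on a two-row toy dataframe with illustrative values, not the notebook's data.

```python
import pandas as pd

# Toy data; the M7A row mimics a postal code whose neighborhood is not assigned.
toy = pd.DataFrame(
    {
        'PostCode': ['M7A', 'M3A'],
        'Borough': ["Queen's Park", 'North York'],
        'Neighborhood': ['Not assigned', 'Parkwoods'],
    }
)

# Keep Neighborhood where it is assigned; otherwise take the Borough value.
toy['Neighborhood'] = toy['Neighborhood'].where(
    toy['Neighborhood'] != 'Not assigned', toy['Borough']
)
print(toy)
```

`where` keeps the original value wherever the condition is true and takes the value from `Borough` elsewhere, so only the `M7A` row changes.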


#### Our dataframe shape

In [14]:
df.shape

(103, 3)

## 2. Explore Neighborhoods in Toronto

## 3. Analyze Each Neighborhood

## 4. Cluster Neighborhoods

## 5. Examine Clusters