<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto City</font></h1>

## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. You will also use the Foursquare API to explore neighborhoods in Toronto. You will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, you will use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Toronto City</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if you have already completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (requires pandas >= 1.0; the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if you have already completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


# Import URL and HTML parser libraries

In [59]:
import urllib.request
!pip install beautifulsoup4
from bs4 import BeautifulSoup
!pip install lxml

Collecting lxml
  Downloading lxml-4.5.0-cp37-cp37m-manylinux1_x86_64.whl (5.7 MB)
Installing collected packages: lxml
Successfully installed lxml-4.5.0


In [None]:
# Scrape the web page and display its content

In [143]:
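url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)  # fetch the raw HTML of the Wikipedia page

# Parse with lxml (installed above); naming the parser explicitly makes the
# result independent of which parsers happen to be installed
soup = BeautifulSoup(page, features="lxml")
print(soup.prettify())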

[Output truncated: prettified HTML of the page "List of postal codes of Canada: M - Wikipedia"]

# Locating the required HTML table, parsing its rows and storing the data in lists

In [277]:
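right_table = soup.find('table', class_='wikitable')

# One list per table column
A = []  # postal codes
B = []  # boroughs
C = []  # neighborhoods

for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    print(cells)  # show the raw cells of each row
    # Data rows have exactly three <td> cells; the header row has none
    if len(cells) == 3:
        A.append(cells[0].get_text(strip=True))
        B.append(cells[1].get_text(strip=True))
        # Neighborhoods that share a postal code are separated by "/"; use commas instead
        C.append(cells[2].get_text(strip=True).replace("/", ","))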



[]
[<td>M1A
</td>, <td>Not assigned
</td>, <td>
</td>]
[<td>M2A
</td>, <td>Not assigned
</td>, <td>
</td>]
[<td>M3A
</td>, <td>North York
</td>, <td>Parkwoods
</td>]
[<td>M4A
</td>, <td>North York
</td>, <td>Victoria Village
</td>]
[<td>M5A
</td>, <td>Downtown Toronto
</td>, <td>Regent Park / Harbourfront
</td>]
[<td>M6A
</td>, <td>North York
</td>, <td>Lawrence Manor / Lawrence Heights
</td>]
[<td>M7A
</td>, <td>Downtown Toronto
</td>, <td>Queen's Park / Ontario Provincial Government
</td>]
[<td>M8A
</td>, <td>Not assigned
</td>, <td>
</td>]
[<td>M9A
</td>, <td>Etobicoke
</td>, <td>Islington Avenue
</td>]
[<td>M1B
</td>, <td>Scarborough
</td>, <td>Malvern / Rouge
</td>]
[<td>M2B
</td>, <td>Not assigned
</td>, <td>
</td>]
[<td>M3B
</td>, <td>North York
</td>, <td>Don Mills
</td>]
[<td>M4B
</td>, <td>East York
</td>, <td>Parkview Hill / Woodbine Gardens
</td>]
[<td>M5B
</td>, <td>Downtown Toronto
</td>, <td>Garden District, Ryerson
</td>]
[<td>M6B
</td>, <td>North York
</td>, <td>Glenca
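
The row loop can be tried offline on a small inline table. The sketch below assumes BeautifulSoup is installed and uses Python's built-in `html.parser` instead of lxml. The HTML snippet is a hand-written stand-in for the Wikipedia markup, and `parse_rows` is a hypothetical helper name. Unlike `replace("/", ",")`, it replaces `" / "` with `", "` so that no space is left before the comma.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the Wikipedia table (not the real page markup)
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>
</td></tr>
  <tr><td>M5A
</td><td>Downtown Toronto
</td><td>Regent Park / Harbourfront
</td></tr>
</table>
"""

def parse_rows(html):
    """Return (postal_code, borough, neighborhood) tuples from the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 3:  # the header row has <th> cells only
            continue
        code, borough, hood = (c.get_text(strip=True) for c in cells)
        rows.append((code, borough, hood.replace(" / ", ", ")))
    return rows

rows = parse_rows(SAMPLE_HTML)
print(rows)
```

Returning tuples instead of filling three parallel lists keeps each row's values together, which makes it harder for the columns to drift out of step.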

# Preparing dataframe from collections

In [278]:
df=pd.DataFrame(A,columns=['Postal code'])
df['Borough']=B
df['Neighborhood']=C

print(df)


    Postal code           Borough  \
0           M1A      Not assigned   
1           M2A      Not assigned   
2           M3A        North York   
3           M4A        North York   
4           M5A  Downtown Toronto   
5           M6A        North York   
6           M7A  Downtown Toronto   
7           M8A      Not assigned   
8           M9A         Etobicoke   
9           M1B       Scarborough   
10          M2B      Not assigned   
11          M3B        North York   
12          M4B         East York   
13          M5B  Downtown Toronto   
14          M6B        North York   
15          M7B      Not assigned   
16          M8B      Not assigned   
17          M9B         Etobicoke   
18          M1C       Scarborough   
19          M2C      Not assigned   
20          M3C        North York   
21          M4C         East York   
22          M5C  Downtown Toronto   
23          M6C              York   
24          M7C      Not assigned   
25          M8C      Not assigned   
2

# Removing rows whose Borough is "Not assigned" from the dataframe

In [279]:
df1 = df[df.Borough != 'Not assigned']
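
The filter above relies on a boolean mask. As a minimal sketch on three toy rows (values copied from the table printed earlier), the mask can be inspected on its own before it is applied. The `reset_index(drop=True)` call is an addition of this sketch, not part of the cell above.

```python
import pandas as pd

# Toy rows taken from the dataframe printed earlier
df = pd.DataFrame({
    "Postal code": ["M1A", "M3A", "M5A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto"],
    "Neighborhood": ["", "Parkwoods", "Regent Park , Harbourfront"],
})

mask = df["Borough"] != "Not assigned"   # boolean Series, one entry per row
df1 = df[mask].reset_index(drop=True)    # keep matching rows, renumber from 0

print(mask.tolist())
print(df1["Postal code"].tolist())
```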

# Merging rows that share the same Postal code and Borough

In [280]:
df_grouped = df1.groupby(['Postal code', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()

df_grouped.shape

(103, 3)
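
To see what the `groupby` with `', '.join` does, here is a small sketch on made-up rows in which two entries share a postal code. The dataframe name `toy` and its contents are illustrative only.

```python
import pandas as pd

# Made-up rows: two entries share the postal code M5A
toy = pd.DataFrame({
    "Postal code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

merged = (
    toy.groupby(["Postal code", "Borough"])["Neighborhood"]
       .apply(", ".join)   # concatenate the neighborhood names in each group
       .reset_index()      # turn the group keys back into ordinary columns
)
print(merged)
```

The two M5A rows collapse into one row whose Neighborhood is "Regent Park, Harbourfront", and the result is sorted by the group keys.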

# Notebook GitHub URL

https://github.com/subahanma/Coursera_Capstone/blob/master/Neighborhoods-Toronto.ipynb

In [146]:
!pip install geocoder
import geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting future
  Downloading future-0.18.2.tar.gz (829 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Building wheels for collected packages: future
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491058 sha256=057818227059a62c48025c6dc71b42fe57cf2824cb4bd67f6f2150376ffc86b8
  Stored in directory: /home/jovyan/.cache/pip/wheels/56/b0/fe/4410d17b32f1f0c3cf54cdfb2bc04d7b4b8f4ae377e2229ba0
Successfully built future
Installing collected packages: future, ratelim, geocoder
Successfully installed future-0.18.2 geocoder-1.38.1 ratelim-0.1.6


# The geocoder API did not respond, so another method is used
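
One way to structure such a fallback is to retry the geocoder a few times and then consult a local table. The sketch below uses only the standard library. `lookup_with_fallback`, its parameters, and the stub geocoder are hypothetical names introduced for illustration, not part of the geocoder package.

```python
import time

def lookup_with_fallback(postal_code, geocode, fallback_table, retries=3, delay=0.0):
    """Try `geocode` a few times, then fall back to a local lookup table.

    `geocode` is any callable returning (lat, lng) or None; it stands in for
    a real geocoding client and is an assumption of this sketch.
    """
    for _ in range(retries):
        try:
            coords = geocode(postal_code)
        except Exception:
            coords = None  # treat network errors like an empty response
        if coords is not None:
            return coords
        time.sleep(delay)
    return fallback_table.get(postal_code)  # None if the code is unknown

# Stub that never answers, mimicking the unresponsive service
def unresponsive_geocoder(postal_code):
    return None

# Coordinates copied from the CA.csv preview further down in this notebook
table = {"T0A": (54.766, -111.7174)}
result = lookup_with_fallback("T0A", unresponsive_geocoder, table)
print(result)
```

Passing the geocoder in as a callable keeps the retry logic independent of any particular service, so a real client can be swapped in without changing the helper.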

# Downloaded the Canadian postal codes with their latitude and longitude from http://download.geonames.org/export/zip/, converted the file to CSV and uploaded it

# Dataframe created from CA.csv

In [299]:
canadata_postal_df = pd.read_csv("./CA.csv")

# Keep only the postal code and its coordinates; drop every other column
canadata_postal_df = canadata_postal_df.drop(columns=['Country', 'Neighbourhood', 'Unnamed: 3', 'Admin1', 'Admin2', 'Admin3', 'Admin4', 'Admin5', 'Admin6'])

canadata_postal_df.head()

Unnamed: 0,PostalCode,Lattitude,Longitude
0,T0A,54.766,-111.7174
1,T0B,53.0727,-111.5816
2,T0C,52.1431,-111.6941
3,T0E,53.6758,-115.0948
4,T0G,55.6993,-114.4529
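
The manual CSV conversion could probably be skipped, because the GeoNames postal-code files are tab-separated text without a header row. The sketch below assumes the column layout described in the GeoNames readme and reads two illustrative lines from memory instead of the real file. The place names and accuracy values are placeholders; the coordinates come from the preview above.

```python
import io
import pandas as pd

# Column layout as described in the GeoNames postal-code readme (assumed here)
GEONAMES_COLUMNS = [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1", "admin_name2", "admin_code2",
    "admin_name3", "admin_code3", "latitude", "longitude", "accuracy",
]

# Two illustrative lines in that layout; place names and accuracy are placeholders
sample = (
    "CA\tT0A\tPlaceholder A\tAlberta\tAB\t\t\t\t\t54.766\t-111.7174\t6\n"
    "CA\tT0B\tPlaceholder B\tAlberta\tAB\t\t\t\t\t53.0727\t-111.5816\t6\n"
)

geo = pd.read_csv(
    io.StringIO(sample),      # replace with the path to the extracted text file
    sep="\t",
    header=None,              # the file has no header row
    names=GEONAMES_COLUMNS,
    usecols=["postal_code", "latitude", "longitude"],
)
geo = geo.rename(columns={"postal_code": "PostalCode"})
print(geo)
```

Selecting the three needed columns with `usecols` at read time replaces the separate `drop` step.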


In [298]:
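# Align the key column name with the GeoNames dataframe before merging
df_grouped.rename(columns={"Postal code":"PostalCode"}, inplace=True)

# Inner join (the default): keeps only postal codes present in both dataframes
canda_pc_lat_lng_df = pd.merge(df_grouped, canadata_postal_df, on="PostalCode")
canda_pc_lat_lng_df.shape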

(102, 5)
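
The merge returns 102 rows although the grouped dataframe had 103, which suggests that one postal code has no match in the GeoNames data and was dropped by the default inner join. A left join with `indicator=True` is one way to find such rows. The sketch below uses made-up data, including the fictional code `M9Z` and placeholder coordinates.

```python
import pandas as pd

left = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M9Z"],   # "M9Z" is a made-up code with no match
    "Borough": ["North York", "North York", "Example"],
})
right = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Lattitude": [43.75, 43.73],    # placeholder values
    "Longitude": [-79.33, -79.31],  # placeholder values
})

# A left join keeps every row of `left`; `_merge` records where each row came from
checked = pd.merge(left, right, on="PostalCode", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

Applied to `df_grouped` and `canadata_postal_df`, the same pattern would list the postal codes that still need coordinates from another source.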
