<b>Segmenting and Clustering Neighborhoods in Toronto</b> - Step 1

In [2]:
# Install the HTML/XML parsers used by BeautifulSoup and pandas.read_html.
!conda install -c conda lxml
!pip install et_xmlfile
!pip install bs4
!pip install html5lib
!pip install lxml

import pandas as pd
import requests
from bs4 import BeautifulSoup
from lxml import etree

<b>Import the table from the Wikipedia page and load it into a pandas DataFrame</b>

In [4]:
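# Download the Wikipedia page that lists the "M" (Toronto) postal codes.
r = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(r.content, features='lxml')
# This notebook takes the first <table> on the page as the postal-code table.
table = soup.find_all('table')[0]
# read_html returns a list of DataFrames; keep the first one.
df_toronto = pd.read_html(str(table))[0]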

<b>Prepare the data</b>

In [5]:
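# Drop rows whose borough is "Not assigned".
df_toronto.drop(df_toronto[df_toronto['Borough'] == 'Not assigned'].index, inplace=True)
# Look up neighborhoods still marked "Not assigned"; the result is not
# displayed because this is not the last expression in the cell.
df_toronto.loc[df_toronto['Neighborhood'] == "Not assigned"]
df_toronto.reset_index(drop=True, inplace=True)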
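
The cell above removes rows with an unassigned borough and looks up unassigned neighborhoods without acting on them. The following is a minimal sketch of one way to handle both cases on a small hand-made frame. The sample rows (including the made-up code `M9Z`) and the rule of falling back to the borough name are assumptions for illustration, not something the scraped table is known to need.

```python
import pandas as pd

# Hypothetical sample that mimics the scraped table (not the real data).
sample = pd.DataFrame({
    "Postal code": ["M1A", "M3A", "M5A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Parkwoods",
                     "Regent Park / Harbourfront", "Not assigned"],
})

# Keep only rows whose borough is assigned, then renumber the index.
cleaned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)

# Assumed rule: an unassigned neighborhood takes the name of its borough.
unassigned = cleaned["Neighborhood"] == "Not assigned"
cleaned.loc[unassigned, "Neighborhood"] = cleaned.loc[unassigned, "Borough"]

print(cleaned)
```

Filtering with a boolean mask returns a new frame, so the original `sample` stays available for comparison.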

In [6]:
df_toronto.head()

   Postal code  Borough           Neighborhood
0  M3A          North York        Parkwoods
1  M4A          North York        Victoria Village
2  M5A          Downtown Toronto  Regent Park / Harbourfront
3  M6A          North York        Lawrence Manor / Lawrence Heights
4  M7A          Downtown Toronto  Queen's Park / Ontario Provincial Government


In [7]:
df_toronto.shape

(103, 3)

<b>Add longitude and latitude for the Postal code</b> - Step 2

In [9]:
filename = "http://cocl.us/Geospatial_data"
df_ll = pd.read_csv(filename)

In [10]:
df_toronto_final = pd.merge(df_toronto, df_ll, left_on='Postal code', right_on='Postal Code')
df_toronto_final.head()

   Postal code  Borough           Neighborhood                                  Postal Code  Latitude   Longitude
0  M3A          North York        Parkwoods                                     M3A          43.753259  -79.329656
1  M4A          North York        Victoria Village                              M4A          43.725882  -79.315572
2  M5A          Downtown Toronto  Regent Park / Harbourfront                    M5A          43.65426   -79.360636
3  M6A          North York        Lawrence Manor / Lawrence Heights             M6A          43.718518  -79.464763
4  M7A          Downtown Toronto  Queen's Park / Ontario Provincial Government  M7A          43.662301  -79.389494
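
The merged frame carries both `Postal code` and `Postal Code` because the two key columns differ only in capitalization. A possible way to end up with a single key column is to rename one side before merging, as sketched below on three rows copied from the output above. The variable names `neighborhoods` and `coords` are placeholders, and `validate="one_to_one"` is an optional safety check added here, not part of the original cell.

```python
import pandas as pd

# Three rows copied from the notebook output, split into the two inputs.
neighborhoods = pd.DataFrame({
    "Postal code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village",
                     "Regent Park / Harbourfront"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Latitude": [43.753259, 43.725882, 43.654260],
    "Longitude": [-79.329656, -79.315572, -79.360636],
})

# Align the key name first so the merge keeps only one key column.
merged = neighborhoods.merge(
    coords.rename(columns={"Postal Code": "Postal code"}),
    on="Postal code",
    how="inner",
    validate="one_to_one",  # raises MergeError if a postal code repeats
)

print(merged.columns.tolist())
```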


<b>Explore and cluster the neighborhoods in Toronto</b> - Step 3

Let's keep only the boroughs whose names contain the word "Toronto".

In [11]:
df_toronto_final.query('Borough.str.contains("Toronto")', engine='python',inplace=True)
df_toronto_final.head()

    Postal code  Borough           Neighborhood                                  Postal Code  Latitude   Longitude
2   M5A          Downtown Toronto  Regent Park / Harbourfront                    M5A          43.65426   -79.360636
4   M7A          Downtown Toronto  Queen's Park / Ontario Provincial Government  M7A          43.662301  -79.389494
9   M5B          Downtown Toronto  Garden District, Ryerson                      M5B          43.657162  -79.378937
15  M5C          Downtown Toronto  St. James Town                                M5C          43.651494  -79.375418
19  M4E          East Toronto      The Beaches                                   M4E          43.676357  -79.293031
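
The same filter can also be written as a boolean mask, which avoids `engine='python'` and returns a new frame instead of modifying one in place. This is only a sketch on a few illustrative rows; the `Scarborough` row and its placeholder neighborhood are assumptions added to show a non-matching case.

```python
import pandas as pd

# Illustrative rows; the last one is a placeholder that should be filtered out.
sample = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "East Toronto", "Scarborough"],
    "Neighborhood": ["Parkwoods", "St. James Town", "The Beaches",
                     "Placeholder neighborhood"],
})

# regex=False matches "Toronto" as a literal substring;
# na=False treats missing borough values as non-matching.
mask = sample["Borough"].str.contains("Toronto", regex=False, na=False)
toronto_only = sample[mask].reset_index(drop=True)

print(toronto_only)
```

Resetting the index here is optional; the notebook keeps the original row labels (2, 4, 9, ...) after filtering.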


In [12]:
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

In [13]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
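
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. The sketch below guards against that. The helper `coordinates_or_default` and the `StubGeolocator` class are hypothetical names introduced for illustration; the stub stands in for `Nominatim` so the example runs without network access, and the fallback values are simply the coordinates printed above.

```python
from collections import namedtuple

# Coordinates printed by the cell above, used as an assumed fallback.
TORONTO_FALLBACK = (43.6534817, -79.3839347)


def coordinates_or_default(geolocator, address, default=TORONTO_FALLBACK):
    """Return (latitude, longitude) for address, or default if no match."""
    location = geolocator.geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Hypothetical offline stand-in for geopy.geocoders.Nominatim.
Location = namedtuple("Location", ["latitude", "longitude"])


class StubGeolocator:
    def geocode(self, address):
        known = {"Toronto": Location(43.6534817, -79.3839347)}
        return known.get(address)  # None when the address is unknown


print(coordinates_or_default(StubGeolocator(), "Toronto"))
print(coordinates_or_default(StubGeolocator(), "Nowhere", default=(0.0, 0.0)))
```

In the notebook, the `geolocator` object created above could be passed in place of `StubGeolocator()`.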


In [15]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add one circle marker per neighborhood, labeled "neighborhood, borough".
for lat, lng, borough, neighborhood in zip(
        df_toronto_final['Latitude'], df_toronto_final['Longitude'],
        df_toronto_final['Borough'], df_toronto_final['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
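
The map above is centered on the geocoded city coordinates. An alternative, sketched here under the assumption that centering on the plotted points is preferable, is to derive the center and bounding box from the data itself. The five coordinate pairs are copied from the filtered output shown earlier; `points`, `center`, and `bounds` are illustrative names.

```python
import pandas as pd

# Coordinates of the five filtered rows displayed earlier in the notebook.
points = pd.DataFrame({
    "Latitude": [43.654260, 43.662301, 43.657162, 43.651494, 43.676357],
    "Longitude": [-79.360636, -79.389494, -79.378937, -79.375418, -79.293031],
})

# Mean position of the markers, usable as a map center.
center = [points["Latitude"].mean(), points["Longitude"].mean()]

# South-west and north-east corners of the box enclosing all markers.
bounds = [
    [points["Latitude"].min(), points["Longitude"].min()],
    [points["Latitude"].max(), points["Longitude"].max()],
]

print(center)
print(bounds)
```

With folium, `center` could be passed as `location=` to `folium.Map`, and `bounds` to the map's `fit_bounds` method, so the zoom level follows the markers rather than a fixed `zoom_start`.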