# Exploring and Clustering Neighborhoods in Toronto

#### For this exercise, the first step is to scrape the postal codes and their corresponding boroughs and neighborhoods from the web.

We will then load the scraped data, in tabular form, into a *pandas* dataframe.

### Installing the required libraries

We will use *BeautifulSoup* for web scraping, so we install the *bs4* (BeautifulSoup) and *requests* packages.

In [1]:
# We use BeautifulSoup for web scraping.
# Install bs4 (BeautifulSoup), requests, and the html5lib parser used below.

!pip install bs4
!pip install requests
!pip install html5lib


### Importing the necessary libraries

In [2]:
from bs4 import BeautifulSoup 
import requests 
import pandas as pd
import numpy as np

### Web scraping

We scrape the Wikipedia page with **BeautifulSoup**.

First, we find out how many tables are present on the page.

In [3]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
data = requests.get(url).text  # raw HTML of the page
soup = BeautifulSoup(data, "html5lib")
#soup.prettify()
tables = soup.find_all('table')  # every <table> element on the page
print(len(tables))

3
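The table search can also be tried offline on a small HTML string. The following is a minimal sketch, not the notebook's own code: `SAMPLE_HTML` and `find_table_index` are hypothetical names introduced for illustration, and the sketch uses the standard library's `html.parser` instead of `html5lib`. Unlike the loop used later in this notebook, which keeps the last matching table, this helper returns the first match and raises an error when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the notebook itself fetches the real page from Wikipedia.
SAMPLE_HTML = """
<html><body>
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><td>M1A Not assigned</td></tr>
  <tr><td>M3A North York</td></tr>
</table>
</body></html>
"""


def find_table_index(html, marker):
    """Return (number of tables, index of the first table whose text contains marker)."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, so html5lib is not required
    tables = soup.find_all("table")
    for index, table in enumerate(tables):
        if marker in table.get_text():
            return len(tables), index
    raise ValueError(f"no table contains {marker!r}")


table_count, table_index = find_table_index(SAMPLE_HTML, "Not assigned")
print(table_count, table_index)
```

Raising an error when no table matches avoids carrying an undefined index into later cells.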


### Data loading and cleaning

We identify the table that contains the neighborhood data by searching for the string *'Not assigned'*.

We then use *find_all* to collect the table cells and load every assigned postal code into *"neighborhood_data"*.

The scraped data is transformed into a pandas dataframe, *"neighborhood_df"*.

Finally, we **clean** the borough names that were run together with extra text during scraping.

In [4]:
# Locate the table that holds the postal code data: the one containing "Not assigned".
for index, table in enumerate(tables):
    if "Not assigned" in str(table):
        table_index = index
#print(table_index)

neighborhood_data = []

# Each <td> cell holds one postal code together with its borough and neighborhoods.
for td in tables[table_index].tbody.find_all("td"):
    text = td.span.text
    if text == 'Not assigned':
        continue  # skip postal codes that have no borough
    record = {
        'PostalCode': td.p.text[:3],
        'Borough': text.split('(')[0],
        # Text inside the first parentheses; ' /' separators become commas.
        'Neighborhood': text.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' '),
    }
    neighborhood_data.append(record)

neighborhood_df = pd.DataFrame(neighborhood_data)

# Some borough names were scraped with extra text attached; map them to clean labels.
#print(neighborhood_df['Borough'].unique())
neighborhood_df['Borough'] = neighborhood_df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})
print("data cleaned.")

data cleaned.
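The chained string calls that build the borough and neighborhood fields are easier to check when pulled into a small helper. The sketch below is one possible way to do that, assuming each cell's text has the form `Borough(Neighborhood / Neighborhood)`, as the parsing code above implies. The name `split_cell_text` is hypothetical and the sample strings are illustrative, not scraped output.

```python
def split_cell_text(text):
    """Split 'Borough(A / B)' into ('Borough', 'A, B').

    Assumes the borough comes first and the neighborhoods follow in parentheses.
    Text without parentheses yields an empty neighborhood string instead of an error.
    """
    borough, _, rest = text.partition("(")
    neighborhoods = rest.rstrip(")").replace(" /", ",").strip()
    return borough.strip(), neighborhoods


# Illustrative inputs in the assumed format
print(split_cell_text("North York(Parkwoods)"))
print(split_cell_text("Downtown Toronto(Regent Park / Harbourfront)"))
```

Using `partition` rather than `split('(')[1]` means a cell without parentheses does not raise an `IndexError`.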


In [5]:
# the dataframe

neighborhood_df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [6]:
# shape of dataframe

neighborhood_df.shape

(103, 3)

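### Adding latitude and longitude data to the neighborhood dataframe

We will use the *Geospatial Coordinates* dataset, a CSV file that lists the latitude and longitude of each postal code.

We load the data from the CSV file and merge it with the neighborhood dataframe on the postal code.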

In [7]:
csv_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv"
geo_data = pd.read_csv(csv_url)

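# Rename the key column so that it matches the neighborhood dataframe.
geo_data = geo_data.rename(columns={'Postal Code': 'PostalCode'})

# Inner join (the pandas default): only postal codes present in both dataframes are kept.
neighborhood_df = pd.merge(neighborhood_df, geo_data, on='PostalCode')

neighborhood_df
#geo_data.head()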

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
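Because `pd.merge` performs an inner join by default, a postal code that is missing from the coordinates file would be dropped without any warning. The sketch below shows one way to surface such rows, using a left join with the `indicator` and `validate` options. The tiny dataframes and the postal code `M9Z` are made up for illustration and are not part of the real data.

```python
import pandas as pd

# Made-up sample data; "M9Z" is a hypothetical code with no coordinates.
left = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Example Borough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})
coords = coords.rename(columns={"Postal Code": "PostalCode"})

# how="left" keeps every neighborhood row; indicator=True adds a "_merge" column;
# validate="one_to_one" raises an error if a postal code repeats on either side.
merged = left.merge(coords, on="PostalCode", how="left",
                    validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code received coordinates, in which case the left join and the default inner join return the same rows.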
