# Obtain info on Toronto neighborhoods

We will use the Beautiful Soup package in Python to scrape the Wikipedia page and obtain the table containing information about neighborhoods in Toronto.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

First, we pull the Wikipedia webpage from the Internet and then use `BeautifulSoup` to parse the HTML code.

In [2]:
response = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.text, 'lxml')

The first table on the webpage contains the desired information, which we extract and store in `my_table`:

In [3]:
my_table = soup.find('table')

We loop through each row of the table and pull out the content of each cell.

In [4]:
Postcode = []
Borough = []
Neighborhood = []

# Skip the header row (index 0), which holds <th> cells rather than <td> cells.
for row in my_table.find_all('tr')[1:]:
    cols = row.find_all('td')
    Postcode.append(cols[0].get_text().strip())
    Borough.append(cols[1].get_text().strip())
    Neighborhood.append(cols[2].get_text().strip())

# Assemble the three columns into a DataFrame.
df = pd.DataFrame({'PostalCode': Postcode, 'Borough': Borough, 'Neighborhood': Neighborhood})
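
As a small offline illustration of the row loop above, the following sketch parses a made-up HTML table with the built-in `html.parser` backend. The HTML string and all of its values are invented for illustration and are not taken from the Wikipedia page. Instead of skipping the header by position, it skips any row that has no `<td>` cells, which is one possible alternative rather than the approach used in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for the Wikipedia table (toy values).
html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>A1A</td><td>Not assigned</td><td> Not assigned </td></tr>
  <tr><td>B2B</td><td>Borough One</td><td> Hood A </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if not cells:
        # Header rows contain <th> cells only, so there is nothing to extract.
        continue
    # Strip surrounding whitespace from the text of each cell.
    rows.append([cell.get_text().strip() for cell in cells])

print(rows)
```

Checking for empty `cells` avoids an `IndexError` if a table contains more than one header row or a spacer row.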

We remove all rows whose borough is `Not assigned`.

In [5]:
df = df[df['Borough'] != 'Not assigned']
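
A minimal sketch of the same boolean-mask filter on made-up rows (toy values, not the scraped data). The trailing `reset_index(drop=True)` is an optional addition that is not part of the cell above; it renumbers the surviving rows from zero.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "PostalCode": ["A1A", "B2B", "C3C"],
    "Borough": ["Not assigned", "Borough One", "Borough Two"],
    "Neighborhood": ["Not assigned", "Hood A", "Hood B"],
})

# True for every row whose borough has a real value.
mask = toy["Borough"] != "Not assigned"

# Keep only those rows and renumber the index from zero.
filtered = toy[mask].reset_index(drop=True)
print(filtered)
```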

We combine all rows that share the same postal code and borough. The neighborhood names are concatenated into a single string, separated by commas.

In [6]:
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(lambda x: ', '.join(x)).reset_index()
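
To make the effect of the grouping concrete, here is a minimal sketch on made-up rows (the postal codes and names are toy values, not the scraped data). It passes `', '.join` directly to `agg`, which should behave like the `lambda` in the cell above.

```python
import pandas as pd

# Made-up rows for illustration only: two rows share the same postal code.
toy = pd.DataFrame({
    "PostalCode": ["A1A", "A1A", "B2B"],
    "Borough": ["Borough One", "Borough One", "Borough Two"],
    "Neighborhood": ["Hood A", "Hood B", "Hood C"],
})

# One output row per (PostalCode, Borough) pair, with names joined by ", ".
combined = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

The two `A1A` rows collapse into a single row whose `Neighborhood` value lists both names.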

We check whether any remaining row has a neighborhood that is not named (i.e., `Not assigned`). Fortunately, there are none:

In [7]:
df[df['Neighborhood'] == 'Not assigned']

Unnamed: 0,PostalCode,Borough,Neighborhood


In [8]:
df.shape

(103, 3)

In [9]:
coords = pd.read_csv("Geospatial_Coordinates.csv")
coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
df = df.merge(coords, how = 'left', left_on='PostalCode', right_on='Postal Code').drop('Postal Code', axis = 1)
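
The left join above keeps every row of `df` even when `coords` has no matching postal code, in which case `Latitude` and `Longitude` come back as `NaN`. One way to surface such rows is the `indicator` parameter of `DataFrame.merge`. The sketch below uses made-up postal codes and coordinates purely for illustration; `left`, `right`, and `unmatched` are hypothetical names.

```python
import pandas as pd

# Made-up data for illustration only: "B2B" has no coordinates on the right side.
left = pd.DataFrame({
    "PostalCode": ["A1A", "B2B"],
    "Borough": ["Borough One", "Borough Two"],
})
right = pd.DataFrame({
    "Postal Code": ["A1A"],
    "Latitude": [1.0],
    "Longitude": [-1.0],
})

# indicator=True adds a "_merge" column recording where each row was found.
merged = left.merge(
    right, how="left", left_on="PostalCode", right_on="Postal Code", indicator=True
).drop("Postal Code", axis=1)

# Postal codes present in `left` but missing from `right`.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code received coordinates.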

In [14]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437
