<h4>Import Pandas, Beautiful Soup and Requests libraries</h4> 

In [7]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
print('Libraries imported.')

Libraries imported.


<h4>Get the required data and add it to a dataframe</h4>
Get the page HTML from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, parse it with Beautiful Soup, find the table that holds the postal codes and build a dataframe from its rows.

In [8]:
response = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', timeout=30)
response.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(response.text, 'html.parser')
# The postal code table is the one with class "wikitable sortable"
code_table = soup.find('table', {'class': 'wikitable sortable'})
rows = []
for tr in code_table.find_all('tr'):
    # Keep every <td>, including empty ones, so cells stay aligned with the columns
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:  # the header row contains only <th> cells and is skipped
        rows.append(cells)
df_loc = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighbourhood'])
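Because the live page can change, the row-extraction logic can be tried offline first. The sketch below runs it on a small hand-written HTML string; the table contents and the helper name `table_to_dataframe` are hypothetical, chosen only to mimic the layout the scraping code expects (one `<th>` header row followed by `<td>` data rows).

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML mimicking the expected table layout.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront</td></tr>
</table>
"""


def table_to_dataframe(html, columns):
    """Parse the first 'wikitable sortable' table into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'class': 'wikitable sortable'})
    rows = []
    for tr in table.find_all('tr'):
        # Keep every <td>, even empty ones, so cells line up with the columns.
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # the header row has only <th> cells, so it yields no data
            rows.append(cells)
    return pd.DataFrame(rows, columns=columns)


df = table_to_dataframe(SAMPLE_HTML, ['PostalCode', 'Borough', 'Neighbourhood'])
print(df)
```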

<h4>Clean the dataframe</h4>
Build the dataframe with column names, then remove all rows where the <b>Borough</b> column is <b>Not assigned</b>. Replace <b>Not assigned</b> values in the <b>Neighbourhood</b> column with the corresponding <b>Borough</b> value.

In [9]:
df_loc = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighbourhood'])
# Drop postal codes without a borough; copy() avoids a SettingWithCopyWarning below
df_loc = df_loc[df_loc['Borough'] != 'Not assigned'].copy()

# Where the neighbourhood is not assigned, use the borough name instead
not_assigned = df_loc['Neighbourhood'] == 'Not assigned'
df_loc.loc[not_assigned, 'Neighbourhood'] = df_loc.loc[not_assigned, 'Borough']

# Renumber the rows 0..n-1
df_loc = df_loc.reset_index(drop=True)
df_loc.head(12)

,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
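The two cleaning rules can be checked on a handful of rows. In this sketch the three sample rows are made up, and plain boolean indexing with `.loc` is one way to express "drop unassigned boroughs, then fall back to the borough name for unassigned neighbourhoods".

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the scraped table.
raw = pd.DataFrame(
    [['M1A', 'Not assigned', 'Not assigned'],
     ['M3A', 'North York', 'Parkwoods'],
     ['M7A', "Queen's Park", 'Not assigned']],
    columns=['PostalCode', 'Borough', 'Neighbourhood'],
)

# 1. Drop rows whose borough is not assigned; copy() avoids chained-assignment warnings.
clean = raw[raw['Borough'] != 'Not assigned'].copy()

# 2. Where the neighbourhood is not assigned, fall back to the borough name.
mask = clean['Neighbourhood'] == 'Not assigned'
clean.loc[mask, 'Neighbourhood'] = clean.loc[mask, 'Borough']

# 3. Renumber the remaining rows 0..n-1.
clean = clean.reset_index(drop=True)
print(clean)
```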


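<h4>Group Neighbourhoods by PostalCode</h4>
Combine all neighbourhoods that share a postal code into a single row, joining their names with commas.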

In [10]:
df_loc = df_loc.groupby('PostalCode').agg({'Borough':'min', 'Neighbourhood':', '.join})
df_loc.reset_index(inplace=True)
df_loc.head()

,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
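The grouping step can be illustrated on toy data. The rows below are hypothetical; `'first'` is used for the borough on the assumption that every postal code maps to a single borough, in which case it returns the same value as `'min'`. The check before grouping makes that assumption explicit.

```python
import pandas as pd

# Hypothetical rows: two neighbourhoods share postal code M1B.
df = pd.DataFrame(
    [['M1B', 'Scarborough', 'Rouge'],
     ['M1B', 'Scarborough', 'Malvern'],
     ['M1H', 'Scarborough', 'Cedarbrae']],
    columns=['PostalCode', 'Borough', 'Neighbourhood'],
)

# Assumption check: each postal code belongs to exactly one borough.
if not (df.groupby('PostalCode')['Borough'].nunique() == 1).all():
    raise ValueError('A postal code spans more than one borough')

# Keep the (single) borough and join neighbourhood names with ', '.
grouped = (
    df.groupby('PostalCode', as_index=False)
      .agg({'Borough': 'first', 'Neighbourhood': ', '.join})
)
print(grouped)
```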


In [11]:
df_loc.shape

(103, 3)

<h4>Import Geospatial_data file</h4>
Read the Geospatial_data file, which contains the coordinates of each postal code.

In [12]:
filename = "http://cocl.us/Geospatial_data"
df_crd = pd.read_csv(filename, index_col=0)
df_crd = df_crd.reset_index()
df_crd.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
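Since the file has a `Postal Code` header, reading it with `index_col=0` and then calling `reset_index()` gives the same frame as a plain `read_csv`. The sketch below checks this on two rows copied from the preview above, with `io.StringIO` standing in for the URL.

```python
import io

import pandas as pd

# Two rows copied from the Geospatial_data preview; StringIO replaces the URL.
CSV_TEXT = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""

with_index = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0).reset_index()
plain = pd.read_csv(io.StringIO(CSV_TEXT))

# Both yield the same columns, values and a fresh 0..n-1 index.
pd.testing.assert_frame_equal(with_index, plain)
print(plain)
```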


<h4>Add coordinates to Neighbourhood dataframe</h4>
Look up each postal code's coordinates in the Geospatial_data dataframe and add them to the neighbourhood dataframe.

In [13]:
# Left-join the coordinates on the postal code; the row order of df_loc is preserved
df_loc = df_loc.merge(df_crd, how='left', left_on='PostalCode', right_on='Postal Code')
df_loc = df_loc.drop(columns='Postal Code')  # duplicate of PostalCode
df_loc.head(12)

,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
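A merge can also be made to fail loudly instead of silently producing wrong coordinates, a risk with index-based loops where one row ends up holding another postal code's values. This sketch uses two hypothetical rows (coordinates copied from the Geospatial_data preview); `validate='one_to_one'` and `indicator=True` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Hypothetical inputs shaped like df_loc and df_crd; df_crd is deliberately
# in a different order to show that rows are matched by key, not position.
df_loc = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union'],
})
df_crd = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# validate='one_to_one' raises MergeError if either side repeats a postal code;
# indicator=True adds a '_merge' column recording where each row was found.
merged = df_loc.merge(
    df_crd, how='left', left_on='PostalCode', right_on='Postal Code',
    validate='one_to_one', indicator=True,
)

# Every postal code should have found coordinates.
missing = merged.loc[merged['_merge'] != 'both', 'PostalCode']
if not missing.empty:
    raise ValueError(f'No coordinates for: {list(missing)}')

merged = merged.drop(columns=['Postal Code', '_merge'])
print(merged)
```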
