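import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the Wikipedia page that lists the "M" (Toronto) postal codes
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
# Take the first <table> on the page, which holds the postal codes
table = soup.find_all('table')[0]
# read_html returns a list of DataFrames, one per table found in the HTML
df = pd.read_html(str(table))
# Print the rows as a JSON array of records
print(df[0].to_json(orient='records'))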

First I imported the necessary libraries. I used requests to download the Wikipedia page and BeautifulSoup to find the first table on it, then parsed that table into a DataFrame with pandas' read_html. Finally, I printed the table's rows as a JSON string.
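The last line uses to_json(orient='records'), which returns the table as a JSON string holding one object per row; nothing is written to disk unless a path is passed. A minimal sketch on two rows copied from the scraped table (the sample name is illustrative):

```python
import json

import pandas as pd

# Two rows copied from the scraped table, used only for illustration
sample = pd.DataFrame(
    {
        "Postcode": ["M1A", "M3A"],
        "Borough": ["Not assigned", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    }
)

# With no path argument, to_json returns a string instead of writing a file
records_json = sample.to_json(orient="records")
print(records_json)

# Parsing the string back gives one dict per row
records = json.loads(records_json)
print(records[1]["Neighbourhood"])  # Parkwoods
```

To write a file instead, a call such as sample.to_json('postcodes.json', orient='records') would do it; that filename is hypothetical.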

In [339]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))[0]
Postcode = df["Postcode"].tolist()
Borough = df["Borough"].tolist()
Neighbourhood = df["Neighbourhood"].tolist()



I parsed the original table into a DataFrame and extracted each column as a list. I then zipped the lists into a list of (Postcode, Borough, Neighbourhood) tuples and converted that list into a new DataFrame.
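Zipping the column lists and rebuilding a DataFrame yields the same rows as selecting the columns directly; the only visible change is the rename from Neighbourhood to Neighborhood. A small sketch comparing the two routes on three rows copied from the table (the via_zip and direct names are illustrative):

```python
import pandas as pd

# Illustrative stand-in for the scraped table, using rows from the outputs below
df = pd.DataFrame(
    {
        "Postcode": ["M3A", "M4A", "M5A"],
        "Borough": ["North York", "North York", "Downtown Toronto"],
        "Neighbourhood": ["Parkwoods", "Victoria Village", "Harbourfront"],
    }
)

# Notebook approach: columns -> lists -> zipped tuples -> new DataFrame
zipped = list(
    zip(df["Postcode"].tolist(), df["Borough"].tolist(), df["Neighbourhood"].tolist())
)
via_zip = pd.DataFrame(zipped, columns=["Postcode", "Borough", "Neighborhood"])

# Shortcut: select the columns and rename the one whose spelling changes
direct = df[["Postcode", "Borough", "Neighbourhood"]].rename(
    columns={"Neighbourhood": "Neighborhood"}
)

# Both routes hold the same rows under the same column names
print(via_zip.to_dict("records") == direct.to_dict("records"))  # True
```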

In [340]:
df = pd.read_html(str(table))
df

[    Postcode           Borough  \
 0        M1A      Not assigned   
 1        M2A      Not assigned   
 2        M3A        North York   
 3        M4A        North York   
 4        M5A  Downtown Toronto   
 5        M5A  Downtown Toronto   
 6        M6A        North York   
 7        M6A        North York   
 8        M7A      Queen's Park   
 9        M8A      Not assigned   
 10       M9A         Etobicoke   
 11       M1B       Scarborough   
 12       M1B       Scarborough   
 13       M2B      Not assigned   
 14       M3B        North York   
 15       M4B         East York   
 16       M4B         East York   
 17       M5B  Downtown Toronto   
 18       M5B  Downtown Toronto   
 19       M6B        North York   
 20       M7B      Not assigned   
 21       M8B      Not assigned   
 22       M9B         Etobicoke   
 23       M9B         Etobicoke   
 24       M9B         Etobicoke   
 25       M9B         Etobicoke   
 26       M9B         Etobicoke   
 27       M1C       

In [341]:
zippedList=list(zip(Postcode, Borough, Neighbourhood))
zippedList

[('M1A', 'Not assigned', 'Not assigned'),
 ('M2A', 'Not assigned', 'Not assigned'),
 ('M3A', 'North York', 'Parkwoods'),
 ('M4A', 'North York', 'Victoria Village'),
 ('M5A', 'Downtown Toronto', 'Harbourfront'),
 ('M5A', 'Downtown Toronto', 'Regent Park'),
 ('M6A', 'North York', 'Lawrence Heights'),
 ('M6A', 'North York', 'Lawrence Manor'),
 ('M7A', "Queen's Park", 'Not assigned'),
 ('M8A', 'Not assigned', 'Not assigned'),
 ('M9A', 'Etobicoke', 'Islington Avenue'),
 ('M1B', 'Scarborough', 'Rouge'),
 ('M1B', 'Scarborough', 'Malvern'),
 ('M2B', 'Not assigned', 'Not assigned'),
 ('M3B', 'North York', 'Don Mills North'),
 ('M4B', 'East York', 'Woodbine Gardens'),
 ('M4B', 'East York', 'Parkview Hill'),
 ('M5B', 'Downtown Toronto', 'Ryerson'),
 ('M5B', 'Downtown Toronto', 'Garden District'),
 ('M6B', 'North York', 'Glencairn'),
 ('M7B', 'Not assigned', 'Not assigned'),
 ('M8B', 'Not assigned', 'Not assigned'),
 ('M9B', 'Etobicoke', 'Cloverdale'),
 ('M9B', 'Etobicoke', 'Islington'),
 ('M9B', 

In [342]:
dfObj = pd.DataFrame(zippedList, columns = ['Postcode' , 'Borough', 'Neighborhood']) 

In [343]:
dfObj.head()

,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [344]:
dfObj = dfObj[dfObj.Borough != 'Not assigned']
dfObj.head()

,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [345]:
df1 = dfObj.groupby("Postcode")
df1.head(10)

,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


I grouped the neighborhoods by postcode, joining the neighborhoods that share a postcode into a single comma-separated string. I then added the Borough column back and selected the columns in the order Postcode, Neighborhood, Borough.

In [346]:
df1 = dfObj.groupby('Postcode')['Neighborhood'].apply(lambda x: ', '.join(x.astype(str))).reset_index()
df1


,Postcode,Neighborhood
0,M1B,"Rouge, Malvern"
1,M1C,"Highland Creek, Rouge Hill, Port Union"
2,M1E,"Guildwood, Morningside, West Hill"
3,M1G,Woburn
4,M1H,Cedarbrae
5,M1J,Scarborough Village
6,M1K,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,"Clairlea, Golden Mile, Oakridge"
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,"Birch Cliff, Cliffside West"


In [347]:
df1['Borough']=dfObj['Borough']
df1

,Postcode,Neighborhood,Borough
0,M1B,"Rouge, Malvern",
1,M1C,"Highland Creek, Rouge Hill, Port Union",
2,M1E,"Guildwood, Morningside, West Hill",North York
3,M1G,Woburn,North York
4,M1H,Cedarbrae,Downtown Toronto
5,M1J,Scarborough Village,Downtown Toronto
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",North York
7,M1L,"Clairlea, Golden Mile, Oakridge",North York
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Queen's Park
9,M1N,"Birch Cliff, Cliffside West",
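The Borough values above do not line up with their postcodes: M1B (Rouge, Malvern) is listed under Scarborough in the original table but shows no borough here. The line df1['Borough']=dfObj['Borough'] matches rows by index label, not by postcode. df1 received a fresh 0, 1, 2, ... index from reset_index(), while the filtered dfObj kept its original labels, so labels 0 and 1 (dropped as "Not assigned") become missing values and label 2 (M3A, North York) lands on whatever postcode sits in row 2. A minimal sketch of the effect and of one alternative, grouping on both Postcode and Borough so each borough stays attached to its postcode; the rows are copied from the filtered table earlier and the variable names are illustrative:

```python
import pandas as pd

# Rows copied from the filtered table (Borough != 'Not assigned');
# filtering keeps the original row labels, so the index is not 0..n-1
dfObj = pd.DataFrame(
    {
        "Postcode": ["M3A", "M5A", "M5A", "M1B", "M1B"],
        "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                    "Scarborough", "Scarborough"],
        "Neighborhood": ["Parkwoods", "Harbourfront", "Regent Park", "Rouge", "Malvern"],
    },
    index=[2, 4, 5, 11, 12],
)

# Notebook pattern: group, then assign Borough by index label (misaligned)
misaligned = dfObj.groupby("Postcode")["Neighborhood"].agg(", ".join).reset_index()
misaligned["Borough"] = dfObj["Borough"]
print(misaligned)

# Alternative: group on both keys so each borough travels with its postcode
combined = (
    dfObj.groupby(["Postcode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

In the misaligned frame, M5A ends up labelled North York, while the grouped version keeps Downtown Toronto and needs no separate fill step for the borough.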


In [348]:
df1=df1[["Postcode", "Neighborhood", "Borough"]]
df1

,Postcode,Neighborhood,Borough
0,M1B,"Rouge, Malvern",
1,M1C,"Highland Creek, Rouge Hill, Port Union",
2,M1E,"Guildwood, Morningside, West Hill",North York
3,M1G,Woburn,North York
4,M1H,Cedarbrae,Downtown Toronto
5,M1J,Scarborough Village,Downtown Toronto
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",North York
7,M1L,"Clairlea, Golden Mile, Oakridge",North York
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Queen's Park
9,M1N,"Birch Cliff, Cliffside West",


In [350]:
df1['Borough'] = df1['Borough'].fillna(df1['Neighborhood'])
df1

,Postcode,Neighborhood,Borough
0,M1B,"Rouge, Malvern","Rouge, Malvern"
1,M1C,"Highland Creek, Rouge Hill, Port Union","Highland Creek, Rouge Hill, Port Union"
2,M1E,"Guildwood, Morningside, West Hill",North York
3,M1G,Woburn,North York
4,M1H,Cedarbrae,Downtown Toronto
5,M1J,Scarborough Village,Downtown Toronto
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",North York
7,M1L,"Clairlea, Golden Mile, Oakridge",North York
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Queen's Park
9,M1N,"Birch Cliff, Cliffside West","Birch Cliff, Cliffside West"


I filled each missing Borough value with the Neighborhood value from the same row. Below is the shape of the resulting dataset.

In [251]:
df1.shape

(103, 3)
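fillna with a Series fills each gap from the same index label of the other Series, which here means the Neighborhood in the same row. A minimal sketch of the assignment form, used instead of calling fillna(..., inplace=True) on df1.Borough: chained in-place calls like that are discouraged and, under pandas copy-on-write, may not modify df1 at all. The three rows are copied from the output above:

```python
import numpy as np
import pandas as pd

# Three rows copied from the output above, before the fill
df1 = pd.DataFrame(
    {
        "Postcode": ["M1B", "M1E", "M1N"],
        "Neighborhood": ["Rouge, Malvern", "Guildwood, Morningside, West Hill",
                         "Birch Cliff, Cliffside West"],
        "Borough": [np.nan, "North York", np.nan],
    }
)

# Assign the filled column back; missing entries take the Neighborhood
# value with the same index label (i.e. from the same row)
df1["Borough"] = df1["Borough"].fillna(df1["Neighborhood"])
print(df1)
print(df1["Borough"].isna().sum())  # 0
```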

In [351]:
geo_df = pd.read_csv("http://cocl.us/Geospatial_data")
geo_df

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [328]:
geo_df.shape

(103, 3)

In [352]:
# axis=1 places the two tables side by side, pairing rows by position
df_all_cols = pd.concat([df1, geo_df], axis=1)
df_all_cols

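I combined the two datasets side by side with pd.concat(..., axis=1). This pairs rows by position, so it assumes both tables list the same 103 postal codes in the same order.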
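Because pd.concat(..., axis=1) pairs rows by position, any difference in row order between the two tables would silently attach the wrong coordinates. A minimal sketch of joining on the postal code itself with DataFrame.merge instead; the tiny frames reuse values from the outputs above, with geo_df deliberately shuffled:

```python
import pandas as pd

# Small stand-ins for df1 and geo_df using values from the outputs above;
# geo_df is deliberately in a different row order
df1 = pd.DataFrame(
    {
        "Postcode": ["M1B", "M1C", "M1E"],
        "Neighborhood": ["Rouge, Malvern",
                         "Highland Creek, Rouge Hill, Port Union",
                         "Guildwood, Morningside, West Hill"],
    }
)
geo_df = pd.DataFrame(
    {
        "Postal Code": ["M1E", "M1B", "M1C"],
        "Latitude": [43.763573, 43.806686, 43.784535],
        "Longitude": [-79.188711, -79.194353, -79.160497],
    }
)

# Join on the postal code rather than on row position; a left merge keeps
# df1's row order, and validate="one_to_one" rejects duplicate codes
merged = df1.merge(
    geo_df,
    left_on="Postcode",
    right_on="Postal Code",
    how="left",
    validate="one_to_one",
).drop(columns="Postal Code")
print(merged)
```

With validate='one_to_one', pandas raises a MergeError if a postal code appears more than once on either side, which a positional concat would not detect.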

In [354]:
df_all_cols.drop("Postal Code", axis=1, inplace=True)

In [355]:
df_all_cols

,Postcode,Neighborhood,Borough,Latitude,Longitude
0,M1B,"Rouge, Malvern","Rouge, Malvern",43.806686,-79.194353
1,M1C,"Highland Creek, Rouge Hill, Port Union","Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,"Guildwood, Morningside, West Hill",North York,43.763573,-79.188711
3,M1G,Woburn,North York,43.770992,-79.216917
4,M1H,Cedarbrae,Downtown Toronto,43.773136,-79.239476
5,M1J,Scarborough Village,Downtown Toronto,43.744734,-79.239476
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",North York,43.727929,-79.262029
7,M1L,"Clairlea, Golden Mile, Oakridge",North York,43.711112,-79.284577
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Queen's Park,43.716316,-79.239476
9,M1N,"Birch Cliff, Cliffside West","Birch Cliff, Cliffside West",43.692657,-79.264848


In [None]:
# install folium, used below to visualize Woburn and nearby neighborhoods
!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

I started by visualizing Woburn (red), Weston (green), and Cedarbrae (blue) on a folium map.

In [363]:
# create a map centered on Woburn (postal code M1G)
venues_map = folium.Map(location=[43.770992, -79.216917], zoom_start=10)
# add a red circle marker to represent the postal code M1G
folium.features.CircleMarker(
    [43.770992, -79.216917],
    radius=10,
    color='red',
    popup='M1G',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

venues_map

In [365]:
# add a green circle marker to represent Weston
folium.features.CircleMarker(
    [43.706876, -79.518188],
    radius=10,
    color='green',
    popup='Weston',
    fill = True,
    fill_color = 'green',
    fill_opacity = 0.6
).add_to(venues_map)

venues_map

In [367]:
# add a blue circle marker to represent Cedarbrae
folium.features.CircleMarker(
    [43.773136, -79.239476],
    radius=10,
    color='blue',
    popup='Cedarbrae',
    fill = True,
    fill_color = 'blue',
    fill_opacity = 0.6
).add_to(venues_map)

venues_map

The map shows that Cedarbrae and Woburn are very close to each other, while Weston is much farther away.
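To put numbers on that visual impression, a sketch computing great-circle (haversine) distances between the three marker coordinates used in the map cells. The 6371 km mean Earth radius is a standard approximation, and haversine_km is an illustrative helper, not part of folium:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Marker coordinates used in the map cells above
woburn = (43.770992, -79.216917)
cedarbrae = (43.773136, -79.239476)
weston = (43.706876, -79.518188)

woburn_cedarbrae = haversine_km(*woburn, *cedarbrae)
woburn_weston = haversine_km(*woburn, *weston)
print(f"Woburn to Cedarbrae: {woburn_cedarbrae:.1f} km")
print(f"Woburn to Weston: {woburn_weston:.1f} km")
```

The straight-line distances roughly confirm the map: Cedarbrae lies about two kilometres from Woburn, while Weston is over ten times farther.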