This notebook scrapes the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M to obtain the data in its table of postal codes and transform that data into a pandas dataframe.

First, import the necessary libraries. I will use the pandas and wikipedia libraries to get the postal code data.

In [23]:
import pandas as pd
import wikipedia as wp

Now we will get the HTML page and load its table into a dataframe.

In [25]:
# Get the HTML source and load the first table on the page into a dataframe
html = wp.page("List_of_postal_codes_of_Canada:_M").html()
wikiDF = pd.read_html(html)[0]  # read_html returns a list of tables; take the first

wikiDF = wikiDF.iloc[1:]  # skip the first row, which holds the table headings

# Define column names and assign them to the dataframe
columnNames = ['Postal Code', 'Borough', 'Neighborhood']
wikiDF.columns = columnNames
wikiDF.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


Start cleaning the DataFrame

In [26]:
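# Drop rows where Borough = "Not assigned"
# .copy() gives an independent dataframe, so later column assignments do not trigger SettingWithCopyWarning
wikiDF = wikiDF[wikiDF["Borough"] != "Not assigned"].copy()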
wikiDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Not assigned
11,M9A,Etobicoke,Islington Avenue
12,M1B,Scarborough,Rouge
13,M1B,Scarborough,Malvern


First, let's create a function that sets "Neighborhood" to the value of "Borough" whenever "Neighborhood" is "Not assigned".

In [27]:
#Function to get neighborhood name
def get_Borough_Name(data):
    if data['Neighborhood'] == "Not assigned":
        neighborhood_Name = data['Borough']
    else:
        neighborhood_Name = data['Neighborhood']
        
    return neighborhood_Name

Now let's update the "Not assigned" neighborhoods.

In [28]:
# Set Neighborhood to Borough where Neighborhood is "Not assigned", by applying get_Borough_Name to each row
wikiDF['Neighborhood'] = wikiDF[['Borough', 'Neighborhood']].apply(get_Borough_Name, axis=1)
wikiDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Queen's Park
11,M9A,Etobicoke,Islington Avenue
12,M1B,Scarborough,Rouge
13,M1B,Scarborough,Malvern
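
The row-wise `apply` above works, but the same replacement can also be written as a single vectorized step. The following is a minimal sketch on toy data, not the approach this notebook takes: it uses `Series.where`, which keeps a value where the condition is true and takes the alternative where it is false. The dataframe `df` and the mask name `assigned` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Postal Code": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Not assigned"],
})

# True where the neighborhood already has a real name
assigned = df["Neighborhood"] != "Not assigned"

# Keep assigned neighborhoods; otherwise fall back to the borough name
df["Neighborhood"] = df["Neighborhood"].where(assigned, df["Borough"])

print(df)
```

On a table of this size the difference is mostly readability; the vectorized form avoids calling a Python function once per row.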


More than one neighborhood can exist in one postal code area. Rows that share a postal code will be combined into one row, with the neighborhoods separated by commas.

In [29]:
# Combine neighborhoods that share the same postal code
wikiDF = wikiDF.groupby(['Postal Code','Borough'])['Neighborhood'].apply(', '.join).reset_index()
wikiDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
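
To see what the grouping step does in isolation, here is a minimal sketch on a few toy rows taken from the tables above. It groups on postal code and borough and joins the neighborhood names of each group into one string. The variable names `df` and `combined` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M1B", "M1B", "M7A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto",
                "Scarborough", "Scarborough", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Regent Park",
                     "Rouge", "Malvern", "Queen's Park"],
})

# One row per (postal code, borough); neighborhood names joined with ", "
combined = (
    df.groupby(["Postal Code", "Borough"])["Neighborhood"]
      .agg(", ".join)
      .reset_index()
)

print(combined)
```

By default `groupby` sorts the group keys, which is why the combined table starts at M1B rather than at the first row of the input.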


Now we need to get the latitude and longitude for the postal codes.
The plan was to use "geocoder". However, it was not retrieving coordinates, so the Geospatial_Coordinates.csv file is used instead.

In [30]:
#Read Geospatial Coordinates file into a dataframe
latlngDF = pd.read_csv("http://cocl.us/Geospatial_data/Geospatial_Coordinates.csv")
latlngDF.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now let's join the two dataframes (wikiDF and latlngDF) on "Postal Code".

In [31]:
#Join the wikiDF and latlngDF on Postal Code
mergedDF = wikiDF.join(latlngDF.set_index('Postal Code'), on='Postal Code')
mergedDF.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
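
A join like the one above silently leaves NaN coordinates for any postal code missing from the coordinates file. One way to check for that is sketched below on toy data, using `DataFrame.merge` with `validate` and `indicator`; this is an alternative to the notebook's `join`, not a description of it. The postal code `M9Z` and the borough "Hypothetical" are made up for the example.

```python
import pandas as pd

wiki = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9Z"],  # M9Z is a made-up code with no coordinates
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Left merge keeps every row of wiki; validate raises if a postal code
# is duplicated on either side; indicator adds a "_merge" column
merged = wiki.merge(
    coords, on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)

# Postal codes that found no match in the coordinates table
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

An empty `missing` list means every postal code received coordinates.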


Import folium for plotting maps and geopy to get the coordinates of Toronto

In [32]:
import folium
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [33]:
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
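
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an error. A small guard is sketched below. The helper `resolve_center` and the constant `TORONTO_FALLBACK` are hypothetical names, and the fallback value is simply the pair of coordinates printed above. Stand-in geocoders are used so the sketch runs without network access.

```python
from types import SimpleNamespace

# Coordinates printed earlier in this notebook, used as an assumed fallback
TORONTO_FALLBACK = (43.653963, -79.387207)

def resolve_center(geocode, address="Toronto", fallback=TORONTO_FALLBACK):
    """Return (latitude, longitude) for address, or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

# Stand-in geocoders: one that finds a location, one that does not
found = lambda address: SimpleNamespace(latitude=43.7, longitude=-79.4)
not_found = lambda address: None

center_found = resolve_center(found)
center_missing = resolve_center(not_found)
print(center_found, center_missing)
```

In the notebook itself the call would look like `resolve_center(geolocator.geocode)`.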


In [34]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postalcode, borough, neighborhood in zip(mergedDF['Latitude'], mergedDF['Longitude'], mergedDF['Postal Code'],mergedDF['Borough'], mergedDF['Neighborhood']):
    label = '{}, {}, {}'.format(postalcode, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto
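
The marker loop above mixes data preparation with drawing. As a minimal sketch, the coordinate and label pairs can be built first and inspected without rendering a map. The list name `markers` and the toy dataframe are illustrative; the label format matches the one used in the loop.

```python
import pandas as pd

df = pd.DataFrame({
    "Postal Code": ["M1B"],
    "Borough": ["Scarborough"],
    "Neighborhood": ["Rouge, Malvern"],
    "Latitude": [43.806686],
    "Longitude": [-79.194353],
})

cols = ["Latitude", "Longitude", "Postal Code", "Borough", "Neighborhood"]

# itertuples(index=False, name=None) yields plain tuples in column order
markers = [
    ((lat, lng), "{}, {}, {}".format(code, borough, hood))
    for lat, lng, code, borough, hood in df[cols].itertuples(index=False, name=None)
]

print(markers)
```

Each entry can then be passed to `folium.CircleMarker` as its location and popup text.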