Importing libraries, that will be required for the analysis.
Obtaining URL of data for dataset.
Using pandas to transform raw data from URL into  dataframe
Droping rows, where Borough is not assigned

In [25]:
import pandas as pd
import numpy as np
import requests
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

df=pd.read_html(url, header=0)[0]

df = df[df.Borough != "Not assigned"]
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Grouping the Neighbourhoods together on account of the same Postcode they share.
Arrange them in a list

In [26]:
optim = df.groupby('Postcode', axis=0)['Neighbourhood'].apply(list)
optim.head(20)

Postcode
M1B                                     [Rouge, Malvern]
M1C             [Highland Creek, Rouge Hill, Port Union]
M1E                  [Guildwood, Morningside, West Hill]
M1G                                             [Woburn]
M1H                                          [Cedarbrae]
M1J                                [Scarborough Village]
M1K        [East Birchmount Park, Ionview, Kennedy Park]
M1L                    [Clairlea, Golden Mile, Oakridge]
M1M    [Cliffcrest, Cliffside, Scarborough Village West]
M1N                        [Birch Cliff, Cliffside West]
M1P    [Dorset Park, Scarborough Town Centre, Wexford...
M1R                                  [Maryvale, Wexford]
M1S                                          [Agincourt]
M1T            [Clarks Corners, Sullivan, Tam O'Shanter]
M1V    [Agincourt North, L'Amoreaux East, Milliken, S...
M1W                                    [L'Amoreaux West]
M1X                                        [Upper Rouge]
M2H                   

Creating a new dataframe: for every Postcode in the old one, we paste all iterations of Neighbourhood in the optimized dataframe

In [29]:
for i in range(0, len(df)):
    N = df['Neighbourhood'].iloc[i]
    for i2 in range(0, len(optim)):
        if str(N) in optim[i2]:
            df['Neighbourhood'].iloc[i] = optim[i2]
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,[Parkwoods]
3,M4A,North York,[Victoria Village]
4,M5A,Downtown Toronto,"[Harbourfront, Regent Park]"
5,M5A,Downtown Toronto,"[Harbourfront, Regent Park]"
6,M6A,North York,"[Lawrence Heights, Lawrence Manor]"
7,M6A,North York,"[Lawrence Heights, Lawrence Manor]"
8,M7A,Queen's Park,[Not assigned]
10,M9A,Etobicoke,[Islington Avenue]
11,M1B,Scarborough,"[Rouge, Malvern]"
12,M1B,Scarborough,"[Rouge, Malvern]"


Turning the list of entries into the string format
Getting rid of all punctuation, that is not required

In [30]:
df['Neighbourhood'] = df['Neighbourhood'].astype(str).str.strip("]").str.strip("[").str.lstrip("'").str.rstrip("'")
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Harbourfront', 'Regent Park"
5,M5A,Downtown Toronto,"Harbourfront', 'Regent Park"
6,M6A,North York,"Lawrence Heights', 'Lawrence Manor"
7,M6A,North York,"Lawrence Heights', 'Lawrence Manor"
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,"Rouge', 'Malvern"
12,M1B,Scarborough,"Rouge', 'Malvern"


Lets drop the duplicate Postcodes

In [31]:
df = df.drop_duplicates(subset=None, keep='first', inplace=False)
df.head(15)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Harbourfront', 'Regent Park"
6,M6A,North York,"Lawrence Heights', 'Lawrence Manor"
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,"Rouge', 'Malvern"
14,M3B,North York,Don Mills North
15,M4B,East York,"Woodbine Gardens', 'Parkview Hill"
17,M5B,Downtown Toronto,"Ryerson', 'Garden District"


Lets get the shape of the dataframe

In [32]:
df.shape

(105, 3)

Importing CSV
Importing a csv file with geo data 

In [33]:
import csv
url = 'https://cocl.us/Geospatial_data'
glob = pd.read_csv(url)
glob.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Changing Postal Code to Postcode for future merger of dataframes

In [34]:
glob = glob.rename(columns={"Postal Code" : "Postcode"})
glob.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging Neighbourhood dataframe with the Geo data

In [35]:
NDF = pd.merge(df, glob, on="Postcode")
NDF.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront', 'Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights', 'Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Not assigned,43.662301,-79.389494


Installing Folium for mapping
Creating markers hor neighbourhoods
Displaying map

In [36]:
!pip install folium
import folium
Latt = 43.654260
Long = -79.360636
Mmap = folium.Map(location=[Latt, Long], zoom_start=15)


for lat, lng, borough, neighbourhood in zip(NDF['Latitude'], NDF['Longitude'], NDF['Borough'], NDF['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=9, popup=label, color='red', fill=True).add_to(Mmap)

Mmap

