Hello fellow peers :) Please be kind while grading. Have a great day!

PART 1: Let's install the required packages. If you have already installed them, please skip this cell.

In [1]:
# "bs4" on PyPI is only a wrapper; the actual package is beautifulsoup4
!pip install beautifulsoup4 lxml geopy folium



Next, let us import all required libraries.

In [2]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup as BS
import lxml
import requests
import folium
from sklearn.cluster import KMeans
import matplotlib
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim 
from pandas import json_normalize  # pandas.io.json.json_normalize was deprecated in pandas 1.0

Now let's fetch the postal-code table from the Wikipedia page, drop the first row (it holds the header text rather than data) and set the column names ourselves.

In [3]:
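from io import StringIO

source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BS(source.content, 'lxml')
table = soup.find_all('table')[0]            # first table on the page
df = pd.read_html(StringIO(str(table)))[0]   # StringIO avoids the pandas warning about passing literal HTML
df = df.iloc[1:]                             # drop the first row, which holds the header text
df.columns = ["PostalCode", "Borough", "Neighbourhood"]
df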
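In the cell above, `pd.read_html` (backed by lxml) turns the table's `<tr>` rows into DataFrame rows. As a rough illustration of what that parsing step produces, here is a minimal sketch using only the standard-library `html.parser` instead of lxml. The HTML snippet and its header text are hypothetical, shaped like the first rows of the Wikipedia table, and the `TableRows` class is an illustrative helper, not part of pandas or BeautifulSoup.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Hypothetical helper: collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while we are inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Ignore the whitespace between tags; keep only text inside cells
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical snippet shaped like the top of the postal-code table
html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html)
header, *body = parser.rows   # first row is the header, the rest is data
print(header)
print(body)
```

Splitting off the first row with `header, *body = ...` mirrors what `df.iloc[1:]` does in the notebook when the header ends up in the first data row.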

,PostalCode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Not assigned
10,M8A,Not assigned,Not assigned


Next, we drop the rows that have no borough assigned.

In [4]:
df = df[df.Borough != 'Not assigned']
df
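The filter works because `df.Borough != 'Not assigned'` produces a boolean Series with one True/False per row, and indexing a DataFrame with it keeps only the True rows. A minimal sketch on a few rows copied from the output above shows this; note that M7A survives even though its neighbourhood is "Not assigned", because only the Borough column is tested, and that the original index labels are kept.

```python
import pandas as pd

# A few rows copied from the output above
df = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M7A", "M8A"],
    "Borough": ["Not assigned", "North York", "Queen's Park", "Not assigned"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned", "Not assigned"],
})

# One boolean per row: True where a borough is assigned
mask = df["Borough"] != "Not assigned"
print(mask.tolist())               # [False, True, True, False]

# Boolean indexing keeps only the rows where the mask is True
df = df[mask]
print(df["PostalCode"].tolist())   # ['M3A', 'M7A']
```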

,PostalCode,Borough,Neighbourhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Not assigned
11,M9A,Etobicoke,Islington Avenue
12,M1B,Scarborough,Rouge
13,M1B,Scarborough,Malvern


Now let us combine rows that share the same PostalCode into a single row, joining their neighbourhoods with a comma.

In [5]:
df = df.groupby('PostalCode').agg({'Borough':'first',
                                   'Neighbourhood':', '.join}).reset_index()
df
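To see what the `groupby(...).agg(...)` call does, here is a small sketch on rows taken from the earlier outputs, where M1B and M5A each appear twice. `'first'` keeps one borough per group (all rows of a postal code share it), while `', '.join` concatenates the neighbourhood strings of each group. The exact column order of the result can differ between pandas versions, so the check below looks columns up by name.

```python
import pandas as pd

# Rows from the outputs above: M1B and M5A each appear twice
df = pd.DataFrame({
    "PostalCode": ["M1B", "M1B", "M5A", "M5A", "M3A"],
    "Borough": ["Scarborough", "Scarborough", "Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Rouge", "Malvern", "Harbourfront", "Regent Park", "Parkwoods"],
})

grouped = (
    df.groupby("PostalCode")
      .agg({"Borough": "first",            # one borough per postal code
            "Neighbourhood": ", ".join})   # concatenate the group's names
      .reset_index()                       # turn PostalCode back into a column
)
print(grouped)
```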

,PostalCode,Neighbourhood,Borough
0,M1B,"Rouge, Malvern",Scarborough
1,M1C,"Highland Creek, Rouge Hill, Port Union",Scarborough
2,M1E,"Guildwood, Morningside, West Hill",Scarborough
3,M1G,Woburn,Scarborough
4,M1H,Cedarbrae,Scarborough
5,M1J,Scarborough Village,Scarborough
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",Scarborough
7,M1L,"Clairlea, Golden Mile, Oakridge",Scarborough
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Scarborough
9,M1N,"Birch Cliff, Cliffside West",Scarborough


Next, every Neighbourhood cell that is 'Not assigned' takes the name of its Borough.

In [6]:
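# Series.replace does not accept a whole Series as the replacement value, so use np.where:
# take the Borough wherever the Neighbourhood is 'Not assigned', otherwise keep the Neighbourhood
df['Neighbourhood'] = np.where(df['Neighbourhood'] == 'Not assigned', df['Borough'], df['Neighbourhood'])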
df
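A pandas-native way to express the same fill step is `Series.mask(cond, other)`: wherever `cond` is True it takes the value from `other` at the same index label, and elsewhere it keeps the original value. A minimal sketch on two rows, assuming the Queen's Park row (M7A) is the one with an unassigned neighbourhood, as in the earlier output:

```python
import pandas as pd

df = pd.DataFrame({
    "PostalCode": ["M7A", "M3A"],
    "Borough": ["Queen's Park", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods"],
})

# Replace 'Not assigned' with the Borough from the same row (aligned on the index)
df["Neighbourhood"] = df["Neighbourhood"].mask(
    df["Neighbourhood"] == "Not assigned", df["Borough"]
)
print(df["Neighbourhood"].tolist())   # ["Queen's Park", 'Parkwoods']
```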


,PostalCode,Neighbourhood,Borough
0,M1B,"Rouge, Malvern",Scarborough
1,M1C,"Highland Creek, Rouge Hill, Port Union",Scarborough
2,M1E,"Guildwood, Morningside, West Hill",Scarborough
3,M1G,Woburn,Scarborough
4,M1H,Cedarbrae,Scarborough
5,M1J,Scarborough Village,Scarborough
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",Scarborough
7,M1L,"Clairlea, Golden Mile, Oakridge",Scarborough
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Scarborough
9,M1N,"Birch Cliff, Cliffside West",Scarborough


As the last step of Part 1, let us print the shape of the dataframe:

In [7]:
df.shape

(103, 3)
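Besides the shape, a few quick checks can confirm the cleaning steps did what they intended. The sketch below uses a hypothetical helper, `check_postal_table`, that raises `ValueError` if a postal code is duplicated, an unassigned borough slipped through, or a neighbourhood is still 'Not assigned'. The two sample rows are based on the rows shown above, in their cleaned form.

```python
import pandas as pd


def check_postal_table(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if the cleaned table breaks an invariant."""
    if not df["PostalCode"].is_unique:
        raise ValueError("each PostalCode should appear only once after grouping")
    if (df["Borough"] == "Not assigned").any():
        raise ValueError("rows without a borough should have been dropped")
    if (df["Neighbourhood"] == "Not assigned").any():
        raise ValueError("'Not assigned' neighbourhoods should have been filled")


# Two rows in the cleaned format
sample = pd.DataFrame({
    "PostalCode": ["M1B", "M7A"],
    "Borough": ["Scarborough", "Queen's Park"],
    "Neighbourhood": ["Rouge, Malvern", "Queen's Park"],
})
check_postal_table(sample)   # passes silently
print(sample.shape)          # (2, 3)
```

On the real data you would call `check_postal_table(df)` right after `df.shape`.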

PART 2: Getting the geographical data as a second dataframe

In [8]:
df2 = pd.read_csv('https://cocl.us/Geospatial_data')    # read_csv already returns a DataFrame
df2 = df2.rename(columns={'Postal Code': 'PostalCode'})  # match the column name used in df
df2

,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Now we merge both dataframes on the PostalCode column.

In [9]:
df_com = pd.merge(df, df2, on='PostalCode', how='outer')
df_com
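With `how='outer'`, a postal code present in only one of the two tables still appears in the result, with NaN in the columns from the other table. Passing `indicator=True` to `pd.merge` adds a `_merge` column ("both", "left_only" or "right_only") that makes such rows easy to spot. A minimal sketch, where M9Z and its borough are made-up values used only to create an unmatched row:

```python
import pandas as pd

postal = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],   # M9Z is hypothetical and has no coordinates
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
coords = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True records where each row came from
merged = pd.merge(postal, coords, on="PostalCode", how="outer", indicator=True)
print(merged[["PostalCode", "_merge"]])

unmatched = merged[merged["_merge"] != "both"]
print(unmatched["PostalCode"].tolist())   # ['M9Z']
```

If both sources cover the same 103 postal codes, `unmatched` is simply empty.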

,PostalCode,Neighbourhood,Borough,Latitude,Longitude
0,M1B,"Rouge, Malvern",Scarborough,43.806686,-79.194353
1,M1C,"Highland Creek, Rouge Hill, Port Union",Scarborough,43.784535,-79.160497
2,M1E,"Guildwood, Morningside, West Hill",Scarborough,43.763573,-79.188711
3,M1G,Woburn,Scarborough,43.770992,-79.216917
4,M1H,Cedarbrae,Scarborough,43.773136,-79.239476
5,M1J,Scarborough Village,Scarborough,43.744734,-79.239476
6,M1K,"East Birchmount Park, Ionview, Kennedy Park",Scarborough,43.727929,-79.262029
7,M1L,"Clairlea, Golden Mile, Oakridge",Scarborough,43.711112,-79.284577
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West",Scarborough,43.716316,-79.239476
9,M1N,"Birch Cliff, Cliffside West",Scarborough,43.692657,-79.264848


PART 3: Retrieving the geographical coordinates of our chosen city, Toronto.

In [10]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The geographical coordinates of " + str(address) + " are: " + str(latitude) + ", " + str(longitude))

The geographical coordinates of Toronto, Ontario are: 43.653963, -79.387207


As a final step, let us draw the map and add a labelled marker for each PostalCode.

In [11]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# One circle marker per postal code; clicking it shows "Neighbourhood, Borough"
for lat, lng, borough, neighbourhood in zip(df_com["Latitude"], df_com["Longitude"], df_com["Borough"], df_com["Neighbourhood"]):
    label = folium.Popup(str(neighbourhood) + ", " + str(borough), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
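Because the merge uses `how='outer'`, any postal code missing from one source would reach the map loop with NaN coordinates, which folium is likely to reject when creating a marker. A defensive option is to drop such rows first and build the popup text from the filtered frame. A minimal sketch, where the M9Z row is a made-up example with missing coordinates:

```python
import pandas as pd

df_com = pd.DataFrame({
    "PostalCode": ["M1B", "M9Z"],            # M9Z is hypothetical
    "Neighbourhood": ["Rouge, Malvern", "Nowhere"],
    "Borough": ["Scarborough", "Hypothetical"],
    "Latitude": [43.806686, float("nan")],
    "Longitude": [-79.194353, float("nan")],
})

# Keep only rows that actually have coordinates before creating markers
plottable = df_com.dropna(subset=["Latitude", "Longitude"])

# Same "Neighbourhood, Borough" text the map loop uses for each popup
labels = (plottable["Neighbourhood"] + ", " + plottable["Borough"]).tolist()
print(labels)   # ['Rouge, Malvern, Scarborough']
```

The map loop can then iterate over `plottable` instead of `df_com`.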