This notebook is part of the Data Science Capstone project.

1) Scrape the postal code table from the Wikipedia page and load it into a DataFrame.

In [1]:
import pandas as pd
import folium
from geopy.geocoders import Nominatim

print("Libraries imported")

Libraries imported


In [2]:
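# Web scraping: pd.read_html returns a list with one DataFrame per HTML table on the page
scrape = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
print("No of tables in the webpage: ", len(scrape))
df = scrape[0]  # the postal code table is the first table on the page
df.head()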

No of tables in the webpage:  3


   Postal Code           Borough               Neighborhood
0          M1A      Not assigned               Not assigned
1          M2A      Not assigned               Not assigned
2          M3A        North York                  Parkwoods
3          M4A        North York           Victoria Village
4          M5A  Downtown Toronto  Regent Park, Harbourfront
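
Indexing `scrape[0]` relies on the postal code table being the first table on the page. A minimal sketch of a more defensive selection, assuming the wanted table is the one that carries all three column names shown above. `pick_table` is a hypothetical helper, not part of pandas, and the two frames below are made-up stand-ins for the scraped list:

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first DataFrame in `tables` that has all the required columns."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"No table has the columns {sorted(required)}")

# Stand-in for the list returned by pd.read_html (illustrative rows only)
tables = [
    pd.DataFrame({"Note": ["navigation box"]}),
    pd.DataFrame({
        "Postal Code": ["M3A"],
        "Borough": ["North York"],
        "Neighborhood": ["Parkwoods"],
    }),
]
picked = pick_table(tables, ["Postal Code", "Borough", "Neighborhood"])
print(picked)
```

In the notebook this would read `df = pick_table(scrape, ["Postal Code", "Borough", "Neighborhood"])`. `pd.read_html` also accepts a `match` argument that keeps only tables containing a given text, which may be enough on its own.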


In [3]:
# Check whether any assigned borough has an unassigned neighborhood

print("No of assigned boroughs with unassigned neighborhood :",
      len(df[(df["Borough"] != "Not assigned") & (df["Neighborhood"] == "Not assigned")]))

# Count the rows with an unassigned borough

print("No of unassigned boroughs :",
      len(df[df["Borough"] == "Not assigned"]))

No of assigned boroughs with unassigned neighborhood : 0
No of unassigned boroughs : 77


In [4]:
# Drop the 77 rows with an unassigned borough, reset the index, and print the remaining row count

can_df = df[df["Borough"] != "Not assigned"].reset_index(drop=True)
print("No of rows of new DataFrame : ", can_df.shape[0])

No of rows of new DataFrame :  103
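
The check and the filter above can be folded into one reusable step. The sketch below assumes the `Not assigned` placeholder seen in the table; `clean_postal_codes` is a hypothetical helper name, and the postal code `M9Z` in the sample is made up. It also covers the case the check ruled out for this snapshot of the page (an assigned borough with an unassigned neighborhood) by falling back to the borough name, which is one possible convention rather than something the data requires.

```python
import pandas as pd

def clean_postal_codes(df, placeholder="Not assigned"):
    """Drop rows with an unassigned borough; fill unassigned neighborhoods with the borough."""
    cleaned = df[df["Borough"] != placeholder].reset_index(drop=True)
    missing = cleaned["Neighborhood"] == placeholder
    cleaned.loc[missing, "Neighborhood"] = cleaned.loc[missing, "Borough"]
    return cleaned

# Illustrative rows only; "M9Z" is a made-up postal code
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})
cleaned = clean_postal_codes(sample)
print(cleaned)
```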


2) Load the geospatial coordinates data and merge it with the neighborhood data.

In [5]:
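# Load the postal code coordinates from a local CSV file (the path is specific to the author's machine)
coordinates = pd.read_csv("C:\\Users\\91814\\Downloads\\Geospatial_Coordinates.csv")
coordinates.head()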

   Postal Code   Latitude   Longitude
0          M1B  43.806686  -79.194353
1          M1C  43.784535  -79.160497
2          M1E  43.763573  -79.188711
3          M1G  43.770992  -79.216917
4          M1H  43.773136  -79.239476


In [6]:
# Left-join the coordinates from the geospatial .csv file onto the neighborhood data by postal code

toronto_df = pd.merge(can_df, coordinates, on="Postal Code", how="left")
toronto_df.head()

   Postal Code           Borough                                 Neighborhood   Latitude   Longitude
0          M3A        North York                                    Parkwoods  43.753259  -79.329656
1          M4A        North York                             Victoria Village  43.725882  -79.315572
2          M5A  Downtown Toronto                    Regent Park, Harbourfront   43.65426  -79.360636
3          M6A        North York             Lawrence Manor, Lawrence Heights  43.718518  -79.464763
4          M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government  43.662301  -79.389494
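
A left join keeps every row of the left frame, so a postal code missing from the CSV would silently end up with `NaN` coordinates. A minimal sketch of how that could be checked, using the `validate` and `indicator` parameters of `pd.merge`. The two small frames are stand-ins, and the postal code `M0X` and its borough are made up to force a mismatch:

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M0X"],  # "M0X" is a made-up code
    "Borough": ["North York", "North York", "Example Borough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = pd.merge(
    neighborhoods, coords,
    on="Postal Code", how="left",
    validate="one_to_one",  # raises MergeError if a postal code repeats on either side
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
)

# Rows that found no partner in the coordinates frame
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print("Postal codes without coordinates:", unmatched)
```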


3) Generate a map and plot the neighborhoods.

In [7]:
# Get the latitude and longitude of Toronto to center the map.
# Note: this rebinds `coordinates`, which held the postal code DataFrame until now.

geolocator = Nominatim(user_agent="loc_finder")
coordinates = geolocator.geocode("Toronto, Ontario")
coordinates

Location(Toronto, Golden Horseshoe, Ontario, M5H 2N2, Canada, (43.6534817, -79.3839347, 0.0))
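
The geocoding call needs network access, and geopy's `geocode` returns `None` when nothing matches the query. As a hedged alternative, the map center could fall back to the mean of the coordinates already in the table. `map_center` is a hypothetical helper; the sample rows are copied from the merged table above, and `SimpleNamespace` only stands in for a geopy `Location` object:

```python
from types import SimpleNamespace

import pandas as pd

def map_center(df, location=None):
    """Return [lat, lon] from a geocoder result, or the mean of df's coordinates."""
    if location is not None:
        return [location.latitude, location.longitude]
    return [float(df["Latitude"].mean()), float(df["Longitude"].mean())]

sample = pd.DataFrame({
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# No geocoder result: fall back to the mean of the table's coordinates
fallback_center = map_center(sample)

# Geocoder result available (values taken from the output above)
fake_location = SimpleNamespace(latitude=43.6534817, longitude=-79.3839347)
geocoded_center = map_center(sample, fake_location)

print(fallback_center, geocoded_center)
```

Either result can be passed as the `location` argument of `folium.Map`.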

In [8]:
# Print an array containing all the unique borough names
# (stored as `boroughs` to avoid shadowing the built-in `list`)

boroughs = toronto_df["Borough"].unique()
print(boroughs)

['North York' 'Downtown Toronto' 'Etobicoke' 'Scarborough' 'East York'
 'York' 'East Toronto' 'West Toronto' 'Central Toronto' 'Mississauga']


In [9]:
map_toronto = folium.Map(location=[coordinates.latitude, coordinates.longitude], zoom_start=10)

# One color per borough, used to plot that borough's neighborhoods on the map

col_series = ["Blue", "Red", "Yellow", "Green", "Black", "Pink", "Purple", "Grey", "Teal", "Maroon", "Brown"]

for i, fil in enumerate(boroughs):
    # Rows belonging to the current borough
    filtered = toronto_df[toronto_df["Borough"] == fil]
    for lat, long, borough, neighborhood in zip(filtered['Latitude'], filtered['Longitude'], filtered['Borough'], filtered['Neighborhood']):
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, long],
            radius=5,
            popup=label,
            color=col_series[i],
            fill=True,
            fill_opacity=0.7).add_to(map_toronto)

map_toronto
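
Indexing `col_series` by position works as long as there are at least as many colors as boroughs. A sketch of an explicit borough-to-color mapping that fails loudly otherwise. `borough_colors` is a hypothetical helper; the borough names and colors are copied from the cells above:

```python
def borough_colors(boroughs, palette):
    """Map each borough name to one palette color, in order."""
    if len(boroughs) > len(palette):
        raise ValueError("Not enough colors for the number of boroughs")
    return dict(zip(boroughs, palette))

palette = ["Blue", "Red", "Yellow", "Green", "Black", "Pink",
           "Purple", "Grey", "Teal", "Maroon", "Brown"]
names = ["North York", "Downtown Toronto", "Etobicoke", "Scarborough", "East York",
         "York", "East Toronto", "West Toronto", "Central Toronto", "Mississauga"]

colors = borough_colors(names, palette)
print(colors["Scarborough"])
```

Inside the plotting loop, `color=colors[borough]` could then replace `color=col_series[i]`, so that a marker's color no longer depends on the loop index.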