### Part ONE: Prepare the dataframe

_Scrape the table from the website._

In [127]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
res = requests.get(URL).text
soup = BeautifulSoup(res,'lxml')

_Read the table, build the dataframe, discard invalid rows, and clean the data._

In [151]:
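postcodes = []       # first column of the table
boroughs = []        # second column
neighbourhoods = []  # third column
# skip the header row, then read the three cells of every remaining row
for row in soup.find('table', class_='wikitable').find_all('tr')[1:]:
    cells = row.find_all(['th', 'td'])
    try:
        postcode = cells[0].text
        borough = cells[1].text
        neighbourhood = cells[2].text.rstrip()  # last cell ends with a newline
        # a neighbourhood without a name takes the name of its borough
        if neighbourhood == 'Not assigned' and borough != 'Not assigned':
            neighbourhood = borough
        postcodes.append(postcode)
        boroughs.append(borough)
        neighbourhoods.append(neighbourhood)
    except IndexError:
        # row with fewer than three cells: skip it
        pass
dfo = pd.DataFrame(list(zip(postcodes, boroughs, neighbourhoods)),
                   columns=['Postcode', 'Borough', 'Neighbourhood'])
# discard rows whose borough is not assigned
dfn = dfo[dfo['Borough'] != 'Not assigned']
# one row per postcode, neighbourhoods joined with a comma
df0 = dfn.groupby('Postcode', as_index=False).agg({
    'Borough': 'max',
    'Neighbourhood': lambda x: ', '.join(x)
})      # df0 is the dataframe of part one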
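
The cleaning rules in the cell above can be tried out without scraping anything. The sketch below is only an illustration on a few hand-written rows (the toy values and the names `raw`, `assigned`, and `grouped` are assumptions, not part of the notebook), and it uses vectorized pandas operations instead of the row-by-row loop:

```python
import pandas as pd

# Toy rows in the same shape as the scraped table (illustrative values only).
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["Postcode", "Borough", "Neighbourhood"],
)

# Rule 1: drop rows whose borough is not assigned.
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a neighbourhood marked "Not assigned" takes the borough's name.
missing = assigned["Neighbourhood"] == "Not assigned"
assigned.loc[missing, "Neighbourhood"] = assigned.loc[missing, "Borough"]

# Rule 3: one row per postcode, neighbourhoods joined with ", ".
grouped = assigned.groupby("Postcode", as_index=False).agg(
    {"Borough": "first", "Neighbourhood": ", ".join}
)
print(grouped)
```

Here `M1A` disappears, the two `M5A` rows collapse into one, and `M7A` gets its borough name as neighbourhood.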

_Print the shape (rows, columns) of the dataframe._

In [152]:
df0.shape

(103, 3)

### Part TWO: Add latitude and longitude coordinates to the dataframe

In [156]:
url="https://cocl.us/Geospatial_data"
tab=pd.read_csv(url)
tab.columns=['Postcode','Latitude','Longitude']
df= df0.merge(tab, on="Postcode", how = 'inner')
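
An inner merge silently drops any postcode that has no coordinates. A possible way to check for that, sketched on toy data, is a left merge with `indicator=True`. The frames `left` and `coords` and the postcode `M9Z` are hypothetical and exist only for this illustration:

```python
import pandas as pd

# Toy stand-ins for df0 and the coordinates table; "M9Z" is a made-up postcode.
left = pd.DataFrame(
    {"Postcode": ["M1B", "M1C", "M9Z"],
     "Borough": ["Scarborough", "Scarborough", "Hypothetical"]}
)
coords = pd.DataFrame(
    {"Postcode": ["M1B", "M1C"],
     "Latitude": [43.806686, 43.784535],
     "Longitude": [-79.194353, -79.160497]}
)

# A left merge keeps every row of `left`; the `_merge` column says where each row matched.
checked = left.merge(coords, on="Postcode", how="left", indicator=True)

# Postcodes present in `left` but missing from `coords`.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner merge loses no rows.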

In [155]:
df.head()

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


### Part THREE: Explore the data

In [165]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

_Create a map of Toronto._

In [185]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=11)
map_toronto

_Visualize the neighbourhoods on the map of Toronto._

In [186]:
# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

_(From the map above, we can clearly see that the markers are gathered much closer together downtown.)_

_Let's visualize only the neighbourhoods whose borough name contains "Toronto"._

In [187]:
map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=11)
map_toronto

In [188]:
cen=df[df['Borough'].str.contains('oronto')]
# add markers to map
for lat, lng, borough, neighborhood in zip(cen['Latitude'], cen['Longitude'], cen['Borough'], cen['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto
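
The filter `df['Borough'].str.contains('oronto')` matches on a fragment of the word so that the capital letter does not matter. A hedged alternative, shown on a toy series (the borough names other than Scarborough are illustrative assumptions), is to match the whole word case-insensitively:

```python
import pandas as pd

# Toy borough names; only "Scarborough" appears in the output shown earlier.
boroughs = pd.Series(
    ["Downtown Toronto", "East Toronto", "Scarborough", "North York"]
)

# Fragment match, as used in the notebook cell.
by_fragment = boroughs[boroughs.str.contains("oronto")]

# Whole-word match that ignores case; regex=False treats the pattern as plain text.
by_name = boroughs[boroughs.str.contains("toronto", case=False, regex=False)]

print(by_name.tolist())
```

Both selections keep the same two rows here. The second form states the intent more directly.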

### In conclusion, neighbourhoods are much closer to each other in downtown Toronto.
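
The conclusion is read off the map by eye. One way to put a number on it, sketched here under the assumption that the mean nearest-neighbour distance is a reasonable measure of how tightly points are packed, is the haversine distance between postcode centroids. The helper names `haversine_km` and `mean_nearest_neighbour_km` are hypothetical, and the sample points are the five Scarborough rows from the `df.head()` output:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def mean_nearest_neighbour_km(points):
    """Average distance from each point to its closest other point."""
    nearest = [
        min(haversine_km(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)


# Coordinates of the five rows shown in df.head().
scarborough = [
    (43.806686, -79.194353),  # M1B
    (43.784535, -79.160497),  # M1C
    (43.763573, -79.188711),  # M1E
    (43.770992, -79.216917),  # M1G
    (43.773136, -79.239476),  # M1H
]
print(round(mean_nearest_neighbour_km(scarborough), 2))
```

With the full data, the same function could be applied to `list(zip(cen['Latitude'], cen['Longitude']))` and to the remaining rows. A smaller value for the Toronto boroughs would support the conclusion.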