### Importing libraries

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

### I use the Requests library to fetch the Wikipedia page as HTML, parse it with BeautifulSoup, and take the first table on the page

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)
response.raise_for_status()  # fail early if the page could not be fetched
html = response.content

In [3]:
soup = BeautifulSoup(html, 'html.parser')
table=soup.find('table')

### I find the rows of the table and the cells in each row, then store every data row in my dataframe

In [4]:
table_rows = table.find_all('tr')
rows = []

for tr in table_rows:
    td = tr.find_all('td')
    if len(td) == 0:  # rows without <td> cells (e.g. the <th> header row) are skipped
        continue
    rows.append([cell.text.strip() for cell in td])  # strip the trailing '\n' from each cell

df = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])
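The row-and-cell loop can be checked offline on a small hand-written HTML string, so the parsing logic is testable without fetching Wikipedia. This is a sketch: `SAMPLE_HTML` and `table_to_dataframe` are hypothetical names, the sample rows are invented to mimic the page's layout, and `get_text(strip=True)` is used here as an alternative to `.text.replace('\n','')`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up table shaped like the postal-code table:
# header row in <th>, data rows in <td>, cell text ending in a newline.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
    <tr><td>M1A\n</td><td>Not assigned\n</td><td>Not assigned\n</td></tr>
    <tr><td>M3A\n</td><td>North York\n</td><td>Parkwoods\n</td></tr>
  </tbody>
</table>
"""


def table_to_dataframe(html, columns):
    """Parse the first <table> in `html`, skipping rows that have no <td> cells."""
    table = BeautifulSoup(html, 'html.parser').find('table')
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all('td')
        if not cells:  # header row (<th> only): nothing to keep
            continue
        rows.append([td.get_text(strip=True) for td in cells])
    return pd.DataFrame(rows, columns=columns)


df_sample = table_to_dataframe(SAMPLE_HTML, ['PostalCode', 'Borough', 'Neighborhood'])
print(df_sample)
```

Collecting rows in a list and building the DataFrame once is usually faster than assigning `df.loc[i]` inside the loop, since each `.loc` enlargement copies data.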

    

Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [5]:
df=df[df['Borough']!='Not assigned']
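The filter above is a boolean mask: the comparison yields one True/False per row, and indexing with it keeps the True rows. A minimal sketch on invented rows (`df_demo` is a hypothetical name):

```python
import pandas as pd

# Made-up rows in the same three-column layout as the scraped table.
df_demo = pd.DataFrame({
    'PostalCode':   ['M1A', 'M3A', 'M7A'],
    'Borough':      ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Boolean mask: True where the borough is assigned.
assigned = df_demo['Borough'] != 'Not assigned'
df_demo = df_demo[assigned]
print(df_demo['PostalCode'].tolist())  # ['M3A', 'M7A']
```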

More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

In [6]:
df1 = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].agg(', '.join).reset_index()
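To see the grouping step in isolation, here is a sketch on invented rows that reproduces the M5A example from the text (`df_demo` and `combined` are hypothetical names). Grouping on both `PostalCode` and `Borough` keeps the borough column, and `', '.join` concatenates each group's neighborhoods in their original order.

```python
import pandas as pd

# Made-up rows: M5A appears twice, as in the assignment's example.
df_demo = pd.DataFrame({
    'PostalCode':   ['M5A', 'M5A', 'M3A'],
    'Borough':      ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# One row per (PostalCode, Borough); neighborhoods joined with ', '.
combined = (
    df_demo.groupby(['PostalCode', 'Borough'])['Neighborhood']
    .agg(', '.join)
    .reset_index()
)
print(combined)
```

Note that `groupby` sorts the keys by default, so M3A comes before M5A in the result.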

If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

In [7]:
not_assigned = df1['Neighborhood'] == 'Not assigned'
df1.loc[not_assigned, 'Neighborhood'] = df1.loc[not_assigned, 'Borough']  # one .loc call avoids chained assignment
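A sketch of the replacement rule on invented rows, using the Queen's Park case from the text (`df_demo` and `unassigned` are hypothetical names). Writing through a single `.loc[mask, column]` updates the frame itself; the chained form `df['col'][mask] = ...` may write to a temporary copy instead, and under pandas' copy-on-write mode it does not modify the original frame.

```python
import pandas as pd

# Made-up rows: one borough with no assigned neighborhood.
df_demo = pd.DataFrame({
    'PostalCode':   ['M7A', 'M3A'],
    'Borough':      ["Queen's Park", 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods'],
})

# Copy the borough into the neighborhood wherever the neighborhood is unassigned.
unassigned = df_demo['Neighborhood'] == 'Not assigned'
df_demo.loc[unassigned, 'Neighborhood'] = df_demo.loc[unassigned, 'Borough']
print(df_demo['Neighborhood'].tolist())  # ["Queen's Park", 'Parkwoods']
```

An equivalent single expression is `df_demo['Neighborhood'].mask(unassigned, df_demo['Borough'])`, which returns a new Series to assign back.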

In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.

In [8]:
print(f"Number of DataFrame rows: {df1.shape[0]}")

Number of DataFrame rows: 103


# Get Latitude and Longitude from file

In [9]:
latlon=pd.read_csv('Geospatial_Coordinates.csv')

In [10]:
df2=df1.merge(latlon,how='left',left_on='PostalCode',right_on='Postal Code')

In [11]:
df2.drop('Postal Code',inplace=True,axis=1)
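The merge joins on key columns that are spelled differently on each side (`PostalCode` vs `Postal Code`), which is why `left_on`/`right_on` are needed and the right-hand key is dropped afterwards. The sketch below uses two invented frames (`neighborhoods`, `coords`, and the code `M9Z` are hypothetical; the M1B/M1C coordinates are taken from the output below) and adds `validate='one_to_one'` as an optional safety check, which raises a `MergeError` if a postal code is duplicated on either side. With `how='left'`, postal codes missing from the CSV keep their row but get NaN coordinates, so it is worth checking for those before mapping.

```python
import pandas as pd

# Made-up frames: the key column is spelled differently on each side.
neighborhoods = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Unknown'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

merged = neighborhoods.merge(
    coords,
    how='left',             # keep every postal code from the left frame
    left_on='PostalCode',
    right_on='Postal Code',
    validate='one_to_one',  # raise MergeError if a key repeats on either side
).drop(columns='Postal Code')  # the right-hand key column is now redundant

# Rows without a coordinate match end up with NaN latitude/longitude.
missing = merged[merged['Latitude'].isna()]
print(missing['PostalCode'].tolist())  # ['M9Z']
```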

In [12]:
df2

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| ... | ... | ... | ... | ... | ... |
| 98 | M9N | York | Weston | 43.706876 | -79.518188 |
| 99 | M9P | Etobicoke | Westmount | 43.696319 | -79.532242 |
| 100 | M9R | Etobicoke | Kingsview Village, Martin Grove Gardens, Richv... | 43.688905 | -79.554724 |
| 101 | M9V | Etobicoke | Albion Gardens, Beaumond Heights, Humbergate, ... | 43.739416 | -79.588437 |


# Clustering and Mapping

In [25]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium



In [26]:
# Toronto latitude and longitude
latitude = 43.70011
longitude = -79.4163

In [43]:
map_toronto=folium.Map(location=[latitude,longitude],zoom_start=11)

In [44]:

for lat, lon, borough, neighborhood in zip(df2['Latitude'], df2['Longitude'], df2['Borough'], df2['Neighborhood']):
    label = folium.Popup(f'{neighborhood}, {borough}', parse_html=False)
    color = 'red' if 'Toronto' in borough else 'blue'  # outline boroughs whose name contains "Toronto" in red
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

In [45]:
map_toronto

In [46]:
df2Tor=df2[df2['Borough'].str.contains('Toronto')]
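`str.contains` is another boolean mask, this time matching a substring. A sketch on invented rows (`df_demo`, `is_toronto`, and `toronto_only` are hypothetical names); `regex=False` treats 'Toronto' as a literal string, and `na=False` makes missing boroughs count as non-matches instead of producing NaN in the mask.

```python
import pandas as pd

# Made-up rows using borough names that appear in this notebook.
df_demo = pd.DataFrame({
    'PostalCode': ['M4E', 'M1B', 'M5A'],
    'Borough': ['East Toronto', 'Scarborough', 'Downtown Toronto'],
})

# Literal substring match; missing boroughs would be treated as "no match".
is_toronto = df_demo['Borough'].str.contains('Toronto', regex=False, na=False)
toronto_only = df_demo[is_toronto]
print(toronto_only['PostalCode'].tolist())  # ['M4E', 'M5A']
```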

In [47]:
df2Tor.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 37 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 41 | M4K | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 |
| 42 | M4L | East Toronto | The Beaches West, India Bazaar | 43.668999 | -79.315572 |
| 43 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 |
| 44 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |


In [55]:
latitude, longitude = 43.70011, -79.4163
Tor_map = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lon, borough, neighborhood in zip(df2Tor['Latitude'], df2Tor['Longitude'], df2Tor['Borough'], df2Tor['Neighborhood']):
    label = folium.Popup(f'{neighborhood}, {borough}', parse_html=False)
    color = 'red' if 'Toronto' in borough else 'blue'  # every row here contains "Toronto", so all outlines are red
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(Tor_map)

In [57]:
Tor_map