# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto - Part 2

Import the required libraries. Use the **Beautiful Soup** library to scrape the postal codes of Toronto's neighborhoods from the Wikipedia page. See the first part of the assignment for details.

In [44]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [45]:
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
html_doc=requests.get(url).text

soup = BeautifulSoup(html_doc, 'lxml')

#Find the first table on the page, which contains all the postal codes
PC_table=soup.find('table')
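
As a rough illustration of what `soup.find('table')` returns and how its rows can be walked by hand, the sketch below parses a small inline HTML snippet instead of the live page. The snippet, the names `sample_html`, `sample_soup` and `sample_table`, and the use of the built-in `html.parser` are assumptions made so the example runs offline; the real Wikipedia markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real Wikipedia markup).
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_soup.find("table")  # first <table> element in the document

# Collect the text of every header/data cell, one list per table row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in sample_table.find_all("tr")
]
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Walking the rows manually like this is an alternative to `pd.read_html` when finer control over individual cells is needed.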

Import the HTML table into a dataframe with pandas. Transform the dataframe as in the first part of the assignment.

In [52]:
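from io import StringIO

#Parse the table and use its first row as the column header
#(StringIO avoids the literal-HTML deprecation warning in recent pandas versions)
df=pd.read_html(StringIO(str(PC_table)), header=0)[0]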

#Remove all the rows from the dataframe where there is no borough assigned.
df = df.drop(df[df['Borough']=='Not assigned'].index)  

#Find the neighborhoods that do not have an assigned name and assign to them the name of the borough.
#Use .loc with a boolean mask to avoid chained assignment.
unnamed = df['Neighbourhood']=='Not assigned'
df.loc[unnamed, 'Neighbourhood'] = df.loc[unnamed, 'Borough']

#Group the neighborhoods with the same postal code assigned to them. Use "," as separator and reset the index.
df_n=df.groupby(by=['Postcode','Borough']).agg({'Neighbourhood': ','.join}).reset_index()

#Rename the column Postcode to Postal Code
df_n = df_n.rename(columns={'Postcode': 'Postal Code'})

df_n.head()

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
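
To make the three cleaning steps easier to follow without the live Wikipedia page, here is a minimal sketch on a hypothetical toy dataframe. The rows in `toy` and the names `toy`, `missing` and `toy_grouped` are made up for illustration; only the column names mirror the scraped table.

```python
import pandas as pd

# Hypothetical toy rows in the same shape as the scraped table.
toy = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1. Drop rows without a borough.
toy = toy[toy["Borough"] != "Not assigned"].copy()

# 2. Fall back to the borough name where the neighbourhood is missing.
missing = toy["Neighbourhood"] == "Not assigned"
toy.loc[missing, "Neighbourhood"] = toy.loc[missing, "Borough"]

# 3. Collapse rows that share a postal code into one comma-separated row.
toy_grouped = (
    toy.groupby(["Postcode", "Borough"])
       .agg({"Neighbourhood": ",".join})
       .reset_index()
)
print(toy_grouped)
```

The two `M5A` rows end up as a single row, and the `M7A` row takes its borough name as its neighbourhood.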


Display the number of rows in the dataframe.

In [48]:
print('There are {} rows in the dataframe.'.format(df_n.shape[0]))

There are 103 rows in the dataframe.


## Assign geocoordinates to the postal codes

I tried the geocoder library on a couple of postal codes, but unfortunately I did not receive any reply. Ideally I would have built a for loop around the call to obtain the coordinates for each postal code, but without any reply from the geocoder this was of no use, so I have commented out the whole section.

In [51]:
#!pip install geocoder
#import geocoder
#postal_code=df_n['Postal Code'][1]
#print(postal_code)

#lat_lng_coords = None
#while(lat_lng_coords is None):
#    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#    lat_lng_coords = g.latlng
    
#latitude = lat_lng_coords[0]
#longitude = lat_lng_coords[1]
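
The commented-out `while` loop above would spin forever if the geocoder never answers. One possible way to guard against that is to cap the number of attempts. In the sketch below, `fetch_latlng`, `lookup`, `max_attempts` and `delay_seconds` are hypothetical names; `lookup` stands for any callable that returns `[lat, lng]` or `None`, for example a thin wrapper around the `geocoder.google(...).latlng` call used above. The stub and its coordinates are placeholders, not real data.

```python
import time

def fetch_latlng(lookup, postal_code, max_attempts=3, delay_seconds=0.0):
    """Call lookup(postal_code) up to max_attempts times; return None if all fail."""
    for attempt in range(max_attempts):
        coords = lookup(postal_code)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None

# Stub lookup for demonstration only: fails twice, then returns placeholder coordinates.
calls = {"n": 0}
def flaky_lookup(code):
    calls["n"] += 1
    return None if calls["n"] < 3 else [43.0, -79.0]

print(fetch_latlng(flaky_lookup, "M1C"))        # succeeds on the third attempt
print(fetch_latlng(lambda code: None, "M1C"))   # gives up and returns None
```

With a cap in place, a silent geocoder yields `None` for that postal code instead of hanging the notebook.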


As instructed in the assignment description, I have used the CSV file to obtain the geocoordinates.

In [53]:
#read the csv file into a pandas dataframe
df_location=pd.read_csv('http://cocl.us/Geospatial_data')
print(df_location.shape)
df_location.head()

(103, 3)


| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


The location dataframe has the same number of rows as the dataframe containing the neighborhoods of Toronto. Both also share a *Postal Code* column, which uniquely identifies each row in either dataframe. Therefore, use the *Postal Code* column to merge the two dataframes with the pandas `merge` function.

In [54]:
df=pd.merge(df_n, df_location, on='Postal Code')
df.head(10)

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West | 43.692657 | -79.264848 |
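
Because the merge relies on *Postal Code* being unique in both dataframes, it can be worth having pandas check that assumption. The sketch below uses hypothetical toy frames (`left`, `right`), with coordinates copied from the first rows above, together with the `validate` and `indicator` parameters of `pd.merge`. It is one possible safety check, not part of the original assignment.

```python
import pandas as pd

# Toy frames: "M1E" is deliberately missing from the coordinates table.
left = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough"] * 3,
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate="one_to_one" raises MergeError if a postal code repeats on either side;
# indicator=True adds a "_merge" column recording where each row came from.
checked = pd.merge(left, right, on="Postal Code", how="left",
                   validate="one_to_one", indicator=True)

# Postal codes that found no coordinates in the right-hand frame.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

A left join with the indicator column keeps unmatched postal codes visible, whereas the default inner join would drop them silently.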
