#### Getting the source code of 'List of postal codes of Canada: M' from Wikipedia

```python
import requests

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
```

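```python
source = requests.get(url, timeout=10)  # fail instead of hanging if the server stops responding
source.raise_for_status()  # raise an HTTPError on 4xx/5xx instead of parsing an error page
```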

Import **BeautifulSoup** (the `bs4` package) for web scraping and parse the downloaded HTML with the `lxml` parser:

```python
import bs4

soup = bs4.BeautifulSoup(source.text, "lxml")  # the "lxml" parser requires the lxml package
```

#### We find all tables in the page with the `find_all` method, then use pandas' `read_html()` function to convert the HTML of the first table (the one we are looking for) into a pandas DataFrame

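```python
from io import StringIO

import pandas as pd

tables = soup.find_all('table')
# pandas >= 2.1 deprecates passing literal HTML to read_html, so wrap it in StringIO;
# read_html returns a list of DataFrames and [0] keeps the first table
df = pd.read_html(StringIO(str(tables)))[0]
df.head(10)
```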

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | Not assigned |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |
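Selecting the table by position (`[0]`) only works while the target table stays first on the page. A minimal sketch of a sturdier selection, assuming the target table carries the CSS class `wikitable` (an assumption about the page's markup, worth checking in the browser's inspector) and using a small hypothetical HTML snippet instead of the live page, picks the table with `soup.find` and builds the DataFrame by hand. pandas' `read_html` also accepts an `attrs` argument for the same kind of filtering.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the Wikipedia page: a navigation table comes first,
# the data table (class "wikitable", assumed) comes second
html = """
<table class="navbox"><tr><td>navigation</td></tr></table>
<table class="wikitable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")  # stdlib parser, so lxml is not needed here

# Pick the table by its class instead of by its position in the page
table = soup.find("table", class_="wikitable")

rows = table.find_all("tr")
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

df_demo = pd.DataFrame(records, columns=header)
print(df_demo)
```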


#### Some rows above have a Borough that is *Not assigned*. We will remove those rows

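```python
# Select the rows whose Borough is 'Not assigned'; .any(axis=0) reports True for a column
# if that selection holds any non-empty value, i.e. if at least one such row exists
df[df['Borough'] == 'Not assigned'].any(axis=0)
```

```
Postal Code      True
Borough          True
Neighbourhood    True
dtype: bool
```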
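The check above answers the question indirectly, by looking for truthy values inside the filtered frame. A sketch of a more direct version, on a small frame built from rows of the scraped table, tests the boolean mask itself and also counts the matches:

```python
import pandas as pd

# Four rows copied from the scraped table above
df_demo = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

mask = df_demo["Borough"] == "Not assigned"  # boolean Series, one entry per row
print(mask.any())  # True if at least one row has an unassigned borough
print(mask.sum())  # number of such rows (each True counts as 1)
```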

#### Let us drop the rows whose Borough is *Not assigned* from our dataframe `df`

```python
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace=True)
df.head(15)
```

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |
| 11 | M3B | North York | Don Mills |
| 12 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 13 | M5B | Downtown Toronto | Garden District, Ryerson |
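Dropping rows by their index labels is equivalent to keeping the complementary rows with a boolean filter. A short sketch on a few rows copied from the scraped table shows that both approaches give the same frame, and that both keep the original row labels:

```python
import pandas as pd

# Four rows copied from the scraped table above
df_demo = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
})

# Approach used above: look up the labels of the unwanted rows, then drop them
dropped = df_demo.drop(df_demo[df_demo["Borough"] == "Not assigned"].index)

# Boolean filter: keep only the rows with an assigned borough (returns a new frame)
filtered = df_demo[df_demo["Borough"] != "Not assigned"]

print(dropped.equals(filtered))  # True
print(filtered.index.tolist())   # [2, 3]: the original labels are kept
```

The filter form chains naturally with the next step, for example `df[df['Borough'] != 'Not assigned'].reset_index(drop=True)`.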


#### The index now has gaps where rows were dropped (for example, 7 and 10 are missing), so let us reset it:

```python
# drop=True discards the old index instead of keeping it as a new column
df.reset_index(drop=True, inplace=True)
df.head(15)
```

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |
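The `drop` argument decides what happens to the old labels. A small sketch on a frame whose index has gaps (labels copied from the filtered output above) compares the two settings:

```python
import pandas as pd

# A frame with gaps in its index, as after dropping rows
df_demo = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9A"]}, index=[2, 3, 8])

kept_old = df_demo.reset_index()         # old labels move into a new 'index' column
clean = df_demo.reset_index(drop=True)   # old labels are discarded

print(kept_old.columns.tolist())  # ['index', 'Postal Code']
print(clean.index.tolist())       # [0, 1, 2]
```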


#### Next, we check whether any row has a borough but a *Not assigned* neighbourhood

```python
# Alternative: count the rows with len(df[df['Neighbourhood'] == 'Not assigned'])
df[df['Neighbourhood'] == 'Not assigned'].any(axis=0)
```

```
Postal Code      False
Borough          False
Neighbourhood    False
dtype: bool
```

There are no such rows, so nothing needs to be done.
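If the check had found such rows, one option (an assumed convention, not something this notebook does) is to reuse the borough name as the neighbourhood. A minimal sketch on a hypothetical two-row frame, where M7A's neighbourhood is set to *Not assigned* purely for illustration:

```python
import pandas as pd

# Hypothetical data: M7A's neighbourhood is replaced with 'Not assigned' for illustration only
df_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M7A"],
    "Borough": ["North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Not assigned"],
})

mask = df_demo["Neighbourhood"] == "Not assigned"
# Assumed convention: fall back to the borough name when no neighbourhood is given
df_demo.loc[mask, "Neighbourhood"] = df_demo.loc[mask, "Borough"]
print(df_demo)
```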

#### Let's find the shape of the dataframe:

```python
print('The shape of our dataframe is {}'.format(df.shape))
```

```
The shape of our dataframe is (103, 3)
```


```python
%pip install geocoder
```

```
Note: you may need to restart the kernel to use updated packages.
```


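#### Load the latitude and longitude of each postal code from `Geospatial_Coordinates.csv`, then merge them into `df` on the `Postal Code` column

```python
latlng_df = pd.read_csv('Geospatial_Coordinates.csv')
```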

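```python
# Both frames name the key column 'Postal Code', so on= is enough;
# assign the result so that later cells can use the coordinates
df = df.merge(latlng_df, on='Postal Code')
df
```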

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
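The default inner merge silently drops any postal code that is missing from the CSV file, which would shrink the frame without warning. A sketch on small frames built from values in the merged output (M5A's coordinates are left out on purpose) uses `how='left'`, `validate` and `indicator` to surface such rows:

```python
import pandas as pd

# Small stand-ins for df and latlng_df; values copied from the merged output above
df_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
latlng_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],  # M5A deliberately missing
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how='left' keeps every postal code from df_demo, even without coordinates;
# validate='one_to_one' raises MergeError if a postal code repeats on either side;
# indicator=True adds a '_merge' column recording where each row was found.
merged = df_demo.merge(latlng_demo, on="Postal Code", how="left",
                       validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)  # ['M5A']
```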
