# Web Scraping the List of Postal Codes of Canada: M

### Riyad Febrian on IBM Data Science Professional Coursera
### Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

#### Import Required Libraries

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

#### Fetch the HTML

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)

In [3]:
soup = BeautifulSoup(response.content, 'html.parser')
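
The request above sets no timeout and does not check the HTTP status, so a failed download would only show up later as an empty table. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html` and the 10-second default timeout are assumptions made for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper).

    The 10-second default timeout is an arbitrary choice for illustration.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.content


# Usage (performs a network request, so it is left commented out):
# html = fetch_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
# soup = BeautifulSoup(html, 'html.parser')
```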

#### Extract HTML tags with BeautifulSoup

In [4]:
table = soup.find_all('table', class_='wikitable sortable')

In [5]:
# initialize lists to hold the table header and the table cell values
table_head = []
table_content = []

In [6]:
# find_all returned a list of matching tables; collect the text of every <th> tag into table_head
for tbl in table:
    head = tbl.find_all('th')
    table_head = [th.text.strip() for th in head]

In [7]:
table_head

['Postal Code', 'Borough', 'Neighborhood']

In [8]:
# collect the text of every <td> tag into table_content (one flat list, row after row)
for tbl in table:
    content = tbl.find_all('td')
    table_content = [td.text.strip() for td in content]

In [9]:
table_content[:6]

['M1A', 'Not assigned', 'Not assigned', 'M2A', 'Not assigned', 'Not assigned']
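
The flat list above only works if every row has exactly three cells. As an alternative sketch, the cells can be collected one `<tr>` at a time so that each row stays together. The HTML below is a hypothetical two-row sample that imitates the page layout, and the names `sample_html`, `sample_table`, `header`, and `rows` are introduced only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical three-column table that mimics the layout of the Wikipedia page.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_table = sample_soup.find('table', class_='wikitable sortable')

# Column names come from the <th> cells.
header = [th.get_text(strip=True) for th in sample_table.find_all('th')]

# Build one list per table row.
rows = []
for tr in sample_table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:  # the header row has no <td> cells, so it is skipped
        rows.append(cells)

print(header)
print(rows)
```

With rows kept intact, a row with a missing cell would show up as a short list rather than silently shifting every later value.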

#### Separate the table content

In [10]:
tb_content_1 = table_content[::3]
tb_content_2 = table_content[1::3]
tb_content_3 = table_content[2::3]
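
The slices above take every third item starting at offsets 0, 1, and 2, which splits the flat list back into its three columns. A small sketch on a hypothetical nine-item list shows the idea and adds a length check, since the slicing only lines up when the cell count is a multiple of three. The names `flat` and `n_cols` are assumptions for this example.

```python
# Hypothetical flat list in the same order as table_content:
# postal code, borough, neighborhood, postal code, borough, ...
flat = ['M1A', 'Not assigned', 'Not assigned',
        'M3A', 'North York', 'Parkwoods',
        'M4A', 'North York', 'Victoria Village']

n_cols = 3
# The stride slicing is only valid if every row contributed exactly n_cols cells.
assert len(flat) % n_cols == 0, 'cell count is not a multiple of the column count'

postal_codes = flat[0::n_cols]   # items 0, 3, 6
boroughs = flat[1::n_cols]       # items 1, 4, 7
neighborhoods = flat[2::n_cols]  # items 2, 5, 8

# Equivalent row-wise view: consecutive chunks of n_cols items.
rows = [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]
```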

#### Create a DataFrame from the lists

In [11]:
df = pd.DataFrame([])
df[table_head[0]] = tb_content_1
df[table_head[1]] = tb_content_2
df[table_head[2]] = tb_content_3
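
The same DataFrame can also be built in a single constructor call instead of assigning the columns one by one. The sketch below uses short hypothetical lists in place of `table_head` and the three column lists; `sample_df` and the other names are assumptions for this example.

```python
import pandas as pd

# Hypothetical sample values standing in for table_head and the three column lists.
columns = ['Postal Code', 'Borough', 'Neighborhood']
codes = ['M1A', 'M3A']
boroughs = ['Not assigned', 'North York']
neighborhoods = ['Not assigned', 'Parkwoods']

# zip pairs each column name with its list of values; the dict keeps that order.
sample_df = pd.DataFrame(dict(zip(columns, [codes, boroughs, neighborhoods])))
print(sample_df)
```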

In [12]:
df.head()

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


#### Remove rows where Borough is 'Not assigned'

In [13]:
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace=True)
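
An equivalent way to express this filter is a boolean mask that keeps the rows to retain. Either approach leaves the original index labels in place, which is why the row numbers shown later have gaps. The sketch below runs on a hypothetical three-row frame (`sample_df`) and shows `reset_index(drop=True)` as one optional way to renumber the rows.

```python
import pandas as pd

# Hypothetical sample frame with the same columns as df.
sample_df = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York'],
    'Neighborhood': ['Not assigned', 'Not assigned', 'Parkwoods'],
})

# Keep only rows whose Borough is assigned; the original index labels survive.
kept = sample_df[sample_df['Borough'] != 'Not assigned']
print(kept.index.tolist())

# Optionally renumber the remaining rows from 0.
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())
```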

#### Fill 'Not assigned' Neighborhood values with the Borough value

In [14]:
# Before filling the Neighborhood values, check whether any 'Not assigned' entries remain
check_na_neighbor = list(df['Neighborhood'] == 'Not assigned')
print(f' Neighborhood with Not Assigned value =  {check_na_neighbor.count(True)}')

 Neighborhood with Not Assigned value =  0
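
The count is zero for this page, so no fill is needed here. If such rows did exist, one possible way to copy the Borough into the Neighborhood column is a `.loc` assignment on a boolean mask. The sketch below uses a hypothetical two-row frame; the second row and its values are invented purely to show the replacement.

```python
import pandas as pd

# Hypothetical data: the second row is invented to demonstrate the fill.
sample_df = pd.DataFrame({
    'Postal Code': ['M3A', 'M0Z'],
    'Borough': ['North York', 'Example Borough'],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# True where the neighborhood is missing.
missing = sample_df['Neighborhood'] == 'Not assigned'
print(f'Rows to fill: {missing.sum()}')

# Copy the borough name into the neighborhood for those rows only.
sample_df.loc[missing, 'Neighborhood'] = sample_df.loc[missing, 'Borough']
print(sample_df)
```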


#### Conclusion: because no values need to change, the final DataFrame looks like this

In [15]:
df.head(10)

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |
| 11 | M3B | North York | Don Mills |
| 12 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 13 | M5B | Downtown Toronto | Garden District, Ryerson |


In [16]:
df.shape

(103, 3)

#### Save as CSV

In [17]:
df.to_csv('webscraping_canada_m.csv')
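
By default `to_csv` also writes the index as a first, unnamed column, which `read_csv` later loads as `Unnamed: 0`. If the index is not needed in the file, passing `index=False` leaves it out. The sketch below compares both variants on a hypothetical two-row frame and writes to in-memory buffers instead of real files; `sample_df`, `cols_with_index`, and `cols_without_index` are names assumed for this example.

```python
import io

import pandas as pd

# Hypothetical frame with a gapped index, similar to df after the drop.
sample_df = pd.DataFrame(
    {'Postal Code': ['M3A', 'M4A'], 'Borough': ['North York', 'North York']},
    index=[2, 3],
)

# Default behavior: the index is written as an unnamed first column.
with_index = io.StringIO()
sample_df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)
```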