# Web Scraping using BeautifulSoup

<b><u>Objectives:</u></b>
* Using `requests` to download server-side rendered HTML code 
* Using `BeautifulSoup` to parse HTML code


### <u>Scrape for Latest COE Price</u>

We will extract the latest COE prices from the following website:
* https://www.onemotoring.com.sg/content/onemotoring/home/buying/coe-open-bidding.html

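Before scraping, confirm that the desired data on the webpage is **server-side rendered**, i.e. already present in the HTML sent by the server rather than added later by JavaScript:
* Copy a string of the desired data from the webpage
* Right-click on the webpage and select `View Page Source`
* The string should be found in the HTML source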
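The same check can be done in code: if the copied text appears verbatim in the raw HTML returned by the server, it was rendered server-side. A minimal sketch, with a hypothetical helper `is_server_rendered` and a made-up HTML snippet (not the real OneMotoring page) standing in for a downloaded `resp.text`:

```python
def is_server_rendered(raw_html: str, sample_text: str) -> bool:
    """Return True if sample_text appears verbatim in the raw HTML source."""
    return sample_text in raw_html


# Hypothetical snippet standing in for resp.text from requests.get(URL)
raw_html = """
<html><body>
  <table style="width: 100%;">
    <tr><td>A</td><td>CAR UP TO 1600CC &amp; 97KW</td></tr>
  </table>
</body></html>
"""

print(is_server_rendered(raw_html, "CAR UP TO 1600CC"))         # True
# '&' is escaped as '&amp;' in this source, so the full string is not found
print(is_server_rendered(raw_html, "CAR UP TO 1600CC & 97KW"))  # False
```

HTML source often escapes characters such as `&`, so it is safer to search for a substring without special characters.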

In [2]:
!pip install beautifulsoup4



In [6]:
import bs4
bs4.__version__

'4.9.1'

## Make Soup

Import libraries.

In [1]:
from bs4 import BeautifulSoup
import requests

Use `requests` to send a GET request to the server and download the HTML.
* Check the status code to make sure the request is successful (`200` means OK).

In [12]:
URL = 'https://www.onemotoring.com.sg/content/onemotoring/home/buying/coe-open-bidding.html'

resp = requests.get(URL)
print(resp.status_code)

200
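A slightly more defensive variant of the request above adds a timeout and calls `raise_for_status()`, so that a 4xx or 5xx response raises `requests.HTTPError` instead of letting the notebook carry on with an error page. The helper name `fetch_html` and the 10-second default timeout are illustrative choices, not part of the original notebook:

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download url and return its HTML text; raise on HTTP errors."""
    # Without a timeout, requests can wait indefinitely for a slow server
    resp = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for any 4xx/5xx status code
    resp.raise_for_status()
    return resp.text
```

In the notebook it could be used as `html = fetch_html(URL)`, with `html` then passed to `BeautifulSoup`.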


Make a soup from the HTML code, which is stored in `resp.text`.

In [5]:
# Name the parser explicitly; otherwise BeautifulSoup guesses one and warns
soup = BeautifulSoup(resp.text, 'html.parser')
print(soup.title)
print(soup.title.text)

<title>COE Open Bidding | Buying | One Motoring</title>
COE Open Bidding | Buying | One Motoring


## Inspect HTML Elements

Open the URL in a web browser, right-click on the targeted element in the webpage and select `Inspect` from the context menu.
* This opens the `Elements` panel in **Chrome DevTools**.
* Examine the HTML code. The data are contained in two `<table>` elements with the attribute `style="width: 100%;"`.

Find the two tables using the `find_all()` method.

In [11]:
tables = soup.find_all('table', {'style':"width: 100%;"})
print(len(tables))

2
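BeautifulSoup also accepts CSS selectors through `select()` (backed by the `soupsieve` package, which is installed with beautifulsoup4 4.7 and later). The sketch below, run against a small made-up snippet rather than the live page, shows that the attribute dictionary and the equivalent attribute selector find the same tables:

```python
from bs4 import BeautifulSoup

# Made-up snippet: two matching tables and one that should be ignored
html = """
<table style="width: 100%;"><tr><th>Category</th></tr></table>
<table style="width: 50%;"><tr><th>Other</th></tr></table>
<table style="width: 100%;"><tr><th>Category</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on the style attribute via an attrs dictionary
by_attrs = soup.find_all("table", {"style": "width: 100%;"})
# Equivalent CSS attribute selector
by_css = soup.select('table[style="width: 100%;"]')

print(len(by_attrs), len(by_css))  # 2 2
print(by_attrs == by_css)          # True
```

Both approaches match the `style` value exactly, so either lookup would stop finding the tables if the site changed even the spacing inside that attribute.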


### Extract 1st Table - COE Price

Extract all `<tr>` elements; each one holds a row of the table.

In [47]:
tr_list = tables[0].find_all('tr')
print(len(tr_list))

6


#### Header

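Extract the table header from the first `<tr>`. The header row has only three `<th>` cells while each data row has four `<td>` cells (the category code and its description are separate cells), so insert a `'Description'` name at index 1 to line the header up with the data.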

In [48]:
th_list = tr_list[0].find_all('th')
header = [ th.text for th in th_list ]
print(header)
header.insert(1, 'Description')
print(header)

['Category', 'Quota', 'QP($)']
['Category', 'Description', 'Quota', 'QP($)']


#### Table Data
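Extract the table data from each `<tr>`. The header row has no `<td>` cells and produces an empty list, which the `if row:` check skips.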

In [49]:
data = []
for tr in tr_list:
    td_list = tr.find_all('td')
    row = [ td.text for td in td_list ]
    if row:
        data.append(row)

print(data)

[['A', 'CAR UP TO 1600CC & 97KW', '1035', '37766'], ['B', 'CAR ABOVE 1600CC OR 97KW', '904', '41510'], ['C', 'GOODS VEHICLE & BUS', '354', '26644'], ['D', 'MOTORCYCLE', '496', '7399'], ['E', 'OPEN-ALL EXCEPT MOTORCYCLE', '470', '40790']]
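Every cell comes back as a string. If the values are to be analysed rather than only written to CSV, one option is to pair each row with the header and convert the numeric columns. The sketch below hard-codes two of the rows printed above; treating `Quota` and `QP($)` as integers (possibly with thousands separators) is an assumption based on the values shown:

```python
header = ['Category', 'Description', 'Quota', 'QP($)']
data = [
    ['A', 'CAR UP TO 1600CC & 97KW', '1035', '37766'],
    ['B', 'CAR ABOVE 1600CC OR 97KW', '904', '41510'],
]

NUMERIC_COLUMNS = {'Quota', 'QP($)'}  # assumed to always hold whole numbers

records = []
for row in data:
    # zip() pairs each header name with the cell in the same position
    record = dict(zip(header, row))
    for col in NUMERIC_COLUMNS:
        # Strip possible thousands separators before converting
        record[col] = int(record[col].replace(',', ''))
    records.append(record)

print(records[0])
# {'Category': 'A', 'Description': 'CAR UP TO 1600CC & 97KW', 'Quota': 1035, 'QP($)': 37766}
```

Without the `'Description'` name inserted into the header, `zip()` would silently pair `'Quota'` with the description text, which is why the header and rows must line up.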


Write the header and data to the CSV file `coe_price.csv`.

In [50]:
import csv

with open('coe_price.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(data)

Examine the data in the file `coe_price.csv`.

In [51]:
# Windows only; on macOS/Linux use: !cat coe_price.csv
!notepad coe_price.csv
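A portable alternative to opening the file in an editor is to read it back with the `csv` module. A minimal sketch with a hypothetical helper `show_csv`; the demo writes a temporary sample file so the real `coe_price.csv` is not overwritten:

```python
import csv
import os
import tempfile


def show_csv(path):
    """Print each row of a CSV file as a list of strings."""
    # newline='' lets the csv module handle line endings itself
    with open(path, newline='') as f:
        for row in csv.reader(f):
            print(row)


# Demo on a temporary file so coe_price.csv is left untouched
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows([['Category', 'Quota'], ['A', '1035']])
    show_csv(path)
```

In the notebook, `show_csv('coe_price.csv')` would print the header followed by one list per category.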

## Exercise

### Extract 2nd Table - COE Bids

Extract all `<tr>` elements; each one holds a row of the table.

In [52]:
tr_list = tables[1].find_all('tr')
print(len(tr_list))

6


#### Header

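Extract the table header from the first `<tr>` and, as with the first table, insert `'Description'` at index 1 so the header lines up with the data rows.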

In [53]:
th_list = tr_list[0].find_all('th')
header = [ th.text for th in th_list ]
print(header)
header.insert(1, 'Description')
print(header)

['Category', 'Received', 'Successful', 'Unsuccessful', 'Unused']
['Category', 'Description', 'Received', 'Successful', 'Unsuccessful', 'Unused']


#### Table Data
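Extract the table data from each `<tr>`. The empty list printed first comes from the header row, which has no `<td>` cells and is therefore skipped.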

In [55]:
data = []
for tr in tr_list:
    td_list = tr.find_all('td')
    row = [ td.text for td in td_list ]
    print(row)
    if row:
        data.append(row)

[]
['A', 'CAR UP TO 1600CC & 97KW', '1737', '1035', '702', '0']
['B', 'CAR ABOVE 1600CC OR 97KW', '1715', '892', '823', '12']
['C', 'GOODS VEHICLE & BUS', '525', '350', '175', '4']
['D', 'MOTORCYCLE', '691', '488', '203', '8']
['E', 'OPEN-ALL EXCEPT MOTORCYCLE', '672', '470', '202', '0']


In [56]:
print(data)

[['A', 'CAR UP TO 1600CC & 97KW', '1737', '1035', '702', '0'], ['B', 'CAR ABOVE 1600CC OR 97KW', '1715', '892', '823', '12'], ['C', 'GOODS VEHICLE & BUS', '525', '350', '175', '4'], ['D', 'MOTORCYCLE', '691', '488', '203', '8'], ['E', 'OPEN-ALL EXCEPT MOTORCYCLE', '672', '470', '202', '0']]
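A quick sanity check on scraped numbers can catch parsing mistakes such as shifted columns. Assuming every bid received is either successful or unsuccessful (an assumption about the column meanings, not something stated in this notebook), `Received` should equal `Successful + Unsuccessful` in each row. The sketch copies the rows printed above:

```python
header = ['Category', 'Description', 'Received', 'Successful', 'Unsuccessful', 'Unused']
data = [
    ['A', 'CAR UP TO 1600CC & 97KW', '1737', '1035', '702', '0'],
    ['B', 'CAR ABOVE 1600CC OR 97KW', '1715', '892', '823', '12'],
    ['C', 'GOODS VEHICLE & BUS', '525', '350', '175', '4'],
    ['D', 'MOTORCYCLE', '691', '488', '203', '8'],
    ['E', 'OPEN-ALL EXCEPT MOTORCYCLE', '672', '470', '202', '0'],
]

# Look up column positions by name so the check survives column reordering
i_rec = header.index('Received')
i_ok = header.index('Successful')
i_fail = header.index('Unsuccessful')

# Collect the category codes of rows that break the assumed identity
mismatches = [
    row[0] for row in data
    if int(row[i_rec]) != int(row[i_ok]) + int(row[i_fail])
]
print(mismatches)  # [] means every row is consistent
```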


Write the header and data to the CSV file `coe_bids.csv`.

In [57]:
import csv

with open('coe_bids.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(data)

Examine the data in the file `coe_bids.csv`.

In [58]:
# Windows only; on macOS/Linux use: !cat coe_bids.csv
!notepad coe_bids.csv
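Both tables went through identical steps, so the extraction could be wrapped in one helper. A minimal sketch with a hypothetical function `table_to_rows`, tested on a small made-up table standing in for `tables[0]`. Like the cells above, it assumes the first `<tr>` holds the `<th>` cells and that one header name (`'Description'`) is missing at index 1. It uses `get_text(strip=True)`, which also trims surrounding whitespace, a small deviation from `.text`:

```python
from bs4 import BeautifulSoup


def table_to_rows(table, missing_header='Description'):
    """Return (header, rows) for a <table> Tag.

    Assumes the first <tr> holds the <th> cells and that missing_header
    is absent at position 1, as on the two COE tables above.
    """
    tr_list = table.find_all('tr')
    header = [th.get_text(strip=True) for th in tr_list[0].find_all('th')]
    if missing_header is not None:
        header.insert(1, missing_header)
    rows = []
    for tr in tr_list:
        row = [td.get_text(strip=True) for td in tr.find_all('td')]
        if row:  # the header row has no <td> cells, so it is skipped
            rows.append(row)
    return header, rows


# Made-up table mirroring the structure of the first COE table
html = """
<table style="width: 100%;">
  <tr><th>Category</th><th>Quota</th><th>QP($)</th></tr>
  <tr><td>A</td><td>CAR UP TO 1600CC &amp; 97KW</td><td>1035</td><td>37766</td></tr>
</table>
"""
table = BeautifulSoup(html, 'html.parser').find('table')
header, rows = table_to_rows(table)
print(header)  # ['Category', 'Description', 'Quota', 'QP($)']
print(rows)    # [['A', 'CAR UP TO 1600CC & 97KW', '1035', '37766']]
```

In the notebook, `header, data = table_to_rows(tables[1])` should then produce the same header and rows as the exercise cells, ready for `csv.writer`.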