# Web Scraping FX Rates

I am using this notebook to refactor my code in order to:

1. show how the way I write code has improved;

2. update the code for the BPI website; and

3. hopefully reuse some of the functions as callables for Airflow's `PythonOperator`.

In [1]:
import os
import requests
import re
from bs4 import BeautifulSoup as bs
import pandas as pd

## Helper Functions

In [2]:
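class FX_WEBPAGE():
    """Look up a bank's FX rates page and parse it on request."""

    def __init__(self, bank):
        websites = {'BPI': "https://www.bpi.com.ph/forex/rates",
                    'Security Bank': "https://www.securitybank.com/"
                                     "personal/investments/market-information/"
                                     "foreign-exchange-rate-forex/"}
        try:
            self.webpage = websites[bank]
        except KeyError:
            # Only an unknown bank name is expected here, so catch just that.
            raise ValueError(f'{bank!r} is currently not supported.') from None

    def parse(self):
        # The GET request is sent here, not in __init__.
        response = requests.get(self.webpage, timeout=30)
        response.raise_for_status()  # fail early on HTTP errors
        return bs(response.text, 'html.parser')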

Scrape some websites to return a pandas DataFrame object

In [3]:
def bpi_dataframe(webpage):
    # Every <td> inside a <tbody>, in document order.
    fx_table_rows = webpage.select('tbody td')

    values = [td.get_text() for td in fx_table_rows]

    # Each table row has three cells: currency, buy rate, sell rate.
    return pd.DataFrame({'ccy': [re.search(r'[A-Z]{3}', text)[0] for text in values[0::3]],
                         'buy': values[1::3],
                         'sell': values[2::3]})
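
The stride slices in `bpi_dataframe` can be tried on their own, without any scraping. The sketch below assumes a flat list with exactly three cells per table row; the sample values are illustrative only.

```python
# Flat list of cell texts, three cells per table row (illustrative values).
values = ["USD", "47.85", "48.35",
          "EUR", "56.134", "59.0951",
          "JPY", "0.4528", "0.4767"]

# A stride of 3 visits the same column in every row;
# the start index (0, 1 or 2) selects which column.
ccy = values[0::3]
buy = values[1::3]
sell = values[2::3]

# Re-pair the columns into rows to check that nothing is misaligned.
rows = list(zip(ccy, buy, sell))
print(rows)
```

If a row ever gains or loses a cell, every later value shifts into the wrong column, so this approach depends on the table layout staying fixed.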

In [4]:
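def sb_dataframe(webpage):
    # The rates table sits in the second 'et_pb_text_inner' container.
    container = webpage.find_all(class_='et_pb_text_inner')[1]

    # Search the parsed container directly; skip its first 3 cells.
    fx_table_rows = container.find_all('td')[3:]

    # Skip 5 more cells before the rate rows start.
    values = [td.get_text() for td in fx_table_rows[5:]]

    # Each row has five cells; currency, buy and sell sit at offsets 0, 2 and 4.
    return pd.DataFrame({'ccy': [re.search(r'[A-Z]{3}', text)[0] for text in values[0::5]],
                         'buy': values[2::5],
                         'sell': values[4::5]})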

## Download from Website

In [5]:
BPI_WEBPAGE = FX_WEBPAGE('BPI').parse()
SB_WEBPAGE = FX_WEBPAGE('Security Bank').parse()

In [6]:
BPI = bpi_dataframe(BPI_WEBPAGE)
SecurityBank = sb_dataframe(SB_WEBPAGE)
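
Because downloading and table conversion are separate steps, the conversion step can be tried on saved HTML without sending any request. The sketch below is a stand-in, not the notebook's code: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, returns plain dictionaries instead of a DataFrame, and the names `CellCollector`, `table_to_records` and `CANNED_HTML` are hypothetical.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text outside <td> elements is ignored.
        if self._in_td:
            self.cells[-1] += data


def table_to_records(html):
    """The 'how': turn already-downloaded HTML into a list of rate records."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    cells = [cell.strip() for cell in collector.cells]
    return [{"ccy": c, "buy": b, "sell": s}
            for c, b, s in zip(cells[0::3], cells[1::3], cells[2::3])]


# Invented HTML standing in for a downloaded page; no request is sent.
CANNED_HTML = """
<table><tbody>
  <tr><td>USD</td><td>47.85</td><td>48.35</td></tr>
  <tr><td>EUR</td><td>56.134</td><td>59.0951</td></tr>
</tbody></table>
"""

records = table_to_records(CANNED_HTML)
print(records)
```

The same idea applies to `bpi_dataframe` and `sb_dataframe`: since they take an already parsed page, they could be exercised against a saved copy of the page rather than the live website.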

In [7]:
BPI

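| | ccy | buy | sell |
|---|---|---|---|
| 0 | USD | 47.85 | 48.35 |
| 1 | EUR | 56.134 | 59.0951 |
| 2 | JPY | 0.4528 | 0.4767 |
| 3 | HKD | 6.0737 | 6.3943 |
| 4 | AUD | 34.7322 | 36.5634 |
| 5 | SGD | 35.1766 | 37.0339 |
| 6 | CAD | 36.2138 | 38.124 |
| 7 | GBP | 62.9453 | 66.264 |
| 8 | CHF | 51.9861 | 54.7267 |
| 9 | CNY | 7.1558 | 7.5336 |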


In [8]:
SecurityBank

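| | ccy | buy | sell |
|---|---|---|---|
| 0 | USD | 47.9 | 48.4 |
| 1 | EUR | 54.7295 | 59.1054 |
| 2 | JPY | 0.4342 | 0.4763 |
| 3 | GBP | 61.007 | 66.2244 |
| 4 | AUD | 33.2941 | 36.5179 |
| 5 | CHF | 50.1777 | 54.7511 |
| 6 | CAD | 35.101 | 38.1029 |
| 7 | HKD | 5.8355 | 6.4318 |
| 8 | SGD | 33.7919 | 37.6063 |
| 9 | KRW | 0.040521 | 0.044913 |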


# Remarks

When I first wrote the code in May 2020, the only problem I was trying to solve was deciding which bank would offer cheaper AUD for my Squarespace subscription. I was writing in an imperative style, and it was my first time using BeautifulSoup, as I hadn't done any web scraping projects before.

Fast forward to today: I have been reading (and am still reading) about how to write code elegantly, that is, code that reads like *well-written prose*. I have familiarized myself with programming paradigms, and I believe I have been influenced by PEP 8, PEP 20, and Clean Code, among others.

In this project in particular, I was able to show:

- **Separation of concerns**: my previous code was one-function-fits-all. Every time one called `get_sb_data()`, it sent a GET request to the website and built a pandas table. Here, the GET request (*the what*) is sent only once, when `parse()` is called on an `FX_WEBPAGE` instance. The parsed page is then passed as an argument to its respective function (*the how*), which converts it to a pandas DataFrame.

- **Better use of iterables**: `for i in range(n)` screams beginner, and I have learned better. In this project, list comprehensions and slices are enough.

- **Regex**: matching three consecutive capital letters is a better way to identify a currency. It also saves me a lot of data cleaning, particularly for non-breaking spaces (`\xa0`), unnecessary spaces (e.g. "USD &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; US Dollar"), and newline characters (`\n`).
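
As a minimal sketch of the regex point, the snippet below pulls a currency code out of messy cell text. The sample strings are invented, `extract_ccy` is a hypothetical helper, and the lookarounds are an addition not present in the notebook's pattern: they reject longer runs of capitals, which the plain three-letter pattern would match at the first three letters.

```python
import re

# Exactly three consecutive capital letters, not part of a longer run.
# (Assumption: the lookarounds are stricter than the notebook's pattern.)
CCY_PATTERN = re.compile(r"(?<![A-Z])[A-Z]{3}(?![A-Z])")


def extract_ccy(cell_text):
    """Return the first three-letter code in cell_text, or None if there is none."""
    match = CCY_PATTERN.search(cell_text)
    return match.group(0) if match else None


# Invented examples of messy cell text.
messy_cells = [
    "USD\xa0\xa0\xa0 US Dollar",   # non-breaking spaces
    "\n  EUR \n Euro",             # newlines and padding
    "Japanese Yen (JPY)",          # code at the end
    "no code here",                # nothing to find
]

codes = [extract_ccy(text) for text in messy_cells]
print(codes)  # ['USD', 'EUR', 'JPY', None]
```

Returning `None` instead of indexing the match directly means a cell without a code does not raise a `TypeError`; whether that is preferable to failing loudly depends on how the scraped data is used.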

Notably, I didn't write any functions that handle dates or save the scraped data. That's for my next project involving a particular stack. ;)
