# Introduction 

The website scraped is the Official Gazette of the Philippines (https://www.officialgazette.gov.ph/section/executive-orders/), the country's repository of legal documents, and in particular its section on executive orders. This site was chosen because legal documents are typically not available through APIs, and the site itself presents this information in a structured manner across different portals. Timely access to jurisprudence is key to several undertakings that aim to democratize access to legal information, as well as to a growing body of research that applies natural-language processing and AI to the legal domain (Dyevre, 2021; Ibarra & Revilla, 2014; Peramo et al., 2021; Virtucio et al., 2018). 

The website was scraped on <insert time and date here>. The site's robots.txt file can be seen here: <insert img>
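The robots.txt rules can also be checked programmatically before scraping. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` and the rules in `sample_rules` are hypothetical and for illustration only, and they are not the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied as a list of lines
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; NOT the site's actual robots.txt
sample_rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
]

target = "https://www.officialgazette.gov.ph/section/executive-orders/"
blocked = "https://www.officialgazette.gov.ph/wp-admin/"

print(is_allowed(sample_rules, target))   # True under the sample rules
print(is_allowed(sample_rules, blocked))  # False under the sample rules
```

To check the live file instead of sample rules, `RobotFileParser.set_url()` followed by `read()` downloads and parses the site's robots.txt before `can_fetch()` is called.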

# Building and Running the Selenium Scraper 


##### Importing libraries 

In [1]:
# importing the necessary libraries 

from selenium import webdriver 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.chrome.options import Options 
from selenium.webdriver.common.by import By 
from time import sleep 
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager 
import pandas as pd 
from bs4 import BeautifulSoup
import requests 

##### Initializing Selenium 

In [3]:
# setting target-page
base_url = "https://www.officialgazette.gov.ph/section/executive-orders/"

# window settings - UNCOMMENT after running the notebook fully
# options = webdriver.ChromeOptions()
# options.binary_location = ""
# options.add_argument("--headless")
# options.add_argument("--start-maximized")
# options.add_argument("--incognito")

# initializing the driver (pass options=options once the settings above are uncommented)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(base_url)
sleep(3) 


##### Scraping the relevant content 

In [20]:
# access the list of individual EO entries 

main_body = driver.find_element(by=By.XPATH, value="/html/body/div[2]/section/main/div/div[1]") # main div that contains each individual EO as a nested element 
eo_articles = main_body.find_elements(by=By.TAG_NAME, value='article') # each article is one EO card 

for eo in eo_articles: 
    entry_title = eo.find_element(by=By.CLASS_NAME, value='entry-title')
    title = entry_title.text

    # selecting the first link in the card; choose one of the two formats 
    link = eo.find_element(by=By.TAG_NAME, value='a')
    url = link.get_attribute('href')  # the link target
    url2 = link.text  # the link's visible text

    # testing the loop logic 
    print(title)
    print(url2)
    print('----') 

# finding the necessary sub-elements 



    

Executive Order No. 5, s. 2022
Executive Order No. 5, s. 2022
----
Executive Order No. 4, s. 2022
Executive Order No. 4, s. 2022
----
Executive Order No. 3, s. 2022
Executive Order No. 3, s. 2022
----
Executive Order No. 2, s. 2022
Executive Order No. 2, s. 2022
----
Executive Order No. 1, s. 2022
Executive Order No. 1, s. 2022
----
Executive Order No. 176, s. 2022
Executive Order No. 176, s. 2022
----
Executive Order No. 175, s. 2022
Executive Order No. 175, s. 2022
----
Executive Order No. 174, s. 2022
Executive Order No. 174, s. 2022
----
Executive Order No. 173, s. 2022
Executive Order No. 173, s. 2022
----
Executive Order No. 172, s. 2022
Executive Order No. 172, s. 2022
----


In [None]:
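# prints the second title; entry_title must be the list returned by find_elements (cell In [8]), not the single element left over from the loop above
print(entry_title[1].text) 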

In [8]:
entry_title = main_body.find_elements(by=By.CLASS_NAME, value='entry-title')


for entry in entry_title: 
    print(entry.text) 


Executive Order No. 5, s. 2022
Executive Order No. 4, s. 2022
Executive Order No. 3, s. 2022
Executive Order No. 2, s. 2022
Executive Order No. 1, s. 2022
Executive Order No. 176, s. 2022
Executive Order No. 175, s. 2022
Executive Order No. 174, s. 2022
Executive Order No. 173, s. 2022
Executive Order No. 172, s. 2022
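
The titles printed above follow a regular pattern ("Executive Order No. N, s. YYYY"), so they could be split into structured fields before being stored. The following is a minimal sketch, assuming every title follows that pattern; `parse_eo_title` and `EO_TITLE` are hypothetical names introduced here for illustration.

```python
import re

# Assumed title pattern, based on the printed output: "Executive Order No. 5, s. 2022"
EO_TITLE = re.compile(r"^Executive Order No\.\s*(\d+),\s*s\.\s*(\d{4})$")


def parse_eo_title(title):
    """Split an EO title into its order number and series year, or return None."""
    match = EO_TITLE.match(title.strip())
    if match is None:
        return None  # title does not follow the assumed pattern
    return {"number": int(match.group(1)), "year": int(match.group(2))}


titles = ["Executive Order No. 5, s. 2022", "Executive Order No. 176, s. 2022"]
records = [parse_eo_title(t) for t in titles]
print(records)
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame` if a tabular view is wanted.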


##### Clicking on the next page 


In [None]:
# there are 10 documents per page, so we can loop over 10 pages to collect our 100 data points 

# find the link to older entries and have the driver click on it after scraping each page's content
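
The pagination plan described in the comments above can be sketched as a loop that is independent of the browser. This is a minimal sketch, not the notebook's implementation: `scrape_pages`, `scrape_page`, and `go_to_next_page` are hypothetical names, and the fake pages below stand in for the real site so that the loop logic can be run without Selenium.

```python
def scrape_pages(scrape_page, go_to_next_page, max_pages=10):
    """Collect results page by page.

    scrape_page: callable returning a list of items for the current page.
    go_to_next_page: callable that advances to the next page and returns
        True, or returns False when there is no further page.
    """
    results = []
    for page_index in range(max_pages):
        results.extend(scrape_page())
        is_last = page_index == max_pages - 1
        # stop after the final requested page, or when no next page exists
        if is_last or not go_to_next_page():
            break
    return results


# Fake data standing in for the site, for illustration only
fake_pages = [["EO 5", "EO 4"], ["EO 3", "EO 2"], ["EO 1"]]
position = {"page": 0}


def fake_scrape():
    return fake_pages[position["page"]]


def fake_next():
    if position["page"] + 1 >= len(fake_pages):
        return False
    position["page"] += 1
    return True


print(scrape_pages(fake_scrape, fake_next, max_pages=2))
```

With Selenium, `scrape_page` could wrap the `find_elements` call used earlier, and `go_to_next_page` could locate the link to older entries, call `.click()` on it, and return `False` when the link is absent. The locator for that link is not shown in this notebook and would need to be confirmed against the page.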

# References 

Dyevre, A. (2021). Text-mining for lawyers: How machine learning techniques can advance our understanding of legal discourse. Erasmus Law Review, 14, 7. https://heinonline.org/HOL/Page?handle=hein.journals/erasmus14&id=9&div=&collection=

Ibarra, V. C., & Revilla, C. D. (2014). Consumers’ awareness on their eight basic rights: A comparative study of Filipinos in the Philippines and Guam (SSRN Scholarly Paper No. 2655817). Social Science Research Network. https://papers.ssrn.com/abstract=2655817

Peramo, E., Cheng, C., & Cordel, M. (2021). Juris2vec: Building word embeddings from Philippine jurisprudence. 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), 121–125. https://doi.org/10.1109/ICAIIC51459.2021.9415251

Virtucio, M. B. L., Aborot, J. A., Abonita, J. K. C., Aviñante, R. S., Copino, R. J. B., Neverida, M. P., Osiana, V. O., Peramo, E. C., Syjuco, J. G., & Tan, G. B. A. (2018). Predicting decisions of the Philippine Supreme Court using natural language processing and machine learning. 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), 02, 130–135. https://doi.org/10.1109/COMPSAC.2018.10348