## Web Scraping Healthcare Professionals' Information (with Python & Selenium)

### Table of Contents
- [Section 1 - Experimentation](#section1)  
- [Section 2 - Running Full Script](#section2)  
- [Section 3 - References](#section3)  
 ___

<a name="section1"></a>
## Section 1 - Experimentation

### (i) Initial attempt
A direct request to the Search Results page does not work: the server returns an error page instead of the results.

In [24]:
import re
import time
import urllib.request  # 'import urllib' alone does not reliably expose urllib.request

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

In [25]:
main_page = "https://prs.moh.gov.sg/prs/internet/profSearch/showSearchSummaryByName.action"
main_page_content = urllib.request.urlopen(main_page)
main_page_html = BeautifulSoup(main_page_content, 'html.parser')

In [26]:
main_page_html


<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<script src="https://assets.wogaa.sg/scripts/wogaa.js"></script>
<link href="/prs/css/site.css" rel="stylesheet" type="text/css"/>
<script src="/prs/scripts/iextend.js"></script>
</head>
<body class="margined" onload="setWindowTitleJS('PRS Error');">
<div id="main">
<div class="main-holder">
<div id="content">
<div class="content-t"> </div>
<div class="content-c">
<div class="frame"></div>
<div class="content-entry">
<div class="entry-holder">
<div class="article" style="width: 80%; min-height: 350px;">
<form>
<div class="table_title"></div>
<p align="center">
<span> The system encountered an error processing your request at 03/11/2020 09:46:45. Please email <br/> us at <a href="mailto:prs_helpdesk@ncs.com.sg">prs_helpdesk@ncs.com.sg</a> to report the problem.</span>
</p>
</form>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>
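
The response above is an error page rather than the search results. A minimal sketch of how such a response could be detected programmatically is shown below. `SAMPLE_ERROR_HTML` is a trimmed copy of the output above, and `looks_like_prs_error` is a hypothetical helper name, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the response shown above (only the parts the check relies on)
SAMPLE_ERROR_HTML = """
<html>
<body class="margined" onload="setWindowTitleJS('PRS Error');">
<p align="center"><span>The system encountered an error processing your request.</span></p>
</body>
</html>
"""


def looks_like_prs_error(html):
    """Return True when the HTML matches the error page shown above."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    # The error page sets its window title through the body's onload attribute
    onload = body.get("onload", "") if body else ""
    return "PRS Error" in onload or "encountered an error" in soup.get_text()


print(looks_like_prs_error(SAMPLE_ERROR_HTML))
```

A check like this makes the failure explicit instead of relying on reading the raw HTML dump.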

___

### (ii) Further Exploration
Working on the original homepage instead

In [27]:
home_page = "https://prs.moh.gov.sg/prs/internet/profSearch/main.action?hpe=SPC"
home_page_content = urllib.request.urlopen(home_page)
home_page_html = BeautifulSoup(home_page_content, 'lxml')

In [28]:
home_page_html

<html>
<head>
<title>Professional Registration System</title>
<script src="https://assets.wogaa.sg/scripts/wogaa.js"></script>
<meta content="IE=9" http-equiv="X-UA-Compatible" id="metaCompatible"/><!-- added for PRS-7074 on Apr23,2014  -->
<link href="/prs/css/site.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="noindex, nofollow" name="robots"/>
</head>
<script language="Javascript1.1" type="text/javascript">
    var today = new Date();
    var expired = new Date(today.getTime() - 48 * 60 * 60 * 1000); // less 2 days
    var bikky = document.cookie;
	var isInSameFrame = true;

    function deleteCookie(attribute) {
        document.cookie = attribute + "=null; path=/; expires="
                + expired.toGMTString();
        bikky = document.cookie;
    }

    function dode() {
        deleteCookie("AA_JSessionInfo_Cookie_IP");
        deleteCookie("AA_JSessionInfo_Cookie");
    }
</script>
<style> html{disp

This is the frame we will be looking at:  

`<frame name="msg_main" noresize="" scrolling="auto" src="/prs/internet/profSearch/showSearchSummaryByName.action"/>`
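
Rather than reading through the HTML by eye, the frames can also be listed with BeautifulSoup. This is only a sketch: the `<frameset>` wrapper in `SAMPLE_FRAMESET_HTML` is an assumption (the output above is truncated before that point), and `list_frames` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Assumed structure: only the <frame> tag itself is taken from the page above
SAMPLE_FRAMESET_HTML = """
<html><frameset rows="*">
<frame name="msg_main" noresize="" scrolling="auto"
       src="/prs/internet/profSearch/showSearchSummaryByName.action"/>
</frameset></html>
"""


def list_frames(html):
    """Return (name, src) pairs for every frame or iframe in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [(f.get("name"), f.get("src")) for f in soup.find_all(["frame", "iframe"])]


for name, src in list_frames(SAMPLE_FRAMESET_HTML):
    print(name, src)
```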

#### We now need to switch to this `msg_main` frame, using the Chrome WebDriver

In [29]:
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
driver = webdriver.Chrome(options=options) # Initiate webdriver

In [30]:
driver.get(home_page)
driver.switch_to.frame(driver.find_element(By.NAME, 'msg_main')) # find_element_by_* helpers were removed in Selenium 4.3+

In [31]:
frame_html = BeautifulSoup(driver.page_source, 'lxml')

In [32]:
frame_html

<html lang="en" style="height:100%" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Professionals Search</title>
<meta content="IE=9" http-equiv="X-UA-Compatible"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<meta content="noindex, nofollow" name="robots"/>
<script async="" src="https://assets.wogaa.sg/snowplow/2.14.0/sp.js"></script><script src="https://assets.wogaa.sg/scripts/wogaa.js"></script><script async="" src="https://assets.wogaa.sg/scripts/wogaa.js?url=https%3A%2F%2Fprs.moh.gov.sg%2Fprs%2Finternet%2FprofSearch%2FshowSearchSummaryByName.action"></script>
<link href="/prs/css/internet/spc/header.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/prs/css/internet/public-header.css" media="screen" rel="stylesheet" type="text/css"/>
<!--[if lt IE 8]><link rel="stylesheet" type="text

The Search button is defined by `<input type="button" value="Search" name="btnSearch" onclick="resubmit();">`

In [33]:
search_button = driver.find_elements(By.XPATH, "//input[@name='btnSearch']")[0]
search_button.click()

In [34]:
results_html = BeautifulSoup(driver.page_source, 'lxml')

In [35]:
results_html

<html lang="en" style="height:100%" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Professionals Search</title>
<meta content="IE=9" http-equiv="X-UA-Compatible"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<meta content="noindex, nofollow" name="robots"/>
<script async="" src="https://assets.wogaa.sg/snowplow/2.14.0/sp.js"></script><script src="https://assets.wogaa.sg/scripts/wogaa.js"></script><script async="" src="https://assets.wogaa.sg/scripts/wogaa.js?url=https%3A%2F%2Fprs.moh.gov.sg%2Fprs%2Finternet%2FprofSearch%2FgetSearchSummaryByName.action"></script>
<link href="/prs/css/internet/spc/header.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/prs/css/internet/public-header.css" media="screen" rel="stylesheet" type="text/css"/>
<!--[if lt IE 8]><link rel="stylesheet" type="text/

At this stage, the first page of the search results has loaded. Because the Search button was clicked directly (i.e. with no search criteria), the results cover all the pharmacist records (3396 records as at 26 Oct 2020).

### (iii) Testing on Single Record (Pharmacist)

In [36]:
# Get list of pharmacist PRN IDs on current page
results_html_text = results_html.get_text() # Convert HTML to text so that we can run regex
prn_id_list = re.findall("P[0-9]{5}[A-Z]{1}", results_html_text)

In [37]:
prn_id_list

['P02392B',
 'P03376F',
 'P03995J',
 'P03945D',
 'P04296Z',
 'P03956Z',
 'P01062F',
 'P04294C',
 'P04307I',
 'P03890C']
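
The same pattern can be wrapped in a small helper. The sketch below adds word boundaries and drops repeated IDs while keeping their order, which matters once the regex is run on raw HTML (where an ID can appear more than once). `extract_prn_ids` and the sample text are illustrative assumptions.

```python
import re

# One 'P', five digits, one uppercase letter, bounded so longer tokens are not matched
PRN_PATTERN = re.compile(r"\bP[0-9]{5}[A-Z]\b")


def extract_prn_ids(text):
    """Return the PRN IDs found in text, de-duplicated in order of first appearance."""
    return list(dict.fromkeys(PRN_PATTERN.findall(text)))


sample_text = "AARON CHEW SONG TA (P02392B) viewDetails('P02392B') AARON JASON MARTIN (P03376F)"
print(extract_prn_ids(sample_text))
```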

In [38]:
# Try out with one record first
prn_id_single_list = ['P02392B']

In [39]:
for index, prn_id in enumerate(prn_id_single_list):
    driver.find_element(By.XPATH, f"//a[contains(@onclick,'{prn_id}')]").click()
    pcist_xml = BeautifulSoup(driver.page_source, 'lxml')
    pcist_name = driver.find_element(By.XPATH, "//div[@class='table-head']").text
    
#   test_text = driver.find_element(By.XPATH, "//td[@class='no-border table-title']/following-sibling::td").text # This is to find the next sibling
    all_fields = driver.find_elements(By.XPATH, "//td[@class='no-border table-data']") # Using find_elements since there are multiple elements

In [40]:
pcist_xml

<html lang="en" style="height:100%" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Professionals Search</title>
<meta content="IE=9" http-equiv="X-UA-Compatible"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<meta content="noindex, nofollow" name="robots"/>
<script async="" src="https://assets.wogaa.sg/snowplow/2.14.0/sp.js"></script><script src="https://assets.wogaa.sg/scripts/wogaa.js"></script><script async="" src="https://assets.wogaa.sg/scripts/wogaa.js?url=https%3A%2F%2Fprs.moh.gov.sg%2Fprs%2Finternet%2FprofSearch%2FgetSearchDetails.action"></script>
<link href="/prs/css/internet/spc/header.css" media="screen" rel="stylesheet" type="text/css"/>
<link href="/prs/css/internet/public-header.css" media="screen" rel="stylesheet" type="text/css"/>
<!--[if lt IE 8]><link rel="stylesheet" type="text/css" h

In [41]:
pcist_name

'AARON CHEW SONG TA (P02392B)'

In [42]:
pcist_data = []
for field in all_fields:
    pcist_data.append(field.text)
    
#  # Remove empty strings - DO NOT do this
# pcist_data = list(filter(None, pcist_data)) 

In [43]:
pcist_data

['P02392B',
 '',
 '26/08/2014',
 '',
 'Full Registration',
 'Active',
 '01/01/2019',
 '31/12/2020',
 'B Pharm (Hons), Universiti Sains Malaysia, Malaysia, 2010',
 'Guardian Health & Beauty',
 '21 TAMPINES NORTH DRIVE 2\n#03 - 01 Singapore 528765',
 '68918000',
 'Google Map One Map']

Note: Do not remove the empty strings. They are the (blank) values of certain fields, e.g. Registration End Date, and removing them would shift the position of every field that follows.
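
A short sketch of why the blanks matter: the fields are read by position, so filtering out empty strings moves later values into the wrong slots. The `FIELD_NAMES` list is an assumption that borrows the column names used for the master list in Section 2, with `None` marking positions that are not stored.

```python
# Positional labels for the scraped fields (None = position that is not stored)
FIELD_NAMES = [
    "reg_number", None, "reg_date", "reg_end_date", "reg_type", "practice_status",
    "cert_start_date", "cert_end_date", "qualification", "practice_place_name",
    "practice_place_address", "practice_place_phone", None,
]

# The list scraped above for the single test record
pcist_data = [
    "P02392B", "", "26/08/2014", "", "Full Registration", "Active",
    "01/01/2019", "31/12/2020",
    "B Pharm (Hons), Universiti Sains Malaysia, Malaysia, 2010",
    "Guardian Health & Beauty",
    "21 TAMPINES NORTH DRIVE 2\n#03 - 01 Singapore 528765",
    "68918000", "Google Map One Map",
]

# Pair each value with its label, skipping the unlabelled positions
record = {name: value for name, value in zip(FIELD_NAMES, pcist_data) if name is not None}
print(repr(record["reg_end_date"]))  # blank, but still in the right slot

# Filtering out the blanks shifts everything after the first empty string
filtered = list(filter(None, pcist_data))
print(pcist_data[2], "vs", filtered[2])
```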

In [44]:
pcist_data[2]

'26/08/2014'

In [45]:
back_to_results_link = driver.find_element(By.LINK_TEXT, 'Back to Search Results')
back_to_results_link.click()

___

<a name="section2"></a>
## Section 2 - Running Full Script
To execute the entire script, start from here. The earlier sections only showcase the experimentation.

### (i) Import dependencies

In [72]:
import urllib
import re
import time
import pandas as pd
import os

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

### (ii) Initiate web driver

In [73]:
# Define healthcare professional (HCP) body 
# e.g. SPC for Singapore Pharmacy Council, SDC for Singapore Dental Council, SMC for Singapore Medical Council etc.
hcp_body = 'SPC'

In [74]:
# Set wait times
waittime = 20
sleeptime = 2

In [75]:
# Initiate web driver
try:
    driver.quit() # End any existing driver session (closes all of its windows)
except Exception:
    pass # No driver exists yet

# Access the professional registration system (PRS) homepage for the specified healthcare professional body
home_page = f"https://prs.moh.gov.sg/prs/internet/profSearch/main.action?hpe={hcp_body}"

# Set webdriver options
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--ignore-certificate-errors')

# Initiate webdriver
driver = webdriver.Chrome(options=options) 

# Get driver to retrieve homepage
driver.get(home_page)

# Switch to frame which contains the HTML for the search section
driver.switch_to.frame(driver.find_element(By.NAME, 'msg_main')) # find_element_by_* helpers were removed in Selenium 4.3+

# Click Search button to load all results
WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, "//input[@name='btnSearch']"))).click()

# Sleep a short while for page loading to be fully completed
time.sleep(sleeptime)

### (iii) Setting up key functions

#### Setup master list CSV to store all the records

In [76]:
file_name = 'master_list.csv'

if os.path.isfile(f'./{file_name}'):
    print('Master File already exists')
else:
    column_names = ['name','reg_number','reg_date','reg_end_date','reg_type','practice_status','cert_start_date',
                    'cert_end_date','qualification','practice_place_name','practice_place_address','practice_place_phone']
    df_template = pd.DataFrame(columns = column_names)
    df_template.to_csv(f'{file_name}', header=True)
    print('Created new master list file')

Master File already exists
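
Note that `to_csv` writes the DataFrame index by default, which is why an `Unnamed: 0` column appears when the master list is read back later. As an alternative, here is a minimal sketch of the same create-if-missing, then append pattern using only the standard library `csv` module. `append_record` is a hypothetical helper, not part of the original script.

```python
import csv
import os
import tempfile

COLUMN_NAMES = ['name', 'reg_number', 'reg_date', 'reg_end_date', 'reg_type',
                'practice_status', 'cert_start_date', 'cert_end_date', 'qualification',
                'practice_place_name', 'practice_place_address', 'practice_place_phone']


def append_record(path, record):
    """Append one record to the CSV at path, writing the header only for a new file."""
    is_new_file = not os.path.isfile(path)
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMN_NAMES)
        if is_new_file:
            writer.writeheader()
        writer.writerow(record)  # Missing keys are written as empty strings


# Demo in a temporary folder so that no real master list is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'master_list.csv')
    append_record(demo_path, {'name': 'EXAMPLE NAME (P00000A)', 'reg_number': 'P00000A'})
    append_record(demo_path, {'name': 'ANOTHER NAME (P00001B)', 'reg_number': 'P00001B'})
    with open(demo_path, newline='', encoding='utf-8') as f:
        rows = list(csv.DictReader(f))
    print(len(rows), rows[0]['reg_number'])
```

Since no index column is written, there is nothing extra to drop during data cleaning.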


#### Get current page number

In [77]:
# Get current page number
def get_current_page():
    current_page_elem = WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, "//label[@class='pagination_selected_page']"))).text
    current_page_num = int(current_page_elem)
    return current_page_num

#### Get absolute last page number

In [78]:
def get_absolute_last_page():
    # Find all elements with pagination class (since it contains page numbers)
    WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, "//a[@class='pagination']")))
    all_pages = driver.find_elements(By.XPATH, "//a[@class='pagination']")

    # Get the final element, which corresponds to 'Last' hyperlink (which will go to the last page number)
    last_elem = all_pages[-1].get_attribute('href')

    # Keep only the number of last page
    last_page_num = int(re.sub("[^0-9]", "", last_elem))
    
    return last_page_num

#### Extract all data from detailed information page of the selected healthcare professional (upon clicking View More Details)

In [79]:
def gen_hcp_dict():
    WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, "//div[@class='table-head']")))
    hcp_name = driver.find_element(By.XPATH, "//div[@class='table-head']").text
    all_fields = driver.find_elements(By.XPATH, "//td[@class='no-border table-data']") # Using find_elements since there are multiple elements
    hcp_data = []
    
    for field in all_fields:
        hcp_data.append(field.text)

    hcp_dict = {}
    hcp_dict['name'] = hcp_name
    hcp_dict['reg_number'] = hcp_data[0]
    # hcp_data[1] is just a blank space, so it can be ignored
    hcp_dict['reg_date'] = hcp_data[2]
    hcp_dict['reg_end_date'] = hcp_data[3]
    hcp_dict['reg_type'] = hcp_data[4]
    hcp_dict['practice_status'] = hcp_data[5]
    hcp_dict['cert_start_date'] = hcp_data[6]
    hcp_dict['cert_end_date'] = hcp_data[7]
    hcp_dict['qualification'] = hcp_data[8]
    hcp_dict['practice_place_name'] = hcp_data[9]
    hcp_dict['practice_place_address'] = hcp_data[10]
    hcp_dict['practice_place_phone'] = hcp_data[11]
    
    return hcp_dict

#### Get current pagination range

In [80]:
def get_current_pagination_range():
    WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, "//a[@class='pagination']")))
    all_pages = driver.find_elements(By.XPATH, "//a[@class='pagination']")
    # Keep only the numeric links (this skips text links such as 'Last')
    # Note: driver.implicitly_wait() sets a lookup timeout and does not pause the script, so it is not needed here
    pagination_range_on_page = [int(elem.text) for elem in all_pages if elem.text.isnumeric()]
    return pagination_range_on_page

#### Click last pagination number on current page

In [81]:
def click_last_pagination_num(pagination_range):
    last_pagination_num = pagination_range[-1] 
    driver.implicitly_wait(1)
    WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, f'{last_pagination_num}'))).click() 
    driver.implicitly_wait(1)

#### Click first pagination number on current page
- Needed because we work backwards from the last page when the target page number is at or beyond the midway mark

In [82]:
def click_first_pagination_num(pagination_range):
    first_pagination_num = pagination_range[0] 
    driver.implicitly_wait(1)
    WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, f'{first_pagination_num}'))).click() 
    driver.implicitly_wait(1)

#### Check whether target page is within current range
- If yes, click on that target page  
- If not, continue clicking the last pagination number (until the target page number is exposed)
- AVOID using EC.element_to_be_clickable (seems a bit buggy)

In [83]:
def locate_target_page(target_page):
    
    last_page_num = get_absolute_last_page()
    midway_point = last_page_num/2

    if target_page < midway_point: # If target page is in the first half, then start clicking from the start
        current_page_num = get_current_page()
    
        if current_page_num == target_page:
            pass
        else:            
            pagination_range = get_current_pagination_range()

            while target_page not in pagination_range:
                driver.implicitly_wait(1)
                click_last_pagination_num(pagination_range) # If target page is not in pagination range, keep clicking last pagination number to go further down the list
                current_page_num = get_current_page()
                pagination_range = get_current_pagination_range()
                driver.implicitly_wait(1)
            else:
                WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, f"{target_page}"))).click() # Once target page is in the pagination range, go to the target page

    else: # If target page is in the later half of the list, go to the Last page and move backwards (this saves a lot of time)
        WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, 'Last'))).click()  # Go to last page
        time.sleep(sleeptime)
        current_page_num = get_current_page()

        if current_page_num == target_page:
            pass
        else:           
            pagination_range = get_current_pagination_range()

            while target_page not in pagination_range:
                driver.implicitly_wait(2)
                click_first_pagination_num(pagination_range)
                current_page_num = get_current_page()
                pagination_range = get_current_pagination_range()
            else:
                WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, f"{target_page}"))).click() # Once target page is in the pagination range, go to the target page
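
The midway rule in `locate_target_page` can be isolated as plain Python, which makes it easy to check without a browser. This is only a sketch: `plan_navigation` and `pages_to_cover` are hypothetical helpers, and the forward case assumes navigation starts from page 1.

```python
def plan_navigation(target_page, last_page):
    """Mirror the midway rule: first half goes forward, the rest goes backward from 'Last'."""
    if not 1 <= target_page <= last_page:
        raise ValueError(f"target_page must be between 1 and {last_page}")
    return "forward" if target_page < last_page / 2 else "backward"


def pages_to_cover(target_page, last_page):
    """Number of pages between the assumed starting point and the target page."""
    if plan_navigation(target_page, last_page) == "forward":
        return target_page - 1       # Assumed start: page 1
    return last_page - target_page   # Start: the last page


for target in (5, 170, 300, 340):
    print(target, plan_navigation(target, 340), pages_to_cover(target, 340))
```

With 340 pages, reaching page 300 from the last page covers 40 pages instead of 299 from the first page, which is where the time saving comes from.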

#### Create script that automates the navigation through the website portal

In [84]:
def full_scrape(target_page):
    
    last_page_num = get_absolute_last_page()
    driver.implicitly_wait(1)
    
    while target_page != last_page_num:
        locate_target_page(target_page)
        print('Starting with target page ' + str(target_page))
        
        # Retrieve the HTML from that search page
        target_page_html = driver.find_element(By.XPATH, "//body").get_attribute('outerHTML')
        driver.implicitly_wait(1)
        
        # Find the list of IDs on that page, and keep the unique IDs
        all_ids = re.findall("P[0-9]{5}[A-Z]{1}", target_page_html)
        id_list = list(dict.fromkeys(all_ids))

        for index, hcp_id in enumerate(id_list): # Tracking the healthcare professional (HCP)'s ID
            # Click 'View More Details' link to access the info page for that professional with the specific ID
            WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, f"//a[contains(@onclick,'{hcp_id}')]"))).click()
            
            # Scrape the relevant data from the pharmacist info page into a dictionary
            hcp_dict = gen_hcp_dict()
            
            # Convert dict to pandas dataframe (Need to pass an index since we are passing scalar values)
            df_hcp_dict = pd.DataFrame(hcp_dict, index=[0])
            
            # Append df to existing master list csv
            df_hcp_dict.to_csv('master_list.csv', mode='a', header=False)
            
            # Print the row that has been scraped (To track progress)
            print(f'Scraped row {index+1} of {target_page}')
                  
            # After scraping all records on that page, update (+1) the next target page to go to
            # (enumerate is zero-based, so the last record has index len(id_list) - 1)
            if index == len(id_list) - 1:
                print(f'Completed scraping for page {target_page}')
                target_page += 1
                print('Updated target page ' + str(target_page))

            # Head back to home page by clicking the Back to Search Results link
            WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, 'Back to Search Results'))).click() 
            
            # Go to the latest target page
            locate_target_page(target_page)
            
    else:
        locate_target_page(target_page)
        print('Working on last page')
        target_page_html = driver.find_element(By.XPATH, "//body").get_attribute('outerHTML')
        driver.implicitly_wait(1)
        all_ids = re.findall("P[0-9]{5}[A-Z]{1}", target_page_html)
        id_list = list(dict.fromkeys(all_ids))

        for index, hcp_id in enumerate(id_list): 
            WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.XPATH, f"//a[contains(@onclick,'{hcp_id}')]"))).click()           
            hcp_dict = gen_hcp_dict()
            df_hcp_dict = pd.DataFrame(hcp_dict, index=[0])
            df_hcp_dict.to_csv('master_list.csv', mode='a', header=False)
            print(f'Scraped row {index+1} of {target_page}')
                  
            if index == len(id_list)-1:
                print(f'Completed scraping for page {target_page}')
                print('Mission Complete')
            else:
                pass
            
            WebDriverWait(driver, waittime).until(EC.presence_of_element_located((By.LINK_TEXT, 'Back to Search Results'))).click() 
            locate_target_page(target_page)
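
The two branches of `full_scrape` repeat the same steps, with the last page handled separately. One possible simplification is a single loop over an inclusive page range. The sketch below shows only the control flow: `scrape_all_pages` is hypothetical, and `fake_scrape_page` stands in for the Selenium steps (locate page, collect IDs, scrape each record).

```python
def scrape_all_pages(start_page, last_page, scrape_page):
    """Call scrape_page for every page from start_page to last_page (inclusive)."""
    visited = []
    for page in range(start_page, last_page + 1):  # +1 so the last page is included
        rows = scrape_page(page)
        print(f'Completed scraping for page {page} ({rows} rows)')
        visited.append(page)
    return visited


def fake_scrape_page(page):
    """Stand-in for the real scraping steps: returns the number of rows 'scraped'."""
    return 6 if page == 340 else 10


print(scrape_all_pages(338, 340, fake_scrape_page))
```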

### (iv) Kickstart automated web scraping

In [85]:
# Starting off with selected target page. target_page = 1 if starting from the beginning
target_page = 340

In [86]:
# Show time started
print(time.strftime("%H:%M:%S", time.localtime()))

09:18:01


In [87]:
# Run web scraping
full_scrape(target_page)

Working on last page
Scraped row 1 of 340
Scraped row 2 of 340
Scraped row 3 of 340
Scraped row 4 of 340
Scraped row 5 of 340
Scraped row 6 of 340
Completed scraping for page 340
Mission Complete


In [88]:
# Show what time completed
print(time.strftime("%H:%M:%S", time.localtime()))

09:18:42
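
The elapsed time can be computed from the two printed timestamps. A minimal sketch, assuming both timestamps fall on the same day (`elapsed_seconds` is a hypothetical helper):

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # Same format as the time.strftime calls above


def elapsed_seconds(start, end):
    """Return the number of seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


print(elapsed_seconds("09:18:01", "09:18:42"))  # 41
```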


### (v) Data cleaning of master list

In [104]:
df_master = pd.read_csv('master_list.csv')

In [105]:
len(df_master)

3644

In [106]:
df_master.head()

| Unnamed: 0.1 | Unnamed: 0 | name | reg_number | reg_date | reg_end_date | reg_type | practice_status | cert_start_date | cert_end_date | qualification | practice_place_name | practice_place_address | practice_place_phone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | AARON CHEW SONG TA (P02392B) | P02392B | 26/08/2014 | | Full Registration | Active | 01/01/2019 | 31/12/2020 | B Pharm (Hons), Universiti Sains Malaysia, Mal... | Guardian Health & Beauty | 21 TAMPINES NORTH DRIVE 2\n#03 - 01 Singapore ... | 68918000 |
| 1 | 1 | AARON JASON MARTIN (P03376F) | P03376F | 28/03/2016 | | Full Registration | Inactive | 01/01/2019 | 31/12/2020 | BSc (Pharm) (Hons), National University of Sin... | Not Working | Not Working | Not Working |
| 2 | 2 | AARON YAP JUN YI (P03995J) | P03995J | 01/01/2019 | | Full Registration | Active | 01/01/2019 | 31/12/2020 | BSc (Pharm) (Hons), National University of Sin... | National University Hospital | 5 LOWER KENT RIDGE ROAD\nSingapore 119074 | |
| 3 | 3 | ABDUL HAMEED S/O ANWARUDEEN (P03945D) | P03945D | 01/01/2019 | | Full Registration | Active | 01/01/2019 | 31/12/2020 | BSc (Pharm) (Hons), National University of Sin... | National University Hospital | 5 LOWER KENT RIDGE ROAD\nSingapore 119074 | |
| 4 | 4 | ADRIAN TOH SHU KHING (P04296Z) | P04296Z | 09/01/2020 | 08/01/2021 | Conditional Registration | Active | 09/01/2020 | 31/12/2020 | B Pharm, University of South Australia, Austra... | Khoo Teck Puat Hospital | 90 YISHUN CENTRAL\nSingapore 768828 | 65558000 |


In [107]:
df_master.columns

Index(['Unnamed: 0', 'name', 'reg_number', 'reg_date', 'reg_end_date',
       'reg_type', 'practice_status', 'cert_start_date', 'cert_end_date',
       'qualification', 'practice_place_name', 'practice_place_address',
       'practice_place_phone'],
      dtype='object')

In [108]:
# Drop leftmost column
df_master.drop(columns=[df_master.columns[0]], inplace=True) # 'columns=' already implies axis=1

In [109]:
# Remove duplicates
df_master.drop_duplicates(keep='first', inplace=True)

In [110]:
len(df_master)

3396
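
Duplicates arise when the scraper is restarted and revisits pages it has already covered. A small sketch of the same cleaning step on made-up rows is shown below. Reading with `index_col=0` and de-duplicating on `reg_number` are assumptions that differ slightly from the cells above, which drop the leftmost column and compare whole rows.

```python
import io

import pandas as pd

# Made-up rows in the same layout as master_list.csv (leading unnamed index column)
SAMPLE_CSV = """,name,reg_number
0,EXAMPLE NAME (P00000A),P00000A
0,ANOTHER NAME (P00001B),P00001B
0,EXAMPLE NAME (P00000A),P00000A
"""

# index_col=0 treats the leftmost column as the index instead of 'Unnamed: 0'
df_sample = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)

# Keep the first occurrence of each registration number
df_deduped = df_sample.drop_duplicates(subset="reg_number", keep="first").reset_index(drop=True)

print(len(df_sample), "->", len(df_deduped))
```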

In [112]:
df_master.to_excel('master_list_final.xlsx', index = False)

___

<a name="section3"></a>
## Section 3 - References  

- https://selenium-python.readthedocs.io/locating-elements.html
- https://stackoverflow.com/questions/49171370/python-selenium-how-can-click-on-onclick-elements
- https://stackoverflow.com/questions/23924008/get-the-text-from-multiple-elements-with-the-same-class-in-selenium-for-python
- https://selenium-python.readthedocs.io/waits.html
- https://stackoverflow.com/questions/28778142/selenium-webdriver-give-nosuchframeexception
- https://medium.com/better-programming/how-to-scrape-multiple-pages-of-a-website-using-a-python-web-scraper-4e2c641cff8
- https://towardsdatascience.com/how-to-use-selenium-to-web-scrape-with-example-80f9b23a843a