# Web Scraping using Selenium and Python

Selenium is a family of open-source projects for browser automation. It provides bindings for the major programming languages, including Python.

### Installation

We will use Chrome in our example, so make sure the following are installed on your local machine:

1. Chrome (from the Chrome download page)
2. The ChromeDriver binary that matches your Chrome version
3. The selenium Python package

In [None]:
# !pip install selenium

### Import Libraries

In [1]:
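from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

In [2]:
options = Options()
# Uncomment to run Chrome without a visible window:
# options.add_argument("--headless=new")
# options.add_argument("--window-size=1920,1200")

# Path to the ChromeDriver binary; adjust it to your own machine
DRIVER_PATH = 'chromedriver_win32\\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object instead of executable_path
driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)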

driver.get("https://trove.nla.gov.au/search?keyword=skull")

In [3]:
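from selenium.webdriver.common.by import By
# Close the modal dialog (id "culturalModal") that appears when the page loads
popup = driver.switch_to.active_element
popup.find_element(By.XPATH, '//*[@id="culturalModal___BV_modal_footer_"]/div/div/div[2]/button').click()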

### XPath

XPath is a query language that uses path expressions to select nodes or node-sets in an XML or HTML document.

Imagine we want to extract all of the links on a page.

A single XPath expression is enough for that: //a. One way to run it is LXML, a fast and easy-to-use XML and HTML processing library that supports XPath. The Selenium cells that follow instead pass their XPath expressions straight to the browser driver.

In [None]:
# !pip install lxml
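
To make the //a expression concrete, here is a minimal sketch that runs it with LXML against a small hand-written HTML string. The markup and URLs are made up for illustration and do not come from Trove.

```python
from lxml import html

# A tiny, made-up HTML document used only for illustration
page = """
<html><body>
  <h1>Example</h1>
  <a href="/first">First</a>
  <p>Text with <a href="https://example.com/second">a second link</a>.</p>
</body></html>
"""

# Parse the string into an element tree
tree = html.fromstring(page)

# //a selects every <a> element, at any depth, in document order
links = tree.xpath("//a")

# Collect the target and the visible text of each link
hrefs = [a.get("href") for a in links]
texts = [a.text_content() for a in links]

print(hrefs)
print(texts)
```

With a live Selenium session, the same idea should apply to the rendered page by parsing driver.page_source instead of the hard-coded string.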

In [4]:
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Absolute XPath copied from the browser; it will break if the page layout changes
topics_xpath = '/html/body/div/div[1]/div[4]/div[1]/div[4]/div/section[1]/div[4]/div/a'
# Wait up to 10 seconds for the element to become visible; until() returns the element
elem = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, topics_xpath)))
# Scroll the element into view before clicking it
driver.execute_script("arguments[0].scrollIntoView(true);", elem)
elem.click()

In [5]:
# Absolute XPath of the result link to open
link_xpath = '/html/body/div/div[1]/div[4]/div[1]/div[4]/div[1]/section/div[3]/div[1]/div/div[2]/div[1]/h3/a'
# Wait for the link to become visible, scroll to it, then click it
elem = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, link_xpath)))
driver.execute_script("arguments[0].scrollIntoView(true);", elem)
elem.click()

In [6]:
# XPath of the container that holds the article text
text_xpath = '//*[@id="ui-layout-text"]/div[2]'
# Wait for the text container to become visible, then scroll it into view
elem = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, text_xpath)))
driver.execute_script("arguments[0].scrollIntoView(true);", elem)

In [7]:
elem.text

'A FRACTURED SKULL.\nA man named Hay, lodging at No. 9 Mar-\ngaret-street, walked out of his bedroom win-\ndow last night on to the roof, and fell a dis-\ntance of about 20 feet. He was conveyed to\nSydney Hospital by the Civil Ambulance, and\nadmitted suffering from a fractured skull.'
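
The extracted text keeps the line breaks of the newspaper column, including words split by a hyphen at the end of a line (for example "Mar-" followed by "garet-street"). Below is a rough sketch of one way to rejoin them with the standard re module. The helper name join_wrapped_lines is hypothetical, and the sketch assumes every hyphen at a line end marks a split word, so it would wrongly merge a genuine compound that happens to break at its hyphen.

```python
import re

# The text returned by elem.text above
raw = (
    'A FRACTURED SKULL.\nA man named Hay, lodging at No. 9 Mar-\n'
    'garet-street, walked out of his bedroom win-\ndow last night on to '
    'the roof, and fell a dis-\ntance of about 20 feet. He was conveyed to\n'
    'Sydney Hospital by the Civil Ambulance, and\nadmitted suffering from '
    'a fractured skull.'
)


def join_wrapped_lines(text):
    """Rejoin hyphen-split words and turn the remaining newlines into spaces."""
    # Assumption: a hyphen directly before a newline marks a word split across lines
    text = re.sub(r"-\n", "", text)
    # Any remaining line break becomes a single space
    return re.sub(r"\s*\n\s*", " ", text).strip()


cleaned = join_wrapped_lines(raw)
print(cleaned)
```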

In [None]:
# driver.quit()

References:

https://www.scrapingbee.com/blog/selenium-python/

https://www.scrapingbee.com/blog/web-scraping-101-with-python/