# Data Extraction with Selenium
This tutorial shows how to use Selenium to drive a web browser and extract data from web pages. See https://selenium-python.readthedocs.io for more details.

## Installation
We first install the `selenium` package.

In [None]:
%pip install selenium

In [None]:
from selenium import webdriver
import time
import os

In [None]:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--window-size=1920,1080")

# Uncomment the two lines below if you run this in Colab. Note that you won't get an interactive browser window.
# chrome_options.add_argument('--headless=new')
# chrome_options.add_argument('--no-sandbox')

# Use a realistic User-Agent string for Chrome on Windows 10 (update if you want)
user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/117.0.0.0 Safari/537.36")
chrome_options.add_argument(f"user-agent={user_agent}")

browser = webdriver.Chrome(options=chrome_options)

## Browsing a webpage
Once the browser starts, we can tell it to visit a webpage.

In [None]:
url = 'https://www.duckduckgo.com'

In [None]:
browser.get(url=url)

In [None]:
html = browser.execute_script("return document.documentElement.outerHTML")
html[:3000]
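
Looking at the first 3000 characters is a quick sanity check. As a minimal sketch of a more structured check, the snippet below pulls the page title and link targets out of an HTML string using only the standard library. The class `TitleAndLinkParser`, the helper `summarize_html`, and the `sample_html` string are illustrative names that are not part of the tutorial; in the notebook you would pass the `html` variable instead of `sample_html`.

```python
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collects the <title> text and every href found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is kept.
        if self.in_title:
            self.title += data


def summarize_html(html_text):
    """Return (title, list of hrefs) for an HTML string."""
    parser = TitleAndLinkParser()
    parser.feed(html_text)
    return parser.title.strip(), parser.hrefs


# Stand-in for the `html` string returned by the browser (hypothetical content).
sample_html = (
    "<html><head><title> Example page </title></head>"
    "<body><a href='https://example.com/a'>A</a><a>no href</a></body></html>"
)
title, hrefs = summarize_html(sample_html)
print(title)
print(hrefs)
```

This only inspects a static snapshot of the page source. To interact with live elements, use the browser object as in the following sections.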

## Interact with a webpage
Once the page has loaded, we can interact with any element on it. In this example, we search for a keyword on DuckDuckGo. To do so, we locate the search box and then send it the appropriate keys.

In [None]:
from selenium.webdriver.common.by import By

In [None]:
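from selenium.webdriver.common.keys import Keys

# Locate the search box, type the query ('ประเทศไทย' is Thai for "Thailand"), and press Enter.
q_element = browser.find_element(By.CSS_SELECTOR, 'input[name=q]')
q_element.clear()
q_element.send_keys('ประเทศไทย')
q_element.send_keys(Keys.ENTER)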

## Navigate the webpage
We can locate elements on the current page, much as we would query a parsed document with Beautiful Soup. Selenium supports several locator strategies, such as CSS selectors, XPath, and element IDs.

In [None]:
# Let element lookups retry for up to 5 seconds before giving up (this is a timeout, not a fixed pause)
browser.implicitly_wait(5)

all_link = browser.find_elements(By.CSS_SELECTOR, 'li[data-layout=organic] h2 a')
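
The implicit wait above applies to every lookup made through `browser`. To wait on one specific lookup only, the lookup can be polled by hand. The sketch below is a hand-rolled illustration of that idea, not Selenium's own mechanism (Selenium ships `WebDriverWait` in `selenium.webdriver.support.ui` for this purpose). The helper `wait_for_elements` and the stand-in `fake_find` are hypothetical names introduced here.

```python
import time


def wait_for_elements(find, timeout=5.0, poll_interval=0.5):
    """Call `find()` repeatedly until it returns a non-empty list or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll_interval)


# Intended use with the tutorial's objects (not executed here):
# all_link = wait_for_elements(
#     lambda: browser.find_elements(By.CSS_SELECTOR, 'li[data-layout=organic] h2 a'),
#     timeout=5,
# )

# Demonstration with a stand-in finder that only succeeds on its third call.
calls = {"n": 0}


def fake_find():
    calls["n"] += 1
    return ["result"] if calls["n"] >= 3 else []


print(wait_for_elements(fake_find, timeout=2.0, poll_interval=0.01))
```

Taking a zero-argument callable keeps the helper independent of any particular locator, so the same function works for CSS selectors, XPath, or any other lookup.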

In [None]:
for link in all_link:
    print('[link text]', link.text)
    print('[link href]', link.get_attribute('href'))
    print('---')
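
Printing is useful for inspection, but extracted data usually ends up in a file or a table. As a rough sketch, the loop above could be turned into rows and serialized as CSV. The helpers `links_to_rows` and `rows_to_csv` and the `FakeLink` class are illustrative names, not part of Selenium. `FakeLink` only mimics the two members used here (`text` and `get_attribute`), so the example runs without a browser.

```python
import csv
import io


def links_to_rows(elements):
    """Turn element-like objects into plain dicts, skipping entries without an href."""
    rows = []
    for el in elements:
        href = el.get_attribute("href")
        if href:
            rows.append({"text": el.text.strip(), "href": href})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "href"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


class FakeLink:
    """Stand-in for a Selenium WebElement, used only for this demonstration."""

    def __init__(self, text, href):
        self.text = text
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


demo = [FakeLink(" Thailand ", "https://example.org/thailand"), FakeLink("broken", None)]
print(rows_to_csv(links_to_rows(demo)))
```

In the notebook, `links_to_rows(all_link)` would play the role of `links_to_rows(demo)`. Copying the values into plain dicts right away also means the data survives after the browser navigates to another page.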

In [None]:
# Open the first search result
all_link[0].click()

In [None]:
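# Collect the table-of-contents entries (this selector assumes the first result opened a Wikipedia article)
all_toc = browser.find_elements(By.CSS_SELECTOR, 'li[class^="vector-toc-list-item"]')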

In [None]:
for toc in all_toc:
    a = toc.find_element(By.CSS_SELECTOR, 'a')
    print('[text]', a.text)
    print('[class]', toc.get_attribute('class'))
    print('[href]', a.get_attribute('href'))
    print('---')

In [None]:
link = all_toc[2].find_element(By.CSS_SELECTOR, 'a')

print('[text]', link.text)
print('[class]', link.get_attribute('class'))
print('[href]', link.get_attribute('href'))

link.click()

## End browsing session

In [None]:
browser.quit()
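
If a cell raises an error before `browser.quit()` runs, the browser process can be left open. One way to guard against that, shown here as a minimal sketch, is to wrap the session in a context manager so that `quit()` always runs. The names `browser_session` and `FakeBrowser` are hypothetical. `FakeBrowser` stands in for a real driver so the example runs without Chrome.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_browser):
    """Create a browser with `make_browser()` and always call quit() on exit."""
    browser = make_browser()
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser:
    """Stand-in driver whose get() always fails, to show that quit() still runs."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        raise RuntimeError("simulated navigation failure")

    def quit(self):
        self.closed = True


fake = FakeBrowser()
try:
    with browser_session(lambda: fake) as b:
        b.get("https://www.duckduckgo.com")
except RuntimeError:
    pass

print(fake.closed)
```

With the tutorial's objects, the call would look like `with browser_session(lambda: webdriver.Chrome(options=chrome_options)) as browser:`, followed by the scraping steps inside the `with` block.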