# Data Extraction with Selenium
This tutorial shows how to use Selenium to control a web browser from Python and extract data from web pages. See https://selenium-python.readthedocs.io for more details.

## Installation
We first install the `selenium` package.

        pip install selenium

In [16]:
from selenium import webdriver
import time
import os

In [17]:
browser = webdriver.Chrome()  # start a Chrome session controlled by Selenium

## Browsing a webpage
Once the browser starts, we can tell it to visit a webpage.

In [18]:
url = 'https://www.google.com'

In [19]:
browser.get(url=url)

In [20]:
# Ask the browser for the live DOM serialized as HTML
html = browser.execute_script("return document.documentElement.outerHTML")
html[:3000]  # preview the first 3000 characters

'<html itemscope="" itemtype="http://schema.org/WebPage" lang="th"><head><meta charset="UTF-8"><meta content="origin" name="referrer"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.7LPvRDgzcqA.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/am=AACA/rs=AHpOoo9wdgl3D0Cd5pn6O1gZXHwWDc_oTg/cb=gapi.loaded_0" nonce="" async=""></script><script nonce="">window._hst=Date.now();performance&&performance.mark&&performance.mark("SearchHeadStart");</script><script nonce="">(function(){var _g={kEI:\'vScKZ9-3CPmanesPmrS4kAw\',kEXPI:\'31\',kBL:\'hBAu\',kOPI:89978449};(function(){var a;((a=window.google)==null?0:a.stvsc)?google.kEI=_g.kEI:window.google=_g;}).call(this);})();(function(){google.sn=\'webhp\';google.kHL=\'th\';})();(function(){\nvar h=this||self;function l(){return window.google!==void 0&&window.google.kOPI!==void 0&&window.google.kOPI!==0?wind
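Because `html` is an ordinary Python string, it can be handed to any HTML parser. The sketch below is an illustration only: it uses the standard library's `html.parser` on a short hard-coded sample to pull out the page title. The names `sample_html` and `TitleParser` are invented for this example; in the notebook one would feed `html` to the parser instead.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


# Hypothetical stand-in for the `html` string returned by the browser.
sample_html = (
    '<html lang="th"><head><meta charset="UTF-8">'
    "<title>Google</title></head><body></body></html>"
)

parser = TitleParser()
parser.feed(sample_html)
print(parser.title)
```

For anything beyond a quick check, a dedicated parser such as Beautiful Soup is usually more convenient than a hand-written `HTMLParser` subclass.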

## Interact with a webpage
Once the page has loaded, we can interact with any element on it. In this example, we search Google for a keyword: we first locate the search box, then send keystrokes to it.

In [21]:
from selenium.webdriver.common.by import By

In [22]:
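# The search box is the <textarea> whose name attribute is "q"
q_element = browser.find_element(By.CSS_SELECTOR, 'textarea[name=q]')
q_element.clear()  # remove any text already in the box
q_element.send_keys('ประเทศไทย')  # "Thailand" in Thai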


In [23]:
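from selenium.webdriver.common.keys import Keys
q_element.send_keys(Keys.ENTER)  # Keys.ENTER is '\ue007'; pressing Enter submits the search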

## Navigate the webpage
We can search the current page for elements, much as we would with Beautiful Soup. Selenium supports several locator strategies through `By`, including CSS selectors, XPath, element IDs, and link text.

In [27]:
# Collect every <a> element nested inside an element with class "g"
all_link = browser.find_elements(By.CSS_SELECTOR, '.g a')

In [28]:
for link in all_link:
    print('[link text]', link.text)
    print('[link href]', link.get_attribute('href'))
    print('---')

[link text] ประเทศไทย
Wikipedia
https://th.wikipedia.org › wiki › ประเทศไทย
[link href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2
---
[link text] 
[link href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2
---
[link text] จังหวัดของประเทศไทย
[link href] https://th.wikipedia.org/wiki/%E0%B8%88%E0%B8%B1%E0%B8%87%E0%B8%AB%E0%B8%A7%E0%B8%B1%E0%B8%94%E0%B8%82%E0%B8%AD%E0%B8%87%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2
---
[link text] จำนวน ประชากร
[link href] https://th.wikipedia.org/wiki/%E0%B8%A3%E0%B8%B2%E0%B8%A2%E0%B8%8A%E0%B8%B7%E0%B9%88%E0%B8%AD%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%80%E0%B8%A3%E0%B8%B5%E0%B8%A2%E0%B8%87%E0%B8%95%E0%B8%B2%E0%B8%A1%E0%B8%88%E0%B8%B3%E0%B8%99%E0%B8%A7%E0%B8%99%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B8%8A%E0%B8%B2%E0%B8%81%E0%B8%A3
---
[link text] ประชาธิปไตยอันมีพระมหาก
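The output above contains links with empty text and repeated URLs, and the `href` values are percent-encoded. A small post-processing step makes such results easier to read. The sketch below works on plain `(text, href)` tuples copied from the output above; in the notebook these could be built from `link.text` and `link.get_attribute('href')`. The function name `clean_links` is an invention for this example.

```python
from urllib.parse import unquote


def clean_links(pairs):
    """Drop links without visible text, remove duplicate URLs, decode hrefs."""
    seen = set()
    cleaned = []
    for text, href in pairs:
        if href is None or not text.strip() or href in seen:
            continue
        seen.add(href)
        # Keep only the first line of the text (the result title).
        title = text.strip().splitlines()[0]
        cleaned.append((title, unquote(href)))
    return cleaned


# Sample (text, href) pairs taken from the output above.
pairs = [
    (
        "ประเทศไทย\nWikipedia\nhttps://th.wikipedia.org › wiki › ประเทศไทย",
        "https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2",
    ),
    (
        "",
        "https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2",
    ),
]

for title, url in clean_links(pairs):
    print(title, "->", url)
```

Only the first pair survives: the second has no visible text, and its URL would have been a duplicate in any case.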

In [29]:
all_link[0].click()  # follow the first result (the Thai Wikipedia article listed above)

In [30]:
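# Table-of-contents entries are <li> elements whose class attribute starts with "vector-toc-list-item"
all_toc = browser.find_elements(By.CSS_SELECTOR, 'li[class^="vector-toc-list-item"]')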

In [31]:
for toc in all_toc:
    a = toc.find_element(By.CSS_SELECTOR, 'a')
    print('[text]', a.text)
    print('[class]', toc.get_attribute('class'))
    print('[href]', a.get_attribute('href'))
    print('---')

[text] บทนำ
[class] vector-toc-list-item vector-toc-level-1 vector-toc-level-1-active vector-toc-list-item-active
[href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2#
---
[text] ชื่อเรียก
[class] vector-toc-list-item vector-toc-level-1
[href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2#%E0%B8%8A%E0%B8%B7%E0%B9%88%E0%B8%AD%E0%B9%80%E0%B8%A3%E0%B8%B5%E0%B8%A2%E0%B8%81
---
[text] ประวัติศาสตร์
[class] vector-toc-list-item vector-toc-level-1
[href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2#%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B8%A7%E0%B8%B1%E0%B8%95%E0%B8%B4%E0%B8%A8%E0%B8%B2%E0%B8%AA%E0%B8%95%E0%B8%A3%E0%B9%8C
---
[text] 
[class] vector-toc-list-item vector-toc-level-2
[href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2#%E0%B8%

[class] vector-toc-list-item vector-toc-level-1
[href] https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2#%E0%B9%81%E0%B8%AB%E0%B8%A5%E0%B9%88%E0%B8%87%E0%B8%82%E0%B9%89%E0%B8%AD%E0%B8%A1%E0%B8%B9%E0%B8%A5%E0%B8%AD%E0%B8%B7%E0%B9%88%E0%B8%99
---
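The `class` and `href` values printed above carry structure that the link text does not: the class names the nesting level, and the URL fragment names the section. Entries with empty text are probably collapsed sub-items, since `.text` returns only text that is visible on the page. The sketch below uses a class/href pair copied from the output and a hypothetical helper, `describe_toc_entry`, to recover the level and section name without relying on `.text`.

```python
import re
from urllib.parse import unquote, urlsplit


def describe_toc_entry(class_attr, href):
    """Return (level, section) parsed from a TOC item's class and link."""
    match = re.search(r"vector-toc-level-(\d+)", class_attr)
    level = int(match.group(1)) if match else None
    # The fragment after '#' is the percent-encoded section anchor.
    section = unquote(urlsplit(href).fragment)
    return level, section


# Values copied from the second entry in the output above.
level, section = describe_toc_entry(
    "vector-toc-list-item vector-toc-level-1",
    "https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2"
    "#%E0%B8%8A%E0%B8%B7%E0%B9%88%E0%B8%AD%E0%B9%80%E0%B8%A3%E0%B8%B5%E0%B8%A2%E0%B8%81",
)
print(level, section)
```

In the notebook, the same function could be applied to `toc.get_attribute('class')` and `a.get_attribute('href')` inside the loop, for example to keep only level-1 entries.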


In [32]:
# Click the third entry (index 2): ประวัติศาสตร์ ("History") in the output above
all_toc[2].find_element(By.CSS_SELECTOR, 'a').click()

## End browsing session

In [33]:
browser.quit()  # close every window and end the WebDriver session
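If a cell raises an exception before `browser.quit()` runs, the browser window and its driver process stay open. One way to guard against that, sketched here under the assumption that the whole session fits in a single block, is a small context manager. `managed_browser` is a hypothetical helper, and `FakeBrowser` exists only so the example runs without launching Chrome; in practice one would pass `webdriver.Chrome` as the factory.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser with `factory` and always quit it on exit."""
    browser = factory()
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser:
    """Stand-in for webdriver.Chrome so this sketch runs anywhere."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


with managed_browser(FakeBrowser) as browser:
    pass  # browser.get(...), browser.find_elements(...), and so on

print(browser.closed)
```

The `finally` clause runs whether the block finishes normally or raises, so the session is closed in both cases.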