<p>This guide introduces the basic Selenium functions most commonly used for web scraping.</p>
<p>The functions covered are listed below:</p>
<p>1. Import Libraries</p>
<p>2. Start Driver Session</p>
<p>3. Navigate to a URL</p>
<p>4. Title Attribute</p>
<p>5. Find Single Element</p>
<p>6. Find URL of element</p>
<p>7. Using sub-element</p>
<p>8. Finding multiple elements</p>
<p>9. Back to Last Page</p>
<p>10. Opening new tab</p>
<p>11. Switching Tabs</p>
<p>12. Close Tab</p>
<p>13. Take Screenshot for the current page</p>


<h2>1. Import Libraries</h2>
<p>Import the required libraries before doing anything else.</p>

In [1]:
from selenium import webdriver

<h2>2. Start Driver Session</h2>
<p>Pass the full path to the ChromeDriver executable; otherwise you may get an error when you run this code from another directory.</p>

In [3]:
driver = webdriver.Chrome(executable_path='/Users/haha1994/Desktop/Techvalley/WebScraping/Lesson2/chromedriver')
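<p>Selenium 4 deprecates the executable_path argument in favour of a Service object. The following is a minimal sketch of that style, assuming Selenium 4 is installed. The helper names driver_path_from and start_chrome are hypothetical and are not part of Selenium.</p>

```python
from pathlib import Path


def driver_path_from(base_dir, name="chromedriver"):
    """Return an absolute path to the driver binary inside base_dir."""
    return str(Path(base_dir).expanduser().resolve() / name)


def start_chrome(driver_path):
    """Start Chrome through a Selenium 4 Service object (assumes selenium >= 4)."""
    # Imported lazily so the path helper above works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

<p>Building an absolute path first means the session starts the same way no matter which directory the script is launched from.</p>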

<h2>3. Navigate to a URL</h2>

In [4]:
driver.get("https://finance.yahoo.com/quote/0388.HK/")

<h2>4. Title Attribute</h2>
<p>The title attribute lets you confirm that you are on the correct page.</p>

In [5]:
print(driver.title)

0388.HK 249.400 -2.200 -0.87% : HKEX - Yahoo Finance
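<p>The check can also be done in code instead of by eye. This sketch assumes the title starts with the ticker symbol, as it does in the output above. The function name on_expected_page is hypothetical.</p>

```python
def on_expected_page(title, ticker):
    """Return True when the page title starts with the expected ticker symbol."""
    return title.strip().upper().startswith(ticker.upper())


title = "0388.HK 249.400 -2.200 -0.87% : HKEX - Yahoo Finance"
print(on_expected_page(title, "0388.HK"))  # True
print(on_expected_page(title, "0005.HK"))  # False
```

<p>In a live session you would pass driver.title as the first argument.</p>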


<h2>5. Find Single Element</h2>
<p>Selenium can locate an element in several ways. The most commonly used are:</p>
<p>1. XPath</p>
<p>2. Tag Name</p>
<p>3. Class Name</p>
<p>4. ID</p>

<p>No single method is best, since every site is built differently. XPath is generally the most universal option because it locates an element by its position in the document tree.</p>

In [8]:
xpath_element = driver.find_element_by_xpath('//*[@id="quote-summary"]/div[1]/table/tbody/tr[8]/td[2]/span')
print(xpath_element.text)

4,697,839


In [13]:
tagname_element = driver.find_element_by_tag_name("a")
print(tagname_element.text)
print(tagname_element.get_attribute("href"))

Home
https://www.yahoo.com/


In [16]:
classname_element = driver.find_element_by_class_name('nr-applet-nav-item')
print(classname_element.text)
print(classname_element.get_attribute('href'))

Finance Home
https://finance.yahoo.com/


In [17]:
id_element = driver.find_element_by_id('quote-header-info')
print(id_element.text)

Hong Kong Exchanges and Clearing Limited (0388.HK)
HKSE - HKSE Delayed Price. Currency in HKD
Add to watchlist
249.400-2.200 (-0.87%)
At close: May 31 4:08PM HKT
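<p>Recent Selenium 4 releases no longer provide the find_element_by_* helpers used above. They expect find_element(by, value) instead. The sketch below shows that call shape. The helper find_text is hypothetical, and the strategy strings are written out so the snippet runs on its own. In real code you would normally import By from selenium.webdriver.common.by.</p>

```python
# Locator strategy strings; these are the values of By.XPATH, By.TAG_NAME,
# By.CLASS_NAME and By.ID in selenium.webdriver.common.by.
XPATH, TAG_NAME, CLASS_NAME, ID = "xpath", "tag name", "class name", "id"


def find_text(driver, by, value):
    """Look up one element with find_element(by, value) and return its text."""
    return driver.find_element(by, value).text


# Example calls against a live driver (not executed here):
# find_text(driver, ID, "quote-header-info")
# find_text(driver, CLASS_NAME, "nr-applet-nav-item")
```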


<h2>6. Find URL of element</h2>
<p>The URL of a link is stored in the href attribute of its web element.</p>

In [19]:
classname_element = driver.find_element_by_class_name('nr-applet-nav-item')
print(classname_element.get_attribute('href'))

https://finance.yahoo.com/


<h2>7. Using sub-element</h2>
<p>Sometimes a site reuses the same class name, or even the same id, for several elements. You can narrow down the section you want to capture by first storing a parent element in a variable and then searching inside that parent.</p>

In [26]:
banner_info = driver.find_element_by_id('quote-header-info')
print(banner_info.text)

Hong Kong Exchanges and Clearing Limited (0388.HK)
HKSE - HKSE Delayed Price. Currency in HKD
Add to watchlist
249.400-2.200 (-0.87%)
At close: May 31 4:08PM HKT


In [45]:
# The leading "." keeps the search inside banner_info; a bare "//" would scan the whole page
first_line = banner_info.find_element_by_xpath(".//*[@class='D(ib) Mt(-5px) Mend(20px) Maw(56%)--tab768 Maw(52%) Ov(h) smartphone_Maw(85%) smartphone_Mend(0px)']")
first_line.text

'Hong Kong Exchanges and Clearing Limited (0388.HK)\nHKSE - HKSE Delayed Price. Currency in HKD'
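<p>The difference between searching inside a parent and searching the whole page can be tried without a browser. The sketch below uses Python's built-in xml.etree.ElementTree as a stand-in for Selenium, with a made-up two-block page. In Selenium, the same scoping comes from calling the find function on the parent element with an XPath that starts with a dot.</p>

```python
import xml.etree.ElementTree as ET

# A made-up page in which two blocks reuse the same class name.
page = ET.fromstring(
    "<body>"
    "<div id='nav'><span class='label'>Finance Home</span></div>"
    "<div id='quote-header-info'><span class='label'>0388.HK</span></div>"
    "</body>"
)

# Store the parent first, then search only inside it.
parent = page.find(".//div[@id='quote-header-info']")
scoped = [el.text for el in parent.findall(".//span[@class='label']")]

# Searching from the page root matches both blocks.
whole = [el.text for el in page.findall(".//span[@class='label']")]

print(scoped)  # ['0388.HK']
print(whole)   # ['Finance Home', '0388.HK']
```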

<h2>8. Finding multiple elements</h2>
<p>Change "find_element" to "find_elements" in the function name. Selenium then finds every element that matches the criteria and returns them as a Python list.</p>

In [47]:
top_tabs = driver.find_elements_by_class_name("nr-applet-nav-item")

In [49]:
print(len(top_tabs))
top_tabs[:5]

28


[<selenium.webdriver.remote.webelement.WebElement (session="e39f78d61264f7fc1afd6538b7fdd6e0", element="0.6944154944271981-3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e39f78d61264f7fc1afd6538b7fdd6e0", element="0.6944154944271981-6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e39f78d61264f7fc1afd6538b7fdd6e0", element="0.6944154944271981-7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e39f78d61264f7fc1afd6538b7fdd6e0", element="0.6944154944271981-8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e39f78d61264f7fc1afd6538b7fdd6e0", element="0.6944154944271981-9")>]

In [50]:
for tab in top_tabs:
    print(tab.text)

Finance Home
Watchlists
My Portfolio
Screeners
Markets
Industries
Videos


















News
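<p>The blank lines in the output come from matched elements whose text is empty, typically items that are not currently displayed. A small filter can drop them. This is a sketch: visible_labels is a hypothetical helper, and SimpleNamespace objects stand in for WebElement objects because only the .text attribute is used.</p>

```python
from types import SimpleNamespace


def visible_labels(elements):
    """Return the stripped text of every element whose text is not blank."""
    return [el.text.strip() for el in elements if el.text and el.text.strip()]


# Stand-ins for WebElement objects: only the .text attribute is used here.
fake_tabs = [SimpleNamespace(text=t) for t in ["Finance Home", "", "  ", "News"]]
print(visible_labels(fake_tabs))  # ['Finance Home', 'News']
```

<p>With a live session you would call visible_labels(top_tabs).</p>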




<h2>9. Back to Last Page</h2>

In [51]:
driver.back()

<h2>10. Opening new tab</h2>

In [53]:
driver.execute_script("window.open('https://finance.yahoo.com/quote/0005.HK/');")

<h2>11. Switching Tabs</h2>

In [57]:
print(len(driver.window_handles))
print(driver.title)
driver.switch_to.window(driver.window_handles[0])
print("-----------------")
print(driver.title)

2
0005.HK 64.050 -0.450 -0.70% : HSBC HOLDINGS - Yahoo Finance
-----------------
0388.HK 249.400 -2.200 -0.87% : HKEX - Yahoo Finance
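<p>Opening a tab with window.open does not move the driver's focus to it, so a switch is still needed. The sketch below combines both steps. The helper open_in_new_tab is hypothetical, and it assumes the new tab's handle appears in driver.window_handles right after the script runs.</p>

```python
import json


def open_in_new_tab(driver, url):
    """Open url in a new tab and move the driver's focus to it."""
    known = set(driver.window_handles)
    # json.dumps quotes the URL safely for use inside the JavaScript call.
    driver.execute_script("window.open(" + json.dumps(url) + ");")
    new_handles = [h for h in driver.window_handles if h not in known]
    if not new_handles:
        raise RuntimeError("no new tab was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

<p>Comparing the handles before and after the call avoids guessing which index the new tab received.</p>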


<h2>12. Close Tab</h2>

In [58]:
driver.close()
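<p>After driver.close() the driver no longer points at an open window, so it has to be switched to a remaining tab before any further command. The following is a minimal sketch of that pattern. The helper close_current_tab is hypothetical.</p>

```python
def close_current_tab(driver):
    """Close the focused tab, then switch to the first tab that is still open."""
    driver.close()
    remaining = driver.window_handles
    if remaining:
        driver.switch_to.window(remaining[0])
    # The count lets the caller see whether any tab is left.
    return len(remaining)
```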

<h2>13. Take Screenshot for the current page</h2>

In [60]:
driver.switch_to.window(driver.window_handles[0])
driver.save_screenshot("stock.png")

True