# Selenium

Selenium is a nice tool for creating web scrapers. It has bindings for languages such as C#, Java and Python, and it is very popular for testing websites in browsers like Firefox, Chrome or PhantomJS.

In this small module we will review the basics of the Selenium Python library.

In [1]:
# Open a website by its URL

from selenium import webdriver

browser = webdriver.Firefox()
url = "http://www.google.com"
browser.get(url)

In [7]:
# Let's save a screencapture of the website

browser.save_screenshot("screencaptures/google_website.png")

True

With the commands above, `browser.get(url)` loads the page in the browser window and `save_screenshot` writes an image of it to disk, returning `True` on success.

Selenium has different tools to retrieve specific parts of a website. Let's extract the Google logo using its unique id.
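As far as I know, `save_screenshot` in the Python bindings reports an I/O failure by returning `False` instead of raising, which is what happens when the `screencaptures` folder does not exist yet. A minimal sketch of a helper that creates the folder first is shown below; `screenshot_path` is a hypothetical name used for illustration, not part of Selenium.

```python
from pathlib import Path


def screenshot_path(name, directory="screencaptures"):
    """Return the path for a screenshot file, creating the folder if needed."""
    folder = Path(directory)
    # parents=True creates intermediate folders; exist_ok=True makes the call idempotent
    folder.mkdir(parents=True, exist_ok=True)
    return str(folder / name)


# Usage with the notebook's browser (assumed to be open already):
# browser.save_screenshot(screenshot_path("google_website.png"))
```

Checking the returned boolean after each call is still worthwhile, since other write errors are reported the same way.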

In [6]:
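from selenium.webdriver.common.by import By

# Search for the logo by its unique id
# (find_element_by_id was removed in Selenium 4.3; find_element also works on older versions)
logo = browser.find_element(By.ID, "hplogo")

# Save a screenshot of just that element
logo.screenshot("screencaptures/google_logo.png")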

True

In [11]:
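# Let's retrieve all the elements that share a given class name
from selenium.webdriver.common.by import By

elements = browser.find_elements(By.CLASS_NAME, "ctr-p")
for i, element in enumerate(elements):
    # One screenshot per matching element, numbered in page order
    element.screenshot("screencaptures/ctr-p-{}.png".format(i))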

We can also implement logic to interact with the website. Let's perform a Google search.

In [18]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import time

# Loads the URL
browser.get(url)

# Waits a fixed 5 seconds to give the website time to load
time.sleep(5)

# Searches for the text box and performs a search
text_box = browser.find_element(By.ID, "lst-ib")
text_box.send_keys("Pereira, the most beautiful city from Colombia")
text_box.send_keys(Keys.ENTER)

Nice! Now let's open the first result.

In [25]:
from selenium.webdriver.common.by import By

element = browser.find_element(By.CLASS_NAME, "bkWMgd")
element.click()

Nice, but why should we always wait 5 seconds for the page to load? It is a waste of time, yet we still need to be sure that our text box has been loaded, so let's use Selenium's explicit wait utility.

In [26]:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

timeout = 5

try:
    # Loads the website
    browser.get(url)

    # Waits up to `timeout` seconds for the text box to be present in the page
    text_box = WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located((By.ID, "lst-ib"))
    )
    # until() returns the located element, so we can perform the search right away
    text_box.send_keys("Pereira, the most beautiful city from Colombia")
    text_box.send_keys(Keys.ENTER)
except TimeoutException:
    # Raised by until() when the text box does not show up in time
    print("[ERROR] could not get the textbox to make the search")
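To make the idea behind an explicit wait concrete, here is a simplified sketch of a polling loop in plain Python. It is not Selenium's implementation: `wait_until` is a hypothetical helper, the built-in `TimeoutError` stands in for Selenium's `TimeoutException`, and the sketch does not ignore "element not found" errors while polling the way `WebDriverWait` does by default.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll_interval)


# Toy condition that becomes true on its third call
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return calls["count"] >= 3


wait_until(ready_on_third_call, timeout=2, poll_interval=0.01)
print(calls["count"])  # prints 3
```

The benefit over `time.sleep(5)` is that the loop returns as soon as the condition holds, so the full timeout is only spent when something is actually wrong.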


In [28]:
# Let's go back to the initial page

browser.back()

In [29]:
# Let's go forward again to the search results page

browser.forward()

In [46]:
# Let's scroll through the search results on the page

elements = browser.find_elements(By.CLASS_NAME, "rc")
for element in elements:
    # element.location holds the element's coordinates within the page
    x, y = element.location['x'], element.location['y']
    js_action = "window.scrollTo({}, {});".format(x, y)
    browser.execute_script(js_action)
    # Pause so that each scroll step is visible
    time.sleep(3)
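An alternative to computing coordinates by hand is to pass the element itself to the script and let the browser scroll to it. The sketch below relies on `execute_script` exposing its extra arguments to the script as `arguments[i]`; `scroll_to` and `FakeDriver` are hypothetical names, and the fake driver only exists so the example can run without opening a browser.

```python
def scroll_to(driver, element):
    """Scroll the page so that `element` is aligned with the top of the viewport."""
    # Extra arguments to execute_script are available in the script as arguments[i]
    driver.execute_script("arguments[0].scrollIntoView(true);", element)


class FakeDriver:
    """Stand-in for a WebDriver that only records the scripts it receives."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


fake = FakeDriver()
scroll_to(fake, "result-1")  # with a real browser: scroll_to(browser, element)
```

With the notebook's browser, the loop body above would reduce to `scroll_to(browser, element)` followed by the pause.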

Nice, these are the basics of Selenium. If you wish to know more, refer to the Selenium with Python docs: http://selenium-python.readthedocs.io/

In [47]:
# Quit the current browser instance and end its session

browser.quit()
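If an earlier cell raises an exception, `browser.quit()` may never run and the browser window stays open. One way to guard against that, sketched here under the assumption that the browser is created and used in a single block, is a small context manager. `managed_browser` and `FakeBrowser` are hypothetical names, and the fake class only lets the example run without a real browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser with `factory` and always quit it afterwards."""
    browser = factory()
    try:
        yield browser
    finally:
        browser.quit()  # runs even if the body raised an exception


class FakeBrowser:
    """Stand-in for a WebDriver that only remembers whether quit() was called."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


with managed_browser(FakeBrowser) as b:
    pass  # interact with the browser here
```

With Selenium installed, the same pattern would read `with managed_browser(webdriver.Firefox) as browser: browser.get(url)`.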