# Selenium

Selenium (https://www.seleniumhq.org/) automates browsers. It is primarily used to automate web applications for testing purposes, but it is not limited to that. Boring web-based tasks can (and should!) be automated as well.

In the code cell below, the statement
```python
from selenium import webdriver
```
is used to import the webdriver, which is always necessary for automating browsing.

In [None]:
from selenium import webdriver

### Using Selenium to open a website

We first need to create a web driver object, which we use to open the page. In our class we use the Firefox webdriver, but others are available (see section 1.3 here: https://selenium-python.readthedocs.io/installation.html).

Note that on a school computer, you will need to specify the executable path to the webdriver, but this (likely) will not be the case on your personal computer.

```python
driver = webdriver.Firefox(executable_path=r'C:\geckodriver\geckodriver.exe')
```

Create the web driver object that controls the browser; this will open a Firefox browser window with an empty URL.

In [None]:
driver = webdriver.Firefox()

To browse to a page, simply use the *driver.get* method and specify the URL.

In [None]:
driver.get('http://www.easternct.edu')

### Locating elements

There are various methods available for finding single or multiple elements. The most relevant methods (for us) are given below. For a complete list, see https://selenium-python.readthedocs.io/locating-elements.html. These methods can be called using the *driver* or any selenium web element.

Several methods can be used to find a single element:

- *find_element_by_id(id)* returns the (first) element with the specified *id*
- *find_element_by_tag_name(name)* returns the first element with the specified tag *name*
- *find_element_by_name(name)* returns the first element with the specified *name*
- *find_element_by_class_name(name)* returns the first element with the specified class *name*
- *find_element_by_css_selector(selector)* returns the first element matching the specified CSS *selector*, e.g. *div.class*

Several methods can be used to return a *list* of elements:

- *find_elements_by_tag_name(name)* returns a list of elements with the specified *tag name*
- *find_elements_by_class_name(name)* returns a list of elements with the specified class *name*
- *find_elements_by_css_selector(selector)* returns a list of elements with the specified css *selector*

Note: if no matching element exists, the single-element methods raise a *NoSuchElementException*, whereas the list-returning methods return an empty list.
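
The list-returning methods give back an empty list when nothing matches, so they can be used to check for an element without a `try`/`except`. The sketch below shows one way to do that. `find_first` is a hypothetical helper (not part of Selenium) built on the generic `driver.find_elements(by, value)` call, which takes a locator strategy such as `By.TAG_NAME` and a value. Recent Selenium 4 releases drop the `find_element_by_*` helpers in favor of this generic form, so it is worth recognizing either way.

```python
def find_first(driver, by, value):
    """Return the first element matching the locator, or None if there is none.

    Hypothetical helper for illustration; not part of selenium.
    """
    matches = driver.find_elements(by, value)  # empty list when nothing matches
    return matches[0] if matches else None


# Usage with a live browser (assumes `driver` was created as shown earlier):
# from selenium.webdriver.common.by import By
# menu = find_first(driver, By.TAG_NAME, 'ul')
# if menu is None:
#     print('no <ul> on this page')
```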

The code below finds the first *ul* element on the page, which holds the list of menu items in the header of the page. The element is stored in a *WebElement* object.

In [None]:
ul = driver.find_element_by_tag_name('ul')
ul

### Extracting text from elements
To extract text from an element, simply use its *text* attribute.

In [None]:
list_items = ul.find_elements_by_tag_name('li')
for li in list_items:
    print(li.text)
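
If you want to keep the text rather than print it, a list comprehension works well. The sketch below uses a hypothetical helper, `visible_texts`; it skips blank entries, which is what *text* gives for elements that are not currently visible on the page.

```python
def visible_texts(elements):
    """Collect the non-blank text of each element (hypothetical helper)."""
    return [el.text.strip() for el in elements if el.text.strip()]


# Usage (assumes `ul` from the cell above):
# menu_items = visible_texts(ul.find_elements_by_tag_name('li'))
# print(menu_items)
```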

### Clicking on an element

You can click on an element using the *click* method. Note that you will get an error if the element cannot be clicked. For example, this happens if you run the cell below twice.

In [None]:
search_icon = driver.find_element_by_class_name('search-icon-off')
search_icon.click()
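
One way to avoid some of these errors is to check the element before clicking it. The sketch below relies on the *is_displayed* and *is_enabled* methods of a web element; `click_if_possible` is a hypothetical helper, and the two checks do not catch every failure (for example, an element that is covered by another one).

```python
def click_if_possible(element):
    """Click the element only if it is displayed and enabled.

    Returns True if a click was attempted, False otherwise.
    Hypothetical helper for illustration; not part of selenium.
    """
    if element.is_displayed() and element.is_enabled():
        element.click()
        return True
    return False


# Usage (assumes `driver` from the cells above):
# icon = driver.find_element_by_class_name('search-icon-off')
# if not click_if_possible(icon):
#     print('search icon is not clickable right now')
```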

### Adding text to an input

The *send_keys* method can be used to add text to an input. Here we add "How are you?" to the search input, which is now visible because we clicked on the search icon.

In [None]:
elem = driver.find_element_by_name("q")
elem.send_keys("How are you?")

We can clear the input using the *clear* method.

In [None]:
elem.clear()

Let's search for "Computer Science", by entering the text and then pressing the *Enter* key.

The statement
```python
from selenium.webdriver.common.keys import Keys
```
is needed so that we can simulate a user pressing the Enter (Return) key.

In [None]:
from selenium.webdriver.common.keys import Keys
elem.send_keys("Computer Science")
elem.send_keys(Keys.RETURN)

### Getting the value of an attribute

The method *get_attribute* can be used to get the value of an attribute of an element. Here we get all links on the page, and display the text of the link as well as the URL (the *href* attribute).

In [None]:
links = driver.find_elements_by_tag_name('a')
for link in links:
    text = link.text
    if text != '':
        print(text, link.get_attribute("href"), sep=': ')
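
If you would rather collect the links than print them, the same loop can build a list of pairs. This is a minimal sketch with a hypothetical helper, `links_with_text`, that works on any list of link elements.

```python
def links_with_text(links):
    """Return (text, href) pairs for links whose visible text is not empty.

    Hypothetical helper for illustration; not part of selenium.
    """
    pairs = []
    for link in links:
        text = link.text.strip()
        if text:
            pairs.append((text, link.get_attribute('href')))
    return pairs


# Usage (assumes `driver` from the cells above):
# for text, url in links_with_text(driver.find_elements_by_tag_name('a')):
#     print(text, url, sep=': ')
```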

### Exercise

Search for a movie on IMDb and go to the page for the first result by *clicking* on the link. Can you extract the title and rating?

Note: It is important to sleep for a second or two between carrying out the search and clicking on the first result, so that the results page has time to load.


In [None]:
import time
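
A fixed sleep can be too short on a slow connection and wasteful on a fast one. An alternative is to poll until the thing you need shows up. The sketch below illustrates the idea with a hypothetical helper, `wait_until`; the CSS selector in the usage comment is a made-up placeholder, not IMDb's actual markup.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(poll)


# Usage (assumes `driver` exists; 'a.result' is a placeholder selector):
# results = wait_until(lambda: driver.find_elements_by_css_selector('a.result'))
# results[0].click()
```

Selenium also ships its own version of this idea, `WebDriverWait` in `selenium.webdriver.support.ui`, which is worth a look once the basic pattern is clear.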

### Close the driver

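Close the driver when you are done. The *close* method closes the current browser window; *driver.quit()* additionally ends the driver session and closes any other windows it opened.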

In [None]:
driver.close()
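
If a cell raises an error halfway through, the browser can be left open. One way to guard against that is a `try`/`finally` block. The sketch below wraps the pattern in a hypothetical helper, `run_with_driver`, and uses *quit* so that the whole session is shut down.

```python
def run_with_driver(make_driver, task):
    """Create a driver, run `task(driver)`, and always shut the browser down.

    Hypothetical helper for illustration. `make_driver` is any zero-argument
    callable that returns a web driver, e.g. `webdriver.Firefox`.
    """
    driver = make_driver()
    try:
        return task(driver)
    finally:
        driver.quit()  # runs even if `task` raises


# Usage:
# from selenium import webdriver
#
# def page_title(d):
#     d.get('http://www.easternct.edu')
#     return d.title
#
# print(run_with_driver(webdriver.Firefox, page_title))
```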

### Final comments (headless browsers and screenshots)

It is possible to make a browser *headless* (meaning the browser has no GUI, so you will not see it) by setting *options* as in the code below. You can also save a screenshot of the current page, which is commonly done in testing.

In [None]:
# configure headless browser
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = True
print('configuring headless browser ...')
driver = webdriver.Firefox(options=options)

# go to Google News and take a screenshot
print('opening http://news.google.com ...')
driver.get('http://news.google.com')

print('take a screenshot ...')
driver.save_screenshot('google_news.png')

# close the browser
print('close the browser...')
driver.close()

print('done!')