# Selenium Tutorial

Selenium is a suite of tools and libraries for automating web browser interactions, including web scraping. Browser automation covers website testing, searching, gathering data, and any other interaction you have with a website. For example, you might use Selenium to test features of a website you build, or to help gather data or papers for a literature review. The Selenium WebDriver contains the most useful features for us.

<img src="./types-of-selenium.png" alt="types of selenium" width="700" class="center"/>

## Capabilities

Selenium is a popular choice for web browser automation because of its power and flexibility. Selenium:
-   is free and open source
-   supports multiple languages (Python, Java, C#, Ruby, etc.)
-   works on multiple operating systems (Mac, Windows, Linux)
-   works on multiple web browsers (Chrome, Firefox, Internet Explorer)

## Installation
1) Run this in the terminal: `pip install selenium` 
    (Use `pip3` on macOS)

2) Import `selenium` at the beginning of your project

### Web Driver Installation

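1) Download ChromeDriver (or the driver for whichever browser you would like to use)
    - Check which version of Chrome you are using and download the matching driver version
    - Select the download for your OS
2) Set the path to wherever you saved the WebDriver download
    - `path = r"C:\Users\kiran\Desktop\eds217\chromedriver.exe"`   
    - Note for Windows users: the `r` prefix makes this a raw string, so Python does not treat the backslashes in the path as escape characters. Mac paths use forward slashes, so Mac users don't need the prefix.
3) Import WebDriver using `from selenium import webdriver`
4) Initialize WebDriver: `driver = webdriver.Chrome(path)`. This form works in older Selenium releases; Selenium 4.10 and later no longer accept the path directly, so use `from selenium.webdriver.chrome.service import Service` and `driver = webdriver.Chrome(service=Service(path))` instead.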

**Note: make sure to use the path on your own local machine**
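
To see why the form of the path string matters on Windows, here is a minimal sketch. The path `C:\temp\new_driver.exe` is a hypothetical example chosen because `\t` and `\n` are escape sequences; it is not a location the tutorial requires.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows path, used only for illustration.
plain = "C:\temp\new_driver.exe"   # \t becomes a tab, \n becomes a newline
raw = r"C:\temp\new_driver.exe"    # the r prefix keeps the backslashes literally

print(repr(plain))  # shows the tab and newline that crept into the string
print(repr(raw))    # shows the path as it was typed

# Forward slashes are another way to avoid the problem:
# pathlib treats them as Windows separators for a Windows path.
same = PureWindowsPath("C:/temp/new_driver.exe") == PureWindowsPath(raw)
print(same)
```

Either the raw string or the forward-slash version should be safe to store in `path`; the plain string is the one to avoid.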

*Check out the troubleshooting tips at the bottom of this notebook if you run into any errors during the installation and setup*

## Tutorial

After you've successfully installed chromedriver and imported selenium and webdriver into your environment, you're ready to get started with some browser automation! Follow the steps below to see how you can open a browser window, access html elements to click or search, and quit out of the browser. 

### Import
Before using the functions below, import a few classes and modules.

**Import the By class to tell Selenium how to locate an element (for example by ID, class name, or XPath)**
`from selenium.webdriver.common.by import By`

**Import the Keys class to send special keys, such as Return, after typing text like search keywords or login info**
`from selenium.webdriver.common.keys import Keys`

**Import WebDriverWait, expected_conditions, and time so that Selenium runs steps in the right order and doesn't error out because your script runs faster than the browser**
`from selenium.webdriver.support.ui import WebDriverWait`
`from selenium.webdriver.support import expected_conditions as EC`
`import time`

Run the cell below:

In [None]:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

### Open an automated web browser window

Now that you've imported everything you need, you're ready to open a browser window with Selenium. 
Run the cell below:

In [None]:
# Access whatever web page you want by pasting the URL in quotes
driver.get("https://www.dataone.org/")

# Check which page you're on by printing the title of the webpage
print(driver.title)

# Now access the search bar by ID, type a query, and press Return
search = driver.find_element(By.ID, "search-term")
search.send_keys("pisaster ochraceus")
search.send_keys(Keys.RETURN)

### Access elements

You're ready to start interacting with the HTML elements on the web page. Use the By class to locate HTML elements by ID, name, or class. If possible, use ID to select a single element because an ID is meant to be a unique identifier on an HTML page.

The code below shows you how you can open a browser window, and search for papers. In this example, we're looking for sea stars! But feel free to use any keywords you want.

The `WebDriverWait` call gives the page up to ten seconds to load the results element before Selenium looks for the elements inside it and loops through them. The try, finally structure makes sure the browser quits even if something goes wrong along the way.

In [None]:
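try:
    # Wait up to ten seconds for the results container to appear
    results = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "results"))
    )

    # Collect every <cite> element inside the results container
    papers = results.find_elements(By.TAG_NAME, "cite")
    for paper in papers:
        # Look up the title within this paper and print its text
        title = paper.find_element(By.CLASS_NAME, "title")
        print(title.text)
finally:
    # Close the browser whether or not the search succeeded
    driver.quit()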
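
To make the waiting behavior more concrete, here is a minimal pure-Python sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: `wait_until` and `fake_page_ready` are hypothetical names, and the "page" is simulated by a counter so the example runs without a browser.

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)

# Stand-in for a page whose results element only shows up on the third check.
checks = {"count": 0}

def fake_page_ready():
    checks["count"] += 1
    return "results" if checks["count"] >= 3 else None

found = wait_until(fake_page_ready, timeout=2.0, poll_interval=0.01)
print(found, checks["count"])
```

The useful property is that the wait returns as soon as the condition is met instead of always pausing for the full timeout, which is why an explicit wait is usually a better fit than a fixed `time.sleep(10)`.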

This was great, but what if you want to do more? Check out our *advanced* tutorial in this repo:

Search and take screenshots by running the code in: [search-frogs](/search-frogs.ipynb)

Search and save your results to a csv file by running the code in: [storing-frogs](/storing-frogs.ipynb)

When running the code in the notebook, make sure that the driver path at the top points to where you installed chromedriver on your computer. 
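
As a rough idea of what saving results to a csv file can look like, here is a minimal sketch using Python's built-in `csv` module. It is not the code from the storing-frogs notebook: the `save_titles` helper, the output file name, and the placeholder titles are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Placeholder titles standing in for text scraped with Selenium (not real results).
titles = ["Example title one", "Example title two"]

def save_titles(titles, csv_path):
    """Write one title per row under a single 'title' header."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title"])
        writer.writerows([t] for t in titles)

# Write to the system's temporary folder so the example leaves your project untouched.
out_path = Path(tempfile.gettempdir()) / "titles_example.csv"
save_titles(titles, out_path)
print(out_path.read_text(encoding="utf-8"))
```

In a real script, the list would be filled with the `.text` of each element you found before the browser is closed.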


## Troubleshooting
When using Selenium for the first time, we ran into a few common issues that are worth knowing about.

### Pip issues
Make sure to install selenium into the environment that your project runs in. If you install it only in the base environment on your local machine, importing it from your project environment may fail.
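
One way to check which environment you are actually in is to ask Python directly. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, not part of Selenium.

```python
import importlib.util
import sys

def is_installed(package_name):
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None

# The interpreter path tells you which environment this notebook or script runs in.
print("Interpreter:", sys.executable)
print("selenium importable:", is_installed("selenium"))
```

If the second line prints `False`, running `python -m pip install selenium` with that same interpreter should install the package into the environment you are working in.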

### Driver verification
When running:
`from selenium import webdriver`
`driver = webdriver.Chrome(PATH)`
you may get an error that says you cannot open or run the driver because the source cannot be verified. Or something along the lines of:

<img src="./error_msg.png" alt="error message" width="350" class="middle"/>

To resolve this issue on a Mac, run the following command in the terminal to remove the restriction:
`xattr -d com.apple.quarantine chromedriver`

Make sure to run this from within the same directory where you saved chromedriver.

### Web issues
Because websites are constantly being updated and changed, and your script depends on the arrangement and elements of the website you are interacting with, your code could break. This makes reproducibility tougher, but thoroughly commenting your code can make fixing it easier.

### Output errors
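You may find it helpful to run your code headless so that no browser window opens while it runs. To do this, run:
`from selenium.webdriver.chrome.options import Options`
`options = Options()`
`options.add_argument("--headless")`
`options.add_argument("--window-size=1920,1200")`
`driver = webdriver.Chrome(options=options, executable_path=PATH)`
In Selenium 4.10 and later, `executable_path` is no longer accepted; pass `service=Service(PATH)` instead, after `from selenium.webdriver.chrome.service import Service`.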


## More resources

- About Selenium: https://www.selenium.dev/about/
- More on Selenium: https://pypi.org/project/selenium/ 
- Selenium Github: https://github.com/SeleniumHQ/Selenium 
- Tech with Tim Tutorials: https://www.youtube.com/watch?v=Xjv1sY630Uc&list=PLzMcBGfZo4-n40rB1XaJ0ak1bemvlqumQ 
- Website tutorials: https://www.scrapingbee.com/blog/selenium-python/ 
- For general browser automation info and other links: https://github.com/angrykoala/awesome-browser-automation 
- Challenges of using Selenium for web automation: https://www.browserstack.com/guide/top-limitations-of-selenium-automation 
- Webdriver examples: https://www.lambdatest.com/blog/selenium-webdriver-tutorial-with-examples/
- List of webdriver capabilities (all browsers): https://www.selenium.dev/documentation/webdriver/capabilities/shared/  
- Bot tutorial: https://www.lambdatest.com/blog/automated-web-bot-with-selenium-python/ 
