##### STA 220 Data & Web Technologies for Data Analysis

# Lecture 11 - 02/10/26, Selenium

### Announcements

### Today's topics
 - Selenium Browser

### Resources
 - [WhereTheISS](https://wheretheiss.at)

# Selenium WebDriver

## Preparations

Before diving into Selenium’s features, let’s **install Chrome**, **configure ChromeDriver**, and install the Python **selenium** package.

Alternatively, you may also use the geckodriver for Firefox instead. See [here](https://www.selenium.dev/documentation/webdriver/browsers/firefox/) for more details about using Firefox through the geckodriver.

### Install the Selenium Library

In [2]:
!pip install selenium



### Install the Browser Driver

There are two ways to set up a browser driver for Chrome:

1. **Manual Installation**  
   - Check your local **Chrome** version by typing `chrome://version` in Chrome’s address bar or via “Help → About Google Chrome.”  
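   - Download the matching **ChromeDriver**. For Chrome 115 and newer, drivers are published on the Chrome for Testing page,  
     <https://googlechromelabs.github.io/chrome-for-testing/>; older versions remain at  
     <https://chromedriver.storage.googleapis.com/index.html>  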
   - Either add the `chromedriver.exe` to your system’s PATH (e.g., drop it into Python’s `Scripts/` folder) or specify the absolute path directly in your code.

2. **Automatic Installation**  
   - Use a third-party library such as **webdriver_manager** to install the appropriate driver automatically:

In [3]:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

ChromeDriverManager().install() # detects your Chrome version, downloads the matching driver, and places it in your local cache.

'/Users/nicolai/.wdm/drivers/chromedriver/mac64/143.0.7499.192/chromedriver-mac-arm64/chromedriver'

With this setup in place, we can start using Selenium.

## Basic Usage

This section covers **initializing the browser**, visiting pages, setting the **browser window size**, **refreshing**, **forward/back** navigation, etc.

### Initialize a Browser Object

In [4]:
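# Option A: Direct initialization (ChromeDriver on PATH, or located by Selenium Manager in Selenium 4.6+)
driver = webdriver.Chrome()

# Option B: Specify the absolute path to chromedriver via a Service object
# from selenium.webdriver.chrome.service import Service
# path = r'C:\path\to\chromedriver.exe'  # Windows example
# driver = webdriver.Chrome(service=Service(executable_path=path))

driver.quit()  # Ends the session and closes the browser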

### Access a Page

In [5]:
import time

url = 'https://statistics.ucdavis.edu/'

with webdriver.Chrome() as driver:
    driver.get(url)
    time.sleep(3)
# Leaving the with block calls driver.quit(), which closes the browser

### Headless Browser

Having a browser doing things might be distracting. Let's use the headless mode!

In [6]:
option = webdriver.ChromeOptions()
option.add_argument("headless") # no browser window visible

with webdriver.Chrome(options=option) as driver:
    driver.get(url)
    time.sleep(3)

### Screenshot

Headless mode is convenient in practice, but you cannot see what the browser is doing. Taking a screenshot helps, e.g., when an error occurs:

In [1]:
with webdriver.Chrome(options=option) as driver:
    driver.get(url)
    driver.get_screenshot_as_file('../output/screenshot_ucd.png')


![Screenshot](../output/screenshot_ucd.png)

Well, that's only one quarter of the page. Seems like the browser window is quite small, eh?
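To take a screenshot only when something goes wrong, the page logic can be wrapped in a `try`/`except`. The sketch below is one possible pattern, not a Selenium feature: `run_with_screenshot`, `task`, and the default `path` are hypothetical names, and only `get_screenshot_as_file` is taken from the cell above.

```python
def run_with_screenshot(driver, task, path="error.png"):
    """Run task(driver); on any exception, save a screenshot and re-raise.

    `run_with_screenshot`, `task`, and the default `path` are illustrative
    names, not part of Selenium.
    """
    try:
        return task(driver)
    except Exception:
        # Record what the (possibly headless) browser showed when it failed.
        driver.get_screenshot_as_file(path)
        raise
```

Inside a `with webdriver.Chrome(options=option) as driver:` block, a call such as `run_with_screenshot(driver, lambda d: d.get(url))` would then leave an image behind only if the task fails.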

### Window Size

In [8]:
with webdriver.Chrome() as driver:
    driver.maximize_window()          # Fullscreen
    driver.get(url)
#    driver.get_screenshot_as_file('../output/screenshot_ucd_max.png')
    time.sleep(2)

    driver.set_window_size(500, 500)  # 500 x 500
    time.sleep(2)

    driver.set_window_size(1000, 800) # 1000 x 800
    driver.get_screenshot_as_file('../output/screenshot_ucd_large.png')
    time.sleep(2)

![Screenshot](../output/screenshot_ucd_large.png)

### Page refresh

In [7]:
url_dynamic = 'https://the-internet.herokuapp.com/dynamic_content'

with webdriver.Chrome() as driver:
    driver.maximize_window()          # Fullscreen
    driver.get(url_dynamic)
    time.sleep(2)
    driver.refresh()
    print('Page refreshed.')
    time.sleep(2)

Page refreshed.


### Forward/Back Navigation

In [8]:
with webdriver.Chrome() as driver:
    driver.get(url_dynamic)
    time.sleep(2)
    driver.get(url)
    driver.back() # go back to the-internet
    time.sleep(2)
    driver.forward() # go forward to ucd
    time.sleep(2)

## Page Properties
Once Selenium opens a page, you can retrieve basic info:

In [9]:
with webdriver.Chrome() as driver:
    driver.get(url)
    print(driver.title)       # page title
    print(driver.current_url) # current URL
    print(driver.name)        # browser name
    html = driver.page_source # raw HTML source

UC Davis Statistics
https://statistics.ucdavis.edu/
chrome


In [10]:
html[:100]

'<html lang="en" dir="ltr" prefix="og: https://ogp.me/ns#" class=" js" style="--page-width: 1185px; -'

In [11]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
links = soup.find_all('link')

In [12]:
for link in links:
    print(link.get('href'))

https://statistics.ucdavis.edu/
https://statistics.ucdavis.edu/
https://cdn.skypack.dev/pin/lit@v2.0.0-rc.3-RFrIXWBysJfo8GpKQ4Gc/mode=imports,min/optimized/lit.js
/profiles/sitefarm/themes/sitefarm_one/dist/primary-nav.js
https://campusfont.ucdavis.edu/proxima-nova/proximanova_bold_macroman/proximanova-bold-webfont.woff2
https://campusfont.ucdavis.edu/proxima-nova/proximanova_regular_macroman/proximanova-regular-webfont.woff2
https://campusfont.ucdavis.edu/proxima-nova/proximanova_extrabold_macroman/proximanova-extrabold-webfont.woff2
https://use.fontawesome.com/releases/v6.7.2/webfonts/fa-solid-900.woff2
/sites/g/files/dgvnsk5166/files/stats%20chart%20favicon_0.png
/sites/g/files/dgvnsk5166/files/css/css_-9UtQ61-omgDChGy_Tk7vD_nqVrNQ5YwZddpuDPyFHs.css?delta=0&language=en&theme=sitefarm_one&include=eJxljs0OgzAMg18I0dOeJyqtYRFNgpoCYk-_vwtjF0v-bMlOMzI3qzdayjqx0hLTTKwZ2mgoluZwNl2yiqBWJRZ-oHNuGGMVMkUYol_IaNriDje5BAL3OMF_qVq-9M6mc0scC8nrcaTCOnv4R3274712eIN8L621EGRADhW-mDpvIG9Hee1vjN3DR3uxvB
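Some of the `href` values printed above are relative (they start with `/`). As a small sketch, the standard library's `urljoin` can resolve them against the page URL. The sample paths are copied from the output above, and `base` is assumed to be the `url` that was visited.

```python
from urllib.parse import urljoin

base = "https://statistics.ucdavis.edu/"
hrefs = [
    "https://statistics.ucdavis.edu/",
    "/profiles/sitefarm/themes/sitefarm_one/dist/primary-nav.js",
    "/sites/g/files/dgvnsk5166/files/stats%20chart%20favicon_0.png",
]

# urljoin leaves absolute URLs untouched and resolves relative ones against base.
absolute = [urljoin(base, h) for h in hrefs]
for link in absolute:
    print(link)
```

By contrast, `get_attribute('href')` on a live Selenium element typically returns the already resolved, absolute URL.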

## Page Elements

When using Selenium, a **key** step is to locate elements for input, clicking, etc. Below are common methods.

### Locating Page Elements

The syntax is always

```python
driver.find_element(By.X, "value")
```

where `X` is the locator strategy (`ID`, `NAME`, `CLASS_NAME`, `TAG_NAME`, ...) and `"value"` is what to look for, e.g. the element's id.

#### Locate by ID

In [15]:
from selenium.webdriver.common.by import By
url_ab = 'https://the-internet.herokuapp.com/abtest'

with webdriver.Chrome() as driver:
    driver.get(url_ab)
    content = driver.find_element(By.ID, 'content')
    time.sleep(3)
    text = content.text

In [14]:
print(text)

A/B Test Control
Also known as split testing. This is a way in which businesses are able to simultaneously test and learn different versions of a page to see which text and/or functionality works best towards a desired outcome (e.g. a user action such as a click-through).


In [20]:
url_test = 'https://automationintesting.com/selenium/testpage/'

with webdriver.Chrome() as driver:
    driver.get(url_test)
    driver.maximize_window()          # Fullscreen
    element = driver.find_element(By.ID, 'firstname')
    driver.execute_script("arguments[0].scrollIntoView();", element) # scroll until we can see the element
    time.sleep(2)
    element.send_keys('Aggies')
    time.sleep(5)

#### Locate by Name

In [22]:
url_test = 'https://automationintesting.com/selenium/testpage/'

with webdriver.Chrome() as driver:
    driver.get(url_test)
    driver.maximize_window()          # Fullscreen
    element = driver.find_element(By.NAME, 'colour')
    driver.execute_script("window.scrollBy(0, 500);")  # scroll down by 500 pixels
    time.sleep(2)
    element.click()
    time.sleep(2)

#### Locate by Class Name

In [23]:
url_test = 'https://automationintesting.com/selenium/testpage/'

with webdriver.Chrome() as driver:
    driver.get(url_test)
    element = driver.find_element(By.CLASS_NAME, 'info-title')
    title = element.text
    time.sleep(2)

print(title)

SELENIUM TEST PAGE


#### Locate by Tag Name and Other Strategies

All locator strategies follow the same pattern:

```python
driver.find_element(By.ID, 'name')
driver.find_element(By.NAME, 'name')
driver.find_element(By.CLASS_NAME, 'name')
driver.find_element(By.TAG_NAME, 'name')
driver.find_element(By.LINK_TEXT, 'name')
driver.find_element(By.PARTIAL_LINK_TEXT, 'name')
driver.find_element(By.XPATH, '//*[@id="name"]')
driver.find_element(By.CSS_SELECTOR, '#name')
```
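Several of these strategies overlap: a lookup by ID, class, name attribute, or tag can also be written as a CSS selector. The helper below is a hypothetical illustration (`as_css` is not part of Selenium) and assumes simple values without characters that need CSS escaping. Its dictionary keys are the string values behind `By.ID`, `By.CLASS_NAME`, `By.NAME`, and `By.TAG_NAME`.

```python
def as_css(strategy, value):
    """Translate a simple ID / class / name / tag lookup into a CSS selector."""
    templates = {
        "id": "#{}",
        "class name": ".{}",
        "name": '[name="{}"]',
        "tag name": "{}",
    }
    if strategy not in templates:
        raise ValueError(f"no simple CSS equivalent for {strategy!r}")
    return templates[strategy].format(value)


print(as_css("id", "firstname"))           # #firstname
print(as_css("class name", "info-title"))  # .info-title
print(as_css("name", "colour"))            # [name="colour"]
```

Under these assumptions, `driver.find_element(By.CSS_SELECTOR, as_css("id", "firstname"))` should locate the same element as `driver.find_element(By.ID, "firstname")`.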

Note that the older helper methods such as `driver.find_element_by_css_selector('#kw')` were deprecated in Selenium 4 and removed in version 4.3. Use `driver.find_element(By.CSS_SELECTOR, '#kw')` instead; calling one of the old methods now raises an `AttributeError`:

In [28]:
url_test = 'https://automationintesting.com/selenium/testpage/'

with webdriver.Chrome() as driver:
    driver.get(url_test)
    element = driver.find_element_by_name('info-title')
    title = element.text
    time.sleep(2)

print(title)

AttributeError: 'WebDriver' object has no attribute 'find_element_by_name'

### Multiple Elements

If there are multiple matches, use `driver.find_elements(By.X, "value")` to get a **list** of all matching elements; `find_element` returns only the first match.
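As a sketch of the plural form: `find_elements` returns a list, which is simply empty when nothing matches, whereas `find_element` raises `NoSuchElementException`. The helper name `element_texts` is an assumption for illustration; it works on any object that provides `find_elements`, such as a driver or a previously located element.

```python
def element_texts(root, by, value):
    """Return the non-empty visible texts of all elements matching the locator.

    `root` can be a WebDriver or a WebElement; both provide find_elements().
    """
    return [el.text for el in root.find_elements(by, value) if el.text]
```

With the imports from the cells above, `element_texts(driver, By.TAG_NAME, 'a')` would list the visible link texts on the current page.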

## Getting Element Attributes

### `get_attribute()`

For example, retrieving the `src` of an `<img>` element:

In [32]:
url = 'https://statistics.ucdavis.edu/'

with webdriver.Chrome() as driver:
    driver.get(url)
    time.sleep(1)
    element = driver.find_element(By.XPATH, '//*[@id="block-hbwelcometotheucdavisdepartmentofstatistics"]/div/img')
    img_src = element.get_attribute('src')
    time.sleep(3)

print(img_src)

https://statistics.ucdavis.edu/sites/g/files/dgvnsk5166/files/styles/sf_title_banner/public/media/images/MSB%20Sept%202021.jpg?h=aba4661c&itok=yKUUi6__


### Getting Text

In [37]:
url = 'https://statistics.ucdavis.edu/'

with webdriver.Chrome() as driver:
    driver.get(url)
    time.sleep(1)
    elements = driver.find_elements(By.XPATH, '//a')
    for el in elements:
        if el.text:
            print(el.text + ": " + el.get_attribute('href'))
    time.sleep(3)

Skip to main content: https://statistics.ucdavis.edu/#main-content
Home: https://statistics.ucdavis.edu/
About: https://statistics.ucdavis.edu/about-us
Courses: https://statistics.ucdavis.edu/courses
Seminars/Events: https://statistics.ucdavis.edu/seminars
Undergraduate: https://statistics.ucdavis.edu/undergrad
Graduate: https://statistics.ucdavis.edu/grad
Stat Lab: https://statistics.ucdavis.edu/stat-lab
Graduate Programs: https://statistics.ucdavis.edu/grad
Learn More: https://statistics.ucdavis.edu/grad/phd
How to Apply: https://statistics.ucdavis.edu/grad/admissions
Learn More: https://statistics.ucdavis.edu/grad/ms
How to Apply: https://statistics.ucdavis.edu/grad/admissions
Undergraduate Programs: https://statistics.ucdavis.edu/undergrad
Learn More: https://statistics.ucdavis.edu/undergrad/data-science/bs-foundations-track
Apply to UC Davis: https://www.ucdavis.edu/admissions/undergraduate
Learn More: https://statistics.ucdavis.edu/undergrad/major-programs#stamajor
Apply to UC Da

#### Other Attributes


In [40]:
url = 'https://statistics.ucdavis.edu/'

with webdriver.Chrome() as driver:
    driver.get(url)
    time.sleep(1)
    logo = driver.find_element(By.XPATH, '//*[@id="block-hbwelcometotheucdavisdepartmentofstatistics"]/div/img')

    print(logo.id)
    print(logo.location)
    print(logo.tag_name)
    print(logo.size)

    time.sleep(3)

f.CDD8C1433E1ABB543E75195AB974DC34.d.452C1D88C15EDBC64FD9D68F2FBB5539.e.460
{'x': 47, 'y': 265}
img
{'height': 251, 'width': 1090}
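`location` gives the element's top-left corner and `size` its width and height, both in pixels. As a small worked example, the center of the element can be computed from the two dictionaries printed above (the function name `center` is illustrative):

```python
def center(location, size):
    """Return the (x, y) center of an element from its location and size dicts."""
    return (location["x"] + size["width"] / 2,
            location["y"] + size["height"] / 2)


print(center({"x": 47, "y": 265}, {"height": 251, "width": 1090}))
# (592.0, 390.5)
```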


## Page Interaction

We have already seen some interactions: 
- scrolling
- button clicks
- writing text

In [52]:
url_test = 'https://automationintesting.com/selenium/testpage/'

driver = webdriver.Chrome()
driver.get(url_test)

In [53]:
element = driver.find_element(By.ID, 'firstname')
driver.execute_script("arguments[0].scrollIntoView();", element)

In [54]:
element.send_keys('Aggies') # write text

In [55]:
element.clear() # clear field

In [56]:
element.send_keys('NewAggies')

In [57]:
driver.find_element(By.ID, 'submitbutton').click() # press button

In [58]:
driver.find_element(By.ID, 'submitbutton').submit() # submit the enclosing form (like pressing Enter)

In [67]:
driver.find_element(By.ID, 'gender').click()
driver.find_element(By.XPATH, "//option[@value='my_business']").click()
driver.find_element(By.ID, 'firstname').send_keys('Aggies')  # send_keys returns None, so there is nothing to assign

In [68]:
continents = driver.find_element(By.ID, "continent")
# Assuming 'continent' is a <select>, take its first <option>
option_element = continents.find_element(By.TAG_NAME, "option")

# Examples of what you can do:
print(option_element.text)          # Get the visible text
print(option_element.is_selected()) # Check if it is currently picked

In [None]:
driver.quit()

## Concluding Remarks

- **Selenium** is powerful for automating and scraping **dynamic** or **JavaScript-heavy** pages.  
- **Locating elements** can be done via ID, name, class, tag, link text, partial link text, XPath, or CSS.  
- **Mouse** and **keyboard** actions can simulate real user behavior.  
- Combine Selenium with **WebDriverWait** for reliability on sites with asynchronous loading.  
- Don’t forget best practices like **closing** the browser (`driver.quit()`) and being mindful of rate limiting and server load.
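The cells in this lecture pause with fixed `time.sleep()` calls. An explicit wait instead polls a condition until it holds or a timeout expires; in Selenium this is the role of `WebDriverWait(driver, timeout).until(condition)`. The plain-Python sketch below only mimics that idea so that it runs without a browser; `wait_until` and its parameters are hypothetical names, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # short pause between attempts instead of one long sleep
```

With a real driver, the counterpart would be along the lines of `WebDriverWait(driver, 10).until(lambda d: d.find_element(By.ID, 'content'))`, with `WebDriverWait` imported from `selenium.webdriver.support.ui`.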

For more advanced examples, refer to the official Selenium documentation.