# Selenium Introduction: Examples

This notebook provides an introduction to the **`Selenium`** library.  

**`Selenium`** is widely used to automate and control web browsers, simulating human interactions such as clicking, typing, and navigating. Unlike libraries such as **`requests`**, which fetch the HTML a server returns but do not execute JavaScript, Selenium drives a real browser and can therefore scrape content that is loaded dynamically.

The notebook is structured as follows:

1. **Introduction and Setup**
2. **Browser Control and Navigation**
3. **Finding Elements**
4. **Interacting with Elements**
5. **Code Example**


For this notebook, we will use **[scrapethissite.com](https://www.scrapethissite.com/)**, a website specifically designed for legal scraping practice.

#

## 1. Introduction and Setup

To make **`Selenium`** run on your computer, you need to fulfill a few requirements.

Start by importing the library:

```python
import selenium
```

In [15]:
import selenium
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

Selenium cannot communicate with the browser directly, so it needs a **browser-specific driver** that implements the standardized **WebDriver protocol**.
The driver acts as a bridge between **`Selenium`** and the **browser** by translating the Python commands from **`Selenium`** into browser actions (e.g. click, navigate, type).
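
To make the "bridge" idea more concrete: under the W3C WebDriver protocol, each command travels to the driver as an HTTP request with a JSON body. The sketch below only builds such requests as plain Python data and sends nothing. The helper names `navigate_request` and `find_element_request` and the session id `abc123` are made up for illustration, and Selenium's real implementation handles far more detail than this.

```python
import json

def navigate_request(session_id, url):
    """Describe the request behind driver.get(url) (W3C 'Navigate To')."""
    return ("POST", f"/session/{session_id}/url", json.dumps({"url": url}))

def find_element_request(session_id, strategy, value):
    """Describe the request behind driver.find_element(...) (W3C 'Find Element')."""
    body = {"using": strategy, "value": value}
    return ("POST", f"/session/{session_id}/element", json.dumps(body))

# "abc123" is a placeholder session id, not a real one.
method, path, body = navigate_request("abc123", "https://www.scrapethissite.com")
print(method, path, body)

# A CSS selector lookup expressed as protocol data.
print(find_element_request("abc123", "css selector", "div.content > p"))
```

The driver receives such a request, performs the action in the browser, and answers with a JSON response that Selenium turns back into Python objects.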


You can use the **Chrome browser** to run Selenium.  

- Chrome requires a **version-specific driver** called **ChromeDriver**.  
- Download ChromeDriver here: [https://chromedriver.chromium.org/downloads](https://chromedriver.chromium.org/downloads)  

To find your Chrome version: Open Chrome → Menu (⋮) → Help → About Google Chrome
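
When you download ChromeDriver manually, its version has to correspond to the installed Chrome version. Below is a small sketch of such a check, assuming the two versions should agree on their first three components (major.minor.build). The helper name `driver_matches` and the version strings are made up for illustration.

```python
def driver_matches(chrome_version, driver_version, parts=3):
    """Return True if both version strings share their first `parts` components."""
    chrome = chrome_version.split(".")[:parts]
    driver = driver_version.split(".")[:parts]
    return chrome == driver

# Illustrative version strings, not tied to any real release.
print(driver_matches("120.0.1000.50", "120.0.1000.12"))  # same major.minor.build
print(driver_matches("120.0.1000.50", "119.0.900.12"))   # different major version
```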

In [20]:
# With the webdriver_manager library you can skip the manual ChromeDriver download:
# driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# This notebook uses Safari instead (macOS only; its driver ships with the system):
driver = webdriver.Safari()

In [22]:
# Test if your driver works:
driver.get("https://www.scrapethissite.com")
print("Page title:", driver.title)

Page title: Scrape This Site | A public sandbox for learning web scraping


After this step, your browser should have opened automatically and loaded the Scrape This Site page.

#

## 2. Browser Control and Navigation

In this section, you will learn how to **navigate between pages** and **open/close the browser properly** using Selenium.  
The `driver` object in your code represents the browser and allows you to perform actions such as loading pages, refreshing, or moving forward/backward.  
The following commands are the most important for basic browser control:

**Navigation**
- `driver.get(url)` - open the page at the given URL
- `driver.refresh()` - reload the page
- `driver.back()` - navigate one page back
- `driver.forward()` - navigate one page forward

**Page Information**
- `driver.title` - current page title (as seen above)
- `driver.current_url` - current URL
- `driver.page_source` - complete HTML source of the current page

**Close the browser**
- `driver.quit()` - close all browser windows and end the driver session
- `driver.close()` - close only the current window or tab
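
A forgotten `driver.quit()` leaves a browser process running, so it can help to guarantee the call even when a cell raises an error. Here is a minimal sketch, assuming a hypothetical helper `managed_driver` that is not part of Selenium and accepts any zero-argument factory such as `webdriver.Chrome`:

```python
from contextlib import contextmanager

@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions alike.
        driver.quit()

# Intended usage (assumes selenium is installed and a browser driver is available):
# from selenium import webdriver
# with managed_driver(webdriver.Chrome) as driver:
#     driver.get("https://www.scrapethissite.com")
#     print(driver.title)
```

The helper only relies on the object having a `quit()` method, so it works the same way for Chrome, Safari, or any other driver class.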

In [24]:
# Print current URL:
print(driver.current_url)

# Print page source (prints the complete HTML code of Scrape this Site):
# print(driver.page_source)

https://www.scrapethissite.com/


#

## 3. Finding Elements

Browser navigation alone is not enough to automate a page.  
To make **`Selenium`** truly useful, we first need to **locate elements** such as buttons, links, and form fields. Only then can we **interact with the page** and **extract the data** we want.  

**`Selenium`** interacts with elements via the **DOM** (Document Object Model), the browser's tree representation of the page's HTML.  
This allows us to access elements by their tag or by **attributes** such as ID, class, or name.

**The most common locator strategies are:**
- **ID** - Locating by unique element ID
- **Name** - Locating by HTML name attribute
- **Class Name** - Locating by HTML class attribute
- **Tag Name** - Locating by HTML tag name, e.g. `div`, `p`
- **CSS Selector** - Locating by CSS selector
- **XPath** - Locating by XPath expression

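**This translates into the following Python calls** (after `from selenium.webdriver.common.by import By`):
- **ID** - `driver.find_element(By.ID, "username")`
- **Name** - `driver.find_element(By.NAME, "q")`
- **Class Name** - `driver.find_element(By.CLASS_NAME, "quote")`
- **Tag Name** - `driver.find_elements(By.TAG_NAME, "p")`
- **CSS Selector** - `driver.find_element(By.CSS_SELECTOR, "div.content > p")`
- **XPath** - `driver.find_element(By.XPATH, "//div[@class='quote']")`

`find_element` returns the first match and raises `NoSuchElementException` if nothing matches. `find_elements` returns a list of all matches, which may be empty.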
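
To get a feel for what these locators select without starting a browser, the following sketch applies comparable queries to a small, made-up HTML snippet using the standard library's `xml.etree.ElementTree`. This is only an approximation: ElementTree supports a subset of XPath and needs well-formed markup, whereas Selenium evaluates locators against the live DOM in the browser.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed snippet standing in for a real page.
snippet = """
<html><body>
  <div class="content"><p id="intro">Hello</p><p>World</p></div>
  <div class="quote">To be or not to be</div>
</body></html>
"""
root = ET.fromstring(snippet.strip())

# Comparable to By.XPATH with "//div[@class='quote']" (ElementTree needs the leading dot).
quotes = root.findall(".//div[@class='quote']")
print([q.text for q in quotes])

# Comparable to By.TAG_NAME with "p": every <p> element, in document order.
paragraphs = root.findall(".//p")
print([p.text for p in paragraphs])

# Comparable to By.ID with "intro": match on the id attribute.
intro = root.find(".//*[@id='intro']")
print(intro.tag, intro.text)
```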

To locate elements on a webpage, use the **browser inspector**: right-click an element and choose **Inspect**. This lets you examine the HTML structure and identify the tag and attributes such as `id`, `class`, or `name` to use in Selenium.

######

#### 3.1 Task

Try to find the `id` of the **Sandbox** tab on the **Scrape This Site** page with the help of **Inspect**.

In [26]:
# Import By to be able to choose a locator strategy:
from selenium.webdriver.common.by import By

# Check if your id returns an element:
# element = driver.find_element(By.ID, "")

# Check if you have the right element:
# print("Visible text:", element.text + ", and underlying id:", element.get_attribute("id"))

#

## 4. Interacting with Elements

Once you have located the right element in the HTML and found it using Python,  
you still need to **interact with it** to extract data or perform actions.  

This section covers the actions you will perform most often on elements in Selenium:

**Most common actions**
- **Click** - `element.click()`
- **Type / Input** - `element.send_keys("Hello World")`
- **Clear Input** - `element.clear()`
- **Submit** - `element.submit()`
- **Check / Uncheck an element** - `if not element.is_selected(): element.click()`
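
The check/uncheck pattern from the last bullet can be wrapped in a small helper. Here is a sketch, assuming a hypothetical function `set_checked` (not a Selenium API) that receives an element returned by `find_element` and relies only on its `is_selected()` and `click()` methods:

```python
def set_checked(element, checked=True):
    """Click a checkbox-like element only if its state differs from `checked`.

    Returns the element's state after the (possible) click.
    """
    if element.is_selected() != checked:
        element.click()
    return element.is_selected()

# Intended usage with a live driver (the id below is a hypothetical placeholder):
# checkbox = driver.find_element(By.ID, "some-checkbox-id")
# set_checked(checkbox, True)   # tick the box if it is not ticked yet
# set_checked(checkbox, False)  # untick it again
```

Clicking only when the state differs avoids accidentally toggling a box that is already in the desired state.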

The goal of this section is to navigate to the **Sandbox** page and print the new **URL**.

In [69]:
# Locate the Sandbox tab by its id:
element = driver.find_element(By.ID, "nav-sandbox")

# Click the element:
element.click()

# Check the new url:
print(driver.current_url)

https://www.scrapethissite.com/pages/


#

## 5. Code Example

The following code example combines the previous steps to search for a specific hockey team and load the result table into a **pandas DataFrame**:

In [77]:
import pandas as pd
from io import StringIO

# Locate the Sandbox tab by its id:
element = driver.find_element(By.ID, "nav-sandbox")

# Click the element:
element.click()

# Check the new url:
print(driver.current_url)

sleep(1)
# Navigate even further by locating the link to one of the pages:
element_2 = driver.find_element(By.XPATH, "//a[@href='/pages/forms/']")

# Check the second element:
print("Visible text:", element_2.text + ", and underlying link:", element_2.get_attribute("href"))

# The element is correct, so read its href and open that page directly:
href = element_2.get_attribute("href")
driver.get(href)

sleep(1)
# Look for the search field on the page and type a value:
element_input = driver.find_element(By.ID, "q")
element_input.send_keys("Boston Bruins")

# Submit the form that contains the input field:
element_input.submit()

# Give the results page time to load before reading its HTML:
sleep(1)

# Get the raw HTML and parse every <table> in it into a DataFrame:
html_hockey = driver.page_source
tables = pd.read_html(StringIO(html_hockey))
df = tables[0]
print(df.head())

https://www.scrapethissite.com/pages/
Visible text: Hockey Teams: Forms, Searching and Pagination, and underlying link: https://www.scrapethissite.com/pages/forms/
            Team Name  Year  Wins  Losses  OT Losses  Win %  Goals For (GF)  \
0       Boston Bruins  1990    44      24        NaN  0.550             299   
1      Buffalo Sabres  1990    31      30        NaN  0.388             292   
2      Calgary Flames  1990    46      26        NaN  0.575             344   
3  Chicago Blackhawks  1990    49      23        NaN  0.613             284   
4   Detroit Red Wings  1990    34      38        NaN  0.425             273   

   Goals Against (GA)  + / -  
0                 264     35  
1                 278     14  
2                 263     81  
3                 211     73  
4                 298    -25  
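
The table above lists teams other than the Boston Bruins, which suggests that the HTML was read before the search results had finished loading. Fixed pauses such as `sleep(1)` are only a guess at how long a page needs, so polling for a condition is usually more robust. Selenium ships its own `WebDriverWait` class for this purpose. The helper `wait_until` below is a hypothetical stand-in that shows the underlying idea with the standard library only, and the usage comment assumes that the results URL contains the search term as a `q=` query parameter.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.2):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns True if the condition was met in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Intended usage after element_input.submit() (needs the live `driver` from above):
# if wait_until(lambda: "q=" in driver.current_url):
#     html_hockey = driver.page_source
```

A changed URL does not guarantee that the page has finished rendering, so in practice the condition might instead check for an element that only exists on the results page.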
