mm541/selenium-mcp-server

# Selenium MCP Server

Selenium MCP Server is a standalone Python application that exposes a comprehensive suite of Selenium browser automation commands as a **FastMCP** server.

This allows a language model, AI agent, or any other client capable of calling tools to control a web browser, perform complex web scraping, and automate web-based tasks by issuing natural language commands.

## 🚀 Features

- **Browser Lifecycle**: Open and close Chrome, Firefox, or Edge instances.
- **Full Navigation**: `goto`, `back`, `forward`, `refresh`, and URL/title inspection.
- **Element Interaction**: `click`, `type`, `clear`, `get_text`, `get_attribute`.
- **Advanced Actions**: `hover`, `drag_and_drop`, `right_click`, `double_click`, `execute_javascript`.
- **Complex Targets**: Handles alerts, iframes, and Shadow DOM elements.
- **Data Extraction**: Scrape full tables into JSON, list all links, and get page source.
- **Window & Tab Management**: Open, close, and list all tabs.
- **Cookies & Storage**: Full control over browser cookies.
- **Explicit Waits**: Wait for elements to be visible or clickable to handle dynamic pages.

## 🔧 Dependencies and Installation

### Dependencies

This program relies on several third-party Python libraries:

- `fastmcp` – For creating the tool server
- `mcp-json` – For the JSON tool-calling schema with FastMCP
- `selenium` – For browser automation
- `webdriver-manager` – For automatically managing browser drivers (optional with Selenium 4.6+)

Create a `requirements.txt` file with the following content:

```txt
fastmcp
mcp-json
selenium
webdriver-manager
```

### Installation Steps

#### Option 1: Using uv (Recommended)

`uv` is an extremely fast Python package installer and virtual environment manager.

```bash
# 1. Create virtual environment
uv venv

# 2. Activate the environment
# macOS / Linux
source .venv/bin/activate
# Windows
.\.venv\Scripts\activate

# 3. Install dependencies
uv pip install -r requirements.txt
```

#### Option 2: Using venv + pip (Traditional)

```bash
# 1. Create virtual environment
# macOS / Linux
python3 -m venv venv
# Windows
python -m venv venv

# 2. Activate the environment
# macOS / Linux
source venv/bin/activate
# Windows
.\venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
```

### Browser Drivers

Modern versions of Selenium (4.6.0+) include Selenium Manager, which automatically downloads the correct webdriver (chromedriver, geckodriver, etc.) for your locally installed browsers the first time you run the script. No manual driver installation is required.

## ⚡️ Usage

Run the main script to start the server:

```bash
python main.py
```

By default, the server runs all browsers in headless mode (no visible UI). This is configured in the `BrowserManager` class.

Example output:

```txt
Selenium MCP Server Starting...
Selenium MCP Server Running...
Supported Tools:
- open_browser, close_browser, goto_url, click_element, type_text, etc.
```

### Running with a JSON Configuration (for mcp-json)

Add the server to your `mcp-servers.json` configuration:

```json
{
  "mcpServers": {
    "Selenium Automation Server": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "fastmcp",
        "fastmcp",
        "run",
        "path/to/your/selenium_mcp_server.py"
      ],
      "env": {},
      "transport": "stdio"
    }
  }
}
```

Replace `path/to/your/selenium_mcp_server.py` with the actual path to your server script.
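If the configuration file is produced by a script rather than edited by hand, the entry shown above can be assembled with the standard library. This is only a sketch: the helper names `build_server_entry` and `build_config` are hypothetical and not part of this project, and the path in the example is the same placeholder used above.

```python
import json


def build_server_entry(script_path: str) -> dict:
    """Return an mcpServers entry that launches the server through `uv run`."""
    return {
        "command": "uv",
        "args": ["run", "--with", "fastmcp", "fastmcp", "run", script_path],
        "env": {},
        "transport": "stdio",
    }


def build_config(script_path: str, name: str = "Selenium Automation Server") -> str:
    """Serialize a complete mcp-servers.json document as a JSON string."""
    config = {"mcpServers": {name: build_server_entry(script_path)}}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder path; substitute the real location of the server script.
    print(build_config("path/to/your/selenium_mcp_server.py"))
```

Generating the file this way keeps the script path in one place, which can help when the same server is registered in several client configurations.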

## 🛠 Available Tools (API Reference)

### Browser Lifecycle & Navigation

| Function | Description |
| --- | --- |
| `open_browser(browser_selector)` | Opens a new browser instance (`firefox`, `chrome`, `edge`). Defaults to Firefox. |
| `close_browser()` | Closes the current browser instance and cleans up resources. |
| `is_browser_active()` | Checks if the current browser is active and responsive. |
| `goto_url(url)` | Navigates the browser to the specified URL. |
| `get_current_url()` | Returns the URL of the current page. |
| `get_title()` | Returns the title of the current page. |
| `refresh_page()` | Refreshes the current page. |
| `go_back()` | Simulates the browser Back button. |
| `go_forward()` | Simulates the browser Forward button. |

### Element Interaction

| Function | Description |
| --- | --- |
| `click_element(strategy, selector)` | Clicks the element identified by `strategy` and `selector`. |
| `type_text(text, strategy, selector, ...)` | Types text into an input field (can clear the field first and press Enter afterwards). |
| `get_element_text(strategy, selector)` | Gets the visible text of an element. |
| `get_element_attribute(attribute, ...)` | Gets an attribute from an element (e.g., `href`, `src`). |
| `click_shadow_dom_element(host, element)` | Clicks an element inside a Shadow DOM. |
| `double_click_element(strategy, selector)` | Performs a double-click on an element. |
| `right_click_element(strategy, selector)` | Performs a right-click (context click) on an element. |
| `hover_element(strategy, selector)` | Simulates hovering the mouse over an element. |
| `drag_and_drop(source_selector, target_selector, ...)` | Drags the source element and drops it on the target element. |
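Most of these tools take a `strategy` and a `selector`. The README does not list the accepted strategy names, so the sketch below only illustrates one way such a pair could be resolved to a Selenium locator. The short names (`css`, `tag`, and so on) and the helper `resolve_locator` are assumptions made for illustration; the right-hand values mirror the string constants of Selenium's `By` class.

```python
# Hypothetical strategy names (keys) mapped to the locator strings used by
# selenium.webdriver.common.by.By (values). The keys are an assumption; check
# the server source for the names it actually accepts.
LOCATOR_STRATEGIES = {
    "id": "id",
    "name": "name",
    "css": "css selector",
    "xpath": "xpath",
    "tag": "tag name",
    "class": "class name",
    "link_text": "link text",
    "partial_link_text": "partial link text",
}


def resolve_locator(strategy: str, selector: str) -> tuple:
    """Translate a (strategy, selector) pair into a (by, value) locator tuple."""
    key = strategy.strip().lower()
    if key not in LOCATOR_STRATEGIES:
        allowed = ", ".join(sorted(LOCATOR_STRATEGIES))
        raise ValueError(f"unknown strategy {strategy!r}; expected one of: {allowed}")
    return LOCATOR_STRATEGIES[key], selector


if __name__ == "__main__":
    print(resolve_locator("css", "#login-button"))
```

The returned pair has the `(by, value)` shape that Selenium's `find_element` accepts, so in a Selenium script it could be unpacked as `driver.find_element(*resolve_locator("css", "#login-button"))`.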

### Data & Information Extraction

| Function | Description |
| --- | --- |
| `get_page_source()` | Returns the full HTML source of the current page's `<body>`. |
| `extract_table_data(strategy, selector)` | Extracts table data and returns it as JSON. |
| `list_links()` | Returns a JSON list of all unique links (text + href). |
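To give a sense of what "table data as JSON" can look like, here is a minimal standard-library sketch that converts an HTML table string into a JSON list of records. It is not the server's implementation, which presumably reads live elements through Selenium. Treating the first row as the header and emitting one object per row are assumptions, and `TableParser` and `table_to_json` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_json(html: str) -> str:
    """Return the table as a JSON list of objects keyed by the first row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    return json.dumps([dict(zip(header, row)) for row in body])


if __name__ == "__main__":
    sample = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>Ada</td><td>36</td></tr></table>"
    print(table_to_json(sample))
```

A list of objects keyed by column name tends to be easier for a language model to consume than a list of raw rows, at the cost of repeating the header in every record.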

### Frames, Alerts, and Tabs

| Function | Description |
| --- | --- |
| `handle_alert(accept, input_text)` | Handles an alert or prompt (accept or dismiss, optionally send text). |
| `switch_to_frame(selector, strategy)` | Switches context to an iframe. |
| `switch_to_default_content()` | Returns context to the main document. |
| `open_new_tab(url)` | Opens a new tab and switches to it. |
| `close_current_tab()` | Closes the currently active tab. |
| `switch_tab(index)` | Switches to a tab by index. |
| `list_all_tabs()` | Returns a JSON list of all open tabs (title + URL). |

### Waits & Validation

| Function | Description |
| --- | --- |
| `wait_for_element_visible(strategy, selector, timeout)` | Waits for an element to be present and visible. |
| `wait_for_element_clickable(strategy, selector, timeout)` | Waits for an element to be clickable. |
| `is_element_visible(strategy, selector)` | Checks if an element is currently visible (non-blocking). |
| `is_element_enabled(strategy, selector)` | Checks if an element is enabled (non-blocking). |
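The difference between the `wait_for_*` tools and the non-blocking `is_*` checks comes down to polling: a wait re-evaluates a condition until it holds or a timeout expires, while a check evaluates it once. The sketch below shows that pattern in plain Python. The helper `wait_until` is hypothetical and its defaults are illustrative; Selenium's own `WebDriverWait` follows the same idea but raises `TimeoutException` on expiry instead of returning `False`.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False if the deadline passes.
    The condition is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    started = time.monotonic()
    # Stand-in for "element is visible": becomes true after about one second.
    print(wait_until(lambda: time.monotonic() - started > 1.0, timeout=5.0))
```

On dynamic pages, calling a wait tool before `click_element` or `type_text` is usually more reliable than a single visibility check, because the element may not have rendered yet when the check runs.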

### Cookies & Storage

| Function | Description |
| --- | --- |
| `get_cookies()` | Returns all cookies for the current domain as JSON. |
| `add_cookie(name, value, path)` | Adds a simple cookie to the current session. |
| `delete_all_cookies()` | Deletes all cookies for the current session. |

### Advanced & Visual

| Function | Description |
| --- | --- |
| `execute_javascript(script)` | Executes custom JavaScript and returns the result. |
| `scroll_to(x, y)` | Scrolls the window to absolute coordinates. |
| `scroll_by(x, y)` | Scrolls the window by a specific amount. |
| `take_screenshot(filename)` | Takes a viewport screenshot (returns base64 if no filename is given). |
| `save_full_page_screenshot(filename)` | Attempts to capture the entire page. |
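When `take_screenshot` is called without a filename, the client receives base64 text and has to decode it before the image can be viewed. A minimal sketch of that step follows, assuming the payload is a bare base64-encoded PNG with no data-URL prefix or surrounding JSON (the README does not specify the exact return format). The helper `save_base64_png` is a hypothetical name.

```python
import base64
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_base64_png(data: str, destination: str) -> Path:
    """Decode base64 text and write it to `destination` if it is a PNG."""
    raw = base64.b64decode(data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG image")
    path = Path(destination)
    path.write_bytes(raw)
    return path


if __name__ == "__main__":
    # Stand-in payload: the PNG signature plus filler bytes, not a real image.
    payload = base64.b64encode(PNG_SIGNATURE + b"example").decode("ascii")
    print(save_base64_png(payload, "screenshot.png"))
```

Checking the signature before writing catches the common case where an error message, rather than image data, came back from the tool call.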
