A powerful AI-powered web automation tool that enables natural language interaction with web browsers. This project combines the capabilities of various Large Language Models (LLMs) accessed via OpenRouter with Playwright for sophisticated web automation and interaction.
- π§ Advanced AI-powered web navigation and interaction
- π¬ Natural language understanding and processing
- π― Precise web element identification and interaction
- π₯οΈ Support for multiple browser automation features
- π Rich visual feedback through Gradio interface
- π Real-time browser state observation
- π¨ Beautiful and intuitive user interface
- LLM Providers: Various Models via OpenRouter
- AI Models: Multiple LLMs (e.g., OpenAI GPT-4o, OpenAI o3/o4, Google Gemini 2.5 Flash/Pro, DeepSeek V3/R1)
- Web Automation: Playwright
- User Interface: Gradio
- Accessibility: Built-in support for AXTree, DOM, and screenshot analysis
- Action Execution: Performs a wide range of actions, including:
- Filling forms (
fill) - Clicking elements (
click,dblclick) - Selecting options (
select_option) - Navigating between pages (
goto,go_back,go_forward) - Opening and closing tabs (
new_tab,tab_close) - Scrolling (
scroll) - Mouse interactions (
mouse_move,mouse_click,mouse_drag_and_drop) - Keyboard interactions (
keyboard_type,keyboard_press) - File uploads (
upload_file) - And more!
- Filling forms (
- Python 3.8 or higher
- OpenRouter API Key
- Modern web browser
- Clone the repository:
git clone <repository-url>
cd web-agent- Install dependencies:
pip install -r requirements.txt- Install Playwright browsers:
playwright install chromium-
Set up your OpenRouter API Key: You can either set it as an environment variable:
OPENROUTER_API_KEY="your-openrouter-api-key"Or enter it directly into the application's UI.
-
Launch the application:
python gradio_app.py-
Configure the agent in the UI:
- Enter your OpenRouter API Key (if not set as env variable)
- Select your preferred model from the dropdown
- Configure additional settings as needed
- Click "Initialize Agent"
-
Start using the agent:
- Enter a URL to navigate
- Interact with the agent using natural language
- View real-time browser feedback in the interface
Here are some examples of what you can do with WebRouter:
"Navigate to google.com and search for latest news"
"Fill out this contact form with my information"
"Find the best price for this product across different tabs"
"Log into my account using these credentials"
- Model Selection: Choose from a wide range of models available through OpenRouter (e.g., GPT-4o, Gemini 2.5 Pro/Flash, DeepSeek V3/R1)
- Observation Settings:
- HTML parsing
- Accessibility Tree analysis
- Screenshot capture
- Browser Options:
- Headless mode
- Custom viewport settings
- Network conditions
./
βββ action/ # Action handling and execution
βββ agent/ # Core agent implementation
βββ browser/ # Browser automation and observation
βββ gradio_app.py # Gradio UI implementation
βββ requirements.txt # Project dependencies
- Never store sensitive credentials in plain text
- Use environment variables for sensitive configuration
- Be cautious when granting web automation permissions
- Review and validate all automated actions
- Monitor automated sessions for security
Contributions are welcome! Please feel free to submit pull requests. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
- Huge inspiration from BrowserGym works
- OpenRouter team for API access
- OpenAI, Google, DeepSeek, and other model providers
- Playwright team for browser automation
- Gradio team for the UI framework
- All contributors and supporters
This project is licensed under the MIT License - see the LICENSE file for details.
- Support for additional AI models (easily added via OpenRouter)
- Enhanced multi-tab coordination
- Advanced workflow automation
- Improved error handling and recovery
- Extended browser compatibility
