
🤖 An intelligent web automation assistant powered by Google/OpenAI/DeepSeek AI models via OpenRouter that understands natural language commands to navigate, interact with, and automate web browsing tasks.


hqm7/web-router


🤖 WebRouter (Web-enabled AI Agent)

WebRouter is an AI-powered web automation tool that lets you control a web browser through natural language. It combines Large Language Models (LLMs), accessed through OpenRouter, with Playwright for browser automation and interaction.

🌟 Features

  • 🧠 Advanced AI-powered web navigation and interaction
  • 💬 Natural language understanding and processing
  • 🎯 Precise web element identification and interaction
  • 🖥️ Support for multiple browser automation features
  • 📊 Rich visual feedback through the Gradio interface
  • 🔄 Real-time browser state observation
  • 🎨 Beautiful and intuitive user interface

🛠️ Technical Stack

  • LLM Access: Many models through OpenRouter
  • AI Models: Multiple LLMs (e.g., OpenAI GPT-4o, OpenAI o3/o4, Google Gemini 2.5 Flash/Pro, DeepSeek V3/R1)
  • Web Automation: Playwright
  • User Interface: Gradio
  • Page Observation: Built-in support for accessibility tree (AXTree), DOM, and screenshot analysis
  • Action Execution: Performs a wide range of actions, including:
    • Filling forms (fill)
    • Clicking elements (click, dblclick)
    • Selecting options (select_option)
    • Navigating between pages (goto, go_back, go_forward)
    • Opening and closing tabs (new_tab, tab_close)
    • Scrolling (scroll)
    • Mouse interactions (mouse_move, mouse_click, mouse_drag_and_drop)
    • Keyboard interactions (keyboard_type, keyboard_press)
    • File uploads (upload_file)
    • And more!
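The action names above resemble a call-style action vocabulary. BrowserGym, which the project credits as inspiration, uses a similar one. The sketch below shows one way to parse an action string such as click('a51') safely and dispatch it to a handler. It is a hypothetical illustration, not WebRouter's actual code. `parse_action`, `execute`, and the handler table are names introduced here, and the lambdas stand in for real Playwright calls.

```python
import ast
from typing import Any, Callable, Dict, List, Tuple


def parse_action(action: str) -> Tuple[str, List[Any], Dict[str, Any]]:
    """Split a call-style action string like "fill('a12', 'hi')" into name, args, kwargs."""
    node = ast.parse(action.strip(), mode="eval").body
    # Accept only a bare call such as click(...), not attribute calls like os.system(...).
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple action call: {action!r}")
    # literal_eval only accepts literals (strings, numbers, lists, ...), so no code runs here.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:  # reject **kwargs unpacking
            raise ValueError(f"unsupported argument unpacking in {action!r}")
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return node.func.id, args, kwargs


def execute(action: str, handlers: Dict[str, Callable[..., Any]]) -> Any:
    """Look up the parsed action name in a handler table and call it."""
    name, args, kwargs = parse_action(action)
    if name not in handlers:
        raise ValueError(f"unknown action: {name}")
    return handlers[name](*args, **kwargs)


# Usage with stand-in handlers; a real agent would call Playwright here instead.
log = []
handlers = {
    "click": lambda bid: log.append(("click", bid)),
    "fill": lambda bid, value: log.append(("fill", bid, value)),
}
execute("fill('a12', 'hello')", handlers)
execute("click('a51')", handlers)
print(log)  # [('fill', 'a12', 'hello'), ('click', 'a51')]
```

The sketch uses `ast.literal_eval` instead of `eval`. As a result, text returned by the model is only ever treated as data, never executed as Python.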

📋 Prerequisites

  • Python 3.8 or higher
  • An OpenRouter API key
  • A modern web browser for the Gradio interface

⚙️ Installation

  1. Clone the repository:
    git clone <repository-url>
    cd web-router
  2. Install dependencies:
    pip install -r requirements.txt
  3. Install Playwright browsers:
    playwright install chromium

🚀 Quick Start

  1. Set up your OpenRouter API key. You can either export it as an environment variable:

    export OPENROUTER_API_KEY="your-openrouter-api-key"

    or enter it directly in the application's UI.

  2. Launch the application:

    python gradio_app.py

  3. Configure the agent in the UI:

    • Enter your OpenRouter API key (if it is not set as an environment variable)
    • Select your preferred model from the dropdown
    • Configure additional settings as needed
    • Click "Initialize Agent"
  4. Start using the agent:

    • Enter a URL to navigate to
    • Interact with the agent using natural language
    • View real-time browser feedback in the interface
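The API key can come from either the environment or the UI. A minimal sketch of how such a fallback could be resolved is shown below. The function name `resolve_api_key` is hypothetical. So is the rule that a non-empty UI value wins over the environment variable; this is an assumption for illustration, not documented WebRouter behavior.

```python
import os
from typing import Optional


def resolve_api_key(ui_key: Optional[str] = None) -> str:
    """Return an OpenRouter API key (hypothetical helper).

    Assumed precedence: a non-empty key typed into the UI, then OPENROUTER_API_KEY.
    """
    if ui_key and ui_key.strip():
        return ui_key.strip()
    env_key = os.environ.get("OPENROUTER_API_KEY", "").strip()
    if env_key:
        return env_key
    raise RuntimeError(
        "No API key found: export OPENROUTER_API_KEY or enter a key in the UI."
    )


if __name__ == "__main__":
    try:
        key = resolve_api_key()
        print("Using key ending in", key[-4:])  # never print the full key
    except RuntimeError as err:
        print(err)
```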

💡 Usage Examples

Here are some examples of what you can do with WebRouter:

  • "Navigate to google.com and search for latest news"
  • "Fill out this contact form with my information"
  • "Find the best price for this product across different tabs"
  • "Log into my account using these credentials"

🔧 Configuration Options

  • Model Selection: Choose from a wide range of models available through OpenRouter (e.g., GPT-4o, Gemini 2.5 Pro/Flash, DeepSeek V3/R1)
  • Observation Settings:
    • HTML parsing
    • Accessibility Tree analysis
    • Screenshot capture
  • Browser Options:
    • Headless mode
    • Custom viewport settings
    • Network conditions

πŸ—οΈ Project Structure

./
β”œβ”€β”€ action/               # Action handling and execution
β”œβ”€β”€ agent/               # Core agent implementation
β”œβ”€β”€ browser/             # Browser automation and observation
β”œβ”€β”€ gradio_app.py        # Gradio UI implementation
└── requirements.txt     # Project dependencies

🔐 Security Considerations

  • Never store sensitive credentials in plain text
  • Use environment variables for sensitive configuration
  • Be cautious when granting web automation permissions
  • Review and validate all automated actions
  • Monitor automated sessions for security
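One concrete way to review and validate automated actions is to check navigation targets before a goto or new_tab action runs. The sketch below is a hypothetical allowlist check, not a feature the README documents. `is_allowed_url` and the domain set are illustrative.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical allowlist; adjust to the sites the agent is meant to visit.
ALLOWED_DOMAINS = {"google.com", "example.com"}


def is_allowed_url(url: str, allowed: Iterable[str] = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # blocks file:, javascript:, data:, and malformed URLs
    host = parsed.hostname.lower()  # .hostname drops any user:password@ prefix and the port
    # Match "google.com" and "www.google.com", but not "evilgoogle.com".
    return any(host == d or host.endswith("." + d) for d in allowed)


for url in ["https://www.google.com/search?q=news", "https://google.com@evil.net/"]:
    print(url, "->", "allowed" if is_allowed_url(url) else "blocked")
```

The check matches on the parsed hostname rather than on a string prefix of the URL. This blocks tricks such as `https://google.com@evil.net/`, which actually points to `evil.net`.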

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

🙏 Acknowledgments

  • Heavily inspired by the BrowserGym project
  • OpenRouter team for API access
  • OpenAI, Google, DeepSeek, and other model providers
  • Playwright team for browser automation
  • Gradio team for the UI framework
  • All contributors and supporters

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🚀 Future Plans

  • Support for additional AI models (easily added via OpenRouter)
  • Enhanced multi-tab coordination
  • Advanced workflow automation
  • Improved error handling and recovery
  • Extended browser compatibility
