Skip to content

kl1d/web-scraper-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Web Scraper Agent

Research Blogger Agent

A lightweight AI agent that can:

  1. Scrape the text content of any public web page using Playwright
  2. Answer follow-up questions about the scraped content with OpenAI
  3. Return both the raw scraped text and the AI’s answer as JSON

You can drive this agent either:

  • From the CLI for quick one-off queries
  • Via the Flask App which provides a control & monitoring panel over WebSockets
  • Or using Docker for easy containerized deployment

Features

  • Headless browsing with Playwright to fully render JavaScript-driven pages
  • Tool-based agent architecture using agents.Runner and @function_tool
  • Dual output: raw content + concise AI answer
  • Live logs streamed to the browser for visibility

Directory Structure

.
├── agent/
│   ├── agent.py             # main agent definition & CLI runner
├── web/
│   ├── app.py           # Flask + SocketIO server
│   ├── templates/
│   │   └── index.html   # Control + monitoring UI
│   └── requirements.txt
├── docker-compose.yml   # (optional) compose file for Docker
├── Dockerfile           # container setup
└── README.md

Prerequisites

  • Git
  • Docker & Docker-Compose (or standalone Docker)
  • An OpenAI API key stored in the environment

Getting Started

  1. Clone the repo

    git clone https://github.com/kl1d/web-scraper-agent.git
    cd web-scraper-agent
  2. Create a .env file

    OPENAI_API_KEY=sk-...
  3. Build & Run

    docker-compose up --build
  4. Access the UI Open your browser and go to http://localhost:5005/.

Usage via Flask UI

A simple web UI (with live logs)

  1. Enter the url (e.g., https://example.com).
  2. Specify question (e.g., What does this page say about pricing?).

CLI Usage

Run the agent directly from your terminal:

docker exec -it agent python3 -m agent.agent \
--url "https://example.com" \
--question "What does this page say about pricing?"

You should see:

🧠 Starting WebScraperAgent with prompt: url=https://example.com;question=...
🔧 Invoking scrape_webpage tool with URL: https://example.com
🔧 Invoking ask_question tool with question: What does this page say about pricing?
✅ Agent finished. Output →
{"content":"…full scraped text…","answer":"…AI’s concise answer…"}

🤝 Contributing

Pull requests are welcome! Please open an issue or PR with your ideas and improvements.

📄 License

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors