A lightweight AI agent that can:
- Scrape the text content of any public web page using Playwright
- Answer follow-up questions about the scraped content with OpenAI
- Return both the raw scraped text and the AI’s answer as JSON
You can drive this agent in any of three ways:
- From the CLI, for quick one-off queries
- Via the Flask app, which provides a control and monitoring panel over WebSockets
- With Docker, for easy containerized deployment
Key features:
- Headless browsing with Playwright to fully render JavaScript-driven pages
- Tool-based agent architecture using `agents.Runner` and `@function_tool`
- Dual output: raw content + concise AI answer
- Live logs streamed to the browser for visibility
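To make the tool-based architecture concrete, here is a minimal sketch of how a Playwright-backed scraping tool could be registered with an agent. It is an illustration under assumptions, not the repository's actual `agent.py`: the names `scrape_webpage` and `WebScraperAgent` are taken from the log output shown later in this README, while `clean_text`, `build_agent`, and the instruction string are hypothetical. It assumes the OpenAI Agents SDK (`agents`) and Playwright for Python are installed.

```python
def clean_text(raw: str) -> str:
    """Strip each line and drop empty ones so the model sees compact text."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def build_agent():
    """Wire a Playwright-backed scraping tool into an agent (hypothetical helper).

    Third-party imports live inside this function so that clean_text can be
    used without Playwright or the Agents SDK installed.
    """
    from agents import Agent, function_tool
    from playwright.async_api import async_playwright

    @function_tool
    async def scrape_webpage(url: str) -> str:
        """Return the visible text of the page at `url`."""
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            try:
                page = await browser.new_page()
                # Wait for network activity to settle so that
                # JavaScript-rendered content is present in the DOM.
                await page.goto(url, wait_until="networkidle")
                return clean_text(await page.inner_text("body"))
            finally:
                await browser.close()

    # Assumed instructions; the real agent's prompt may differ.
    return Agent(
        name="WebScraperAgent",
        instructions="Scrape the given URL, then answer the question about it.",
        tools=[scrape_webpage],
    )
```

With both libraries installed, something like `Runner.run_sync(build_agent(), "url=https://example.com;question=...")` from the `agents` package would run the agent, and the result's `final_output` attribute would hold its reply.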
.
├── agent/
│ └── agent.py # main agent definition & CLI runner
├── web/
│ ├── app.py # Flask + SocketIO server
│ ├── templates/
│ │ └── index.html # Control + monitoring UI
│ └── requirements.txt
├── docker-compose.yml # (optional) compose file for Docker
├── Dockerfile # container setup
└── README.md
Prerequisites:
- Git
- Docker and Docker Compose (or standalone Docker)
- An OpenAI API key stored in the environment
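Because the agent depends on `OPENAI_API_KEY` being present in the environment, a start-up check can fail fast with a clear message instead of erroring on the first API call. The following is a minimal sketch rather than code from this repository; `require_api_key` is a hypothetical helper name.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the OpenAI API key from `env`, or raise if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to .env or export it in your shell"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.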
- Clone the repo:
  git clone https://github.com/kl1d/web-scraper-agent.git
  cd web-scraper-agent
- Create a `.env` file containing your API key:
  OPENAI_API_KEY=sk-...
- Build and run:
  docker-compose up --build
- Access the UI: open your browser and go to http://localhost:5005/.
A simple web UI (with live logs):
- Enter the URL (e.g., https://example.com).
- Specify the question (e.g., What does this page say about pricing?).
Run the agent directly from your terminal:
docker exec -it agent python3 -m agent.agent \
--url "https://example.com" \
--question "What does this page say about pricing?"
You should see:
🧠 Starting WebScraperAgent with prompt: url=https://example.com;question=...
🔧 Invoking scrape_webpage tool with URL: https://example.com
🔧 Invoking ask_question tool with question: What does this page say about pricing?
✅ Agent finished. Output →
{"content":"…full scraped text…","answer":"…AI’s concise answer…"}
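If you call the CLI from another script, the final JSON line can be picked out of the log output and parsed. The sketch below assumes the result is printed as a single-line JSON object with `content` and `answer` keys, as in the sample above; `parse_agent_output` is a hypothetical helper, not part of this repository.

```python
import json


def parse_agent_output(stdout: str) -> dict:
    """Return the last single-line JSON object found in the agent's output."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if line.startswith("{"):
            result = json.loads(line)
            # Both keys are expected in the agent's dual output.
            missing = {"content", "answer"} - result.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return result
    raise ValueError("no JSON object found in agent output")
```

For example, feeding it the captured stdout of the `docker exec` command above would give a dictionary whose `answer` value holds the concise reply and whose `content` value holds the raw scraped text.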
Pull requests are welcome! Please open an issue or PR with your ideas and improvements.
This project is licensed under the MIT License.
