Tiny web scraper with Lua scripting - ~5MB single binary, no runtime, ~3MB idle RAM
SpyWeb is a single-binary web scraper designed for monitoring listings, job boards, classified ads, and paginated HTML content. Define what to scrape with simple TOML configs, and get notified via desktop notifications or webhooks when new items appear.
```toml
[[jobs]]
name = "HN Front Page"
url = "https://news.ycombinator.com"
selector = ".athing"
fields = ["title:.titleline > a", "link:.titleline > a@href"]
keywords = ["rust", "linux", "open source"]
interval = 300
```

- Zero dependencies: ~5MB binary, no Node/Python/Java runtime required.
- Lua hooks: Intercept requests, implement pagination, or push to external DBs.
- True concurrency: Each job runs independently in an async pool; slow proxies never block others.
- Dual HTML parser: Recovers from broken, unclosed HTML automatically.
- Desktop notifications & webhooks: Ping yourself immediately when new items are found.
- Proxy rotation & dedup: Rotate sticky/random proxies, and rely on the internal database to never see the same item twice.
Download the latest release ZIP from the Releases page and extract it.
SpyWeb ships as two separate binaries to provide the best experience for your environment:
The terminal binary (`spyweb`) is ideal for headless servers, Docker containers, or developers who want to watch live logs.

```shell
./spyweb
```

Press Ctrl+C to quit.
The tray binary (`spyweb-gui`) is ideal for Windows/macOS personal laptops. It runs entirely in the background with no terminal window and adds a quiet icon to your system tray.

```shell
# Windows
spyweb-gui.exe
```

Right-click the tray icon to open the web UI or quit the app.
Both binaries serve the admin dashboard at http://127.0.0.1:8001 and run each enabled job in a loop at its configured interval.
The terminal binary includes helpful developer tools:
```shell
# Validate your jobs.toml without running the scraper
./spyweb check

# Run a single job instantly (bypasses the interval and runs the full async Lua pipeline)
# Helpful for testing configurations and Lua hooks!
./spyweb debug "My Job Name"
```

Place a hooks.lua next to your config to customize the pipeline:
```lua
page = 1 -- State persists across runs!

function before_fetch(request)
  request.url = request.url .. "?page=" .. page
  page = page + 1
  return request
end
```

Note: For JS-heavy, client-rendered pages, you can use this same before_fetch hook to wrap the request and proxy it through Browserless!
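As a sketch of that approach, the hook below rewrites each request so a rendering service fetches the page first and the scraper receives fully rendered HTML. The endpoint URL and query shape are placeholders (not taken from this README), and the `request` table follows the same hook API shown above; check your Browserless deployment's documentation for the exact URL format.

```lua
-- Hypothetical sketch: route the fetch through a rendering proxy so the
-- scraper sees rendered HTML instead of an empty JS shell.
-- The endpoint below is a placeholder; adjust it to your deployment.
local RENDER_ENDPOINT = "http://localhost:3000/content?url="

function before_fetch(request)
  -- Wrap the original target URL so the proxy renders it for us.
  request.url = RENDER_ENDPOINT .. request.url
  return request
end
```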
With great power comes great responsibility. Please be a good internet citizen:
- Do not hammer sites: Set a reasonable, modest `interval` in your configurations. Scraping a page every 5 seconds is almost never necessary and costs the site owner money. Aggressive scraping is also the fastest way to get your connection throttled, flagged, or permanently IP-banned.
- Respect resources: If you are monitoring a small independent site, be extra gentle with your request frequency.
- Honor the web: spyweb is built for personal monitoring and automation; it is not a weapon for denial-of-service attacks or aggressive data harvesting. Be modest when scraping.
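For a small site, a conservative job definition might look like the following. The keys mirror the example config earlier in this README; the name, URL, selectors, and interval are purely illustrative.

```toml
# Illustrative low-frequency job -- values are placeholders.
[[jobs]]
name = "Indie Blog Watch"
url = "https://example.com/blog"
selector = ".post"
fields = ["title:.post-title > a", "link:.post-title > a@href"]
interval = 3600  # check once an hour; plenty for a low-traffic site
```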
Dual-licensed under MIT or Apache-2.0.