SilverKenn/spyweb

SpyWeb

Tiny web scraper with Lua scripting - ~5MB single binary, no runtime, ~3MB idle RAM


What is spyweb?

SpyWeb is a single-binary web scraper designed for monitoring listings, job boards, classified ads, and paginated HTML content. Define what to scrape with simple TOML configs, and get notified via desktop notifications or webhooks when new items appear.

Quick Demo

[[jobs]]
name = "HN Front Page"
url = "https://news.ycombinator.com"
selector = ".athing"                          # CSS selector matching one element per item
fields = ["title:.titleline > a",             # label:selector  -> element text
          "link:.titleline > a@href"]         # trailing @attr  -> attribute value
keywords = ["rust", "linux", "open source"]   # only notify on matching items
interval = 300                                # re-check interval

Features

  • Zero dependencies: ~5MB binary, no Node/Python/Java.
  • Lua hooks: Intercept requests, implement pagination, or push to external DBs.
  • True Concurrency: Each job runs independently in an async pool; slow proxies never block others.
  • Dual HTML parser: Recovers from broken, unclosed HTML automatically.
  • Desktop notifications & webhooks: Ping yourself immediately when new items are found.
  • Proxy rotation & dedup: Rotate sticky/random proxies, and rely on the internal database to never see the same item twice.
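To make the webhook and proxy bullets above concrete, here is a sketch of what such a job might look like. The key names (`webhook`, `proxies`, `proxy_mode`) are illustrative guesses, not spyweb's documented schema — check the real config reference before copying this:

```toml
# Hypothetical sketch only — these key names are illustrative,
# not spyweb's documented configuration schema.
[[jobs]]
name = "Example with proxy + webhook"
url = "https://example.com/listings"
selector = ".item"
fields = ["title:.item-title", "link:.item-title a@href"]
interval = 600

# Illustrative: push new items to a webhook instead of
# (or alongside) desktop notifications.
webhook = "https://hooks.example.com/notify"

# Illustrative: rotate requests through a proxy pool.
proxies = ["http://proxy1:8080", "http://proxy2:8080"]
proxy_mode = "random"   # or "sticky"
```

Deduplication needs no configuration in this sketch: the feature list above says the internal database ensures the same item is never reported twice.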

Install & Run

Download the latest release ZIP from the Releases page and extract it.

SpyWeb ships as two separate binaries to provide the best experience for your environment:

1. Terminal Version (spyweb)

Ideal for headless servers, Docker containers, or developers who want to watch live logs.

./spyweb

Press Ctrl+C to quit.

2. Silent Desktop Version (spyweb-gui)

Ideal for Windows/macOS personal laptops. It runs entirely in the background without a terminal window and adds a quiet icon to your system tray.

# Windows
spyweb-gui.exe

Right-click the tray icon to open the web UI or quit the app.

Both binaries serve the admin dashboard at http://127.0.0.1:8001 and will loop each enabled job at its configured interval.

CLI Tools

The terminal binary includes helpful developer tools:

# Validate your jobs.toml without running the scraper
./spyweb check

# Run a single job instantly (bypasses interval and runs the full async Lua pipeline)
# Helpful for testing configurations and Lua hooks!
./spyweb debug "My Job Name"

One Hook Example (Pagination)

Place a hooks.lua next to your config to customize the pipeline:

page = 1  -- State persists across runs!

function before_fetch(request)
    request.url = request.url .. "?page=" .. page
    page = page + 1
    return request
end

Note: For JS-heavy, client-rendered pages, the same before_fetch hook can rewrite the request so it is fetched through a headless-rendering service such as Browserless.
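As a sketch of that idea, the hook below rewrites the request to call Browserless's /content API, which renders the page in a headless browser and returns the final HTML. The `method`, `body`, and `headers` fields on `request` are assumptions about spyweb's hook API (only `url` appears in the example above), and the endpoint/token are placeholders:

```lua
-- Hypothetical sketch: fields other than request.url are assumed
-- to exist on the request object; verify against spyweb's hook docs.
local BROWSERLESS = "https://chrome.browserless.io/content?token=YOUR_TOKEN"

function before_fetch(request)
    -- Ask Browserless to render the target URL and return the final HTML.
    request.method  = "POST"
    request.body    = string.format('{"url": %q}', request.url)
    request.headers = { ["Content-Type"] = "application/json" }
    request.url     = BROWSERLESS
    return request
end
```

The scraper then parses the rendered HTML exactly as if it had fetched the page directly, so selectors and fields work unchanged.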

Documentation


⚖️ A Note on Scraping Morality

With great power comes great responsibility. Please be a good internet citizen:

  • Do not hammer sites: Set a reasonable, modest interval in your configurations. Scraping a page every 5 seconds is almost never necessary and costs the site owner money. Aggressive scraping is also the fastest way to get your connection throttled, flagged, or permanently IP-banned.
  • Respect resources: If you are monitoring a small independent site, be extra gentle with your request frequency.
  • Honor the web: spyweb is a tool built for personal monitoring and automation; it is not a weapon for Denial of Service or aggressive data harvesting. Be modest when scraping.

Dual-licensed under MIT or Apache-2.0.
