Tiny web scraper with Lua scripting - ~7MB binary, no runtime required, under 5MB idle RAM
Master Guide | Browser Automation | Config | VPS Setup | Examples | Build from Source
SpyWeb is a zero-dependency web monitoring engine built for speed and precision. Track listings, job boards, classifieds, price drops, restocks, public records, and anything else that lives on an HTML page with simple TOML configs. Inject custom Lua logic for advanced workflows, and receive real-time alerts via desktop notifications or webhooks, all packaged as two self-contained binaries that sip under 5MB of RAM at idle.
[[jobs]]
name = "HN Front Page"
url = "https://news.ycombinator.com"
selector = ".athing"
fields = ["title:.titleline > a", "link:.titleline > a@href"]
keywords = ["rust", "linux", "open source"]

| Feature | Description |
|---|---|
| Zero Dependencies | ~7MB self-contained. Completely portable, no runtime required. |
| Lua Scripting | 9 hook stages plus persistent Lua storage for counters, cursors, and shared state. |
| Hot Reload | Save a config or Lua script and SpyWeb respawns the job instantly. |
| Internal DB | Built-in deduplication ensures you never see the same item twice. |
| Dual Binary | Choice of a headless CLI or a silent system tray app for background runs. |
| Concurrency | Async-first engine; slow proxies or large jobs never block others. |
| Fault Tolerant | Lua hook errors are caught and logged without stopping the job, so a faulty hook does not interrupt monitoring. |
| Hybrid Engine | Falls back to a spec-compliant DOM parser for broken or complex HTML. |
| CDP Automation | Launch or connect to any Chromium browser for JS rendering, clicking, waiting, screenshots. |
| Pro Alerting | Integrated desktop notifications and customizable webhooks for real-time monitoring. |
Download the latest release ZIP from the Releases page and extract it.
Or use one of the commands below. They fetch the stable version only; beta builds are available for download from the beta release.
# Linux
curl -L -o spyweb.zip https://dl.spyweb.app/linux && tar -xf spyweb.zip && rm spyweb.zip
# macOS (Intel)
curl -L -o spyweb.zip https://dl.spyweb.app/mac-intel && tar -xf spyweb.zip && rm spyweb.zip
# macOS (Apple Silicon)
curl -L -o spyweb.zip https://dl.spyweb.app/mac-arm && tar -xf spyweb.zip && rm spyweb.zip
# Windows (CMD, Windows 10 or later)
curl -L -o spyweb.zip https://dl.spyweb.app/windows && tar -xf spyweb.zip && del spyweb.zip
# Windows (PowerShell)
Invoke-WebRequest -Uri https://dl.spyweb.app/windows -OutFile spyweb.zip; Expand-Archive spyweb.zip; Remove-Item spyweb.zip
spyweb/
├── spyweb       # Terminal executable
├── spyweb-tray  # Background tray executable
├── data         # Internal database file (Created on first run)
├── ui/          # Dashboard UI files (Required for web dashboard)
├── jobs.toml    # Single-file config for simple jobs (Optional)
├── jobs/        # Folder for advanced per-job configs (Optional)
├── docs/        # Offline documentation (Safe to delete)
└── examples/    # Sample configurations and Lua hooks (Safe to delete)
SpyWeb ships as two separate binaries to provide the best experience for your environment:
Terminal Version (spyweb): Best for headless servers, VPS, and cloud environments. Runs in the terminal and outputs real-time logs for monitoring and debugging.
./spyweb start
Press Ctrl+C to quit.
Tray Version (spyweb-tray): Best for desktop use. Runs in the background without a terminal window and provides quick access via a system tray icon.
# Windows
spyweb-tray.exe
Right-click the tray icon to open the web UI or quit the app.
A typical workflow is to use the Terminal Version for your initial setup, debugging Lua hooks, and verifying selectors. Once you are happy with the results, switch to the Tray Version to let it run silently in the background without cluttering your taskbar or terminal.
Both binaries serve the admin dashboard at http://127.0.0.1:7979 and run each enabled job in a loop at its configured interval. See the REST API documentation for more details.
Tip: You can customize the port with --port or the SPYWEB_PORT environment variable:
# Linux / macOS
./spyweb start --port 9000
# Or:
SPYWEB_PORT=9000 ./spyweb start
# Windows (PowerShell)
.\spyweb.exe start --port 9000
# Or:
$env:SPYWEB_PORT=9000; .\spyweb.exe start
The terminal binary includes helpful developer tools:
# Validate your jobs.toml without running the scraper
./spyweb check
# Run a single job instantly (bypasses interval and runs the full async Lua pipeline)
# Saves '{job_location}/{job-id}-response.html' and '{job_location}/{job-id}-fields.json' for easy inspection!
./spyweb debug "My Job Name"
# Check version and active Lua engine
./spyweb version
# Profile management: check, list, clear, or delete per-job browser profiles
./spyweb profile check "My Job" # Show profile status (all jobs if no name given)
./spyweb profile list # Alias for check
./spyweb profile clear all # Wipe browser caches for all jobs
./spyweb profile clear "My Job" # Clear a specific job's cache
./spyweb profile delete all # Delete profile directories entirely
./spyweb profile delete "My Job" # Delete a specific job's profile directory
Place a hooks.lua next to your config to customize the pipeline. SpyWeb provides persistent storage to track state (like page numbers or failure counts) across restarts.
- store_get/set/delete(key): Scoped to the individual job. Safe for standard logic because hooks for a single job are sequential.
- global_store_incr(key, default, delta): Atomic. Use this when you need to mutate shared state across multiple jobs simultaneously to avoid race conditions.
- global_store_get/set/delete(key): Shared across all jobs.
function before_fetch(request)
  local page = tonumber(store_get("page") or "1")
  if page > 100 then
    log("[!] Page limit reached")
    return nil
  end
  request.url = request.url .. "?page=" .. page
  store_set("page", tostring(page + 1))
  return request
end
Check the Examples for more Lua hook examples.
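The example above keeps its page counter in per-job storage. For state shared by several jobs, the atomic global_store_incr is the safer tool. The following is a minimal sketch, not SpyWeb's documented behaviour: it assumes global_store_incr returns the counter value after the increment, and the key name fetch_count and the FETCH_BUDGET cap are hypothetical. The stand-in definitions at the top exist only so the snippet also runs in a plain Lua interpreter.

```lua
-- Stand-ins so this sketch also runs outside SpyWeb. Inside SpyWeb the real
-- functions are already defined and these fallbacks are skipped.
if not global_store_incr then
  local mem = {}
  function global_store_incr(key, default, delta)
    mem[key] = (mem[key] or default) + delta
    return mem[key]  -- assumed return value: the counter after the increment
  end
end
log = log or print

local FETCH_BUDGET = 3  -- hypothetical cap shared by every job

function before_fetch(request)
  -- A single atomic call instead of a separate get followed by a set,
  -- so two jobs cannot both read the same old value.
  local used = tonumber(global_store_incr("fetch_count", 0, 1))
  if used > FETCH_BUDGET then
    log("[!] Shared fetch budget exhausted")
    return nil  -- returns nil, as the page-limit example does
  end
  return request
end
```

With a separate global_store_get and global_store_set, two jobs running at once could each read the same count and both write back the same incremented value, losing one update; the single incr call avoids that.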
JavaScript Rendering with CDP: SpyWeb does not bundle a 300MB browser. Instead, the built-in CDP module launches whatever Chromium-based or CDP-compatible browser you already have installed (Chrome, Edge, Brave, Lightpanda, etc.) and controls it via the Chrome DevTools Protocol, all from your Lua hooks. Zero downloads, zero config, full JS execution.
function override_fetch(request)
  local browser = cdp.launch({})
  local page = browser:attach()
  page:open(request.url)
  page:wait_for_selector(".dynamic-content", 10000)
  local html = page:content()
  browser:close()
  return { status = 200, body = html, url = request.url }
end
See the CDP Documentation for the full API: browser management, page navigation, click/wait/inject, cookies, screenshots, and a complete production hybrid-recovery pattern that falls back to a visual browser on bot detection.
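In the override_fetch example above, browser:close() is never reached if an earlier call raises an error, which could leave a browser process running. The sketch below guards the page work with pcall so the browser is always closed. It uses only the CDP calls shown above and assumes, without confirmation from the CDP documentation, that those calls signal failure by raising a Lua error. The stand-in cdp table and the closed_count counter are hypothetical and exist only so the snippet runs in a plain Lua interpreter.

```lua
-- Stand-in for the cdp module so this sketch also runs outside SpyWeb.
-- Its wait_for_selector always fails, to exercise the cleanup path.
if not cdp then
  closed_count = 0  -- hypothetical counter, only used by the stand-in
  local page = {
    open = function(self, url) end,
    wait_for_selector = function(self, selector, timeout)
      error("timed out waiting for " .. selector)
    end,
    content = function(self) return "<html></html>" end,
  }
  cdp = {
    launch = function(options)
      return {
        attach = function(self) return page end,
        close = function(self) closed_count = closed_count + 1 end,
      }
    end,
  }
end

function override_fetch(request)
  local browser = cdp.launch({})
  -- Run the page work under pcall so an error cannot skip browser:close().
  local ok, result = pcall(function()
    local page = browser:attach()
    page:open(request.url)
    page:wait_for_selector(".dynamic-content", 10000)
    return page:content()
  end)
  browser:close()
  if not ok then
    error(result, 0)  -- re-raise so the failure is still reported
  end
  return { status = 200, body = result, url = request.url }
end
```

Re-raising after cleanup keeps the failure visible to whatever handles hook errors, instead of returning an empty or partial page as if the fetch had succeeded.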
Please be a good internet citizen:
- Do not hammer sites: Set a reasonable, modest interval in your configurations. Scraping a page every 5 seconds is almost never necessary and costs the site owner money. Furthermore, aggressive abuse is the fastest way to get your connection throttled, flagged, or permanently IP banned.
- Respect resources: If you are monitoring a small independent site, be extra gentle with your request frequency.
- Honor the web: SpyWeb is a tool built for personal monitoring and automation; it is not a weapon for Denial of Service or aggressive data harvesting. Be modest when scraping.
Dual-licensed under MIT or Apache-2.0.