spyweb-app/spyweb

SpyWeb

Tiny web scraper with Lua scripting - ~7MB binary, no runtime required, under 5MB idle RAM


📖 Master Guide | 🌐 Browser Automation | ⚙️ Config | 🚀 VPS Setup | 📂 Examples | 🏗️ Build from Source


What is SpyWeb?

SpyWeb is a zero-dependency web monitoring engine built for speed and precision. Track listings, job boards, classifieds, price drops, restocks, public records, and anything else that lives on an HTML page using simple TOML configs. Inject custom Lua logic for advanced workflows and receive real-time alerts via desktop notifications or webhooks, all packaged as two self-contained binaries that use under 5MB of RAM at idle.

Quick Demo

[[jobs]]
name = "HN Front Page"
url = "https://news.ycombinator.com"
selector = ".athing"
fields = ["title:.titleline > a", "link:.titleline > a@href"]
keywords = ["rust", "linux", "open source"]

Features

| Feature | Description |
| --- | --- |
| Zero Dependencies | ~7MB self-contained. Completely portable, no runtime required. |
| Lua Scripting | 9 hook stages plus persistent Lua storage for counters, cursors, and shared state. |
| Hot Reload | Save a config or Lua script and SpyWeb respawns the job instantly. |
| Internal DB | Built-in deduplication ensures you never see the same item twice. |
| Dual Binary | Choice of a headless CLI or a silent system tray app for background runs. |
| Concurrency | Async-first engine; slow proxies or large jobs never block others. |
| Fault Tolerant | Lua hook errors are caught and logged without stopping the job, so one faulty hook does not take monitoring down. |
| Hybrid Engine | Falls back to a spec-compliant DOM parser for broken or complex HTML. |
| CDP Automation | Launch or connect to any Chromium browser for JS rendering, clicking, waiting, and screenshots. |
| Pro Alerting | Integrated desktop notifications and customizable webhooks for real-time monitoring. |

Install & Run

Download the latest release ZIP from the Releases page and extract it.

--- OR USE THE COMMAND BELOW ---

The commands below download the stable version only; beta builds are available for download from the beta release.

# Linux (GNU tar cannot extract ZIP archives, so use unzip)
curl -L -o spyweb.zip https://dl.spyweb.app/linux && unzip spyweb.zip && rm spyweb.zip

# macOS (Intel)
curl -L -o spyweb.zip https://dl.spyweb.app/mac-intel && tar -xf spyweb.zip && rm spyweb.zip

# macOS (Apple Silicon)
curl -L -o spyweb.zip https://dl.spyweb.app/mac-arm && tar -xf spyweb.zip && rm spyweb.zip

# Windows (CMD, Windows 10 or later)
curl -L -o spyweb.zip https://dl.spyweb.app/windows && tar -xf spyweb.zip && del spyweb.zip

# Windows (PowerShell)
Invoke-WebRequest -Uri https://dl.spyweb.app/windows -OutFile spyweb.zip; Expand-Archive spyweb.zip; Remove-Item spyweb.zip

Release Structure

spyweb/
β”œβ”€β”€ spyweb           # Terminal executable
β”œβ”€β”€ spyweb-tray      # Background tray executable
β”œβ”€β”€ data             # Internal database file (Created on first run)
β”œβ”€β”€ ui/              # Dashboard UI files (Required for web dashboard)
β”œβ”€β”€ jobs.toml        # Single-file config for simple jobs (Optional)
β”œβ”€β”€ jobs/            # Folder for advanced per-job configs (Optional)
β”œβ”€β”€ docs/            # Offline documentation (Safe to delete)
└── examples/        # Sample configurations and Lua hooks (Safe to delete)

SpyWeb ships as two separate binaries to provide the best experience for your environment:

1. Terminal Version (spyweb)

Best for headless servers, VPS, and cloud environments. Runs in the terminal and outputs real-time logs for monitoring and debugging.

./spyweb start

Press Ctrl+C to quit.

2. Silent Tray Version (spyweb-tray)

Best for desktop use. Runs in the background without a terminal window and provides quick access via a system tray icon.

# Windows
spyweb-tray.exe

Right-click the tray icon to open the web UI or quit the app.

Recommended Workflow

A typical workflow is to use the Terminal Version for your initial setup, debugging Lua hooks, and verifying selectors. Once you are happy with the results, switch to the Tray Version to let it run silently in the background without cluttering your taskbar or terminal.

Both binaries serve the admin dashboard at http://127.0.0.1:7979 and run each enabled job in a loop at its configured interval. See the REST API documentation for more details.

Tip: You can customize the port with --port or the SPYWEB_PORT environment variable:

# Linux / macOS
./spyweb start --port 9000

# Or:
SPYWEB_PORT=9000 ./spyweb start

# Windows (PowerShell)
.\spyweb.exe start --port 9000

# Or:
$env:SPYWEB_PORT=9000; .\spyweb.exe start

CLI Tools

The terminal binary includes helpful developer tools:

# Validate your jobs.toml without running the scraper
./spyweb check

# Run a single job instantly (bypasses interval and runs the full async Lua pipeline)
# Saves '{job_location}/{job-id}-response.html' and '{job_location}/{job-id}-fields.json' for easy inspection!
./spyweb debug "My Job Name"

# Check version and active Lua engine
./spyweb version

# Profile management: check, list, clear, or delete per-job browser profiles
./spyweb profile check "My Job"  # Show profile status (all jobs if no name given)
./spyweb profile list            # Alias for check
./spyweb profile clear all       # Wipe browser caches for all jobs
./spyweb profile clear "My Job"  # Clear a specific job's cache
./spyweb profile delete all      # Delete profile directories entirely
./spyweb profile delete "My Job" # Delete a specific job's profile directory

Lua API & Hooks

Place a hooks.lua next to your config to customize the pipeline. SpyWeb provides persistent storage to track state (like page numbers or failure counts) across restarts.

Scoped vs Global Storage

  • store_get/set/delete(key): Scoped to the individual job. Safe for standard logic because hooks for a single job are sequential.
  • global_store_incr(key, default, delta): Atomic. Use this when you need to mutate shared state across multiple jobs simultaneously to avoid race conditions.
  • global_store_get/set/delete(key): Shared across all jobs.
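
As a rough sketch of how the two scopes differ in practice, the hook below keeps a per-job counter with store_get/store_set and a cross-job counter with global_store_incr. The key names (fetch_count, total_fetches) are made up for illustration. Stored values are treated as strings, as in the pagination example in this README, and passing numeric default and delta arguments is an assumption. The return value of global_store_incr is deliberately ignored because it is not described here.

```lua
-- Sketch only: store_get, store_set, and global_store_incr are provided by
-- SpyWeb at runtime. The key names below are hypothetical.
function before_fetch(request)
    -- Per-job counter: hooks for a single job run sequentially, so a plain
    -- read-then-write is safe here.
    local count = tonumber(store_get("fetch_count") or "0") + 1
    store_set("fetch_count", tostring(count))

    -- Cross-job counter: use the atomic increment instead of get + set,
    -- because several jobs may update this key at the same time.
    -- Numeric default (0) and delta (1) are assumptions about the argument types.
    global_store_incr("total_fetches", 0, 1)

    return request
end
```

Using global_store_get followed by global_store_set for the shared counter would leave a window in which two jobs read the same value and one update is lost, which is the race the atomic call is meant to avoid.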

Stage Example (Pagination)

function before_fetch(request)
    local page = tonumber(store_get("page") or "1")

    if page > 100 then
        log("[!] Page limit reached")
        return nil
    end

    request.url = request.url .. "?page=" .. page
    store_set("page", tostring(page + 1))

    return request
end

Check the Examples for more Lua hook examples.
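
The pagination hook above appends "?page=" directly, which produces a malformed URL when the job's URL already carries a query string. A minimal variant, assuming request.url is reset to the configured URL on every run (as the original example also assumes), could pick the separator first. The helper with_query_param is hypothetical and not part of SpyWeb; it does not URL-encode values or account for "#" fragments.

```lua
-- Hypothetical helper (not part of SpyWeb): append a query parameter, choosing
-- "?" or "&" depending on whether the URL already has a query string.
local function with_query_param(url, key, value)
    -- Plain-text find (fourth argument true), so "?" is not treated as a pattern.
    local separator = url:find("?", 1, true) and "&" or "?"
    return url .. separator .. key .. "=" .. tostring(value)
end

-- store_get and store_set are provided by SpyWeb at runtime.
function before_fetch(request)
    local page = tonumber(store_get("page") or "1")

    request.url = with_query_param(request.url, "page", page)
    store_set("page", tostring(page + 1))

    return request
end
```

With this variant, a job URL ending in "/list" becomes "/list?page=1", while one ending in "/list?sort=new" becomes "/list?sort=new&page=1".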


JavaScript Rendering with CDP: SpyWeb does not bundle a 300MB browser. Instead, the built-in CDP module launches whatever Chromium-based browser you already have installed (Chrome, Edge, Brave, Lightpanda, etc.) and controls it via the Chrome DevTools Protocol, all from your Lua hooks. Zero downloads, zero config, full JS execution.

function override_fetch(request)
    local browser = cdp.launch({})
    local page = browser:attach()
    page:open(request.url)
    page:wait_for_selector(".dynamic-content", 10000)
    local html = page:content()
    browser:close()
    return { status = 200, body = html, url = request.url }
end

See the CDP Documentation for the full API: browser management, page navigation, click/wait/inject, cookies, screenshots, and a complete production hybrid-recovery pattern that falls back to a visual browser on bot detection.
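
One thing the short override_fetch example above does not cover is cleanup when something fails partway through: if any page call raises an error, browser:close() is never reached and the browser process may be left running. A possible sketch using Lua's standard pcall is shown below. It uses only the CDP calls from the example above and assumes that failures such as a selector timeout are reported by raising an error, and that pcall works cleanly around these calls in SpyWeb's async runtime; neither is confirmed here, so check the CDP Documentation.

```lua
-- Sketch only: cdp is provided by SpyWeb at runtime; the calls mirror the
-- example above.
function override_fetch(request)
    local browser = cdp.launch({})

    -- Run the page work in protected mode so that a raised error cannot
    -- skip the cleanup step below.
    local ok, result = pcall(function()
        local page = browser:attach()
        page:open(request.url)
        page:wait_for_selector(".dynamic-content", 10000)
        return page:content()
    end)

    -- Always close the browser, whether or not the page work succeeded.
    browser:close()

    if not ok then
        -- Re-raise the original error so SpyWeb's own hook error handling
        -- still sees and logs it.
        error(result, 0)
    end

    return { status = 200, body = result, url = request.url }
end
```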


βš–οΈ A Note on Scraping Morality

Please be a good internet citizen:

  • Do not hammer sites: Set a reasonable, modest interval in your configurations. Scraping a page every 5 seconds is almost never necessary and costs the site owner money. Furthermore, aggressive abuse is the fastest way to get your connection throttled, flagged, or permanently IP banned.
  • Respect resources: If you are monitoring a small independent site, be extra gentle with your request frequency.
  • Honor the web: SpyWeb is a tool built for personal monitoring and automation; it is not a weapon for Denial of Service or aggressive data harvesting. Be modest when scraping.

Dual-licensed under MIT or Apache-2.0.
