Scouter v0.7

lokoe-mehdi released this 02 Jun 23:04
5656cf6

🚀 Major features

New crawler engine: Go + uTLS stealth renderer

The PHP crawler has been fully replaced by a new engine written in Go.

  • uTLS stealth fingerprinting so the crawler blends in with real browsers and is far less likely to be blocked.
  • Dedicated JS renderer service with controlled Chrome tab pooling (no more idle-tab hoarding / RAM creep).
  • CPU-aware crawl governor that throttles concurrency to the host's real capacity, plus accurate live "running" accounting.
  • Hardened for multi-million-URL crawls: bounded memory, slot-before-claim concurrency, smarter retries and timeouts (classic-mode HTTP timeout aligned to 20s).
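
One plausible reading of "slot-before-claim concurrency" is that a worker slot is reserved before the next URL is taken off the queue, so nothing is ever claimed without capacity to fetch it. The sketch below illustrates that ordering with a buffered channel used as a semaphore. The names `crawlAll`, `maxSlots`, and `fetch` are hypothetical and are not taken from the Scouter codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll is a hypothetical helper, not Scouter's actual code. It takes a
// concurrency slot first and only then claims the next URL, and it returns
// the highest number of fetches that were running at the same time.
func crawlAll(urls []string, maxSlots int, fetch func(string)) int {
	slots := make(chan struct{}, maxSlots) // buffered channel used as a semaphore
	queue := make(chan string, len(urls))
	for _, u := range urls {
		queue <- u
	}
	close(queue)

	var wg sync.WaitGroup
	var mu sync.Mutex
	running, peak := 0, 0

	for {
		slots <- struct{}{} // 1. wait for a free slot
		u, ok := <-queue    // 2. only then claim a URL
		if !ok {
			<-slots // nothing left to claim, so hand the slot back
			break
		}
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			defer func() { <-slots }() // release the slot when the fetch ends

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			fetch(u)

			mu.Lock()
			running--
			mu.Unlock()
		}(u)
	}
	wg.Wait()
	return peak
}

func main() {
	urls := []string{"/a", "/b", "/c", "/d", "/e"}
	peak := crawlAll(urls, 2, func(string) {})
	fmt.Println("peak concurrent fetches:", peak)
}
```

Because the slot is taken before the claim, the peak can never exceed `maxSlots`. A CPU-aware governor could, under the same assumption, adjust `maxSlots` from observed host load rather than fixing it up front.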

Analytics migrated to ClickHouse

Crawl analytics now live in ClickHouse instead of PostgreSQL, for dramatically faster aggregations on large crawls.

  • All report queries route through a per-crawl store shim (ClickHouse for new crawls, transparent fallback for older ones).
  • Optimized reporting queries and a lazy-warm report cache (crawl_report_cache) so heavy widgets are computed once and then read instantly.
  • PG → ClickHouse backfill path for migrating existing data.
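
The lazy-warm behaviour described above (compute a heavy widget on first view, then serve it from cache) can be sketched as follows. This is a minimal in-memory illustration only: the real `crawl_report_cache` is presumably a persistent store, and the type `reportCache`, its `get` method, and the cache key format are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// reportCache is a hypothetical in-memory stand-in for crawl_report_cache.
type reportCache struct {
	mu      sync.Mutex
	entries map[string]string
}

func newReportCache() *reportCache {
	return &reportCache{entries: make(map[string]string)}
}

// get returns the cached report for key. On the first view it runs compute,
// stores the result, and returns it; later views skip compute entirely.
// The lock is held during compute, which keeps the sketch simple but
// serializes concurrent first views.
func (c *reportCache) get(key string, compute func() string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.entries[key]; ok {
		return v
	}
	v := compute()
	c.entries[key] = v
	return v
}

func main() {
	cache := newReportCache()
	calls := 0
	heavy := func() string {
		calls++ // stands in for an expensive aggregation query
		return "status-code breakdown"
	}

	cache.get("crawl-42:status_codes", heavy) // cold: runs the query
	cache.get("crawl-42:status_codes", heavy) // warm: served from cache
	fmt.Println("heavy query executions:", calls)
}
```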

AI suite (OpenRouter)

A full AI layer powered by OpenRouter.

  • Dr. Brief: conversational chatbot to interrogate your crawl in natural language.
  • AI segmentation: automatic URL categorization.
  • Assistants for the SQL Explorer and the URL / Link filters that turn plain English into queries.
  • Bulk content generation for at-scale on-page content work.

Public API v1

A documented, key-authenticated REST API.

  • Full crawl lifecycle (create / start / stop / status), recurring schedules, and page content retrieval.
  • Categorization endpoint.
  • Redesigned Settings page to manage API keys and integrations.

MCP server (Model Context Protocol)

Connect Scouter to MCP-compatible AI clients.

  • OAuth authentication.
  • Now open to all roles, with permissions enforced server-side.

Storage & export layer

  • New blob-storage abstraction and export functionality (async export jobs for SQL / URL / Link / Redirect explorers), reconciled and hardened.

In-app notification center

  • Centralized, in-app notifications for crawl lifecycle and background jobs, with per-step progress.

✨ Improvements

  • Faster home and project pages on accounts with many crawls.
  • Crawler finalize / resilience fixes; post-processing failures no longer wrongly mark a crawl as failed.
  • Deploy stabilization: container resource limits (auto-sized to host RAM), clean PG baseline, renderer memory-leak fixes.
  • Version bumped to 0.7 across the default crawler user-agent (Scouter/0.7) and the UI.

🔧 Notes for operators

  • This release restarts ClickHouse and (optionally) rebaselines PostgreSQL, so expect cold caches on first load after deploy. The lazy-warm report cache fills on the first view of each report; after that, reads are instant.
  • Container memory is auto-sized as a fraction of host RAM (scripts/autosize.sh); review limits if deploying on a smaller host.
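
To make the sizing note concrete, here is a small sketch of the arithmetic behind a "fraction of host RAM" limit. It does not reproduce scripts/autosize.sh: the function `containerLimitMiB`, the 25% fraction, and the 2048 MiB floor are illustrative assumptions, chosen only to show why a small host deserves a manual check.

```go
package main

import "fmt"

// containerLimitMiB returns a memory limit computed as a fraction of host
// RAM, never going below floorMiB. The fraction and floor are illustrative
// values, not the ones used by scripts/autosize.sh.
func containerLimitMiB(hostMiB int, fraction float64, floorMiB int) int {
	limit := int(float64(hostMiB) * fraction)
	if limit < floorMiB {
		return floorMiB
	}
	return limit
}

func main() {
	for _, host := range []int{4096, 16384, 65536} {
		limit := containerLimitMiB(host, 0.25, 2048)
		fmt.Printf("host %d MiB -> container limit %d MiB\n", host, limit)
	}
}
```

With these assumed values, a 4096 MiB host falls back to the floor, so a single container would be allowed half of the machine. That is the kind of case worth reviewing by hand on a smaller deployment.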