Scouter v0.7

Latest

Latest

lokoe-mehdi released this 02 Jun 23:04

· 6 commits to main since this release

5656cf6

🚀 Major features

New crawler engine — Go + uTLS stealth renderer

The PHP crawler has been fully replaced by a new engine written in Go.

uTLS stealth fingerprinting so the crawler blends in with real browsers and is far less likely to be blocked.
Dedicated JS renderer service with controlled Chrome tab pooling (no more idle-tab hoarding / RAM creep).
CPU-aware crawl governor that throttles concurrency to the host's real capacity, plus accurate live "running" accounting.
Hardened for multi-million-URL crawls: bounded memory, slot-before-claim concurrency, smarter retries and timeouts (classic-mode HTTP timeout aligned to 20s).

Analytics migrated to ClickHouse

Crawl analytics now live in ClickHouse instead of PostgreSQL, for dramatically faster aggregations on large crawls.

All report queries route through a per-crawl store shim (ClickHouse for new crawls, transparent fallback for older ones).
Optimized reporting queries and a lazy-warm report cache (crawl_report_cache) so heavy widgets are computed once and then read instantly.
PG → ClickHouse backfill path for migrating existing data.

AI suite (OpenRouter)

A full AI layer powered by OpenRouter.

Dr. Brief — conversational chatbot to interrogate your crawl in natural language.
AI segmentation — automatic URL categorization.
Assistants for the SQL Explorer and the URL / Link filters that turn plain English into queries.
Bulk content generation for at-scale on-page content work.

Public API v1

A documented, key-authenticated REST API.

Full crawl lifecycle (create / start / stop / status), recurring schedules, and page content retrieval.
Categorization endpoint.
Redesigned Settings page to manage API keys and integrations.

MCP server (Model Context Protocol)

Connect Scouter to MCP-compatible AI clients.

OAuth authentication.
Now open to all roles, with permissions enforced server-side.

Storage & export layer

New blob-storage abstraction and export functionality (async export jobs for SQL / URL / Link / Redirect explorers), reconciled and hardened.

In-app notification center

Centralized, in-app notifications for crawl lifecycle and background jobs, with per-step progress.

✨ Improvements

Faster home and project pages on accounts with many crawls.
Crawler finalize / resilience fixes; post-processing failures no longer wrongly mark a crawl as failed.
Deploy stabilization: container resource limits (auto-sized to host RAM), clean PG baseline, renderer memory-leak fixes.
Version bumped to 0.7 across the default crawler user-agent (Scouter/0.7) and the UI.

🔧 Notes for operators

This release restarts ClickHouse and (optionally) rebaselines PostgreSQL — expect cold caches on first load after deploy: the lazy-warm report cache fills on first view of each report, then reads are instant.
Container memory is auto-sized as a fraction of host RAM (scripts/autosize.sh); review limits if deploying on a smaller host.

Assets 2