Bot Detection

Richard Kent Gates edited this page Jun 9, 2026 · 2 revisions

Rich Statistics uses a two-layer multi-signal scoring system to filter bots before their pageview is ever written to the database. It does this without reading any IP address and without using cookies or any persistent identifier.

How it works

Every tracked pageview goes through two independent layers that each contribute a numeric score. The scores are summed and capped at 10. If the total meets or exceeds the configured Bot Score Threshold (default: 5), the request is silently discarded.
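The combination rule can be sketched in a few lines of JavaScript. This is a minimal illustration of the sum, cap, and threshold comparison described above; the function and constant names are hypothetical and are not taken from the plugin's source.

```javascript
// Hypothetical names for illustration; only the rule itself comes from the text above.
const MAX_SCORE = 10;

// Sum the two layer scores and cap the result at 10.
function totalBotScore(clientScore, serverScore) {
  return Math.min(clientScore + serverScore, MAX_SCORE);
}

// A request is discarded when the total meets or exceeds the threshold (default 5).
function shouldDiscard(clientScore, serverScore, threshold = 5) {
  return totalBotScore(clientScore, serverScore) >= threshold;
}

console.log(shouldDiscard(1, 2)); // false: total 3 is below the threshold
console.log(shouldDiscard(3, 2)); // true: total 5 meets the threshold
console.log(totalBotScore(9, 4)); // 10: the sum of 13 is capped
```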

Layer 1 — JavaScript signals (client-side)

The tracker script runs checks in the browser and combines results into a compact integer bitmask that is sent with the pageview payload. The PHP ingest endpoint never sees raw browser values — only the bitmask integer.

| Signal | What it checks | Bot score contribution | Reasoning |
| --- | --- | --- | --- |
| WEBDRIVER | `navigator.webdriver === true` | +4 | Near-certain headless browser |
| NO_HUMAN_EVENT | No mouse, touch, or keyboard event before send | +1 | Real users almost always interact |
| ZERO_SCREEN | `screen.width` or `screen.height` `=== 0` | +3 | Impossible on a real display |
| CHROME_MISSING_OBJ | Claims Chrome UA but `window.chrome` absent | +3 | Common scraper tell |
| NO_LANGUAGES | `navigator.languages` empty or missing | +2 | Headless defaults |
| INSTANT_LOAD | Navigation timing: page loaded in < 50 ms | +2 | Not physically possible for a real render |
| NO_CANVAS | `HTMLCanvasElement` missing | +2 | Stripped by some minimal headless setups |
| HIDDEN_ON_ARRIVAL | `document.hidden === true` immediately | +2 | Headless tabs are often hidden |
| NO_PLUGINS | `navigator.plugins.length === 0` | +1 | Weak alone; strong combined with others |
| NO_TOUCH_API | No touch/pointer API AND mobile UA claim | +1 | Mobile UA without touch events |
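To make the bitmask idea concrete, here is a minimal sketch of how fired signals could be packed into one integer and later turned back into a score. The weights are the ones listed in the table; the bit positions (table order, starting at bit 0) and the function names are assumptions made for illustration, since the actual bit layout used by the tracker is not documented here.

```javascript
// Weights come from the table above. Bit positions are ASSUMED to follow
// table order (WEBDRIVER = bit 0, ..., NO_TOUCH_API = bit 9).
const SIGNALS = [
  { name: "WEBDRIVER", weight: 4 },
  { name: "NO_HUMAN_EVENT", weight: 1 },
  { name: "ZERO_SCREEN", weight: 3 },
  { name: "CHROME_MISSING_OBJ", weight: 3 },
  { name: "NO_LANGUAGES", weight: 2 },
  { name: "INSTANT_LOAD", weight: 2 },
  { name: "NO_CANVAS", weight: 2 },
  { name: "HIDDEN_ON_ARRIVAL", weight: 2 },
  { name: "NO_PLUGINS", weight: 1 },
  { name: "NO_TOUCH_API", weight: 1 },
];

// Pack the names of the signals that fired into a single integer.
function encodeSignals(firedNames) {
  let mask = 0;
  SIGNALS.forEach((signal, bit) => {
    if (firedNames.includes(signal.name)) mask |= 1 << bit;
  });
  return mask;
}

// Sum the weights of every signal whose bit is set in the mask.
function scoreBitmask(mask) {
  let score = 0;
  SIGNALS.forEach((signal, bit) => {
    if (mask & (1 << bit)) score += signal.weight;
  });
  return score;
}

const mask = encodeSignals(["WEBDRIVER", "NO_PLUGINS"]);
console.log(mask);               // 257 (bit 0 and bit 8 set)
console.log(scoreBitmask(mask)); // 5 (4 + 1)
```

Sending only the integer means the receiving side learns which checks fired but never the underlying values such as screen dimensions or the plugin list.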

Layer 2 — PHP signals (server-side)

The server reads only the User-Agent string and two HTTP request headers, Accept-Language and Accept. REMOTE_ADDR (the IP address) is never read by the tracking pipeline or passed to the scorer.

| Signal | What it checks | Bot score contribution |
| --- | --- | --- |
| Honest-bot UA | UA contains a known crawler string (Googlebot, Bingbot, curl, etc.) | = 10 (immediate reject) |
| Suspicious UA | UA contains headlesschrome, phantomjs, selenium, scrapy, etc. | +4 per match |
| Short UA | UA is fewer than 10 characters | +3 |
| No Accept-Language | `HTTP_ACCEPT_LANGUAGE` header is absent or empty | +2 |
| No Accept | `HTTP_ACCEPT` header is absent or empty | +1 |
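The server-side rules can be sketched as a single pure function. The plugin's real scorer is written in PHP; the version below is a JavaScript illustration of the same table, kept in one language with the other examples. The function name, the parameter names, the case-insensitive matching, and the shortened keyword lists (the table ends both lists with "etc.") are all assumptions.

```javascript
// Partial lists taken from the table above; the real lists are longer ("etc.").
const HONEST_BOTS = ["googlebot", "bingbot", "curl"];
const SUSPICIOUS = ["headlesschrome", "phantomjs", "selenium", "scrapy"];

// Hypothetical scorer: receives only the UA string and two header values,
// never the full request, so it has no access to the IP address.
function scoreServerSignals({ userAgent = "", acceptLanguage = "", accept = "" }) {
  const ua = userAgent.toLowerCase(); // case-insensitive matching is an assumption

  // A self-identified crawler is rejected immediately with the maximum score.
  if (HONEST_BOTS.some((needle) => ua.includes(needle))) return 10;

  let score = 0;
  for (const needle of SUSPICIOUS) {
    if (ua.includes(needle)) score += 4; // +4 per match
  }
  if (userAgent.length < 10) score += 3;          // short UA
  if (acceptLanguage.trim() === "") score += 2;   // no Accept-Language
  if (accept.trim() === "") score += 1;           // no Accept
  return score; // the cap at 10 is applied when both layers are combined
}

console.log(scoreServerSignals({ userAgent: "Googlebot/2.1" })); // 10
console.log(scoreServerSignals({
  userAgent: "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0",
})); // 7 (4 + 2 + 1)
```

Passing an explicit object of three strings, rather than the whole request, mirrors the allowlist approach: the scorer cannot read a value it was never given.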

Privacy guarantee

The tracking pipeline never reads the visitor's IP address. You can verify this yourself: `grep -rn "REMOTE_ADDR" includes/` returns exactly one match, in `class-rest-api.php`, where it is used only for OTP rate limiting (SHA-256 hashed, 5-minute transient) and is never passed to the tracking pipeline. The PHP scorer function signature documents that callers must pass only an allowlist of two headers, not the full `$_SERVER` superglobal.

Tuning the threshold

Navigate to Rich Statistics → Preferences → Bot Score Threshold. Lower values are more aggressive (may flag some edge-case legitimate traffic); higher values are more permissive. The default of 5 is a balanced starting point for most sites.

If you notice a specific traffic source being incorrectly filtered, open a GitHub issue with the User-Agent string and we'll review the signal weights.
