Domain Presets
Pass domain="finance" or domain="sensor" to scan() to apply domain-specific thresholds. If domain=None (the default), general-purpose thresholds are used.
The same statistical threshold can mean different things in different domains.
A stock price staying flat for 3 consecutive trading days is unusual but plausible (e.g. during a trading halt). The same value repeating for 3 consecutive hours from a temperature sensor is almost certainly a dead sensor. Applying one stuck-value threshold to both would produce either too many false positives in finance or too many missed anomalies in sensor data.
Domain presets solve this by tuning each check's thresholds to match real-world expectations for that domain, without requiring users to manually configure every parameter.
| Check | domain=None | domain="finance" | domain="sensor" |
|---|---|---|---|
| Max gap before PRF001 | 3× median gap | 5 calendar days | 3× median gap |
| Missing cluster threshold (PRF002) | 3 consecutive NaNs | 5 consecutive NaNs | 3 consecutive NaNs |
| Stuck-value window (ANO001) | 5 consecutive identical values | 5 consecutive identical values | 3 consecutive identical values |
| Z-score threshold (ANO002) | 4.0 | 5.0 (wider — finance has fat tails) | 3.5 (tighter) |
| Spike threshold (ANO003) | 3.5 | 4.0 | 3.0 |
| LEK001 binary target threshold | 0.80 AUC | 0.80 AUC | 0.80 AUC |
| LEK001 continuous target threshold | 0.95 Spearman | 0.95 Spearman | 0.95 Spearman |
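
To make the table easier to compare programmatically, here is a minimal sketch that restates the count and score thresholds as a plain dictionary. The names `PRESETS`, `thresholds_for`, and `differences_from_default` are hypothetical and are not part of the library; only the numeric values come from the table above.

```python
from typing import Optional

# Values copied from the table above. The dict layout and key names are
# illustrative assumptions, not the library's internal representation.
PRESETS = {
    None: {"missing_cluster": 3, "stuck_window": 5, "zscore": 4.0, "spike": 3.5},
    "finance": {"missing_cluster": 5, "stuck_window": 5, "zscore": 5.0, "spike": 4.0},
    "sensor": {"missing_cluster": 3, "stuck_window": 3, "zscore": 3.5, "spike": 3.0},
}


def thresholds_for(domain: Optional[str] = None) -> dict:
    """Return a copy of the thresholds for a domain (hypothetical helper)."""
    if domain not in PRESETS:
        raise ValueError(f"unknown domain: {domain!r}")
    return dict(PRESETS[domain])


def differences_from_default(domain: str) -> dict:
    """Map each threshold that differs from domain=None to (default, preset)."""
    base = PRESETS[None]
    return {
        key: (base[key], value)
        for key, value in thresholds_for(domain).items()
        if value != base[key]
    }


print(differences_from_default("finance"))
print(differences_from_default("sensor"))
```

The gap rule (PRF001) is left out of this sketch because its default is relative to the median gap rather than a fixed number.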
The domain="finance" preset is designed for daily OHLCV-style equity and index data.
Key differences from default:
- Wider gap tolerance (5 days) — weekends are 2-day gaps, and public holiday extensions in markets like PSX or regional exchanges can stretch to 4–5 days without indicating a data error.
- Wider z-score threshold (5.0) — financial returns have fat tails. A 4-sigma event happens in real markets; a 5-sigma event in daily OHLCV data is more likely a data error.
- Stuck-value window kept at 5 (same as the default, longer than the sensor preset's 3): a trading halt can produce repeated prices for a few consecutive days.

Example:
report = tsa.scan(df, target="Direction", domain="finance")

Suitable for: equity prices, indices, FX rates, daily OHLCV data, return series.
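
The effect of the wider gap tolerance can be illustrated with a small standalone sketch. It is not the library's implementation: the function `find_gaps` is hypothetical, and it assumes a gap is flagged only when it strictly exceeds the limit (5 calendar days for finance, 3× the median gap otherwise).

```python
from datetime import datetime, timedelta
from statistics import median


def find_gaps(timestamps, domain=None):
    """Return (start, end) pairs whose spacing exceeds the domain's gap limit.

    Hypothetical helper: finance uses a fixed 5-calendar-day limit,
    every other domain uses 3x the median spacing.
    """
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not deltas:
        return []
    if domain == "finance":
        limit = timedelta(days=5)
    else:
        limit = 3 * median(deltas)
    return [
        (a, b)
        for a, b, d in zip(timestamps, timestamps[1:], deltas)
        if d > limit
    ]


# Daily bars with a long-weekend closure: Fri 5 Jan -> Tue 9 Jan (4 days).
bars = [datetime(2024, 1, d) for d in (1, 2, 3, 4, 5, 9, 10)]

print(find_gaps(bars))             # median gap is 1 day, so the 4-day gap is flagged
print(find_gaps(bars, "finance"))  # 4 days is within the 5-day tolerance: no gaps
```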
The domain="sensor" preset is designed for regular-frequency sensor readings (temperature, pressure, vibration, etc.).
Key differences from default:
- Tighter stuck-value window (3) — a physical sensor that reports the same value 3+ times in a row is almost certainly malfunctioning or disconnected, not measuring a genuinely flat signal.
- Tighter z-score threshold (3.5) — physical sensors have natural bounds; a 3.5-sigma reading from a temperature or pressure sensor is more likely an error than a legitimate measurement.
- Gap threshold uses 3× median — sensor frequency varies widely (seconds, minutes, hours); the adaptive median-based threshold works better than a fixed calendar-day rule.

Example:
report = tsa.scan(df, domain="sensor")

Suitable for: IoT sensor streams, environmental monitoring, industrial equipment readings, Raspberry Pi GPIO logs.
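
As a rough illustration of why the stuck-value window matters, the sketch below finds runs of identical consecutive readings. The helper `stuck_runs` is hypothetical and only approximates what a check like ANO001 might do; it assumes a run is flagged when its length is at least the window.

```python
from itertools import groupby


def stuck_runs(values, window):
    """Return (start_index, length) for each run of identical values
    whose length is at least `window` (hypothetical helper)."""
    runs = []
    index = 0
    for _, group in groupby(values):
        length = len(list(group))
        if length >= window:
            runs.append((index, length))
        index += length
    return runs


readings = [21.5, 21.7, 21.7, 21.7, 21.9, 22.0]

print(stuck_runs(readings, window=3))  # sensor window: the three 21.7 readings are flagged
print(stuck_runs(readings, window=5))  # default/finance window: nothing is flagged
```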
New domain presets (e.g. "crypto", "iot", "healthcare") are explicitly welcome as contributions. Each check that branches on domain would need one additional elif clause with justified default thresholds.
To propose one:
- Open a GitHub issue using the Feature Request template
- Propose specific threshold values with reasoning for each (why is this threshold appropriate for the domain's real-world characteristics?)
- Reference at least one module where the difference from existing presets matters most

See Contributing for the full process.
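
For orientation, the "one additional elif clause" per check might look roughly like the following. This is a sketch only: the function name `zscore_threshold` is hypothetical, the values are the ANO002 thresholds from the table, and raising `ValueError` for unknown domains is an assumption about the desired behavior.

```python
def zscore_threshold(domain=None):
    """Pick the z-score threshold for a domain (hypothetical helper)."""
    if domain is None:
        return 4.0
    elif domain == "finance":
        return 5.0  # wider: financial returns have fat tails
    elif domain == "sensor":
        return 3.5  # tighter: physical sensors have natural bounds
    # A new preset would add one more `elif domain == "...":` branch here,
    # returning the value justified in the proposal.
    raise ValueError(f"unknown domain: {domain!r}")
```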