feat: multi-signal correlation engine with signal adapters and dashboard #91
Merged
…ructure
- Create migration 007_signal_groups.sql with signal_groups table, signal_group_events junction table, and nullable signal_group_id FK on mitigations. Add indexes for (victim_ip, vector, status) and (status, window_expires_at).
- Create src/correlation/ module with CorrelationConfig struct (enabled, window_seconds, min_sources, confidence_threshold, sources HashMap, default_weight), all with backward-compatible defaults.
- Add PlaybookCorrelationOverride for per-playbook min_sources and confidence_threshold overrides on the Playbook struct.
- Add correlation field to Settings (serde default = disabled).
- Wire CorrelationConfig into AppState with RwLock for hot-reload.
- Add correlation config reload to AppState::reload_config().
- Write ADR 018 documenting multi-signal correlation engine design.
- Add 24 unit tests for config deserialization and override resolution.
All existing 44 integration + 9 postgres tests pass unchanged.
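A minimal sketch of how the config struct and per-playbook override resolution described above might fit together. The field names come from the commit message; the concrete types, the `SourceConfig` shape, the helper names `effective_thresholds` and `weight_for`, and every default other than `enabled = false` and `min_sources = 1` are assumptions for illustration. The serde derives used by the real code are omitted so the block depends only on the standard library.

```rust
use std::collections::HashMap;

/// Per-source settings. Only a weight is sketched here (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SourceConfig {
    pub weight: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CorrelationConfig {
    pub enabled: bool,
    pub window_seconds: u64,
    pub min_sources: u32,
    pub confidence_threshold: f32,
    pub sources: HashMap<String, SourceConfig>,
    pub default_weight: f32,
}

impl Default for CorrelationConfig {
    fn default() -> Self {
        Self {
            enabled: false,            // stated in the PR: correlation is disabled by default
            window_seconds: 300,       // assumed value
            min_sources: 1,            // stated in the PR summary
            confidence_threshold: 0.5, // assumed value
            sources: HashMap::new(),
            default_weight: 1.0,       // assumed value
        }
    }
}

/// Optional per-playbook overrides; `None` means "inherit the global value".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PlaybookCorrelationOverride {
    pub min_sources: Option<u32>,
    pub confidence_threshold: Option<f32>,
}

impl CorrelationConfig {
    /// Hypothetical helper: returns (min_sources, confidence_threshold)
    /// after applying a playbook override, field by field.
    pub fn effective_thresholds(
        &self,
        ov: Option<&PlaybookCorrelationOverride>,
    ) -> (u32, f32) {
        let min_sources = ov.and_then(|o| o.min_sources).unwrap_or(self.min_sources);
        let threshold = ov
            .and_then(|o| o.confidence_threshold)
            .unwrap_or(self.confidence_threshold);
        (min_sources, threshold)
    }

    /// Hypothetical helper: weight for a source, falling back to default_weight
    /// when the source is not configured.
    pub fn weight_for(&self, source: &str) -> f32 {
        self.sources
            .get(source)
            .map(|s| s.weight)
            .unwrap_or(self.default_weight)
    }
}
```

Resolving each override field independently means a playbook can raise min_sources without also having to restate the confidence threshold.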
…ted confidence, and corroboration checking
- Add CorrelationEngine with create_group, compute_derived_confidence, count_distinct_sources, check_corroboration, and compute_explanation
- Add SignalGroup, SignalGroupEvent, CorrelationExplanation domain types
- Add 8 RepositoryTrait methods for signal group CRUD with concurrent-safe INSERT ... ON CONFLICT for PostgreSQL and matching MockRepository
- Add 5 Prometheus metrics: prefixd_signal_groups_total, prefixd_signal_group_sources, prefixd_correlation_confidence, prefixd_corroboration_met_total, prefixd_corroboration_timeout_total
- 23 unit tests covering confidence math, corroboration, edge cases
- 6 Postgres integration tests for signal group operations
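The commit does not spell out the confidence formula, so the following is only one plausible reading: each distinct source contributes its highest reported confidence, and the derived confidence is the weighted average of those contributions. The function names mirror the ones listed above, but the signatures, the `SignalEvent` type, and the formula itself are assumptions.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a signal group event (assumed shape).
#[derive(Debug, Clone)]
pub struct SignalEvent {
    pub source: String,
    pub confidence: f64,
}

/// Keep the highest confidence seen for each distinct source, so repeated
/// events from one detector do not count as extra corroboration.
fn best_per_source(events: &[SignalEvent]) -> HashMap<&str, f64> {
    let mut best: HashMap<&str, f64> = HashMap::new();
    for e in events {
        let entry = best.entry(e.source.as_str()).or_insert(0.0);
        if e.confidence > *entry {
            *entry = e.confidence;
        }
    }
    best
}

pub fn count_distinct_sources(events: &[SignalEvent]) -> usize {
    best_per_source(events).len()
}

/// Assumed formula: weighted average of per-source best confidence.
pub fn compute_derived_confidence(
    events: &[SignalEvent],
    weights: &HashMap<String, f64>,
    default_weight: f64,
) -> f64 {
    let mut weighted_sum = 0.0;
    let mut weight_total = 0.0;
    for (source, confidence) in best_per_source(events) {
        let w = weights.get(source).copied().unwrap_or(default_weight);
        weighted_sum += w * confidence;
        weight_total += w;
    }
    if weight_total > 0.0 {
        weighted_sum / weight_total
    } else {
        0.0
    }
}

/// Corroboration is met when enough distinct sources agree and the
/// derived confidence reaches the threshold.
pub fn check_corroboration(
    events: &[SignalEvent],
    weights: &HashMap<String, f64>,
    default_weight: f64,
    min_sources: usize,
    confidence_threshold: f64,
) -> bool {
    count_distinct_sources(events) >= min_sources
        && compute_derived_confidence(events, weights, default_weight) >= confidence_threshold
}
```

Under this reading, two sources reporting 0.9 and 0.7 with equal weights yield a derived confidence of about 0.8, and a second event from an already-counted source changes neither the source count nor the result unless it reports a higher confidence.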
- Add correlation step between event storage and policy evaluation in
handle_ban(). When correlation.enabled, find/create signal group, add
event, check corroboration — skip mitigation if threshold not met.
- Add signal_group_id to Mitigation domain type and DB layer
- Add CorrelationContext to MitigationResponse with signal_group_id,
derived_confidence, source_count, corroboration_met, contributing
sources, and explanation string
- Enrich GET /v1/mitigations/{id} with full correlation context
- Add correlation summary to GET /v1/mitigations list items
- Update WebSocket MitigationCreated broadcast with correlation data
- Add correlation section to incident report markdown generation
- Transition signal group to 'resolved' when mitigation is created
- Add signal group expiry sweep to reconciliation loop with
corroboration timeout metric
- Add find_expired_signal_groups to RepositoryTrait with Mock and
Postgres implementations
- 14 new integration tests covering: min_sources=1 triggers, min_sources=2
blocks/triggers, disabled bypass, EventResponse shape, low confidence
blocking, duplicate source counting, batch endpoint, mitigation detail
correlation, list correlation summary, guardrail enforcement, incident
report correlation section, signal group resolution
…/signal-groups/{id})
- GET /v1/signal-groups with cursor pagination, status/vector/date range filters
- GET /v1/signal-groups/{id} returning group metadata + contributing events
- Both endpoints require authentication (401 without)
- Full OpenAPI spec registration: paths, schemas (SignalGroup, SignalGroupEvent,
SignalGroupsListResponse, SignalGroupDetailResponse, CorrelationContext,
CorrelationExplanation, SourceContribution, SignalGroupStatus)
- 10 integration tests: list basic, pagination, status filter, vector filter,
date range filter, detail with events, detail not found, auth required,
OpenAPI validation, multi-event detail
Fulfills: VAL-ENGINE-016, VAL-ENGINE-017, VAL-ENGINE-032, VAL-ENGINE-034
…elog, roadmap, test counts)
- All validators pass (173 unit + 68 integration + 15 postgres tests, typecheck, lint, frontend build + 34 tests)
- 5/5 feature reviews passed with no blocking issues
- 10 non-blocking issues documented
- Library updated: CTE concurrent insert pattern, API response context levels
- Recommended skill changes: soften TDD mandate (systemic across 4 workers)
…que index for open groups
Two correctness fixes for the correlation engine:
1. Signal group status was set to 'resolved' BEFORE mitigation creation was confirmed. If guardrails or policy evaluation rejected the mitigation, the group was left in 'resolved' status with no mitigation. Now the status update happens AFTER insert_mitigation() succeeds.
2. Added migration 008 with a partial unique index on signal_groups(victim_ip, vector) WHERE status='open' to prevent duplicate open groups from truly concurrent inserts.
Added integration test verifying that if guardrails reject a corroborated mitigation, the signal group stays 'open'.
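The ordering fix in item 1 can be sketched as follows. The types and the function `create_mitigation_then_resolve` are hypothetical stand-ins, and the closure plays the role of insert_mitigation() together with whatever guardrail and policy checks run before it.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GroupStatus {
    Open,
    Resolved,
}

/// Simplified stand-in for a signal group (assumed shape).
#[derive(Debug)]
pub struct SignalGroup {
    pub status: GroupStatus,
}

/// Hypothetical helper: attempt the mitigation first, and only mark the
/// group resolved once the insert has succeeded.
pub fn create_mitigation_then_resolve(
    group: &mut SignalGroup,
    insert_mitigation: impl FnOnce() -> Result<u64, String>,
) -> Result<u64, String> {
    // A rejection returns early here, leaving the group status untouched.
    let mitigation_id = insert_mitigation()?;
    group.status = GroupStatus::Resolved;
    Ok(mitigation_id)
}
```

Because the status change sits after the `?`, a rejected mitigation leaves the group open, so a later signal can still corroborate it.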
…45/45 assertions passed)
Implement POST /v1/signals/alertmanager endpoint that accepts Alertmanager v4 webhook payloads, maps labels/annotations to AttackEventInput fields, and feeds them through the existing event ingestion pipeline (correlation, guardrails, policy evaluation).
Features:
- Accepts batched alerts with per-alert processing and results
- Maps labels.vector (fallback alertname) → vector
- Maps labels.victim_ip (fallback instance with port stripping) → victim_ip
- Maps annotations.bps/pps → optional i64
- Maps labels.severity → confidence (critical=0.9, warning=0.7, info=0.5)
- Resolved alerts (status='resolved') trigger unban/withdraw flow
- Uses alerts[].fingerprint as external_event_id for dedup
- Returns 400 for malformed payloads (Alertmanager won't retry 4xx)
- Requires authentication (401 without)
- Partial batch failure reports per-alert status
Docs:
- ADR 019 (signal-adapter-architecture.md) with Context/Decision/Consequences
- API docs updated with endpoint reference and label mapping table
- CHANGELOG updated with feature entry
Tests (12 new integration tests):
- Valid payload, batched alerts, vector label mapping variants
- Victim IP extraction with port stripping
- BPS/PPS annotation parsing, severity→confidence mapping
- Resolved alerts trigger withdraw, fingerprint dedup
- Malformed payloads return 400, auth required (401)
- Partial batch failure, OpenAPI spec registration
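Two of the label mappings above are easy to illustrate in isolation: severity to confidence, and victim IP extraction with port stripping. The severity values are the ones stated in the commit. The function names, the handling of unknown severities (returns `None`), and the choice not to fall back to `instance` when a `victim_ip` label is present but unparseable are assumptions of this sketch.

```rust
use std::net::{IpAddr, SocketAddr};

/// Severity label to confidence, using the values from the commit message.
/// Unknown severities return None here (assumed behaviour).
pub fn severity_to_confidence(severity: &str) -> Option<f32> {
    match severity {
        "critical" => Some(0.9),
        "warning" => Some(0.7),
        "info" => Some(0.5),
        _ => None,
    }
}

/// Extract an IP from an `instance` label such as "192.0.2.10:9100" or
/// "[2001:db8::1]:9100", stripping the port if one is present.
pub fn victim_ip_from_instance(instance: &str) -> Option<IpAddr> {
    if let Ok(ip) = instance.parse::<IpAddr>() {
        return Some(ip);
    }
    // SocketAddr parsing accepts "ip:port" and "[ipv6]:port" but not hostnames.
    instance.parse::<SocketAddr>().ok().map(|sa| sa.ip())
}

/// Prefer labels.victim_ip and fall back to labels.instance when it is absent.
pub fn resolve_victim_ip(
    victim_ip_label: Option<&str>,
    instance_label: Option<&str>,
) -> Option<IpAddr> {
    match victim_ip_label {
        Some(v) => v.parse().ok(),
        None => instance_label.and_then(victim_ip_from_instance),
    }
}
```

Hostname-style instance labels such as "node-1:9100" yield `None` in this sketch, which a handler could report as a per-alert failure rather than rejecting the whole batch.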
Implement dedicated FastNetMon signal endpoint that:
- Accepts FastNetMon's native JSON notify format (ip, action, attack_details)
- Classifies attack vector from per-protocol traffic breakdown
- Computes confidence from action type via configurable mapping (ban=0.9, partial_block=0.7, alert=0.5, overridable per-source)
- Returns EventResponse shape for compatibility
- Requires authentication
- Stores raw payload for forensics
Add confidence_mapping field to SourceConfig for per-source action→confidence overrides.
Add route, OpenAPI registration, 4 unit tests and 7 integration tests covering: valid payload, default/overridden confidence mapping, malformed payload (400), auth requirement (401), source field verification, and OpenAPI spec inclusion.
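A rough sketch of the two FastNetMon-specific steps: action to confidence with per-source overrides, and vector classification from a protocol breakdown. The default confidence values are the ones stated in the commit. The `ProtocolBreakdown` struct, its field names, the vector names, and the "dominant protocol by packet rate" rule are all hypothetical, since the commit does not describe the classification logic.

```rust
use std::collections::HashMap;

/// Built-in action to confidence defaults, as listed in the commit message.
pub fn default_confidence(action: &str) -> Option<f32> {
    match action {
        "ban" => Some(0.9),
        "partial_block" => Some(0.7),
        "alert" => Some(0.5),
        _ => None,
    }
}

/// A per-source confidence_mapping override wins over the built-in default.
pub fn confidence_for_action(action: &str, overrides: &HashMap<String, f32>) -> Option<f32> {
    overrides
        .get(action)
        .copied()
        .or_else(|| default_confidence(action))
}

/// Hypothetical per-protocol packet rates pulled from attack_details.
#[derive(Debug, Clone, Copy, Default)]
pub struct ProtocolBreakdown {
    pub udp_pps: u64,
    pub tcp_syn_pps: u64,
    pub icmp_pps: u64,
}

/// Assumed rule: the protocol with the highest packet rate names the vector.
/// Ties go to the earlier candidate; an all-zero breakdown is "unknown".
pub fn classify_vector(b: &ProtocolBreakdown) -> &'static str {
    let candidates = [
        ("udp_flood", b.udp_pps),
        ("syn_flood", b.tcp_syn_pps),
        ("icmp_flood", b.icmp_pps),
    ];
    let mut best = ("unknown", 0u64);
    for c in candidates {
        if c.1 > best.1 {
            best = c;
        }
    }
    best.0
}
```

Keeping the defaults in code and the overrides in config means an unconfigured source still behaves sensibly, while an operator can tune a noisy detector down without a rebuild.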
…ation)
- GET /v1/config/correlation: returns correlation config with allowlist-redacted fields following ADR 014 pattern
- PUT /v1/config/correlation: admin-only endpoint that validates, saves to correlation.yaml with atomic write + backup, and hot-reloads
- Updated POST /v1/config/reload to prefer standalone correlation.yaml, falling back to prefixd.yaml correlation section
- Added save/load/validate/redacted methods to CorrelationConfig
- Added correlation_path() to AppState
- Registered routes in api_routes() and OpenAPI spec
- 9 integration tests: GET config, GET default, PUT success, PUT admin-only (403), PUT invalid JSON (400), PUT validation errors (400), reload picks up changes, unknown source graceful handling, OpenAPI spec inclusion
- 6 unit tests for validate, redacted, save/load roundtrip
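The "atomic write + backup" step for correlation.yaml could look roughly like the sketch below: copy the current file to a backup, write the new contents to a temporary sibling, then rename it over the target. The function name and the `.bak` and `.tmp` suffixes are assumptions; the PR does not show the actual save implementation.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: back up the existing file, then replace it by
/// renaming a fully written temporary file into place.
pub fn save_atomic_with_backup(path: &Path, contents: &str) -> io::Result<()> {
    if path.exists() {
        // e.g. correlation.yaml -> correlation.yaml.bak (assumed naming)
        let backup = path.with_extension("yaml.bak");
        fs::copy(path, &backup)?;
    }

    // Write to a sibling file so the rename stays on the same filesystem.
    let tmp = path.with_extension("yaml.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?; // flush to disk before the rename
    }

    // On POSIX systems a same-filesystem rename replaces the target atomically,
    // so readers see either the old file or the new one, never a partial write.
    fs::rename(&tmp, path)
}
```

This approach needs write access to the directory as well as the file, which is consistent with the later commit noting that the PUT endpoints returned 500 when configs were mounted :ro.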
- Add 3 E2E tests in tests/integration_e2e.rs for signal adapter flows
through real Postgres + GoBGP (marked #[ignore] by default):
- Alertmanager webhook → signal group → mitigation with FlowSpec in RIB
- FastNetMon signal → signal group → mitigation with FlowSpec in RIB
- Multi-source corroboration: FastNetMon (no mitigation) + Alertmanager
→ same group → corroboration met → mitigation with both sources
- Add E2ETestContext::with_correlation() for correlation-enabled E2E tests
- Add FastNetMon endpoint docs to docs/api.md (request/response examples,
field reference, confidence mapping, vector classification, config snippets)
- Add Alertmanager config snippet to docs/api.md
- Update CHANGELOG.md with FastNetMon adapter, correlation config API,
and signal adapter E2E test entries
…22 passed, 2 blocked)
…one (22/22 passed)
…t correlation data
Two fixes for non-blocking issues found during user testing validation:
1. Concurrent event submissions for same (victim_ip, vector) could trigger a 500 from the unique constraint (idx_signal_groups_open_unique). The CTE handles sequential races but truly concurrent inserts can fail. Added retry-on-conflict logic: when INSERT fails with unique violation (23505), retry with a SELECT to find the group that won the race.
2. GET /v1/mitigations list returned empty contributing_sources [] and explanation "" for correlated mitigations, while detail returned full data. Made both fields Optional and set to None in list view (omitted from JSON via skip_serializing_if), keeping full data in detail view.
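The retry-on-conflict logic from item 1 can be sketched independently of any database driver. Here the insert and the follow-up SELECT are passed in as closures, and `InsertError` and `find_or_create_group` are hypothetical names; in the real code the `UniqueViolation` case would correspond to PostgreSQL SQLSTATE 23505.

```rust
/// Simplified error type (assumed). `UniqueViolation` stands in for
/// PostgreSQL SQLSTATE 23505 on the partial unique index.
#[derive(Debug, PartialEq)]
pub enum InsertError {
    UniqueViolation,
    Other(String),
}

/// Try to insert a new open group. If a concurrent request won the race,
/// look up the group it created instead of surfacing a 500.
pub fn find_or_create_group(
    insert: impl FnOnce() -> Result<u64, InsertError>,
    select_open: impl FnOnce() -> Option<u64>,
) -> Result<u64, InsertError> {
    match insert() {
        Ok(id) => Ok(id),
        Err(InsertError::UniqueViolation) => {
            // The winner's row should now be visible. If it is not (for
            // example, it was resolved in between), report the conflict.
            select_open().ok_or(InsertError::UniqueViolation)
        }
        Err(other) => Err(other),
    }
}
```

Only the unique-violation case is retried; any other database error is propagated unchanged so that genuine failures are not masked.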
…rtions, fixes verified)
… tabs
- New /correlation page with three Radix Tabs (Signals, Groups, Config)
- Signals tab: source status cards with health dots, recent signals table, weight visualization
- Groups tab: filterable list with status/vector filters, cursor pagination (Load More), URL param sync, empty state with clear-filters
- Config tab: correlation settings editor with validation (admin-only), signal source CRUD cards with add/edit/remove dialogs, per-playbook override display with link to Playbooks
- SWR hooks: useSignalGroups, useSignalGroupsPaginated, useSignalGroupDetail, useSignalSources, useCorrelationConfig, useOpenSignalGroupCount
- API functions: getSignalGroups, getSignalGroupDetail, getSignalSources, getCorrelationConfig, updateCorrelationConfig
- Sidebar nav item with Waypoints icon and open group count badge
- Command palette entry with g r shortcut
- Keyboard shortcut g r for navigation, added to shortcuts modal
- 15 Vitest tests covering all three tabs (rendering, loading, error, empty states)
- Full light/dark mode support throughout
- Header with victim IP (large mono), vector badge, status badge, timestamps
- Contributing events timeline (chronological, color-coded by source)
- Derived confidence breakdown table with weighted contributions
- Corroboration badge (green Corroborated / amber Pending N/M)
- Linked mitigation card with navigation to /mitigations/{id}
- 404 handling for unknown group IDs
- Bidirectional navigation (groups <-> mitigations, IP history, events)
- 7 Vitest tests covering all states and navigation
- Add CorrelationContext interface to api.ts with signal_group_id,
derived_confidence, source_count, corroboration_met, contributing_sources,
and explanation fields
- Update Mitigation interface with optional correlation field
- Add correlation mock data for first mock mitigation
- Add Correlation card to mitigation detail page showing:
- Signal group ID as clickable link to /correlation/groups/{id}
- Derived confidence percentage with visual progress bar
- Source count with source name badges
- Corroboration badge (green Corroborated / amber Pending)
- Contributing sources table (source, confidence, weight) via
useSignalGroupDetail hook
- Why explanation text
- Show muted 'Single-source mitigation' message for non-correlated mitigations
- Add 8 Vitest tests covering correlated/non-correlated rendering,
signal group link navigation, contributing sources table, corroboration
badge states, and hook parameter passing
…/3 reviews passed)
- Add mitigation_id field to SignalGroupDetailResponse by querying mitigations table for matching signal_group_id (CRITICAL fix)
- Add find_mitigation_id_by_signal_group to RepositoryTrait + impls
- Add AlertDialog confirmation for signal source deletion in config tab
- Remove unused imports (useSWRConfig in config-tab, useEffect in groups-tab)
- Fix useState misuse in SourceDialog (replaced with useEffect)
- Add loading skeleton for contributing sources table in mitigation detail
- Add integration tests for mitigation_id in signal group detail response
…e (20/20 assertions passed)
- README: add correlation to features table, how-it-works flow, native adapter docs
- FEATURES: add Multi-Signal Correlation section (signal groups, corroboration, weighting, dashboard)
- AGENTS: add missing endpoints, correlation.yaml, correlation dashboard page, fix test counts and ADR count
- CHANGELOG: fix test count claims to match actual (179 unit, 99 integration, 16 postgres, 64 frontend)
- ROADMAP: check off completed Alertmanager and FastNetMon adapter items
…dapters
- upgrading.md: add v0.13.0 -> v0.14.0 section with migration 007, new endpoints, signal adapter setup, correlation config, Docker read-only caveat, step-by-step upgrade
- deployment.md: add Signal Adapters section (Alertmanager/FastNetMon setup), correlation metrics to monitoring table, production checklist items, read-only config volume note
- Fix ADR count (19, not 20) in AGENTS.md and README.md
- Update migration table in upgrading.md and deployment.md to include migration 007
Dashboard config editors (playbooks, alerting, correlation PUT endpoints) were returning 500 because config files were mounted :ro. Configs are now writable by default so the UI works out of the box; :ro is documented as a hardening option for production.
Summary
Adds multi-signal correlation to prefixd — the ability to combine weak signals from multiple detectors into high-confidence mitigation decisions. This is the v1.3 headline feature.
What's included
Correlation Engine (backend)
Signal Adapters
- POST /v1/signals/alertmanager: maps labels/annotations to attack events, handles batched alerts, resolved alerts (withdraw), fingerprint dedup
- POST /v1/signals/fastnetmon: classifies vector from traffic breakdown, configurable confidence mapping, attack_uuid dedup
Correlation Config API
- GET /v1/config/correlation (secrets redacted)
- PUT /v1/config/correlation (admin only, validates, writes YAML, hot-reloads)
Dashboard
Documentation
Testing
Breaking changes
None. Correlation is opt-in via correlation.enabled: true. Default behavior (min_sources: 1) preserves existing single-source flow.