feat: multi-signal correlation engine with signal adapters and dashboard #91

Merged
lance0 merged 30 commits into main from
feature/multi-signal-correlation
Mar 20, 2026

lance0 commented Mar 19, 2026

Summary

Adds multi-signal correlation to prefixd — the ability to combine weak signals from multiple detectors into high-confidence mitigation decisions. This is the v1.3 headline feature.

What's included

Correlation Engine (backend)

  • Time-windowed event grouping by (victim_ip, vector) from multiple sources
  • Configurable source weights, corroboration thresholds, and per-playbook overrides
  • Signal groups with derived confidence, source counting, and corroboration status
  • Correlation explainability on mitigations (why a decision was made)
  • Signal group expiry via reconciliation loop
  • Database migration 007 (signal_groups, signal_group_events tables)

Signal Adapters

  • Alertmanager webhook (POST /v1/signals/alertmanager) — maps labels/annotations to attack events, handles batched alerts, resolved alerts (withdraw), fingerprint dedup
  • FastNetMon webhook (POST /v1/signals/fastnetmon) — classifies vector from traffic breakdown, configurable confidence mapping, attack_uuid dedup

Correlation Config API

  • GET /v1/config/correlation (secrets redacted)
  • PUT /v1/config/correlation (admin only, validates, writes YAML, hot-reloads)

Dashboard

  • Correlation page with Signals, Groups, and Config tabs
  • Signal group detail page with contributing events and source breakdown
  • Correlation context section on mitigation detail page
  • 30 new frontend tests

Documentation

  • ADR 018 (correlation engine) and ADR 019 (signal adapter architecture)
  • Full API docs for all new endpoints
  • Configuration reference for correlation section
  • Updated README, FEATURES, CHANGELOG, ROADMAP, AGENTS

Testing

  • 179 backend unit tests (was 126)
  • 99 integration tests (was 44)
  • 16 postgres integration tests (was 9)
  • 9 e2e tests including multi-source corroboration through real Postgres + GoBGP
  • 64 frontend tests (was 34)
  • All passing, clippy clean, fmt clean, Docker builds clean

Breaking changes

None. Correlation is opt-in via correlation.enabled: true. Default behavior (min_sources: 1) preserves existing single-source flow.

lance0 added 30 commits March 19, 2026 13:47
…ructure

- Create migration 007_signal_groups.sql with signal_groups table,
  signal_group_events junction table, and nullable signal_group_id FK
  on mitigations. Add indexes for (victim_ip, vector, status) and
  (status, window_expires_at).
- Create src/correlation/ module with CorrelationConfig struct
  (enabled, window_seconds, min_sources, confidence_threshold,
  sources HashMap, default_weight) all with backward-compatible defaults.
- Add PlaybookCorrelationOverride for per-playbook min_sources and
  confidence_threshold overrides on the Playbook struct.
- Add correlation field to Settings (serde default = disabled).
- Wire CorrelationConfig into AppState with RwLock for hot-reload.
- Add correlation config reload to AppState::reload_config().
- Write ADR 018 documenting multi-signal correlation engine design.
- Add 24 unit tests for config deserialization and override resolution.

All existing 44 integration + 9 postgres tests pass unchanged.
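
The config shape and override resolution described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in src/correlation/: the field names come from the commit message, but `SourceConfig`, `effective_thresholds`, `weight_for`, and the default values for `window_seconds`, `confidence_threshold`, and `default_weight` are assumptions. Only `enabled: false` and `min_sources: 1` are stated in the PR. The real struct is deserialized with serde, which is omitted here to keep the sketch dependency-free.

```rust
use std::collections::HashMap;

/// Per-source settings. Only a weight is modelled here (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SourceConfig {
    pub weight: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CorrelationConfig {
    pub enabled: bool,
    pub window_seconds: u64,
    pub min_sources: u32,
    pub confidence_threshold: f64,
    pub sources: HashMap<String, SourceConfig>,
    pub default_weight: f64,
}

impl Default for CorrelationConfig {
    fn default() -> Self {
        Self {
            enabled: false,            // opt-in, per the PR description
            window_seconds: 300,       // assumed value
            min_sources: 1,            // preserves the single-source flow
            confidence_threshold: 0.5, // assumed value
            sources: HashMap::new(),
            default_weight: 1.0,       // assumed value
        }
    }
}

/// Per-playbook override: unset fields fall back to the global config.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PlaybookCorrelationOverride {
    pub min_sources: Option<u32>,
    pub confidence_threshold: Option<f64>,
}

impl CorrelationConfig {
    /// Resolve (min_sources, confidence_threshold) for one playbook.
    pub fn effective_thresholds(
        &self,
        ov: Option<&PlaybookCorrelationOverride>,
    ) -> (u32, f64) {
        let min_sources = ov
            .and_then(|o| o.min_sources)
            .unwrap_or(self.min_sources);
        let threshold = ov
            .and_then(|o| o.confidence_threshold)
            .unwrap_or(self.confidence_threshold);
        (min_sources, threshold)
    }

    /// Weight for a source, falling back to `default_weight` when the
    /// source is not listed in `sources`.
    pub fn weight_for(&self, source: &str) -> f64 {
        self.sources
            .get(source)
            .map(|s| s.weight)
            .unwrap_or(self.default_weight)
    }
}
```

With these defaults, a playbook that sets only `min_sources: Some(2)` would require two sources while inheriting the global confidence threshold.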
…ted confidence, and corroboration checking

- Add CorrelationEngine with create_group, compute_derived_confidence,
  count_distinct_sources, check_corroboration, and compute_explanation
- Add SignalGroup, SignalGroupEvent, CorrelationExplanation domain types
- Add 8 RepositoryTrait methods for signal group CRUD with concurrent-safe
  INSERT ... ON CONFLICT for PostgreSQL and matching MockRepository
- Add 5 Prometheus metrics: prefixd_signal_groups_total,
  prefixd_signal_group_sources, prefixd_correlation_confidence,
  prefixd_corroboration_met_total, prefixd_corroboration_timeout_total
- 23 unit tests covering confidence math, corroboration, edge cases
- 6 Postgres integration tests for signal group operations
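
To make the confidence math concrete, here is a hedged sketch of how derived confidence and corroboration could be computed. The PR does not state the formula, so this assumes a weighted mean in which each distinct source contributes once, using its highest reported confidence. The function names mirror those in the commit message, but the signatures and the `SignalEvent` type are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Minimal event shape for this sketch (assumed, not the real domain type).
#[derive(Debug, Clone)]
pub struct SignalEvent {
    pub source: String,
    pub confidence: f64,
}

/// Highest confidence reported by each distinct source, so repeated
/// events from one source count once.
fn best_per_source(events: &[SignalEvent]) -> HashMap<&str, f64> {
    let mut best: HashMap<&str, f64> = HashMap::new();
    for e in events {
        let slot = best.entry(e.source.as_str()).or_insert(e.confidence);
        if e.confidence > *slot {
            *slot = e.confidence;
        }
    }
    best
}

pub fn count_distinct_sources(events: &[SignalEvent]) -> usize {
    best_per_source(events).len()
}

/// Assumed formula: weighted mean of per-source confidence.
pub fn compute_derived_confidence(
    events: &[SignalEvent],
    weights: &HashMap<String, f64>,
    default_weight: f64,
) -> f64 {
    let mut weighted_sum = 0.0;
    let mut weight_total = 0.0;
    for (source, confidence) in best_per_source(events) {
        let w = weights.get(source).copied().unwrap_or(default_weight);
        weighted_sum += w * confidence;
        weight_total += w;
    }
    if weight_total > 0.0 {
        weighted_sum / weight_total
    } else {
        0.0
    }
}

/// Corroboration is met when enough distinct sources agree and the
/// derived confidence reaches the threshold.
pub fn check_corroboration(
    events: &[SignalEvent],
    weights: &HashMap<String, f64>,
    default_weight: f64,
    min_sources: usize,
    confidence_threshold: f64,
) -> bool {
    count_distinct_sources(events) >= min_sources
        && compute_derived_confidence(events, weights, default_weight)
            >= confidence_threshold
}
```

Under these assumptions, two events from `fastnetmon` (0.9 and 0.5) plus one from `alertmanager` (0.7) with equal weights count as two sources with a derived confidence of about 0.8.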
- Add correlation step between event storage and policy evaluation in
  handle_ban(). When correlation.enabled, find/create signal group, add
  event, check corroboration — skip mitigation if threshold not met.
- Add signal_group_id to Mitigation domain type and DB layer
- Add CorrelationContext to MitigationResponse with signal_group_id,
  derived_confidence, source_count, corroboration_met, contributing
  sources, and explanation string
- Enrich GET /v1/mitigations/{id} with full correlation context
- Add correlation summary to GET /v1/mitigations list items
- Update WebSocket MitigationCreated broadcast with correlation data
- Add correlation section to incident report markdown generation
- Transition signal group to 'resolved' when mitigation is created
- Add signal group expiry sweep to reconciliation loop with
  corroboration timeout metric
- Add find_expired_signal_groups to RepositoryTrait with Mock and
  Postgres implementations
- 14 new integration tests covering: min_sources=1 triggers, min_sources=2
  blocks/triggers, disabled bypass, EventResponse shape, low confidence
  blocking, duplicate source counting, batch endpoint, mitigation detail
  correlation, list correlation summary, guardrail enforcement, incident
  report correlation section, signal group resolution
…/signal-groups/{id})

- GET /v1/signal-groups with cursor pagination, status/vector/date range filters
- GET /v1/signal-groups/{id} returning group metadata + contributing events
- Both endpoints require authentication (401 without)
- Full OpenAPI spec registration: paths, schemas (SignalGroup, SignalGroupEvent,
  SignalGroupsListResponse, SignalGroupDetailResponse, CorrelationContext,
  CorrelationExplanation, SourceContribution, SignalGroupStatus)
- 10 integration tests: list basic, pagination, status filter, vector filter,
  date range filter, detail with events, detail not found, auth required,
  OpenAPI validation, multi-event detail

Fulfills: VAL-ENGINE-016, VAL-ENGINE-017, VAL-ENGINE-032, VAL-ENGINE-034
- All validators pass (173 unit + 68 integration + 15 postgres tests, typecheck, lint, frontend build + 34 tests)
- 5/5 feature reviews passed with no blocking issues
- 10 non-blocking issues documented
- Library updated: CTE concurrent insert pattern, API response context levels
- Recommended skill changes: soften TDD mandate (systemic across 4 workers)
…que index for open groups

Two correctness fixes for the correlation engine:

1. Signal group status was set to 'resolved' BEFORE mitigation creation was
   confirmed. If guardrails or policy evaluation rejected the mitigation,
   the group was left in 'resolved' status with no mitigation. Now the
   status update happens AFTER insert_mitigation() succeeds.

2. Added migration 008 with a partial unique index on
   signal_groups(victim_ip, vector) WHERE status='open' to prevent
   duplicate open groups from truly concurrent inserts.

Added integration test verifying that if guardrails reject a corroborated
mitigation, the signal group stays 'open'.
Implement POST /v1/signals/alertmanager endpoint that accepts Alertmanager v4
webhook payloads, maps labels/annotations to AttackEventInput fields, and feeds
them through the existing event ingestion pipeline (correlation, guardrails,
policy evaluation).

Features:
- Accepts batched alerts with per-alert processing and results
- Maps labels.vector (fallback alertname) → vector
- Maps labels.victim_ip (fallback instance with port stripping) → victim_ip
- Maps annotations.bps/pps → optional i64
- Maps labels.severity → confidence (critical=0.9, warning=0.7, info=0.5)
- Resolved alerts (status='resolved') trigger unban/withdraw flow
- Uses alerts[].fingerprint as external_event_id for dedup
- Returns 400 for malformed payloads (Alertmanager won't retry 4xx)
- Requires authentication (401 without)
- Partial batch failure reports per-alert status

Docs:
- ADR 019 (signal-adapter-architecture.md) with Context/Decision/Consequences
- API docs updated with endpoint reference and label mapping table
- CHANGELOG updated with feature entry

Tests (12 new integration tests):
- Valid payload, batched alerts, vector label mapping variants
- Victim IP extraction with port stripping
- BPS/PPS annotation parsing, severity→confidence mapping
- Resolved alerts trigger withdraw, fingerprint dedup
- Malformed payloads return 400, auth required (401)
- Partial batch failure, OpenAPI spec registration
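
The label mapping listed above could look roughly like the following. This is a sketch under assumptions rather than the adapter's actual code: the function names are hypothetical, and the behavior for an unrecognized severity or an unparsable `victim_ip` label (returning `None`) is a guess, since the commit message does not specify it.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, SocketAddr};

/// severity label -> confidence, using the values from the commit message.
/// Returning None for unknown severities is an assumption.
pub fn severity_to_confidence(severity: &str) -> Option<f64> {
    match severity {
        "critical" => Some(0.9),
        "warning" => Some(0.7),
        "info" => Some(0.5),
        _ => None,
    }
}

/// labels.victim_ip, falling back to labels.instance with the port stripped.
pub fn extract_victim_ip(labels: &HashMap<String, String>) -> Option<IpAddr> {
    if let Some(ip) = labels.get("victim_ip") {
        // Assumed: an invalid victim_ip label is an error, not a fallback.
        return ip.parse().ok();
    }
    let instance = labels.get("instance")?;
    // "192.0.2.10:9100" and "[2001:db8::1]:9100" parse as socket addresses.
    if let Ok(sock) = instance.parse::<SocketAddr>() {
        return Some(sock.ip());
    }
    // No port present: try the value as a bare IP address.
    instance.parse().ok()
}

/// labels.vector, falling back to labels.alertname.
pub fn extract_vector(labels: &HashMap<String, String>) -> Option<&str> {
    labels
        .get("vector")
        .or_else(|| labels.get("alertname"))
        .map(String::as_str)
}
```

Parsing `instance` as a `SocketAddr` first handles bracketed IPv6 addresses, which a naive split on `:` would mangle.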
Implement dedicated FastNetMon signal endpoint that:
- Accepts FastNetMon's native JSON notify format (ip, action, attack_details)
- Classifies attack vector from per-protocol traffic breakdown
- Computes confidence from action type via configurable mapping
  (ban=0.9, partial_block=0.7, alert=0.5, overridable per-source)
- Returns EventResponse shape for compatibility
- Requires authentication
- Stores raw payload for forensics

Add confidence_mapping field to SourceConfig for per-source action→confidence
overrides. Add route, OpenAPI registration, 4 unit tests and 7 integration
tests covering: valid payload, default/overridden confidence mapping,
malformed payload (400), auth requirement (401), source field verification,
and OpenAPI spec inclusion.
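
As an illustration of the action-to-confidence mapping with per-source overrides, consider the sketch below. The default values (ban=0.9, partial_block=0.7, alert=0.5) come from the commit message. The function names and the choice to return `None` for an unknown action are assumptions, and the real `confidence_mapping` field on `SourceConfig` may be shaped differently.

```rust
use std::collections::HashMap;

/// Built-in defaults from the commit message.
pub fn default_confidence(action: &str) -> Option<f64> {
    match action {
        "ban" => Some(0.9),
        "partial_block" => Some(0.7),
        "alert" => Some(0.5),
        _ => None, // assumed: unknown actions have no default
    }
}

/// Look up the per-source override first, then fall back to the defaults.
/// `overrides` stands in for a source's confidence_mapping (assumed shape).
pub fn confidence_for_action(
    action: &str,
    overrides: Option<&HashMap<String, f64>>,
) -> Option<f64> {
    overrides
        .and_then(|m| m.get(action).copied())
        .or_else(|| default_confidence(action))
}
```

A source that overrides only `alert` keeps the built-in values for `ban` and `partial_block`.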
…ation)

- GET /v1/config/correlation: returns correlation config with allowlist-redacted
  fields following ADR 014 pattern
- PUT /v1/config/correlation: admin-only endpoint that validates, saves to
  correlation.yaml with atomic write + backup, and hot-reloads
- Updated POST /v1/config/reload to prefer standalone correlation.yaml,
  falling back to prefixd.yaml correlation section
- Added save/load/validate/redacted methods to CorrelationConfig
- Added correlation_path() to AppState
- Registered routes in api_routes() and OpenAPI spec
- 9 integration tests: GET config, GET default, PUT success, PUT admin-only
  (403), PUT invalid JSON (400), PUT validation errors (400), reload picks up
  changes, unknown source graceful handling, OpenAPI spec inclusion
- 6 unit tests for validate, redacted, save/load roundtrip
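
The "atomic write + backup" step for correlation.yaml can be approximated with the standard write-to-temp-then-rename pattern. The sketch below is an assumption about how such a save might work, not the project's `save` method: `save_atomic`, `with_suffix`, and the `.bak`/`.tmp` suffixes are hypothetical names.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Append a suffix to the full file name, e.g. "correlation.yaml.bak".
fn with_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut s = path.as_os_str().to_os_string();
    s.push(suffix);
    PathBuf::from(s)
}

/// Back up the current file (if any), write the new contents to a
/// temporary sibling, then rename it over the target. On the same
/// filesystem the rename replaces the file in one step, so readers
/// never observe a half-written config.
pub fn save_atomic(path: &Path, contents: &str) -> io::Result<()> {
    if path.exists() {
        fs::copy(path, with_suffix(path, ".bak"))?;
    }
    let tmp = with_suffix(path, ".tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?; // flush to disk before the rename
    }
    fs::rename(&tmp, path)
}
```

This pattern needs write access to the directory containing the config file, not just the file itself.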
- Add 3 E2E tests in tests/integration_e2e.rs for signal adapter flows
  through real Postgres + GoBGP (marked #[ignore] by default):
  - Alertmanager webhook → signal group → mitigation with FlowSpec in RIB
  - FastNetMon signal → signal group → mitigation with FlowSpec in RIB
  - Multi-source corroboration: FastNetMon (no mitigation) + Alertmanager
    → same group → corroboration met → mitigation with both sources
- Add E2ETestContext::with_correlation() for correlation-enabled E2E tests
- Add FastNetMon endpoint docs to docs/api.md (request/response examples,
  field reference, confidence mapping, vector classification, config snippets)
- Add Alertmanager config snippet to docs/api.md
- Update CHANGELOG.md with FastNetMon adapter, correlation config API,
  and signal adapter E2E test entries
…t correlation data

Two fixes for non-blocking issues found during user testing validation:

1. Concurrent event submissions for same (victim_ip, vector) could trigger
   a 500 from the unique constraint (idx_signal_groups_open_unique). The CTE
   handles sequential races but truly concurrent inserts can fail. Added
   retry-on-conflict logic: when INSERT fails with unique violation (23505),
   retry with a SELECT to find the group that won the race.

2. GET /v1/mitigations list returned empty contributing_sources [] and
   explanation "" for correlated mitigations, while detail returned full
   data. Made both fields Optional and set to None in list view (omitted
   from JSON via skip_serializing_if), keeping full data in detail view.
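
The retry-on-conflict logic from the first fix can be sketched against an in-memory store. Everything here is illustrative: the `GroupStore` trait, `MockStore`, and `find_or_create_group` are hypothetical names, and the real code issues a CTE-based INSERT against PostgreSQL and matches on SQLSTATE 23505 rather than on an enum variant.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum InsertError {
    /// Stands in for PostgreSQL unique violation 23505.
    UniqueViolation,
    Other(String),
}

/// Key is (victim_ip, vector), matching the partial unique index.
pub type GroupKey = (String, String);

pub trait GroupStore {
    fn find_open(&mut self, key: &GroupKey) -> Option<u64>;
    fn insert_open(&mut self, key: &GroupKey) -> Result<u64, InsertError>;
}

/// Find the open group for `key`, creating it if needed. If the insert
/// loses a race, re-select the group that the winner created.
pub fn find_or_create_group<S: GroupStore>(
    store: &mut S,
    key: &GroupKey,
) -> Result<u64, InsertError> {
    if let Some(id) = store.find_open(key) {
        return Ok(id);
    }
    match store.insert_open(key) {
        Ok(id) => Ok(id),
        Err(InsertError::UniqueViolation) => store
            .find_open(key)
            .ok_or(InsertError::UniqueViolation),
        Err(other) => Err(other),
    }
}

/// In-memory stand-in. `hide_next_find` makes one lookup miss, which
/// simulates a concurrent writer committing between SELECT and INSERT.
pub struct MockStore {
    pub groups: HashMap<GroupKey, u64>,
    pub next_id: u64,
    pub hide_next_find: bool,
}

impl MockStore {
    pub fn new() -> Self {
        Self { groups: HashMap::new(), next_id: 1, hide_next_find: false }
    }
}

impl GroupStore for MockStore {
    fn find_open(&mut self, key: &GroupKey) -> Option<u64> {
        if self.hide_next_find {
            self.hide_next_find = false;
            return None;
        }
        self.groups.get(key).copied()
    }

    fn insert_open(&mut self, key: &GroupKey) -> Result<u64, InsertError> {
        if self.groups.contains_key(key) {
            return Err(InsertError::UniqueViolation);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.groups.insert(key.clone(), id);
        Ok(id)
    }
}
```

Handling the conflict as a lookup rather than an error means both racing requests end up attached to the same signal group instead of one of them surfacing a 500.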
… tabs

- New /correlation page with three Radix Tabs (Signals, Groups, Config)
- Signals tab: source status cards with health dots, recent signals table, weight visualization
- Groups tab: filterable list with status/vector filters, cursor pagination (Load More), URL param sync, empty state with clear-filters
- Config tab: correlation settings editor with validation (admin-only), signal source CRUD cards with add/edit/remove dialogs, per-playbook override display with link to Playbooks
- SWR hooks: useSignalGroups, useSignalGroupsPaginated, useSignalGroupDetail, useSignalSources, useCorrelationConfig, useOpenSignalGroupCount
- API functions: getSignalGroups, getSignalGroupDetail, getSignalSources, getCorrelationConfig, updateCorrelationConfig
- Sidebar nav item with Waypoints icon and open group count badge
- Command palette entry with g r shortcut
- Keyboard shortcut g r for navigation, added to shortcuts modal
- 15 Vitest tests covering all three tabs (rendering, loading, error, empty states)
- Full light/dark mode support throughout
- Header with victim IP (large mono), vector badge, status badge, timestamps
- Contributing events timeline (chronological, color-coded by source)
- Derived confidence breakdown table with weighted contributions
- Corroboration badge (green Corroborated / amber Pending N/M)
- Linked mitigation card with navigation to /mitigations/{id}
- 404 handling for unknown group IDs
- Bidirectional navigation (groups <-> mitigations, IP history, events)
- 7 Vitest tests covering all states and navigation
- Add CorrelationContext interface to api.ts with signal_group_id,
  derived_confidence, source_count, corroboration_met, contributing_sources,
  and explanation fields
- Update Mitigation interface with optional correlation field
- Add correlation mock data for first mock mitigation
- Add Correlation card to mitigation detail page showing:
  - Signal group ID as clickable link to /correlation/groups/{id}
  - Derived confidence percentage with visual progress bar
  - Source count with source name badges
  - Corroboration badge (green Corroborated / amber Pending)
  - Contributing sources table (source, confidence, weight) via
    useSignalGroupDetail hook
  - Why explanation text
- Show muted 'Single-source mitigation' message for non-correlated mitigations
- Add 8 Vitest tests covering correlated/non-correlated rendering,
  signal group link navigation, contributing sources table, corroboration
  badge states, and hook parameter passing
- Add mitigation_id field to SignalGroupDetailResponse by querying
  mitigations table for matching signal_group_id (CRITICAL fix)
- Add find_mitigation_id_by_signal_group to RepositoryTrait + impls
- Add AlertDialog confirmation for signal source deletion in config tab
- Remove unused imports (useSWRConfig in config-tab, useEffect in groups-tab)
- Fix useState misuse in SourceDialog (replaced with useEffect)
- Add loading skeleton for contributing sources table in mitigation detail
- Add integration tests for mitigation_id in signal group detail response
- README: add correlation to features table, how-it-works flow, native adapter docs
- FEATURES: add Multi-Signal Correlation section (signal groups, corroboration, weighting, dashboard)
- AGENTS: add missing endpoints, correlation.yaml, correlation dashboard page, fix test counts and ADR count
- CHANGELOG: fix test count claims to match actual (179 unit, 99 integration, 16 postgres, 64 frontend)
- ROADMAP: check off completed Alertmanager and FastNetMon adapter items
…dapters

- upgrading.md: add v0.13.0 -> v0.14.0 section with migration 007, new endpoints,
  signal adapter setup, correlation config, Docker read-only caveat, step-by-step upgrade
- deployment.md: add Signal Adapters section (Alertmanager/FastNetMon setup),
  correlation metrics to monitoring table, production checklist items,
  read-only config volume note
- Fix ADR count (19, not 20) in AGENTS.md and README.md
- Update migration table in upgrading.md and deployment.md to include migration 007
Dashboard config editors (playbooks, alerting, correlation PUT endpoints)
were returning 500 because configs were mounted :ro. Now writable by
default so the UI works out of the box. Document :ro as a hardening
option for production.
lance0 merged commit c43fe93 into main on Mar 20, 2026
4 checks passed