
feat: per-entity confirmation and healing thresholds #274

Merged
mfaferek93 merged 4 commits into main from feature/per-entity-thresholds
Mar 15, 2026

@mfaferek93 mfaferek93 commented Mar 15, 2026

Pull Request

Summary

Add per-entity debounce threshold configuration using longest-prefix matching on source_id. Different subsystems can now have different fault confirmation/healing policies.

Example: lidar faults confirm instantly (confirmation_threshold: -1), motor faults need 5 events (confirmation_threshold: -5), and the safety subsystem disables auto-healing (healing_enabled: false).

# entity_thresholds.yaml
/sensors/lidar:
  confirmation_threshold: -1
  healing_threshold: 1
/powertrain/motor_left:
  confirmation_threshold: -5
  healing_threshold: 10
/safety:
  healing_enabled: false

Key design decisions

  • Runtime resolution, no DB schema change - thresholds resolved from source_id at report time, passed to storage per-call. No ALTER TABLE, no migration.
  • report_fault_event() takes DebounceConfig parameter - storage has zero knowledge of entities, just applies whichever config it receives.
  • External YAML config file (entity_thresholds_config_file parameter) - follows existing correlation.config_file / snapshots.config_file pattern.
  • Longest-prefix matching - /sensors/lidar/front matches /sensors/lidar over /sensors.
  • Fully backward compatible - no config file = exact current behavior.
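The runtime resolution described above can be sketched as a merge of an entity override onto the global config. This is a minimal illustration, not the PR's code: the field names mirror the YAML keys, but the actual layout of DebounceConfig and EntityDebounceOverride, and the helper name merge_override, are assumptions.

```cpp
#include <optional>

// Hypothetical shapes; the real structs in the PR may have other fields.
struct DebounceConfig {
  int confirmation_threshold;
  int healing_threshold;
  bool healing_enabled;
};

// Every field is optional so an entity can override only what it needs,
// e.g. /safety sets healing_enabled and nothing else.
struct EntityDebounceOverride {
  std::optional<int> confirmation_threshold;
  std::optional<int> healing_threshold;
  std::optional<bool> healing_enabled;
};

// Start from the global config and overwrite only the fields that are set.
DebounceConfig merge_override(const DebounceConfig& global,
                              const EntityDebounceOverride& entity) {
  DebounceConfig result = global;
  if (entity.confirmation_threshold) {
    result.confirmation_threshold = *entity.confirmation_threshold;
  }
  if (entity.healing_threshold) {
    result.healing_threshold = *entity.healing_threshold;
  }
  if (entity.healing_enabled) {
    result.healing_enabled = *entity.healing_enabled;
  }
  return result;
}
```

With no matching entry the caller would pass the global config through unchanged, which is how the "no config file = exact current behavior" guarantee falls out of this design.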

Issue

Type

  • Bug fix
  • New feature or tests
  • Breaking change
  • Documentation only

Checklist

  • Breaking changes are clearly described (and announced in docs / changelog if needed)
  • Tests were added or updated if needed
  • Docs were updated if behavior or public API changed

Copilot AI review requested due to automatic review settings March 15, 2026 11:29

Copilot AI left a comment

Pull request overview

Adds per-entity debounce threshold configuration to the fault manager, allowing different subsystems to have distinct fault confirmation/healing policies via longest-prefix matching on source_id. Thresholds are loaded from an external YAML config file and resolved at report time, with the DebounceConfig passed per-call to storage rather than relying solely on the global config.

Changes:

  • New EntityThresholdResolver class with longest-prefix matching and YAML loading
  • report_fault_event() API extended with a DebounceConfig parameter across FaultStorage, InMemoryFaultStorage, and SqliteFaultStorage
  • All existing test call sites updated to pass config; 17 new tests for resolver, YAML parsing, and per-entity storage integration
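To illustrate what "storage applies whichever config it receives" could look like, here is a small sketch of a storage that debounces with a per-call config. The counter scheme (FAILED decrements, PASSED increments, confirm at or below the negative threshold) is an assumption inferred from the threshold values in this PR; SketchStorage, FaultStatus values other than PREFAILED, and EventType are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical types for illustration only.
struct DebounceConfig {
  int confirmation_threshold;  // negative, e.g. -3: confirm after 3 FAILED events
  int healing_threshold;       // positive: counter value at which the fault heals
  bool healing_enabled;
};

enum class FaultStatus { PREFAILED, CONFIRMED, HEALED };
enum class EventType { FAILED, PASSED };

class SketchStorage {
 public:
  // The storage keeps no thresholds of its own; the caller supplies them.
  FaultStatus report_fault_event(const std::string& fault_code, EventType event,
                                 const DebounceConfig& config) {
    Entry& entry = entries_[fault_code];
    entry.counter += (event == EventType::FAILED) ? -1 : 1;
    if (entry.counter <= config.confirmation_threshold) {
      entry.status = FaultStatus::CONFIRMED;
    } else if (config.healing_enabled &&
               entry.counter >= config.healing_threshold) {
      entry.status = FaultStatus::HEALED;
    }
    return entry.status;  // unchanged when neither threshold is reached
  }

 private:
  struct Entry {
    int counter = 0;
    FaultStatus status = FaultStatus::PREFAILED;
  };
  std::map<std::string, Entry> entries_;
};
```

Under these assumptions a config of -1 confirms on the first FAILED event, while -3 keeps the fault PREFAILED for the first two events, and a fault reported with healing_enabled set to false stays CONFIRMED no matter how many PASSED events follow.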

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| entity_threshold_resolver.hpp | New header defining EntityDebounceOverride struct and EntityThresholdResolver class |
| entity_threshold_resolver.cpp | Implementation of prefix matching, merging, and YAML loading |
| fault_storage.hpp | Added DebounceConfig param to report_fault_event virtual interface and update_status |
| fault_storage.cpp | InMemoryFaultStorage uses per-call config instead of config_ member |
| sqlite_fault_storage.hpp/cpp | Same per-call config change for SQLite backend |
| fault_manager_node.hpp/cpp | Stores global config, loads resolver, resolves config per ReportFault call |
| CMakeLists.txt | New test target test_entity_thresholds with build/coverage config |
| test_entity_thresholds.cpp | 17 new tests: resolver, YAML, per-entity storage integration |
| test_fault_manager.cpp | Updated 57 call sites to pass default_config() |
| test_sqlite_storage.cpp | Updated 35 call sites to pass default_config() |
| docs/config/fault-manager.rst | Documentation for per-entity thresholds feature |

@mfaferek93 force-pushed the feature/per-entity-thresholds branch from 10a502a to 851c91a, March 15, 2026 11:33
Add per-entity debounce threshold configuration using longest-prefix
matching on source_id. Different subsystems can now have different
fault confirmation/healing policies (e.g. lidar=instant, motor=debounced).

- New EntityThresholdResolver class with longest-prefix matching and
  YAML config file parsing
- Changed report_fault_event() to accept DebounceConfig per-call
  instead of using a stored global config
- FaultManagerNode resolves config from source_id before each report
- No DB schema change - thresholds resolved at runtime
- 17 new tests (resolver unit, YAML parsing, storage integration)
- Updated docs/config/fault-manager.rst with Per-Entity Thresholds section

Closes #269
@mfaferek93 force-pushed the feature/per-entity-thresholds branch from 851c91a to 1df5e6a, March 15, 2026 11:37
- Fix ROS_DOMAIN_ID collision: 66 -> 67 (66 used by gateway test_log_manager)
- Add path boundary check in prefix matching (/sensors/lid must not match /sensors/lidar)
- Fix incorrect doc claim about hot-reload (config loaded once at startup)
- Document that auto_confirm_after_sec is global-only
- Add PrefixMatchRequiresPathBoundary test
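The path boundary rule from the fix list above can be sketched as follows. This is an illustrative version, not the code from entity_threshold_resolver.cpp; the function names matches_at_boundary and longest_matching_prefix are hypothetical.

```cpp
#include <optional>
#include <string>
#include <vector>

// True when `prefix` equals `source_id` or is followed by '/' in it, so the
// prefix "/sensors/lid" does not match the source_id "/sensors/lidar".
bool matches_at_boundary(const std::string& source_id,
                         const std::string& prefix) {
  if (prefix.empty() ||
      source_id.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  return source_id.size() == prefix.size() ||
         prefix.back() == '/' ||
         source_id[prefix.size()] == '/';
}

// Returns the longest configured prefix that matches, or nullopt so the
// caller can fall back to the global config.
std::optional<std::string> longest_matching_prefix(
    const std::string& source_id, const std::vector<std::string>& prefixes) {
  std::optional<std::string> best;
  for (const std::string& prefix : prefixes) {
    if (matches_at_boundary(source_id, prefix) &&
        (!best || prefix.size() > best->size())) {
      best = prefix;
    }
  }
  return best;
}
```

A linear scan is enough for a handful of configured entities; a sorted container or trie would only pay off with many prefixes.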
@mfaferek93 mfaferek93 self-assigned this Mar 15, 2026
@mfaferek93 mfaferek93 added the enhancement New feature or request label Mar 15, 2026
@mfaferek93 mfaferek93 requested a review from Copilot March 15, 2026 11:40

Copilot AI left a comment

Pull request overview

This PR adds per-entity debounce threshold configuration to the fault manager, allowing different subsystems to have different fault confirmation and healing policies. Thresholds are resolved at fault-report time using longest-prefix matching on source_id, loaded from an external YAML config file.

Changes:

  • New EntityThresholdResolver class with longest-prefix matching and YAML loading
  • FaultStorage::report_fault_event() now takes an explicit DebounceConfig parameter instead of using the stored global config
  • Documentation and comprehensive tests for the new feature

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| entity_threshold_resolver.hpp | New header defining EntityDebounceOverride struct and EntityThresholdResolver class |
| entity_threshold_resolver.cpp | Implementation of prefix matching, merging, and YAML loading |
| fault_storage.hpp | Added DebounceConfig parameter to report_fault_event virtual interface |
| fault_storage.cpp | InMemoryFaultStorage uses passed config instead of stored config_ |
| sqlite_fault_storage.hpp/cpp | SqliteFaultStorage uses passed config instead of stored config_ |
| fault_manager_node.hpp/cpp | Integrates resolver, resolves config per source_id before calling storage |
| CMakeLists.txt | Adds new source and test target with ROS_DOMAIN_ID=67 |
| test_entity_thresholds.cpp | Comprehensive tests for resolver, YAML loading, and per-entity storage behavior |
| test_fault_manager.cpp / test_sqlite_storage.cpp | Updated all report_fault_event calls with config parameter |
| docs/requirements/specs/faults.rst | New requirement REQ_INTEROP_095 |
| docs/config/fault-manager.rst | Documentation for per-entity thresholds configuration |

@mfaferek93 mfaferek93 requested a review from bburda March 15, 2026 11:58

@bburda bburda left a comment

Good feature design - clean architecture, solid backward compatibility, thorough unit tests.

6 findings to address before merge (see inline comments).

- Rename param entity_thresholds_config_file -> entity_thresholds.config_file
  (consistent with snapshots.config_file and correlation.config_file pattern)
- Add startup warning when auto_confirm_after_sec and per-entity thresholds
  are both set (auto-confirm bypasses entity debounce policies)
- Add entity_thresholds.config_file to README.md parameters table
- Add source_id explanation in docs (what values to expect, how to inspect)
- Fix contradictory test comment in DifferentEntitiesSameFaultCode
bburda previously approved these changes Mar 15, 2026

@bburda bburda left a comment

LGTM!

@mfaferek93 force-pushed the feature/per-entity-thresholds branch from 54e1f26 to 265aaa2, March 15, 2026 13:24
- New launch_testing test: test_entity_thresholds_integration.test.py
  - Lidar confirms immediately (threshold=-1)
  - Motor stays PREFAILED until 3 events (threshold=-3)
  - Unknown entity uses global threshold=-5
- YAML fixture: test_entity_thresholds.yaml
- TODO(#276) in handle_report_fault for multi-config warning

@bburda bburda left a comment

LGTM!

@mfaferek93 mfaferek93 merged commit ff4705d into main Mar 15, 2026
9 checks passed
@bburda bburda mentioned this pull request Mar 15, 2026
22 tasks
@bburda bburda deleted the feature/per-entity-thresholds branch March 16, 2026 08:14

Development

Successfully merging this pull request may close these issues.

Per-entity confirmation and healing thresholds

3 participants