feat: per-entity confirmation and healing thresholds#274
Conversation
There was a problem hiding this comment.
Pull request overview
Adds per-entity debounce threshold configuration to the fault manager, allowing different subsystems to have distinct fault confirmation/healing policies via longest-prefix matching on source_id. Thresholds are loaded from an external YAML config file and resolved at report time, with the DebounceConfig passed per-call to storage rather than relying solely on the global config.
Changes:
- New
EntityThresholdResolverclass with longest-prefix matching and YAML loading report_fault_event()API extended with aDebounceConfigparameter acrossFaultStorage,InMemoryFaultStorage, andSqliteFaultStorage- All existing test call sites updated to pass config; 17 new tests for resolver, YAML parsing, and per-entity storage integration
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
entity_threshold_resolver.hpp |
New header defining EntityDebounceOverride struct and EntityThresholdResolver class |
entity_threshold_resolver.cpp |
Implementation of prefix matching, merging, and YAML loading |
fault_storage.hpp |
Added DebounceConfig param to report_fault_event virtual interface and update_status |
fault_storage.cpp |
InMemoryFaultStorage uses per-call config instead of config_ member |
sqlite_fault_storage.hpp/cpp |
Same per-call config change for SQLite backend |
fault_manager_node.hpp/cpp |
Stores global config, loads resolver, resolves config per ReportFault call |
CMakeLists.txt |
New test target test_entity_thresholds with build/coverage config |
test_entity_thresholds.cpp |
17 new tests: resolver, YAML, per-entity storage integration |
test_fault_manager.cpp |
Updated 57 call sites to pass default_config() |
test_sqlite_storage.cpp |
Updated 35 call sites to pass default_config() |
docs/config/fault-manager.rst |
Documentation for per-entity thresholds feature |
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/entity_threshold_resolver.hpp
Show resolved
Hide resolved
src/ros2_medkit_fault_manager/src/entity_threshold_resolver.cpp
Outdated
Show resolved
Hide resolved
10a502a to
851c91a
Compare
Add per-entity debounce threshold configuration using longest-prefix matching on source_id. Different subsystems can now have different fault confirmation/healing policies (e.g. lidar=instant, motor=debounced). - New EntityThresholdResolver class with longest-prefix matching and YAML config file parsing - Changed report_fault_event() to accept DebounceConfig per-call instead of using a stored global config - FaultManagerNode resolves config from source_id before each report - No DB schema change - thresholds resolved at runtime - 17 new tests (resolver unit, YAML parsing, storage integration) - Updated docs/config/fault-manager.rst with Per-Entity Thresholds section Closes #269
851c91a to
1df5e6a
Compare
- Fix ROS_DOMAIN_ID collision: 66 -> 67 (66 used by gateway test_log_manager) - Add path boundary check in prefix matching (/sensors/lid must not match /sensors/lidar) - Fix incorrect doc claim about hot-reload (config loaded once at startup) - Document that auto_confirm_after_sec is global-only - Add PrefixMatchRequiresPathBoundary test
There was a problem hiding this comment.
Pull request overview
This PR adds per-entity debounce threshold configuration to the fault manager, allowing different subsystems to have different fault confirmation and healing policies. Thresholds are resolved at fault-report time using longest-prefix matching on source_id, loaded from an external YAML config file.
Changes:
- New
EntityThresholdResolverclass with longest-prefix matching and YAML loading FaultStorage::report_fault_event()now takes an explicitDebounceConfigparameter instead of using the stored global config- Documentation and comprehensive tests for the new feature
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
entity_threshold_resolver.hpp |
New header defining EntityDebounceOverride struct and EntityThresholdResolver class |
entity_threshold_resolver.cpp |
Implementation of prefix matching, merging, and YAML loading |
fault_storage.hpp |
Added DebounceConfig parameter to report_fault_event virtual interface |
fault_storage.cpp |
InMemoryFaultStorage uses passed config instead of stored config_ |
sqlite_fault_storage.hpp/cpp |
SqliteFaultStorage uses passed config instead of stored config_ |
fault_manager_node.hpp/cpp |
Integrates resolver, resolves config per source_id before calling storage |
CMakeLists.txt |
Adds new source and test target with ROS_DOMAIN_ID=67 |
test_entity_thresholds.cpp |
Comprehensive tests for resolver, YAML loading, and per-entity storage behavior |
test_fault_manager.cpp / test_sqlite_storage.cpp |
Updated all report_fault_event calls with config parameter |
docs/requirements/specs/faults.rst |
New requirement REQ_INTEROP_095 |
docs/config/fault-manager.rst |
Documentation for per-entity thresholds configuration |
bburda
left a comment
There was a problem hiding this comment.
Good feature design - clean architecture, solid backward compatibility, thorough unit tests.
6 findings to address before merge (see inline comments).
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_manager_node.hpp
Show resolved
Hide resolved
- Rename param entity_thresholds_config_file -> entity_thresholds.config_file (consistent with snapshots.config_file and correlation.config_file pattern) - Add startup warning when auto_confirm_after_sec and per-entity thresholds are both set (auto-confirm bypasses entity debounce policies) - Add entity_thresholds.config_file to README.md parameters table - Add source_id explanation in docs (what values to expect, how to inspect) - Fix contradictory test comment in DifferentEntitiesSameFaultCode
54e1f26 to
265aaa2
Compare
- New launch_testing test: test_entity_thresholds_integration.test.py - Lidar confirms immediately (threshold=-1) - Motor stays PREFAILED until 3 events (threshold=-3) - Unknown entity uses global threshold=-5 - YAML fixture: test_entity_thresholds.yaml - TODO(#276) in handle_report_fault for multi-config warning
265aaa2 to
e81674b
Compare
Pull Request
Summary
Add per-entity debounce threshold configuration using longest-prefix matching on
source_id. Different subsystems can now have different fault confirmation/healing policies.Example: lidar faults confirm instantly (
confirmation_threshold: -1), motor faults need 5 events (-5), safety subsystem disables auto-healing.Key design decisions
source_idat report time, passed to storage per-call. No ALTER TABLE, no migration.report_fault_event()takesDebounceConfigparameter - storage has zero knowledge of entities, just applies whichever config it receives.entity_thresholds_config_fileparameter) - follows existingcorrelation.config_file/snapshots.config_filepattern./sensors/lidar/frontmatches/sensors/lidarover/sensors.Issue
Type
Checklist