Skip to content

fix: cross-demo script regressions + multi-ECU smoke-test flake#60

Merged
mfaferek93 merged 6 commits into
mainfrom
fix/demo-script-regressions
May 30, 2026
Merged

fix: cross-demo script regressions + multi-ECU smoke-test flake#60
mfaferek93 merged 6 commits into
mainfrom
fix/demo-script-regressions

Conversation

@bburda
Copy link
Copy Markdown
Contributor

@bburda bburda commented May 29, 2026

Description

Fixes a set of demo-script regressions surfaced while running every demo end-to-end with its documented scripts, plus an intermittent multi-ECU CI flake. None of the script issues are new - they accumulated over time and went unnoticed because no single change exercises every demo end-to-end. Split out from #58 so the OTA demo PR stays scoped to the OTA demo and these unrelated fixes land on their own.

Script regressions

  • lib: resolve sourced lib path via BASH_SOURCE instead of $0 - lib/setup-trigger.sh and lib/watch-trigger.sh resolved SCRIPT_DIR from $0, which stays the caller's path when the lib is sourced, so the re-source of triggers-api.sh failed with "No such file or directory".
  • triggers: use hyphenated entity IDs to match gateway normalisation - the gateway exposes entities as diagnostic-bridge, manipulation-monitor, etc. The per-demo trigger wrappers passed the ROS-node-name spelling (diagnostic_bridge), which matches the reporting source path but not the URL-addressable entity ID, so setup-triggers returned HTTP 404. turtlebot3 also switched from anomaly-detector to diagnostic-bridge because faults bucket against the bridge on that stack.
  • relax nounset around ROS setup.bash sourcing - container scripts in moveit_pick_place and multi_ecu_aggregation start with set -eu, then source /opt/ros/jazzy/setup.bash, which dereferences AMENT_TRACE_SETUP_FILES without guarding for unset. Every inject/restore aborted with AMENT_TRACE_SETUP_FILES: unbound variable before the payload ran. Wrapped each source pair with set +u / set -u.
  • tb3: treat nav2 action rejection as expected in inject-localization-failure - the script scatters AMCL particles then asks bt-navigator to navigate; under scattered particles the action server returns HTTP 400 (the whole point of the demo), but curl -sf exit 22 aborted before the final print. Dropped -f to match inject-nav-failure on the same demo.
  • restore exec bit on injection/diagnostic scripts - 4 host-side helper scripts in sensor_diagnostics and moveit_pick_place lost their executable bit; README and run-demo.sh reference them as ./script.sh. chmod +x, no content change.

CI flake fix

  • multi-ecu: wait for actuation ECU aggregation before asserting - tests/smoke_test_multi_ecu.sh gated readiness on the planning ECU's path-planner only, then asserted on all three peer ECUs. When the actuation ECU was slower to aggregate into robot-alpha, the apps assertion ran early and reported "found 7" instead of >=10, failing build-and-test-multi-ecu intermittently. The gate now waits for a representative app from each peer ECU (path-planner for planning, motor-controller for actuation) before the discovery assertions run.

Test plan

  • shellcheck clean across all *.sh / *.bash (CI-equivalent find | xargs shellcheck)
  • Brought up the 3-ECU multi_ecu_aggregation stack locally and ran tests/smoke_test_multi_ecu.sh: 24/24 pass, with the gate now reporting planning ECU aggregated and actuation ECU aggregated before the assertions
  • Negative check: started the stack with the actuation ECU deliberately absent - the gate waits 60s and exits non-zero with actuation ECU not aggregated within 60s (motor-controller missing), instead of racing into a confusing "found 7" assertion failure

Related Issue

Split out from #58.

Checklist

  • Tested locally
  • README updated (if needed)

bburda added 6 commits May 29, 2026 21:47
The per-demo wrappers source lib/setup-trigger.sh and lib/watch-trigger.sh
from inside demos/<name>/. Both sourced files then re-source
triggers-api.sh with:

  SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

When sourced, $0 stays the caller's script path - the demo wrapper -
not the lib file's path. So SCRIPT_DIR resolves to demos/<name>/ and
the second source fails with:

  demos/<name>/triggers-api.sh: No such file or directory

${BASH_SOURCE[0]} is the path of the file currently executing the line,
which is what we actually want.
…lisation

The gateway normalises app IDs with hyphens - the entity is registered
as e.g. diagnostic-bridge, manipulation-monitor, not diagnostic_bridge.
The trigger wrappers were assuming the ROS-node-name spelling
(underscore), which matches the reporting_sources path inside the
FaultEvent payload but not the URL-addressable entity ID. setup-triggers
returned HTTP 404 from /apps/<underscore_id>/triggers and watch-triggers
found no active trigger to subscribe to.

Switch every wrapper to the hyphenated form. For turtlebot3 also switch
from anomaly-detector to diagnostic-bridge: faults on this stack arrive
from /diagnostics through the bridge and bucket against the bridge's
faults collection, so the anomaly-detector trigger would never see an
event even with the correct ID.
Container scripts in moveit_pick_place and multi_ecu_aggregation start
with "set -eu" and then source /opt/ros/jazzy/setup.bash. setup.bash
dereferences AMENT_TRACE_SETUP_FILES without guarding for unset, so
under nounset the source aborts before any payload runs:

  /opt/ros/jazzy/setup.bash: line 8:
  AMENT_TRACE_SETUP_FILES: unbound variable

The Scripts API reports this as "Script exited with code 1" and the
inject/restore never lands. Wrap the source pair with set +u / set -u
so AMENT_* defaults can stay implicit while the rest of the script
keeps strict variable checking.
…alization-failure

The script reinitialises AMCL (scatters particles) and then asks
bt-navigator to navigate. Under scattered particles the action server
returns HTTP 400 with x-medkit-ros2-action-rejected - which is the
demo's intended failure mode, the whole point of the injection.

curl -sf turns that 400 into exit 22 and aborts the script before the
final "Localization failure injected." print, and the Scripts API
reports the script as failed even though the fault path it was supposed
to trigger fired correctly.

Drop -f on the navigate_to_pose call and print the response body
(matching the pattern used in inject-nav-failure on the same demo).
Four host-side helper scripts lost their executable bit somewhere along
the way:

  demos/sensor_diagnostics/inject-fault-scenario.sh
  demos/sensor_diagnostics/run-diagnostics.sh
  demos/moveit_pick_place/arm-self-test.sh
  demos/moveit_pick_place/planning-benchmark.sh

README and run-demo.sh both reference them as ./script.sh, so running
the demos as documented hits "Permission denied". chmod +x to restore
the bit; no content change.
…ting

The readiness gate polled only for the planning ECU's path-planner app,
then asserted on all three peer ECUs. When the actuation ECU was slower to
aggregate into robot-alpha, the apps assertion ran before its nodes were
linked and reported "found 7" (perception + planning only), failing
build-and-test-multi-ecu intermittently. The gate now waits for a
representative app from each peer ECU (path-planner for planning,
motor-controller for actuation) before the discovery assertions run.
Copilot AI review requested due to automatic review settings May 29, 2026 20:01
@bburda bburda mentioned this pull request May 29, 2026
13 tasks
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes several shell-script regressions across multiple demos and hardens the multi-ECU smoke test to reduce CI flakiness by gating on aggregation readiness from multiple peer ECUs.

Changes:

  • Make trigger helper libs source-safe by resolving their own directory via BASH_SOURCE[0].
  • Align demo trigger scripts to gateway-normalized, hyphenated entity IDs (and update TB3 to use diagnostic-bridge).
  • Reduce multi-ECU smoke-test flake by waiting for representative apps from both planning and actuation ECUs before asserting discovery.
  • Relax nounset around ROS setup.bash sourcing in several container scripts.

Reviewed changes

Copilot reviewed 19 out of 23 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/smoke_test_multi_ecu.sh Adds a stronger aggregation readiness gate by waiting for both planning + actuation marker apps.
lib/watch-trigger.sh Uses BASH_SOURCE[0] to resolve paths correctly when sourced.
lib/setup-trigger.sh Uses BASH_SOURCE[0] to resolve paths correctly when sourced.
demos/turtlebot3_integration/watch-triggers.sh Switches watched entity ID to diagnostic-bridge (hyphenated).
demos/turtlebot3_integration/setup-triggers.sh Switches trigger target to diagnostic-bridge and updates explanatory comments.
demos/turtlebot3_integration/container_scripts/nav2-stack/inject-localization-failure/script.bash Treats nav2 rejection as expected by removing curl -f and printing the response.
demos/sensor_diagnostics/watch-triggers.sh Switches watched entity ID to diagnostic-bridge (hyphenated).
demos/sensor_diagnostics/setup-triggers.sh Switches trigger target to diagnostic-bridge (hyphenated).
demos/sensor_diagnostics/run-diagnostics.sh Adds a Scripts API wrapper to run diagnostics from the host.
demos/sensor_diagnostics/inject-fault-scenario.sh Adds a Scripts API wrapper to inject a composite fault scenario from the host.
demos/multi_ecu_aggregation/container_scripts/planning-ecu/restore-normal/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/multi_ecu_aggregation/container_scripts/planning-ecu/inject-planning-delay/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/multi_ecu_aggregation/container_scripts/perception-ecu/restore-normal/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/multi_ecu_aggregation/container_scripts/perception-ecu/inject-sensor-failure/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/multi_ecu_aggregation/container_scripts/actuation-ecu/restore-normal/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/multi_ecu_aggregation/container_scripts/actuation-ecu/inject-gripper-jam/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/moveit_pick_place/watch-triggers.sh Switches watched entity ID to manipulation-monitor (hyphenated).
demos/moveit_pick_place/setup-triggers.sh Switches trigger target to manipulation-monitor (hyphenated).
demos/moveit_pick_place/planning-benchmark.sh Adds a Scripts API wrapper for the planning benchmark.
demos/moveit_pick_place/container_scripts/moveit-planning/restore-normal/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/moveit_pick_place/container_scripts/moveit-planning/inject-planning-failure/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/moveit_pick_place/container_scripts/moveit-planning/inject-collision/script.bash Wraps ROS environment sourcing with set +u/set -u to avoid nounset failures.
demos/moveit_pick_place/arm-self-test.sh Adds a Scripts API wrapper for the arm self-test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mfaferek93 mfaferek93 self-requested a review May 30, 2026 08:41
@mfaferek93 mfaferek93 merged commit c375504 into main May 30, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants