test: end-to-end integration harness (closes #28)#40

Merged
eric-becker merged 21 commits into main from feat/integration-harness
May 2, 2026
@eric-becker
Owner

Summary

Adds a Docker Compose-based end-to-end integration harness that stands up real EMQX + floodgate + meshtasticd on an isolated bridge network and verifies floodgate's behavior end-to-end. Six cases (drop / zerohop / passthru / noop / custom-key passthru / meshtasticd round-trip), each asserting on both the subscriber's delivered payload and floodgate's /health stats. Available as a one-shot CI job (after smoke) and as a local ./scripts/run-integration.sh with --keep and --teardown modes.

What's in the stack

docker-compose.test.yaml (isolated bridge network floodgate-test-net):

  • emqx: emqx/emqx:6.1.1, REST exposed to host on :18083 for the runner
  • floodgate: built from local Dockerfile, /health exposed on host :18089 for the runner; uses tests/integration/config.yaml (drop_enabled: true, drop_portnums: [RANGE_TEST_APP], standard zerohop channels)
  • exhook-init: alpine + curl + jq sidecar that waits for EMQX REST then idempotently registers the ExHook at http://floodgate:9000
  • meshtasticd: ghcr.io/meshtastic/meshtasticd:latest, run with -s (SimRadio mode, no LoRa hardware); matches upstream firmware's test_native.yml pattern
  • meshtasticd-init: Python + meshtastic CLI sidecar that, after meshtasticd's TCP API on 4403 is up, sets moduleConfig.mqtt.{enabled,address,root,...} and sends one probe text message
  • test-driver (under driver profile, runs only via compose run): Python + paho-mqtt + meshtastic protobufs + cryptography. Crafts ServiceEnvelope packets, captures all msh/# deliveries, asserts on each case

MQTT 1883 and gRPC 9000 stay container-internal. Only /health and EMQX REST are mapped to the host (so the runner can probe readiness).

Six cases

Each case publishes a crafted (or organic, for case 6) Meshtastic packet and verifies BOTH the subscriber's captured bytes AND the /health stat counter. A behavior change with no stat increment, or a stat increment with no delivery effect, both fail the case loudly.

| Case | What it crafts | Asserts |
|---|---|---|
| zerohop | TEXT_MESSAGE_APP on LongFast, default key, hop_limit=3 | Delivered with hop_limit=0; stats.zerohop |
| drop | RANGE_TEST_APP on LongFast, default key | NOT delivered to subscriber; stats.dropped |
| passthru | TEXT_MESSAGE_APP on PrivateClear (not in zerohop list), default key | Delivered byte-identical, hop_limit=3; stats.passthru |
| noop | TEXT_MESSAGE_APP on LongFast, default key, already hop_limit=0 | Delivered byte-identical; stats.noop |
| custom-key passthru | RANGE_TEST_APP on PrivateNet, custom AES key | Delivered byte-identical; stats.passthru ↑; stats.dropped did NOT increment (drop must not fire on unreadable portnum) |
| meshtasticd-roundtrip | observes organic meshtasticd traffic | At least one msh/.../e/... message with packet_id outside the crafted 0xAxxxxxxx range arrives within 60s |
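
The "both delivery and stat counter" rule can be sketched as a small helper. This is a minimal illustration, not the code in run.py: it assumes the /health stats can be read as a flat mapping keyed by the counter names in the table, and `check_case` and `CaseResult` are hypothetical names. It also assumes no unrelated traffic moves the counters while a case runs.

```python
from dataclasses import dataclass

# Counter names taken from the case table; the flat-dict shape is an assumption.
COUNTERS = ("zerohop", "dropped", "passthru", "noop")


@dataclass
class CaseResult:
    ok: bool
    reason: str


def check_case(before, after, expected_counter, expect_delivery, delivered):
    """Pass only if the delivery outcome AND the stats delta both match."""
    if delivered != expect_delivery:
        return CaseResult(
            False, f"delivery mismatch: expected {expect_delivery}, got {delivered}"
        )
    for name in COUNTERS:
        delta = after.get(name, 0) - before.get(name, 0)
        # Exactly one counter should move, by exactly one.
        want = 1 if name == expected_counter else 0
        if delta != want:
            return CaseResult(
                False, f"stats.{name} changed by {delta}, expected {want}"
            )
    return CaseResult(True, "ok")
```

Checking every counter, not only the expected one, is what lets the custom-key case fail when stats.dropped moves alongside stats.passthru.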

The crafted packets use a stable 0xAxxxxxxx packet_id namespace so they don't collide with meshtasticd's auto-generated IDs.
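
A sketch of how such a namespace might be built and recognized, assuming 32-bit packet IDs whose top nibble is 0xA. The function names and the per-case index argument are hypothetical; the PR does not say how the lower 28 bits are chosen.

```python
CRAFTED_PREFIX = 0xA  # top nibble of every crafted 32-bit packet ID


def crafted_packet_id(case_index):
    """Return a deterministic ID of the form 0xAxxxxxxx for a given case."""
    if not 0 <= case_index < (1 << 28):
        raise ValueError("case_index must fit in 28 bits")
    return (CRAFTED_PREFIX << 28) | case_index


def is_crafted(packet_id):
    """True if the ID falls inside the crafted 0xAxxxxxxx range."""
    return (packet_id >> 28) & 0xF == CRAFTED_PREFIX
```

Under this scheme the meshtasticd-roundtrip case would treat any captured packet for which `is_crafted` is false as organic traffic.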

Runner

# One-shot — bring stack up, run cases, tear down, exit 0/non-zero
./scripts/run-integration.sh

# Ad-hoc poking — leave the stack running for inspection
./scripts/run-integration.sh --keep
#   floodgate /health: http://localhost:18089/health
#   EMQX dashboard:    http://localhost:18083  (admin / public)

# Tear down without running cases
./scripts/run-integration.sh --teardown

A new CI integration job runs the same script after smoke and dumps service logs on failure.

Verification constraint

Local end-to-end verification was not possible on the development host (k3s installs iptables FORWARD rules ahead of DOCKER-USER while the chain default is DROP, breaking inter-container traffic on Docker user-defined bridges). Each piece was verified individually — image builds clean, ruff lint passes, compose YAML parses, helper round-trip works in-image, every case is importable. CI on a clean GitHub Actions runner is the source of truth for the end-to-end run.

Test plan

  • CI lint job passes (the new tests/integration/test-driver/run.py is in scope for ruff check src/ tests/)
  • CI test job passes (tests/integration is now ignored from pytest)
  • CI smoke job passes
  • CI integration job passes — six PASS lines, exit 0
  • CI manifests job passes
  • After CI is green, run locally on a non-k3s host: ./scripts/run-integration.sh shows six PASS lines
  • ./scripts/run-integration.sh --keep leaves the stack running with /health and EMQX dashboard reachable on the documented host ports
  • ./scripts/run-integration.sh --teardown removes the stack and the floodgate-test-net network

Closes #28.

🤖 Generated with Claude Code

Eric Becker and others added 21 commits May 2, 2026 02:10
Local verification on this host is constrained by a k3s iptables
FORWARD policy that drops Docker user-defined-bridge inter-container
traffic. CI runs on clean GitHub Actions runners and will validate
the stack end-to-end in the integration job.

Without --connect-timeout and --max-time, curl can hang indefinitely
on an unreachable broker (a network failure rather than a
not-yet-ready one), so the wait loop never reaches its 60-iteration
cap. --connect-timeout 2 + --max-time 5 keeps each probe bounded so
the loop and its overall timeout work.
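
The same bounded-probe idea, sketched in Python instead of the shell and curl the scripts actually use. `wait_until_ready` and its parameters are hypothetical names. Note that the `timeout` passed to `urllib.request.urlopen` bounds each blocking socket operation, which is only roughly analogous to curl's --max-time.

```python
import time
import urllib.request


def wait_until_ready(url, attempts=60, probe_timeout=5.0, delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll url until it answers 200, giving up after a fixed number of probes.

    Every probe carries its own timeout, so an unreachable host costs at most
    a bounded amount of time per attempt and the attempt cap is always reached.
    """
    for _ in range(attempts):
        try:
            with opener(url, timeout=probe_timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            pass
        sleep(delay)
    return False
```

A caller would exit non-zero when this returns False, which matches the explicit timeout-exit pattern the later register.sh commit describes.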
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- register.sh: add explicit timeout-exit on the EMQX wait loop so a
  never-up broker fails fast and clearly, matching the pattern in
  init.sh and run-integration.sh
- docker-compose.test.yaml: test-driver now depends on meshtasticd-init
  completing successfully; the meshtasticd-roundtrip case no longer
  relies on emergent ordering plus the runner script's sleep
- CONTRIBUTING.md: CI job count was stale (four → five) after adding
  the integration job

CI's first run of the integration job aborted on `docker compose pull`:
ghcr.io/meshtastic/meshtasticd does not exist (NAME_UNKNOWN); the
official image is published to Docker Hub at meshtastic/meshtasticd.

The image also has no ENTRYPOINT and a sh-wrapped CMD
(`sh -cx 'meshtasticd --fsdir=/var/lib/meshtasticd'`), so passing
`command: ["-s"]` does not append to the existing invocation — it
replaces the whole CMD with `-s` and Docker tries to exec `-s`
directly. Override entrypoint + command together to invoke
`meshtasticd -s --fsdir=/var/lib/meshtasticd` cleanly.

Verified locally: container starts, prints "Running in simulated
mode" and "API server listen on TCP port 4403".
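
The CMD-replacement behavior described in this commit can be modeled in a few lines. This is a simplified sketch of how Docker combines entrypoint and command, not Docker's code; `effective_argv` is a hypothetical name, and the exact entrypoint/command split used in the compose file is an assumption.

```python
def effective_argv(image_entrypoint, image_cmd, entrypoint=None, command=None):
    """Return the argv a container would exec under a simplified Docker model."""
    if entrypoint is not None:
        # Overriding the entrypoint also discards the image's default CMD.
        return list(entrypoint) + list(command or [])
    # With no entrypoint override, `command` replaces the image CMD wholesale.
    cmd = command if command is not None else image_cmd
    return list(image_entrypoint) + list(cmd)


# The image as described: no ENTRYPOINT and a sh-wrapped CMD.
IMAGE_CMD = ["sh", "-cx", "meshtasticd --fsdir=/var/lib/meshtasticd"]

broken = effective_argv([], IMAGE_CMD, command=["-s"])
fixed = effective_argv(
    [], IMAGE_CMD,
    entrypoint=["meshtasticd"],
    command=["-s", "--fsdir=/var/lib/meshtasticd"],
)
```

Here `broken` is just `["-s"]`, which is why Docker tried to exec `-s` directly, while `fixed` is the full `meshtasticd -s --fsdir=/var/lib/meshtasticd` invocation.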
CI exposed two bugs:

1. init.sh used the wrong meshtastic CLI syntax. The CLI rejects
   --port 4403 (that flag is for serial ports) and accepts only flat
   dotted preference paths like mqtt.enabled — not the protobuf-nested
   form moduleConfig.mqtt.enabled. Verified locally: the corrected
   form ('--host HOST:PORT' + '--set mqtt.<key>') connects, prints
   "Set mqtt.<key> to <value>" for every key, and exits 0.

2. run-integration.sh skipped the compose-logs dump on the explicit-
   failure path (when 'docker compose run test-driver' returned
   non-zero). The workflow's 'if: failure()' log-dump step fires
   AFTER the runner's teardown, so all containers were already gone
   and the workflow log dump was empty. Now the runner dumps service
   logs before teardown whenever the test-driver returned non-zero.
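
The ordering fix in item 2 amounts to "dump logs while the containers still exist, then tear down". A minimal sketch follows, with the three steps injected as callables; the real runner is a shell script, and `run_harness` is a hypothetical name.

```python
def run_harness(run_cases, dump_logs, teardown):
    """Run the cases, dump logs on failure, and always tear down last."""
    exit_code = 1  # stays non-zero if run_cases raises
    try:
        exit_code = run_cases()
        if exit_code != 0:
            # Must happen before teardown: afterwards there is nothing to dump.
            dump_logs()
    finally:
        teardown()
    return exit_code
```

In the actual script the callables would correspond to `docker compose run test-driver`, a compose logs dump, and a compose teardown.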
The meshtasticd-init configuration path is fundamentally broken inside
the upstream image: any MQTT module config change triggers the firmware
to schedule "Reboot in 7 seconds", which calls execv() on itself —
and the exec fails inside the container ("execv() returned -1! No such
file or directory"). Reproduced locally with a reachable broker.
Working around it would require either pre-baking a /prefs/module.proto
binary protobuf or fighting Docker restart policies plus reboot timing
windows, both of which violate KISS.

The five crafted test cases (zerohop, drop, passthru, noop, custom-key
passthru) already exercise floodgate end-to-end with real Meshtastic
ServiceEnvelope protobufs, built with the `meshtastic` Python library
from the same protobuf definitions the firmware uses. Wire format is
identical; meshtasticd-in-the-loop wasn't testing anything the crafted
cases don't already test, just adding CI flakiness.

Removed:
- meshtasticd + meshtasticd-init services from compose
- tests/integration/meshtasticd/ directory
- case_meshtasticd_roundtrip from run.py
- "Letting meshtasticd settle" sleep from runner script
- meshtasticd mentions from CONTRIBUTING.md and CLAUDE.md
@eric-becker eric-becker merged commit aca59bb into main May 2, 2026
8 checks passed
@eric-becker eric-becker deleted the feat/integration-harness branch May 2, 2026 07:11
Development

Successfully merging this pull request may close these issues.

test: EMQX integration test in CI — end-to-end ExHook verification
