test: end-to-end integration harness (closes #28) #40
Merged
Local verification on this host is constrained by a k3s iptables FORWARD policy that drops Docker user-defined-bridge inter-container traffic. CI runs on clean GitHub Actions runners and will validate the stack end-to-end in the integration job.
Without these flags, curl can hang indefinitely when the broker is unreachable (a network failure rather than a not-yet-ready broker), so the wait loop never advances far enough to reach its 60-iteration cap. `--connect-timeout 2` plus `--max-time 5` keeps each probe bounded so the loop and its overall timeout work as intended.
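The same idea, sketched in Python rather than the shell and curl the scripts actually use: every probe carries its own timeout, and the loop has a hard attempt cap, so an unreachable endpoint fails in bounded time. The function names and the `url` parameter are hypothetical. Note that urllib's `timeout` bounds each blocking socket operation rather than the whole transfer, so it only approximates `--max-time`.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status; False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Refused connection, DNS failure, HTTP error, or timeout: not ready yet.
        return False


def wait_until_ready(probe, attempts: int = 60, delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call `probe` up to `attempts` times; return False once the cap is hit."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)  # no point sleeping after the final attempt
    return False
```

A caller would pass something like `lambda: http_probe(url)` as `probe` and exit non-zero when `wait_until_ready` returns False.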
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- register.sh: add explicit timeout-exit on the EMQX wait loop so a never-up broker fails fast and clearly, matching the pattern in init.sh and run-integration.sh
- docker-compose.test.yaml: test-driver now depends on meshtasticd-init completing successfully; the meshtasticd-roundtrip case no longer relies on emergent ordering plus the runner script's sleep
- CONTRIBUTING.md: CI job count was stale (four → five) after adding the integration job
CI's first run of the integration job aborted on `docker compose pull`: ghcr.io/meshtastic/meshtasticd does not exist (NAME_UNKNOWN); the official image is published to Docker Hub at meshtastic/meshtasticd. The image also has no ENTRYPOINT and a sh-wrapped CMD (`sh -cx 'meshtasticd --fsdir=/var/lib/meshtasticd'`), so passing `command: ["-s"]` does not append to the existing invocation — it replaces the whole CMD with `-s` and Docker tries to exec `-s` directly. Override entrypoint + command together to invoke `meshtasticd -s --fsdir=/var/lib/meshtasticd` cleanly. Verified locally: container starts, prints "Running in simulated mode" and "API server listen on TCP port 4403".
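The replace-versus-append behavior above can be modeled in a few lines of Python. This is a simplified sketch of how Docker resolves the process argv, not Docker's own code. `effective_argv` is a hypothetical helper, and the list form of the image CMD is an assumption based on the quoted `sh -cx '...'` string.

```python
from typing import Optional, Sequence


def effective_argv(
    image_entrypoint: Optional[Sequence[str]],
    image_cmd: Optional[Sequence[str]],
    entrypoint: Optional[Sequence[str]] = None,
    command: Optional[Sequence[str]] = None,
) -> list[str]:
    """Model how ENTRYPOINT and CMD combine with Compose-level overrides."""
    if entrypoint is not None:
        # Overriding the entrypoint also discards the image's default CMD.
        ep = list(entrypoint)
        cmd = list(command or [])
    else:
        ep = list(image_entrypoint or [])
        # `command` replaces the image CMD wholesale; it is never appended.
        cmd = list(command) if command is not None else list(image_cmd or [])
    return ep + cmd


# Assumed exec-form equivalent of the image's sh-wrapped CMD.
IMAGE_CMD = ["sh", "-cx", "meshtasticd --fsdir=/var/lib/meshtasticd"]

broken = effective_argv(None, IMAGE_CMD, command=["-s"])
fixed = effective_argv(
    None,
    IMAGE_CMD,
    entrypoint=["meshtasticd"],
    command=["-s", "--fsdir=/var/lib/meshtasticd"],
)
print(broken)  # ['-s']
print(fixed)   # ['meshtasticd', '-s', '--fsdir=/var/lib/meshtasticd']
```

With no ENTRYPOINT in the image, `command: ["-s"]` leaves `-s` as the entire argv, which is why Docker tried to exec it directly.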
CI exposed two bugs:
1. init.sh used the wrong meshtastic CLI syntax. The CLI rejects
--port 4403 (that flag is for serial ports) and accepts only flat
dotted preference paths like mqtt.enabled — not the protobuf-nested
form moduleConfig.mqtt.enabled. Verified locally: the corrected
form ('--host HOST:PORT' + '--set mqtt.<key>') connects, prints
"Set mqtt.<key> to <value>" for every key, and exits 0.
2. run-integration.sh skipped the compose-logs dump on the explicit-
failure path (when 'docker compose run test-driver' returned
non-zero). The workflow's 'if: failure()' log-dump step fires
AFTER the runner's teardown, so all containers were already gone
and the workflow log dump was empty. Now the runner dumps service
logs before teardown whenever the test-driver returned non-zero.
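The ordering fix in item 2 comes down to "dump logs while the containers still exist, then tear down, even on failure". A minimal Python sketch of that control flow follows. The real runner is a shell script, and the three callables are hypothetical stand-ins for the test-driver run, the compose log dump, and the teardown.

```python
from typing import Callable


def run_suite(
    run_tests: Callable[[], int],
    dump_logs: Callable[[], None],
    teardown: Callable[[], None],
) -> int:
    """Run the tests, dump logs on failure before teardown, and always tear down."""
    exit_code = 1  # assume failure until the test run reports otherwise
    try:
        exit_code = run_tests()
    finally:
        try:
            if exit_code != 0:
                dump_logs()  # containers are still present at this point
        finally:
            teardown()  # runs even if the log dump itself raises
    return exit_code
```

Because `teardown` sits in the innermost `finally`, a crash in the test run or in the log dump still removes the stack, and the logs are always captured before the containers disappear.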
The meshtasticd-init configuration path is fundamentally broken inside
the upstream image: any MQTT module config change triggers the firmware
to schedule "Reboot in 7 seconds", which calls execv() on itself —
and the exec fails inside the container ("execv() returned -1! No such
file or directory"). Reproduced locally with a reachable broker.
Working around it would require either pre-baking a /prefs/module.proto
binary protobuf or fighting Docker restart policies plus reboot timing
windows, both of which violate KISS.
The five crafted test cases (zerohop, drop, passthru, noop, custom-key
passthru) already exercise floodgate end-to-end with real Meshtastic
ServiceEnvelope protobufs — built with the same `meshtastic` Python
library the firmware uses internally. Wire format is identical;
meshtasticd-in-the-loop wasn't testing anything the crafted cases
don't already test, just adding CI flakiness.
Removed:
- meshtasticd + meshtasticd-init services from compose
- tests/integration/meshtasticd/ directory
- case_meshtasticd_roundtrip from run.py
- "Letting meshtasticd settle" sleep from runner script
- meshtasticd mentions from CONTRIBUTING.md and CLAUDE.md
Summary

Adds a Docker Compose-based end-to-end integration harness that stands up real EMQX + floodgate + meshtasticd on an isolated bridge network and verifies floodgate's behavior end-to-end. Six cases (drop / zerohop / passthru / noop / custom-key passthru / meshtasticd round-trip), each asserting on both the subscriber's delivered payload and floodgate's `/health` stats. Available as a one-shot CI job (after `smoke`) and as a local `./scripts/run-integration.sh` with `--keep` and `--teardown` modes.

What's in the stack

`docker-compose.test.yaml` (isolated bridge network `floodgate-test-net`):

- `emqx`: `emqx/emqx:6.1.1`, REST exposed to host on `:18083` for the runner
- `floodgate`: built from local `Dockerfile`, `/health` exposed on host `:18089` for the runner; uses `tests/integration/config.yaml` (`drop_enabled: true`, `drop_portnums: [RANGE_TEST_APP]`, standard zerohop channels)
- `exhook-init`: alpine + curl + jq sidecar that waits for EMQX REST then idempotently registers the ExHook at `http://floodgate:9000`
- `meshtasticd`: `ghcr.io/meshtastic/meshtasticd:latest`, run with `-s` (SimRadio mode, no LoRa hardware); matches upstream firmware's `test_native.yml` pattern
- `meshtasticd-init`: Python + meshtastic CLI sidecar that, after meshtasticd's TCP API on 4403 is up, sets `moduleConfig.mqtt.{enabled,address,root,...}` and sends one probe text message
- `test-driver` (under `driver` profile, runs only via `compose run`): Python + paho-mqtt + meshtastic protobufs + cryptography. Crafts `ServiceEnvelope` packets, captures all `msh/#` deliveries, asserts on each case

MQTT 1883 and gRPC 9000 stay container-internal. Only `/health` and EMQX REST are mapped to the host (so the runner can probe readiness).

Six cases

Each case publishes a crafted (or organic, for case 6) Meshtastic packet and verifies BOTH the subscriber's captured bytes AND the `/health` stat counter. A behavior change with no stat increment, or a stat increment with no delivery effect, both fail the case loudly.

- `zerohop`: publishes `TEXT_MESSAGE_APP` on `LongFast`, default key, `hop_limit=3`; expects `hop_limit=0`; `stats.zerohop` ↑
- `drop`: publishes `RANGE_TEST_APP` on `LongFast`, default key; expects `stats.dropped` ↑
- `passthru`: publishes `TEXT_MESSAGE_APP` on `PrivateClear` (not in zerohop list), default key; expects `hop_limit=3`; `stats.passthru` ↑
- `noop`: publishes `TEXT_MESSAGE_APP` on `LongFast`, default key, already `hop_limit=0`; expects `stats.noop` ↑
- `custom-key passthru`: publishes `RANGE_TEST_APP` on `PrivateNet`, custom AES key; expects `stats.passthru` ↑; `stats.dropped` did NOT increment (drop must not fire on unreadable portnum)
- `meshtasticd-roundtrip`: expects a `msh/.../e/...` message with packet_id outside the crafted `0xAxxxxxxx` range to arrive within 60s

The crafted packets use a stable `0xAxxxxxxx` packet_id namespace so they don't collide with meshtasticd's auto-generated IDs.

Runner

A new CI `integration` job runs the same script after `smoke` and dumps service logs on failure.

Verification constraint

Local end-to-end verification was not possible on the development host (k3s installs iptables FORWARD rules ahead of `DOCKER-USER` while the chain default is `DROP`, breaking inter-container traffic on Docker user-defined bridges). Each piece was verified individually: image builds clean, ruff lint passes, compose YAML parses, helper round-trip works in-image, every case is importable. CI on a clean GitHub Actions runner is the source of truth for the end-to-end run.

Test plan

- `lint` job passes (the new `tests/integration/test-driver/run.py` is in scope for `ruff check src/ tests/`)
- `test` job passes (`tests/integration` is now ignored from pytest)
- `smoke` job passes
- `integration` job passes: six PASS lines, exit 0
- `manifests` job passes
- `./scripts/run-integration.sh` shows six PASS lines
- `./scripts/run-integration.sh --keep` leaves the stack running with `/health` and EMQX dashboard reachable on the documented host ports
- `./scripts/run-integration.sh --teardown` removes the stack and the `floodgate-test-net` network

Closes #28.
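As an illustration of the "Six cases" rules, the two checks every case relies on could look roughly like the sketch below. It is not the code in `run.py`. The helper names, the dict shape of the `/health` stats snapshot, and the "exactly one increment per published packet" rule are all assumptions.

```python
CRAFTED_PREFIX = 0xA  # top nibble of a 32-bit packet_id, i.e. 0xAxxxxxxx


def is_crafted(packet_id: int) -> bool:
    """True when packet_id lies in 0xA0000000..0xAFFFFFFF."""
    return 0 <= packet_id <= 0xFFFFFFFF and packet_id >> 28 == CRAFTED_PREFIX


def check_case(name: str, stat: str, before: dict, after: dict, delivery_ok: bool) -> None:
    """Fail unless the delivery check passed AND the named counter grew by one.

    `before` and `after` are assumed to be counter snapshots taken from the
    /health endpoint around the publish, e.g. {"zerohop": 3, "dropped": 1}.
    """
    delta = after.get(stat, 0) - before.get(stat, 0)
    problems = []
    if not delivery_ok:
        problems.append("delivery mismatch")
    if delta != 1:
        problems.append(f"stats.{stat} changed by {delta}, expected 1")
    if problems:
        raise AssertionError(f"{name}: " + "; ".join(problems))
```

Checking both conditions in one place means a behavior change with no stat increment and a stat increment with no delivery effect fail in the same way, with a message naming the case.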
🤖 Generated with Claude Code