feat: add flowmesh stack bundle init #38

Merged
kaiitunnz merged 8 commits into main from kaiitunnz/feat/bundle-init on May 12, 2026

@kaiitunnz kaiitunnz commented May 12, 2026

Purpose

Give pip-only deploy hosts a one-command bootstrap, and make the bootstrap role-aware so a worker bundle actually produces a worker-shaped deployment. flowmesh stack bundle export only works inside a repo checkout (it reads secrets/tls/* and configs/worker_config.yaml from CWD, and shells out to pip wheel ./sdk ./cli for the wheel bundle). A host that ran pip install flowmesh[cli] from PyPI has none of those sources, so the only way to land on the layout stack up expects was to produce a tarball elsewhere and tar -xzf it. And worker-mode bundles were previously half-broken: the tarball staging skipped the Redis cert/key files, but the chained flowmesh stack init still wrote a NODE_ROLE=root env that expected them.

Changes

  • flowmesh stack bundle init scaffolds the deploy layout in-place (secrets/tls/{server,redis}/, configs/worker_config.yaml, .env) and chains stack init. Layout matches the shipped .env.example / compose.yml defaults, so stack up resolves to the paths the bundle wrote to.
  • --role is end-to-end role-aware across stack init, bundle init, and bundle export <role>. Flips NODE_ROLE and blanks the Redis cert/key paths on worker; the worker bundle's install.sh propagates --role worker into its chained init. Worker bundle init next-steps now point operators at redis-ca.pem only, matching the scaffold.
  • stack init --deploy pins FLOWMESH_VERSION to v<installed flowmesh-cli-stack version>, falling back to latest with a warning if the package metadata is missing. The v prefix matches the GHCR tag convention enforced by release-images.yml. bundle init implies --deploy; bundle export's install.sh appends --deploy so CLI install and image pull land at the same version.
  • install.sh anchors to the bundle directory via cd "$(dirname "$0")", so ./flowmesh_server_bundle/install.sh works from any CWD without scattering .venv, .env, or --include-wheels lookups outside the bundle.
  • Drive-bys: lumid-hooks / lumid-data-sdk switched from git+url to PyPI pins; dead is_resource branch in assets.py removed (it was firing DeprecationWarning on every asset_path call).

Design

stack init now renders live from STACK_ENV_SCHEMA instead of copying the static .env.example. The shipped example stays the human reference, CI-verified against the root render. render_env_example takes an overrides map; WORKER_ROLE_OVERRIDES (next to the schema) holds the keys whose worker default diverges from root, and --deploy adds a single dynamic override for FLOWMESH_VERSION.
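
The schema-driven render described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the EnvVar shape, the render_env_example signature, the subset of keys, and the default paths are all assumptions made for the example; only the names STACK_ENV_SCHEMA, WORKER_ROLE_OVERRIDES, and render_env_example come from the description.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class EnvVar:
    """One schema entry. Hypothetical shape; the real EnvVar has more fields."""

    name: str
    default: str


# Illustrative subset of keys. The path defaults are placeholders, not the shipped values.
STACK_ENV_SCHEMA: Sequence[EnvVar] = (
    EnvVar("NODE_ROLE", "root"),
    EnvVar("REDIS_TLS_CA_FILE", "secrets/tls/redis/redis-ca.pem"),
    EnvVar("REDIS_TLS_CERT_FILE", "secrets/tls/redis/redis-server.pem"),
    EnvVar("REDIS_TLS_KEY_FILE", "secrets/tls/redis/redis-server.key"),
    EnvVar("FLOWMESH_VERSION", "dev"),
)

# Keys whose worker default diverges from root.
WORKER_ROLE_OVERRIDES: Mapping[str, str] = {
    "NODE_ROLE": "worker",
    "REDIS_TLS_CERT_FILE": "",
    "REDIS_TLS_KEY_FILE": "",
}


def render_env_example(
    schema: Sequence[EnvVar],
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Render KEY=value lines in schema order, letting overrides win over defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - {var.name for var in schema}
    if unknown:
        # An override for a key the schema does not know is almost certainly a typo.
        raise KeyError(f"overrides reference unknown keys: {sorted(unknown)}")
    lines = [f"{var.name}={overrides.get(var.name, var.default)}" for var in schema]
    return "\n".join(lines) + "\n"
```

Under these assumptions, a root render is render_env_example(STACK_ENV_SCHEMA), and a deploy-shaped worker render merges the static worker overrides with the one dynamic key, for example {**WORKER_ROLE_OVERRIDES, "FLOWMESH_VERSION": "v1.2.3"}.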

Image tag alignment happens automatically: bundle export's install.sh pins flowmesh[cli]==X and then runs stack init --deploy, which reads the just-installed package's version via importlib.metadata — same X — and pins FLOWMESH_VERSION=vX. No literal latest lives in the schema or override path.
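
A minimal sketch of the version lookup, assuming the distribution is named flowmesh-cli-stack as the text suggests. The function name deploy_version_tag and its lookup parameter are hypothetical and exist only to make the example testable; importlib.metadata.version and PackageNotFoundError are standard library.

```python
import warnings
from importlib import metadata
from typing import Callable


def deploy_version_tag(
    dist_name: str = "flowmesh-cli-stack",
    lookup: Callable[[str], str] = metadata.version,
) -> str:
    """Return the image tag to pin: v<installed version>, or latest as a fallback."""
    try:
        # The v prefix mirrors the image tag convention described in the PR.
        return f"v{lookup(dist_name)}"
    except metadata.PackageNotFoundError:
        warnings.warn(
            f"no installed metadata for {dist_name!r}; "
            "falling back to FLOWMESH_VERSION=latest",
            stacklevel=2,
        )
        return "latest"
```

Because install.sh pins the CLI before running stack init --deploy, the version read here is the one that was just installed, which is why the image tag and the CLI end up aligned.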

bundle init is non-destructive: existing worker_config.yaml and TLS dirs are preserved; --force only governs .env. test_worker_role_render_passes_schema_validation pins the contract that rendered worker .envs are valid by the schema's own validators, so future drift trips at PR time, not on the deploy host.
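
The non-destructive behavior could look roughly like the sketch below. The function name scaffold_layout and its return value are hypothetical, and skipping an existing .env silently when --force is absent is an assumption; the real command may report or reject that case differently.

```python
from pathlib import Path
from typing import List


def scaffold_layout(dest: Path, env_text: str, force: bool = False) -> List[str]:
    """Create the deploy layout under dest and return what was actually written."""
    created: List[str] = []

    # TLS placeholder directories: created only when missing, never emptied.
    for tls_dir in (dest / "secrets" / "tls" / "server", dest / "secrets" / "tls" / "redis"):
        if not tls_dir.exists():
            tls_dir.mkdir(parents=True)
            created.append(tls_dir.relative_to(dest).as_posix())

    # worker_config.yaml: an existing file is preserved as-is.
    worker_config = dest / "configs" / "worker_config.yaml"
    if not worker_config.exists():
        worker_config.parent.mkdir(parents=True, exist_ok=True)
        worker_config.touch()
        created.append(worker_config.relative_to(dest).as_posix())

    # .env is the only file that force is allowed to overwrite.
    env_file = dest / ".env"
    if force or not env_file.exists():
        env_file.write_text(env_text)
        created.append(env_file.relative_to(dest).as_posix())

    return created
```

Running it twice against the same directory writes nothing the second time, and force=True rewrites only .env.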

Test Plan

uv run pre-commit run --all-files
uv run pytest tests --ignore=tests/worker/test_mp_executor_cleanup_gpu.py

Manual end-to-end smoke from a clean directory:

flowmesh stack bundle init --dest deploy --role worker
# verifies the worker .env shape, the scaffolded layout, the
# next-steps block staying accurate post-cd, REDIS_TLS_CA_FILE staying
# populated while cert/key are blanked, and FLOWMESH_VERSION pinned to
# v<installed flowmesh-cli-stack version>.

Test Result

$ uv run pre-commit run --all-files
# All checks passed

$ uv run pytest tests --ignore=tests/worker/test_mp_executor_cleanup_gpu.py
# 862 passed, 18 warnings in 29.87s

Pre-submission Checklist
  • I have read the contribution guidelines.
  • I have run `pre-commit run --all-files` and fixed any issues.
  • I have added or updated tests covering my changes (if applicable).
  • I have verified that `uv run pytest tests/` passes locally.
  • If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker.
  • If I changed the SDK or CLI, I have verified the affected packages work (`uv sync --all-packages --group ci --frozen`).
  • If this is a breaking change, I have prefixed the PR title with `[BREAKING]` and described migration steps above.
  • I have updated documentation or config examples if user-facing behavior changed.

kaiitunnz added 6 commits May 12, 2026 18:28
`bundle export` only works inside a repo checkout — it reads
secrets/tls/* and configs/worker_config.yaml from CWD and shells out to
`pip wheel ./sdk ./cli`. A deploy host that installed `flowmesh[cli]`
from PyPI has none of those sources, so there was no way to land on the
on-disk layout `stack up` expects without first producing a tarball
elsewhere and `tar -xzf`ing it.

`bundle init` scaffolds that layout directly: empty
secrets/tls/{server,redis}/ placeholders, an empty
configs/worker_config.yaml, and a .env from the shipped example. The
TLS/worker_config paths are now driven by module-level constants shared
with `_copy_server_assets`, and those constants match the defaults
already encoded in .env.example, env_schema.py, and compose.yml
(secrets/tls/..., configs/worker_config.yaml) — `stack up` resolves to
the same paths bundle init / bundle export write to, so the bundle is
operable without editing path values in .env first.

Drive-by fix: `_copy_server_assets` now creates the configs/ parent
before copying worker_config.yaml. The copy previously worked only because
the destination filename had no parent component; the layout move
needed the mkdir to keep `bundle export` from crashing on the copy.
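
For illustration, the pattern behind this fix is to create the destination's parent before copying. The helper below is a hypothetical stand-in, not the body of `_copy_server_assets`.

```python
import shutil
from pathlib import Path


def copy_into_layout(src: Path, dest_root: Path, relative: str) -> Path:
    """Copy src to dest_root/relative, creating intermediate directories first."""
    target = dest_root / relative
    # Without this mkdir, shutil.copy2 raises FileNotFoundError as soon as
    # relative has a parent component such as configs/.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, target)
    return target
```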

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
The is_resource call lived inside an `if not …: pass` block that did
nothing, so it only existed to emit DeprecationWarning when callers
exercised asset_path. as_file already raises FileNotFoundError on a
missing resource, which the surrounding try/except already maps to
AssetNotFoundError.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
bundle_export unconditionally called _copy_redis_tls_assets(ca_only=True),
a leftover from an earlier migration. A root-node bundle skipped
redis-server.pem / redis-server.key, so the exported tarball couldn't
boot local Redis even though the shipped .env.example and compose.yml
point at those files. Surface the node role as a positional argument
to bundle_export (default root); ca_only is now role == worker, so a
root bundle stages the full Redis TLS material and a worker bundle
keeps the CA-only shape.

The role string was already encoded in stack.py and env_schema.py as
ad-hoc literals. Promote it to a shared StrEnum (flowmesh.models.nodes.
NodeRole), widen EnvVar.choices to Iterable[str] so the enum class can
be passed directly, and replace the literals at every callsite.
test_schema_compat.py adds NodeRole to the SDK <-> server enum-pair
compat check.
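
A rough sketch of the role enum and the role-gated Redis TLS staging. The commit uses a StrEnum; the str-mixin Enum below is a stand-in so the example also runs on interpreters older than Python 3.11 (note that str() on a member differs between the two, so the sketch uses .value explicitly). The helper redis_tls_files is hypothetical; the file names are the ones mentioned in this PR.

```python
from enum import Enum
from typing import List


class NodeRole(str, Enum):
    """Stand-in for the shared role enum; members compare equal to plain strings."""

    ROOT = "root"
    WORKER = "worker"


def redis_tls_files(role: NodeRole) -> List[str]:
    """Return the Redis TLS files a bundle for this role should stage."""
    ca_only = role == NodeRole.WORKER
    files = ["redis-ca.pem"]
    if not ca_only:
        # Only a root node runs local Redis, so only it needs the cert/key pair.
        files += ["redis-server.pem", "redis-server.key"]
    return files


# Every member is a str, so the enum class can be handed to anything
# that expects an Iterable[str] of choices.
ROLE_CHOICES = [role.value for role in NodeRole]
```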

bundle_init next-steps fixes:
- Append --env-file <path> to the printed `flowmesh stack pull` / `up`
  lines when --env-file is non-default (the bare commands would have
  read ./.env).
- Suppress the "drop TLS certs into ..." line when --no-tls is set.
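
The two next-steps fixes amount to something like the following sketch. The function name, the hint wording, and the treatment of exactly ".env" as the default path are assumptions for illustration; the --env-file flag and the pull/up commands are the ones named above.

```python
import shlex
from typing import List


def next_steps(env_file: str = ".env", tls: bool = True) -> List[str]:
    """Build the printed next-steps lines for a freshly scaffolded bundle."""
    # Only a non-default env file needs to be spelled out; the bare
    # commands would otherwise read ./.env.
    suffix = "" if env_file == ".env" else f" --env-file {shlex.quote(env_file)}"
    steps: List[str] = []
    if tls:
        # Hypothetical wording; suppressed entirely when TLS is disabled.
        steps.append("drop TLS certs into secrets/tls/")
    steps.append(f"flowmesh stack pull{suffix}")
    steps.append(f"flowmesh stack up{suffix}")
    return steps
```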

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
The previous commit on this branch role-gated only the Redis TLS file
copy in bundle export. install.sh and stack init kept writing the
static .env.example verbatim, so a worker-mode bundle still landed on
a NODE_ROLE=root config that expected local Redis and the cert/key
files the worker bundle had just declined to ship.

Switch stack init from copying the checked-in .env.example to rendering
live from STACK_ENV_SCHEMA via render_env_example, which gained an
optional `overrides: Mapping[str, str]` kwarg. The shipped example is
still authoritative (scripts/dev/check_env_examples.py verifies the
root render equals the tracked copy), but stack init can now produce
worker-shaped output without a parallel template.

Add WORKER_ROLE_OVERRIDES next to STACK_ENV_SCHEMA: NODE_ROLE flips to
"worker"; REDIS_TLS_CERT_FILE / REDIS_TLS_KEY_FILE blank out because
src/server only reads REDIS_TLS_CA_FILE and the cert/key files are
consumed exclusively by the root-profile-gated Redis services. The
connection-side knobs (REDIS_*_URL, ACL, credentials, CA) stay
populated; the operator still has to repoint REDIS_CONTROL_URL /
REDIS_TELEMETRY_URL at the root node before stack up, which is
intrinsic since only the operator knows that address.

Plumb `--role` through stack init, bundle init, and bundle export's
install.sh so a worker bundle's bootstrap chain ends in
`flowmesh stack init --env-file "\$ENV_FILE" --role worker`. A shared
parse_node_role helper replaces the inline try/except that would otherwise
be duplicated across three callsites.

Coverage: test_worker_role_render_passes_schema_validation runs the
rendered worker .env through validate_env_values to pin the contract
that role overrides don't trip schema-level required/min_value
checks — catches drift if a future required=True lands on a blanked
key.
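
The contract this test pins can be illustrated with a toy schema. Everything below is a simplified assumption: the real validate_env_values and EnvVar carry more checks (min_value, choices, and so on), and render_values is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence


@dataclass(frozen=True)
class EnvVar:
    """Hypothetical schema entry with only the field this sketch needs."""

    name: str
    default: str
    required: bool = False


WORKER_ROLE_OVERRIDES: Mapping[str, str] = {
    "NODE_ROLE": "worker",
    "REDIS_TLS_CERT_FILE": "",
    "REDIS_TLS_KEY_FILE": "",
}


def render_values(schema: Sequence[EnvVar], overrides: Mapping[str, str]) -> Dict[str, str]:
    """Resolve each key to its override if present, else its schema default."""
    return {var.name: overrides.get(var.name, var.default) for var in schema}


def validate_env_values(schema: Sequence[EnvVar], values: Mapping[str, str]) -> List[str]:
    """Return one error message per required key that ended up empty."""
    return [
        f"{var.name} is required but empty"
        for var in schema
        if var.required and not values.get(var.name, "")
    ]


def worker_render_errors(schema: Sequence[EnvVar]) -> List[str]:
    """Validate the worker-shaped render; a non-empty result signals schema drift."""
    return validate_env_values(schema, render_values(schema, WORKER_ROLE_OVERRIDES))
```

With a schema where the blanked keys are optional, worker_render_errors returns an empty list; if a later change marks REDIS_TLS_KEY_FILE as required, the same call reports it, which is the drift the test is meant to catch before merge.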

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
stack init writes FLOWMESH_VERSION=dev (the local-iteration placeholder)
by default. For deploy-shaped scaffolding the right tag is the one
matching the running flowmesh-cli-stack, so compose pulls server/worker
images at the same version the CLI was installed at.

--deploy reads the installed package version via importlib.metadata and
pins FLOWMESH_VERSION to it, falling back to 'latest' with a warning
when the metadata is missing. The fallback keeps the bootstrap usable
on hosts where the package isn't pip-installed in a way that exposes
metadata; the operator can edit .env after.

bundle init implies --deploy; bundle export's install.sh now emits
--deploy in its chained stack init call. Combined with install.sh
pinning flowmesh[cli]==X via _published_cli_spec, the resulting deploy
host installs the pinned CLI, scaffolds .env at the same version, and
compose pulls aligned images. No literal "latest" lives in the schema
or override path.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
@kaiitunnz kaiitunnz force-pushed the kaiitunnz/feat/bundle-init branch from e23b8b1 to ad8c142 Compare May 12, 2026 21:20
kaiitunnz added 2 commits May 12, 2026 21:33
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
@kaiitunnz kaiitunnz merged commit 49a8fd2 into main May 12, 2026
13 checks passed
@kaiitunnz kaiitunnz deleted the kaiitunnz/feat/bundle-init branch May 12, 2026 21:45