feat: add flowmesh stack bundle init by kaiitunnz · Pull Request #38 · mlsys-io/FlowMesh

kaiitunnz · 2026-05-12T18:41:34Z

Purpose

Give pip-only deploy hosts a one-command bootstrap, and make the bootstrap role-aware so a worker bundle actually produces a worker-shaped deployment. flowmesh stack bundle export only works inside a repo checkout (it reads secrets/tls/* and configs/worker_config.yaml from CWD, and shells out to pip wheel ./sdk ./cli for the wheel bundle). A host that ran pip install flowmesh[cli] from PyPI has none of those sources, so the only way to land on the layout stack up expects was to produce a tarball elsewhere and tar -xzf it. And worker-mode bundles were previously half-broken: the tarball staging skipped the Redis cert/key files, but the chained flowmesh stack init still wrote a NODE_ROLE=root env that expected them.

Changes

- flowmesh stack bundle init scaffolds the deploy layout in-place (secrets/tls/{server,redis}/, configs/worker_config.yaml, .env) and chains stack init. Layout matches the shipped .env.example / compose.yml defaults, so stack up resolves to the paths the bundle wrote to.
- --role is end-to-end role-aware across stack init, bundle init, and bundle export <role>. Flips NODE_ROLE and blanks the Redis cert/key paths on worker; the worker bundle's install.sh propagates --role worker into its chained init. Worker bundle init next-steps now point operators at redis-ca.pem only, matching the scaffold.
- stack init --deploy pins FLOWMESH_VERSION to v<installed flowmesh-cli-stack version> (fallback latest+warning if metadata is missing). The v prefix matches the GHCR tag convention enforced by release-images.yml. bundle init implies it; bundle export's install.sh appends --deploy so CLI install and image pull land at the same version.
- install.sh anchors to the bundle directory via cd "$(dirname "$0")", so ./flowmesh_server_bundle/install.sh works from any CWD without scattering .venv, .env, or --include-wheels lookups outside the bundle.
- Drive-bys: lumid-hooks / lumid-data-sdk switched from git+url to PyPI pins; dead is_resource branch in assets.py removed (it was firing DeprecationWarning on every asset_path call).

Design

stack init now renders live from STACK_ENV_SCHEMA instead of copying the static .env.example. The shipped example stays the human reference, CI-verified against the root render. render_env_example takes an overrides map; WORKER_ROLE_OVERRIDES (next to the schema) holds the keys whose worker default diverges from root, and --deploy adds a single dynamic override for FLOWMESH_VERSION.
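The override mechanism can be pictured roughly as follows. This is a minimal sketch, not the project's code: the `EnvVar` fields, the schema subset, the default path values, the example version `v1.2.3`, and the exact signature of `render_env_example` are assumptions for illustration. Only the names `STACK_ENV_SCHEMA`, `WORKER_ROLE_OVERRIDES`, `render_env_example`, and the env keys come from this PR.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class EnvVar:
    # Hypothetical shape; the real schema entry carries more metadata.
    name: str
    default: str


# Illustrative subset; the default paths are placeholders, not the shipped values.
STACK_ENV_SCHEMA: Sequence[EnvVar] = (
    EnvVar("NODE_ROLE", "root"),
    EnvVar("FLOWMESH_VERSION", "dev"),
    EnvVar("REDIS_TLS_CA_FILE", "secrets/tls/redis/redis-ca.pem"),
    EnvVar("REDIS_TLS_CERT_FILE", "secrets/tls/redis/redis-server.pem"),
    EnvVar("REDIS_TLS_KEY_FILE", "secrets/tls/redis/redis-server.key"),
)

# Keys whose worker default diverges from the root default.
WORKER_ROLE_OVERRIDES: Mapping[str, str] = {
    "NODE_ROLE": "worker",
    "REDIS_TLS_CERT_FILE": "",
    "REDIS_TLS_KEY_FILE": "",
}


def render_env_example(
    schema: Sequence[EnvVar] = STACK_ENV_SCHEMA,
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Render KEY=VALUE lines from the schema; overrides win over defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - {var.name for var in schema}
    if unknown:
        raise KeyError(f"override for unknown key(s): {sorted(unknown)}")
    lines = [f"{var.name}={overrides.get(var.name, var.default)}" for var in schema]
    return "\n".join(lines) + "\n"


# Worker render plus the single dynamic --deploy override (example version).
worker_env = render_env_example(
    overrides={**WORKER_ROLE_OVERRIDES, "FLOWMESH_VERSION": "v1.2.3"}
)
```

Keeping the overrides as plain data next to the schema means the root render (no overrides) can still be compared byte-for-byte against the tracked example, while the worker shape needs no second template.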

Image tag alignment happens automatically: bundle export's install.sh pins flowmesh[cli]==X and then runs stack init --deploy, which reads the just-installed package's version via importlib.metadata — same X — and pins FLOWMESH_VERSION=vX. No literal latest lives in the schema or override path.
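The version lookup described above could look something like this. It is a sketch under assumptions: the helper name `deploy_image_tag` is hypothetical, and `flowmesh-cli-stack` is assumed to be the distribution name that `importlib.metadata` sees. The `importlib.metadata.version` call and `PackageNotFoundError` are standard library.

```python
import warnings
from importlib import metadata


def deploy_image_tag(dist_name: str = "flowmesh-cli-stack") -> str:
    """Return "v<installed version>", or "latest" (with a warning) if metadata is missing."""
    try:
        # The "v" prefix mirrors the image tag convention mentioned in the PR.
        return f"v{metadata.version(dist_name)}"
    except metadata.PackageNotFoundError:
        warnings.warn(
            f"{dist_name} metadata not found; falling back to FLOWMESH_VERSION=latest",
            stacklevel=2,
        )
        return "latest"
```

Because the tag is derived from whatever is installed at call time, a bundle that pins the CLI to version X and then runs the init step ends up with the image tag at vX without the version being written down twice.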

bundle init is non-destructive: existing worker_config.yaml and TLS dirs are preserved; --force only governs .env. test_worker_role_render_passes_schema_validation pins the contract that rendered worker .envs are valid by the schema's own validators, so future drift trips at PR time, not on the deploy host.
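A rough sketch of the non-destructive behavior, assuming a hypothetical `scaffold_layout` helper (the real command's structure is not shown in this PR description, and silently skipping an existing .env without `force` is an assumption):

```python
from pathlib import Path


def scaffold_layout(dest: Path, env_text: str, force: bool = False) -> list[str]:
    """Create the deploy layout under dest without clobbering operator files.

    Returns the relative paths that were newly written.
    """
    written: list[str] = []

    # TLS directories: mkdir is a no-op when they already exist.
    for sub in ("secrets/tls/server", "secrets/tls/redis"):
        (dest / sub).mkdir(parents=True, exist_ok=True)

    # worker_config.yaml: only create an empty placeholder if absent.
    worker_cfg = dest / "configs" / "worker_config.yaml"
    worker_cfg.parent.mkdir(parents=True, exist_ok=True)
    if not worker_cfg.exists():
        worker_cfg.touch()
        written.append("configs/worker_config.yaml")

    # .env is the only file that force may overwrite.
    env_file = dest / ".env"
    if force or not env_file.exists():
        env_file.write_text(env_text)
        written.append(".env")

    return written
```

Re-running the scaffold on a populated directory then leaves certificates and an edited worker_config.yaml alone, and only replaces .env when explicitly forced.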

Test Plan

uv run pre-commit run --all-files
uv run pytest tests --ignore=tests/worker/test_mp_executor_cleanup_gpu.py

Manual end-to-end smoke from a clean directory:

flowmesh stack bundle init --dest deploy --role worker
# verifies the worker .env shape, the scaffolded layout, the
# next-steps block staying accurate post-cd, REDIS_TLS_CA_FILE staying
# populated while cert/key are blanked, and FLOWMESH_VERSION pinned to
# v<installed flowmesh-cli-stack version>.

Test Result

$ uv run pre-commit run --all-files
# All checks passed

$ uv run pytest tests --ignore=tests/worker/test_mp_executor_cleanup_gpu.py
# 862 passed, 18 warnings in 29.87s

Pre-submission Checklist

I have read the contribution guidelines.
I have run `pre-commit run --all-files` and fixed any issues.
I have added or updated tests covering my changes (if applicable).
I have verified that `uv run pytest tests/` passes locally.
If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker.
If I changed the SDK or CLI, I have verified the affected packages work (`uv sync --all-packages --group ci --frozen`).
If this is a breaking change, I have prefixed the PR title with `[BREAKING]` and described migration steps above.
I have updated documentation or config examples if user-facing behavior changed.

`bundle export` only works inside a repo checkout: it reads secrets/tls/* and configs/worker_config.yaml from CWD and shells out to `pip wheel ./sdk ./cli`. A deploy host that installed `flowmesh[cli]` from PyPI has none of those sources, so there was no way to land on the on-disk layout `stack up` expects without first producing a tarball elsewhere and `tar -xzf`ing it.

`bundle init` scaffolds that layout directly: empty secrets/tls/{server,redis}/ placeholders, an empty configs/worker_config.yaml, and a .env from the shipped example. The TLS/worker_config paths are now driven by module-level constants shared with `_copy_server_assets`, and those constants match the defaults already encoded in .env.example, env_schema.py, and compose.yml (secrets/tls/..., configs/worker_config.yaml). As a result, `stack up` resolves to the same paths bundle init / bundle export write to, so the bundle is operable without editing path values in .env first.

Drive-by fix: `_copy_server_assets` now creates the configs/ parent before copying worker_config.yaml. This previously worked only because the destination filename had no parent component; the layout move needed the mkdir to keep `bundle export` from crashing on the copy.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

The is_resource call lived inside an `if not …: pass` block that did nothing, so it only existed to emit DeprecationWarning when callers exercised asset_path. as_file already raises FileNotFoundError on a missing resource, which the surrounding try/except already maps to AssetNotFoundError.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

bundle_export unconditionally called _copy_redis_tls_assets(ca_only=True), a leftover from an earlier migration. A root-node bundle skipped redis-server.pem / redis-server.key, so the exported tarball couldn't boot local Redis even though the shipped .env.example and compose.yml point at those files.

Surface the node role as a positional argument to bundle_export (default root); ca_only is now role == worker, so a root bundle stages the full Redis TLS material and a worker bundle keeps the CA-only shape.

The role string was already encoded in stack.py and env_schema.py as ad-hoc literals. Promote it to a shared StrEnum (flowmesh.models.nodes.NodeRole), widen EnvVar.choices to Iterable[str] so the enum class can be passed directly, and replace the literals at every callsite. test_schema_compat.py adds NodeRole to the SDK <-> server enum-pair compat check.

bundle_init next-steps fixes:
- Append --env-file <path> to the printed `flowmesh stack pull` / `up` lines when --env-file is non-default (the bare commands would have read ./.env).
- Suppress the "drop TLS certs into ..." line when --no-tls is set.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
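For illustration, a shared role enum with the role-gated staging decision might look like the sketch below. The member names, the `redis_tls_files` helper, and the body of `parse_node_role` are assumptions (the commit only names `NodeRole` and, in a later commit, `parse_node_role`); the file names are the ones mentioned in this PR. The fallback class is only there so the snippet also runs on interpreters older than Python 3.11.

```python
try:
    from enum import StrEnum  # Python 3.11+
except ImportError:
    from enum import Enum

    class StrEnum(str, Enum):  # minimal stand-in for older interpreters
        def __str__(self) -> str:
            return str(self.value)


class NodeRole(StrEnum):
    ROOT = "root"
    WORKER = "worker"


def parse_node_role(raw: str) -> NodeRole:
    """Convert user input to a NodeRole, listing valid choices on failure."""
    try:
        return NodeRole(raw.strip().lower())
    except ValueError:
        choices = ", ".join(role.value for role in NodeRole)
        raise ValueError(f"invalid role {raw!r}; expected one of: {choices}") from None


def redis_tls_files(role: NodeRole) -> list[str]:
    """Redis TLS file names staged into a bundle for the given role."""
    ca_only = role == NodeRole.WORKER
    files = ["redis-ca.pem"]
    if not ca_only:
        # Only a root node runs local Redis, so only it needs the cert/key pair.
        files += ["redis-server.pem", "redis-server.key"]
    return files
```

Since a string-valued enum member compares equal to its value, existing comparisons against the literals "root" and "worker" keep working while callsites migrate.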

The previous commit on this branch role-gated only the Redis TLS file copy in bundle export. install.sh and stack init kept writing the static .env.example verbatim, so a worker-mode bundle still landed on a NODE_ROLE=root config that expected local Redis and the cert/key files the worker bundle had just declined to ship.

Switch stack init from copying the checked-in .env.example to rendering live from STACK_ENV_SCHEMA via render_env_example, which gained an optional `overrides: Mapping[str, str]` kwarg. The shipped example is still authoritative (scripts/dev/check_env_examples.py verifies the root render equals the tracked copy), but stack init can now produce worker-shaped output without a parallel template.

Add WORKER_ROLE_OVERRIDES next to STACK_ENV_SCHEMA: NODE_ROLE flips to "worker"; REDIS_TLS_CERT_FILE / REDIS_TLS_KEY_FILE blank out because src/server only reads REDIS_TLS_CA_FILE and the cert/key files are consumed exclusively by the root-profile-gated Redis services. The connection-side knobs (REDIS_*_URL, ACL, credentials, CA) stay populated; the operator still has to repoint REDIS_CONTROL_URL / REDIS_TELEMETRY_URL at the root node before stack up, which is intrinsic since only the operator knows that address.

Plumb `--role` through stack init, bundle init, and bundle export's install.sh so a worker bundle's bootstrap chain ends in `flowmesh stack init --env-file "\$ENV_FILE" --role worker`. A shared parse_node_role helper replaces the inline try/except now duplicated across three callsites.

Coverage: test_worker_role_render_passes_schema_validation runs the rendered worker .env through validate_env_values to pin the contract that role overrides don't trip schema-level required/min_value checks. This catches drift if a future required=True lands on a blanked key.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
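The drift that the coverage test guards against can be shown with a reduced model. Everything here is an assumption made for illustration: the two-field `EnvVar`, the signature and return type of `validate_env_values`, which keys are marked required, and the placeholder CA path. The real validators also cover checks such as min_value that this sketch omits.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class EnvVar:
    # Hypothetical, reduced schema entry: just a name and a required flag.
    name: str
    required: bool = False


def validate_env_values(schema: Sequence[EnvVar], values: Mapping[str, str]) -> list[str]:
    """Return one message per required key that is missing or blank."""
    return [
        f"{var.name} is required but empty"
        for var in schema
        if var.required and not values.get(var.name, "").strip()
    ]


# Worker-shaped values: CA stays populated, cert/key are blanked.
worker_values = {
    "NODE_ROLE": "worker",
    "REDIS_TLS_CA_FILE": "secrets/tls/redis/redis-ca.pem",  # placeholder path
    "REDIS_TLS_CERT_FILE": "",
    "REDIS_TLS_KEY_FILE": "",
}

current_schema = (
    EnvVar("NODE_ROLE", required=True),
    EnvVar("REDIS_TLS_CA_FILE", required=True),
    EnvVar("REDIS_TLS_CERT_FILE"),
    EnvVar("REDIS_TLS_KEY_FILE"),
)

# Simulated drift: a later change marks a blanked key as required.
drifted_schema = current_schema[:2] + (
    EnvVar("REDIS_TLS_CERT_FILE", required=True),
    EnvVar("REDIS_TLS_KEY_FILE"),
)

ok_errors = validate_env_values(current_schema, worker_values)
drift_errors = validate_env_values(drifted_schema, worker_values)
```

With the current schema the worker values validate cleanly; once a blanked key becomes required, the same values produce an error, which is the failure the test is meant to surface at PR time.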

stack init writes FLOWMESH_VERSION=dev (the local-iteration placeholder) by default. For deploy-shaped scaffolding the right tag is the one matching the running flowmesh-cli-stack, so compose pulls server/worker images at the same version the CLI was installed at.

--deploy reads the installed package version via importlib.metadata and pins FLOWMESH_VERSION to it, falling back to 'latest' with a warning when the metadata is missing. The fallback keeps the bootstrap usable on hosts where the package isn't pip-installed in a way that exposes metadata; the operator can edit .env after.

bundle init implies --deploy; bundle export's install.sh now emits --deploy in its chained stack init call. Combined with install.sh pinning flowmesh[cli]==X via _published_cli_spec, the resulting deploy host installs the pinned CLI, scaffolds .env at the same version, and compose pulls aligned images. No literal "latest" lives in the schema or override path.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

kaiitunnz added 6 commits May 12, 2026 18:28

chore: switch lumid deps from git+url to PyPI pins

c9c0232

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

kaiitunnz force-pushed the kaiitunnz/feat/bundle-init branch from e23b8b1 to ad8c142 Compare May 12, 2026 21:20

kaiitunnz added 2 commits May 12, 2026 21:33

fix: prefix package version with "v"

69ba9ed

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

fix: anchor install.sh at the bundle dir and improve instruction log

c942d95

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

kaiitunnz merged commit 49a8fd2 into main May 12, 2026
13 checks passed

kaiitunnz deleted the kaiitunnz/feat/bundle-init branch May 12, 2026 21:45

kaiitunnz merged 8 commits into main from kaiitunnz/feat/bundle-init
