refactor: scope stack docker object names to avoid collision on shared hosts #20

Merged: kaiitunnz merged 4 commits into main from kaiitunnz/fix/collision on May 6, 2026
Conversation

@kaiitunnz (Collaborator) commented May 5, 2026

Purpose

The flowmesh stack CLI hardcoded the compose project name, container names, network name, and named volumes. Two contributors sharing a single host could not bring up independent stacks side-by-side: the second flowmesh stack up would either reuse the first stack's containers and volumes or fail outright on name collisions. This PR templates every stack-managed Docker object name behind an opt-in FLOWMESH_STACK_SUFFIX, so a fresh suffix yields a fully isolated stack.

Changes

  • cli/stack/src/flowmesh_cli_stack/assets/compose.yml — template the compose project name, all container_name fields, the network, and every named volume on ${FLOWMESH_STACK_SLUG:-flowmesh_node}; default WORKER_RESULTS_DIR to the slug-scoped volume so the server can mount the worker's results.
  • cli/stack/src/flowmesh_cli_stack/utils.py — add _resolve_stack_suffix (sanitizes the user-supplied suffix to a Docker-safe token, capped at 48 chars) and apply_stack_resource_env (resolves the slug, sets COMPOSE_PROJECT_NAME / FLOWMESH_STACK_SLUG, and defaults WORKER_RESULTS_DIR when empty).
  • cli/stack/src/flowmesh_cli_stack/stack.py — call apply_stack_resource_env from the stack loader so every compose invocation inherits the resolved slug.
  • cli/stack/src/flowmesh_cli_stack/env_schema.py + assets/.env.example — register FLOWMESH_STACK_SUFFIX and drop the hardcoded flowmesh_results defaults from SERVER_RESULTS_DIR / WORKER_RESULTS_DIR so they fall back to the slug-scoped volume.
  • docs/CLI.md, docs/ENV.md — document the multi-stack workflow and the suffix knob.
  • tests/cli/test_stack_utils.py — cover the empty / sanitized / invalid suffix paths and the WORKER_RESULTS_DIR default.
  • .github/workflows/security.yml + docs/CODE_STYLE.md — extend the pip-audit ignore list (and rationale table) for four new pillow advisories (GHSA-wjx4-4jcj-g98j, GHSA-5xmw-vc9v-4wf2, GHSA-r73j-pqj5-w3x7, GHSA-pwv6-vv43-88gr); all four require pillow ≥ 12.2.0, blocked by the existing gradio 5.50 cap held by vllm-omni 0.18.

Design

FLOWMESH_STACK_SUFFIX is the user-facing knob; apply_stack_resource_env resolves it once into FLOWMESH_STACK_SLUG and COMPOSE_PROJECT_NAME, which compose then interpolates into every templated name. An empty suffix keeps the historical flowmesh_node name, so single-stack users see no change.
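
A minimal sketch of the resolve-once flow described above, not the PR's actual code. The function name `resolve_stack_env_sketch` is hypothetical, and two details are assumptions: that the slug is formed as `flowmesh_node_<suffix>`, and that `COMPOSE_PROJECT_NAME` receives the same value as `FLOWMESH_STACK_SLUG`. The real joining rule in `apply_stack_resource_env` may differ.

```python
import os
from typing import MutableMapping, Optional

DEFAULT_SLUG = "flowmesh_node"  # historical name, per the PR description


def resolve_stack_env_sketch(
    env: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Resolve the suffix once and export the names compose interpolates.

    Hypothetical illustration; assumes the suffix is already sanitized.
    """
    env = os.environ if env is None else env
    suffix = env.get("FLOWMESH_STACK_SUFFIX", "").strip()

    # ASSUMPTION: slug = default + "_" + suffix. An empty suffix keeps the
    # historical name so single-stack users see no change.
    slug = f"{DEFAULT_SLUG}_{suffix}" if suffix else DEFAULT_SLUG

    # ASSUMPTION: both variables carry the same resolved value.
    env["FLOWMESH_STACK_SLUG"] = slug
    env["COMPOSE_PROJECT_NAME"] = slug
    return slug
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test without touching the process environment.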

The suffix is sanitized at the CLI boundary (punctuation → -, runs collapsed, capped at 48 chars) so arbitrary user input still produces valid Docker names; a suffix that sanitizes to empty fails fast with an actionable ValueError instead of an opaque compose error.
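
The sanitization rules above could look roughly like the following. This is a hedged sketch, not the body of `_resolve_stack_suffix`: the name `sanitize_suffix_sketch` is hypothetical, and lowercasing the input and treating every character outside `[a-z0-9]` as punctuation are assumptions the PR description does not confirm.

```python
import re

MAX_SUFFIX_LEN = 48  # cap stated in the PR description


def sanitize_suffix_sketch(raw: str) -> str:
    """Return "" for an unset suffix, otherwise a Docker-safe token."""
    if not raw.strip():
        return ""  # unset suffix: the caller keeps the default stack name

    # ASSUMPTION: lowercase first, then replace each run of characters
    # outside [a-z0-9] with a single hyphen (this collapses runs).
    token = re.sub(r"[^a-z0-9]+", "-", raw.strip().lower())

    # Cap the length, then drop any leading or trailing hyphen.
    token = token[:MAX_SUFFIX_LEN].strip("-")

    if not token:
        # Fail fast with an actionable message instead of letting compose
        # reject the name later.
        raise ValueError(
            f"FLOWMESH_STACK_SUFFIX={raw!r} contains no usable characters; "
            "use letters or digits, for example 'alice-dev'."
        )
    return token
```

Under these assumptions, `sanitize_suffix_sketch("Alice's  Stack!!")` returns `alice-s-stack`, and `sanitize_suffix_sketch("!!!")` raises `ValueError`.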

WORKER_RESULTS_DIR is defaulted in Python rather than in compose, because the server reads it at runtime and needs the actual slug-scoped volume name, not a compose-key alias. SERVER_RESULTS_DIR keeps its compose-key fallback since it is consumed only by compose itself.
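
To illustrate the "default only when empty" behavior, here is a small sketch. The helper name `default_worker_results_dir_sketch` and the volume naming scheme `<slug>_results` are assumptions made for illustration; the PR does not spell out the actual volume name.

```python
from typing import MutableMapping


def default_worker_results_dir_sketch(
    env: MutableMapping[str, str], slug: str
) -> str:
    """Fill WORKER_RESULTS_DIR with a slug-scoped volume name if unset.

    Hypothetical illustration of defaulting in Python rather than compose.
    """
    if not env.get("WORKER_RESULTS_DIR"):
        # ASSUMPTION: the slug-scoped volume is named "<slug>_results".
        # The server needs this concrete name at runtime, which a
        # compose-key alias would not provide.
        env["WORKER_RESULTS_DIR"] = f"{slug}_results"
    return env["WORKER_RESULTS_DIR"]
```

An explicit user value is left untouched, so a bind-mount path set in `.env` still takes precedence over the computed default.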

Test Plan

  • uv run pre-commit run --all-files
  • uv run pytest tests/
  • Two-stack deployment on a shared host:
# >> Set .env for the first stack
flowmesh stack up && flowmesh stack worker up gpu -t 0

# >> Set .env for the second stack
flowmesh stack up && flowmesh stack worker up gpu -t 1
flowmesh stack clean    # Shut down the second stack

# >> Set .env for the first stack
flowmesh workflow submit templates/dag_inference_example.yaml
flowmesh stack clean

Test Result

  • pre-commit passes on the changed files.
  • 592 unit tests pass under uv run pytest tests/.
  • The two-stack deployment above succeeds end-to-end: both stacks come up without container/volume/network collisions, the second stack tears down cleanly with flowmesh stack clean while the first keeps running, and the workflow submitted against the first stack runs to completion before the final teardown.

Pre-submission Checklist
  • I have read the contribution guidelines.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added or updated tests covering my changes (if applicable).
  • I have verified that uv run pytest tests/ passes locally.
  • If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker.
  • If I changed the SDK or CLI, I have verified the affected packages work (uv sync --all-extras --frozen).
  • If this is a breaking change, I have prefixed the PR title with [BREAKING] and described migration steps above.
  • I have updated documentation or config examples if user-facing behavior changed.

kaiitunnz added 2 commits May 5, 2026 17:50
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
@timzsu (Collaborator) left a comment

Only a few minor comments.

Comment thread docs/CLI.md Outdated
`FLOWMESH_VERSION` to a PR-identifying slug (e.g. `myfeature`) so parallel
PRs don't overwrite each other's local images.

When multiple people share one host, give each stack its own

Suggested change
- When multiple people share one host, give each stack its own
+ When multiple deployments share one host, give each stack its own

Those deployments are not necessarily owned by multiple people. One person testing multiple deployments concurrently can benefit from this as well.

Comment thread docs/CLI.md Outdated
When multiple people share one host, give each stack its own
`FLOWMESH_STACK_SUFFIX` and distinct `SERVER_HTTP_PORT`,
`SERVER_GRPC_PORT`, `REDIS_CONTROL_PORT`, and `REDIS_TELEMETRY_PORT`.
The suffix isolates Docker object names; the ports isolate host bindings.

Suggested change
- The suffix isolates Docker object names; the ports isolate host bindings.
+ The suffix isolates Docker object names (including containers, volumes, and networks); the ports isolate host bindings.

The Docker image names are not isolated, so we still need to avoid image tag overlap. This is explicit in cli/stack/src/flowmesh_cli_stack/env_schema.py, but I am not sure whether the wording here might be ambiguous for agents.

Comment thread docs/ENV.md Outdated
directory or volume so the server can access the worker's task results.
Otherwise, downstream tasks that depend on upstream outputs will stall
in the dispatching loop indefinitely.
- When multiple people share one host, you can set `FLOWMESH_STACK_SUFFIX`

Suggested change
- When multiple people share one host, you can set `FLOWMESH_STACK_SUFFIX`
- When multiple deployments share one host, you can set `FLOWMESH_STACK_SUFFIX`

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>

Co-authored-by: Zhengyuan Su (苏政渊) <su.zhengyuan@u.nus.edu>
@kaiitunnz kaiitunnz requested a review from timzsu May 6, 2026 04:37
@timzsu (Collaborator) left a comment

LGTM.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
@kaiitunnz kaiitunnz merged commit 1d13bb5 into main May 6, 2026
10 checks passed
@kaiitunnz kaiitunnz deleted the kaiitunnz/fix/collision branch May 6, 2026 05:26
2 participants