
feat: add ignore_unreachable to NodeClient.destroy_all_workers #2

Merged
timzsu merged 7 commits into main from zsu/destroy-workers-ignore-unreachable
Apr 29, 2026

Collaborator

@timzsu timzsu commented Apr 29, 2026

Purpose

Teardown flows currently wrap destroy_all_workers() in their own try/except FlowMeshConnectionError so they don't crash when a stack up failed before the server became reachable. Lift that pattern into the SDK as an opt-in keyword.

Changes

  • sdk/stack/src/flowmesh_stack/node_client.py: NodeClient.destroy_all_workers accepts ignore_unreachable: bool = False and returns bool. With True, FlowMeshConnectionError is swallowed and the call returns False. Other errors propagate either way.
  • tests/sdk/test_node_client.py: covers both branches. The default re-raises FlowMeshConnectionError; ignore_unreachable=True returns False.
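
The behavior described in the list above could be sketched as follows. This is not the actual implementation: the real `NodeClient` transport is not shown in this PR, so the injected `send_destroy_all` callable, the stand-in exception class, and the `unreachable` / `auth_failure` helpers are all assumptions made for illustration.

```python
from typing import Callable


class FlowMeshConnectionError(Exception):
    """Stand-in for the SDK's connection error."""


class NodeClient:
    """Simplified stand-in; the real client's transport is not shown in the PR."""

    def __init__(self, send_destroy_all: Callable[[], None]) -> None:
        # Hypothetical injected callable that performs the actual request.
        self._send_destroy_all = send_destroy_all

    def destroy_all_workers(self, *, ignore_unreachable: bool = False) -> bool:
        """Return True if the server was reached, False if unreachable and ignored."""
        try:
            self._send_destroy_all()
        except FlowMeshConnectionError:
            if not ignore_unreachable:
                raise  # default: the connection error stays loud
            return False
        # Any other exception type is never caught here, so it propagates.
        return True


def unreachable() -> None:
    raise FlowMeshConnectionError("connection refused")


def auth_failure() -> None:
    raise PermissionError("bad token")  # stand-in for a non-connection error


ok = NodeClient(lambda: None).destroy_all_workers()
skipped = NodeClient(unreachable).destroy_all_workers(ignore_unreachable=True)
```

Here `ok` is the success path and `skipped` is the tolerated-unreachable path; calling with `auth_failure` raises regardless of the flag.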

Design

Keyword-only flag follows pathlib.Path.unlink(missing_ok=True). Bool return is the canonical "did it reach the server?" signal; the SDK doesn't log itself so callers control messaging.
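
To make the analogy concrete, the sketch below shows the standard-library behavior being mirrored, followed by a hypothetical keyword-only signature (`destroy_all` is an illustrative name, not part of the SDK). Two differences are worth noting: `Path.unlink` returns `None` rather than a bool, and pathlib itself also accepts `missing_ok` positionally, so the keyword-only restriction is a choice made in this PR.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "does-not-exist"

    # Default: a missing file is an error.
    try:
        missing.unlink()
        default_raised = False
    except FileNotFoundError:
        default_raised = True

    # Opt-in: the same condition is tolerated and the call returns normally.
    missing.unlink(missing_ok=True)
    opt_in_returned = True


def destroy_all(*, ignore_unreachable: bool = False) -> bool:
    """Hypothetical keyword-only signature; the body is elided to a constant."""
    return True


# The bare `*` rejects positional use, so call sites must name the flag.
try:
    destroy_all(True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Forcing the flag to be named keeps call sites such as `destroy_all(ignore_unreachable=True)` self-describing.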

Test Plan

uv run pre-commit run --all-files
uv run pytest tests/ -q

Test Result

  • pre-commit — clean (isort / black / ruff / codespell / mypy / sync-requirements)
  • pytest — 531 passed in 73.4s
  • env-examples + sync-requirements + check-pr-title — all pass

Pre-submission Checklist
  • I have read CONTRIBUTING.md (or AGENTS.md if no CONTRIBUTING.md).
  • I have run uv run pre-commit run --all-files and fixed any issues.
  • I have added or updated tests covering my changes (if applicable).
  • I have verified that the test suite passes locally.
  • If this is a breaking change, I have prefixed the PR title with [BREAKING] and described migration steps above.

timzsu and others added 2 commits April 29, 2026 09:56
Teardown flows that need to tolerate a never-reachable FlowMesh server
(e.g. a previous ``stack up`` that failed before the server was healthy)
currently have to wrap the destroy call in try/except. Lift that into a
``ignore_unreachable`` keyword on ``destroy_all_workers`` itself: when
True, ``FlowMeshConnectionError`` is swallowed and the method returns
cleanly. There are no workers to destroy when the server isn't there
anyway. Other errors (auth, 5xx) still propagate so genuine
misconfiguration stays loud.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
…ten _drain_workers

``NodeClient.destroy_worker`` and ``NodeClient.stop_worker`` gain the same
``ignore_unreachable`` flag as ``destroy_all_workers``: when ``True``,
``FlowMeshConnectionError`` is swallowed and the method returns cleanly;
auth, 5xx, and other errors still propagate. Two new tests per method
cover both branches.

``flowmesh_cli_stack.stack._drain_workers`` switches from a broad
``except Exception`` (which silently swallowed auth / 5xx / programming
bugs) to ``destroy_all_workers(ignore_unreachable=True)``. Connection-
down is still tolerated during ``stack down`` / ``clean`` / ``reset``,
but real misconfiguration now surfaces instead of being logged and
ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
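
The difference between the broad handler and the narrow flag described in the commit above can be illustrated with a small sketch. Everything here is a stand-in: `AuthError`, `FakeClient`, `drain_broad`, and `drain_narrow` are hypothetical names, and the real `_drain_workers` body is not shown in this PR.

```python
class FlowMeshConnectionError(Exception):
    """Stand-in for the SDK's connection error."""


class AuthError(Exception):
    """Hypothetical non-connection failure (e.g. a rejected token)."""


class FakeClient:
    """Hypothetical client that fails with a preconfigured exception."""

    def __init__(self, error: Exception) -> None:
        self._error = error

    def destroy_all_workers(self, *, ignore_unreachable: bool = False) -> bool:
        if isinstance(self._error, FlowMeshConnectionError) and ignore_unreachable:
            return False
        raise self._error


def drain_broad(client) -> str:
    """Old shape: every failure, including auth errors and bugs, is hidden."""
    try:
        client.destroy_all_workers()
    except Exception:
        return "skipped"
    return "destroyed"


def drain_narrow(client) -> str:
    """New shape: only an unreachable server is tolerated."""
    reached = client.destroy_all_workers(ignore_unreachable=True)
    return "destroyed" if reached else "skipped"


broad_on_auth = drain_broad(FakeClient(AuthError("bad token")))
narrow_on_down = drain_narrow(FakeClient(FlowMeshConnectionError("down")))
try:
    drain_narrow(FakeClient(AuthError("bad token")))
    narrow_auth_surfaced = False
except AuthError:
    narrow_auth_surfaced = True
```

With the broad handler, a bad token is indistinguishable from a server that is down; with the flag, only the latter is treated as skippable.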
@timzsu timzsu changed the title from "feat: add ignore_unreachable flag to NodeClient.destroy_all_workers" to "feat: add ignore_unreachable to NodeClient destroy/stop methods" Apr 29, 2026
timzsu and others added 3 commits April 29, 2026 10:59
…unreachable

Brings ``destroy_all_workers``'s docstring back to the same shape as
``destroy_worker`` / ``stop_worker`` — the rationale belongs in the PR
description, not three places in the source.

When ``ignore_unreachable=True`` swallows a ``FlowMeshConnectionError``,
each method now logs a WARNING via ``flowmesh_stack.node_client``'s
module logger so teardown flows that previously emitted a "skipping"
warning at the call site keep that visibility. Tests assert the warning
is emitted on each swallow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
….core.logging

``stop_worker`` / ``destroy_worker`` / ``destroy_all_workers`` now return
``True`` on success and ``False`` when ``ignore_unreachable=True`` and
the FlowMesh server was unreachable. The internal stdlib WARNING stays
for SDK consumers.

``flowmesh_cli_stack.stack._drain_workers`` checks the return value and,
on a skipped destroy, emits a yellow ``logging.warning`` via the CLI's
own ``flowmesh_cli.core.logging`` (typer-echo) so users running
``stack down`` / ``clean`` / ``reset`` against a half-broken stack see
why teardown skipped workers — the SDK's stdlib warning doesn't
otherwise surface in CLI output.

Tests assert the return value alongside the warning on each swallow
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
…ss``

The bool return from ``destroy_all_workers`` / ``destroy_worker`` /
``stop_worker`` is the canonical signal — log emission is the consumer's
job. Removes the SDK's stdlib ``logger.warning`` (and the unused
``import logging`` / ``logger = ...`` setup) so the SDK stays purely a
transport layer.

``_drain_workers`` binds the result to ``success`` before checking, both
to make the intent explicit and to give the variable a place where a
future caller could thread additional reactions.

Tests no longer assert log emission; they verify the return value alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
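
A return-value-only test of the kind this commit describes might look like the sketch below. The project runs pytest, but this version uses the standard-library `unittest` so it stays self-contained. The free function `destroy_all_workers(request, ...)` and its `request` callable are hypothetical stand-ins for the SDK method and its transport.

```python
import unittest
from unittest import mock


class FlowMeshConnectionError(Exception):
    """Stand-in for the SDK's connection error."""


def destroy_all_workers(request, *, ignore_unreachable: bool = False) -> bool:
    """Compact stand-in for the SDK method; `request` is a hypothetical transport."""
    try:
        request()
    except FlowMeshConnectionError:
        if ignore_unreachable:
            return False
        raise
    return True


class DestroyAllWorkersTest(unittest.TestCase):
    def setUp(self) -> None:
        # Every call to the fake transport fails as if the server were down.
        self.request = mock.Mock(side_effect=FlowMeshConnectionError("down"))

    def test_default_reraises(self) -> None:
        with self.assertRaises(FlowMeshConnectionError):
            destroy_all_workers(self.request)

    def test_ignore_unreachable_returns_false(self) -> None:
        result = destroy_all_workers(self.request, ignore_unreachable=True)
        # No log capture: the bool return is the only signal under test.
        self.assertIs(result, False)
        self.request.assert_called_once_with()
```

Because the assertions depend only on the return value, they stay valid whether or not a caller chooses to log the skipped teardown.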
@timzsu timzsu requested a review from kaiitunnz April 29, 2026 11:40
@timzsu timzsu marked this pull request as ready for review April 29, 2026 11:41
Per review discussion: ``destroy_all_workers`` is the only call that
fits the "best-effort teardown against an absent server" use case (it
runs from ``_drain_workers`` during ``stack down`` / ``clean`` /
``reset``). Per-worker stop / destroy is invoked from user-facing CLI
commands where unreachable should be a hard error — extending the flag
there bloats the API for no real consumer.

``stop_worker`` and ``destroy_worker`` go back to their original
signatures (no flag, no bool return). The flag and the corresponding
tests stay on ``destroy_all_workers``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu changed the title from "feat: add ignore_unreachable to NodeClient destroy/stop methods" to "feat: add ignore_unreachable to NodeClient.destroy_all_workers" Apr 29, 2026
Collaborator

@kaiitunnz kaiitunnz left a comment

One comment.

client = stack_node_client(env_file, base_url=None, token=None)
success = client.destroy_all_workers(ignore_unreachable=True)
if not success:
    logging.warning("Server unreachable; skipping worker destruction.")
Collaborator

I suggest keeping the old behavior: ignore the exception, log a warning, and fall through. Otherwise, you need to catch other exceptions, log properly, and exit.

Per review: keep this PR scoped strictly to the SDK addition. The
``ignore_unreachable=True`` flag is available for downstream callers
(lumilake.optimizer's deploy library) to use; rolling out the new shape
inside FlowMesh's own CLI can land separately if/when wanted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu requested a review from kaiitunnz April 29, 2026 12:39
Collaborator

@kaiitunnz kaiitunnz left a comment

LGTM.

@timzsu timzsu merged commit 762f0e2 into main Apr 29, 2026
0 of 9 checks passed
@timzsu timzsu deleted the zsu/destroy-workers-ignore-unreachable branch April 29, 2026 12:42