feat: add ignore_unreachable to NodeClient.destroy_all_workers#2
Merged
Conversation
Teardown flows that need to tolerate a never-reachable FlowMesh server (e.g. a previous ``stack up`` that failed before the server was healthy) currently have to wrap the destroy call in try/except. Lift that into a ``ignore_unreachable`` keyword on ``destroy_all_workers`` itself: when True, ``FlowMeshConnectionError`` is swallowed and the method returns cleanly. There are no workers to destroy when the server isn't there anyway. Other errors (auth, 5xx) still propagate so genuine misconfiguration stays loud. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
…ten _drain_workers ``NodeClient.destroy_worker`` and ``NodeClient.stop_worker`` gain the same ``ignore_unreachable`` flag as ``destroy_all_workers``: when ``True``, ``FlowMeshConnectionError`` is swallowed and the method returns cleanly; auth, 5xx, and other errors still propagate. Two new tests per method cover both branches. ``flowmesh_cli_stack.stack._drain_workers`` switches from a broad ``except Exception`` (which silently swallowed auth / 5xx / programming bugs) to ``destroy_all_workers(ignore_unreachable=True)``. Connection- down is still tolerated during ``stack down`` / ``clean`` / ``reset``, but real misconfiguration now surfaces instead of being logged and ignored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
…unreachable Brings ``destroy_all_workers``'s docstring back to the same shape as ``destroy_worker`` / ``stop_worker`` — the rationale belongs in the PR description, not three places in the source. When ``ignore_unreachable=True`` swallows a ``FlowMeshConnectionError``, each method now logs a WARNING via ``flowmesh_stack.node_client``'s module logger so teardown flows that previously emitted a "skipping" warning at the call site keep that visibility. Tests assert the warning is emitted on each swallow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
….core.logging ``stop_worker`` / ``destroy_worker`` / ``destroy_all_workers`` now return ``True`` on success and ``False`` when ``ignore_unreachable=True`` and the FlowMesh server was unreachable. The internal stdlib WARNING stays for SDK consumers. ``flowmesh_cli_stack.stack._drain_workers`` checks the return value and, on a skipped destroy, emits a yellow ``logging.warning`` via the CLI's own ``flowmesh_cli.core.logging`` (typer-echo) so users running ``stack down`` / ``clean`` / ``reset`` against a half-broken stack see why teardown skipped workers — the SDK's stdlib warning doesn't otherwise surface in CLI output. Tests assert the return value alongside the warning on each swallow path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
…ss`` The bool return from ``destroy_all_workers`` / ``destroy_worker`` / ``stop_worker`` is the canonical signal — log emission is the consumer's job. Removes the SDK's stdlib ``logger.warning`` (and the unused ``import logging`` / ``logger = ...`` setup) so the SDK stays purely a transport layer. ``_drain_workers`` binds the result to ``success`` before checking, both to make the intent explicit and to give the variable a place where a future caller could thread additional reactions. Tests no longer assert log emission; they verify the return value alone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Per review discussion: ``destroy_all_workers`` is the only call that fits the "best-effort teardown against an absent server" use case (it runs from ``_drain_workers`` during ``stack down`` / ``clean`` / ``reset``). Per-worker stop / destroy is invoked from user-facing CLI commands where unreachable should be a hard error — extending the flag there bloats the API for no real consumer. ``stop_worker`` and ``destroy_worker`` go back to their original signatures (no flag, no bool return). The flag and the corresponding tests stay on ``destroy_all_workers``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
kaiitunnz
requested changes
Apr 29, 2026
| client = stack_node_client(env_file, base_url=None, token=None) | ||
| success = client.destroy_all_workers(ignore_unreachable=True) | ||
| if not success: | ||
| logging.warning("Server unreachable; skipping worker destruction.") |
Collaborator
There was a problem hiding this comment.
I suggest keeping the old behavior: ignore exception. log warning, and fall through. Otherwise, you need to catch other exceptions, log properly, and exit.
Per review: keep this PR scoped strictly to the SDK addition. The ``ignore_unreachable=True`` flag is available for downstream callers (lumilake.optimizer's deploy library) to use; rolling out the new shape inside FlowMesh's own CLI can land separately if/when wanted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Teardown flows currently wrap
destroy_all_workers()in their owntry/except FlowMeshConnectionErrorso they don't crash when astack upfailed before the server became reachable. Lift that pattern into the SDK as an opt-in keyword.Changes
sdk/stack/src/flowmesh_stack/node_client.py—NodeClient.destroy_all_workersacceptsignore_unreachable: bool = Falseand returnsbool. WithTrue,FlowMeshConnectionErroris swallowed and the call returnsFalse. Other errors propagate either way.tests/sdk/test_node_client.py— covers both branches: default re-raisesFlowMeshConnectionError;ignore_unreachable=TruereturnsFalse.Design
Keyword-only flag follows
pathlib.Path.unlink(missing_ok=True). Bool return is the canonical "did it reach the server?" signal; the SDK doesn't log itself so callers control messaging.Test Plan
Test Result
Pre-submission Checklist
CONTRIBUTING.md(orAGENTS.mdif noCONTRIBUTING.md).uv run pre-commit run --all-filesand fixed any issues.[BREAKING]and described migration steps above.