fix: send api key on worker node registration #43
Merged
timzsu (Collaborator) requested changes on May 13, 2026:
LGTM. One potential risk is posed by the broad try-except.
As this is a critical bugfix, I suggest we publish a patch release 0.1.1 after this PR.
`auth_headers` / `add_auth_headers` were defined in `worker/executors/utils/artifacts.py` but are useful anywhere that talks to the FlowMesh server — supervisor self-registration, future SDK callers, etc. Hoist them to `shared.utils.http` so non-worker modules can import them without pulling in the executor stack. Both functions gain an optional `token` arg; the default still reads `FLOWMESH_API_KEY` from the env so the existing zero-arg call sites keep working. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
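The behavior described in this commit message can be sketched as follows. This is a minimal illustration, not the code in `shared.utils.http`: the exact signatures, the rule that an explicit `token` takes precedence over the environment, and the copy-and-merge behavior of `add_auth_headers` are all assumptions.

```python
import os
from typing import Optional

API_KEY_ENV = "FLOWMESH_API_KEY"  # env var named in the commit message


def auth_headers(token: Optional[str] = None) -> dict[str, str]:
    """Return bearer auth headers, or an empty dict when no key is available."""
    # Assumption: an explicit token wins; otherwise fall back to the environment.
    key = token if token is not None else os.environ.get(API_KEY_ENV)
    return {"Authorization": f"Bearer {key}"} if key else {}


def add_auth_headers(
    headers: Optional[dict[str, str]] = None,
    token: Optional[str] = None,
) -> dict[str, str]:
    """Return a copy of `headers` with the bearer header merged in (assumed shape)."""
    merged = dict(headers or {})
    merged.update(auth_headers(token))
    return merged
```

Under these assumptions, existing zero-argument call sites keep their behavior, while a caller that already holds a key can pass it in without touching the environment.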
`HttpSession` now sits next to the auth-headers helpers it already uses. No call sites need updating, since `HttpSession` has none today; the move just puts it where future supervisor / shared callers can pick it up without depending on `server.utils.helpers`. `__init__` now goes through `auth_headers(token)` so the bearer formatting stays in one place. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
`dedup_json`, `restore_json`, `lookup_deduped_json`, `normalize_numbers`, and the private `_restore_deduped_node` were verbatim copies of the canonical versions in `shared.utils.json` (already re-exported via `shared.utils.__init__`). No code imported the server copies. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Two copies of `iter_pubsub_messages` existed: `clients/redis.py` (bare loop) and `utils/helpers.py` (loop wrapped in `try/except (ConnectionError, ValueError, OSError)`). Keep the safer wrapping version on `clients/redis.py`, since pubsub iteration belongs with the Redis client, and drop the helpers copy. Pre-existing server-side callers (`registries/node.py`, `services/monitoring.py`) gain the same shutdown-tolerance the supervisor callers already had; both wrappers either had their own outer `try/except` or didn't need the connection-error surface. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
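A rough sketch of the error-tolerant variant kept here, assuming the pubsub object exposes a `listen()` generator as redis-py's `PubSub` does. The function body and the `FakePubSub` stand-in are illustrative, and the builtin exception classes are used for simplicity; the real code may catch the Redis client's own `ConnectionError` instead.

```python
from typing import Any, Iterator


def iter_pubsub_messages(pubsub: Any) -> Iterator[dict]:
    """Yield messages until the connection goes away, then stop quietly."""
    try:
        for message in pubsub.listen():
            yield message
    except (ConnectionError, ValueError, OSError):
        # Treat a closed or broken connection as a normal end of iteration,
        # for example during shutdown.
        return


class FakePubSub:
    """Hypothetical stand-in that fails after one message."""

    def listen(self) -> Iterator[dict]:
        yield {"type": "message", "data": b"node-up"}
        raise ConnectionError("connection closed")


received = list(iter_pubsub_messages(FakePubSub()))
```

The trade-off the reviewer flags applies to this shape: swallowing these exceptions gives clean shutdowns, but it also hides genuine connection failures from callers unless they are logged or surfaced some other way.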
The helpers-side `get_logger` was a singleton wrapping
`logging.getLogger("server")` with a `RotatingFileHandler` side
effect on first call. Only the supervisor Docker / vastai adapters
called it, both at module-import time — so the first import would
silently spin up a `server.log` handler before `supervisor.py`
configured the real `supervisor` logger.
Replace both call sites with `logging.getLogger("supervisor")`, which
returns the same instance the supervisor entrypoint later attaches
handlers to (no pre-configuration side effects). Drop the helpers
function and its `_logger` global; the canonical logger factory in
`utils/logging.py` is now the only `get_logger` in the package.
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
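The reason the replacement is safe can be shown with the standard library alone: `logging.getLogger(name)` returns the same object for the same name, and calling it creates no handlers. In the sketch below, `configure_supervisor_logging` is a hypothetical stand-in for whatever setup `supervisor.py` performs.

```python
import io
import logging

# Adapter module, at import time: looking the logger up attaches no handlers.
adapter_logger = logging.getLogger("supervisor")


def configure_supervisor_logging(stream: io.StringIO) -> logging.Logger:
    """Hypothetical entrypoint setup: attach a handler to the named logger."""
    logger = logging.getLogger("supervisor")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


buffer = io.StringIO()
entry_logger = configure_supervisor_logging(buffer)

# The adapter's logger picks up the handler configured later by the entrypoint.
adapter_logger.info("container started")
```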
`_register_http` posted to `/api/v1/nodes/register` without an `Authorization` header, so a worker node's supervisor couldn't register against a root server with an `IdentityProvider` plugin chain installed — the server's auth chain rejected the unauthenticated POST. Pass `auth_headers()` so the same `FLOWMESH_API_KEY` the rest of the runtime already uses rides on the request. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
HttpSession had no live callers anywhere in the tree, and its `aiohttp` / `requests` imports leaked into the worker `ssh_executor` import chain — breaking `runtime-worker-core` installs that don't ship aiohttp (the registry would silently drop SSHExecutor via `_safe_import`). Drop it; `shared.utils.http` is back to the auth-header helpers only. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
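The failure mode described here, where a guarded import quietly drops an executor, might look roughly like the sketch below. The real `_safe_import` is not shown in this PR, so `safe_import`, the registry shape, and the module names are assumptions for illustration only.

```python
import importlib
from types import ModuleType
from typing import Optional


def safe_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None instead of raising on failure."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # The caller never sees why the module is missing.
        return None


# A module whose import chain needs an uninstalled dependency simply
# disappears from the registry, with no error raised.
candidates = ("json", "module_that_is_not_installed_xyz")
registry = {
    name: module
    for name in candidates
    if (module := safe_import(name)) is not None
}
```

With this pattern, one heavy import added to a shared module is enough to remove every executor that transitively imports it, which is why keeping `shared.utils.http` free of `aiohttp` / `requests` matters.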
Pins the contract that worker node registration sends `Authorization: Bearer <FLOWMESH_API_KEY>` when the env var is set, and omits the header otherwise. Guards the regression this branch fixed. Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
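A test of this contract could take roughly the following shape. This is not the content of `tests/server/test_supervisor_lifecycle.py`: `register_node` is a hypothetical stand-in for `Lifecycle._register_http`, the injected `post` callable replaces the real HTTP client, and the base URL and payload are made up.

```python
import os
from typing import Any, Callable


def auth_headers() -> dict[str, str]:
    """Simplified copy of the helper: bearer header only when the key is set."""
    key = os.environ.get("FLOWMESH_API_KEY")
    return {"Authorization": f"Bearer {key}"} if key else {}


def register_node(post: Callable[..., Any], base_url: str, payload: dict) -> Any:
    """Hypothetical stand-in for Lifecycle._register_http."""
    return post(
        f"{base_url}/api/v1/nodes/register",
        json=payload,
        headers=auth_headers(),
    )


def capture_headers() -> dict[str, str]:
    """Run one registration against a recording fake and return the sent headers."""
    sent: dict[str, Any] = {}

    def fake_post(url: str, json: dict, headers: dict[str, str]) -> None:
        sent.update(url=url, headers=headers)

    register_node(fake_post, "http://root:8000", {"node_id": "n1"})
    return sent["headers"]
```

Calling `capture_headers()` once with `FLOWMESH_API_KEY` set and once with it unset covers both halves of the contract.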
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Purpose
A worker node's supervisor couldn't register with a root server when the root had any `IdentityProvider` plugin chain installed. `Lifecycle._register_http` posted to `/api/v1/nodes/register` without an `Authorization` header, so the server's auth chain rejected the unauthenticated POST. Every other supervisor / runtime path already attaches `FLOWMESH_API_KEY` as a bearer token; registration was the odd one out.
This PR ships the fix plus the refactor it landed on top of: the `auth_headers` / `add_auth_headers` helpers (previously buried in `worker/executors/utils/artifacts.py`) move to `shared.utils.http` so the supervisor can reuse them without depending on the executor stack. While there, drop dead duplicates from `server/utils/helpers.py`.
Changes
- `Lifecycle._register_http` now sends `Authorization: Bearer ${FLOWMESH_API_KEY}` on `POST /api/v1/nodes/register`; `tests/server/test_supervisor_lifecycle.py` pins the header contract for both the set / unset key cases.
- `shared.utils.http` holds `auth_headers` and `add_auth_headers`. Worker call sites re-import from here; the supervisor reuses the same helpers. Module stays `os`-only (no `aiohttp` / `requests`) so the worker SSH executor import chain stays clean under `runtime-worker-core`.
- `server/utils/helpers.py` cleanup: drop the JSON helpers already duplicated in `shared.utils.json`, fold `iter_pubsub_messages` into `clients/redis.py` (keeping the safer error-tolerant variant), and drop `get_logger` in favor of `logging.getLogger("supervisor")` at the two adapter call sites.
Design
The bug fix on its own is a two-line change. The refactor isn't strictly required, but `auth_headers` is the only obviously-shared piece between `_register_http` and the rest of the runtime; leaving it under `worker.executors.utils.artifacts` would have left the supervisor depending on the executor module just to format a bearer header. Moving it to `shared.utils.http` keeps the fix readable and makes the helper available to anything else that talks to the server. The `helpers.py` cleanups are opportunistic: same file, same review, no extra surface.
Test Plan
No live multi-node smoke run on this branch — the fix is mechanical (header attach) and covered by the new lifecycle tests.
Test Result
Pre-submission Checklist
- `pre-commit run --all-files` and fixed any issues.
- `uv run pytest tests/` passes locally.
- `uv sync --all-packages --group ci --frozen`).
- `[BREAKING]` and described migration steps above.