chore: add PyPI release workflow #26

Merged
kaiitunnz merged 25 commits into main from kaiitunnz/chore/release
May 12, 2026
Conversation

@kaiitunnz
Collaborator

@kaiitunnz kaiitunnz commented May 8, 2026

Purpose

Add the release automation needed to publish FlowMesh's lightweight PyPI distributions and its GHCR container images from immutable release tags. The release pipeline builds and verifies every published distribution, validates synchronized versions and internal pins, smoke-tests the umbrella extras, publishes through PyPI Trusted Publishing, then verifies the matching container images and retags them as :latest with downgrade protection.

Changes

  • PyPI release workflow (.github/workflows/release.yml) — build/verify and publish jobs, manual TestPyPI/PyPI dispatch, package_pattern filter for staggered Trusted Publisher onboarding, twine check distribution metadata validation, and a tag-reachability guard that aborts unless the release tag is an ancestor of origin/main.
  • Image release workflow (.github/workflows/release-images.yml) — verify-and-retag pipeline for the six GHCR images: anonymous read of each published ref, multi-arch manifest + OCI version/revision label assertions against the release tag and commit SHA, then a :latest retag with downgrade protection (overridable via the force_latest dispatch input). Idempotently appends a digest table to the GitHub Release body via sentinel markers.
  • Release helpers (scripts/ci/, scripts/dev/) — check_release_version.py (synchronized versions + internal pin agreement), check_image_release.py (manifest/label verifier), retag_image_release.py (downgrade-protected :latest retag, accepts OCI indexes and legacy Docker manifest lists), append_release_digests.py (Release-body digest table writer with null-body normalization), and bump_version.py (synchronized version + pin bump helper).
  • CLI override flags (cli/stack/src/flowmesh_cli_stack/stack.py, docs/CLI.md) — --image-tag and --build-ref on flowmesh stack build / stack push, applied after load_env so they actually override the .env value. Matches the pre-existing --image-tag pattern on pull / up / down.
  • Release docs (docs/RELEASE.md, CONTRIBUTING.md) — PyPI Trusted Publishing setup, release prep, TestPyPI rehearsal, PyPI publishing, image push from the build host, GHCR first-time setup (visibility + repo link + ghcr environment), install verification, and an "If a release goes wrong" recovery section (PyPI immutability, yank, next-patch, .postN, image-only re-run).
  • Tests (tests/scripts/test_append_release_digests.py) — regression coverage for _read_current_body's null / empty / whitespace normalization and the sentinel strip-and-replace contract.
  • Packaging metadata (root + sdk/, sdk/stack/, cli/, cli/stack/, hook/ pyproject.toml, plus LICENSE copies under each sub-package) — bump [build-system].requires to setuptools>=77, switch to SPDX-string license = "Apache-2.0", declare license-files = ["LICENSE"]. Each published wheel now carries its own Apache-2.0 text under *.dist-info/licenses/LICENSE and declares License-Expression: Apache-2.0 instead of License: UNKNOWN.
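
The tag-reachability guard mentioned in the first bullet can be sketched as follows. This is a minimal illustration, not the workflow's actual step: it assumes the check is done with `git merge-base --is-ancestor`, and the function name `tag_is_on_branch` and its injectable `run` parameter are hypothetical, added only so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Optional, Sequence


def tag_is_on_branch(
    tag: str,
    branch: str = "origin/main",
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> bool:
    """Return True when `tag` is an ancestor of `branch`.

    `run` is a hypothetical hook that takes the command and returns its
    exit code; by default the command is executed with subprocess.
    """
    cmd = ["git", "merge-base", "--is-ancestor", tag, branch]
    if run is None:
        code = subprocess.run(cmd, check=False).returncode
    else:
        code = run(cmd)
    # git documents exit 0 as "is an ancestor" and exit 1 as "is not".
    if code == 0:
        return True
    if code == 1:
        return False
    # Any other exit code is a git error (for example an unknown ref);
    # abort instead of guessing.
    raise RuntimeError(f"git merge-base failed with exit code {code}")
```

A release job would stop before building anything when this returns False, so an artifact is never produced from a tag that is not reachable from origin/main.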

Design

  • Tag-driven, immutable inputs. Both workflows check out the release tag and refuse to proceed if it isn't reachable from origin/main. PyPI versions and GHCR image refs are derived from the tag, never from main.
  • Publishing split from PR validation. package-build.yml still smoke-tests PRs; release.yml owns tag-based artifacts and PyPI upload. release-images.yml is the post-publish auditor for the GHCR image set built from a GPU build host outside CI (no self-hosted runners on the public repo).
  • Trusted Publishing via OIDC. GitHub pypi / testpypi environments host the OIDC claim; production publishing can require manual approval without static upload tokens. A ghcr environment gates :latest retag with the same approval model.
  • Synchronized version policy, explicit. check_release_version.py fails if any of the six package versions diverge from the vX.Y.Z tag or if any first-party dependency pin still points at an older version. bump_version.py applies the same policy locally so prep is mechanical.
  • PEP 440-aware :latest policy. _is_release (in check_image_release.py) gates :latest to non-prerelease, non-dev, non-local versions; .postN is eligible. retag_image_release.py parses the OCI image.version label of the existing :latest and refuses to retag when that label is newer than or equal to the release version (overridable with --force). A transient inspect failure aborts rather than silently disabling the guard.
  • Per-distribution license metadata. flowmesh[sdk] / flowmesh[cli] / flowmesh[hook] resolve to multiple separately published wheels. Each wheel declares Apache-2.0 and ships the LICENSE text, both to satisfy the redistribution terms in Apache-2.0 § 4 and so that PyPI and audit tools don't report License: UNKNOWN.
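
The downgrade guard on :latest described above boils down to a small decision table. The sketch below is an assumption-laden illustration rather than the code in retag_image_release.py: versions are plain integer tuples standing in for packaging.version.Version, and the names `Latest` and `should_retag` are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple

# Stand-in for packaging.version.Version; tuples compare element-wise.
Version = Tuple[int, ...]


class Latest(Enum):
    MISSING = "missing"      # no :latest reference exists yet
    TRANSIENT = "transient"  # inspect failed for an unexpected reason
    FOUND = "found"          # :latest exists and its version label parsed


def should_retag(
    candidate: Version,
    state: Latest,
    existing: Optional[Version] = None,
    force: bool = False,
) -> bool:
    """Decide whether :latest may be moved to `candidate`."""
    if state is Latest.MISSING:
        return True  # nothing to downgrade
    if state is Latest.TRANSIENT:
        if force:
            return True  # guard is OFF: :latest may move backward
        raise RuntimeError("could not inspect :latest; aborting retag")
    if existing is None:
        raise ValueError("existing version is required when state is FOUND")
    if existing >= candidate:
        return force  # newer or equal :latest is kept unless forced
    return True
```

Keeping the transient case separate from the missing case is the point of the design: an inspect error never gets mistaken for "there is no :latest yet".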

Test Plan

uv run python scripts/ci/check_release_version.py --tag v0.1.0
uv run python scripts/dev/bump_version.py 0.1.0 --check
uv run pre-commit run --all-files
uv run pytest tests --ignore=tests/worker/test_mp_executor_cleanup_gpu.py
uv build --all-packages --out-dir /tmp/flowmesh-release-dist
uv run python scripts/ci/check_package_build.py --dist /tmp/flowmesh-release-dist
uvx twine==6.2.0 check /tmp/flowmesh-release-dist/*
for w in /tmp/flowmesh-release-dist/*.whl; do
  unzip -l "$w" | grep "licenses/LICENSE"
  unzip -p "$w" "*/METADATA" | grep -E '^License(-Expression|-File)?:'
done

Test Result

  • check_release_version.py --tag v0.1.0 → exit 0, "Release package versions are synchronized at 0.1.0".
  • bump_version.py 0.1.0 --check → exit 0, "Package versions are already set to 0.1.0".
  • pre-commit run --all-files → all hooks pass.
  • pytest tests --ignore=tests/worker/test_mp_executor_cleanup_gpu.py → 759 passed.
  • uv build --all-packages → produced all six distributions (flowmesh, flowmesh-sdk, flowmesh-sdk-stack, flowmesh-cli, flowmesh-cli-stack, flowmesh-hook) as .whl + .tar.gz.
  • check_package_build.py --dist <dist> → smoke tests pass for flowmesh[sdk], flowmesh[hook], flowmesh[cli]; flowmesh --help runs from the built CLI wheel.
  • twine check dist/* → all distributions pass metadata validation.
  • License audit: every wheel ships *.dist-info/licenses/LICENSE (11351 bytes, root Apache-2.0 text) and declares License-Expression: Apache-2.0 + License-File: LICENSE in METADATA.

Pre-submission Checklist
  • I have read the contribution guidelines.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added or updated tests covering my changes (if applicable).
  • I have verified that uv run pytest tests/ passes locally.
  • If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker.
  • If I changed the SDK or CLI, I have verified the affected packages work (uv sync --all-packages --group ci --frozen).
  • If this is a breaking change, I have prefixed the PR title with [BREAKING] and described migration steps above.
  • I have updated documentation or config examples if user-facing behavior changed.

@kaiitunnz kaiitunnz force-pushed the kaiitunnz/feat/packages branch from b80547a to f5164e5 Compare May 10, 2026 11:16
@kaiitunnz kaiitunnz force-pushed the kaiitunnz/chore/release branch from ce318bd to 09a5769 Compare May 10, 2026 11:16
@kaiitunnz kaiitunnz force-pushed the kaiitunnz/chore/release branch from 09a5769 to 1ecc94b Compare May 10, 2026 13:05
Base automatically changed from kaiitunnz/feat/packages to main May 11, 2026 07:24
@kaiitunnz kaiitunnz force-pushed the kaiitunnz/chore/release branch 5 times, most recently from 62d4e4d to 516af19 Compare May 12, 2026 06:55
kaiitunnz added 16 commits May 12, 2026 14:59
Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
…ishing

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
`flowmesh stack build` and `flowmesh stack push` had no way to override
`FLOWMESH_VERSION` / `FLOWMESH_BUILD_REF` per invocation. Setting them in
the shell did not take effect because `_run_bake` calls `load_env` before
reading `os.getenv`, and `load_env` overwrites `os.environ` from the env
file — so a stale `.env` value clobbered the shell value. The flags now
act as explicit overrides applied after the env-file load, matching the
`--image-tag` pattern already on `pull`/`up`/`down`.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
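
The ordering problem in this commit message can be reproduced in a few lines. The sketch is illustrative only: `load_env` here is a hypothetical stand-in that takes already-parsed env-file values, and `resolve_version` is not a function from the FlowMesh CLI.

```python
import os
from typing import Mapping, Optional


def load_env(file_values: Mapping[str, str]) -> None:
    """Hypothetical stand-in: copy env-file values over os.environ."""
    os.environ.update(file_values)


def resolve_version(
    file_values: Mapping[str, str], image_tag: Optional[str] = None
) -> str:
    """Return the effective FLOWMESH_VERSION for one invocation."""
    # The env file is loaded first, so it clobbers any shell-set value.
    load_env(file_values)
    # The flag is applied after the load, so it overrides the file.
    if image_tag is not None:
        os.environ["FLOWMESH_VERSION"] = image_tag
    return os.environ["FLOWMESH_VERSION"]
```

With a shell value of `shell-value` and a file value of `0.0.9`, the call without a flag yields the file value, while passing `image_tag="0.1.0"` yields the flag value.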
The release flow ships container images alongside PyPI distributions, but
the actual `flowmesh stack push` happens on a GPU build host outside CI
(no self-hosted runners on the public repo). `release-images.yml` is the
post-publish auditor: anonymously reads each of the six expected refs
from GHCR, asserts the multi-arch index, OCI version label, and revision
label match the GitHub Release tag and the tagged commit, then retags
the set as `:latest` from the `ghcr` environment with a downgrade guard
on existing `:latest`. The digest table is appended to the Release body
idempotently via sentinel-marker replacement. Verifier accepts both OCI
indexes and legacy Docker manifest lists so it doesn't reject valid
buildx output. Retag aborts on unexpected inspect failures rather than
silently treating them as a missing `:latest`.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
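
Idempotent sentinel-marker replacement, as described in this commit message, might look like the sketch below. The marker strings and the function name `upsert_digest_table` are assumptions made for illustration; the actual markers used by append_release_digests.py are not shown in this PR text.

```python
import re
from typing import Optional

# Hypothetical sentinel markers; HTML comments stay invisible when rendered.
BEGIN = "<!-- digests:begin -->"
END = "<!-- digests:end -->"

_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def upsert_digest_table(body: Optional[str], table: str) -> str:
    """Return `body` with exactly one sentinel-wrapped digest table."""
    # Normalize a null, empty, or whitespace-only body to an empty string.
    current = (body or "").strip()
    # Strip any previous block so a re-run replaces instead of appending.
    current = _BLOCK.sub("", current).strip()
    block = f"{BEGIN}\n{table.strip()}\n{END}"
    return f"{current}\n\n{block}" if current else block
```

Running the function on its own output with the same table returns an identical string, which is what makes a workflow re-run safe.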
@kaiitunnz kaiitunnz force-pushed the kaiitunnz/chore/release branch from 5943b8e to 95d1e2b Compare May 12, 2026 15:01
kaiitunnz added 3 commits May 12, 2026 15:16
Zizmor's `template-injection` pedantic check flagged the `docker login`
step for expanding `${{ secrets.GITHUB_TOKEN }}` and `${{ github.actor }}`
directly into the shell script. Bind both to step-level env vars and
reference them as `$GHCR_TOKEN` / `$GHCR_USER` so the runner doesn't
splice attacker-controllable text into the script body.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Both check_release_version.py and bump_version.py hand-rolled their own
version regex, which disagreed on whether the leading 'v' was required
and accepted strings PEP 440 itself would reject. Route both through
packaging.version.Version: tags and pyproject versions are parsed and
compared as Version objects, with directed "not PEP 440" errors on
malformed inputs.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Catch InvalidVersion on the input --tag so a non-PEP-440 tag exits with
a directed "::error::tag <x> is not PEP 440" rather than an uncaught
stacktrace. Expand the warning printed when --force suppresses a
TransientInspectError to spell out that the downgrade guard is OFF and
that :latest may silently move backward — the previous wording read as
"just a transient blip, proceeding" and hid the real consequence from
the operator.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
kaiitunnz added 3 commits May 12, 2026 15:26
A typo like NODE_ROLE=worke on a worker node previously fell through
to the "default to root" branch, deploying Redis containers and routing
as a root node. Treat unset as root (unchanged), but raise typer.Exit
with a directed error on any other unrecognized value.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
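
A strict role check along these lines could be sketched as below. The set of valid roles is an assumption drawn from the two roles named in the message, `resolve_node_role` is a hypothetical name, and SystemExit stands in for typer.Exit so the example needs no third-party packages.

```python
from typing import Mapping

# Assumed role names; the real CLI may recognize others.
VALID_ROLES = ("root", "worker")


def resolve_node_role(env: Mapping[str, str]) -> str:
    """Return the node role, failing loudly on unrecognized values."""
    raw = env.get("NODE_ROLE")
    if raw is None:
        return "root"  # unset keeps the previous default
    if raw in VALID_ROLES:
        return raw
    # A typo such as "worke" must not silently deploy a root node.
    raise SystemExit(
        f"NODE_ROLE={raw!r} is not recognized; expected one of {VALID_ROLES}"
    )
```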
Pin the three return states of _existing_latest_version (missing
reference, transient inspect failure, parsed Version) plus the
label-parsing branches that distinguish "no version label" from "label
is not PEP 440". Add a parametrized matrix over _is_release covering
plain releases, post-releases (eligible) and pre/dev/local/non-PEP-440
tags (rejected).

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
Note in docs/CLI.md that .env values always win over shell-set
FLOWMESH_VERSION / FLOWMESH_BUILD_REF, so --image-tag / --build-ref are
the only way to override without editing the file. Document in the
release.yml package_pattern input help that patterns are split on
commas and therefore cannot contain a literal comma.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
@kaiitunnz kaiitunnz marked this pull request as ready for review May 12, 2026 15:27
@kaiitunnz kaiitunnz requested a review from timzsu May 12, 2026 15:27
Collaborator

@timzsu timzsu left a comment


Two inline comments, plus this test coverage issue:

The release script tests cover the helper predicates well, but the main release-safety paths are still lightly tested. It would be useful to add mocked tests for check_image_release._check_target() and retag_image_release._plan_retags(), especially around missing labels, platform mismatch, and newer/equal/older :latest behavior. That would also make the builder-label issue easier to catch in unit tests.

Comment thread .github/workflows/release.yml Outdated
Comment thread scripts/ci/check_image_release.py
kaiitunnz added 3 commits May 12, 2026 16:20
The cuda.builder Dockerfile was the only published target missing the
ARG/LABEL block, so release-images.yml's _check_target would have
failed verification for flowmesh_worker_builder:<tag>-gpu on its per-
platform image.version / image.revision asserts. Match the pattern
already used by Dockerfile.cuda, Dockerfile.cpu, Dockerfile.ssh.*,
and the server Dockerfile: declare BUILD_VERSION / BUILD_REF /
BUILD_CREATED ARGs (the bake target already passes them) and emit the
three opencontainers labels.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
filter_distributions.py replaces the inline bash glob filter in
release.yml. fnmatch.fnmatchcase keeps the same matching semantics
without the IFS / case quirks, and the helper is now unit-testable for
empty pattern sets, zero-match outcomes, and whitespace-only patterns
— all paths that would have silently misbehaved in shell.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
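
A comma-separated glob filter built on fnmatch.fnmatchcase can be sketched as follows. This is not the contents of filter_distributions.py: the function signature is hypothetical, and treating an empty pattern set or a zero-match result as an error is an assumption about the intended behavior.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_distributions(names: Iterable[str], package_pattern: str) -> List[str]:
    """Keep the file names that match any comma-separated glob."""
    # Patterns are split on commas, so a pattern cannot contain a comma.
    patterns = [p.strip() for p in package_pattern.split(",") if p.strip()]
    if not patterns:
        raise ValueError("package_pattern contains no usable patterns")
    # fnmatchcase is always case-sensitive, regardless of platform.
    matched = [n for n in names if any(fnmatchcase(n, p) for p in patterns)]
    if not matched:
        raise ValueError(f"no distributions match {patterns}")
    return matched
```

Raising on zero matches means a mistyped pattern fails the job instead of publishing nothing and reporting success.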
Pin the release-safety paths flagged in review. _check_target tests
exercise OCI-vs-Docker manifest-list acceptance, non-index mediatype
rejection, platform-set mismatch, attestation-manifest filtering, and
per-platform image.version / image.revision drift. _plan_retags tests
walk older / equal / newer :latest, --force override, MissingVersionLabel
and TransientInspectError both with and without --force, and the
invalid-tag exit.

Signed-off-by: Noppanat Wadlom <noppanat.wad@gmail.com>
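
The index checks listed in this commit (media type acceptance, attestation filtering, platform-set comparison) can be illustrated on a parsed manifest document. The sketch assumes the index is already loaded as a dict and that attestation entries carry the `vnd.docker.reference.type: attestation-manifest` annotation that buildx emits; the function names are hypothetical and do not mirror _check_target.

```python
from typing import Any, Dict, Set

# Both index flavors are accepted so valid buildx output is not rejected.
INDEX_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


def index_platforms(index: Dict[str, Any]) -> Set[str]:
    """Return the os/arch pairs of the runnable images in an index."""
    media_type = index.get("mediaType")
    if media_type not in INDEX_MEDIA_TYPES:
        raise ValueError(f"not a multi-arch index: {media_type!r}")
    platforms: Set[str] = set()
    for manifest in index.get("manifests", []):
        annotations = manifest.get("annotations") or {}
        # Attestation manifests are metadata, not runnable images; skip them.
        if annotations.get("vnd.docker.reference.type") == "attestation-manifest":
            continue
        platform = manifest.get("platform") or {}
        platforms.add(f"{platform.get('os')}/{platform.get('architecture')}")
    return platforms


def check_platforms(index: Dict[str, Any], expected: Set[str]) -> None:
    """Raise ValueError when the index platforms differ from `expected`."""
    found = index_platforms(index)
    if found != expected:
        raise ValueError(f"platform mismatch: {sorted(found)} != {sorted(expected)}")
```

Comparing sets for equality catches both a missing architecture and an unexpected extra one.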
Collaborator

@timzsu timzsu left a comment


LGTM.

@kaiitunnz kaiitunnz merged commit eaeca41 into main May 12, 2026
11 checks passed
@kaiitunnz kaiitunnz deleted the kaiitunnz/chore/release branch May 12, 2026 16:34
2 participants