chore: 2.9.6-rc.0 release with soft_purge fix for hot-path modules #1001

Open
mentels wants to merge 3 commits into main from fix/relup-soft-purge-hot-modules

Contributor

@mentels mentels commented May 22, 2026

Bumps the version to 2.9.6-rc.0 and switches ClientHandler, DbHandler, ClientHandler.Error, and ClientHandler.Checks to soft_purge (PostPurge) in the relup appups.

Under load, the deferred code:purge/1 inside release_handler:make_permanent/1 kills ClientHandlers that are blocked in :gen_statem.call(db_handler, …) while DbHandler is suspended. Those processes have Supavisor.ClientHandler.* frames on their stack, so they count as "lingerers". With soft_purge, make_permanent skips the kill, and both code versions stay resident until the next relup drains the old one. Suspend/resume is kept; all other modules stay on brutal_purge.
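For reviewers less familiar with appup instructions, here is a rough sketch of where PostPurge sits in an `update` instruction. It is an illustration only, not a copy of the appups in this branch: real appup files are Erlang terms, and the tuples below are the equivalent Elixir terms. The `Supavisor.DbHandler` module name, the `{:advanced, []}` change value, the `:brutal_purge` PrePurge, and the empty DepMods list are all assumptions. Non-process modules may well use `load_module` instead of `update`.

```elixir
# Sketch under stated assumptions; not the actual appup contents from this PR.
# OTP instruction shape: {:update, Mod, Change, PrePurge, PostPurge, DepMods}
hot_path_modules = [
  Supavisor.ClientHandler,
  Supavisor.DbHandler, # full module name assumed
  Supavisor.ClientHandler.Error,
  Supavisor.ClientHandler.Checks
]

instructions =
  for mod <- hot_path_modules do
    # PostPurge (fifth element) is :soft_purge, so make_permanent does not
    # kill processes still running the old code of this module.
    {:update, mod, {:advanced, []}, :brutal_purge, :soft_purge, []}
  end

IO.inspect(instructions, label: "appup instructions")
```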

The 2.9.5 relups are also patched in-place so a to-2.9.5 upgrade with the fix remains possible.

Also fixes mix.exs:151 so the appup-copy step survives hyphenated to-versions: the filename split now uses parts: 2. Without it, supavisor-2.9.6-rc.0.appup would split into ["supavisor", "2.9.6", "rc.0"] and crash the release build with a MatchError.
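A minimal sketch of the split behaviour, assuming the build step takes the basename and pattern-matches on two elements. `AppupName.parse/1` is a hypothetical helper for illustration; the actual code at mix.exs:151 is not reproduced here.

```elixir
# Hypothetical helper showing why parts: 2 matters for hyphenated versions.
defmodule AppupName do
  # Returns {app, version} for a filename like "supavisor-2.9.6-rc.0.appup".
  def parse(filename) do
    [app, version] =
      filename
      |> Path.basename(".appup")
      # Only the first "-" separates app from version; the rest stays in the version.
      |> String.split("-", parts: 2)

    {app, version}
  end
end

IO.inspect(AppupName.parse("supavisor-2.9.6-rc.0.appup"))
# {"supavisor", "2.9.6-rc.0"}
```

Without `parts: 2`, the same split yields three elements and the two-element match raises a MatchError, which is the crash described above.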

Plan: tag this branch as v2.9.6-rc.0, validate in staging, then a follow-up commit bumps to v2.9.6.

Under pgbench load, the 2.9.0 -> 2.9.5 soft deploy killed ~170 client
connections at the moment `release_handler:make_permanent/1` ran. The
kill mechanism is `code:purge/1` against ClientHandlers that were
blocked inside `:gen_statem.call(db_handler, ...)` while DbHandler was
sys:suspended -- they have `Supavisor.ClientHandler.*` frames on their
stack and so count as "lingerers" subject to the deferred brutal_purge.

Switching PostPurge to soft_purge for ClientHandler, DbHandler,
ClientHandler.Error, and ClientHandler.Checks makes make_permanent skip
the kill: both code versions stay resident until the next relup naturally
drains them. Suspend/resume of DbHandler is kept -- it's not the source
of the kill.

Applied to relups 2.9.0-2.9.5, 2.9.0-rc.4-2.9.5, 2.9.1-2.9.5, and
2.9.2-2.9.5. The 2.9.4-2.9.5 relup is runtime-config-only and needs no
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mentels mentels requested a review from a team as a code owner May 22, 2026 10:58
Bumps VERSION to 2.9.6-rc.0 and adds relups/<from>-2.9.6-rc.0/ for each
from-version targeting 2.9.5 (2.9.0, 2.9.0-rc.4, 2.9.1, 2.9.2, 2.9.4).
The new appups are copies of their 2.9.5 counterparts with the outer
release tuple and apply_runtime_config/reconsolidate_inspect to-version
strings updated to "2.9.6-rc.0".

The 2.9.5 relups are left in place so a to-2.9.5 upgrade with the
soft_purge fix remains possible if needed.

Plan: tag this branch as v2.9.6-rc.0, test in staging, then a follow-up
commit bumps to v2.9.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mentels mentels changed the title from "fix: soft_purge hot-path modules in 2.9.5 relups" to "chore: 2.9.6-rc.0 release with soft_purge fix for hot-path modules" May 22, 2026
mix.exs:151 split the appup filename on every "-" and pattern-matched
exactly two elements, which blows up on filenames like
supavisor-2.9.6-rc.0.appup → ["supavisor", "2.9.6", "rc.0"].

Use parts: 2 so the first "-" is the app/version separator and
everything after stays as the version string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member

@v0idpwn v0idpwn left a comment

Won't this cause make_permanent to fail if we have processes running old code?

Edit: sorry, realized this doesn't make sense
