Skip to content

Cap minion AsyncAuth retry loop with auth_tries (#69442)#69443

Merged
dwoz merged 2 commits into
saltstack:3006.xfrom
dwoz:fix/issue-69442
Jul 1, 2026
Merged

Cap minion AsyncAuth retry loop with auth_tries (#69442)#69443
dwoz merged 2 commits into
saltstack:3006.xfrom
dwoz:fix/issue-69442

Conversation

@dwoz

@dwoz dwoz commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Backports the auth_tries outer-loop cap on the minion's
AsyncAuth._authenticate() from 3008.x. When sign_in() keeps returning
the "retry" sentinel, the minion will now bail out of the
authentication loop after auth_tries attempts (default 7) with
SaltClientError("Failed to authenticate with the master after N attempts"), instead of looping silently forever with exponential backoff
up to acceptance_wait_time_max.

auth_tries=0 preserves the legacy "loop forever" behaviour for
operators who explicitly want it. The SAuth.authenticate() synchronous
path is intentionally left unchanged — it is the salt-call / single-shot
CLI codepath and is out of scope for this fix.

What issues does this PR fix or reference?

Fixes #69442

Previous Behavior

On 3006.x and 3007.x, a minion whose sign_in() consistently returns
"retry" (master key not yet accepted, master AES rotation in flight,
multi-master probe against an unreachable peer, etc.) sleeps
acceptance_wait_time between attempts, doubles up to
acceptance_wait_time_max, and never logs an error. The minion appears
stuck with no operator-visible signal.

New Behavior

After auth_tries consecutive "retry" responses, the loop terminates
with a SaltClientError:

salt.exceptions.SaltClientError: Failed to authenticate with the master after 7 attempts

which is then wrapped by salt.channel.client.AsyncPubChannel.connect()
into the user-visible "Unable to sign_in to master: ..." log line. This
matches the behaviour 3008.x has had since the auth_tries cap was
introduced.

Merge requirements satisfied?

  • Docs (no documented behaviour changes; auth_tries is already
    documented and its default of 7 carries over)
  • Changelog (changelog/69442.fixed.md)
  • Tests written/updated
    (tests/pytests/unit/test_crypt.py::test_authenticate_caps_retry_loop_with_auth_tries_69442)

Commits signed with GPG?

No (matching the rest of 3006.x history; let me know if you want this
re-signed.)

@dwoz dwoz requested a review from a team as a code owner June 13, 2026 00:14
@dwoz dwoz added this to the Sulphur v3006.26 milestone Jun 13, 2026
@dwoz dwoz added the test:full Run the full test suite label Jun 13, 2026
@dwoz dwoz force-pushed the fix/issue-69442 branch from 6a74a26 to 053210a Compare June 15, 2026 06:04
@dwoz dwoz force-pushed the fix/issue-69442 branch from 7d0c696 to 86f697d Compare June 25, 2026 22:46
dwoz added a commit to dwoz/salt that referenced this pull request Jun 30, 2026
Per maintainer direction on PR saltstack#69443: on the 3006.x LTS branch, the new
outer-loop cap added in 86f697d must not silently change failure modes
for existing users.  A long-disconnected minion that used to eventually
reconnect should keep doing so on upgrade unless the operator
explicitly opts in to the new safety cap.

Rather than reuse the pre-existing ``auth_tries`` option (which is
already consumed by ``sign_in()`` as the per-request channel-send retry
count, and which would change behavior for every existing 3006.x
installation if we used it for the outer loop too), introduce a new
dedicated ``auth_retries`` minion option:

* Added to ``VALID_OPTS`` and ``DEFAULT_MINION_OPTS`` with default ``0``.
* ``0`` is interpreted by ``AsyncAuth._authenticate()`` as "unlimited /
  preserve the pre-3006.26 behavior of retrying forever".
* A positive integer caps the outer loop at that many attempts and
  surfaces a ``SaltClientError("Failed to authenticate with the master
  after N attempts")``.

Tests
-----
* Renamed ``test_authenticate_caps_retry_loop_with_auth_tries_69442``
  to ``..._with_auth_retries_69442`` and switched the opts key.
* Added ``test_authenticate_default_does_not_cap_retry_loop_69442``
  asserting that with no ``auth_retries`` set the loop runs well past
  any plausible small finite cap and the resulting error is the generic
  "Attempt to authenticate ... failed" rather than the cap-specific
  "...after N attempts".

Refs saltstack#69442
dwoz and others added 2 commits June 30, 2026 15:03
The minion's AsyncAuth._authenticate() outer loop on 3006.x and 3007.x
keeps calling sign_in() forever whenever the master answers with the
"retry" sentinel (key not yet accepted, master AES rotation in flight,
multi-master probe). The minion sleeps acceptance_wait_time between
attempts, doubling up to acceptance_wait_time_max, and never surfaces
an error: no log, no traceback, just a stuck minion.

3008.x already caps this loop using the existing auth_tries option
(default 7); backport the same guard so the minion bails out of
_authenticate() with SaltClientError("Failed to authenticate with the
master after N attempts") once auth_tries iterations have been spent
returning "retry". auth_tries=0 keeps the old "loop forever" behavior
for operators who actually want it.

The synchronous SAuth.authenticate() path is intentionally left
unchanged: that is a separate code path used by salt-call and other
single-shot CLI flows, and its existing semantics are out of scope for
this fix.

Fixes saltstack#69442
Per maintainer direction on PR saltstack#69443: on the 3006.x LTS branch, the new
outer-loop cap added in 86f697d must not silently change failure modes
for existing users.  A long-disconnected minion that used to eventually
reconnect should keep doing so on upgrade unless the operator
explicitly opts in to the new safety cap.

Rather than reuse the pre-existing ``auth_tries`` option (which is
already consumed by ``sign_in()`` as the per-request channel-send retry
count, and which would change behavior for every existing 3006.x
installation if we used it for the outer loop too), introduce a new
dedicated ``auth_retries`` minion option:

* Added to ``VALID_OPTS`` and ``DEFAULT_MINION_OPTS`` with default ``0``.
* ``0`` is interpreted by ``AsyncAuth._authenticate()`` as "unlimited /
  preserve the pre-3006.26 behavior of retrying forever".
* A positive integer caps the outer loop at that many attempts and
  surfaces a ``SaltClientError("Failed to authenticate with the master
  after N attempts")``.

Tests
-----
* Renamed ``test_authenticate_caps_retry_loop_with_auth_tries_69442``
  to ``..._with_auth_retries_69442`` and switched the opts key.
* Added ``test_authenticate_default_does_not_cap_retry_loop_69442``
  asserting that with no ``auth_retries`` set the loop runs well past
  any plausible small finite cap and the resulting error is the generic
  "Attempt to authenticate ... failed" rather than the cap-specific
  "...after N attempts".

Refs saltstack#69442
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:full Run the full test suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant