Per-job AllowPromotion opts out of ZADD GT guard #4

Merged
nyergler merged 7 commits into master from allow-promotion-flag on May 14, 2026

@nyergler
Member

Enqueue and PromoteJob have both used ZADD XX GT since #3 to preserve the deferral guarantee for jobs enqueued with deterministic IDs: once a schedule sits at time T in the future, a duplicate enqueue at T' < T must be a no-op, and PromoteJob must not demote a job whose score has been bumped to now + InvisibleSec by Dequeue. That is the right semantic for dedup-style jobs that may race their own re-enqueue.

It is the wrong semantic for two cases that have grown up around the queue since:

  1. Worker retry rescheduling. When a handler returns an error, the retry middleware computes a backoff delay and calls Enqueue with score = now + delay. With GT, that score is rejected because the Dequeue invisibility mark (now + InvisibleSec, typically 60s) is greater. The configured Backoff is effectively dead for any value less than InvisibleSec; gated handlers re-run only on the InvisibleSec cadence regardless of how short the backoff is.

  2. Subqueue PromoteOnAck. The subqueue middleware advances the next gated job after the prior handler Acks by calling PromoteJob on that job's ID. With GT, the promotion is also rejected, because the job's current score (still sitting at the InvisibleSec mark from its last dequeue/gated cycle) is greater than the promoted score, and the next job continues to wait out its full invisibility window.
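Both cases come down to how ZADD XX GT treats a lower score. The following is a minimal in-memory sketch in Go that models only the documented Redis flag semantics (XX updates existing members only; GT updates only when the new score is greater). `zaddXX`, the job ID, and the timing constants are hypothetical names for illustration, not part of the queue's API.

```go
package main

import "fmt"

// zaddXX models Redis ZADD with the XX flag on an in-memory sorted set:
// it only updates members that already exist. When gt is true it also
// applies the GT rule and ignores scores that are not greater than the
// current one. It reports whether the stored score changed.
func zaddXX(set map[string]float64, member string, score float64, gt bool) bool {
	cur, ok := set[member]
	if !ok {
		return false // XX: never add new members
	}
	if gt && score <= cur {
		return false // GT: refuse to lower (or merely keep) the score
	}
	set[member] = score
	return cur != score
}

func main() {
	// Hypothetical timings: a 60s invisibility window and a 5s backoff.
	const now, invisibleSec, backoff = 1000.0, 60.0, 5.0

	// Dequeue has bumped the job to now + InvisibleSec.
	set := map[string]float64{"job-1": now + invisibleSec}

	// Retry with a 5s backoff under GT: rejected, the job keeps waiting 60s.
	fmt.Println(zaddXX(set, "job-1", now+backoff, true), set["job-1"]) // false 1060

	// Same call without GT: the backoff score is accepted.
	fmt.Println(zaddXX(set, "job-1", now+backoff, false), set["job-1"]) // true 1005
}
```

The second call is the behavior that AllowPromotion = true is meant to unlock; the first is what every job gets by default.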

Add a per-job AllowPromotion flag. Default false preserves today's GT semantics so dedup-deferral jobs are unaffected; setting true causes Enqueue to use plain ZADD XX so backoff can lower the score, and causes PromoteJob to use ZADD XX (without GT) so the next gated subqueue entry can be advanced. The flag rides in the job's msgpack storage so it survives the worker retry round-trip without callers having to track it across Enqueue/PromoteJob boundaries.

The Enqueue Lua script splits the per-job arg list into two ZADD calls (one with gt, one without) so a mixed BulkEnqueue stays atomic. PromoteJob does an HGET to read the flag before issuing the ZADD; this adds one round-trip per promotion but keeps the API stable.
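As a rough illustration of the split described above, here is how a mixed batch might be partitioned by flag before the two ZADD calls. This is a sketch under assumptions: the `Job` struct is a hypothetical, trimmed-down record (only the AllowPromotion name comes from this PR), `splitByPromotion` is an invented helper, and in the actual change the split happens inside the Lua script rather than in Go.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down job record. Only the AllowPromotion
// field name is taken from the PR description.
type Job struct {
	ID             string
	Score          float64
	AllowPromotion bool
}

// splitByPromotion partitions a batch into jobs that keep the GT guard and
// jobs that opt out of it, preserving input order within each group.
func splitByPromotion(jobs []Job) (guarded, promotable []Job) {
	for _, j := range jobs {
		if j.AllowPromotion {
			promotable = append(promotable, j)
		} else {
			guarded = append(guarded, j)
		}
	}
	return guarded, promotable
}

func main() {
	batch := []Job{
		{ID: "dedup-1", Score: 1060},
		{ID: "retry-1", Score: 1005, AllowPromotion: true},
		{ID: "dedup-2", Score: 1100},
	}
	guarded, promotable := splitByPromotion(batch)

	// One ZADD (with GT) would cover the first group and one ZADD
	// (without GT) the second, both inside the same script invocation.
	fmt.Println(len(guarded), len(promotable)) // 2 1
}
```

Issuing both writes from one script is what keeps a mixed batch atomic; splitting the batch into two separate client calls would not.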

nyergler added 7 commits May 13, 2026 16:58
AllowPromotion is consulted only server-side (by Enqueue and PromoteJob),
so rehydrating it onto jobs returned by Dequeue/BulkFind added a brittle
fields[0].(string) assertion and a wider Lua-vs-Go contract for no
benefit. Drop the rehydration; the Lua scripts return jobm strings as
before, and the Go side reverts to its prior simple form.

PromoteJob now runs as a single Lua script (HGET allow_promotion + ZADD
XX [GT]) instead of two round trips. This closes a narrow race where an
Ack + re-enqueue between the HGET and ZADD could let a stale
"AllowPromotion=true" read demote a freshly-enqueued job, and matches
the atomic style of every other queue op.
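The race can be pictured with a toy in-memory model. This is not the actual script: the `store` type, its fields, and `reenqueue` are hypothetical, and the mutex merely stands in for the atomicity Redis gives a Lua script. The point it sketches is that the flag read and the score write share one critical section, so a re-enqueue cannot land between them.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy stand-in for Redis: flags plays the job hash and scores
// the sorted set. The mutex models the atomicity of a single Lua script.
type store struct {
	mu     sync.Mutex
	flags  map[string]bool
	scores map[string]float64
}

// promote reads the flag and applies the score update in one critical
// section, so the flag it acts on is never stale.
func (s *store) promote(id string, score float64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	cur, ok := s.scores[id]
	if !ok {
		return false // XX: only existing members are updated
	}
	if !s.flags[id] && score <= cur {
		return false // GT guard still applies when the flag is unset
	}
	s.scores[id] = score
	return true
}

// reenqueue replaces a job's flag and score together, as an Ack followed
// by a fresh Enqueue of the same ID might.
func (s *store) reenqueue(id string, allow bool, score float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.flags[id] = allow
	s.scores[id] = score
}

func main() {
	s := &store{
		flags:  map[string]bool{"job-1": true},
		scores: map[string]float64{"job-1": 1060},
	}

	// A fresh enqueue turns the GT guard back on before the promotion runs.
	s.reenqueue("job-1", false, 2000)

	// The promotion sees the current flag, so it cannot demote the new job.
	fmt.Println(s.promote("job-1", 1000), s.scores["job-1"]) // false 2000
}
```

With two round trips, the flag could have been read as true before the re-enqueue and then applied after it, lowering the fresh job's score to 1000.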

Reset previously called FlushAll, which wiped the entire Redis DB.
Because go test ./... runs packages in parallel against a shared
Redis instance, one package's Reset could erase another package's
in-flight data mid-test, surfacing as flakes (most visibly the
100k-job bulk-enqueue tests).

Reset now takes namespaces and scans+deletes only matching keys,
and each test package uses a distinct namespace so parallel
packages no longer collide.
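A minimal sketch of the namespaced reset idea, using a map in place of Redis so it runs standalone. The `resetNamespaces` name and the `<namespace>:` key prefix are assumptions about the key layout; against a real server the equivalent would be a SCAN with a MATCH pattern followed by DEL on the results.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resetNamespaces deletes only the keys under the given namespaces and
// returns how many were removed. The "<namespace>:" prefix is an assumed
// key layout, not one confirmed by this PR.
func resetNamespaces(keys map[string]string, namespaces ...string) int {
	deleted := 0
	for k := range keys {
		for _, ns := range namespaces {
			if strings.HasPrefix(k, ns+":") {
				delete(keys, k) // deleting during range is safe in Go
				deleted++
				break
			}
		}
	}
	return deleted
}

func main() {
	// Two test packages sharing one store, each under its own namespace.
	keys := map[string]string{
		"pkg_a:queue": "...",
		"pkg_a:jobs":  "...",
		"pkg_b:queue": "...",
	}

	n := resetNamespaces(keys, "pkg_a")

	remaining := make([]string, 0, len(keys))
	for k := range keys {
		remaining = append(remaining, k)
	}
	sort.Strings(remaining)
	fmt.Println(n, remaining) // 2 [pkg_b:queue]
}
```

Matching on the namespace plus its separator, rather than the bare namespace, keeps a reset of `pkg_a` from also touching a namespace such as `pkg_ab`.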

@nyergler merged commit 33d8963 into master on May 14, 2026