taquba-v0.8.0

micllam released this 15 Jun 12:39
· 7 commits to master since this release
73ca489

Added

  • Queue::claim_batch claims up to max_jobs pending jobs in one
    transaction, sharing one claim-lock hold and one commit across the
    batch. Jobs are returned in claim order and share one lease.
    Queue::claim is now a batch of one.
  • Queue::wait_for_jobs_on blocks until a job becomes claimable on
    one queue. Unlike Queue::wait_for_jobs, the wakeup is queue-scoped
    and delivered to one waiter per inserted job.
  • Queue::ack_with acknowledges a job and applies a set of effects in
    the same transaction: follow-up enqueues (AckEffects::enqueues,
    honouring run_at, dedup_key, priority, and id_override per
    request) and caller KV writes and deletes. Either the ack and every
    effect land together or nothing does; when the claim is gone the
    call fails with ClaimLost and applies nothing, so a chained job
    exists only if the settlement that created it won. Queue::ack is
    now ack_with with empty effects.
  • Error::ClaimLost: returned by ack, ack_with, nack,
    dead_letter, and renew_lease when the record's claim is no
    longer present (the lease expired and the reaper requeued the job,
    or the record is a stale copy from before a lease renewal rotated
    the claimed key). These cases previously returned the catch-all
    Error::InvalidState, which remains for genuine misuse (a record
    missing lease_expires_at, requeue_dead_job on a non-dead
    record).
  • Worker::process_with_effects: workers can return AckEffects
    from processing, which run_worker and run_worker_concurrent
    apply atomically with the job's acknowledgement via
    Queue::ack_with. process and process_with_effects default to
    each other; implement exactly one. Existing Worker
    implementations are unaffected.
  • Queue::close persists each queue's claim-scan state (scan bound
    and emptiness marker) under a new cursor: key prefix; the next
    open restores the in-memory state from it and deletes the record. The
    first claim after a clean restart resumes at the recorded bound
    instead of re-scanning the tombstone band left by previously claimed
    jobs, whose cost grows with the band and the store's latency. After
    a crash the record is absent and the first claim falls back to the
    front prefix scan as before.
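The all-or-nothing contract described for Queue::ack_with can be pictured with a toy in-memory model. This is only a sketch: Store, Effects, and SettleError are hypothetical stand-ins written for illustration, not taquba's real types or signatures, and a plain struct replaces the transactional backend. The point it shows is that the claim check precedes every mutation, so a lost claim applies nothing.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the error described above; not taquba's type.
#[derive(Debug, PartialEq)]
enum SettleError {
    ClaimLost,
}

/// Hypothetical effect set: follow-up job ids plus KV writes and deletes.
#[derive(Default)]
struct Effects {
    enqueues: Vec<String>,
    kv_puts: Vec<(String, String)>,
    kv_deletes: Vec<String>,
}

/// Toy in-memory store standing in for the transactional backend.
#[derive(Default)]
struct Store {
    claimed: BTreeSet<String>,
    pending: BTreeSet<String>,
    done: BTreeSet<String>,
    kv: BTreeMap<String, String>,
}

impl Store {
    /// Ack `job_id` and apply `effects` as one unit.
    /// The claim check runs before any mutation, so a lost claim
    /// returns ClaimLost and leaves the store untouched.
    fn ack_with(&mut self, job_id: &str, effects: Effects) -> Result<(), SettleError> {
        if !self.claimed.remove(job_id) {
            return Err(SettleError::ClaimLost);
        }
        self.done.insert(job_id.to_string());
        self.pending.extend(effects.enqueues);
        for (key, value) in effects.kv_puts {
            self.kv.insert(key, value);
        }
        for key in effects.kv_deletes {
            self.kv.remove(&key);
        }
        Ok(())
    }

    /// A plain ack is an ack_with carrying empty effects.
    fn ack(&mut self, job_id: &str) -> Result<(), SettleError> {
        self.ack_with(job_id, Effects::default())
    }
}
```

In this model, acking the same job twice fails the second time with ClaimLost, and the follow-up enqueue attached to the losing call never appears in pending, which mirrors the "chained job exists only if the settlement that created it won" guarantee.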

Changed

  • run_worker no longer exits when settling a job fails. Settlement
    failures (including ClaimLost when a job outlives its lease and
    the reaper requeues it) are logged and the loop continues, matching
    run_worker_concurrent; the redelivered attempt settles the job.
    Claim-path errors still terminate both loops. Both loops log a lost
    claim distinctly from other settlement failures.
  • run_worker_concurrent claims jobs in batches sized to its free
    capacity via Queue::claim_batch, costing one claim transaction
    per batch instead of per job under a backlog. Jobs are still
    processed concurrently and acked individually.
  • Queue::claim_with_wait and the run_worker / run_worker_concurrent
    loops wait on a queue-scoped wakeup that wakes one waiter per
    inserted job, instead of the process-wide notification that woke
    every waiting worker on every insert. A pool of idle workers no
    longer contends on the claim path when a single job arrives, and a
    worker claiming a job passes one wakeup on so a backlog keeps waking
    further workers. Queue::claim_with_wait now also keeps waiting out
    its full max_wait after losing a claim race instead of returning
    None early.
  • Queue::claim commits without awaiting WAL durability. Claims
    serialise per queue through the claim lock, which excluded them from
    WAL group commit: the lock holder awaited its flush before the next
    claim could start, making the flush round trip the queue's claim
    throughput ceiling. Losing an unflushed claim in a crash leaves the
    job pending, so it is redelivered immediately on recovery instead of
    after its lease expires; at-least-once delivery is unaffected, and a
    settled job's claim is always durable because later durable commits
    flush preceding WAL entries.
  • The scheduler promotes due jobs without awaiting WAL durability,
    for the same reasons and with the same crash behaviour as the
    reaper change below: a lost promotion leaves the scheduled key in
    place with its run_at in the past, and the next tick re-promotes
    it. A backlog of due jobs (a retry-backoff wave, or scheduled jobs
    accumulated during downtime) no longer promotes at one job per
    flush interval.
  • The reaper requeues and dead-letters expired claims without awaiting
    WAL durability. Each expired claim is processed in its own
    transaction, and awaiting the flush serialised the sweep at one job
    per flush interval (about ten per second at the default 100 ms
    flush). A commit lost in a crash leaves the expired claim in place
    for the next sweep, which re-processes it without consuming an
    attempt, and later durable commits flush preceding WAL entries, so
    a settled job's requeue is durable by ordering.
  • The done and dead-letter retention sweeps delete expired records
    without awaiting WAL durability, for the same reasons as the reaper
    and scheduler changes above: a delete lost in a crash leaves the
    record in place for the next sweep, whose existence re-check keeps
    the rerun idempotent. With this, no background sweep awaits the
    flush; only caller-driven operations do. A retention backlog no
    longer delays the lease reaping that shares its tick.
  • Queue::claim tracks per-queue emptiness and a scan bound in
    process memory. Polling an empty queue answers without a storage
    scan or the claim lock, and the pending tombstone band is never
    re-walked from the front while the process stays up; a full prefix
    scan now happens only on cold start or after a restart with no
    persisted claim-scan state (see Queue::close under Added).
  • Queue stats counter merges are excluded from transaction conflict
    detection. The merges are commutative, so concurrent job-state
    transitions on the same queue no longer abort and retry each other
    over the shared stats keys.
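The "one job per flush interval" ceiling mentioned for the reaper and scheduler is simple arithmetic, sketched below. The function names are hypothetical, and the model assumes each transaction waits one full flush interval and ignores all other per-transaction cost, so it is an upper bound on the old serialised behaviour rather than a measurement.

```rust
use std::time::Duration;

/// Upper bound on jobs per second when every job's transaction awaits
/// one WAL flush before the next transaction can start.
/// Assumption: the wait equals one full flush interval.
fn serial_jobs_per_second(flush_interval: Duration) -> f64 {
    1.0 / flush_interval.as_secs_f64()
}

/// Time to work through a backlog at one job per flush interval.
fn serial_drain_time(backlog: u32, flush_interval: Duration) -> Duration {
    flush_interval * backlog
}
```

With the default 100 ms flush this gives about ten jobs per second, matching the figure quoted above; under the same assumptions a backlog of 600 expired claims would have taken about a minute to sweep.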

Fixed

  • A pending: insert landing behind the claim cursor while a claim
    was in flight could have its cursor invalidation overwritten by
    that claim's cursor update, hiding the job from cursor scans until
    the queue next drained. The scan bound now moves back to include
    such inserts, and a claim drops its bound advance when the bound
    moved while it ran.
  • A pending: key could be hidden from claims indefinitely when its
    insert committed while a claim was in flight and the key sorted at
    or below the keys that claim advanced the scan bound past. Job ids
    are generated before the enqueue transaction commits, so commit
    order can invert key order under concurrent producers, and a
    requeue (reaper or nack) restores a job at its original key. The
    next claim then recorded emptiness at a valid epoch and the queue
    answered None while live jobs were pending. Bound advances now
    clamp to the smallest key recorded since the bound was observed,
    including when no bound exists yet (the first claim after a
    process restart) and when the key equals the claimed one (the
    claimed job requeued after its lease expired within the claim).
  • Duplicate EnqueueOptions::id_override values are now rejected
    transactionally with Error::DuplicateJobId instead of overwriting
    jobindex:{id} and leaving older queue-state records behind.
  • Queue::ack, Queue::nack, Queue::dead_letter, and
    Queue::renew_lease now check that the expected claimed: record
    still exists before settling a job. A worker finishing after its
    lease was reaped now gets Error::ClaimLost; previously it could
    ack, retry, dead-letter, or renew the job, or corrupt stats, from a
    stale JobRecord.
  • Queue::nack and Queue::renew_lease now retry on transaction
    conflict like Queue::ack and Queue::dead_letter already did.
    A late settlement that conflicts with the reaper committing the
    expired-lease delete is now retried (and resolves to
    Error::ClaimLost on the next attempt) instead of surfacing a raw
    SlateDB transaction error to the caller.
  • Queue::requeue_dead_job now checks that the dead-letter record
    still exists before reviving it. Requeueing a stale record after
    dead-letter retention swept it now returns Error::JobNotFound
    instead of recreating the job and corrupting queue stats.
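The scan-bound clamp from the first two fixes can be illustrated with a single-threaded toy model. Everything here is an assumption made for illustration: ToyQueue, its fields, integer keys, and the two-phase begin_claim / finish_claim split are hypothetical and do not reflect taquba's internals. An insert "during" a claim is simulated by calling insert between the two phases, and emptiness tracking is not modelled.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy queue: keys are plain integers in sorted order.
#[derive(Default)]
struct ToyQueue {
    pending: BTreeSet<u64>,
    /// Claim scans start here; None means scan from the front.
    bound: Option<u64>,
    /// Smallest key inserted since the current claim observed the bound.
    min_inserted: Option<u64>,
}

impl ToyQueue {
    /// Insert (or requeue) a key and record it for the clamp.
    fn insert(&mut self, key: u64) {
        self.pending.insert(key);
        self.min_inserted = Some(self.min_inserted.map_or(key, |m| m.min(key)));
    }

    /// Phase 1: observe the bound and take the first pending key at or
    /// above it. Inserts are recorded from this point on.
    fn begin_claim(&mut self) -> Option<u64> {
        self.min_inserted = None;
        let start = self.bound.unwrap_or(0);
        let key = self.pending.range(start..).next().copied()?;
        self.pending.remove(&key);
        Some(key)
    }

    /// Phase 2: advance the bound past the claimed key, clamped to the
    /// smallest key inserted while the claim was in flight.
    fn finish_claim(&mut self, claimed: u64) {
        let advance = claimed + 1;
        let clamped = self.min_inserted.map_or(advance, |m| m.min(advance));
        self.bound = Some(clamped);
    }
}
```

In this model, claiming key 10 while key 5 is inserted leaves the bound at 5 rather than 11, so the next claim still finds key 5. The same clamp covers the case where no bound existed before the claim, and the case where the claimed key itself is requeued mid-claim, since the minimum then equals the claimed key.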