Releases · micllam/taquba

15 Jun 12:46

micllam

taquba-workflow-v0.6.0

73ca489

taquba-workflow-v0.6.0

Changed

Terminal marker filenames lead with an inverted timestamp, and the
memo-retention sweep lists only expired markers (via the
object-store list_with_offset contract) instead of every retained
marker on every tick, so a sweep's listing cost is proportional to
the expired set. MemoStore::list_expired_terminal_markers is the
new sweeper building block; list_terminal_markers remains for
inspection. Markers written by earlier versions are not recognised
by the sweeper: when upgrading a store that ran with
memo_retention enabled, clear the <memo_prefix>/terminals/
prefix out-of-band.
Step transitions settle atomically. The next step's enqueue (for
Continue / ContinueAfter) and the terminal run-record delete
(for terminal outcomes) now join the current step's acknowledgement
transaction via Taquba's ack_with, halving the durable commits
per transition and removing the crash window between enqueuing the
next step and acking the current one: a step's successor exists if
and only if the step's settlement committed. The terminal hook now
fires before the settlement commits rather than after the run-record
delete; hooks remain at-least-once as before.

Fixed

WorkflowRuntime::submit no longer serialises every submission on a
process-wide lock held across queue I/O. The duplicate-check lock is
now per run id, so concurrent submissions of distinct runs proceed in
parallel and share WAL group commits. Previously a batch of
submissions completed at one run per flush interval regardless of
submission concurrency (about ten runs per second at SlateDB's
default 100 ms flush); same-run-id submissions keep their existing
duplicate and input-mismatch semantics.

Added

WorkflowRuntimeBuilder::step_output_replay: opt-in
content-addressed replay of runner-returned step outcomes, keyed by
(run_id, step_number, SHA-256(step payload)). When enabled, the
runtime persists every StepOutcome the runner returns (including
Fail and Cancel) before applying it; if the same step is delivered
again after a crash before ack, the stored outcome is replayed without
invoking the runner again. Step errors are not recorded, so retries
still invoke the runner. A replayed ContinueAfter reduces its delay
by the time already elapsed since the outcome was stored, preserving
the original schedule.
Memo::content_get and Memo::content_put derive per-step memo keys
from a MessagePack serialization of caller-supplied input hashed with
SHA-256.

Assets 2

15 Jun 12:43

micllam

taquba-webhooks-v0.3.0

73ca489

taquba-webhooks-v0.3.0

Changed

Raised the minimum taquba requirement to 0.8.

Assets 2

15 Jun 12:39

micllam

taquba-v0.8.0

73ca489

taquba-v0.8.0 Latest

Latest

Added

Queue::claim_batch claims up to max_jobs pending jobs in one
transaction, sharing one claim-lock hold and one commit across the
batch. Jobs are returned in claim order and share one lease.
Queue::claim is now a batch of one.
Queue::wait_for_jobs_on blocks until a job becomes claimable on
one queue. Unlike Queue::wait_for_jobs, the wakeup is queue-scoped
and delivered to one waiter per inserted job.
Queue::ack_with acknowledges a job and applies a set of effects in
the same transaction: follow-up enqueues (AckEffects::enqueues,
honouring run_at, dedup_key, priority, and id_override per
request) and caller KV writes and deletes. Either the ack and every
effect land together or nothing does; when the claim is gone the
call fails with ClaimLost and applies nothing, so a chained job
exists only if the settlement that created it won. Queue::ack is
now ack_with with empty effects.
Error::ClaimLost: returned by ack, ack_with, nack,
dead_letter, and renew_lease when the record's claim is no
longer present (the lease expired and the reaper requeued the job,
or the record is a stale copy from before a lease renewal rotated
the claimed key). These cases previously returned the catch-all
Error::InvalidState, which remains for genuine misuse (a record
missing lease_expires_at, requeue_dead_job on a non-dead
record).
Worker::process_with_effects: workers can return AckEffects
from processing, which run_worker and run_worker_concurrent
apply atomically with the job's acknowledgement via
Queue::ack_with. process and process_with_effects default to
each other; implement exactly one. Existing Worker
implementations are unaffected.
Queue::close persists each queue's claim-scan state (scan bound
and emptiness marker) under a new cursor: key prefix; the next
open restores the in-memory state from it and deletes the record. The
first claim after a clean restart resumes at the recorded bound
instead of re-scanning the tombstone band left by previously claimed
jobs, whose cost grows with the band and the store's latency. After
a crash the record is absent and the first claim falls back to the
front prefix scan as before.

Changed

run_worker no longer exits when settling a job fails. Settlement
failures (including ClaimLost when a job outlives its lease and
the reaper requeues it) are logged and the loop continues, matching
run_worker_concurrent; the redelivered attempt settles the job.
Claim-path errors still terminate both loops. Both loops log a lost
claim distinctly from other settlement failures.
run_worker_concurrent claims jobs in batches sized to its free
capacity via Queue::claim_batch, costing one claim transaction
per batch instead of per job under a backlog. Jobs are still
processed concurrently and acked individually.
Queue::claim_with_wait and the run_worker / run_worker_concurrent
loops wait on a queue-scoped wakeup that wakes one waiter per
inserted job, instead of the process-wide notification that woke
every waiting worker on every insert. A pool of idle workers no
longer contends on the claim path when a single job arrives, and a
worker claiming a job passes one wakeup on so a backlog keeps waking
further workers. Queue::claim_with_wait now also keeps waiting out
its full max_wait after losing a claim race instead of returning
None early.
Queue::claim commits without awaiting WAL durability. Claims
serialise per queue through the claim lock, which excluded them from
WAL group commit: the lock holder awaited its flush before the next
claim could start, making the flush round trip the queue's claim
throughput ceiling.
Losing an unflushed claim in a crash leaves the job pending, so it
is redelivered immediately on recovery instead of after its lease
expires; at-least-once delivery is unaffected, and a settled job's
claim is always durable because later durable commits flush
preceding WAL entries.
The scheduler promotes due jobs without awaiting WAL durability,
for the same reasons and with the same crash behaviour as the
reaper change below: a lost promotion leaves the scheduled key in
place with its run_at in the past, and the next tick re-promotes
it. A backlog of due jobs (a retry-backoff wave, or scheduled jobs
accumulated during downtime) no longer promotes at one job per
flush interval.
The reaper requeues and dead-letters expired claims without awaiting
WAL durability. Each expired claim is processed in its own
transaction, and awaiting the flush serialised the sweep at one job
per flush interval (about ten per second at the default 100 ms
flush). A commit lost in a crash leaves the expired claim in place
for the next sweep, which re-processes it without consuming an
attempt, and later durable commits flush preceding WAL entries, so
a settled job's requeue is durable by ordering.
The done and dead-letter retention sweeps delete expired records
without awaiting WAL durability, for the same reasons as the reaper
and scheduler changes above: a delete lost in a crash leaves the
record in place for the next sweep, whose existence re-check keeps
the rerun idempotent. With this, no background sweep awaits the
flush; only caller-driven operations do. A retention backlog no
longer delays the lease reaping that shares its tick.
Queue::claim tracks per-queue emptiness and a scan bound in
process memory. Polling an empty queue answers without a storage
scan or the claim lock, and the pending tombstone band is never
re-walked from the front while the process stays up; a full prefix
scan now happens only on cold start or process restart.
Queue stats counter merges are excluded from transaction conflict
detection. The merges are commutative, so concurrent job-state
transitions on the same queue no longer abort and retry each other
over the shared stats keys.

Fixed

A pending: insert landing behind the claim cursor while a claim
was in flight could have its cursor invalidation overwritten by
that claim's cursor update, hiding the job from cursor scans until
the queue next drained. The scan bound now moves back to include
such inserts, and a claim drops its bound advance when the bound
moved while it ran.
A pending: key could be hidden from claims indefinitely when its
insert committed while a claim was in flight and the key sorted at
or below the keys that claim advanced the scan bound past. Job ids
are generated before the enqueue transaction commits, so commit
order can invert key order under concurrent producers, and a
requeue (reaper or nack) restores a job at its original key. The
next claim then recorded emptiness at a valid epoch and the queue
answered None while live jobs were pending. Bound advances now
clamp to the smallest key recorded since the bound was observed,
including when no bound exists yet (the first claim after a
process restart) and when the key equals the claimed one (the
claimed job requeued after its lease expired within the claim).
Duplicate EnqueueOptions::id_override values are now rejected
transactionally with Error::DuplicateJobId instead of overwriting
jobindex:{id} and leaving older queue-state records behind.
Queue::ack, Queue::nack, Queue::dead_letter, and
Queue::renew_lease now check that the expected claimed: record
still exists before settling a job. A worker finishing after its
lease was reaped now gets Error::ClaimLost instead of being
able to ack, retry, dead-letter, renew, or corrupt stats from a
stale JobRecord.
Queue::nack and Queue::renew_lease now retry on transaction
conflict like Queue::ack and Queue::dead_letter already did.
A reaper committing the expired-lease delete concurrently with a
late settlement is now retried (and resolves to Error::ClaimLost
on the next attempt) instead of surfacing a raw SlateDB transaction
error to the caller.
Queue::requeue_dead_job now checks that the dead-letter record
still exists before reviving it. Requeueing a stale record after
dead-letter retention swept it now returns Error::JobNotFound
instead of recreating the job and corrupting queue stats.

Assets 2

15 Jun 12:45

micllam

taquba-jobs-v0.4.0

73ca489

taquba-jobs-v0.4.0

Changed

Terminal marker filenames lead with an inverted timestamp, and the
result-retention sweep lists only expired markers (via the
object-store list_with_offset contract) instead of every retained
marker on every tick, so a sweep's listing cost is proportional to
the expired set. Markers written by earlier versions are not
recognised by the sweeper: when upgrading a store that ran with
result_retention enabled, clear the <result_prefix>/terminals/
prefix out-of-band.

Assets 2

15 Jun 12:42

micllam

taquba-cron-v0.4.0

73ca489

taquba-cron-v0.4.0

Changed

Raised the minimum taquba requirement to 0.8.

Assets 2

15 Jun 12:47

micllam

taquba-bulk-v0.2.0

73ca489

taquba-bulk-v0.2.0

Changed

Batch submission runs with bounded concurrency instead of one
awaited submit at a time. Each submission blocks on a durable
enqueue commit and concurrent commits share WAL flushes, so serial
submission capped at one item per flush interval (one item per
100ms at the SlateDB default). Enqueue order across in-flight
submissions is not defined; batch items are independent.

Added

BulkCtx::memoized_by_content and
BulkCtx::memoized_by_content_with_cached_cost for memoized steps
whose keys should be derived from serialized input content rather
than caller-supplied strings.
BulkCtx::memoized_with_cached_cost for memoized steps whose cost counters
should be recorded both on fresh compute and on memo hits.

Assets 2

30 May 12:23

micllam

taquba-bulk-v0.1.0

bb3fd01

taquba-bulk-v0.1.0

Initial release. Per-batch orchestrator that runs one pipeline over many
inputs in a single process on top of taquba-workflow.

Added

Pipeline: the per-item contract (typed Input / Output, an Error
that converts into a StepError, and an async run). Each input item
becomes one taquba-workflow run whose single step invokes run; the
pipeline's own logical steps live inside run as BulkCtx::memoized
calls.
BulkCtx<T>: per-item execution context. Carries the typed input,
run_id, and submitter headers; exposes memoized (durable per-step
result caching so an at-least-once retry replays cached results instead of
repeating a paid call), record_cost, and cancel_token.
CostReport: generic named-metric accumulator (token counts, paid-API
units, compute-seconds, dollars). Interior-mutable while a step runs and
serializable for the per-item envelope and the batch rollup.
Bulk / BulkBuilder: the runner. Submits N runs, drives the worker pool,
streams output as items complete, and aggregates progress and cost.
Builder options: output, key_fn, headers, max_concurrent,
poll_interval, queue_name, memo_prefix, fail_threshold. run
executes to completion; run_with_shutdown drains in-flight items on a
shutdown signal (e.g. spot preemption).
ProgressSnapshot: point-in-time counts, rate, estimated time remaining,
and cost rollup, returned by Bulk::progress.
BulkReport: final counts, elapsed time, cost rollup, and
failed_run_ids (re-submitting those ids resumes from cached memo state).
OutputSink with JsonlSink (one JSON record per line) and NullSink
(discards records, for side-effecting pipelines); read_jsonl for
line-delimited JSON input.
Error / Result: crate error type, including
Error::FailureThresholdExceeded when the share of failed items crosses
the configured threshold.
Re-exports StepError and StepErrorKind from taquba-workflow for the
Pipeline::Error type.

Assets 2

29 May 11:56

micllam

taquba-jobs-v0.3.0

5db214a

taquba-jobs-v0.3.0

Added

JobRunnerBuilder::result_retention(Duration): opt-in retention
window for persisted result blobs. When set, the runner writes a
terminal marker every time a job reaches a terminal state and an
in-process sweeper deletes that job's result blob retention after
termination. When unset (default), result blobs are retained
indefinitely (the previous behaviour). Once a blob is swept,
JobHandle::fetch_result for that job returns Ok(None) and an
idempotent re-submission of the same payload falls through to
re-running the job rather than short-circuiting; size the window
so it covers the longest gap callers need between submission and
idempotent re-submit.
JobRunnerBuilder::clock(Arc<dyn Clock>): override the time source
the runner reads its timestamps from (terminal-marker timestamps
and the retention sweep cutoff). Defaults to the queue's clock
(Queue::clock), so passing a MockClock to
Queue::open_with_options is enough for tests; this override is
for the rarer case where the runner needs a different clock than
the queue.

Changed

Idempotent submissions now short-circuit to a prior submission's
persisted outcome. Previously, Job::idempotency_key only deduped
against jobs that were still pending or scheduled: a re-submission
after the original acked would create a new job (re-paying for the
work). The dedup record now carries the assigned job_id (written
atomically with the enqueue via the new
EnqueueOptions::id_override) so a re-submission with a matching
payload returns a handle pointing at the cached result blob (with
newly_submitted = false).
The result-store prefix now reserves a sibling terminals/ segment
for retention markers (<prefix>/terminals/<terminal_at_ms:020>_<job_id>).
Existing result blobs (<prefix>/<job_id>) are unaffected: ULID
job ids cannot collide with the literal terminals segment. Markers
are only written when result_retention is configured.
Breaking (on-disk): JobSubmissionRecord (the durable per-idem-key
dedup record) gained a job_id field. Records written by earlier
versions of taquba-jobs will fail to deserialize and need to be
cleared; the simplest path is to delete the queue's user KV prefix
(usr:jobs/dedup/...) when upgrading.

Assets 2

28 May 12:57

micllam

taquba-workflow-v0.5.0

c444f84

taquba-workflow-v0.5.0

Added

Memo: per-step durable key-value store for memoizing within-step
side effects, backed by object storage. Bound to a specific
(run_id, step_number); get(key) / put(key, value) take only
the user key. Strictly per-step; the durable channel between steps
is StepOutcome::Continue's payload, not memo.
MemoStore: the backing store Memo views are derived from
(Arc<dyn ObjectStore> + path prefix). Used internally by the
runtime builder; users construct one directly mainly in tests.
Step::memo: every step receives a Memo scoped to its own
(run_id, step_number). Runners use it to cache results of
expensive within-step side effects (LLM calls, paid APIs) so
at-least-once retries don't re-pay for work the prior attempt
already did.
WorkflowRuntimeBuilder::memo_prefix: configures the object-store
prefix Step::memo entries live under. Defaults to "workflow-memo";
set a distinct prefix when multiple runtimes share one store.
Error::Store(taquba::object_store::Error): surfaced from memo
read/write failures. Classified as transient by is_permanent.
WorkflowRuntimeBuilder::memo_retention(Duration): opts the runtime
into writing a terminal marker via MemoStore::write_terminal_marker
on every terminal state (Succeeded, Failed, Cancelled). Markers
outlive the run record and provide the input a memo-retention sweep
consumes to decide when a run's memo entries are eligible for
deletion. Without this setter no marker is written and memo entries
are retained indefinitely (appropriate for short-lived runs or
external cleanup).
Memo-retention sweeper: when memo_retention is set,
WorkflowRuntime::run spawns a background task that periodically
scans terminal markers and, for each marker older than the
configured window, deletes the run's memo entries and then the
marker itself. The first sweep fires on startup so a fresh process
catches markers left behind by an earlier one. The sweeper shuts
down with the caller-supplied shutdown future.
WorkflowRuntime now reads every timestamp it writes
(DurableRunRecord::submitted_at_ms, the ContinueAfter run_at,
and the terminal-marker timestamp) through a taquba::Clock. By
default the runtime shares the clock its Queue was opened with
(via Queue::clock), so passing a MockClock to OpenOptions
virtualises time for the queue and the workflow runtime together.
WorkflowRuntimeBuilder::clock(Arc<dyn Clock>) overrides the
defaulted-from-queue clock when a test or specialised setup needs a
separate time source.

Changed

Breaking: WorkflowRuntime::builder now takes an additional
required object_store: Arc<dyn ObjectStore> argument between the
queue and the runner. The store backs Step::memo and need not be
the same store the queue was opened with, though sharing one (just
cloning the Arc) is the common case. Existing call sites must add
the store argument:
```
// Before:
let runtime = WorkflowRuntime::builder(queue, runner, hook).build();
// After:
let runtime = WorkflowRuntime::builder(queue, store, runner, hook).build();
```

Assets 2

28 May 12:56

micllam

taquba-v0.7.0

908744f

taquba-v0.7.0

Added

EnqueueOptions::id_override lets callers supply the job id instead
of receiving a generated ULID. Useful when the id must be known before
the enqueue returns. Ids are validated at the API boundary (1-128 bytes
of [A-Za-z0-9_-]) and bad inputs return the new
Error::InvalidId { id, reason } variant. Callers should prefer
ULID-shaped ids when FIFO-within-priority claim order matters:
pending/scheduled keys end with the id, so claim order follows
id sort.
Queue::clock() accessor returns the Arc<dyn Clock> the queue
was opened with (or the default SystemClock). Lets downstream
crates share the queue's time source for their own timestamp work
so virtualising time with MockClock advances the whole stack
in lockstep.
OpenOptions::flush_interval: Option<Duration> exposes SlateDB's
WAL flush interval. None keeps SlateDB's own default (100ms).
Every taquba state transition (enqueue, claim, ack, nack,
dead_letter) blocks on txn.commit() which waits for the next
flush tick, so this value is the lower bound on per-operation
latency.

Changed

Breaking on-disk layout: the done: keyspace is reordered from
done:{queue}:{id} to done:{completed_at:020}:{queue}:{id},
mirroring the existing time-first layout of claimed: and
scheduled:. The retention sweep can now early-exit on the first
unexpired record instead of walking the full prefix. Public API is
unchanged; in-flight runs from prior versions must be drained
before upgrading because the old keys will not be observed by the
reaper.
Queue::claim (and therefore claim_next / claim_with_wait)
serialises same-queue claim attempts through an in-process
tokio::sync::Mutex. Same-queue attempts no longer rely on
SlateDB's transaction-conflict retry to resolve which worker
takes the head of pending:. The lock is per-queue, so different
queues' claims still run in parallel. Per-claim wall-clock latency
under high single-queue concurrency drops from seconds to roughly
one commit interval. Public API unchanged.
Queue::claim now maintains an in-memory per-queue cursor that
records the most recently claimed pending: key, and starts the
next claim's scan from immediately after it. This skips the
tombstone band left by previously claimed (and deleted) pending:
entries that the SlateDB iterator would otherwise walk. The
cursor is invalidated whenever a pending: key is written at or
before it (nack-requeue, dead-job requeue, reaper-requeue,
scheduler promotion, and any enqueue at a lower-numbered
priority); when this happens the next claim falls back to a full
prefix scan. The cursor is not persisted: on process restart the
first claim falls back to the prefix scan and re-warms naturally.
Public API unchanged.
Bumped minimum slatedb version from 0.13 to 0.13.1.

Fixed

enqueue_with's non-dedup path (write_new) now retries on
transaction conflict, matching the dedup path (write_unique),
enqueue_with_kv, ack, dead_letter, and every other write path
in the crate. Previously a conflict during a non-dedup enqueue would
surface as Error::Storage to the caller; under normal contention
this would have manifested as spurious enqueue failures that a retry
could resolve.

Assets 2

Releases: micllam/taquba

taquba-workflow-v0.6.0

Changed

Fixed

Added

Uh oh!

taquba-webhooks-v0.3.0

Changed

Uh oh!

taquba-v0.8.0

Added

Changed

Fixed

Uh oh!

taquba-jobs-v0.4.0

Changed

Uh oh!

taquba-cron-v0.4.0

Changed

Uh oh!

taquba-bulk-v0.2.0

Changed

Added

Uh oh!

taquba-bulk-v0.1.0

Added

Uh oh!

taquba-jobs-v0.3.0

Added

Changed

Uh oh!

taquba-workflow-v0.5.0

Added

Changed

Uh oh!

taquba-v0.7.0

Added

Changed

Fixed

Uh oh!