v0.3.166 - Selectable Upload Key Strategy + Safe Adopt-Existing

@mow-coding mow-coding released this 04 Jul 07:45

v0.3.166 fixes the highest-consequence gap in the live object-storage upload
adapter (WOM #11): an operator whose objects already live under their own key
layout could have those objects re-uploaded and, worse, a recorded key could be
trusted for a skip against a location where the object never was. This release
makes the upload key selectable and recorded, adds a safe adopt-existing
workflow, and pins one non-negotiable invariant: a skip is legal only when a live
HEAD proves the object is present at the recorded key with a matching size, in the
same run that skips. When in doubt, upload.

The one invariant

A skip is legal only when backed by a live HEAD proving present-at-the-recorded-
key + size-match, right now, in the same run that skips. Never skip on a purely
computed or purely claimed key.

Concretely, under a live transport the executor ALWAYS re-HEADs the recorded
remote_key before skipping. A recorded key that 404s is re-uploaded, never
silently skipped: silent data loss on restore is the cardinal sin. The re-HEAD
matches the recorded verification: a location adopted on presence+size is
re-checked with a HEAD alone (no whole-object download just to confirm a skip),
while a content-hashed upload keeps its stronger checksum re-check.

The resume ledger's terminal-success short-circuit is subordinate to this live
proof: once a re-HEAD proves an object absent, the re-upload is forced past any
stale ledger row, so a wiped remote is never silently skipped on the strength of
a prior run's ledger. Plan and apply resolve the same key by construction: the
plan echoes the fully-resolved remote_key into each row, and apply refuses the
run (fail closed) if its re-resolved key diverges. The plan verdict is
strategy-aware: a prior location under a different key layout does not predict a
skip that the apply path would then turn into a re-upload.
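
The skip rule above can be read as a small decision function. The following is a minimal sketch, not the project's code: HeadResult, decide, assert_same_key, and every parameter name are hypothetical and exist only to illustrate that the live HEAD outranks the ledger and that a plan/apply key mismatch fails closed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeadResult:
    """Outcome of a live HEAD on the recorded remote_key (hypothetical shape)."""
    found: bool
    size: Optional[int] = None  # Content-Length, set only when found


def decide(recorded_size: int,
           head: Optional[HeadResult],
           ledger_says_done: bool = False,
           checksum_ok: Optional[bool] = None) -> str:
    """Return "skip" or "upload" for one object under a live transport.

    checksum_ok is None for a presence+size record; for a content-hashed
    record it carries the result of the stronger checksum re-check.
    ledger_says_done is deliberately never consulted: the live proof wins.
    """
    if head is None or not head.found:
        return "upload"   # no live proof, or the recorded key 404s
    if head.size != recorded_size:
        return "upload"   # present, but the size does not match the record
    if checksum_ok is False:
        return "upload"   # content-hashed record failed its re-check
    return "skip"


def assert_same_key(plan_key: str, apply_key: str) -> None:
    """Fail closed when apply re-resolves a key the plan did not echo."""
    if plan_key != apply_key:
        raise RuntimeError("plan/apply remote_key diverged; refusing the run")
```

Note that a stale ledger row cannot rescue a missing object in this sketch: decide(10, HeadResult(found=False), ledger_says_done=True) still returns "upload".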

What ships

  • Selectable key strategy. --key-strategy {sha256_content_addressed, prefix}
    (default unchanged, byte-identical to before), plus --key-prefix <literal> and
    --key-append-extension, on object-storage-upload,
    object-storage-upload-plan, object-storage-upload-verify, and the new
    object-storage-adopt-existing. The prefix strategy places an object at
    <configured-prefix>/<sha256>[.<ext>]; the default lands at exactly
    sha256/<first2>/<sha256> with no prefix prepended.

  • Two-field key model (additive, no migration). Every object-storage location
    and execution receipt now records a new remote_key (the literal
    bucket-relative key at which the object is or was PUT and HEADed) alongside
    the unchanged content-addressed key_hint. The idempotency HEAD, the skip
    matcher, and future download tooling key off remote_key; the digest audits
    keep validating key_hint, so no existing location is flagged corrupt. Under
    the default strategy, remote_key equals key_hint.

  • object-storage-adopt-existing (the 158 GB false-skip fix). A verified
    adopt (--approve + a live transport) HEADs each computed client key and adopts
    ONLY on presence + Content-Length size-match — not a content hash, because a full
    re-hash would GetObject the whole archive (R2 has no server-side whole-object
    sha256). --content-hash-verify is an explicit per-object opt-in. A 404 /
    size-mismatch is not adopted, so a wrong prefix or extension self-limits to zero
    adopts and those objects simply re-upload. A declared adopt
    (--accept-unverified-adopt, a flag distinct from --approve) records a
    NON-gating declared_uploaded location that never skips a PUT. Adopt reports
    verified-count vs total so a template miss is visible, never a silent partial.
    Because a verified adopt HEADs presence-only, adopting a 158 GB set costs a HEAD
    per object, not a download per object. Verified adopt is a live surface, so it
    honours the same tiny-first tiered gate as object-storage-upload: a bulk
    first-live adopt refuses until a single tiny-first object (--only <id>) has
    proved the store.

  • Audits accept the new strategy without weakening. The three manifest audits
    and the execution-receipt doctor audit accept a correct prefix-strategy
    location/receipt AND additionally verify that a non-default remote_key binds
    the record's digest — catching a valid-looking key for the wrong object. The
    upload-evidence writer now shares the single content-addressed key_hint
    producer (the duplicate literal is gone). The execution-contract preview tells
    the truth for both strategies.

  • remote_key has its own validator. It holds a path within the bucket
    (slashes, dots, and the archive-id colon are legal) but never a leading slash,
    "..", a bucket name, an endpoint host, or a URL. It is leak-checked so
    public-privacy stays green.
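
To make the two key layouts and the remote_key rules concrete, here is a minimal sketch under stated assumptions. The function names, the ext parameter (standing in for whatever --key-append-extension derives), the treatment of ".." as a path segment, and the bucket/endpoint arguments are all hypothetical; only the key shapes and the list of forbidden content come from the notes above.

```python
import hashlib


def key_hint(sha256_hex: str) -> str:
    """Content-addressed key: sha256/<first2>/<sha256>."""
    return f"sha256/{sha256_hex[:2]}/{sha256_hex}"


def resolve_remote_key(sha256_hex: str,
                       strategy: str = "sha256_content_addressed",
                       prefix: str = "",
                       ext: str = "") -> str:
    """Resolve the literal bucket-relative key for one object."""
    if strategy == "sha256_content_addressed":
        return key_hint(sha256_hex)  # default: no prefix prepended
    if strategy == "prefix":
        if not prefix:
            raise ValueError("prefix strategy needs a key prefix")
        key = f"{prefix}/{sha256_hex}"  # assumes prefix has no trailing slash
        return f"{key}.{ext}" if ext else key
    raise ValueError(f"unknown key strategy: {strategy}")


def validate_remote_key(key: str, bucket: str = "", endpoint_host: str = "") -> None:
    """Reject keys that are not plain bucket-relative paths (illustrative checks)."""
    if not key or key.startswith("/"):
        raise ValueError("must be a non-empty path with no leading slash")
    if ".." in key.split("/"):  # assumption: '..' is rejected as a segment
        raise ValueError("must not contain '..'")
    if "://" in key:
        raise ValueError("must not be a URL")
    if bucket and key.split("/")[0] == bucket:
        raise ValueError("must not start with the bucket name")
    if endpoint_host and endpoint_host in key:
        raise ValueError("must not contain the endpoint host")


digest = hashlib.sha256(b"example archive bytes").hexdigest()
default_key = resolve_remote_key(digest)
prefixed_key = resolve_remote_key(digest, "prefix", prefix="archives/2024", ext="zip")
validate_remote_key(default_key)
validate_remote_key(prefixed_key)
```

Under the default strategy the resolved key is identical to key_hint(digest), which mirrors the statement that the default strategy's remote_key equals its key_hint. A prefix-strategy key still embeds the full digest, which is one plausible way for an audit to check that the key binds the record's digest.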

Discoverability

object-storage-upload-plan now emits a visible hint when objects may already
exist under a different key layout ("run object-storage-adopt-existing with your
--key-prefix before --approve to avoid re-uploading"), and a default-strategy
upload warns when the store already has non-default-strategy locations.

Safety and scope

No archive migration and no hash change. The default strategy is byte-identical to
v0.3.165. This release builds the strategy/adopt machinery so it is correct
whenever the live transport is enabled; it does not change live-execution gating.
Verified adopt is presence+size only and is labeled as such
(remote_key_verification: presence_size); a declared adopt is labeled "claimed,
not verified — will NOT skip a PUT".
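
The difference between the two adopt modes can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: the function names, the dictionary shapes, and the gates_skip field are hypothetical, while remote_key_verification: presence_size and declared_uploaded are the labels named in these notes.

```python
from typing import Dict, Optional


def verified_adopt(expected_size: int,
                   head_size: Optional[int]) -> Optional[Dict[str, object]]:
    """Adopt only on presence + size match; head_size None models a 404."""
    if head_size is None or head_size != expected_size:
        return None  # not adopted: the object simply re-uploads later
    return {"remote_key_verification": "presence_size", "gates_skip": True}


def declared_adopt() -> Dict[str, object]:
    """A declared adopt is recorded but never allows a PUT to be skipped."""
    return {"state": "declared_uploaded", "gates_skip": False}


def summarize(expected: Dict[str, int], heads: Dict[str, Optional[int]]) -> str:
    """Report verified-count vs total so a template miss stays visible."""
    verified = sum(
        1 for key, size in expected.items()
        if verified_adopt(size, heads.get(key)) is not None
    )
    return f"{verified}/{len(expected)} verified"
```

With a wrong prefix every HEAD misses, so summarize reports zero verified adopts and nothing is skipped, which is the self-limiting behavior described for a 404 or size mismatch.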

Upgrade

See UPGRADE.md. New flags are opt-in; existing runs behave exactly as before.