v0.8.0
Two arcs. Type support gains a per-target type resolver (DuckDB,
BigQuery, Snowflake) that maps every column to its native type, the
degraded type a generic loader autoloads, and the SQL to recover the
difference. Also in this arc: Snowflake added and live-verified, BigQuery
array/JSON recovery, MySQL decimal precision read from the wire, Decimal256/i256
past the old ~38-digit ceiling, and CSV failing loud on unsupported
columns instead of writing silent empties. Integrity gains
no-download content verification: each part's MD5, computed once before
upload, is compared to the checksum the store already exposes in its
listing (GCS md5Hash, S3 single-PUT ETag, Azure single Put Blob
Content-MD5), so --validate confirms content, not just size, without
pulling a byte back. A new per-export verify: size | content makes that
a declarable contract. Verified end-to-end on live GCS, S3, and Azure.

No breaking changes for operators: verify defaults to size, the new
summary.json fields are additive, and the type pipeline is unchanged for
existing exports.
Type support
- feat(types): src/types/target.rs adds a RivetType → target resolver.
  ExportTarget::{DuckDb, BigQuery, Snowflake} each map a column to a
  TargetColumnSpec (native type, autoload type, status, note, recovery
  cast_sql). Dispatch keys off the semantic RivetType, not the physical
  Arrow type, so json/uuid/enum resolve correctly. Total and
  infallible: an unmappable column is a status: Fail row, never an error.
  Per-export target: config + rivet check --type-report --target <t>
  prints the autoload-vs-native table and a ready-to-run recovery
  CREATE TABLE … AS SELECT over the staging table.
- feat(types): Snowflake target added and live-verified
  (snowflake_validates_* harness): json→TEXT/PARSE_JSON, uuid→BINARY/
  HEX_ENCODE+REGEXP, naive ts→NUMBER/TO_TIMESTAMP_NTZ, time→NUMBER/
  TIME_FROM_PARTS, list→VARIANT/::ARRAY, plus the BINARY_AS_TEXT=FALSE
  load-format requirement.
- feat(types): BigQuery type-recovery SQL (L5, post-load CTAS; BQ
  rejects declared native types on load) and array recovery via
  --parquet_enable_list_inference + UNNEST.
- fix(types): Decimal256 parses straight into i256, removing the
  i128 ~38-digit bottleneck (now up to 76 digits).
- feat(types): MySQL DECIMAL precision/scale read from wire metadata
  (works for any query, unlike PG's catalog-only path).
- fix(csv): CSV fails loud on array / unsupported columns instead of
  writing a silent empty value; uuid: string and similar overrides honoured.
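
The resolver's shape can be illustrated with a minimal sketch, limited to Snowflake. Everything here is an assumption made for illustration: the RivetType variants (including the Unmappable stand-in), the Status enum, the field layout of TargetColumnSpec, and the recovery_ctas helper are hypothetical and may not match src/types/target.rs. Only the json→TEXT/PARSE_JSON and list→VARIANT/::ARRAY pairs come from the notes above.

```rust
// Hypothetical sketch; the real definitions may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RivetType {
    Text,
    Json,
    List,
    Unmappable, // stand-in for any type the target cannot represent
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Exact,       // autoload already yields the native type
    Recoverable, // autoload degrades the type; cast_sql restores it
    Fail,        // no mapping; reported as a row, not raised as an error
}

#[derive(Debug, Clone, PartialEq)]
pub struct TargetColumnSpec {
    pub native_type: &'static str,
    pub autoload_type: &'static str,
    pub status: Status,
    pub note: &'static str,
    pub cast_sql: Option<String>,
}

/// Total and infallible: every RivetType yields a spec, never a Result.
pub fn resolve_snowflake(column: &str, ty: RivetType) -> TargetColumnSpec {
    let (native_type, autoload_type, status, note, cast_sql) = match ty {
        RivetType::Text => ("TEXT", "TEXT", Status::Exact, "loads natively", None),
        RivetType::Json => (
            "VARIANT",
            "TEXT",
            Status::Recoverable,
            "autoloads as text; parse to recover",
            Some(format!("PARSE_JSON({column})")),
        ),
        RivetType::List => (
            "ARRAY",
            "VARIANT",
            Status::Recoverable,
            "autoloads as VARIANT; cast to recover",
            Some(format!("{column}::ARRAY")),
        ),
        RivetType::Unmappable => ("-", "-", Status::Fail, "no Snowflake mapping", None),
    };
    TargetColumnSpec { native_type, autoload_type, status, note, cast_sql }
}

/// Builds a recovery CTAS over the staging table. Columns without a
/// cast_sql pass through by name; identifier quoting is ignored here.
pub fn recovery_ctas(table: &str, staging: &str, columns: &[(&str, RivetType)]) -> String {
    let select: Vec<String> = columns
        .iter()
        .map(|(name, ty)| match resolve_snowflake(name, *ty).cast_sql {
            Some(sql) => format!("{sql} AS {name}"),
            None => (*name).to_string(),
        })
        .collect();
    format!("CREATE TABLE {table} AS SELECT {} FROM {staging}", select.join(", "))
}
```

Because the match is exhaustive and returns a plain struct, adding a RivetType variant forces a decision at compile time, which is one way to get the "Fail row, never an error" behaviour described above.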
Integrity — no-download content verification
- feat(validate): --validate confirms each part's content by
  comparing its manifest MD5 to the checksum the store surfaces in object
  listings, with no download. Encodings are normalised to raw digest bytes so
  GCS base64 md5Hash and S3 hex ETag of the same content compare equal; an
  S3 multipart composite ETag or a checksum-less object degrades to size-only.
- feat(destination): small parts upload as a single PUT
  (op.write) so the store computes and stores a content checksum the listing
  exposes; this is the only way to get a Content-MD5 on Azure (a single
  Put Blob, never Put Block List). A process-wide byte budget bounds the
  RAM held in one-shot buffers regardless of upload concurrency; larger parts
  stream (size-only).
- feat(destination): Destination::write returns WriteOutcome
  carrying the store's upload-response checksum; the commit path compares it to
  the locally computed MD5 for a fail-fast, no-round-trip transit-integrity
  check (catches a part corrupted in flight at write time).
- feat(validate): per-export verify: size | content. With verify: content,
  validation fails for any part that is only size-verified, with an actionable
  message (lower max_file_size so parts fit a single PUT, or the backend exposes
  no checksum). The run report and rivet validate show coverage explicitly:
  N verified (M md5, K size-only).
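
The normalisation step described above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: ListingChecksum, Coverage, to_raw_md5, verify_part, and satisfies are hypothetical names, a malformed checksum is treated as size-only, and an MD5 mismatch is assumed to fail under either policy.

```rust
use std::convert::TryFrom;

/// Checksum as a store listing might surface it (hypothetical variants).
pub enum ListingChecksum<'a> {
    GcsMd5Hash(&'a str), // base64 of the raw MD5 digest
    S3ETag(&'a str),     // hex MD5 for a plain single PUT; "<hex>-<n>" for multipart
    Missing,
}

#[derive(Debug, PartialEq)]
pub enum Coverage {
    Md5Verified,
    Md5Mismatch,
    SizeOnly,
}

fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Minimal standard-alphabet base64 decoder; stops at the first '='
/// and does not validate padding strictly.
fn decode_base64(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc: u32 = 0;
    let mut bits: u32 = 0;
    for b in s.bytes() {
        let v = match b {
            b'A'..=b'Z' => b - b'A',
            b'a'..=b'z' => b - b'a' + 26,
            b'0'..=b'9' => b - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1;
        }
    }
    Some(out)
}

/// Normalises either encoding to the 16 raw digest bytes, or None when
/// the listing offers nothing comparable to a content MD5.
pub fn to_raw_md5(c: &ListingChecksum) -> Option<[u8; 16]> {
    let bytes = match c {
        ListingChecksum::GcsMd5Hash(b64) => decode_base64(b64)?,
        ListingChecksum::S3ETag(etag) => {
            let etag = etag.trim_matches('"');
            if etag.contains('-') {
                return None; // multipart composite ETag, not an MD5 of the content
            }
            decode_hex(etag)?
        }
        ListingChecksum::Missing => return None,
    };
    <[u8; 16]>::try_from(bytes).ok()
}

pub fn verify_part(manifest_md5: &[u8; 16], listed: &ListingChecksum) -> Coverage {
    match to_raw_md5(listed) {
        Some(raw) if &raw == manifest_md5 => Coverage::Md5Verified,
        Some(_) => Coverage::Md5Mismatch,
        None => Coverage::SizeOnly,
    }
}

/// verify: content rejects parts that were only size-verified.
pub fn satisfies(content_required: bool, c: &Coverage) -> bool {
    match c {
        Coverage::Md5Verified => true,
        Coverage::Md5Mismatch => false,
        Coverage::SizeOnly => !content_required,
    }
}
```

Comparing raw bytes rather than strings is what lets a GCS value such as 1B2M2Y8AsgTpgAmY7PhCfg== and an S3 value such as d41d8cd98f00b204e9800998ecf8427e (both the MD5 of empty content) count as the same digest.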
Architecture — verification seam
- refactor(pipeline): one pure reconcile_manifest_against_listing
  (manifest × destination listing) with two thin consumers: destination verify
  (Presence → Failure) and chunked resume (Presence → ResumeDecision).
  Replaces two near-identical walks; destination verify now derives part
  presence from a single list_prefix instead of per-part HEADs.
- refactor(validate): ManifestVerification.passed is derived
  (manifest_found and no fatal failure; advisory UntrackedObject does
  not count) in one place, so a new failure variant is fatal by default rather
  than relying on every site to flip a bool. Per-run parts_md5_verified
  reports content coverage. The single-variant IntegrityLevel (a no-op after
  re-download verification was rejected) was removed.
- refactor(pipeline): part xxh3 fingerprint + MD5 computed in a single
  streamed read (compute_part_checksums).
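
A rough sketch of the seam, assuming the manifest and the listing can each be reduced to a key → size map. The function name is taken from the note above, but its signature, the Presence, Failure, and ResumeDecision variants, and the helper names are guesses made for illustration.

```rust
use std::collections::BTreeMap;

/// Per-object outcome of comparing the manifest with one listing.
#[derive(Debug, Clone, PartialEq)]
pub enum Presence {
    Present(String),
    SizeMismatch { key: String, expected: u64, actual: u64 },
    Missing(String),
    Untracked(String), // listed under the prefix but absent from the manifest
}

/// Pure: no I/O, only manifest x listing. Both maps are key -> size in bytes.
pub fn reconcile_manifest_against_listing(
    manifest: &BTreeMap<String, u64>,
    listing: &BTreeMap<String, u64>,
) -> Vec<Presence> {
    let mut out = Vec::new();
    for (key, &expected) in manifest {
        out.push(match listing.get(key) {
            Some(&actual) if actual == expected => Presence::Present(key.clone()),
            Some(&actual) => Presence::SizeMismatch { key: key.clone(), expected, actual },
            None => Presence::Missing(key.clone()),
        });
    }
    for key in listing.keys().filter(|k| !manifest.contains_key(*k)) {
        out.push(Presence::Untracked(key.clone()));
    }
    out
}

#[derive(Debug, PartialEq)]
pub enum Failure {
    MissingPart(String),
    SizeMismatch(String),
    UntrackedObject(String),
}

impl Failure {
    /// Fatal unless explicitly listed as advisory, so a new variant is fatal by default.
    pub fn is_fatal(&self) -> bool {
        !matches!(self, Failure::UntrackedObject(_))
    }
}

/// Consumer 1: destination verify (Presence -> Failure).
pub fn verify_failures(presence: &[Presence]) -> Vec<Failure> {
    presence
        .iter()
        .filter_map(|p| match p {
            Presence::Present(_) => None,
            Presence::SizeMismatch { key, .. } => Some(Failure::SizeMismatch(key.clone())),
            Presence::Missing(key) => Some(Failure::MissingPart(key.clone())),
            Presence::Untracked(key) => Some(Failure::UntrackedObject(key.clone())),
        })
        .collect()
}

/// Derived in one place instead of being flipped at each failure site.
pub fn passed(manifest_found: bool, failures: &[Failure]) -> bool {
    manifest_found && !failures.iter().any(Failure::is_fatal)
}

#[derive(Debug, PartialEq)]
pub enum ResumeDecision {
    Skip,
    Upload,
}

/// Consumer 2: chunked resume (Presence -> ResumeDecision).
/// Untracked objects are not manifest parts, so they yield no decision.
pub fn resume_decision(p: &Presence) -> Option<ResumeDecision> {
    match p {
        Presence::Present(_) => Some(ResumeDecision::Skip),
        Presence::SizeMismatch { .. } | Presence::Missing(_) => Some(ResumeDecision::Upload),
        Presence::Untracked(_) => None,
    }
}
```

Both consumers read the same Vec<Presence>, so a single listing call can feed verification and resume without either one walking the destination itself.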
Live verification
- Type fidelity re-confirmed on DuckDB, ClickHouse, BigQuery, and
Snowflake after the integrity changes. Content verification + transit
check + verify policy verified end-to-end on live GCS, AWS S3
(eu-north-1), and Azure buckets, including manifest-tamper detection.