v0.9.2
The SQL Server source grows up: the export streams (peak RSS bounded by
batch_size, not the chunk —mode: fullon a 2 M-row heavy table went from a
~10 GB OOM to 1 file at 171 MB), detects connection poolers / the Azure SQL
gateway, honourslock_timeout/statement_timeout/throttle_ms, and is
covered by full live parity suites (resume / chunked-recovery / crash-recovery
/ reconcile-repair) plus a type matrix round-tripped through DuckDB, ClickHouse
and live BigQuery. Internally, the probe → memory-cap → adaptive → throttle
batch policy that was triplicated across the PG / MySQL / MSSQL export loops is
now one unit-testedAdaptiveBatchController— PG/MySQL fully revalidated, no
behaviour change.
SQL Server — streaming export + pooler detection + parity test suites
refactor(source)— the probe → memory-cap → adaptive-resize → throttle
batch-sizing policy, previously triplicated across the PG / MySQL / SQL Server
export loops, is now one unit-testedAdaptiveBatchController. Engines provide
only what differs (row source + memory-cap formula). SQL Server gains the same
first-batch memory-cap probe PG/MySQL have. No behaviour change — full live
validation across all three engines (parity, recovery, type matrices).feat(mssql)— the export now honours the source-safetySourceTuning
knobs it previously ignored (it read onlybatch_size):lock_timeout
(server-sideSET LOCK_TIMEOUTso a blocked read fails fast),statement_timeout
(client-side wall-clock budget — SQL Server has no statement-durationSET;
live-verified to abort + retry), andthrottle_ms(sleep between batches).
Brings MSSQL to in-export tuning parity with the PG/MySQL engines.feat(mssql)— the export now streams: it consumes thetiberius
result set incrementally and emits one Arrow batch pertuning.batch_size
rows instead of materialising the whole chunk (into_first_result). Peak RSS
is bounded bybatch_size × row_bytes, independent ofchunk_size— the
SQL Server analogue of the PG cursor'sFETCH N. So a largechunk_size(or
mode: full) gives few large files at low memory;chunk_sizenow controls
file count,batch_sizecontrols memory. Measured on 2 M heavy rows:
mode: fullwent from a ~10 GB materialise (OOM) to 1 file at 171 MB.feat(mssql)— connection pooler / gateway detection (MssqlProxyKind):
@@SPIDdrift across two queries → transaction-mode multiplexer (Multiplexed);
SERVERPROPERTY('EngineEdition')5/8 (or an Azure@@VERSIONbanner) →
AzureGateway. One connect-time warning, mirroring PG (pg_backend_piddrift)
and MySQL (CONNECTION_ID()drift). Pure classifier exhaustively unit-tested;
live direct-connection guard inlive_pool_safety.test(mssql)— full live parity suites mirroring the PG/MySQL twins:
live_mssql_resume,live_mssql_chunked_recovery,live_mssql_crash_recovery,
live_mssql_reconcile_repair,live_mssql_chunked. Wired into the per-PR
e2ejob and Nightly (mssql service + seed step added to both).docs(mssql)—datetime2sub-microsecond truncation documented as a
tracked gap: rivet maps timestamps to microsecond, so a baredatetime2
(precision 7 = 100 ns) incremental cursor lands one tick below the source max
and re-exports the boundary row each run — usedatetime2(6)or coarser.
Reliability / type-mapping / tuning matrices gained SQL Server rows.docs(mssql)— Gentle SQL Server extraction
best-practice + copy-paste config (rivet_mssql_gentle.yaml): how to stay easy
on both the source DB and the rivet worker. Documents the one MSSQL footgun —
use an explicitchunk_size(rows), notchunk_size_memory_mb(which can't
size by bytes yet on MSSQL and falls back to ~500k-row chunks, so wide rows
buffer multiple GB). Measured live on a 2 M-row heavy table: 2 759 MB →
101 MB peak RSS (~27×) at the same wall time by switching tochunk_size.
Includes a row-width sizing table and theenvironment: productiongovernor lever.bench(mssql)— DBA-harm matrix for SQL Server
(REPORT_mssql.md+
mssql_db_bench.sh): measured against live SQL Server 2022, rivet's chunked
autocommit reads hold no long transaction (0 ms), pin nothing back
from log truncation, add zero write pressure (read-only), and take a
3–4 lock peak footprint. A per-tool comparison (mssql_harm_compare.sh,
2 M-rowcontent_items) puts that next to the competitors: rivet's longest
single query is 6.6 s vs ~9 min for sling/dlt scanning the table in one
shot, and rivet holds no open transaction where dlt holds one for ~8 min
(version-store / log-truncation hazard).bench(mssql)— competitive performance matrix
(REPORT_mssql.md) vs sling / dlt on
live SQL Server 2022: rivet wins throughput on narrow-to-medium tables
(sub-second, 3–30× faster) and — with the streaming export — holds the lowest
or competitive peak RSS on every table, including the wide ones. The one
remaining gap is wall-time on heavy-text rows (thetiberiusrow decode), not
memory.test(mssql)—bigquery_validates_mssql_type_matrix_parquet: the SQL
Server type matrix now also round-trips through live BigQuery (autoload types,
microsecond TIME/TIMESTAMP,uniqueidentifier→BYTES, decimal sums). All
three type matrices (PG / MySQL / SQL Server) now pass through every oracle —
DuckDB, ClickHouse, and live BigQuery.
Upgrade notes
- No config or API changes. Existing PostgreSQL / MySQL / SQL Server exports
are unaffected; theAdaptiveBatchControllerrefactor is internal and fully
re-validated on all three engines. - SQL Server chunked exports open a fresh connection per chunk (as the PG and
MySQL engines do) and run the one-time pooler / gateway detection on each. On a
many-chunk export that is real connection + auth churn — prefer a larger
chunk_size(fewer, larger files; the streaming export keeps memory bounded
regardless) over many small chunks. See
gentle SQL Server extraction.