fix: Memory-efficient sync and reconciliation for large tables by taariq · Pull Request #76 · serenorg/database-replicator

taariq · 2025-12-10T19:03:28Z

Summary

This PR fixes critical memory and timeout issues in the xmin-based sync daemon that were causing failures on tables with millions of rows:

- Sync daemon: was loading ALL rows into memory before processing, causing 10GB+ memory usage and connection timeouts
- Reconciler: was loading ALL primary keys from both databases into memory (~2.4GB for 14M-row tables)

Changes

Batched sync processing (sync_table)
- Use existing read_changes_batched() + fetch_batch() instead of loading all rows
- Process and write each batch immediately (memory = O(batch_size))
- Update sync state after each batch for resume capability
- Add progress logging every 10 batches
Auto-detect batch size based on available memory
- Cross-platform memory detection (Linux, macOS, Windows)
- Calculate optimal batch size using 25% of available memory
- Range: 1K-50K rows, default 10K if detection fails
- Enables same code to work on t3.nano (512MB) through r6i.24xlarge (768GB)
Batched reconciliation (reconcile_table_batched)
- Implement merge-join comparison on sorted primary keys
- Use keyset pagination (WHERE pk > last_pk) for efficient batching
- Fetch PKs in batches from both source and target
- Delete orphans in batches as discovered
- Progress logging every 100K comparisons

Memory Impact

| Operation | Before | After |
|---|---|---|
| Sync (14M rows) | ~10 GB | ~20 MB |
| Reconciliation (14M rows) | ~2.4 GB | ~20 MB |

Testing

All 228 unit tests pass
Clippy passes with no warnings
Manual testing recommended on production-scale tables

Test plan

Unit tests pass
Clippy lints pass
Code formatted with cargo fmt
Manual test: Sync table with millions of rows
Manual test: Reconcile table with millions of rows
Verify memory stays bounded on t3.nano instance

Closes #74
Closes #75

The sync_table method was loading entire tables into memory before processing, causing:
- 10GB+ memory usage for tables with millions of rows
- Connection timeouts when queries exceeded ELB idle timeouts
- Failed syncs with "connection closed" errors

Changes:
- Use existing batched reader (read_changes_batched + fetch_batch) instead of loading all rows at once
- Process and write each batch immediately (memory = O(batch_size))
- Update sync state after each batch for resume capability
- Add progress logging every 10 batches
- Increase default batch_size from 1000 to 10000 for better throughput
- Check for xmin wraparound at start rather than during read

This reduces memory from O(total_rows) to O(batch_size), enabling sync of tables with millions of rows without OOM or timeouts.

Closes #74
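The batching pattern described above can be sketched in Rust as a loop that fetches, writes, and checkpoints one batch at a time. `BatchSource`, `BatchSink`, `Row`, and the in-memory `VecSource`/`RecordingSink` are hypothetical stand-ins, not the signatures of `read_changes_batched()` or `fetch_batch()`; the checkpoint carries (xmin, ctid) to line up with the later pagination fix.

```rust
/// One changed row as returned by the batched reader (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Row {
    pub xmin: u32,
    pub ctid: (u32, u16), // (block, offset)
    pub data: String,
}

/// Hypothetical source of change batches; returns `None` once exhausted.
pub trait BatchSource {
    fn fetch_batch(&mut self, batch_size: usize) -> Option<Vec<Row>>;
}

/// Hypothetical sink: writes rows to the target and persists resume state.
pub trait BatchSink {
    fn write_batch(&mut self, rows: &[Row]);
    fn save_state(&mut self, last_xmin: u32, last_ctid: (u32, u16));
}

/// Streams changes batch by batch so peak memory is O(batch_size).
/// Returns the total number of rows written.
pub fn sync_table<S: BatchSource, K: BatchSink>(
    source: &mut S,
    sink: &mut K,
    batch_size: usize,
) -> usize {
    let (mut total, mut batches) = (0usize, 0usize);
    while let Some(batch) = source.fetch_batch(batch_size) {
        let Some(last) = batch.last() else { break };
        sink.write_batch(&batch);
        // Checkpoint after every batch so an interrupted run can resume here.
        sink.save_state(last.xmin, last.ctid);
        total += batch.len();
        batches += 1;
        if batches % 10 == 0 {
            println!("sync progress: {batches} batches, {total} rows");
        }
        // `batch` is dropped here, before the next fetch.
    }
    total
}

/// In-memory stand-in for the database reader, for demonstration only.
pub struct VecSource {
    rows: Vec<Row>,
    pos: usize,
}

impl VecSource {
    pub fn new(rows: Vec<Row>) -> Self {
        Self { rows, pos: 0 }
    }
}

impl BatchSource for VecSource {
    fn fetch_batch(&mut self, batch_size: usize) -> Option<Vec<Row>> {
        if self.pos >= self.rows.len() {
            return None;
        }
        let end = (self.pos + batch_size).min(self.rows.len());
        let batch = self.rows[self.pos..end].to_vec();
        self.pos = end;
        Some(batch)
    }
}

/// Records what was written, standing in for the target database.
#[derive(Default)]
pub struct RecordingSink {
    pub written: usize,
    pub largest_batch: usize,
    pub checkpoints: Vec<(u32, (u32, u16))>,
}

impl BatchSink for RecordingSink {
    fn write_batch(&mut self, rows: &[Row]) {
        self.written += rows.len();
        self.largest_batch = self.largest_batch.max(rows.len());
    }
    fn save_state(&mut self, last_xmin: u32, last_ctid: (u32, u16)) {
        self.checkpoints.push((last_xmin, last_ctid));
    }
}
```

With 25 rows and a batch size of 10, the loop writes three batches (10, 10, 5) and records three checkpoints, never holding more than one batch at a time.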

Add cross-platform memory detection and automatic batch size calculation to prevent OOM on small instances while maximizing throughput on larger ones.

New functions in utils.rs:
- get_available_memory(): Cross-platform (Linux, macOS, Windows)
  - Linux: Reads MemAvailable from /proc/meminfo
  - macOS: Uses sysctl + vm_stat for free/inactive pages
  - Windows: Uses GlobalMemoryStatusEx Win32 API
- calculate_optimal_batch_size(): Auto-calculates based on memory
  - Uses 25% of available memory as working budget
  - Assumes 2KB per row (conservative estimate)
  - Clamps between 1,000 and 50,000 rows

Expected batch sizes by instance type:
- t3.nano (512MB): ~1,000 rows
- t3.small (2GB): ~10,000 rows
- t3.large (8GB+): 50,000 rows (capped)

Refs #74
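The batch-size rule can be sketched as below. Only the Linux path is shown; the function signatures (in particular passing `Option<u64>` into `calculate_optimal_batch_size`) are assumptions, not the actual utils.rs API. Taken literally, 25% of memory at 2KB per row reaches the 50,000 cap once about 409.6 MB (50,000 × 2,048 × 4 bytes) is available, so the per-instance figures listed above presumably reflect adjustments in the real code that this sketch does not model.

```rust
/// Per-row cost assumed by the commit message (2 KB, conservative).
const BYTES_PER_ROW: u64 = 2 * 1024;
const MIN_BATCH: u64 = 1_000;
const MAX_BATCH: u64 = 50_000;
/// Fallback from the PR summary when memory cannot be detected.
const DEFAULT_BATCH: usize = 10_000;

/// Extracts `MemAvailable` (reported in kB) from /proc/meminfo text, in bytes.
pub fn parse_mem_available(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix("MemAvailable:")?;
        let kb: u64 = rest.split_whitespace().next()?.parse().ok()?;
        Some(kb * 1024)
    })
}

/// Linux-only reader; the real utils.rs also covers macOS and Windows.
#[cfg(target_os = "linux")]
pub fn get_available_memory() -> Option<u64> {
    let text = std::fs::read_to_string("/proc/meminfo").ok()?;
    parse_mem_available(&text)
}

/// Placeholder for other platforms in this sketch.
#[cfg(not(target_os = "linux"))]
pub fn get_available_memory() -> Option<u64> {
    None
}

/// 25% of available memory divided by 2 KB per row, clamped to 1,000..=50,000;
/// falls back to 10,000 when detection failed.
pub fn calculate_optimal_batch_size(available_bytes: Option<u64>) -> usize {
    match available_bytes {
        Some(bytes) => ((bytes / 4) / BYTES_PER_ROW).clamp(MIN_BATCH, MAX_BATCH) as usize,
        None => DEFAULT_BATCH,
    }
}
```

Typical use would be `calculate_optimal_batch_size(get_available_memory())`; 16 MiB available gives 2,048 rows, while anything under about 8 MB bottoms out at the 1,000-row floor.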

The reconciler was loading ALL primary keys from both source and target tables into memory before comparing them. For tables with millions of rows (e.g., 14M rows), this caused:
- 2-3 GB memory usage just for PKs
- Potential OOM on memory-constrained instances
- Connection timeouts during long-running PK fetch queries

Changes:
- Add reconcile_table_batched() using merge-join comparison
- Implement PkBatchReader with keyset pagination (WHERE pk > last_pk)
- Fetch PKs in sorted batches from both databases
- Compare using single-pass merge-join (both streams sorted)
- Delete orphans in batches as they're discovered
- Add progress logging every 100K comparisons

This reduces memory from O(total_rows) to O(batch_size), enabling reconciliation of tables with millions of rows without OOM.

Closes #75
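The merge-join reconciliation can be sketched with in-memory pagers standing in for the two database cursors. `PkPager`, `PkStream`, and `reconcile` are hypothetical names, not the PR's `PkBatchReader` or `reconcile_table_batched()`. The sketch shows the shape of the algorithm: each side is read in sorted pages through a `> last_pk` cursor, only one page per side is buffered, and a single pass marks every target key absent from the source as an orphan. It only works if both streams use the same ordering.

```rust
/// In-memory stand-in for a PK batch reader. Keys are text and sorted,
/// mirroring `SELECT pk::text ... ORDER BY pk::text`.
pub struct PkPager {
    keys: Vec<String>,
    last_pk: Option<String>,
    batch_size: usize,
}

impl PkPager {
    pub fn new(mut keys: Vec<String>, batch_size: usize) -> Self {
        keys.sort();
        Self { keys, last_pk: None, batch_size }
    }

    /// Equivalent of `WHERE pk::text > $last ORDER BY pk::text LIMIT batch_size`.
    pub fn next_batch(&mut self) -> Vec<String> {
        let start = match &self.last_pk {
            Some(last) => self.keys.partition_point(|k| k <= last),
            None => 0,
        };
        let end = (start + self.batch_size).min(self.keys.len());
        let batch = self.keys[start..end].to_vec();
        if let Some(last) = batch.last() {
            self.last_pk = Some(last.clone());
        }
        batch
    }
}

/// Flattens pages into one sorted stream, holding at most one page at a time.
struct PkStream {
    pager: PkPager,
    buf: std::vec::IntoIter<String>,
}

impl Iterator for PkStream {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        if let Some(k) = self.buf.next() {
            return Some(k);
        }
        self.buf = self.pager.next_batch().into_iter();
        self.buf.next()
    }
}

fn stream(pager: PkPager) -> PkStream {
    PkStream { pager, buf: Vec::new().into_iter() }
}

/// Single-pass merge-join: target keys missing from the source are orphans,
/// flushed to `delete` in groups of `delete_batch`. Returns the orphan count.
pub fn reconcile(
    source: PkPager,
    target: PkPager,
    delete_batch: usize,
    mut delete: impl FnMut(&[String]),
) -> usize {
    let mut src = stream(source).peekable();
    let mut pending: Vec<String> = Vec::new();
    let mut deleted = 0;
    for pk in stream(target) {
        // Skip source keys that sort before the current target key.
        while src.peek().is_some_and(|s| *s < pk) {
            src.next();
        }
        if src.peek() != Some(&pk) {
            pending.push(pk);
            if pending.len() >= delete_batch {
                delete(&pending);
                deleted += pending.len();
                pending.clear();
            }
        }
    }
    if !pending.is_empty() {
        delete(&pending);
        deleted += pending.len();
    }
    deleted
}
```

For a source holding keys 1, 2, 3, 5 and a target holding 1, 2, 4, 5, 6, the pass reports 4 and 6 as orphans while buffering at most one page per side.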

This commit fixes critical correctness issues identified in PR #76 review:

## Critical Fix 1: xmin batching skipping rows with same xmin

The batched xmin reader was using `WHERE xmin > $1`, which skips rows when multiple rows share the same xmin (bulk inserts in a single transaction).

Fix: Use (xmin, ctid) as a compound pagination key. ctid provides a stable tie-breaker for rows with identical xmin values.
- Add `last_ctid` field to BatchReader
- Use `WHERE (xmin, ctid) > ($1, $2::tid)` for subsequent batches
- Include `ctid::text` in SELECT and ORDER BY

## Critical Fix 2: Reconciler PK ordering mismatch

PKs were cast to ::text in SELECT, but ORDER BY used native column types. For numeric PKs, "10" < "2" lexicographically but 10 > 2 numerically. This caused false orphan detection and data loss.

Fix: Use the ::text cast in both SELECT and ORDER BY so that the SQL stream order matches Rust's lexicographic string comparison.
- Change ORDER BY from `"col"` to `"col"::text`
- Change WHERE from `"col" > $1` to `"col"::text > $1`

## Moderate Fix: macOS page size detection

Apple Silicon uses 16KB pages, not 4KB. The hardcoded 4KB underestimated available memory by 4x, leading to unnecessarily small batch sizes.

Fix: Use `sysctl hw.pagesize` to get the actual page size.
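To make Critical Fix 1 concrete, here is a small Rust simulation (not the reader.rs code) that pages through rows sorted by (xmin, ctid), once with an xmin-only cursor and once with the compound cursor. Rows are plain tuples, ctid is modelled as (block, offset), and both function names are hypothetical. One caveat the sketch assumes away: it orders ctid natively, as `tid` does. If a query sorted on `ctid::text` instead, "(10,1)" would sort before "(2,1)" and disagree with the native `$2::tid` comparison in the WHERE clause.

```rust
/// Pagination cursor: (xmin, ctid) with ctid as (block, offset).
/// Rust tuples compare lexicographically, like PostgreSQL's row comparison
/// `(xmin, ctid) > ($1, $2)`.
pub type Key = (u32, (u32, u16));

/// Old behaviour: cursor on xmin only (`WHERE xmin > $1 ... LIMIT n`).
/// `rows` must already be sorted, standing in for the ORDER BY.
pub fn paginate_by_xmin(rows: &[Key], limit: usize) -> Vec<Key> {
    let mut out = Vec::new();
    let mut last: Option<u32> = None;
    loop {
        let batch: Vec<Key> = rows
            .iter()
            .filter(|r| last.map_or(true, |x| r.0 > x))
            .take(limit)
            .copied()
            .collect();
        let Some(&(xmin, _)) = batch.last() else { break };
        out.extend(batch);
        last = Some(xmin);
    }
    out
}

/// Fixed behaviour: compound cursor (`WHERE (xmin, ctid) > ($1, $2) ... LIMIT n`).
pub fn paginate_by_xmin_ctid(rows: &[Key], limit: usize) -> Vec<Key> {
    let mut out = Vec::new();
    let mut last: Option<Key> = None;
    loop {
        let batch: Vec<Key> = rows
            .iter()
            .filter(|r| last.map_or(true, |k| **r > k))
            .take(limit)
            .copied()
            .collect();
        let Some(&key) = batch.last() else { break };
        out.extend(batch);
        last = Some(key);
    }
    out
}
```

With three rows sharing xmin 100 and a batch size of 2, the xmin-only cursor ends its first batch at xmin 100 and then jumps straight to xmin 101, silently dropping the third row; the compound cursor returns all four rows.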

taariq · 2025-12-10T19:18:38Z

Review Fixes Applied

Addressed all findings from the code review:

Critical Fixes

1. xmin batching now handles duplicate xmin values (reader.rs)

Added ctid as tie-breaker for pagination
Uses WHERE (xmin, ctid) > ($1, $2::tid) for subsequent batches
Prevents skipping rows when bulk inserts share the same xmin

2. Reconciler PK ordering now consistent (reconciler.rs)

Both SELECT and ORDER BY now use ::text cast
SQL stream order matches Rust lexicographic comparison
Prevents false orphan detection for numeric PKs
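The ordering fix can be sketched as a query builder plus a short demonstration of the original mismatch. `pk_batch_query` and `numeric_order_as_text` are hypothetical helpers for a single-column PK, not the reconciler.rs code. One assumption goes beyond the PR: PostgreSQL compares `text` using the applicable collation, and a byte-order collation such as "C" is what reliably agrees with Rust's byte-wise `String` ordering, so the sketch pins `COLLATE "C"`.

```rust
/// Builds one keyset-pagination query for a single-column primary key.
/// The PK is cast to text in SELECT, WHERE and ORDER BY so the database returns
/// keys in the order Rust's `String` comparison expects. `COLLATE "C"` is an
/// assumption added in this sketch.
pub fn pk_batch_query(table: &str, pk: &str, has_cursor: bool, limit: usize) -> String {
    // Quote identifiers, doubling any embedded double quotes.
    let quote = |ident: &str| format!("\"{}\"", ident.replace('"', "\"\""));
    let key = format!("{}::text COLLATE \"C\"", quote(pk));
    let filter = if has_cursor {
        format!(" WHERE {key} > $1")
    } else {
        String::new()
    };
    format!(
        "SELECT {}::text FROM {}{filter} ORDER BY {key} LIMIT {limit}",
        quote(pk),
        quote(table)
    )
}

/// The bug being fixed: an integer PK ordered numerically, then compared as text.
pub fn numeric_order_as_text(mut ids: Vec<i64>) -> Vec<String> {
    ids.sort();
    ids.into_iter().map(|id| id.to_string()).collect()
}
```

`numeric_order_as_text(vec![10, 2])` yields `["2", "10"]`, a sequence that is not sorted as strings ("2" > "10"), which is exactly what confused the merge-join before the fix.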

Moderate Fix

3. macOS page size detection (utils.rs)

Now uses sysctl hw.pagesize instead of hardcoded 4KB
Apple Silicon machines correctly report 16KB pages
Batch sizes now accurate on M1/M2/M3 Macs
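The macOS fix amounts to multiplying vm_stat page counts by the real page size. This sketch parses already-captured command output (from `sysctl hw.pagesize` and `vm_stat`) rather than spawning the commands; the free-plus-inactive formula follows the description above, while the function names and parsing details are assumptions about the output format.

```rust
/// Parses `sysctl hw.pagesize` output ("hw.pagesize: 16384") or the bare value
/// printed by `sysctl -n hw.pagesize`.
pub fn parse_page_size(sysctl_out: &str) -> Option<u64> {
    sysctl_out.split_whitespace().last()?.parse().ok()
}

/// Reads a vm_stat counter such as "Pages free:    12345." as a page count.
fn vm_stat_pages(vm_stat_out: &str, label: &str) -> Option<u64> {
    vm_stat_out.lines().find_map(|line| {
        let rest = line.strip_prefix(label)?.strip_prefix(':')?;
        rest.trim().trim_end_matches('.').parse().ok()
    })
}

/// Available memory = (free + inactive pages) * actual page size.
pub fn macos_available_bytes(vm_stat_out: &str, page_size: u64) -> Option<u64> {
    let free = vm_stat_pages(vm_stat_out, "Pages free")?;
    let inactive = vm_stat_pages(vm_stat_out, "Pages inactive")?;
    Some((free + inactive) * page_size)
}
```

The same 4,000 free-plus-inactive pages amount to 65,536,000 bytes with 16KB pages but only 16,384,000 bytes with a hardcoded 4KB, the 4x underestimate described above.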

All 228 tests pass, clippy clean.

taariq added 5 commits December 10, 2025 10:32

style: Format code with cargo fmt

6f36b9d

taariq merged commit e4f63d1 into main Dec 10, 2025
7 checks passed

taariq deleted the fix/memory-efficient-sync-reconciler branch December 10, 2025 19:32

This was referenced Dec 10, 2025:
- sync command loads entire table into memory causing OOM and connection timeouts #74 (Closed)
- Reconciler loads all PKs into memory causing OOM on large tables #75 (Closed)
