
test(resharding): add resharding test #924

Draft · meskill wants to merge 3 commits into main from push-tmpxoryqppmp

Conversation

@meskill (Contributor) commented Apr 22, 2026

No description provided.

@codecov (Bot) commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 23.52941% with 52 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...nd/replication/logical/publisher/publisher_impl.rs | 0.00% | 45 Missing ⚠️ |
| ...c/backend/replication/logical/subscriber/stream.rs | 69.56% | 7 Missing ⚠️ |


@meskill force-pushed the push-tmpxoryqppmp branch from 82bd5de to a17559f on April 22, 2026 21:58

@meskill changed the title from "ci: add resharding test to ci" to "ci: remove rwx from ci" on Apr 23, 2026

@meskill force-pushed the push-tmpxoryqppmp branch 7 times, most recently from 2ab35a0 to 9bcf206, on April 28, 2026 14:25

@meskill force-pushed the push-tmpxoryqppmp branch 8 times, most recently from 45fa833 to 440fadb, on April 29, 2026 20:02

@meskill force-pushed the push-tmpxoryqppmp branch 3 times, most recently from 2cc7d5a to 95dd3d9, on May 6, 2026 08:25
@meskill force-pushed the push-tmpxoryqppmp branch from 95dd3d9 to 4a058fa on May 6, 2026 09:09

@meskill force-pushed the push-tmpxoryqppmp branch from 4a058fa to 0489312 on May 6, 2026 09:26
@meskill force-pushed the push-tmpxoryqppmp branch 4 times, most recently from 6a6afc8 to 5f052e8, on May 7, 2026 11:48
@meskill changed the title from "ci: remove rwx from ci" to "test(resharding): add resharding test" on May 7, 2026
@meskill force-pushed the push-tmpxoryqppmp branch from 5f052e8 to 5c1ef01 on May 7, 2026 13:20
solution 1, it catches the multi-row cases that sequential apply cannot prevent.

---

@levkk (Collaborator) commented May 7, 2026


Solution 3:

Create one async writer task per table and send it rows using mpsc. Then buffer entire transactions in memory and only write them once the whole transaction has been received. Writes will be serialized on a per-table basis, while parallelized across all shards.

Solution 4:

Create one async writer task per destination shard: this way, whatever writes a shard needs to process will always arrive in sequence. Buffer transactions in memory here as well, to avoid contention.

Solution 5:

Keep the architecture the same and just buffer transactions. Don't send "partial requests" to any shard, i.e. don't use Flush, just Sync.

@meskill (Contributor, Author)


I've integrated these proposed solutions into the doc and tried to think through whether they will fit.
I think there are some possible issues with the separate-worker approach that could be hard to solve properly, mostly around transactions, reverting, and also latency, since we basically introduce more locks on top of the pg locks.

I created 3 PRs with different levels of solutions that fix the particular issue with the resharding test (the deadlocks for omni tables), but I'd say they are also controversial.

@levkk (Collaborator)


The one that makes only one shard responsible for each omni table makes the most sense to me. The tables are the same on all shards, so we are effectively writing the same data N times (N = number of original shards). Reducing that from N writes to 1 per table is a completely reasonable approach, I think.

We do something similar for SELECT queries against omni tables: we pick a shard "at random". If we cared that much about consistency, we would look up the data on all shards and compare, but that's too expensive, so we assume omni tables are identical, which is true, and if it's not, something broke.
