
test(resharding): add resharding test #924

Draft · meskill wants to merge 3 commits into main from push-tmpxoryqppmp

Conversation

@meskill (Contributor) commented Apr 22, 2026

No description provided.

@codecov (Bot) commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 23.52941% with 52 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...nd/replication/logical/publisher/publisher_impl.rs | 0.00% | 45 Missing ⚠️ |
| ...c/backend/replication/logical/subscriber/stream.rs | 69.56% | 7 Missing ⚠️ |


@meskill force-pushed the push-tmpxoryqppmp branch from 82bd5de to a17559f on April 22, 2026 21:58

@meskill changed the title from "ci: add resharding test to ci" to "ci: remove rwx from ci" on Apr 23, 2026

@meskill force-pushed the push-tmpxoryqppmp branch 7 times, most recently from 2ab35a0 to 9bcf206, on April 28, 2026 14:25

@meskill force-pushed the push-tmpxoryqppmp branch 8 times, most recently from 45fa833 to 440fadb, on April 29, 2026 20:02

@meskill force-pushed the push-tmpxoryqppmp branch 3 times, most recently from 2cc7d5a to 95dd3d9, on May 6, 2026 08:25
@meskill force-pushed the push-tmpxoryqppmp branch from 95dd3d9 to 4a058fa on May 6, 2026 09:09

@meskill force-pushed the push-tmpxoryqppmp branch from 4a058fa to 0489312 on May 6, 2026 09:26
@meskill force-pushed the push-tmpxoryqppmp branch 4 times, most recently from 6a6afc8 to 5f052e8, on May 7, 2026 11:48
@meskill changed the title from "ci: remove rwx from ci" to "test(resharding): add resharding test" on May 7, 2026
@meskill force-pushed the push-tmpxoryqppmp branch from 5f052e8 to 5c1ef01 on May 7, 2026 13:20
solution 1, it catches the multi-row cases that sequential apply cannot prevent.

---

@levkk (Collaborator) commented May 7, 2026


Solution 3:

Create one async writer task per table and send it rows using mpsc. Then buffer entire transactions in memory and only write them once the whole transaction has been received. Writes will be serialized on a per-table basis, while parallelized across all shards.

Solution 4:

Create one async writer task per destination shard: this way, whatever writes a shard needs to process will always arrive in sequence. Buffer transactions in memory here as well, to avoid contention.

Solution 5:

Keep the architecture the same and just buffer transactions. Don't send "partial requests" to any shard, i.e. don't use Flush, just Sync.

@meskill (Contributor, Author)


I've integrated these proposed solutions into the doc and tried to think through whether they will fit.
I think there are some possible issues with the separate-worker approach that could be hard to solve properly, mostly around transactions, reverting, and also latency, since we basically introduce more locks on top of the pg locks.

I created 3 PRs with different levels of solutions that fix the particular issue with the resharding test (the deadlocks for omni tables), but I'd say they are also controversial.

@levkk (Collaborator)


The one that makes only one shard responsible for each omni table makes the most sense to me. The tables are the same on all shards, so we are effectively writing the same data N times (N = number of original shards). Reducing that from N writes to 1 per table is a completely reasonable approach, I think.

We do something similar for SELECT queries against omni tables: we pick a shard "at random". If we cared that much about consistency, we would look up the data on all shards and compare, but that's too expensive, so we assume omni tables are identical, which is true, and if it's not, something broke.
