solution 1, it catches the multi-row cases that sequential apply cannot prevent.
Solution 3:
Create one async writer task per table and send it rows using mpsc. Then, buffer entire transactions in memory and only write them once you have received all of them. Writes will be serialized on a per-table basis while being parallelized across all shards.
Solution 4:
Create one async writer task per destination shard: this way, whatever writes a shard needs to process will always be in sequence. Also buffer transactions in memory, to make sure you don't have contention.
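Solutions 3 and 4 share the same shape: a set of single-consumer channels whose key (table name in solution 3, destination shard in solution 4) serializes writes within the key while different keys proceed in parallel. A minimal sketch of that pattern, using std threads and `std::sync::mpsc` in place of async tasks (the `Txn` and `WriterPool` types are hypothetical, and the "write" just records rows in order):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

// A buffered transaction: all rows for one key, delivered as a unit.
struct Txn {
    table: String,
    rows: Vec<String>,
}

// One writer per key (table name for solution 3, shard id for solution 4).
// Rows for the same key are applied in channel order; keys run in parallel.
struct WriterPool {
    senders: HashMap<String, Sender<Txn>>,
    handles: Vec<JoinHandle<Vec<String>>>,
}

impl WriterPool {
    fn new(keys: &[&str]) -> Self {
        let mut senders = HashMap::new();
        let mut handles = Vec::new();
        for key in keys {
            let (tx, rx) = mpsc::channel::<Txn>();
            senders.insert(key.to_string(), tx);
            // In the real system this would be an async task writing to
            // Postgres; here the "write" just records the rows in order.
            handles.push(thread::spawn(move || {
                let mut applied = Vec::new();
                for txn in rx {
                    applied.extend(txn.rows);
                }
                applied
            }));
        }
        WriterPool { senders, handles }
    }

    fn send(&self, txn: Txn) {
        let tx = &self.senders[&txn.table];
        tx.send(txn).unwrap();
    }

    // Drop the senders so the writer loops exit, then collect each
    // writer's applied rows (in the same order as the original keys).
    fn shutdown(mut self) -> Vec<Vec<String>> {
        self.senders.clear();
        self.handles.into_iter().map(|h| h.join().unwrap()).collect()
    }
}
```

The interesting property is that two transactions touching the same key can never interleave, because they travel through the same channel, which is what prevents the multi-row deadlocks.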
Solution 5:
Keep the architecture the same and just buffer transactions. Don't send "partial requests" to any shard, i.e. don't use Flush, just Sync.
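Solution 5's buffering can be sketched as a small client-side accumulator: statements are held until COMMIT, so no shard ever observes a partial transaction, and the whole batch can go out followed by a single Sync rather than per-statement Flush messages. The `TxnBuffer` type and its methods below are hypothetical:

```rust
// Hypothetical sketch of solution 5: buffer a transaction and only
// release the complete statement list to the shard at commit time.
#[derive(Default)]
struct TxnBuffer {
    statements: Vec<String>,
}

impl TxnBuffer {
    // Accumulate a statement instead of flushing it to the shard.
    fn push(&mut self, stmt: &str) {
        self.statements.push(stmt.to_string());
    }

    // On COMMIT, drain the buffer: this is the only point where the
    // batch reaches the shard, terminated by one Sync in the extended
    // query protocol instead of a Flush after every statement.
    fn commit(&mut self) -> Vec<String> {
        std::mem::take(&mut self.statements)
    }

    // On ROLLBACK, nothing was ever sent, so aborting costs nothing.
    fn rollback(&mut self) {
        self.statements.clear();
    }
}
```

A nice side effect of this shape is that rollback becomes trivial: since the shard never saw the partial transaction, there is nothing to revert remotely.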
I've integrated these proposed solutions into the doc and tried to think through whether they will fit.
I think there are some possible issues with the separate-worker approach that could be hard to solve properly, mainly around transactions, rollback, and latency, since we basically introduce more locks on top of the Postgres locks.
I created 3 PRs with different levels of solutions that fix the particular issue with the resharding test (the deadlocks for omni tables), but I'd say they are also controversial.
The one that makes a single shard responsible for each omni table makes the most sense to me. The tables are the same on all shards, so we are effectively writing the same data N times (N = number of original shards). So if we reduce that number from N to 1, one writer shard per table, that's a completely reasonable approach, I think.
We do something similar for SELECT queries against omni tables: we pick a shard "at random". If we cared about consistency that much, we would look up the data on all shards and compare, but that's too expensive, so we make the assumption that omni tables are identical, which is true; and if it's not, something is broken.
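One way to make the owner-shard choice deterministic, so that every node independently agrees which single shard handles a given omni table, is to hash the table name. A hypothetical sketch (the real selection logic may differ; `DefaultHasher` is only stable within one process, so a production version would want a fixed hash function):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Pick a single "owner" shard for an omni table by hashing its name.
// Deterministic within a process: the same table always maps to the
// same shard, so writes happen once instead of N times, and reads can
// target the same shard the writes went to.
fn owner_shard(table: &str, num_shards: usize) -> usize {
    let mut h = DefaultHasher::new();
    table.hash(&mut h);
    (h.finish() as usize) % num_shards
}
```

Compared with picking a shard "at random" per query, a deterministic mapping also spreads different omni tables across shards rather than concentrating all omni-table writes on one node.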