docs: sharding phase 1 RFC #5432
2652 tests run: 2528 passed, 0 failed, 124 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results
be81eaa at 2024-03-14T11:04:10.157Z
I also wonder if the initial idea with the safekeepers is actually better:
So I'd rather start with "all WAL to all shards" and then go through that list asynchronously, without blocking initial sharding.
I am still not quite sure whether scattering WAL is a good idea. Also, as I already wrote, in principle when we are sending the same stream to multiple consumers we can use a low-level broadcast, which seems to be much more efficient than scattering. I am not sure it is applicable to safekeepers<->PS communication (if only because PS are not synced and may be at different positions in the stream). Concerning decoding: there are some WAL commands which modify pages not specified in pagerefs:
7335475 to 3187af3
Sorry if I missed something, but I could not find an answer in this RFC to one question about key sharding: to which shard should compute send a get_rel_size request?
One more key sharding issue not covered by this RFC. The VM fork is updated either by special WAL records or as part of heap insert/update/delete operations (when the corresponding bit is set). The problem is that the block number of the updated VM fork page is very different from the block number in the main fork specified in the heap_* record. But for FSM/VM fork updates it is not so trivial to do: their buffer tags are not specified in the record's block data. We have to decode the WAL record, check the bit (whether it changes page visibility), then calculate the position of the VM page, check whether it is assigned to this shard and, if so, update it. The task becomes more challenging if we are going to scatter WAL: in that case we would have to look for these VM flags and broadcast the record when they are set.
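To make the block-number mismatch concrete, here is a minimal sketch of the arithmetic involved. The visibility-map constants follow PostgreSQL's `visibilitymap.c` for the default 8 KiB block size (2 bits per heap block, 24-byte page header). The shard placement function `shard_for_block`, its `stripe_size` parameter, and `shards_touched` are hypothetical and exist only for illustration; they are not the mapping defined by this RFC.

```python
# Visibility-map geometry, assuming the default PostgreSQL build settings.
BLCKSZ = 8192                    # default block size in bytes
PAGE_HEADER_SIZE = 24            # MAXALIGN(SizeOfPageHeaderData)
BITS_PER_HEAPBLOCK = 2           # all-visible bit + all-frozen bit
HEAPBLOCKS_PER_BYTE = 8 // BITS_PER_HEAPBLOCK
HEAPBLOCKS_PER_PAGE = (BLCKSZ - PAGE_HEADER_SIZE) * HEAPBLOCKS_PER_BYTE


def vm_block_for_heap_block(heap_blkno: int) -> int:
    """VM fork block that holds the visibility bits of a main-fork block."""
    return heap_blkno // HEAPBLOCKS_PER_PAGE


def shard_for_block(blkno: int, shard_count: int, stripe_size: int) -> int:
    """HYPOTHETICAL placement: contiguous stripes of blocks, round-robin.

    A real mapping would also take the relation and fork into account.
    """
    return (blkno // stripe_size) % shard_count


def shards_touched(heap_blkno: int, changes_visibility: bool,
                   shard_count: int, stripe_size: int) -> set[int]:
    """Shards that need a heap_* record under the hypothetical placement."""
    shards = {shard_for_block(heap_blkno, shard_count, stripe_size)}
    if changes_visibility:
        # The VM page is not listed in the record's block references,
        # so its location has to be derived from the heap block number.
        vm_blkno = vm_block_for_heap_block(heap_blkno)
        shards.add(shard_for_block(vm_blkno, shard_count, stripe_size))
    return shards
```

Under these assumptions, a record for heap block 101000 that changes visibility touches VM block 3, which can land on a different shard than the heap page itself. That is why a shard cannot decide from the block references alone whether a record concerns it.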
6bcb09f to a65e709
I've cleaned this up to cover what we did in Q4 -- will publish a smaller follow-on RFC that describes shard splitting (Q1).
We need to shard our Tenants to support larger databases without those large databases dominating our pageservers and/or requiring dedicated pageservers.
This RFC aims to define an initial capability that will permit creating large-capacity databases using a static configuration defined at time of Tenant creation.
Online re-sharding is deferred as future work, as is offloading layers for historical reads. However, both of these capabilities would be implementable without further changes to the control plane or compute: this RFC aims to define the cross-component work needed to bootstrap sharding end-to-end.