-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: implement substore support #3131
Conversation
(If someone wants to try running with this idea for a bit, feel free, just be sure to push up WIP changes as you go). |
crates/storage/src/snapshot.rs
Outdated
|
||
let (prefix, config) = self.0.multistore.route_key(prefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the prefix queries, I think we need to do another check: we need to ensure that the iteration doesn't cross a "mount point". I think we can do this by banning queries supplying prefixes that are themselves prefixes of mount points, returning an error instead. (I don't think this is a meaningful restriction, maybe it means we can't iterate over every single key but I think that's OK)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @erwanor for carrying this forward. There are a lot of changes here but they look good to me. There are a lot of new tests. It seems like the smoke test indicates that we can drop this in without problems.
Is there any reason we should not merge this now, and then attempt to remove the penumbra-specific apphash in favor of an IBC substore?
Thanks for the review, really appreciate it. I'll wait for CI to pass before merging. Next steps like adjusting the ICS23 proof api are scoped in this comment. |
## Describe your changes This PR contains a minimal reproduction for a bug in cnidarium's prefix query handling. It also contains a sketch for a fix that we can workshop. The bug was introduced in the original substore implementation PR (#3131). ## Minimal reproduction of the prefix range cache bug. **Context** `cnidarium`, our storage layer, supports prefix storage. This allows users to configure independent storage units, each with their own merkle tree, nonverifiable sidecar, and separate namespace. Routing is done transparently without the user having to worry about the details. **Overview** Prefix queries return tuples of (key, value)s, but instead of returning the full key, they return the substore key. This is a layering violation, and indeed causes a bug in the cache interleaving logic. **Terminology** - a `full_key`: a key that contains a substore prefix, a delimiter, and a substore key. - a `substore_key`: a key with a stripped prefix. **Walkthrough** `StateDelta` index changes using full keys, as it is not aware of the particular substore configuration that it is working against, by design. As part of the cache interleaving logic, the `StateDetla` will try look for new writes or covering deletions. However, since the base prefix implementation returns substore keys, the cache will build an incoherence range and panic (or miss data). ## Checklist before requesting a review - [x] If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason: > Consensus breaking in the sense that the chain won't halt if we hit this.
Towards #3129, this PR implements support for prefixed substores.
A substore is an independent section of storage parametrized over a prefix, and a set of column families. It embeds its own independent merkle tree, flat KV store (nonverifiable storage), and key preimage index.
A substore has three logical components:
penumbra_storage::store::SubstoreConfig
.penumbra_storage::store::SubstoreSnapshot
.penumbra_storage::store::SubstoreStorage
.The default store (aka. "main store") has an empty prefix and hosts both non-prefixed values and substore root hashes stored at their respective prefix. For example, consider this simplified layout:
In addition to non-prefixed keys, the main store hosts two substores: one with prefix
ibc/
and the other with prefixmisc/
. These prefixes are both defining a namespace and are actual keys that map to the root hash of their respective merkle trees.Rather than requiring consumers to track storage namespaces, we perform routing transparently. When querying a key from the JMT, we determine whether the key belongs to a substore, and if so, we fetch the latest known version of that store. This means that state access interfaces do not change.
Versioning is done on a per-substore basis, each batch of writes yielding an increase in version numbers.
Writes are performed atomically, using
rocksdb::WriteBatch
. Substore writes are performed first, so that we can then collect their root hashes and store them at their respective prefix key in the main store.