Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: implement substore support #3131

Merged
merged 77 commits into from
Nov 15, 2023
Merged

storage: implement substore support #3131

merged 77 commits into from
Nov 15, 2023

Conversation

hdevalence
Copy link
Member

@hdevalence hdevalence commented Oct 2, 2023

Towards #3129, this PR implements support for prefixed substores.

A substore is an independent section of storage parametrized over a prefix, and a set of column families. It embeds its own independent merkle tree, flat KV store (nonverifiable storage), and key preimage index.

A substore has three logical components:

  • a stateless configuration that specifies its scope and columns: penumbra_storage::store::SubstoreConfig.
  • a read-only view to a specific version of its state: penumbra_storage::store::SubstoreSnapshot.
  • a write-handle to its storage instance: penumbra_storage::store::SubstoreStorage.

The default store (aka. "main store") has an empty prefix and hosts both non-prefixed values and substore root hashes stored at their respective prefix. For example, consider this simplified layout:

/// Main store layout:
/// ┌────────────────┬──────────────────┐
/// │                │    merkle tree   │
/// │ empty prefix   │                  │
/// │                │       Root R0    │
/// └────────────────┴─────────┬────────┘
///                ┌───────────┼───────────┐
///                ▼           ▼           ▼
///   ┌─────┬───────────┐    ┌───┐  ┌─────┬───────────┐
///   │ibc/ │    R1     │    │   │  │misc/│     R2    │
///   └─────┴─────┬─────┘    └┬─┬┘  └─────┴─────┬─────┘
///               │           │ │               │
///          ┌────┼───┐       │ │            ┌──┴──┐
///          │    │   │       ▼ ▼            │     │
///          ▼    ▼   ▼       ▩ ▩            ▼     │
///        ┌───┐  ▩   ▩                      ▩     ▼
///        └┬─┬┘                                 ┌───┐
///         │ │                                  └┬─┬┘
///         │ │                                   │ │
///         ▼ ▼                                   ▼ ▼
///         ▩ ▩                                   ▩ ▩

In addition to non-prefixed keys, the main store hosts two substores: one with prefix ibc/ and the other with prefix misc/. These prefixes are both defining a namespace and are actual keys that map to the root hash of their respective merkle trees.

Rather than requiring consumers to track storage namespaces, we perform routing transparently. When querying a key from the JMT, we determine whether the key belongs to a substore, and if so, we fetch the latest known version of that store. This means that state access interfaces do not change.

Versioning is done on a per-substore basis, each batch of writes yielding an increase in version numbers.

Writes are performed atomically, using rocksdb::WriteBatch. Substore writes are performed first, so that we can then collect their root hashes and store them at their respective prefix key in the main store.

@hdevalence hdevalence temporarily deployed to smoke-test October 2, 2023 03:10 — with GitHub Actions Inactive
@hdevalence
Copy link
Member Author

(If someone wants to try running with this idea for a bit, feel free, just be sure to push up WIP changes as you go).

@erwanor erwanor self-assigned this Oct 3, 2023
@erwanor erwanor temporarily deployed to smoke-test October 4, 2023 18:00 — with GitHub Actions Inactive
@erwanor erwanor temporarily deployed to smoke-test October 4, 2023 19:04 — with GitHub Actions Inactive
@erwanor erwanor temporarily deployed to smoke-test October 4, 2023 20:07 — with GitHub Actions Inactive
@erwanor erwanor temporarily deployed to smoke-test October 4, 2023 21:34 — with GitHub Actions Inactive
@erwanor erwanor temporarily deployed to smoke-test October 4, 2023 22:20 — with GitHub Actions Inactive
@erwanor erwanor marked this pull request as ready for review October 4, 2023 22:21
@erwanor erwanor marked this pull request as draft October 4, 2023 22:23
@erwanor erwanor marked this pull request as ready for review October 4, 2023 22:33
@erwanor erwanor changed the title wip: hack on substore support storage: first pass at implementing substores Oct 4, 2023

let (prefix, config) = self.0.multistore.route_key(prefix);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the prefix queries, I think we need to do another check: we need to ensure that the iteration doesn't cross a "mount point". I think we can do this by banning queries supplying prefixes that are themselves prefixes of mount points, returning an error instead. (I don't think this is a meaningful restriction, maybe it means we can't iterate over every single key but I think that's OK)

@erwanor erwanor temporarily deployed to smoke-test October 5, 2023 14:20 — with GitHub Actions Inactive
Copy link
Member Author

@hdevalence hdevalence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @erwanor for carrying this forward. There are a lot of changes here but they look good to me. There are a lot of new tests. It seems like the smoke test indicates that we can drop this in without problems.

Is there any reason we should not merge this now, and then attempt to remove the penumbra-specific apphash in favor of an IBC substore?

@erwanor
Copy link
Member

erwanor commented Nov 15, 2023

Thanks for the review, really appreciate it. I'll wait for CI to pass before merging. Next steps like adjusting the ICS23 proof api are scoped in this comment.

@erwanor erwanor merged commit 11d3b52 into main Nov 15, 2023
6 checks passed
@erwanor erwanor deleted the storage-substore branch November 15, 2023 21:46
erwanor added a commit that referenced this pull request Jun 24, 2024
## Describe your changes

This PR contains a minimal reproduction for a bug in cnidarium's prefix
query handling. It also contains a sketch for a fix that we can
workshop. The bug was introduced in the original substore implementation
PR (#3131).

## Minimal reproduction of the prefix range cache bug.

**Context**
`cnidarium`, our storage layer, supports prefix storage.
This allows users to configure independent storage units, each with
their own merkle tree, nonverifiable sidecar, and separate namespace.
Routing is done transparently without the user having to worry about
the details.

**Overview**
Prefix queries return tuples of (key, value)s, but instead of
returning the full key, they return the substore key. This is a layering
violation, and indeed causes a bug in the cache interleaving logic.

**Terminology**
- a `full_key`: a key that contains a substore prefix, a delimiter, and
a substore key.
- a `substore_key`: a key with a stripped prefix.

**Walkthrough**
`StateDelta` index changes using full keys, as it is not aware of the
particular substore configuration that it is working against, by design.
As part of the cache interleaving logic, the `StateDetla` will try look
for
new writes or covering deletions. However, since the base prefix
implementation
returns substore keys, the cache will build an incoherence range and
panic (or miss data).

## Checklist before requesting a review

- [x] If this code contains consensus-breaking changes, I have added the
"consensus-breaking" label. Otherwise, I declare my belief that there
are not consensus-breaking changes, for the following reason:

> Consensus breaking in the sense that the chain won't halt if we hit
this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Status: Testnet 64: Titan
Development

Successfully merging this pull request may close these issues.

2 participants