Skip to content

Latest commit

 

History

History
213 lines (137 loc) · 15 KB

cap-0046-05.md

File metadata and controls

213 lines (137 loc) · 15 KB

Preamble

CAP: CAP-0046-05 (formerly 0053)
Title: Smart Contract Data
Working Group:
    Owner: Graydon Hoare <@graydon>
    Authors: Graydon Hoare <@graydon>
    Consulted: Siddharth Suresh <@sisuresh>, Jon Jove <@jonjove>, Nicolas Barry <@MonsieurNicolas>, Leigh McCulloch <@leighmcculloch>, Tomer Weller <@tomerweller>
Status: Final
Created: 2022-05-25
Discussion: https://groups.google.com/g/stellar-dev/c/vkzMeM_t7e8
Protocol version: 20

Simple Summary

This CAP defines a ledger entry types for storing data records for smart contracts, as well as host functions for interacting with it and some discussion of its interaction with contract invocation and execution.

Working Group

This protocol change was authored by Graydon Hoare, with input from the consulted individuals mentioned at the top of this document.

Motivation

Most nontrivial smart contracts have persistent state. Earlier smart contract CAPs left this topic undefined, this CAP attempts to fill in the gap.

Goals Alignment

Same goals alignment as CAP-46. This CAP is essentially "a continuation of the work initiated in CAP-46".

Requirements

  • Smart contracts must be able to store state in the ledger that persists between transactions.
  • Contracts should be given as much flexibility as possible in how they organize their data.
  • As much as possible, multiple contracts should be able to execute in parallel.
  • Parallel execution must maintain a strong consistency model: strict serializability.
  • The performance impact of user-initiated IO should be strictly limited, as IO can be very costly.
  • The granularity of IO should balance the desirability of amortizing fixed per-IO costs with the undesirability of IO on redundant data.
  • The ledger space consumed by contract data should be attenuated when possible, especially transient and dormant data.

Additionally, several considerations that applied to the data model of CAP-0046-01 apply here, especially around interoperability and simplicity:

  • At least some data should be readable passively without running contract code.
  • Data should be at least somewhat robust to version changes in the contract code accessing it.

Abstract

A new ledger entry type is added that stores key-value pairs, where both key and val are of type SCVal (defined in CAP-0046-01).

An additional small key-value map called instance storage is also available inside each contract instance ledger entry (defined in CAP-0046-02).

New host functions are added to query and modify these key-value pairs from within a smart contract.

Specification

Readers should be familiar with the content of CAP-0046-01 and CAP-0046-02 at least, this CAP uses their definitions.

Ledger entry and key

This CAP adds an entry type code LedgerEntryType.CONTRACT_DATA, an entry struct ContractDataEntry, and a variant of the LedgerEntry and LedgerKey unions to store the ledger entry and its key material, respectively, under the CONTRACT_DATA type code.

The LedgerKey of a CONTRACT_DATA ledger entry is composed of:

  • A ContractID field (as defined in CAP-0046-02).
  • A ContractDataDurability field (as defined in CAP-0046-TBD data expiry).
  • A key field, an SCVal chosen by the contract.

Each CONTRACT_DATA ledger entry has a unique LedgerKey, but multiple ledger entries may have the same key field within that LedgerKey, so long as the other earlier fields differ. In other words, there is a separate "key-space" for key-value mappings within each ContractID and ContractDataDurability level.

The LedgerKey implied by an access to a key-value pair held in instance storage is different from that of a key-value pair in its own ledger entry. See the section on instance storage.

Instance storage

This CAP adds (or rather, describes access to) an SCMap called storage stored inside each SCContractInstance. This instance storage map contains data that is closely coupled to the lifecycle and read-access pattern of the contract instance itself. See CAP-0046-02 for details on contract instances.

A contract accesses instance storage by key, but this key is not used to form a LedgerKey to access a ledger entry in the storage system. Rather, the entire SCContractInstance (including its entire SCMap storage) is accessed as a single CONTRACT_DATA ledger entry with its LedgerKey keyed by a special SCVal reserved just for this purpose: SCV_LEDGER_KEY_CONTRACT_INSTANCE.

In other words, all keys and values in instance storage are loaded or saved together, and the ledger entry storing them is just the instance entry itself.

Host functions

Host functions are provided to get, put, delete, and check for the existence of a key-value pair.

The same host functions are used to access key-value pairs in CONTRACT_DATA ledger entries or instance storage.

When accessing CONTRACT_DATA ledger entries:

  • The ContractID in the LedgerKey being accessed is implicitly that of the calling contract
  • The host function can only access ledger entries with that ContractID

When accessing instance storage:

  • The instance storage being accessed is implicitly that of the calling contract
  • The host function can only access that instance storage

Host functions for key-value access are also passed a storage type.

StorageType

The storage type provided to a given key-value access host function is, at the host interface level, a plain u64 that encodes the Rust enum StorageType.

This type has 3 cases:

  • StorageType::Temporary = 0 which directs the host function to access a ContractData ledger entry with ContractDataDurability::TEMPORARY.
  • StorageType::Persistent = 1 which directs the host function to access a ContractData ledger entry with ContractDataDurability::PERSISTENT.
  • StorageType::Instance = 2 which directs the host function to access contract-instance storage.

The storage type specifies two separate dimensions of storage:

  • Whether to access a key-value mapping held in instance storage or a separate per-key ledger entry
  • The durability of the key-value mapping to access.

These two separate dimensions are selected with a single value because instance storage is always implicitly PERSISTENT, as it is stored in contract instances which are PERSISTENT.

Restrictions

Point access only

Contract data IO is restricted to so-called "point access" to specific keys. In particular there is no support for "range queries", upper or lower bounds, or any sort of iteration over the keyspace.

Static footprints

To facilitate parallel execution, contract data IO is also restricted to operate on keys that are declared in the so-called footprint of each transaction. The footprint is a set of LedgerKeys each of which is marked as either read-only or read-write. The footprint permits any read of a key within it, or a write of any key within it that is marked as read-write. All other reads and writes are not permitted.

Note that access to instance storage is described, at the level of LedgerKeys and therefore footprints, as an access to the key reserved for identifying instances: SCV_LEDGER_KEY_CONTRACT_INSTANCE.

Any call to a host function to interact with a LedgerKey that is not permitted by the footprint will generally trap. For instance storage, trapping such an invalid access may be deferred until contract exit, when any modified instance storage map is implicitly written back to the ledger.

The footprint of a transaction is static for the duration of the transaction: it is established before transaction execution begins and does not change during execution.

XDR changes

See the XDR diffs in the Soroban overview CAP, specifically those covering new CONTRACT_DATA ledger entries.

Host function additions

;; Stores a value $v_val under the key $k_val with storage type $t_storagetype.
;; Traps if the current footprint does not allow writing to the LedgerKey implied by
;; the write. For instance storage, the footprint check may be deferred to contract
;; completion.
(func $put_contract_data (param $k_val i64) (param $v_val i64) (param $t_storagetype i64) (result i64))

;; Retrieves the value associated with the key $k_val with storage type $t_storagetype.
;; Traps if the current footprint does not allow reading from the LedgerKey implied by
;; the read, or if there is no value associated with $k_val. For instance storage, the
;; footprint check may be deferred to contract completion.
(func $get_contract_data (param $k_val i64) (param $t_storagetype i64) (result i64))

;; Deletes the value associated with the key $k_val with storage type $t_storagetype.
;; Traps if the current footprint does not allow writing to the LedgerKey implied by
;; the delete, or if there is no value associated with $k_val. For instance storage, the
;; footprint check may be deferred to contract completion.
(func $del_contract_data (param $k_val i64) (param $t_storagetype i64) (result i64))

;; Returns a boolean value indicating whether there is a value associated with $k_val
;; with storage type $t_storagetype. Traps if the current footprint does not allow
;; reading from the LedgerKey implied by the query, or if there is no value associated
;; with $k_val. For instance storage, the footprint check may be deferred to contract
;; completion.
(func $has_contract_data (param $k_val i64) (param $t_storagetype) (result i64))

Semantics

The semantics of each host function is described in the associated comments above.

These semantics should be considered in the light of the strict serializability requirement and the understanding that all IO occurs within a transaction. In particular:

  • Each write is visible immediately within the issuing transaction, but not to any other transaction, until the writing transaction commits
  • All reads and writes are observable to transactions as they would be if the transactions executed sequentially in transaction-set application order

The durability of each write is controlled by the archival strategy described in CAP-0046-12 state archival.

Design Rationale

Granularity

Granularity of data elements is a key consideration in storage. Too large and IO is wasted loading and storing redundant data; too small and the fixed space and time overheads associated with storing each data element overwhelm the system. Moreover when parallel execution is included in consideration, the storage granularity becomes the unit of contention, with two contracts constrained to execute serially (or with some mechanism to enforce serializable consistency) when they share access to a single data element and at least one of them performs a write.

Keying contract data by arbitrary SCVal values allows users to choose the granularity of data entering and leaving IO functions: fine-grained data may be stored under very large and specific keys, or coarser-grained data may be stored under smaller prefixes or "group" keys, with inner data structures such as vectors or maps combining together groups of data values. This is an intentional decision to allow contract authors to experiment and find the right balance, rather than deciding a priori on a granularity.

Instance storage

Some data (such as authentication and configuration data) is both small and has a strong implicit connection to a contract instance: it should have the same lifecycle as the instance, and will commonly be accessed (read-only) on every call to the contract. For this sort of data, storage in a separate ledger entry can introduce unwanted performance overhead and, more importantly, potential failure modes (for example when the data expires but the contract instance does not). To simplify such cases and improve performance, instance storage was added.

Note that instance storage should not be used for unbounded key-value data or for data that will be frequently written during execution, as this will introduce artificial contention on the instance storage and defeat concurrent execution.

Static footprint

The requirement that each transaction have a static footprint serves both to limit arbitrary user-initiated IO mid-transaction (i.e. to enable efficient bulk IO only at transaction endpoints) as well as to enable static scheduling of parallel execution.

This limits transactions to those which have static footprints, which at first glance may seem overly restrictive. To make it work in practice, contracts with dynamic footprints need to be run twice, once "offline" (or out of the main stellar-core processing phase, for example on a horizon server with a recent ledger snapshot) and then once again online, as part of normal stellar-core processing.

The first run is executed in a special trial-run or "recording" mode that permits any reads or writes and just observes and records the footprint; but it also does not actually effect any changes to the ledger, running against a (possibly stale) read-only snapshot and discarding all writes at the end of execution. The recorded footprint is then used as the static footprint for the second run, when the transaction is submitted for real execution against the real ledger, in stellar-core. The second execution thereby validates and enforces the footprint, assuming nothing has changed between recording and enforcing. If the true footprint has changed between recording and enforcing, the transaction fails the second run and the user must retry the cycle.

This technique is taken from the "deterministic database" and "conflict-free concurrency control" literature, where footprints are sometimes also called "read-write sets" and footprint recording is sometimes also called "reconnaissance queries".

Protocol Upgrade Transition

Backwards Incompatibilities

There is no backwards incompatibility consideration in this CAP.

Resource Utilization

By restricting IO to pre-declared static footprints, IO costs are fairly limited. The transaction execution lifecycle will perform bulk IO of all ledger entries in the footprint at the beginning of transaction execution, and write back those modified entries only at the end of execution. Calibrating the costs of such IO and reflecting it in fees charged for use remains an open problem to address. This CAP expects to build on the cost model that CAP-46 and CAP-51 will eventually provide.

Security Concerns

The main security risk is unauthorized data-writing, as all data on the blockchain is publicly readable in any case.

The authorization model for writes is narrow and easy to understand: contracts are restricted to only being able to write to data with their contract ID. Further authorization checks are delegated to contracts themselves to manage.

Test Cases

TBD.

Implementation

There is are two work-in-progress branches associated with this CAP: