Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC 2.0 #13700

Open
amnn opened this issue Sep 8, 2023 · 26 comments
Open

RPC 2.0 #13700

amnn opened this issue Sep 8, 2023 · 26 comments

Comments

@amnn
Copy link
Contributor

amnn commented Sep 8, 2023

Note

RPC 2.0 has launched, the most up-to-date version of its docs can be found on the Sui Docs.

Motivation

We’re excited to share a proposal re-imagining Sui’s RPCs (front- and back-end), optimizing for:

  • Query Expressiveness. The API faithfully represents Sui’s highly composable and inter-connected object model.
  • Stable Releases. The API offers a stable platform for developers to work against: RPC releases come with a commitment to not introduce breaking interface changes. This does not come at the cost of pace of development: Features continue releasing early and often for the community to try, according to a predictable roadmap.
  • Extensibility. Advanced apps and RPC providers can add additional post-processing pipelines, and RPC endpoints serving data from those pipelines. Custom deployments can serve a subset of the default endpoints, and only post-process and store a subset of the backing data.
  • Performance and Reliability. The RPC service should continue serving reads in times of high transaction activity on the network, and vice versa.

Summary of Changes

The biggest user-facing change is that RPC 2.0 will offer a GraphQL interface, instead of JSON-RPC. GraphQL offers a better fit for Sui’s Object Model, comes with established standards for extensions (federation, schema stitching) and pagination (cursor connections), and a more mature tooling ecosystem including an interactive development environment.

On the back-end, the RPC service and its data-store will be decoupled from fullnodes. Fullnodes’ APIs will be limited to transaction execution and data ingestion for indexers, with all read requests served by a new, stateless RPC service, reading from its own data store. Indexers will consume transaction data from fullnodes in bulk, post-process them and write them to the store.

This redesign also offers an opportunity to address many known pain points with the existing RPCs such as deprecating the unsafe transaction serialization APIs, and providing more efficient query patterns for dynamic fields, among other usability issues reported by users of the current RPC.

Timeline

By end of October 2023, we will release an interactive demo supporting most queries in the schema linked in the GraphQL section to follow. This service is offered as beta software, without SLA, primarily intended for SDK builders to target ahead of a production release. It will not support transaction execution or dry runs and will operate on a static snapshot of the data which will be periodically updated as new features in RPC 2.0 are implemented.

By end of December 2023 the first version of the new RPC will be released as 2024-01.0, at the end of the fourth quarter of 2023. This version of the RPC will support all MVP features in the proposed schema and it will be deployed by Mysten Labs and shared with third party RPC providers for integration into their services. There will be an opportunity for RPC providers to provide feedback on the service architecture and for their customers to provide more feedback on how the API is to use which we will aim to incorporate into future versions of the RPC (released quarterly).

Support for the existing RPC will continue at least until end of Q1 2024, to give time to migrate. Until that time, changes to the existing RPC will be kept to a minimum (barring bug-fixes), to avoid disruption. We will assess whether there is sufficient support for GraphQL in the ecosystem, before we sunset the existing RPC.

Versioning

RPCs will adopt a quarterly release schedule and a versioning scheme of [year]-[month].[patch] (e.g. 2023-01.0, 2023-02.3, etc). Breaking version changes are reserved for new [year]-[month] versions, while patch versions maintain interface backwards compatibility.

This replaces the current scheme which ties RPC version to fullnode/validator version (which can update weekly). Decoupling node and RPC versions allows RPC to evolve at its own pace and differentiates breaking RPC changes from breaking node changes, and even changes to the indexer that processes data for the RPC to read (which will be versioned separately).

Setting Versions

API versions can be supplied as a header, not including the patch version (e.g. X-Sui-RPC-Version: 2023-10). If a header is not supplied, the latest released version is assumed. The response header will include the full version used to respond to the request, including the patch version (e.g. X-Sui-RPC-Version: 2023-10.2).

Deprecation

Each RPC major version will receive 6 months of support for bugfixes. Publicly available Mysten-operated RPC instances will also continue to provide access to an RPC version for 6 months after its initial releaes. Clients that continue to use versions older than 6 months will be automatically routed to the oldest supported version by the public Mysten RPC instances. E.g. clients who continue to use 2023-10.x past the release of 2024-04.y will automatically be served responses by 2024-01.z to limit the number of versions an RPC provider needs to support.

When deprecating an individual feature, care will be taken to initially make changes in a schema-preserving way, and reserve breaking changes for a time when usage of the initial schema has dropped. When deprecations remove fields, subsequent interface changes will avoid re-adding the field with new semantics, to reduce the chances of an unexpected breaking change for a client that is late to update.

GraphQL

A draft of part of the schema follows, giving a flavor of what the new interface will look like. The design leverages GraphQL’s ability to nest entities (e.g. when querying a transaction block, it will be possible to query for the contents of the gas coin). Fields will be nullable by default, to leave flexibility for field deprecations without breaking backwards compatibility. Pagination will be implemented using using Cursor Connections with opaque cursor types:

type Query {
  # Find a transaction block either by its transaction digest or its
  # effects digest.
  transactionBlock(filter: TransactionBlockID!): TransactionBlock
}

# String containing 32B hex-encoded address
scalar SuiAddress

# String representation of an arbitrary width, possibly signed integer
scalar BigInt

# String containing Base64-encoded binary data.
scalar Base64

# ISO-8601 Date and Time
scalar DateTime

# Find a transaction block either by its transaction digest, or its
# effects digest (can't be both, can't be neither).
input TransactionBlockID {
  transactionDigest: String
  effectsDigest: String
}

input EventFilter {
  # ... snip ...
}

type TransactionBlock {
  id: ID!
  digest: String!

  sender: Address
  gasInput: GasInput
  kind: TransactionBlockKind
  signatures: [TransactionSignature]
  effects: TransactionBlockEffects

  expiration: Epoch

  bcs: Base64
}

type TransactionBlockEffects {
  digest: String!
  status: ExecutionStatus!

  errors: String
  transactionBlock: TransactionBlock
  dependencies: [TransactionBlock]

  lamportVersion: BigInt
  gasEffects: GasEffects
  objectReads: [Object]
  objectChanges: [ObjectChange]
  balanceChanges: [BalanceChange]

  epoch: Epoch
  checkpoint: Checkpoint

  eventConnection(
    first: Int,
    after: String,
    last: Int,
    before: String,
    filter: EventFilter,
  ): EventConnection

  bcs: Base64
}

enum ExecutionStatus {
  SUCCESS
  FAILURE
}

type GasInput {
  gasSponsor: Address
  gasPayment: [Object!]

  gasPrice: BigInt
  gasBudget: BigInt
}

type GasEffects {
  gasObject: Coin
  gasSummary: GasCostSummary
}

type GasCostSummary {
  computationCost: BigInt
  storageCost: BigInt
  storageRebate: BigInt
  nonRefundableStorageFee: BigInt
}

type ObjectChange {
  inputState: Object
  outputState: Object

  idCreated: Boolean
  idDeleted: Boolean
}

type BalanceChange {
  owner: Owner
  coinType: MoveType
  amount: BigInt
}

type Coin {
  id: ID!
  balance: BigInt
  asMoveObject: MoveObject!
}

type EventConnection {
  edges: [EventEdge!]!
  pageInfo: PageInfo!
}

type EventEdge {
  cursor: String
  node: Event!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type Address {
  # ... snip ...
}

type Object {
  # ... snip ...
}

type Epoch {
  # ... snip ...
}

type Event {
  # ... snip ...
}

union TransactionBlockKind =
    ConsensusCommitPrologueTransaction
  | GenesisTransaction
  | ChangeEpochTransaction
  | ProgrammableTransactionBlock

type ConsensusCommitPrologueTransaction {
  # ... snip ...
}

type GenesisTransaction {
  # ... snip ...
}

type ChangeEpochTransaction {
  # ... snip ...
}

type ProgrammableTransactionBlock {
  # ... snip ...
}

type TransactionSignature {
  # ... snip ...
}

type MoveObject {
  # ... snip ...
}

type MoveType {
  # ... snip ...
} 

For a more detailed look at the proposed schema, and to follow its development, consult draft_schema.graphql or the snapshot of the schema currently supported by the implementation.

Extensions

The ability to add extensions to the RPC is a common request. Apps may require secondary indices, and RPC providers often provide their own data to augment what the chain provides.

GraphQL offers multiple standards for schema extensions (e.g. Federation, Schema Stitching) and even multiple implementations of those standards (e.g. Apollo Federation, Conductor) that offer the ability to seamlessly serve an extended schema over multiple services (which could be implemented on different stacks).

Mysten Labs will offer a base RPC service implementation that supports the same functionality as the existing indexer. Functionality will be split into logical groups (see below) which can be turned off for a given deployment to reduce CPU and storage requirements. This implementation will be compatible with a schema extension standard (but will not require one to work out-of-the-box).

Logical Functional Groups

  • Core: Reading objects, checkpoints, transactions and events, executing, inspecting and dry-running transactions.
  • Coins: Accessing coin metadata, and per-address coin and balance information.
  • Dynamic Fields: Querying an object’s dynamic fields
  • Subscriptions (Transactions and Events)
  • Packages: Accessing struct and function signatures, tracking package versions and UpgradeCaps, tracking popular packages.
  • System State: Read information about the current epoch (protocol config, committee, reference gas price).
  • Name Server: SuiNS name lookup and reverse lookup.
  • Analytics: Statistics about how the network was running (TPS, top packages, APY, etc).

Interface Changes

Unsafe APIs

The unsafe_ APIs, responsible for serializing transactions to BCS (e.g. unsafe_moveCall, unsafe_paySui, unsafe_publish etc) will be removed in RPC 2.0. SDKs that depend on these APIs for transaction serialization will be offered a native library with a C interface to convert transaction-related data structures between JSON and BCS to replace this functionality.

See Issue #13483 for a detailed proposal for this new library.

Dynamic Fields

The 1 + N query pattern related to dynamic fields was a common complaint with the existing RPC interface: Clients that wanted to access the contents of all dynamic fields for an object needed to issue a query to list all the dynamic fields, followed by N queries to get the contents of each object.

This will be addressed through RPC 2.0’s use of GraphQL, which allows a single query to access an object’s dynamic fields and their contents:

type Object {
  dynamicFieldConnection(
    first: Int,
    after: String,
    last: Int,
    before: String,
  ): DynamicFieldConnection

	type DynamicFieldConnection {
	  edges: [DynamicFieldEdge!]!
	  pageInfo: PageInfo!
	}
	
	type DynamicFieldEdge {
	  cursor: String
	  node: DynamicField!
	}

  type DynamicField {
	  id: ID!
	  name: MoveValue
	  value: DynamicFieldValue
	}
	
	union DynamicFieldValue = MoveObject | MoveValue
}

Using this API, the following query can be used to page through the contents of all dynamic field names and values for a given object:

query {
  object(address: objectId) {
    dynamicFieldConnection {
      pageInfo { hasNextPage, endCursor }
      edges {
        node {
          name,
          value {
            ... on MoveObject { contents { type, data } }
            ... on MoveValue { type, data }
          }
        }
      }
    }
  }
}

Dry Run and Dev Inspect

The existence of both a dryRun and a devInspect API has been a source of confusion, as they offer overlapping functionality. RPC 2.0 will combine the two into a single API to provide the behavior of both without overlap:

type Query {
  dryRunTransactionBlock(
    txBytes: Base64!,
    txMeta: TransactionMetadata,
    skipChecks: Boolean,
  ): DryRunResult
}

input TransactionMetadata {
  sender: SuiAddress
  gasPrice: U64
  gasObjects: [SuiAddress!]
}

type DryRunResult {
  transaction: TransactionBlock
  errors: String

  events: [Event!]
  results: [DryRunEffect!]
}

type DryRunEffect {
  mutatedReferences: [DryRunMutation!]
  returnValues: [DryRunReturn!]
}

type DryRunMutation {
  input: TransactionInput
  type: MoveType
  bcs: Base64
}

type DryRunReturn {
  type: MoveType
  bcs: Base64
}

The combined API can be used to replicate the current functionality of dryRun and devInspect as follows:

query {
  # To dry-run a transaction, pass its `TransactionData` BCS and Base64 encoded
  # as `txBytes`.
  dryRun: dryRunTransactionBlock(txBytes: "...") {
    transaction
    errors
    events
  }

  # To replicate dev inspect, pass a `TransactionKind` BCS and Base64 encoded as
  # `txBytes` and supply the information to turn it into `TransactionData` in
  # `txMeta`.  Pass `skipChecks: true` to bypass the usual consistency checks.
  #
  # All fields in `txMeta` are optional, and will be replaced by sensible defaults
  # if omitted, but `txMeta` itself must be passed in order to treat `txBytes` as
  # a `TransactionKind` rather than `TransactionData`.
  devInspect: dryRunTransactionBlock(
    txBytes: "...",
    txMeta: {
      sender: senderAddress,
      gasPrice: "1000",
    },
    skipChecks: true,
    epoch: 42,
  ) {
    transaction
    errors
    events
    results
  }

  # Gas estimation is currently done using `dryRun` and requires:
  #
  # - Making a call to get the reference gas price,
  # - Creating `TransactionData` with a sentinel `gasObjects` value (an empty
  #   list), the reference gas price, and the `TransactionKind` containing the 
  #   transaction body.
  # - Calling `dryRun` with this `TransactionData` to get an estimate of the gas
  #   cost.
  # - Creating a new `TransactionData` after having selected the appropriate
  #   coins for the transaction.
  #
  # The new API can also perform gas estimation by dry-running such a 
  # `TransactionData`, but also offers a more convenient API that accepts a
  # `TransactionKind`.  With this new API, the gas estimation flow is as follows:
  #
  # - Call `dryRunTransactionBlock` with the `TransactionKind`, to simultaneously
  #   get the reference gas price and the gas cost estimate.
  # - Create a `TransactionData` from the `TransactionKind` after having selected
  #   the appropriate coins for the transaction.
  gasEstimation: dryRunTransactionBlock(
    txBytes: "...", 
    txMeta: { sender: senderAddress }) {
    transaction { 
      gasInput { gasPrice }
      effects { gasEffects { gasSummary } } 
    }
  }
}

Data Formats

The number of input and output formats will be limited to maintain consistency across API surfaces:

  • Digests will use Base58 encoding everywhere.
  • Other binary blobs will use Base64 encoding.
  • IDs (Objects and Addresses) will be normalized to their maximum width with a leading 0x on output (but will be accepted in a truncated form in input).
  • 64-bit and wider integers will be represented as strings to avoid loss of precision bugs when converting to and from double-precision floating points (Doubles will not be accepted directly, and numbers will not be returned as Doubles even if the particular value fits in a Double).

Clients that depend on truncated package IDs in outputs and numbers represented as Doubles will need to migrate to the new data formats while adopting the new API.

Types that are represented using BCS on-chain (such as TransactionBlocks, TransactionEffects and Objects) will offer a consistent API for querying as BCS, to cater to clients that relied on the BCS or Raw output functionality in the existing JSON-RPC.

Data Consistency

Currently, clients that need read-after-write consistency use the WaitForLocationExecution execution type. This guarantees that reads on the fullnode that ran the transaction will be consistent with the writes from that transaction, however:

  • If a service deploys multiple fullnodes behind a load balancer, clients must be pinned to the same fullnode to take advantage of this consistency guarantee.
  • The guarantee does not extend to secondary indices, which may still lag behind.

The new interface will do away with WaitForLocalExecution and provide a blanket consistency guarantee for all its data sources (i.e. all data returned from a single RPC request will be from a consistent snapshot).

To enable this, the RPC’s indexer will need to commit writes as a result of checkpoint post-processing atomically, which increases latency (transactions that are final may take longer to show up in fullnode responses). A Core API will be provided to query which range of checkpoints the service has complete information for:

type Query {
  # Range of checkpoints that the RPC has data available for (for data
  # that can be tied to a particular checkpoint).
  availableRange: AvailableRange!
}

type AvailableRange {
  first: Checkpoint
  last: Checkpoint
}

type Checkpoint {
  id: ID!
  digest: String!
  sequenceNumber: BigInt!

  # ... snip ...
}

Typically, last will be the latest checkpoint in the current epoch (modulo some post-processing latency), and first will be determined by RPC store pruning. All APIs are guaranteed to work above first, some may continue to work when serving data based on checkpoints below it.

Consistency and Cursors

Paginated queries (using the Relay Cursor Connection spec) will also support consistency. For example, suppose Alice has an account, 0xA11CE, with many objects, and we query their objects’ IDs:

query {
  address("0xA11CE") {
    objectConnection {
      edges { node { location } }
      pageInfo { endCursor }
    }
  }
}

After issuing this query, Alice transfers an object, O to Bob, at 0xB0B, so at the latest version of the network, Alice no longer owns an object that they previously did own. However, paginating the original query by successively querying:

query {
  address("0xA11CE") {
    objectConnection(after: endCursor) {
      edges { node { location } }
      pageInfo { endCursor }
    }
  }
}

Will iterate through a set of objects that is guaranteed to include O, regardless of whether the object was transferred before the page containing it was fetched or after. This ensures that queries that run over multiple requests still represent a consistent snapshot of the network.

This feature depends on the RPC service having access to data from historical checkpoints, which may be pruned (e.g. if the historical checkpoint has a sequence number lower than availableRange.last). If the checkpoint is pruned, cursors pointing at data in that checkpoint will become invalidated, causing subsequent queries using that cursor to fail.

The history that is retained in the RPC’s data store is configurable. Publicly available, free RPC services will aim to retain enough history to support queries on cursors that are a couple of hours old, whereas paid services can support older historical queries.

This kind of consistency only applies on a cursor-by-cursor basis, so if we run a similar query for Bob, in a separate request, after the transfer from Alice:

query {
  address("0xB0B") {
    objectConnection {
      edges { node { location } }
      pageInfo { bobsEndCursor: endCursor }
    }
  } 
}

And later paginate both sets of cursors:

query { 
  alice: address("0xA11CE") {
    objectConnection(after: alicesEndCursor) {
      edges { node { location } }
      pageInfo { alicesEndCursor: endCursor }
    }
  }

  bob: address("0xB0B") {
    objectConnection(after: bobsEndCursor) {
      edges { node { location } }
      pageInfo { bobsEndCursor: endCursor }
    }
  }
}

Then object O will appear in both Alice’s object set (paginating through cursors that were initially created before the transfer) and Bob’s object set (paginating through cursors that were created after the transfer), so although both sets are self-consistent, the overall query may not represent a consistent snapshot.

Limits and Metering

The flexibility that GraphQL offers comes with a risk of handling more complex nested queries, which could consume too many resources and result in a bad experience for other RPC users. This will be addressed through limits that are configurable per RPC instance:

  • Limits on GraphQL query nesting depth and structure.
  • Limits on response sizes and DB query complexity (the amount of compute used or pages read for all database queries tied to a particular request).
  • Timeouts on queries

Mysten-operated, publicly available RPC endpoints will be configured with conservative limits (and transaction rate-limiting) to ensure fair use, but other RPC providers are free to adapt the limits they offer.

Service Architecture

This new service architecture is intended to remove the fullnode from the data serving path as well as providing a solution that lends itself more towards scalability.

Data will be ingested from a fullnode to a customizable indexer for processing. From there indexers can do any required processing before sending the data off to various different storage solutions. The image below shows one such indexer storing blob data (raw object contents or raw transactions) in a key/value store while sending relational data to a relational database. From there we can have any number of stateless RPC services running in order to process requests from clients, grabbing data from the requisite data store in order to service each request.

Service Architecture

Some of the motivations for removing the fullnode from the data serving path are as follows:

  • We found that fullnodes that have heavy read load can fall behind leading to serving stale data to clients.
  • In the event that a fullnode has an issue and needs to be taken out of service, it could take an unknown amount of time to spin up another fullnode and have to catch up sufficiently in order to be able to properly service requests. With the above architecture RPC requests could still be served even if fullnodes were having issues.
  • Due to storage limitations a single fullnode wouldn’t be able to store the full history of the chain, leveraging scalable storage solutions lets us service historical RPC requests.
  • GraphQL Federation, makes it possible to add a custom data source and extend the stateless RPC service to also serve that data.

Fullnodes may still expose a limited API, e.g. to submit transactions, query the live object set, etc, for debugging purposes but the bulk of traffic is expected to be served via instances of the RPC service.

Data Ingestion

In order to facilitate third-party custom indexers and data processing pipelines we’re designing and implementing an Indexer Framework with a more efficient data API between a FN and an Indexer.

The framework is built to allow for third-parties to build their own Handlers which contain custom logic to process and index the chain data that they care about. At the time of writing, the trait is as follows:

#[async_trait::async_trait]
trait Handler: Send {
    fn name(&self) -> &str;
    async fn process_checkpoint(&mut self, checkpoint_data: &CheckpointData) -> anyhow::Result<()>;
}

pub struct CheckpointData {
    pub checkpoint_summary: CertifiedCheckpointSummary,
    pub checkpoint_contents: CheckpointContents,
    pub transactions: Vec<CheckpointTransaction>,
}

pub struct CheckpointTransaction {
    /// The input Transaction
    pub transaction: Transaction,
    /// The effects produced by executing this transaction
    pub effects: TransactionEffects,
    /// The events, if any, emitted by this transaciton during execution
    pub events: Option<TransactionEvents>,
    /// The state of all inputs to this transaction as they were prior to execution.
    pub input_objects: Vec<Object>,
    /// The state of all output objects created or mutated by this transaction.
    pub output_objects: Vec<Object>,
}

The latest version of this trait can be found here. Running a custom indexing pipeline involves:

  • Running a fullnode, setting enable_experimental_rest_api config to true in its fullnode.yaml file.
  • Implementing your custom logic by implementing the Handler trait (above)
  • Starting an indexer using the provided Indexer Framework by providing:
    • The URL of the FN to query data from
    • Your custom Handler or Handlers
    • The checkpoint to start from

Further Work

There are some additional known improvements that we want to add to the RPC, but have been reserved for future releases:

Filtering Dynamic Fields by Key Type

On Sui one package can extend another package’s objects using dynamic fields. Objects that are designed to be extended this way can collect a number of dynamic fields of completely unrelated types and applications that extended an object with one set of types may only be interested in querying for dynamic fields with those types. Augmenting the dynamic field querying API with filters on the types of dynamic fields will allow applications to achieve this without over-fetching dynamic fields and filtering on the client side:

type Object {
  dynamicFieldConnection(
    first: Int,
    after: String,
    last: Int,
    before: String,
    filter: DynamicFieldFilter,
  ): DynamicFieldConnection
}

input DynamicFieldFilter {
  # Cascading (type requires module requires package)
  namePackage: SuiAddress
  nameModule: String
  nameType: String

  # Cascading (type requires module requires package)
  valuePackage: SuiAddress
  valueModule: String
  valueType: String
}

Wrapped Object Indexing

Wrapped objects (objects that are held as fields or dynamic fields of other objects in the store) present similarly to deleted objects in RPC output, and consequently in Explorer too. This causes confusion, when an object is available but not by querying its on-chain address.

Wrapped object indexing tracks the location of objects that are embedded within other objects so that RPC can “find” an object’s contents even when it is wrapped. Similarly, it can be used to detect Bags and Tables to improve their representation in Explorer as well.

Reinterpreting Package IDs

If an upgraded package includes a new type, that type’s package ID will match the upgraded package’s, but types in the same package that were introduced in previous versions will retain their package IDs.

This complicates reads with filters on type: Constructing such filters requires clients to keep track of the package that introduced each type. This can be simplified using an implicit cast: A type can be supplied as 0xA::m::T and will be cast to 0xD::m::T where 0xA and 0xD are versions of the same package, with 0xD being the greatest version less than or equal to 0xA to introduce the type m::T.

This process saves significant book-keeping for clients who can now refer to all the types in a particular package by that package’s ID, and not by the IDs of the packages that introduced them.

Improvements to Dry Run

  • Being able to view a stacktrace from Move when a transaction fails during dry-run would improve transaction debuggability.
  • Provide a detailed breakdown of where gas was spent (per-transaction breakdown of computation cost for gas, per-object breakdown of storage costs and rebates, effects of bucketing).
  • Offer Display output on objects in Dry Run.

Package Upgrade History

Currently, it can be difficult to track all the versions of a package, as each version has its own Object ID. This situation can be improved with dedicated APIs for fetching all versions of a specific package.

Verifiable Reads from Fullnode

Although the proposed architecture decouples the RPC service, indexer and storage layer from fullnodes, an RPC provider is still currently required to ingest data only from a fullnode that they trust (which often means RPC providers run their own fullnode).

A trustless interface between indexers and fullnode (where the indexer could verify the integrity of data it reads from any given fullnode) will remove this requirement, as it eliminates the risk that a node run by an adversary could “lie” to an indexer for its own benefit.

API for Exposing RPC Limits

Feature request from @FrankC01 for pysui.

Some of the validation steps that the RPC performs on transactions can be replicated on the client, by SDKs, to avoid sending requests that are guaranteed to fail. Not all validation can be moved to the client (for example, it’s difficult to predict timeouts, or estimated query complexity), but this does not diminish the value of avoiding hitting other limits ahead of time.

Facilitating this feature in SDKs requires exposing information about limits in its own API. The core RPC implementation will include the following parameters:

type Query {
  serviceConfig: ServiceConfig
}

type ServiceConfig {
  # Maximum level of nesting in a valid GraphQL query
  maxQueryDepth: Int

  # Maximum GraphQL requests that can be made per second from this service
  maxRequestPerSecond: Int

  # Maximum number of responses per page in paginated queries
  maxPageSize: Int
}

Node providers are free to extend this with their own limits, to help SDKs avoid hitting their domain-specific limits.

@sheldonkreger
Copy link

sheldonkreger commented Sep 8, 2023

Thank for sharing this major announcement. You may not be aware, but my team has implemented an indexer for Sui Object data. It also has a basic GraphQL interface. We were awarded a Sui Foundation grant and have been working on it for several months. Your project is much more ambitious, but it would be great to contribute. We really ought to have a call together and make a plan to coordinate efforts. Thank you.
https://github.com/capsule-craft/huracan
https://www.youtube.com/watch?v=tWLGOlvA9mk

amnn added a commit that referenced this issue Sep 12, 2023
Introduce config for functional groups to enable and disable groups of
features in the schema, so that different operators can choose to run
the RPC service with different sets of features enabled (for example
running public nodes without analytics).

The precise set of functional groups may shift -- the current list is
derived from the list in the RFC (#13700).

This PR only introduces the config, and not the logic in the schema to
listen to the functional group config.  The config is read from a TOML
file whose path is passed as a command-line parameter.

This PR also brings the `ServerConfig` (renamed to `ConnectionConfig`)
into the same module, and applies a common pattern to ensure there's a
single source of truth for default values (the `Default` impl for the
config structs).

Stack:

- #13745

Test Plan:

New unit tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-rpc$ cargo run
```
amnn added a commit that referenced this issue Sep 12, 2023
Introduce config for functional groups to enable and disable groups of
features in the schema, so that different operators can choose to run
the RPC service with different sets of features enabled (for example
running public nodes without analytics).

The precise set of functional groups may shift -- the current list is
derived from the list in the RFC (#13700).

This PR only introduces the config, and not the logic in the schema to
listen to the functional group config.  The config is read from a TOML
file whose path is passed as a command-line parameter.  It is stored
as data in the schema's context so that it can later be accessed by
whatever system limits access to endpoints by flags.

A stub "experiments" section has also been added as a place to keep
ad-hoc experimental flags (to gate in-progress features).

This PR also brings the `ServerConfig` (renamed to `ConnectionConfig`)
into the same module, and applies a common pattern to ensure there's a
single source of truth for default values (the `Default` impl for the
config structs).

Stack:

- #13745

Test Plan:

New unit tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-rpc$ cargo run
```
amnn added a commit that referenced this issue Sep 13, 2023
Introduce config for functional groups to enable and disable groups of
features in the schema, so that different operators can choose to run
the RPC service with different sets of features enabled (for example
running public nodes without analytics).

The precise set of functional groups may shift -- the current list is
derived from the list in the RFC (#13700).

This PR only introduces the config, and not the logic in the schema to
listen to the functional group config.  The config is read from a TOML
file whose path is passed as a command-line parameter.  It is stored
as data in the schema's context so that it can later be accessed by
whatever system limits access to endpoints by flags.

A stub "experiments" section has also been added as a place to keep
ad-hoc experimental flags (to gate in-progress features).

This PR also brings the `ServerConfig` (renamed to `ConnectionConfig`)
into the same module, and applies a common pattern to ensure there's a
single source of truth for default values (the `Default` impl for the
config structs).

Finally, `max_query_depth` is moved from `ConnectionConfig` to
`ServiceConfig` as it's a config that we would want to share between
multiple instances of the RPC service in the fleet.

Stack:

- #13745

Test Plan:

New unit tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-rpc$ cargo run
```
amnn added a commit that referenced this issue Sep 13, 2023
Introduce config for functional groups to enable and disable groups of
features in the schema, so that different operators can choose to run
the RPC service with different sets of features enabled (for example
running public nodes without analytics).

The precise set of functional groups may shift -- the current list is
derived from the list in the RFC (#13700).

This PR only introduces the config, and not the logic in the schema to
listen to the functional group config.  The config is read from a TOML
file whose path is passed as a command-line parameter.  It is stored
as data in the schema's context so that it can later be accessed by
whatever system limits access to endpoints by flags.

A stub "experiments" section has also been added as a place to keep
ad-hoc experimental flags (to gate in-progress features).

This PR also brings the `ServerConfig` (renamed to `ConnectionConfig`)
into the same module, and applies a common pattern to ensure there's a
single source of truth for default values (the `Default` impl for the
config structs).

Finally, `max_query_depth` is moved from `ConnectionConfig` to
`ServiceConfig` as it's a config that we would want to share between
multiple instances of the RPC service in the fleet.

Stack:

- #13745

Test Plan:

New unit tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-rpc$ cargo run
```
amnn added a commit that referenced this issue Sep 13, 2023
## Description

Introduce config for functional groups to enable and disable groups of
features in the schema, so that different operators can choose to run
the RPC service with different sets of features enabled (for example
running public nodes without analytics).

The precise set of functional groups may shift -- the current list is
derived from the list in the RFC (#13700).

This PR only introduces the config, and not the logic in the schema to
listen to the functional group config. The config is read from a TOML
file whose path is passed as a command-line parameter. It is stored as
data in the schema's context so that it can later be accessed by
whatever system limits access to endpoints by flags.

A stub "experiments" section has also been added as a place to keep
ad-hoc experimental flags (to gate in-progress features).

This PR also brings the `ServerConfig` (renamed to `ConnectionConfig`)
into the same module, and applies a common pattern to ensure there's a
single source of truth for default values (the `Default` impl for the
config structs).


Finally, `max_query_depth` is moved from `ConnectionConfig` to
`ServiceConfig` as it's a config that we would want to share between
multiple instances of the RPC service in the fleet.

## Stack

- #13745

## Test Plan

New unit tests:

```
sui-graphql-rpc$ cargo nextest run
sui-graphql-rpc$ cargo run
```
@sheldonkreger
Copy link

Sui RPC 2.0 Data Platform Architecture

Note

This post is not meant to be offensive or personal - it is strictly my technical opinion. I value everybody's work and perspective, but I'm going to call out some bad design decisions as I see them currently. I'm in this for the tech, I believe Sui is the best blockchain tech today. I'm really trying to contribute to keep Sui at the very top! Also, Please keep in mind that I have not been included in any discussions up to this point, so I don't have full context about the current plan at Mysten. Additionally, I am going to follow up with another post about Postgres, GraphQL, and the analytical indexer.

Intro

The Sui RPC 2.0 proposal offers the opportunity to deliver a flexible, high performance, and cost-effective solution for engineers working with Sui data. Because developers must either issue RPC calls or manually scrape Sui data checkpoint-by-checkpoint, the community is eager for a new approach, which gives them the data they need in a timely manner.

On the other hand, Sui fullnode operators currently benefit from the simplicity of the existing architecture. Each Sui fullnode can serve all RPC requests independently, which minimizes the infrastructure complexity. This also affords small teams the capability to run their own fullnodes with minimal overhead - a single node or a few nodes behind a load balancer can be quickly configured without any expertise in distributed data platforms. As the fullnode infra grows in complexity, there will be winners and losers in varying contexts.

Having spent many years as a software engineer, many years as a site reliability engineer, and the last few years as an architect, I have dealt with data systems from every angle. I've felt the pain of slow API responses, rigid query patterns, inadequate documentation, and faulty payloads. I've been on-call for large, distributed databases after software engineers deploy unoptimized queries, bringing clusters to their knees and pulling me out of bed, late at night. I've also been at companies full of brilliant people, all of whom hate each other, because hastily designed data pipelines are so poorly implemented that nobody can actually identify the true values of specific data points.

Section 1: What is An Indexer?

Coming from a "big data" engineering background, the term "indexer" was something I only heard once I joined a blockchain team. It is a reflection of the chronic lack of expertise in architecture and data pipeline management in the blockchain industry. When blockchain engineers say "indexer" what they are really discussing is a system with several components. It is important to speak precisely about data management, because each of these components serves a different purpose, and different technologies are suited for each challenge.

The Anatomy of a Change Data Capture Pipeline

In data engineering, "change data capture" systems are implemented to synchronize data modifications from one data store to another. On Sui, this means synchronizing the "live" data stored in RocksDB - normally served via RPC - into a different datastore. This secondary datastore is only useful insofar as it improves one or more aspects of data access patterns - faster query times, lower compute costs, improved data models, etc.

Real-time CDC pipelines synchronize the "live state" of the upstream datastore into a secondary database. In fact, several CDC pipelines may operate independently to feed different data - or unique representations of data - into several secondary databases. The secondary databases are used to APIs, dashboards, data science operations, etc. Depending on the query patterns required by these applications, different databases and models are chosen by an architect to achieve a specific outcome.

To ensure the accuracy, timely delivery, and resiliency of data in the downstream databases, data engineers often utilize event streaming systems like Apache Pulsar, Kafka, or RabbitMQ. This decouples the data production system from the data consumption system, which ensures that data is never lost during deployments or unforseen outages. It also allows multiple software teams to consume the same data independently, model and enrich it as needed, and write to a database optimized for their application.

In summary, a CDC pipeline involves an upstream datastore (source), a downstream datastore (sink), an event streaming platform, an app which publishes data updates from the source to the event streaming platform (producer), and applications to perform transformations and enrichment, then write to the sink (consumers).

Section 2: Strengths and Weaknesses of the New RPC 2.0 Sui Indexer Architecture

The current Sui core code underway for the RPC 2.0 project includes two "indexers" which do not meet the criteria of a CDC pipeline: sui-analytics-indexer which is intended for analytical data, and sui-indexer which is intended to serve the GraphQL API. There will, presumably, be additional "indexers" added via the hook system (discussed below).

Benefit 1: Shared Types

One of the strengths of the current design is that each component shares the same fundamental data types, laid out in the sui-types package. This means that, in a CDC pipeline, the data producers and consumers will benefit from consistent data models, as long as they rely on the same version for sui-types package.

Benefit 2: Decoupled Database

Moving the database onto its own host(s) allows for predictable write performance on the core RocksDB datastore, helping keep the fullnode in sync. Depending on the database technology, it also allows for vertical and/or horizontal scaling to handle additional query volume, independently of the requirements of the core Sui fullnode / RPC server. This separation also facilitates operational integrity, but only in a carefully designed architecture.

Benefit 3: The Hook System: sui_indexer::framework::interface::Handler;

One of the largest changes in RPC 2.0 is the hook system. Developers may implement a Handler interface via the sui-indexer crate. On the surface, this gives developers access to Sui data in real-time, giving an easy way to perform ETL. However, upon closer investigation, there are a number of problems with this design, as well.

Problem 1: Data Accessibility

Developers working with this handler must deploy their own fullnode in order to access this data. As the fullnode requirements grow, this will cause a larger and larger barrier-to-entry for teams working with Sui data

Problem 2: Producer Data Resiliency

If a developer must run a fullnode to access the data via a handler, what happens if that fullnode crashes? Or, even if there is a scheduled upgrade on the node? The developer must write code to account for missed events, writing complex "backfill" code and tracking which data has been ingested previously. In other words, the developer still needs to implement checkpoint crawling code, which leaves no benefit to using the handler. This is a very complex and finicky problem which many teams will simply brush over and accept some amount of data loss.

Problem 3: Fullnode desynchronization

In the original GitHub announcement, one of the problems mentioned by Mysten engineers is that "We found that fullnodes that have heavy read load can fall behind leading to serving stale data to clients." This is because the underlying Rust code operating on RocksDB often locks data during read operations. In the current implementation, each handler initialized will also initialize a CheckpointFetcher which hammers the RPC server with checkpoint read requests on a configurable interval. (https://github.com/MystenLabs/sui/blob/main/crates/sui-indexer/src/framework/fetcher.rs#L33) Therefore, adding more handlers could eventually cause a fullnode to desynchronize with the network - especially if the handler needs to invoke several RPC calls or Sui core invokations to fetch detailed transaction or object data (or other reads) on each checkpoint. For example: https://github.com/MystenLabs/sui/blob/main/crates/sui-analytics-indexer/src/analytics_handler.rs

Problem 4: Coupled Read and Write Code

The reason a developer will implement a Handler is to capture Sui data and write it to another data store. If the write operation is performed inside the handler, what happens if there is a write error? Even well-written error handling will not stop massive data loss during peak Sui network times. Furthermore, these errors will be logged on the fullnode, requiring the operations engineers to manually detect these errors in these logs. The only way to avoid data loss when the reader/writer code is coupled is to manually track which checkpoints have been successfully processed - which again, is a major complication for the developer and undermines the benefit of the handler over RPC-based checkpoint scraping.

Section 3: Recommended Architecture - Event Streaming

Rather than forcing developers to implement their own Handler, checkpoint validation, error handling, and forcing them to run their own fullnode, Sui data should simply be published to an event streaming system.

The sui-indexer crate should implement a single Handler per type in sui-types - or, at least for the key types such as coin, object, transaction, etc. When a checkpoint is processed by the handler, the data should be fully enriched with all field-level data, then published to the event stream. The event stream will have two topics per type - one for a JSON representation, and one for a compressed format such as protobuf or flatbuffer.

The fullnode.yml configuration file should be updated to determine which types will be published, and whether to include JSON, compressed, or both for each type.

Developers may then consume the events from the topics relevant to their application, writing them to a downstream database of their choice. This is true for engineers at Mysten working with Sui core, as well as for teams indexing Sui independently.

Tech Stack - Apache Pulsar as an Example

Although there are many great choices, I will use Apache Puslar to exemplify the benefits of an event streaming architecture, in general. The important thing here isn't Pulsar - the point here is to see how an event streaming platform will resolve all of the problems discussed in Section 2.

Solution 1: Data Accessibility

RPC providers can issue API keys for Pulsar topics and monitor resource consumption for each client (built-in feature). Developers never need to run fullnode infra, they simply consume messages from Pulsar.

Solution 2: Producer Data Resiliency

Since there is only one Handler producing data, this core code can be carefully crafted by the Mysten team to ensure high resiliency. Pulsar may be configured to write data to several nodes before sending a 200 response back to the producer. Therefore, Pulsar topic data is resilient to losing one or more of its data nodes (Bookies).

In addition, Apache Pulsar can (optionally) be configured to enforce data types on a per-topic basis. In this design, any producer which attempts to publish a malformed data point will be rejected. Administrators can monitor for rejected data points and trigger an alert. This ensures consumers will never recieve malformed payloads.

Solution 3: Fullnode Desynchronization

Since there is only one Handler producing data, this core code can be carefully crafted by the Mysten team to ensure minimal read volume. Stress-testing can be done exactly once per release cycle.

Solution 4: Coupled Read and Write Code

Because the Handler on the fullnode is fully decoupled from Pulsar and the consumer code, any errors in any given consumer have no impact on the flow of data to other consumers. When a consumer has an error, they can attempt to reprocess the message as many times as needed. Pulsar optionally allows an ACK/NACK protocol, where messages must be ACKED by consumers before they are deleted from the topic. All unacked messages are persisted indefinitely in Pulsar, so even a long-standing outage requiring manual intervention in the consumer code will not incur data loss.

Developers may implement independent consumer groups for each data sink, with no additional stress on the fullnode. For example, if a team needs to publish Sui objects to multiple databases, they would implement one consumer group per database, and they would all read from the same Pulsar topic. The producer would be unaffected, regardless of how many consumer groups are added over time.

Additional Benefit 1: Developer Freedom

Because data is published to Pulsar as JSON, consumers may be written in any programming language. Developers do not need to write Rust to work with Sui data. For teams seeking to minimize latency, the compressed topic data could be used - depending on the format, various programming languages are supported.

Additional Benefit 2: Operational Excellence

Consider a scenario where Sui core is being updated, and the data producer (Sui Fullnode) must be restarted. This will cause an interruption in data flow on the producer side.

Teams concerned with high availability may operate two or more fullnodes, each publishing identical data to Pulsar. The Pulsar topics may be configured to reject duplicate data, meaning consumers never have to handle deduplication. During the upgrade, one fullnode is restarted while the other continues to run. Once the node is back online, the next node is restarted. Data flow to Pulsar is never interrupted.

Additional Benefit 3: Integration

Pulsar topic semantics are fully compliant with Kafka standards, meaning that any Kafka client may consume data from a Pulsar topic.

In addition, Pulsar has several built-in data sinks, and integration with Apache Flink for advanced data pipelines. Countless cloud-based datastores integrate with Kafka and Pulsar as point-and-click data sinks.

Next Steps

Later this week, I will make an additional post discussing the Postgres data sink, GraphQL server, and analytical indexer.

@longbowlu
Copy link
Contributor

longbowlu commented Sep 19, 2023

@sheldonkreger this is really really good feedback, thanks for thinking about them and taking the time to share your insights. I'd like to share some of my thoughts. First I want to acknowledge that despite of a lot of my time spent in this topic at mysten recently, I certainly have blind spots here and there throughout the entire thought process. I'd also use "I" instead of "we" below to avoid over-representing my teammates although I believe most of them are agreed upon.

The Personas

When I initially thought about indexing blockchain data, I asked: who are we solving the problems for? I categorize them into two types:

  1. professional RPC providers. It's part of their bread and butter to ETL the data, store somewhere and serve their customers. Their customers may be end users such as analysts (a) who write queries to trace defi TVL changes, or could be a NFT team/builder (b) that leverages the data to improve user experiences, track market performance etc.
  2. teams/builders. These teams have appetite to run their own indexers (and even fullnodes), perhaps because they have the expertise so it's easy, or because they value decentralization, or because they need highly customized data processing that no professional RPC providers could offer cheaply.

The Values

There are a few values that I honor strongly in my thought train:

  1. decentralization: I have to admit that separating RPC serving from fullnode does run the risk of reducing the number of fullnodes, possibly because most builders will choose to run indexer only, and use an paid fullnode offered by the professional RPC providers. But it's a change I'm afraid we have to take - sui produces tons of data, because of its high throughput and object model. We don't want to go with the route to ask for higher and higher hardware requirement to run a node. So data pruning + offload the data serving to another server is our decision. That said, as I mentioned above, many people do value decentralization a lot, and if they operate business over critical data, they have the incentives to run and only trust their own nodes.
  2. community-driven: while mystenlabs proposes RPC 2.0 and writes the sui-indexer code, it by no means says sui-indexer is the only indexer/data server people can use. On the contrary, I'd love to see various implementations from other builders, extensions on top of sui-indexer or even different specs. I believe this is well understandable. Will touch on this again later.
  3. usability & customizability: As the personas lay out above, we want to offer the best usability for each group, hopefully by giving them multiple options to make the trade-off. We also look to provide highly customized/flexible solutions to meet as many requirements as possible. As of today, we still have a long way to go on this item, RPC 2.0 is a small and big step towards it.

The High Level Architecture

Now I want to share my thoughts on what the indexer should look like, why and how it differs from your recommendation. Note we will only talk about write path (indexing) here and omit the read path, as the former is the focus in your proposal.

(a.) Sui-indexer

The graph below shows the diagram of running a sui-inexer. Sui-indexer gets checkpoint data from fullnodes via REST APIs. There are a few Handlers in sui-indexer that process the data by different logic, collectively generating "indexed data", including Transactions, Events, Objects, Balance Changes, so on and so forth. Then sui-indexer stores them somewhere for reads and future processing.
Who will run sui-indexer? Professional RPC operators will, builders may.
image

(b.) custom indexer

The graph below shows a custom indexer.

image

This custom indexer could be written in any language (sui-analytic-indexer is an example of custom indexer, which happens to be written in Rust). Essentially it reads checkpoint data from fullnodes, using the same rest apis, runs their own customized logic and produces the indexed code. The logic can be as niche as possible, e.g. tracking whale wallets, monitoring NFT trading activities or defi TVL changes. There are two things I'd like to stress here:

  1. You mentioned "Handler" a few times. But as you can probably tell know, the handler is a minor thing in RPC 2.0. It's only relevant for people who write additional indexing logic in Rust. A custom indexer is not bounded or forced to implement it if they don't like to.

  2. We talked about highly customized data. There is no way to implement every possible logic in one single place, which means builders who need access to this kind of data have to build something on their own, if they don't want a deal with professional providers. IMO the same applies to the message/streaming approach in your proposal.

(c.) streaming/messaging (@sheldonkreger 's recommendation)

To make sure I fully understand your suggestion, I drew this:
image

Here is a fullnode, and an indexer that processes all transactions, outputs the indexed data and pushes them into a queue. The users will subscribe to interesting topic to consume the data. If I understand correctly, the fullnode and indexer is run by professional node operators so the builders could focus on using the data directly.

There are two highlighted points on my mind:

  1. as mentioned above, it's impossible to have an "all-in-one" handler that produces every piece of data people want. So "Since there is only one Handler producing data, this core code can be carefully crafted by the Mysten team to ensure minimal read volume. " is not exactly true.
  2. streaming is great in many cases but not in all cases. For example, streaming historical data could be tricky for users who want to access older information. Also having a message queue as the only end user facing system IMHO too restrictive compared to having a list of options including using RPC calls to get the data or bulk downloading the data from a data lake (yes, there is a data lake plan that we forgot to mention in RPC 2.0).

Does it mean I don't like message queue? Absolutely not, on the contrary again, I believe streaming offers arguably the best usability in many scenarios. Message queue definitely has a role to play in the blueprint: here comes another graph:

(d.) from a RPC Provider's point of view

image

Here's the north star (end game) we envisioned, consisting of several elements we discussed above, including message queue. A buidler/user have the flexibility to choose different data sources according to what they are looking for. Resync? use data lake. Keeping up to the network? use message queue. Looking for some niche data? maybe they can find it in a certain storage.

As I'm writing here, my realization is we didn't explain our end game plan as clearly as possible, and missed a few important puzzles here, including the data lake story. There is another route that we will go, which is a plugin system inside fullnode that allows you to operate the data (likely passing them to another server for heavylifting processing). This looks like Solana's Geysey if you are familiar with that.

All in all, we should provide more details, likely in a separate post to avoid dilution.

Answering Some Questions

I hope it's a bit more clarified at this point, now allow me to discuss some detailed comments in your proposal:

Developers working with this handler must deploy their own fullnode in order to access this data

Not necessarily, the developers could use a third party fullnodes if they can high tolerance of data integrity & centralization. And i believe it's the same group of people who will directly consume data from a streaming service or any data sources that drawn in graph (d).

what happens if that fullnode crashes? Or, even if there is a scheduled upgrade on the node? The developer must write code to account for missed events, writing complex "backfill" code and tracking which data has been ingested previously.

This is true with all data sources, if I'm a serious builder. Because even the data source itself (say, pulsar) guarantees exactly once delivery, I don't want to blindly trust the data producer hasn't missed any data at all. So, a watermark will be required, the exact definition would needed to be agreed upon separately.

Also, if one builds on top of sui-indexer, we guarantee no data would be missed by using strict watermark checking. This is handled in the shared code path so the builder does not have to. This is one of the place that takes care of it. (Please don't be scared by the huge PR, we're splitting them into a stack of smaller ones)

This is because the underlying Rust code operating on RocksDB often locks data during read operations.

It's very true but is less relevant in this case. Previously fullnode was heavily swamped by certain types of queries because of the rocksdb performance issue. It has been better after some major improvements but still not doing the best. However, this is mostly due to fullnode directly serving the data traffic, meaning that any queries will hit fullnode and cause the embedded rocksdb to suspend in the unlucky case. This is the exact thing we will avoid in RPC 2.0, that is separating the read traffic from fullnodes, into somewhere that read is more efficient, such as PostgresDB and many other alternatives.

You may wonder if the fetcher introduced here could add the burden for fullnode rocksdb? It's possible but I won't worry too much. Here is why:

  1. most of the time, fetcher would only ask for the latest data, which is very possible still in memtable. Furthermore, We can add heavy cache in front, to avoid disk lookup.
  2. as mentioned before, fullnode is destined to prune data to keep size manageable. So the historical data will be served by data lake (today it's dynamoDB).

Then who serves read quest? As written in RPC 2.0, the graphql server will.

Problem 4: Coupled Read and Write Code

I think this is somewhat implementation details, but usually we will recommend builders to separate the indexing code and commit code. For example, take this huge draft as an example again, here we have a fetcher task that fetches data from somewhere (e.g. fullnode), an indexer task that receives the raw data and indices it, and finally a committer task that writes the indexed data to storage, or to message queue for downstream subscribers. Each tasks would have their own error handling to make sure no data loss. If you are interested, I'm happy to dive deeper into how this is achieved.

Lastly I like many ideas you have around streaming (I'm more familiar with Kafka v.s. Pulsar) and configuration.

Looking forward to your thoughts on Postgres, GraphQL and others!

@gegaowp
Copy link
Contributor

gegaowp commented Sep 19, 2023

thanks @sheldonkreger for the detailed feedback!

Coupled Read and Write Code

the current indexer codes indeed has both writing and reading codes in the same trait, and I think we will want to separate them indeed. It's worth noting that in today's deployment, writer and readers are separately deployed to isolate the concerns.

message queues like Pulsar

a sweet-spot use case of MQ is like: a builder wants to read a subset of nearly fresh data, does ETL on the fly and renders sth on the frontend. As Lu mentioned above, 1) MQ is not great for bulk fetching, like fetching all first 1000 checkpoint data since genesis; 2) if the builder has to persist custom historical states, keeping the watermark is a minimum extension imo to ensure data integrity.

with this in mind, I am thinking of message queue as a plug-in of "data providers" including fullnodes and indexers, where these data providers can be configured to open up queues for various topics when the sweet-spot use cases show up; Mysten can do a Geyser like configurable plug-in on fullnode, and maybe a pub-sub interface on indexer as well.

Happy to chat more in the other proposal about the architecture!

@sblackshear
Copy link
Collaborator

@sheldonkreger, thanks so much for the thoughtful, balanced, and educational feedback. We are absorbing all of this.

@sheldonkreger
Copy link

Hello everybody - I haven't yet read through your comments, but I wanted to post my thoughts about the Postgres indexer and a few other things, since I've finished my review of that code. I will take a look and reply sometime this week. HUGE THANK YOU!

Note

This post touches the issues I see with the existing code base, the Postgres-backed "indexer", and briefly discusses a different solution for the GraphQL datastore. I belive that, as the project matures, more teams will need unique data representations, which will be best served by various database technologies, such as columnar, time series, graph, etc. But, a document database would be a good starting point, if GraphQL is the top priority.

As a reminder, I do not have full context of what work is planned, and some of the issues I bring up are probably well understood and being addressed later.

The Postgres Indexer

The fundamental thesis of this document is that the Postgres-backed indexer is unsuitable for production workloads for both current state and historical data.

Data Model

The sui-indexer crate has a new directory for migrations_v2, which shows the working schema. Although the schema inculdes several Sui types written to unique tables, such as event, packages, etc, we will focus on objects since they are the most numerous and highly queried entity. However, many of these tables suffer the same issues as objects.

Sui Objects are Unstructured

Relational databases are ideal for tightly structured data, where all fields are known at write-time. Each field can be written to a special column with a fixed data type, and relationships between entities across tables are managed with foreign keys.

Although Sui Objects all share some fields - such as object_id and object_version, the most meaningful data is stored in the object itself. The Sui Move programming language allows developers to deploy unique object types with fields of their choice. Because each Sui package may define several unique object types, there is no way to map object-level data into a pre-defined relational model. In short, Sui objects are unstructured and do not belong in a relational database.

Storage of Serialized Data

Looking at the schema for the objects table, we can see several fields defined as bytea data type. In Postgres, data in these fields cannot be directly used for filtering or indexing. Instead, an intermediary application must fetch all of the data in the entire column, deserialize it to its native type, and then manually scan the data inside application memory. This is not an effective way to perform database queries. GraphQL simply will not work in this schema.

Inadequate Index Design

Because the actual object data is stored as bytea in the object_digest field, there is no way to build indices on any object-level data. Database performance, especially while serving GraphQL, is highly contingent upon effective index design.

Examples of Impossible Queries

Consider the following queries, which are very useful for a wallet, game, or NFT marketplace:

  • Fetch all objects matching a value on a particular field.
  • Fetch all objects owned by an address.
  • Fetch all objects created by a particular package.

Because object data is stored as bytea, these queries cannot be implemented.

Operational Limitations of Postgres

Scaling Postgres

As more rows are added to tables in Postgres, queries require more resources and compute time to execute. Because Postgres does not support data sharding - or any form of distributed querying - all of the data in the database must fit on a single disk. Although it is possible to deploy read-only replicas to offload query traffic across several nodes, large tables are still cumbersome to query, especially without a strong index design.

In a multi-node Postgres cluster, there is a single leader node which handles all write operations. Writes are eventually persisted to each read-only replica across the cluster. Large tables suffer notoriously from replication lag.

Load Balancing and Primary Node Failover

Query traffic must be distributed using an external load balancer. If there is an outage on the primary node which handles writes, the load balancer must detect this issue and redirect writes to a different node. Additionally, the standby server must be reconfigured as the primary server before it will accept writes. Therefore, clients issuing writes must be prepared to throttle write operations until the node has been reconfigured. Otherwise, data loss is inevitable.

Materialized Views

One great feature of Postgres is materialized views. Briefly, a database operator may deploy a precalculated database query which reads data periodically and refreshes a value in another table. This is useful for replacing common queries which perform large aggreggations. However, data stored in bytea is ineligible for materialized views. Therefore, the current schema does not support meaningful materialized views.

Proposed Architecture: Document Database

Document databases are designed to handle unstructured data, and are therefore the best choice for Sui objects. In our experience, maintaining a small collection size in MongoDB resulted in higher query performance and lower compute costs. Therefore, a document database could be used to serve the 'live data' - the latest version of each Sui object. Historical data, aggregate data, and and so on, may be better suited to another database type (columnar, time series, graph etc) - or even separate MongoDB collections. This is yet another reason to embrace the event streaming architecture described previously.

Although there are several great document databases on the market, we will use MongoDB to exemplify the benefits of a document database. However, all document databases share most of these features, and selecting the best database is a different topic. This section is brief, only meant to propose the idea, rather than to architect the entire solution.

Data Model

Document databases do not enforce a schema at write time. Just like Sui objects, each document may contain unique fields and nested fields. These fields are instantly accessabile to query when they are written. There is no need to store serialized data in a document database. However, the serialized data can also be stored, if it is useful. In the Huracan Indexer, we stored all object data in MongoDB as its native data type, plus the serialized data for each object, in the same document.

Index Design

Because documents are unstructured, document databases rely heavily on indices for query performance. Any fields commonly used for filtering in queries should be indexed. Indices must be explicitly created, although MongoDB does have some utilities to automatically detect which fields should be indexed. In the Huracan Indexer, we created indices for object_id, object_type, owner, among others. Tightly structured types, such as coin, could be placed in a unique collection with particular indices. But, the schema design is beyond the scope of this document.

Query Examples

Consider the following queries, which are very useful for a wallet, game, or NFT marketplace:

  • Fetch all objects matching a value on a particular field.
  • Fetch all objects owned by an address.
  • Fetch all objects created by a particular package.

Because object data is stored in the document, these queries are all possible.

Operational Benefits

All modern document databases are horizontally scaleable, where documents are sharded across several nodes. Data is stored in a primary replica and one or more secondary replicas, in a leader-follower scheme. In MongoDB, the query engine will automatically balance read requests across all nodes in the cluster. If read volume becomes too great, more replicas may be added to serve requests. One challenge of MongoDB architecture is ensuring that collections are sized optimally - again, a topic for another discussion.

Although writes are sent to an individual leader node, clients may be configured to connect to all nodes in the cluster. If the leader node goes down, an election is held and a new node is selected as the new leader. Clients can be configured to automatidcally detect when this happens and quicky send write operations to the new leader.

Polymporphism is the Wrong Pattern for Serving Sui Data

In the sui-graphql-rpc crate, there is a trait called DataProvider. This allows developers to implement serve the same data through different databases. While this may feel "correct" from an acedemic software engineering perspective, it will cause a number of problems, including poor query performance, operational overhead, and increased compute costs.

Because each DataProvider must serve the same queries, each database must contain the data in such a way that it can serve all of these queries. Rather than designing unique data models to leverage the unique strengths of different databases, polymorphism at this layer will force engineers to model data suboptimally in every data sink.

For example, a columnar database like ClickHouse is designed to serve aggregatate queries rapidly. Fetching individual rows is much slower than it would be in a relational database, like Postgres. If the DataProvider abstraction is forcing both ClickHouse and Postgres to serve aggregate queries and individual rows, there will be no way to optimize queries or the data model to fully leverage the underlying technology. Database performance will also be excruciatingly bad.

Instead, each DataProvider should only serve particular queries which leverage the database technology, schema, and indexing architecture provided specifically by that system. For example, GraphQL queries should all be served by a database optimized for the workload, and analylitcal queries should all be served by a different database.

Example - GraphQL via RPC Invocations

The first data provider implemented in the sui-graphql-rpc crate is the sui_sdk_data_provider. This code is a perfect example of why polymorphism is counterproductive in this situation. The GraphQL server is initialized using a DataProvider which fetches all the data by making countless RPC invocations. Because the data is not stored in a database and modeled for GraphQL query optimization, this code is going to be extremely slow and computationally expensive. In its current state, I suspec this code is unusable, and it will only linger as technical debt once a specific datastore is implemented for GraphQL queries.

@oxade
Copy link
Contributor

oxade commented Sep 19, 2023

In the sui-graphql-rpc crate, there is a trait called DataProvider. This allows developers to implement serve the same data through different databases. While this may feel "correct" from an acedemic software engineering perspective, it will cause a number of problems, including poor query performance, operational overhead, and increased compute costs.

Thanks for your feedback on this. I should flag that it's more useful to focus on the design proposed and not on the current code which is still experimental pending our data infra changes.
For example DataProvider is a temporary/PoC approach to allowing us try out different data sources (which are still being designed), and this paradigm not be present in prod RPC 2.0 release. Also sui_sdk_data_provider is also a simple way to test parts the GraphQL flow by using the legacy SDK. This will also not be used in prod.
In summary I wouldn't use the current state of the GraphQL internal logic as an indication of what's coming. We're currently still in dev and not yet optimizing for perf. We should do a better job of commenting that certain parts of the code are experimental/temporary.
Please keep the feedback coming, and happy to chat more!

@longbowlu
Copy link
Contributor

@sheldonkreger , wow you again fascinated me with your deep thinking and rich experiences with different systems! I want to contextualize you so perhaps it's easier to see where we attempt to land and why. I'd break it down into three sections: Cost efficiency, Extensibility and Storage.

Cost Efficiency

Your comment on unstructured data is 100% right. It's a pretty powerful on-chain feature that Sui uniquely has. However our realization is, it's prohibitively expensive to provide an unlimited set of indices on an unlimited data set. To give a quantitative sense, today on mainnet, there are over 205M live objects, not to mention the their older version and deleted objects. This number will grow indefinitely. As you said, the objects internally could be highly unstructured with fancy nested fields. Allowing this level of indexing has two major issues according to our past experiences, which help guide the new indexer architecture design:

  1. too many indices slows down ingestion. As a high throughput blockchain, it's totally possible we will reach 5 or 6 digits of records to update per second (we already hit 4 digits TPS in Quest 1). In fact, the schemas you see in migrations_v2 contain the least number of indices we consider absolutely necessary to support the majority use case.
  2. indices are expensive to keep. Depending on the use cases, the indices size could be more than 20% of the data itself.

One may argue that 4-6 digits per second is nothing as big tech internally runs much higher throughput, why are we so concerned? This is one of the major differences in blockchain data. In data rich companies, all the data in their ETL pipelines are valuable for them. But in an open blockchain, this is not the case. A gaming studio only cares about gaming metrics rather than Defi data. An NFT team have no desire to pay for social media app data. As a result, the professional RPC providers has no incentives to keep around data that is not going to generate business value. We want them to be profitable by operating on Sui, to make this a better ecosystem. In fact, this is a core value we honor throughout the decision process, as also mentioned in my previous reply.

Let me give a concrete example on not having narrowly needed data in the main flow. Currently, the analytics data in Sui Explorer is served by indexer. This is baked in the main flow of indexer V1. It turns out to be not great, because active addresses and popular packages are not useful for most of the people, and it becomes a burden to maintain it in the main flow. That's why you see sui-analytics-indexer, now taking the responsibility to move the data to a data warehouse/OLAP platform where such computation are much better suited.

To sum up, we don't plan to have object fields indexed in the major tables in migrations_v2 because of high data volume and resulting financially infeasibility. Now, if a builder is interested in indexing some of these niche data, what options do they have? This leads into the next section.

Extensibility

The new indexer design aims to provide high customizability and extensibility. As touched in my previous reply, you can have separated handlers or a custom indexer that indices data with your own logic.

  1. for the team/builder persona, they could subscribe to the message queue (chart d) for the basic indexed data, and then keep the part they care about and process them with their favorite language & storage system.
  2. for the professional node operator persona, they may run custom indexer to index NFT data as it's a big category. They may work with a builder team to add special indices if the requirement is niche. They may transform the data into a different from for analytics purposes etc etc. There is no limit here at all because it's where their edge comes from.

This is also where we hope community can come together and participate in building composable infrastructure on Sui.

DB Selection

At this point you could even guess what i'm going to say :). We may end up having different data layers eventually. In (chart d) in my previous reply, the storage could be a relational DB, k/v store, document db and even a data lake. We are not going to implement everything on day 1, so we start from relational DB.

To add some context, even though the syntax is Postgresql, the underlying databases could be anything as long as it's psql compatible. For example, today we are running in Aurora (Postgresql) because our old vendor has a hard limit in disk size (~5TB). Some offers would avoid the drawbacks you mentioned about the vanilla Postgres.

In fact we are not even royal to Postgresql compatibility in the relational space. Ge @gegaowp is leading an effort to evaluate TiDB and CockroachDB as the new home. We also considered No-SQL databases briefly to start with, but it's hard to reason about consistency there, something we want to preserve in read.

Regardless, I believe different data deserves different storage. With what you see in migration_v2, it's within Postgresql's capability. If a RPC providers offers rich object data indexing, then +1 with you document DB could be much better.

Thanks again for your feedback and time to reading!

@amnn
Copy link
Contributor Author

amnn commented Sep 20, 2023

Thanks again for the detailed feedback @sheldonkreger ! @oxade and @longbowlu have already shared most of what came to my mind, but I'll add a couple of extra notes here and there:

Data Model

It's useful to start by describing our goals for the base RPC implementation. I don't think I did a good enough job describing that in the proposal, that's my mistake. We're aiming to offer an API that gives anyone visibility over the state of the entire network: If it happened on chain, you can find out about it through RPC, in near real-time.

In previous iterations of the RPC, we introduced more and more expressive ways to query this data, but as the volume of transactions on the network increased, we found that some of these decisions hurt our ability to scale to handle that volume (both keeping up with throughput and storing all the data) and to reflect a consistent view of the network, which were factors we couldn't compromise on.

When we revisited the schema for v2, this was top of mind for us, so we've made many decisions that limit expressivity to leave us room to scale while preserving consistency. The set of impossible queries you mentioned is a good case study to explore the implications of this:

Examples of Impossible Queries
Consider the following queries, which are very useful for a wallet, game, or NFT marketplace:

The fact that you identified particular kinds of application (wallets, games, marketplaces) is interesting here. We also went through a similar process, but what we aimed to do was support the common denominators really well (the queries that everyone would need), and then rely on the various extension points to offer a way for:

  1. RPC providers to differentiate and offer domain-specific data using storage that's suited to it, but exposed through the same interface as everything else.
  2. Individual apps to subscribe to a restricted set of data from RPC, and then post-process it further for their own needs (which could involve providing access to more expressive queries on a subset of the data, for their own needs).

In the three verticals you mentioned, we would expect:

  • Wallets to be served well by the base RPC (similarly for explorers),
  • Games to follow extension strategy 2, potentially with the help of some hosted solution where their post-processor and final database/query API can be managed by a service that also runs a shared, generic indexer and RPC.
  • Marketplaces to benefit from extension strategy 2, where RPC providers generally understand what extra information is required to run a marketplace, and offer this extra data, for a fee.

Because object data is stored as bytea, these queries cannot be implemented.

  • Fetch all objects matching a value on a particular field.

The reason we don't support queries against object data is not because PG-compatible DBs cannot index this data -- indeed they offer the notion of a GIN (Generalized Inverted Index), which is similar to the posting list indices that document DBs use. We use this in other places, like indices on transactions, and we could have stored object data as JSON with a GIN to offer queries on fields, but we chose not to because of scalability concerns and not seeing a clear demand for this kind of functionality in the publicly operated RPC service.

  • Fetch all objects owned by an address.

This query is currently supported, using the objects_owner index:

CREATE INDEX objects_owner ON objects (owner_type, owner_id) WHERE owner_type BETWEEN 1 AND 2 AND owner_id IS NOT NULL;

  • Fetch all objects created by a particular package.

This query is also supported by the schema, using a GIN on tx_indices (+ a join to get the created objects from the transactions table):

CREATE INDEX tx_indices_package ON tx_indices USING GIN(packages);

This is not to say that MongoDB couldn't do all this (I don't know it well enough to say if it can or not), but hopefully it helps identify the properties that we wanted to maintain (e.g. consistency, scalability) while improving expressivity, so we can discuss MongoDB (or other NoSQL/Document DB solutions) in those terms as well.

DataProvider for GraphQL

To add to what @oxade mentioned on why we have this API, it allows us to create a POC of the GraphQL API (just the queries it offers) without being blocked on the implementation work for the Indexer and its data layer. I.e. we know that we can't build a POC of data loading this way, we just needed a way to expose meaningful data from the API, so that we could see it working and test it. In fact there are some crucial differences between how RPC 1.0 and 2.0 work that mean that we could never write an efficient RPC 2.0 implementation that serves data that originated from RPC 1.0!

Once the schema for the data layer is implemented, and we can serve data from a database that has that schema, we will move to a data loading implementation that is specialised to it (although it may go through a transition phase where we serve a read-only snapshot of that data through the current interface in order to get it in SDK builders' hands sooner).

@sheldonkreger
Copy link

longbowlu Thanks for your response! I'll get to everybody, starting with yours.

Thank you for the clarification about all of your plans. It really seems like the team is going in the right direction and that we generally share a strong vision for data access on Sui. All of the pros an cons you mentioned are insightful and I am in agreement with your conclusions. The personas and the values you mentioned are excellent, as well. Additionally, your understanding of my proposed architecture is pretty close, but allow me to clarify with a diagram. Apologies for not including this previously.

Architectural Diagram

image
The blue areas I envision as part of RPC provider infrastructure, and the green I envision as an independent development team.

The key difference is that I suggest using an event streaming system as an intermediary between the data source (handlers using RPC invocations) and all downstream data sinks. This would allow for a single implementation of the handlers, and allow small, compartmentalized consumers for each data sink. Decoupling this code ensures resiliency, and it makes it easier to maintain custom transform/enrich logic for each data sink. Each consumer group can be updated or fail independently, and pick up where it left off in the stream. All of the other consumer groups remain unaffected.

Backfilling via Checkpoints and RPC Invocations - Pain and Suffering

For the Huracan indexer, we implemented our own backfilling code which leverages countless RPC invocations.

https://github.com/capsule-craft/huracan/blob/main/main/src/etl.rs

Concurrent ingest workers are implemented with Tokio channels, which allows us to backfill several checkpoints concurrently. Because we are ingesting full object data as JSON and writing to MongoDB, we have a multi-step process:

  1. Fetch checkpoints.
  2. For each checkpoint, fetch transaction blocks.
  3. For each transaction block, gather object IDs + version for mutated objects.
  4. For each object, fetch the object via RPC.
  5. Write to MongoDB in batches, using a separate Tokio channel.

In addition to the complex Tokio channel code, we also have to share state between channels to avoid re-reading data in flight in another channel. We use RocksDB to track checkpoints, transaction blocks, and objects that are in-flight with pending RPC requests. There's also some ugly pagination handling for te transaction blocks. We also have to handle graceful shutdowns using RocksDB. Each Tokio channel can be configured with a capacity to limit our RPC invocations and frequency of writes to MongoDB.

You can see a demo of the configuration and deployment here:
https://www.youtube.com/watch?v=J2OnQpS0FNw

All of this was very complex to write. It will have to be re-written if there are changes to the RPC spec. Observability - despite my best efforts - is pretty difficult in Tokio. Managing state between threads is tough in any language. Frankly, I hope nobody ever has to write code like this ever again. It would be much easier to simply consume the data from an event stream, with the assurance that all of the data has been published by the sui-indexer system.

Backfilling via Event Stream: EZ Mode

Rather than expecting every team to develop and maintain code like ours, something similar should be moved into Sui core. The Sui Indexer should include a small API which accepts requests for backfilling data, where the data is published to the event stream.

image

In the Sui Indexer code, the backfilling data could be pulled from the data lake rather than via RPC. Or, the Geyser-style plugin you mentioned. Of course, builders could work directly with the data lake, but due to the volume of data they may need to request, I think it would be easier to manage on an event stream. Working with the data lake, builders would still need to handle query syntax for the data lake, pagination, and error handling during reads. An event stream avoids these complications.

This is actually very similar to what you mentioned here:

There is another route that we will go, which is a plugin system inside fullnode that allows you to operate the data (likely passing them to another server for heavylifting processing).
I did work with Geyser while I was with Metaplex. We forwarded data out of Geyser into an event stream using flatbuffers, and then ingested with a consumer written in Rust.
https://github.com/metaplex-foundation/digital-asset-rpc-infrastructure

Addressing Some Concerns

This is true with all data sources, if I'm a serious builder. Because even the data source itself (say, pulsar) guarantees exactly once delivery, I don't want to blindly trust the data producer hasn't missed any data at all. So, a watermark will be required, the exact definition would needed to be agreed upon separately.

Very true! Having written custom backfilling code, I can say this is "easier said, than done." It seems like if I trust my RPC provider for RPC invocations, I should be able to trust them for data in their other APIs as well. But, this is a very valid concern.

Previously fullnode was heavily swamped by certain types of queries because of the rocksdb performance issue. It has been better after some major improvements but still not doing the best. However, this is mostly due to fullnode directly serving the data traffic, meaning that any queries will hit fullnode and cause the embedded rocksdb to suspend in the unlucky case. This is the exact thing we will avoid in RPC 2.0, that is separating the read traffic from fullnodes, into somewhere that read is more efficient, such as PostgresDB and many other alternatives.

100% agreed - Overall, offloading data reads is a huge win and even if developers are using the sui-indexer handler directly, I think this is much, much less stress on the system.

Questions and the pull requests

You mentioned "Handler" a few times. But as you can probably tell know, the handler is a minor thing in RPC 2.0. It's only relevant for people who write additional indexing logic in Rust. A custom indexer is not bounded or forced to implement it if they don't like to.

It sounds like there are some alternatives in progress: Data Lake, Geyser-style export system, and direct RPC invocations. Are there any others I'm missing?

I will take a look at the pull requests you mentioned separately. Big thank you!

@sheldonkreger
Copy link

gegaowp and sblackshear - Thanks for your reply. It sounds like you are thinking deeply about the challenges I mentioned already. Looking forward to more work together. Very impressed with the Mysten team.

@sheldonkreger
Copy link

sheldonkreger commented Sep 20, 2023

@oxade

In summary I wouldn't use the current state of the GraphQL internal logic as an indication of what's coming. We're currently still in dev and not yet optimizing for perf. We should do a better job of commenting that certain parts of the code are experimental/temporary.

Thanks for the clarification. I sort of suspected it was POC. As I mentioned, I haven't been super involved on these conversations until recently, but I'm glad we're clarifying things now. I'm sure you're working on a more optimized solution, already. 👍

@sheldonkreger
Copy link

@amnn Thanks for your detailed response. I'm definitely seeing the 'bigger picture' with more clarity thanks to your post, and others. I think that one of the main challenges for the core Mysten team is deciding which data services to provide in the core, and which services should be handled by external vendors. Those kinds of details are VERY important in driving the decision about database technology. A really good example of this is where I suggest serving object-level fields, which you've considered and decided against it. You're obviously thinking about this with full perspective, and I have a lot of respect now that I see how much thought has gone into it.

There's also the important consideration of operational complexity. Sure, a distributed database like MongoDB or Elasticsearch may offer a better query engine, but it's not necessarily fair to expect RPC providers to operate these in production. There's a reason why custom indexing services are a big industry in the blockchain world - things like this are "easier said than done."

I also learned a thing or two - I was definitely wrong about Postgres support for nested field queries. Storing Sui objects as JSON and then creating a GIN is not something I had considered. In fact, it's my first time hearing of GIN in Postgres! I knew I had to be wrong about at least something in my lengthy rants 😆

That being said - With the Huracan indexer, I am already indexing into MongoDB and allowing some of these complex queries / filters, plus a GraphQL API. If you are ever interested, there are definitely some lessons I could share about the configuration of MongoDB, index design, and operational challenges of MongoDB. I've also operated massive ElasticSearch clusters, Kafka, Pulsar, InfluxDB, ScyllaDB, Graphite, among others. So, feel free to reach out if you ever have questions about operational techniques for distributed databases. It sounds like you've also spent a lot of time with these.

One thing we are all agreement on is that extracting data and moving it to a custom datastore is a big challenge right now. Having written custom RPC scraping code, I just want to emphasize that I think this is an area that deserves a lot of attention, because everybody will benefit.

As for the RPC-backed GraphQL setup, Oxade also mentioned that it's more of a POC. Thanks for the clarification.

@sheldonkreger
Copy link

sheldonkreger commented Sep 20, 2023

@longbowlu Responding to your comments about database tech here.

However our realization is, it's prohibitively expensive to provide an unlimited set of indices on an unlimited data set. To give a quantitative sense, today on mainnet, there are over 205M live objects, not to mention the their older version and deleted objects. This number will grow indefinitely. As you said, the objects internally could be highly unstructured with fancy nested fields.

I'm glad to see the team is thinking about this. Your points are 100% valid and this was a major issue while I was indexing all Sui Objects into MongoDB with the Huracan indexer. There is no effective way to put all of the objects into a single MongoDB collection. Even with the indices, MongoDB collections really need to fit in-memory to avoid massive performance issues with disk I/O during reads. That's why I shut down ingest on the demo site.

What I ended up doing was creating a whitelist/blacklist system for ingest, so you can select which package data is written to MongoDB. I build some general indices, but you'd also have to build more to fit your data.

https://github.com/capsule-craft/huracan/blob/main/main/config.yaml#L116

In such a system, each team would need to run their own Huracan installation. That's fine for Huracan, but not an option for Sui Core. It would also be possible to put each type into its own collection - but I'd have to think about how this would affect the GraphQL implementation.

Maybe the Huracan project isn't as dead as I thought it was when I saw the announcement about RPC 2.0. 😄

At this point you could even guess what i'm going to say :). We may end up having different data layers eventually. In (chart d) in my previous reply, the storage could be a relational DB, k/v store, document db and even a data lake. We are not going to implement everything on day 1, so we start from relational DB.

Based on your previous remarks and the explanation from @amnn , I think a relational DB is a great starting point. Now that I see the scope of what you're trying to accomplish, this makes a lot more sense. However:

To add some context, even though the syntax is Postgresql, the underlying databases could be anything as long as it's psql compatible. For example, today we are running in Aurora (Postgresql) because our old vendor has a hard limit in disk size (~5TB). Some offers would avoid the drawbacks you mentioned about the vanilla Postgres.

This definitely indicates that you should be considering a database that scales horizontally. But, you probably already know 👍

@kklas
Copy link
Contributor

kklas commented Sep 23, 2023

This is a very insightful discussion, thanks everyone who is contributing! I'm not going to comment on high level stuff and architecture because it seems like you all know what you're doing, but just point out a few small things that came to mind reading this.

Data Consistency
To enable this, the RPC’s indexer will need to commit writes as a result of checkpoint post-processing atomically, which increases latency (transactions that are final may take longer to show up in fullnode responses)

If I'm reading this correctly, data becomes available for reading on a checkpoint-by-checkpoint basis where writes commited within the newest checkpoint are not available until it gets fully committed? How does this work then with read-after-write consistency where I'm committing a transaction and then trying to read it? Will I have to wait for the checkpoint to get committed before I can issue a consistent read?

+1 on "Reinterpreting Package IDs" and "Package Upgrade History", I already have use cases for this.

Regarding devInspect / dryRun, will this new API make it easy to replay historical transactions? Currently you have to fetch the tx data and then replace any shared objects with their exact version it executed against from tx effects before running devInspect. This involves BCS marshaling / unmarshaling as tx data is returned in BCS and it's a bit cumbersome to do.

In the current RPC, when fetching objects (and other things like events) with showContent, some field types are special-cased to return in a human friendly format (e.g., ID, UID, String, Url...). Can we get some specifics around this? I think it's fine to special-case but at least we need to be more clear on which types are special and how they're handled. Perhaps add an option to turn it off because it creates issues for programmatic parsing like with code generators. Arguably you can rely on BCS encoded responses here but showContent can still be useful in environments where BCS isn't available.

@amnn
Copy link
Contributor Author

amnn commented Oct 15, 2023

How does this work then with read-after-write consistency where I'm committing a transaction and then trying to read it? Will I have to wait for the checkpoint to get committed before I can issue a consistent read?

There's two ways to approach this -- the first is just to wait for the checkpoint which is the simplest option but will decrease your throughput by introducing checkpoint propagation and indexer post-processing to your critical path. This may not be so bad if your transactions involve shared objects because they need to go through consensus and appear in a checkpoint to be final anyway.

The second option is to maintain the state you want locally, instead of relying on subsequent reads from the RPC. This approach is more complicated, but may be worthwhile if your transactions are otherwise operating in the single-owner fast path, and you care about your throughput. Typically what is important here is to know what the latest version of your owned objects are after having run some prior transactions on them, and this (the lamport version of a transaction accessing an owned object) is quite straightforward to forecast locally.

While not in the immediate plans, we have discussed adding to our SDKs to do this automatically for you (i.e. the SDK keeps track of the owned objects you use in your transactions, and the versions they went in as and came out as, acting as a cache to avoid having to read versions from RPC in your critical path).

Regarding devInspect / dryRun, will this new API make it easy to replay historical transactions?

Dry Run is not the ideal tool for running historical transactions -- we actually have a separate tool for this, called sui replay which we use internally but have not productionised yet (it is in the plans to do this). But the redesigned interface for this operation does help avoid some marshalling to/from BCS in the case where you've already got a transaction block and you want to then dry run it (which currently involves extracting the transaction kind from within it).

In the current RPC, when fetching objects (and other things like events) with showContent, some field types are special-cased to return in a human friendly format (e.g., ID, UID, String, Url...). Can we get some specifics around this? I think it's fine to special-case but at least we need to be more clear on which types are special and how they're handled.

Indeed, here's what the schema for Move Objects looks like:

type MoveValue {
type: MoveType!
data: MoveData
bcs: Base64
}

# Scalar representing the contents of a Move Value, corresponding to
# the following recursive type:
#
# type MoveData =
# { Number: BigInt }
# | { Bool: bool }
# | { Address: SuiAddress }
# | { UID: SuiAddress }
# | { String: string }
# | { Vector: [MoveData] }
# | { Option: MoveData? }
# | { Struct: [{ name: string, value: MoveData }] }
scalar MoveData

It offers BCS and also a structured format. It is generally much more geared towards programmatic use cases. There was also a proposal to introduce a slightly more human friendly option (a generic JSON format where structurs appear as JSON objects, vectors as JSON arrays, etc).

@kklas
Copy link
Contributor

kklas commented Oct 16, 2023

While not in the immediate plans, we have discussed adding to our SDKs to do this automatically for you (i.e. the SDK keeps track of the owned objects you use in your transactions, and the versions they went in as and came out as, acting as a cache to avoid having to read versions from RPC in your critical path).

This seems somewhat complex. Keep in mind that to have a consistent app state on the UI you need object data as well and, if I'm not mistaken, the RPC doesn't return that in transaction response (just versions). So it would not be possible to do this unless you have the RPC return the updated object data in the response or have some way of simulating tx execution on the client side. Maybe a crazy idea but compile Move to TS and simulate exeuction locally? Seems doable if you have consistent object data already in your cache, but you'd still have to fetch dynamic objects.

@amnn
Copy link
Contributor Author

amnn commented Oct 16, 2023

This seems somewhat complex. Keep in mind that to have a consistent app state on the UI you need object data as well and, if I'm not mistaken, the RPC doesn't return that in transaction response (just versions). So it would not be possible to do this unless you have the RPC return the updated object data in the response or have some way of simulating tx execution on the client side.

You're right, this is quite complex, which is why we wanted to abstract this away in the SDK as much as possible. The scenario I was describing (around tracking versions) primarily applies to backend services that might need to deal with a lot of concurrent transactions, but the case of optimistic updates to UIs is an interesting one (and it's a separate scenario).

Luckily, I think you don't have to go as far as running Move code as TS -- you can leverage dry run to get optimistic object updates (still not currently straightforward, because object changes are represented as raw BCS, but should be easier than transpiling Move to TS!) and this is not really extra work, because in most cases a transaction will need to be dry-run to estimate gas costs.

There are still going to be edge cases where this kind of optimistic update is not accurate (think of a transaction that relies on an oracle whose data changes often), but this is an inevitability.

@kklas
Copy link
Contributor

kklas commented Oct 16, 2023

and this is not really extra work, because in most cases a transaction will need to be dry-run to estimate gas costs.

I believe you might be underestimating how tricky these things are on the frontend. It's a delicate balance between code complexity and network code efficiency. You need to display same underlying data in different ways in different parts of the UI. React code has a hierarchical structure where data is fetched at top components and being passed down. So naively, you can be very network efficient and fetch every piece only once in one place but then code complexity will come from having to pass it around hierarchically unrelated components. Or, naively, you can have simple code and fetch data in any place it's needed. But then you're slamming the RPC with requests.

So what you do instead is have an underlying caching layer so that you can have simple code of "fetch data where it's needed", but have the fetch hit the cache.

Doing an optimistic update naively will not work well because any subsequent read on the RPC will overwrite the state with the old one if the checkpoint didn't have a chance to go through yet. This can cause flickering on the UI and old state to be displayed after the transaction confirmation message and crypto users are very sensitive to that because they get rugged all the time. Having to refresh the page all the time or wait 3 seconds is not great UX.

Existing frontend libraries don't handle this because they don't have to deal with this type of inconsistencies on the backend (read after write is normally consistent while here it's normally incosistent).
Also, you need to have the cache to be aware of object ids and versions and use that as cache keys. The cache libraries we have available like "React Query" or "RTK Query" won't work well for this because their caches are designed around endpoints and not ids and version.

My strong feeling is that you should prioritize having something in the SDK to help with this. I'm not optimistic about app developers dealing with this on their own.

@amnn
Copy link
Contributor Author

amnn commented Oct 17, 2023

and this is not really extra work, because in most cases a transaction will need to be dry-run to estimate gas costs.

I believe you might be underestimating how tricky these things are on the frontend. It's a delicate balance between code complexity and network code efficiency.

To clarify, my comment about whether this was extra work or not is purely in terms of whether this kind of solution requires an extra roundtrip or not, optimistic updates absolutely is a tricky thing to get right. Let me also cc @hayes-mysten and @Jordan-Mysten who are looking into dapp-kit etc which seems like a good home for patterns to solve this.

@hayes-mysten
Copy link
Contributor

This is definitely getting into interesting territory. The GraphQL ecosystem has a lot of tooling around various kinds of normalized cache for front end applications. Several GraphQL libraries have solutions for optimistic updates in the GraphQL cache, and usually support keys derived from multiple properties (id + version) with various invalidation strategies. The tricky piece here is that I think we want to avoid blessing a specific GraphQL solution in dapp-kit, because some graphql solutions end up dictating how your entire app is structured. There are a lot of tradeoffs to consider when choosing a GraphQL library.

I'm not sure how this will end up looking. We may be able to come up with some more general helpers that can be used with multiple libraries, or add several library specific utils allowing you to pick what you need. This is definitely something that will need more exploration before we can commit to a specific path.

@nikolai-aftermath
Copy link

nikolai-aftermath commented Feb 23, 2024

Hey! I've been exploring new RPC 2.0 and found only trivial examples that don't help me to solve the following non-trivial, yet seemingly simple task: to consistently load all Orders data from DeepBook's Pool. With the previous API, maintaining consistency was challenging due to rapid mutations.

I firmly believe that addressing such a challenge should be prioritized as one of the primary objectives for the new API. To facilitate this, could you please provide a GraphQL request example that demonstrates how to efficiently load all Orders from DeepBook in a single query?

For start, you can think of Pool as having the following struct:

use sui::table::Table;

struct Pool has key, store {
    asks: Table<u64, Table<u64, Order>>,   // Please note that `Order`s live at the second tier of dfs!
    bids: Table<u64, Table<u64, Order>>,
}

and we need to load all Orders (simultaneously from asks and bids) in json in no specific order with possible pagination.

@amnn
Copy link
Contributor Author

amnn commented Feb 23, 2024

Hi @nikolai-aftermath, the main thing that makes this complicated is the wrapping (the thing you've highlighted in your comment: "Please note that Orders live at the second tier of dfs!").

Today, the indices backing the GraphQL API do not look "inside" objects, which is what is preventing us from handling this kind of request in a single query. Their aim to give a general view of all data on-chain and we tend to recommend using custom indexers for domain-specific indexing needs (where the data being indexed lives inside the structure of an object), and that would be the most direct way to solve this problem. (And here, I mean "custom" from the perspective of the tables backing the GraphQL service -- it might still be something we offer as standard, given that deepbook is part of the system packages, it just would need us to do additional indexing).

In future, we will support wrapped object indexing, which might alleviate this requirement, if we exposed an ability to iterate over all the objects that are wrapped in another object, and then run sub-queries over them.

If I needed to solve this today, I would run three separate queries:

  • One to get the IDs of the asks and bids tables.
  • Another to get the prices in the respective order books, and the IDs of the order tables in each order book.
  • Then the paginated queries to go over the prices.
query GetPool($poolId: SuiAddress!) {
  object(address: $poolId) {
    contents { json }
  }
}

# Paginated query to get all elements of a table that must be used in two stages. Once to get all the prices,
# and another to get all the orders at those prices. We use the `owner` API to get the dynamic fields of a
# wrapped object (like the tables in question).
query IterateTable($tableId: SuiAddress!, $after: string) {
  owner(address: $tableId) {
    dynamicFields(after: $after) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name { json }
        value { ... on MoveValue { json } }
      }
    }
  }
}

If you do this in a straightforward way, it will not necessarily be consistent, because when you start each iteration, it will take a snapshot of the query at the checkpoint the iteration started on. Technically speaking, you could carefully craft a before or after cursor to set the appropriate checkpoint, but this is relying on an implementation, detail. I don't see a problem with us exposing a parameter to control consistency for paginated queries. We don't have it yet, but supposing we did -- call it latestAt -- then you could use it as follows:

query GetPool($poolId: SuiAddress!) {
  checkpoint { sequenceNumber }
  object(address: $poolId) {
    contents { json }
  }
}

query IterateTable($tableId: SuiAddress!, $after: string, $checkpoint: Int) {
  owner(address: $tableId) {
    dynamicFields(after: $after, latestAt: $checkpoint) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name { json }
        value { ... on MoveValue { json } }
      }
    }
  }
}

When you run the first query, to get the IDs of the Pool's Table's, you can also get the checkpoint sequence number you are reading at, and subsequently, when you are iterating over dynamic fields, you can supply the checkpoint to ensure you get the latest version of that dynamic field at the checkpoint you are iterating at.

@nikolai-aftermath
Copy link

Hey, @amnn, the hypothetical and unattainable future you describe is of course beautiful. It would be nice if we had some sort of method asObject for any SuiAddress to get one from any level of dynamic field hierarchy.

But even the worthless API you propose doesn't work at the moment:

query {
  owner(address: "0x029170bfa0a1677054263424fe4f9960c7cf05d359f6241333994c8830772bdb") {
    dynamicFields {
      pageInfo { hasNextPage endCursor }
      nodes {
        name { json }
        value { ... on MoveValue { json } }
      }
    }
  }
}

returns error

Request timed out. Limit: 15s

Here 0x029170bfa0a1677054263424fe4f9960c7cf05d359f6241333994c8830772bdb is the address of SUI-USDC pool bids leaves table.

@amnn
Copy link
Contributor Author

amnn commented Feb 27, 2024

Hi @nikolai-aftermath,

Thanks for the report -- we are working on addressing the timeouts at the moment (cc @wlmyng) -- there are some performance optimisations and query planning improvements that are still in progress.

Regarding the hypothetical asObject query to get object data at any level of the object hierarchy -- this requires us to index wrapped objects, which is in our backlog, but is behind the work that we need to do to get up to parity with JSON-RPC (performance optimisations, various improvements to querying transactions and effects, SDK support, subscriptions, etc)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants