
[Indexer] Full Data & Real-time Indexer #1028

Closed
7 of 8 tasks
Tracked by #1000
baichuan3 opened this issue Oct 24, 2023 · 1 comment
Labels: feature, skill::rust, status::design


baichuan3 commented Oct 24, 2023

Typical scenarios

  1. Which coins are registered in the system, the total supply of each coin, and the number of addresses holding it
  2. The transaction list of a given address; the coins held by that address and their balances
  3. Query the event list of a transaction

Goals

  1. Automatically index state and event data on Rooch
  2. Provide an API to query the indexed data
  3. Support querying the indexed data with SQL
  4. Provide the ability to customize the Indexer (open question)

Indexer solutions

  1. Checkpoint regularly, then write in batches
  2. Listen to the database file, generate a transaction stream, and trigger writes from it
  3. Automatically parse SMT nodes and their Move types, and automatically generate the table schema

| Option | Data Mode | Create Table Schema |
| --- | --- | --- |
| Option 1: Based on checkpoints | Offchain | Manual |
| Option 2: Listen to the database and generate a transaction stream | Offchain | Manual |
| Option 3: Auto-generate the table schema | Onchain | Auto |
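For Option 1, the write path can be pictured as a buffer that flushes a full batch and then advances a checkpoint. The following is a minimal Rust sketch under stated assumptions, not the project's implementation: `IndexRecord` and `BatchWriter` are hypothetical names, records are assumed to arrive in `tx_order` order, and a `Vec` stands in for the database.

```rust
/// Hypothetical record produced by the indexer for one transaction.
struct IndexRecord {
    tx_order: u64,
    payload: String,
}

/// Hypothetical batching writer for Option 1 (checkpoint, then batch write).
struct BatchWriter {
    batch_size: usize,
    buffer: Vec<IndexRecord>,
    /// Highest tx_order written so far; assumes records arrive in tx_order order.
    checkpoint: Option<u64>,
    /// Stand-in for the database: each inner Vec is one batched write.
    flushed: Vec<Vec<IndexRecord>>,
}

impl BatchWriter {
    fn new(batch_size: usize) -> Self {
        Self { batch_size, buffer: Vec::new(), checkpoint: None, flushed: Vec::new() }
    }

    /// Buffer a record and flush as soon as the batch is full.
    fn push(&mut self, record: IndexRecord) {
        self.buffer.push(record);
        if self.buffer.len() >= self.batch_size {
            self.flush();
        }
    }

    /// Write all buffered records as one batch and advance the checkpoint.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let batch = std::mem::take(&mut self.buffer);
        self.checkpoint = batch.last().map(|r| r.tx_order);
        self.flushed.push(batch);
    }
}

fn main() {
    let mut writer = BatchWriter::new(2);
    for tx_order in 0..5 {
        writer.push(IndexRecord { tx_order, payload: format!("tx-{tx_order}") });
    }
    // A periodic timer would call flush() to write a partially filled batch.
    writer.flush();
    println!("batches = {}, checkpoint = {:?}", writer.flushed.len(), writer.checkpoint);
}
```

In a real indexer the checkpoint would presumably be persisted in the same database write as its batch, so that a restart can resume after the last fully written `tx_order`.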

Automatic table creation solution

Architecture
[Figure: indexer architecture diagram (image not available)]

Core Process

  1. Parse the SMT and NodeStore to obtain the AnnotatedMoveValue corresponding to each State and Event.
  2. Parse the AnnotatedMoveValue and generate the Table Schema Metadata and Data Schema Metadata.
  3. Use the Table Schema Template to convert the Table Schema Metadata and Data Schema Metadata into a Table Schema and a Data Schema.
  4. Check whether the table for the Table Schema has already been created. If not, create the table first, then its indexes; then write the data.
  5. Repeat the above process.
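The core process above can be sketched roughly as follows. This is an illustration, not Rooch's implementation: `TableSchema`, `DataRow`, and `Indexer` are hypothetical simplifications of the metadata produced in steps 1 to 3, and SQL statements are collected in a `Vec` instead of being executed against a database.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified Table Schema produced from the schema metadata.
struct TableSchema {
    table: String,
    /// (column name, SQL type) pairs.
    columns: Vec<(String, String)>,
    /// Columns to index right after the table is created.
    index_columns: Vec<String>,
}

/// Hypothetical, simplified Data Schema: one row of SQL literals,
/// in the same order as TableSchema::columns.
struct DataRow {
    table: String,
    values: Vec<String>,
}

struct Indexer {
    created_tables: HashSet<String>,
    /// Stand-in for a database connection: the SQL that would be executed.
    executed: Vec<String>,
}

impl Indexer {
    /// Create the table and its index the first time the table is seen, then write the row.
    fn write(&mut self, schema: &TableSchema, row: &DataRow) {
        if !self.created_tables.contains(&schema.table) {
            let cols: Vec<String> =
                schema.columns.iter().map(|(n, t)| format!("{n} {t}")).collect();
            self.executed
                .push(format!("CREATE TABLE IF NOT EXISTS {} ({})", schema.table, cols.join(", ")));
            if !schema.index_columns.is_empty() {
                self.executed.push(format!(
                    "CREATE INDEX IF NOT EXISTS idx_{}_{} ON {} ({})",
                    schema.table,
                    schema.index_columns.join("_"),
                    schema.table,
                    schema.index_columns.join(", ")
                ));
            }
            self.created_tables.insert(schema.table.clone());
        }
        let names: Vec<&str> = schema.columns.iter().map(|(n, _)| n.as_str()).collect();
        self.executed.push(format!(
            "INSERT INTO {} ({}) VALUES ({})",
            row.table,
            names.join(", "),
            row.values.join(", ")
        ));
    }
}

fn main() {
    let schema = TableSchema {
        table: "coin_store".to_string(),
        columns: vec![
            ("address".to_string(), "varchar".to_string()),
            ("balance".to_string(), "numeric".to_string()),
        ],
        index_columns: vec!["address".to_string()],
    };
    let mut indexer = Indexer { created_tables: HashSet::new(), executed: Vec::new() };
    for (addr, bal) in [("'0x1'", "10"), ("'0x2'", "20")] {
        let row = DataRow { table: "coin_store".to_string(), values: vec![addr.to_string(), bal.to_string()] };
        indexer.write(&schema, &row);
    }
    for sql in &indexer.executed {
        println!("{sql}");
    }
}
```

Tracking created tables in memory avoids a catalog lookup per row; `IF NOT EXISTS` keeps the sketch safe if the indexer restarts with an empty set.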

How to parse

  1. Automatically parse SMT leaf nodes and obtain their type. Problem: this introduces a crate dependency, because the indexer would need to depend on statedb.

Entry: statedb module
Call the method AnnotatedStateReader::view_value:

fn view_value(&self, ty_tag: &TypeTag, blob: &[u8]) -> Result<AnnotatedMoveValue> {
    // Decode the raw value bytes into an AnnotatedMoveValue, using ty_tag to resolve the type layout.
    let annotator = MoveValueAnnotator::new(self);
    annotator.view_value(ty_tag, blob)
}
  2. Parse the TableChange in the StateChangeSet at the storage layer. Its data structure is <Vec, Op>, from which the State type is obtained:
use move_core_types::language_storage::TypeTag;

pub struct State {
    /// the bytes of state
    pub value: Vec<u8>,
    /// the type of state
    pub value_type: TypeTag,
}

State entry: statedb module, distinguishing between:
  • resource
  • module

  3. Event entry: event_store/mod module

  4. Parse the AnnotatedMoveValue; for reference, see
    https://github.com/rooch-network/rooch/blob/encrypt_keystore/crates/rooch-types/src/framework/coin.rs
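As an illustration of turning a parsed AnnotatedMoveValue into table columns, the sketch below flattens a struct value into one row, which is the "single table through a template" option raised under the nested-struct challenge below. `Value` is a hypothetical, heavily simplified stand-in for `AnnotatedMoveValue` (the real type has more variants, such as vectors and addresses), and both the `_` path joining and the SQL type mapping are assumptions.

```rust
/// Hypothetical, heavily simplified stand-in for AnnotatedMoveValue.
#[derive(Debug)]
enum Value {
    Bool(bool),
    /// u256 kept as a decimal string, since it exceeds native integer widths.
    U256(String),
    Str(String),
    /// (field name, field value) pairs of a Move struct.
    Struct(Vec<(String, Value)>),
}

/// One flattened column: name, SQL type, and SQL literal.
#[derive(Debug, PartialEq)]
struct Column {
    name: String,
    sql_type: &'static str,
    literal: String,
}

/// Flatten nested structs into a single row, joining field paths with '_'
/// (for example CoinStore.balance.value becomes balance_value).
fn flatten(prefix: &str, value: &Value, out: &mut Vec<Column>) {
    let name = prefix.to_string();
    match value {
        Value::Bool(b) => out.push(Column { name, sql_type: "bool", literal: b.to_string() }),
        Value::U256(s) => out.push(Column { name, sql_type: "numeric", literal: s.clone() }),
        Value::Str(s) => out.push(Column {
            name,
            sql_type: "varchar",
            // Quote the string as a SQL literal, escaping embedded single quotes.
            literal: format!("'{}'", s.replace('\'', "''")),
        }),
        Value::Struct(fields) => {
            for (field, v) in fields {
                let child = if prefix.is_empty() { field.clone() } else { format!("{prefix}_{field}") };
                flatten(&child, v, out);
            }
        }
    }
}

fn main() {
    // CoinStore { coin_type, balance: Balance { value }, frozen } with illustrative values.
    let coin_store = Value::Struct(vec![
        ("coin_type".to_string(), Value::Str("0x1::example::Coin".to_string())),
        ("balance".to_string(), Value::Struct(vec![("value".to_string(), Value::U256("100".to_string()))])),
        ("frozen".to_string(), Value::Bool(false)),
    ]);
    let mut columns = Vec::new();
    flatten("", &coin_store, &mut columns);
    for c in &columns {
        println!("{} {} = {}", c.name, c.sql_type, c.literal);
    }
}
```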

Core Table Schema

  • transaction

| Field | Type | Description |
| --- | --- | --- |
| tx_hash | varchar | the hash of the transaction |
| transaction_type | varchar | the type of the transaction |
| chain_id | int | the chain ID |
| auth_validator_id | int | the auth validator ID of the authenticator info |
| payload | blob | the authenticator info payload |
| encode_data | blob | the encoded transaction data |
| created_at | timestamp | when the row was created |
| updated_at | timestamp | when the row was updated |
| sender_address | varchar | address of the user who sent the transaction |

Primary key: tx_hash
Index:

  • transaction_sequence_info

| Field | Type | Description |
| --- | --- | --- |
| tx_order | bigint | the sequencer order of the transaction |
| auth_validator_id | int | the auth validator ID of the tx order signature |
| payload | blob | the tx order signature payload |
| tx_accumulator_root | varchar | the tx accumulator root after the tx is appended to the accumulator |
| created_at | timestamp | when the row was created |
| updated_at | timestamp | when the row was updated |
| sender_address | varchar | address of the user who sent the transaction |

Primary key: tx_order
Index:

  • transaction_execute_info

| Field | Type | Description |
| --- | --- | --- |
| tx_hash | varchar | the hash of the transaction |
| state_root | varchar | the root hash of the Sparse Merkle Tree describing the world state at the end of this transaction |
| event_root | varchar | the root hash of the Merkle Accumulator storing all events emitted during this transaction |
| gas_used | int | the amount of gas used |
| status | varchar | the VM status |
| created_at | timestamp | when the row was created |
| updated_at | timestamp | when the row was updated |
| sender_address | varchar | address of the user who sent the transaction |

Primary key: tx_hash
Index:

  • coin_info

| Field | Type | Description |
| --- | --- | --- |
| coin_type | varchar | coin type of the coin |
| name | varchar | name of the coin |
| symbol | varchar | symbol of the coin |
| decimals | smallint | decimals of the coin |
| supply | bigint | supply of the coin |
| created_at | timestamp | when the row was created |
| updated_at | timestamp | when the row was updated |

Primary key: symbol (unique)
Index:

  • coin_store

| Field | Type | Description |
| --- | --- | --- |
| address | varchar | user address |
| coin_type | varchar | coin type of the coin |
| balance | numeric | balance of a specific coin type |
| frozen | bool | freeze status |
| created_at | timestamp | when the row was created |
| updated_at | timestamp | when the row was updated |

Primary key: (address, coin_type), composite
Index:

  • event

| Field | Type | Description |
| --- | --- | --- |
| event_handle_id | varchar | the event handle ID |
| event_seq | int | the number of messages that have been emitted to the path previously |
| type_tag | varchar | the type of the event data |
| event_data | blob | the data payload of the event |
| event_index | int | the event index within the transaction's events |
| created_at | timestamp | when the row was created |
| updated_at | timestamp | when the row was updated |
| tx_order | bigint | the sequencer order of the transaction |
| tx_hash | varchar | the hash of the transaction |
| sender_address | varchar | address of the user who emitted the event |

Primary key: (event_handle_id, event_seq), composite
Index:
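The composite primary key of coin_store is what makes scenario 2 (all coins and balances of an address) a cheap query: with address as the leading key column, one address's rows are contiguous and can be read with a prefix scan. A minimal Rust sketch, using a `BTreeMap` as a hypothetical in-memory stand-in for the table and `u128` instead of `numeric` balances:

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for coin_store, keyed by its composite primary key
/// (address, coin_type). u128 is used for simplicity; the schema uses numeric.
type CoinStore = BTreeMap<(String, String), u128>;

/// All (coin_type, balance) pairs of one address. Because address is the
/// leading key column, these rows form one contiguous key range.
fn balances_of(store: &CoinStore, address: &str) -> Vec<(String, u128)> {
    store
        .range((address.to_string(), String::new())..)
        .take_while(|((addr, _), _)| addr == address)
        .map(|((_, coin_type), balance)| (coin_type.clone(), *balance))
        .collect()
}

fn main() {
    let mut store = CoinStore::new();
    store.insert(("0xa".to_string(), "coin::A".to_string()), 10);
    store.insert(("0xa".to_string(), "coin::B".to_string()), 5);
    store.insert(("0xb".to_string(), "coin::A".to_string()), 7);
    println!("{:?}", balances_of(&store, "0xa"));
}
```

A relational database gives the same benefit through the primary-key index on (address, coin_type); a query by coin_type alone would need its own index.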

Challenges in the automatic table creation solution

  1. Nested structs: how should automatic table creation convert them? Create multiple tables, or flatten them into a single table through a template?
    For example:
    /// The Balance resource that stores the balance of a specific coin type.
    struct Balance has store {
        value: u256,
    }

    /// A holder of a specific coin type.
    /// These are kept in a single resource to ensure locality of data.
    struct CoinStore has key {
        coin_type: string::String,
        balance: Balance,
        frozen: bool,
    }
  2. Primary keys: should tables use automatically created auto-increment primary keys, or business keys such as the address or the transaction sequence order?
  3. Indexes: how can indexes be created automatically after a table is created, and how should composite indexes be chosen to serve the query scenarios?
  4. Schema changes: how should the table schema be adjusted when a struct is updated (for example, when fields are added to it)?
  5. Write performance: should data be written in batches, and what other performance optimizations are needed?
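For the schema-change challenge, one possible approach (an assumption, not a decided design) is to diff the columns generated from the old and new struct layouts: added columns become `ALTER TABLE ... ADD COLUMN`, while removed or retyped columns are flagged for manual migration. `Migration` and `diff_schema` are hypothetical names.

```rust
/// Outcome of comparing old and new generated columns (hypothetical).
#[derive(Debug, PartialEq)]
enum Migration {
    /// Additive change that can be applied automatically.
    AddColumn(String),
    /// Removed or retyped columns are not migrated automatically in this sketch.
    Manual(String),
}

/// Diff (column, SQL type) lists generated from two versions of a struct.
fn diff_schema(table: &str, old: &[(&str, &str)], new: &[(&str, &str)]) -> Vec<Migration> {
    let mut out = Vec::new();
    for (name, ty) in new {
        match old.iter().find(|(n, _)| n == name) {
            None => out.push(Migration::AddColumn(format!("ALTER TABLE {table} ADD COLUMN {name} {ty}"))),
            Some((_, old_ty)) if old_ty != ty => {
                out.push(Migration::Manual(format!("{table}.{name}: type changed from {old_ty} to {ty}")))
            }
            Some(_) => {}
        }
    }
    for (name, _) in old {
        if !new.iter().any(|(n, _)| n == name) {
            out.push(Migration::Manual(format!("{table}.{name}: column removed")));
        }
    }
    out
}

fn main() {
    let v1 = [("address", "varchar"), ("balance", "numeric")];
    let v2 = [("address", "varchar"), ("balance", "numeric"), ("frozen", "bool")];
    for m in diff_schema("coin_store", &v1, &v2) {
        println!("{m:?}");
    }
}
```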

Convention

  • All tables have created_at and updated_at fields by default, which are filled in automatically by the Table Schema Template.
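A minimal sketch of how a Table Schema Template might apply this convention when rendering DDL. `render_create_table` is a hypothetical helper, and the generated SQL is illustrative rather than tied to a specific database:

```rust
/// Hypothetical Table Schema Template: render CREATE TABLE DDL from generated
/// columns, always appending the default created_at / updated_at columns.
fn render_create_table(table: &str, columns: &[(&str, &str)], primary_key: &[&str]) -> String {
    let mut defs: Vec<String> = columns.iter().map(|(name, ty)| format!("{name} {ty}")).collect();
    // Convention: every table gets these two columns.
    defs.push("created_at timestamp".to_string());
    defs.push("updated_at timestamp".to_string());
    if !primary_key.is_empty() {
        defs.push(format!("PRIMARY KEY ({})", primary_key.join(", ")));
    }
    format!("CREATE TABLE IF NOT EXISTS {table} ({})", defs.join(", "))
}

fn main() {
    let ddl = render_create_table(
        "coin_store",
        &[("address", "varchar"), ("coin_type", "varchar"), ("balance", "numeric"), ("frozen", "bool")],
        &["address", "coin_type"],
    );
    println!("{ddl}");
}
```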

TODO

  1. Auto parse and Table Schema Template
  2. Should we split the SMT's leaf value from the node?
  3. SQLite ORM
  4. GraphQL Server

Appendix

SMT principle

Related issues

baichuan3 added this to the Rooch v0.3 milestone Oct 24, 2023
baichuan3 self-assigned this Oct 24, 2023
baichuan3 commented Oct 25, 2023

Discussion conclusions:

  • Normalize the transaction sender and other types. The sender of a transaction from Ethereum has both an Ethereum address and a Rooch address; at a minimum, the Rooch address must be stored.
  • Transactions can be merged into a single table; writing transaction data is triggered in RpcService::execute_tx.
  • Indexer trigger entry (open question): distinguish between transaction data, state data, and event data.
  • Expanding columns by struct is difficult. Instead, map each State table to an Indexer table and try to store the table value V as JSON.
    Expand by table type:
    Option 1: Create a global Object table with columns objectid, owner, and JSON(V); for other tables, create one table each and store V as JSON. To be implemented in the first phase.
    Option 2: Create a table per object type with columns objectid and hash(K), expanded by struct.
  • The Indexer's created_at and updated_at must not use the indexer's local time. They must use on-chain time so that indexing supports replay.
  • Create a Changeset table for data recovery and state sync.
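A minimal sketch of Option 1 combined with the on-chain timestamp rule. `ObjectRow` and `upsert_object` are hypothetical; V is passed in as an already-serialized JSON string (a real implementation would likely use a JSON library), and a `HashMap` stands in for the table. Because both timestamps come from the chain rather than the wall clock, replaying the same changes yields identical rows.

```rust
use std::collections::HashMap;

/// Hypothetical row of the global Object table from Option 1.
#[derive(Debug, Clone, PartialEq)]
struct ObjectRow {
    object_id: String,
    owner: String,
    /// The value V, already serialized as JSON text.
    value_json: String,
    /// On-chain timestamps in milliseconds, never the indexer's local time.
    created_at: u64,
    updated_at: u64,
}

/// Upsert an object change from a transaction executed at chain_ts_ms:
/// a new object gets both timestamps, an existing one keeps created_at.
fn upsert_object(
    table: &mut HashMap<String, ObjectRow>,
    object_id: &str,
    owner: &str,
    value_json: &str,
    chain_ts_ms: u64,
) {
    table
        .entry(object_id.to_string())
        .and_modify(|row| {
            row.owner = owner.to_string();
            row.value_json = value_json.to_string();
            row.updated_at = chain_ts_ms;
        })
        .or_insert_with(|| ObjectRow {
            object_id: object_id.to_string(),
            owner: owner.to_string(),
            value_json: value_json.to_string(),
            created_at: chain_ts_ms,
            updated_at: chain_ts_ms,
        });
}

fn main() {
    let changes = [
        ("0x01", "0xa", r#"{"value":"1"}"#, 1_000u64),
        ("0x01", "0xb", r#"{"value":"2"}"#, 2_000),
    ];
    let mut table = HashMap::new();
    for (id, owner, v, ts) in changes {
        upsert_object(&mut table, id, owner, v, ts);
    }
    println!("{:?}", table["0x01"]);
}
```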

Projects
Status: Done