Pre-indexing

graphprotocol · Feb 1, 2024 · 2f42cf3
1 parent 6067090

Showing 12 changed files with 5,647 additions and 6 deletions.
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/core/Cargo.toml b/core/Cargo.toml
@@ -19,7 +19,6 @@ graph-chain-near = { path = "../chain/near" }
 graph-chain-cosmos = { path = "../chain/cosmos" }
 graph-chain-substreams = { path = "../chain/substreams" }
 graph-chain-starknet = { path = "../chain/starknet" }
-lazy_static = "1.2.0"
 lru_time_cache = "0.11"
 semver = "1.0.21"
 serde = "1.0"
@@ -31,6 +30,13 @@ graph-runtime-wasm = { path = "../runtime/wasm" }
 cid = "0.11.0"
 anyhow = "1.0"
 
+borsh = { version = "1.3.1", features = ["derive"] }
+# The version already used in the project is very old and upgrade is currently blocked.
+once_cell = "1.18.0"
+ethabi = "17.0"
+substreams = "0.5.0"
+substreams-ethereum = "0.8.0"
+
 [dev-dependencies]
 tower-test = { git = "https://github.com/tower-rs/tower.git" }
 ipfs-api-backend-hyper = "0.6"

diff --git a/core/src/indexer/README.md b/core/src/indexer/README.md
@@ -0,0 +1,50 @@
+# Pre-Indexer for subgraphs
+
+## Design
+The pre-indexer will traverse all blocks, filtered according to criteria that are currently defined
+per chain. For each block, it will run the mappings with the block and a key-value state as input,
+and store the resulting triggers for that block. These triggers will later be used as input for
+subgraphs instead of raw blockchain data.
+
+Exposing the state allows users to implement their own logic for handling dynamic data sources,
+as well as any derived data.
+
+The state is expected to be returned after every block and passed on to the next one. The state will
+not be available for querying; only the latest version is kept between blocks, and its size will be
+limited through a mechanism to be defined later.
+
+If state is not used, all the processing will happen in parallel until it reaches the chain head.
+
+## State
+State refers to the intermediate state carried between blocks (think of the accumulator in a fold
+operation). Only the latest data is kept, and it is only queryable from the pre-indexer; subgraphs
+and GraphQL don't have access to it.
+
+This state is necessary so that users can keep track of things such as contracts created on Ethereum.
+
+State is indexed by a string key and an optional tag, and stores a `Vec<u8>`. This means that anything
+stored in the state should ideally be encoded with a binary serialization format such as borsh or protobuf.
+
+The key is meant to be an ID or other unique value, while the tag allows querying all items that share it. For example:
+
+```
+  store.set("123", "token", ...)
+  store.set("321", "token", ...)
+  store.get_all("token") // Should yield both the previous values.
+```
+
+## Processing
+The pre-indexer will iterate over all blocks coming from Firehose/Substreams, which makes it
+possible to apply filters to the incoming data so that processing is quicker. The main point to
+note about processing is that if state is not used, the entire block range being scanned can be
+partitioned and handled in parallel.
+
+## Store
+The store is a mapping from `BlockNumber` to the list of triggers for that block, with the order of
+the triggers preserved.
+
+## Transformations
+Transformations are similar to the mappings on subgraphs: they provide the code that performs the
+data extraction from blocks. Transformations take the previous state and the block as input, and return
+the new state and a list of encoded triggers, that is, the values to be stored for the processed block,
+which are later used as inputs for subgraphs.
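
The tagged state described in the README can be sketched in Rust roughly as follows. This is a minimal illustration, not the API added by this commit: the `State` type and its `set`/`get`/`get_all` methods are hypothetical, modeled on the README's pseudocode, and values are kept as opaque `Vec<u8>` that a caller would encode with something like borsh.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory state: opaque byte values (e.g. borsh-encoded)
/// indexed by a unique string key and an optional tag.
#[derive(Default, Debug, Clone)]
pub struct State {
    values: BTreeMap<String, (Option<String>, Vec<u8>)>,
    by_tag: BTreeMap<String, BTreeSet<String>>,
}

impl State {
    /// Inserts or replaces the value stored under `key`, keeping the tag index in sync.
    pub fn set(&mut self, key: &str, tag: Option<&str>, value: Vec<u8>) {
        // If the key already had a tag, remove it from that tag's index;
        // it is re-added below when a tag is given.
        if let Some((Some(old_tag), _)) = self.values.get(key) {
            if let Some(keys) = self.by_tag.get_mut(old_tag) {
                keys.remove(key);
            }
        }
        if let Some(tag) = tag {
            self.by_tag
                .entry(tag.to_string())
                .or_default()
                .insert(key.to_string());
        }
        self.values
            .insert(key.to_string(), (tag.map(str::to_string), value));
    }

    /// Returns the value stored under `key`, if any.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.values.get(key).map(|(_, v)| v.as_slice())
    }

    /// Returns every value currently stored with `tag`, ordered by key.
    pub fn get_all(&self, tag: &str) -> Vec<&[u8]> {
        self.by_tag
            .get(tag)
            .into_iter()
            .flatten()
            .filter_map(|key| self.get(key))
            .collect()
    }
}
```

With this sketch, setting keys `"123"` and `"321"` under the tag `"token"` makes `get_all("token")` return both values, ordered by key because of the `BTreeMap`/`BTreeSet` choice; setting an existing key with a different tag moves it out of its old tag group.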
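
To illustrate why state is needed for dynamic data sources, the following hypothetical sketch tracks contracts created by a factory and only emits triggers for events coming from those contracts. The `Log` enum, the address strings, and `track_created` are assumptions made for illustration; a `BTreeSet` stands in for the key/tag state, and real code would decode chain-specific logs.

```rust
use std::collections::BTreeSet;

/// Simplified, hypothetical log type (not a real chain type).
#[derive(Clone, Debug)]
pub enum Log {
    /// A factory announced a newly created contract at `child`.
    ContractCreated { factory: String, child: String },
    /// Any other event emitted by `address`.
    Event { address: String, data: Vec<u8> },
}

/// Addresses of tracked contracts; stands in for the key/tag state.
pub type Tracked = BTreeSet<String>;

/// One transformation step over a block's logs: registers contracts created
/// by `factory` and emits a trigger (the event data) only for events whose
/// emitter is already tracked. Returns the updated state and the triggers.
pub fn track_created(factory: &str, mut tracked: Tracked, logs: &[Log]) -> (Tracked, Vec<Vec<u8>>) {
    let mut triggers = Vec::new();
    for log in logs {
        match log {
            Log::ContractCreated { factory: f, child } if f == factory => {
                tracked.insert(child.clone());
            }
            Log::Event { address, data } if tracked.contains(address) => {
                triggers.push(data.clone());
            }
            _ => {}
        }
    }
    (tracked, triggers)
}
```

Because each call needs the set returned for the previous block, blocks processed this way cannot be handled out of order.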
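
The per-block flow from the Transformations and Store sections can be pictured as a fold. The sketch below assumes hypothetical `Block`, `Transformation`, `run_sequential`, and `CountNonEmpty` names (none of them come from the commit) and models the store as a `BTreeMap` from block number to that block's triggers, kept in the order the transformation returned them.

```rust
use std::collections::BTreeMap;

/// Hypothetical types for illustration; real block and trigger types are chain-specific.
pub type BlockNumber = u64;
pub type EncodedTrigger = Vec<u8>;

pub struct Block {
    pub number: BlockNumber,
    pub payload: Vec<u8>,
}

/// A transformation takes the previous state and a block, and returns the
/// new state plus the encoded triggers for that block.
pub trait Transformation<S> {
    fn transform(&self, state: S, block: &Block) -> (S, Vec<EncodedTrigger>);
}

/// Runs `t` over `blocks` in order, threading the state through like a fold
/// and recording each block's triggers under its number (trigger order is kept).
pub fn run_sequential<S, T: Transformation<S>>(
    t: &T,
    initial: S,
    blocks: &[Block],
) -> (S, BTreeMap<BlockNumber, Vec<EncodedTrigger>>) {
    let mut store = BTreeMap::new();
    let mut state = initial;
    for block in blocks {
        let (next, triggers) = t.transform(state, block);
        state = next;
        store.insert(block.number, triggers);
    }
    (state, store)
}

/// Example transformation: the state counts processed blocks, and every
/// block with a non-empty payload yields one trigger (the payload itself).
pub struct CountNonEmpty;

impl Transformation<u64> for CountNonEmpty {
    fn transform(&self, seen: u64, block: &Block) -> (u64, Vec<EncodedTrigger>) {
        let triggers = if block.payload.is_empty() {
            vec![]
        } else {
            vec![block.payload.clone()]
        };
        (seen + 1, triggers)
    }
}
```

Blocks without any trigger still get an (empty) entry in this sketch, which makes it easy to tell "processed, nothing found" apart from "not processed yet".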
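
When state is not used, each block can be transformed independently, so the scanned block range can be partitioned and processed in parallel. The following is a minimal sketch using only the standard library's scoped threads; it assumes a stateless per-block function `f` and hypothetical `partition`/`run_parallel` helpers, and a real pre-indexer would pull blocks from Firehose/Substreams rather than iterate over bare block numbers.

```rust
use std::collections::BTreeMap;
use std::ops::Range;
use std::thread;

pub type BlockNumber = u64;
pub type EncodedTrigger = Vec<u8>;

/// Splits `range` into at most `parts` contiguous, non-overlapping chunks.
pub fn partition(range: Range<BlockNumber>, parts: u64) -> Vec<Range<BlockNumber>> {
    let len = range.end.saturating_sub(range.start);
    let parts = parts.max(1);
    let chunk = (len + parts - 1) / parts; // ceiling division
    let mut out = Vec::new();
    let mut start = range.start;
    while start < range.end {
        let end = (start + chunk).min(range.end);
        out.push(start..end);
        start = end;
    }
    out
}

/// Applies the stateless function `f` to every block number in `range`,
/// one scoped thread per partition, and merges all results into one store.
pub fn run_parallel<F>(
    range: Range<BlockNumber>,
    parts: u64,
    f: F,
) -> BTreeMap<BlockNumber, Vec<EncodedTrigger>>
where
    F: Fn(BlockNumber) -> Vec<EncodedTrigger> + Sync,
{
    let f = &f;
    thread::scope(|s| {
        // Spawn every worker first so the partitions actually run concurrently.
        let handles: Vec<_> = partition(range, parts)
            .into_iter()
            .map(|chunk| s.spawn(move || chunk.map(|n| (n, f(n))).collect::<Vec<_>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Since results are merged into a `BTreeMap`, the store ends up ordered by block number regardless of which partition finishes first.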