Pre-indexing
mangas committed Feb 1, 2024
1 parent 6067090 commit 2f42cf3
Showing 12 changed files with 5,647 additions and 6 deletions.
126 changes: 121 additions & 5 deletions Cargo.lock
8 changes: 7 additions & 1 deletion core/Cargo.toml
@@ -19,7 +19,6 @@ graph-chain-near = { path = "../chain/near" }
graph-chain-cosmos = { path = "../chain/cosmos" }
graph-chain-substreams = { path = "../chain/substreams" }
graph-chain-starknet = { path = "../chain/starknet" }
lazy_static = "1.2.0"
lru_time_cache = "0.11"
semver = "1.0.21"
serde = "1.0"
@@ -31,6 +30,13 @@ graph-runtime-wasm = { path = "../runtime/wasm" }
cid = "0.11.0"
anyhow = "1.0"

borsh = { version = "1.3.1", features = ["derive"] }
# The version already used in the project is very old and upgrade is currently blocked.
once_cell = "1.18.0"
ethabi = "17.0"
substreams = "0.5.0"
substreams-ethereum = "0.8.0"

[dev-dependencies]
tower-test = { git = "https://github.com/tower-rs/tower.git" }
ipfs-api-backend-hyper = "0.6"
50 changes: 50 additions & 0 deletions core/src/indexer/README.md
@@ -0,0 +1,50 @@
# Pre-Indexer for subgraphs

## Design
The pre-indexer traverses all blocks that match a set of filters, which are currently defined
per chain. For each block, it runs the mappings with the block and a key-value state as input and
stores the resulting triggers for that block. These triggers are later used as input for subgraphs
instead of raw blockchain data.

Exposing the state lets users maintain their own logic for handling dynamic data sources
as well as any derived data.

The state is expected to be returned after every block and passed on to the next. It is not
available for querying, and only the latest version is kept between blocks. Its size will be
limited through a mechanism that will be defined later.

If state is not used, all processing can happen in parallel until it reaches the chain head.

## State
State refers to the intermediate state carried between blocks (think of the accumulator in a fold
operation). Only the latest data is kept, and it is only queryable from the pre-indexer; subgraphs
and GraphQL don't have access to it.

This state is necessary so that users can keep track of things like created contracts on Ethereum.

State is indexed by a string key and an optional tag, and stores a `Vec<u8>`. This means that
anything stored in the state should ideally use a binary serialization format such as borsh or
protobuf.

The key is designed to be an ID or unique value, and the tag lets related items be queried
together. As an example:

```
store.set("123", "token", ...)
store.set("321", "token", ...)
store.get_all("token") // Should yield both the previous values.
```
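
The key/tag behaviour described above could look roughly like the following in-memory sketch. `StateStore`, its method names, and the use of `Option<&str>` for the optional tag are assumptions made for illustration; they are not taken from this commit.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory state store (illustrative names, not the commit's API).
/// Each entry is addressed by a unique string key and carries an optional tag.
#[derive(Default)]
pub struct StateStore {
    // key -> (optional tag, raw serialized bytes)
    entries: BTreeMap<String, (Option<String>, Vec<u8>)>,
}

impl StateStore {
    /// Insert or overwrite the value stored under `key`.
    pub fn set(&mut self, key: &str, tag: Option<&str>, value: Vec<u8>) {
        self.entries
            .insert(key.to_string(), (tag.map(str::to_string), value));
    }

    /// Look up a single value by its key.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(|(_, v)| v.as_slice())
    }

    /// Return every (key, value) pair whose tag equals `tag`, ordered by key.
    pub fn get_all(&self, tag: &str) -> Vec<(&str, &[u8])> {
        self.entries
            .iter()
            .filter(|(_, (t, _))| t.as_deref() == Some(tag))
            .map(|(k, (_, v))| (k.as_str(), v.as_slice()))
            .collect()
    }
}
```

Values are kept as opaque bytes, so the caller decides how to serialize them (for example with borsh); untagged entries are reachable only through `get`.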

## Processing
The pre-indexer iterates over all the blocks coming from firehose/substreams, so filters can be
applied to the incoming data to make processing quicker. The main point to note is that, if state
is not used, the entire block space being scanned can be partitioned and handled in parallel.
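
As a rough illustration of the stateless case, the sketch below splits a block range into contiguous chunks and processes each chunk on its own thread. The `partition` and `scan_parallel` functions, the `u64` block numbers, and the `Vec<u8>` result per block are all hypothetical; the commit does not specify how partitioning is done.

```rust
use std::ops::Range;
use std::thread;

/// Split the half-open block range into at most `parts` contiguous,
/// non-overlapping ranges of near-equal size.
pub fn partition(range: Range<u64>, parts: u64) -> Vec<Range<u64>> {
    let len = range.end.saturating_sub(range.start);
    if len == 0 || parts == 0 {
        return Vec::new();
    }
    let parts = parts.min(len);
    let base = len / parts;
    let extra = len % parts; // the first `extra` ranges get one more block
    let mut out = Vec::with_capacity(parts as usize);
    let mut start = range.start;
    for i in 0..parts {
        let size = base + u64::from(i < extra);
        out.push(start..start + size);
        start += size;
    }
    out
}

/// Run `process` for every block number, one thread per partition.
/// This is only valid when `process` does not depend on state from earlier blocks.
pub fn scan_parallel<F>(range: Range<u64>, workers: u64, process: F) -> Vec<(u64, Vec<u8>)>
where
    F: Fn(u64) -> Vec<u8> + Sync,
{
    thread::scope(|s| {
        // Spawn all workers first so the partitions actually run concurrently.
        let handles: Vec<_> = partition(range, workers)
            .into_iter()
            .map(|r| {
                let process = &process;
                s.spawn(move || r.map(|n| (n, process(n))).collect::<Vec<_>>())
            })
            .collect();
        // Joining in partition order keeps the results sorted by block number.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

When state is used, each block depends on the state returned by the previous one, so this kind of partitioning no longer applies and blocks have to be processed in order.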

## Store
The store maps each BlockNumber to the list of triggers for that block, preserving the order of
the list.
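
A minimal sketch of such a mapping, assuming triggers are stored as opaque bytes. `TriggerStore`, `EncodedTrigger`, and the `u64` alias for `BlockNumber` are illustrative choices; the project's own `BlockNumber` type and storage backend may differ.

```rust
use std::collections::BTreeMap;

// Assumptions for this sketch only.
pub type BlockNumber = u64;
pub type EncodedTrigger = Vec<u8>;

/// Hypothetical in-memory trigger store keyed by block number.
#[derive(Default)]
pub struct TriggerStore {
    by_block: BTreeMap<BlockNumber, Vec<EncodedTrigger>>,
}

impl TriggerStore {
    /// Append triggers for a block, keeping them in insertion order.
    pub fn append(&mut self, block: BlockNumber, triggers: Vec<EncodedTrigger>) {
        self.by_block.entry(block).or_default().extend(triggers);
    }

    /// Triggers recorded for `block`, in the order they were appended.
    pub fn triggers(&self, block: BlockNumber) -> &[EncodedTrigger] {
        self.by_block
            .get(&block)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

Using a `Vec` per block keeps the triggers in the order they were produced, and the `BTreeMap` keeps blocks sorted by number when iterating.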

## Transformations
Transformations are similar to the mappings on subgraphs: they provide the code that extracts
data from blocks. A transformation takes the previous state and the block as input and returns
the new state and a list of encoded triggers, that is, the values to be stored for the processed
block, which are later used as inputs for subgraphs.
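
The fold-like contract described above can be sketched as a trait plus a driver loop. Everything here (`Block`, `State`, `Transformation`, `run`, and the `CountBlocks` example) is hypothetical and only mirrors the shape described in this README, not the code in this commit.

```rust
use std::collections::BTreeMap;

// Assumptions for this sketch only: a simplified block and a key -> bytes state.
pub struct Block {
    pub number: u64,
    pub payload: Vec<u8>,
}
pub type State = BTreeMap<String, Vec<u8>>;
pub type EncodedTrigger = Vec<u8>;

/// A transformation maps (previous state, block) to (new state, triggers).
pub trait Transformation {
    fn transform(&self, state: State, block: &Block) -> (State, Vec<EncodedTrigger>);
}

/// Fold the transformation over the blocks, threading the state through
/// and collecting the triggers produced for each block number.
pub fn run<T: Transformation>(
    t: &T,
    blocks: &[Block],
    initial: State,
) -> (State, Vec<(u64, Vec<EncodedTrigger>)>) {
    let mut state = initial;
    let mut out = Vec::with_capacity(blocks.len());
    for block in blocks {
        let (next, triggers) = t.transform(state, block);
        state = next;
        out.push((block.number, triggers));
    }
    (state, out)
}

/// Example transformation: counts the blocks seen so far and emits the
/// running count (little-endian u64) as the single trigger for each block.
pub struct CountBlocks;

impl Transformation for CountBlocks {
    fn transform(&self, mut state: State, _block: &Block) -> (State, Vec<EncodedTrigger>) {
        let mut buf = [0u8; 8];
        if let Some(bytes) = state.get("seen") {
            if bytes.len() == 8 {
                buf.copy_from_slice(bytes);
            }
        }
        let seen = u64::from_le_bytes(buf) + 1;
        state.insert("seen".to_string(), seen.to_le_bytes().to_vec());
        (state, vec![seen.to_le_bytes().to_vec()])
    }
}
```

Passing the state by value and returning the new one makes the dependency between consecutive blocks explicit; a transformation that ignores its state argument is the case that can be parallelised.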