Dynamic block range based on triggers per range #1370
Conversation
Use the number of triggers found in the previous range to adjust the next range, trying to meet a configured target of triggers per range. For dense subgraphs, this keeps the complexity of Ethereum calls and memory usage under control, without compromising speed for cheap subgraphs or initial speed for subgraphs that are denser toward the end of the chain.
After the subgraph has synced, it will log on every new block.
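For concreteness, the adjustment described above can be sketched as below. This is a hypothetical illustration, not the PR's actual code: the function name, signature, and the zero-trigger fallback are all assumptions.

```rust
/// Hypothetical sketch: pick the next range size so that, at the trigger
/// density observed in the previous range, roughly `target` triggers
/// would be found. Clamped to [1, max].
fn next_range_size(prev_range: u64, triggers_found: u64, target: u64, max: u64) -> u64 {
    if triggers_found == 0 {
        // No triggers at all in the previous range: grow as fast as allowed.
        // (This fallback is an assumption, not taken from the PR.)
        return max;
    }
    // Triggers per block observed in the previous range.
    let density = triggers_found as f64 / prev_range as f64;
    // Range that would yield ~`target` triggers at that density.
    ((target as f64 / density) as u64).clamp(1, max)
}

fn main() {
    // Dense region: 2000 triggers in 1000 blocks -> shrink toward the target.
    assert_eq!(next_range_size(1000, 2000, 1000, 100_000), 500);
    // Sparse region: 10 triggers in 1000 blocks -> grow, capped at the max.
    assert_eq!(next_range_size(1000, 10, 1000, 100_000), 100_000);
}
```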
Commits 60bd146 to df5ff6c
static ref ETHEREUM_BLOCK_RANGE_SIZE: u64 = ::std::env::var("ETHEREUM_BLOCK_RANGE_SIZE")
    .unwrap_or("10000".into())
/// Maximum number of blocks to request in each chunk.
static ref MAX_BLOCK_RANGE_SIZE: u64 = std::env::var("GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE")
Should we keep this default in production? Right now the existing env var is set to 3000 in community and 10000 for customers.
The MAX_BLOCK_RANGE_SIZE is essentially there to prevent us from scanning the entire chain at once. Scanning 100k blocks at once may cause us to go way over the target in bad cases, but that will happen only once and then the range will adapt.
From my testing, I believe these defaults should work out fine and we won't need to tinker with these variables, but if they turn out to be too aggressive we can always lower them as a short-term solution.
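As an illustration of how such an env-var default is typically wired up, here is a minimal sketch. The `lazy_static` wrapper from the diff is omitted, the helper name is made up, and the 100k default is taken from the maximum discussed elsewhere in this thread.

```rust
use std::env;

/// Sketch only: read the max block range from the environment,
/// falling back to a default of 100,000 (the default maximum
/// mentioned in this discussion).
fn max_block_range_size() -> u64 {
    env::var("GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE")
        .unwrap_or_else(|_| "100000".into())
        .parse::<u64>()
        .expect("invalid GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE")
}

fn main() {
    // Unset -> the default applies.
    env::remove_var("GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE");
    assert_eq!(max_block_range_size(), 100_000);

    // Set -> the override applies (e.g. the community value from above).
    env::set_var("GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE", "3000");
    assert_eq!(max_block_range_size(), 3000);
}
```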
Won't reducing 100k to 10k have a negative performance impact on early block scanning of most subgraphs?
@@ -173,6 +180,10 @@ where
    start_blocks,
    templates_use_calls,
    metrics,
// A high number here forces a slow start, with a range of 1.
Why don't we want to start fast? 😉
Starting slow is more conservative. For example, if we're coming from a large range size and suddenly hit a ton of triggers, causing the node to crash with an OOM, starting slow gives us a chance to recover and find the correct range.
The range can increase by at most 10x on each scan, so it takes only 5 scans for this to hit the default maximum of 100k.
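The 5-scans claim can be checked with a small sketch; the growth function below is an illustrative assumption (a flat 10x cap per scan), not the PR's exact code.

```rust
/// Hypothetical growth step: the range can grow by at most 10x per scan,
/// capped at the configured maximum.
fn grow(range: u64, max: u64) -> u64 {
    (range * 10).min(max)
}

fn main() {
    // Starting from a range of 1, count the scans needed to reach 100k.
    let mut range = 1u64;
    let mut scans = 0;
    while range < 100_000 {
        range = grow(range, 100_000);
        scans += 1;
    }
    // 1 -> 10 -> 100 -> 1_000 -> 10_000 -> 100_000: five scans.
    assert_eq!(scans, 5);
}
```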
Right, like when a node restarts and the subgraphs pick up at a range with an enormous amount of triggers. Ok, yes, makes sense. I was only thinking about the start case / genesis block, but with the start block feature, you could start at any block and too large a range could cause trouble.
Ah yes, there's that as well: when you start, you don't know what your previous range was.
chain/ethereum/src/block_stream.rs
Outdated
// - Scan 10 blocks:
//   2 triggers found, 0.2 per block, range_size = 1000 / 0.2 = 5000
// - Scan 5000 blocks:
//   500 triggers found, 0.1 per block, range_size = 500 / 0.2 = 2500
Hmm. If I scan 5000 blocks and I find 500 triggers but my target is 1000, shouldn't I increase the block range to 10000, not keep it at 500?
Similar with scanning 2500 blocks and finding 500 triggers. In that case I'd also think that I have to double my range size to get to the target amount of triggers in the range.
Right, this is totally wrong; I think I fixed it.
I think the math in the comment adds up now. 😉
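For the record, the corrected arithmetic (range_size = target / observed triggers-per-block) works out as follows; this is a standalone check of the numbers discussed above, not the PR's code.

```rust
fn main() {
    let target = 1000.0_f64; // default target triggers per range

    // Scan 10 blocks, find 2 triggers: 0.2 per block -> 1000 / 0.2 = 5000.
    let step1 = (target / (2.0 / 10.0)).round() as u64;
    assert_eq!(step1, 5000);

    // Scan 5000 blocks, find 500 triggers: 0.1 per block -> 1000 / 0.1 = 10000
    // (not 2500 as in the outdated comment).
    let step2 = (target / (500.0 / 5000.0)).round() as u64;
    assert_eq!(step2, 10000);
}
```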
Testing with erc20, this was effective in keeping the trigger count around the default target of 1000. Memory usage sat at ~100 MB, which seems OK for an actively syncing subgraph and suggests that each trigger takes ~100 KB on average. The range size was at about 3000 blocks, which is expected to decrease at higher block numbers.
Resolves #1350. The implementation is different from what I suggested in the issue in that the range steps are not tied to multiples of 10.