
Dynamic block range based on triggers per range #1370

Merged: 4 commits into master from leo/dynamic-block-range on Nov 28, 2019.

Conversation

leoyvens (Collaborator):

Use the number of triggers found in the previous range to adjust the next range, aiming for a configured target number of triggers per range. For dense subgraphs this keeps the complexity of Ethereum calls and memory usage under control, without compromising speed for cheap subgraphs, or initial speed for subgraphs that are denser towards the end of the chain.

Testing with erc20, this was effective in keeping the trigger count around the default target of 1000. Memory usage sat at ~100 MB, which seems fine for an actively syncing subgraph and suggests that each trigger takes ~100 KB on average. The range size settled at about 3000 blocks, which is expected to decrease at higher block numbers.

Resolves #1350 (Dynamic, per-subgraph block step). The implementation differs from what I suggested in the issue in that the range steps are not tied to multiples of 10.
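
Roughly, the rule described above comes down to the sketch below. This is a minimal illustration, not the PR's actual code: the function and parameter names are mine, and the zero-trigger branch is an assumption based on the slow-start discussion later in the thread; the real implementation lives in chain/ethereum/src/block_stream.rs.

```rust
/// Sketch: size the next block range so that, at the trigger density
/// observed in the range just scanned, it would hold about
/// `target_triggers` triggers.
fn next_range_size(
    blocks_scanned: u64,  // size of the range just processed
    triggers_found: u64,  // triggers that range produced
    target_triggers: u64, // e.g. the default target of 1000
    max_range: u64,       // e.g. MAX_BLOCK_RANGE_SIZE
) -> u64 {
    let next = if triggers_found > 0 {
        let per_block = triggers_found as f64 / blocks_scanned as f64;
        (target_triggers as f64 / per_block) as u64
    } else {
        // Assumption: with no triggers observed, grow 10x per scan,
        // matching the "5 scans to hit 100k" figure discussed below.
        blocks_scanned.saturating_mul(10)
    };
    next.clamp(1, max_range)
}
```

With the numbers from the review below: scanning 5000 blocks and finding 500 triggers (0.1 per block) against a target of 1000 yields 1000 / 0.1 = 10000 blocks for the next range.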

leoyvens requested review from Jannis and a user on November 21, 2019.

A later commit also fixes a bug where, after the subgraph synced, it would log on every new block.

```rust
static ref ETHEREUM_BLOCK_RANGE_SIZE: u64 = ::std::env::var("ETHEREUM_BLOCK_RANGE_SIZE")
    .unwrap_or("10000".into())

/// Maximum number of blocks to request in each chunk.
static ref MAX_BLOCK_RANGE_SIZE: u64 = std::env::var("GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE")
```
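
Completed into a compiling form for context (a sketch: the parse/expect error handling is my assumption, and the 100000 default follows the "default maximum of 100k" mentioned later in the thread):

```rust
use lazy_static::lazy_static;

lazy_static! {
    /// Maximum number of blocks to request in each chunk.
    static ref MAX_BLOCK_RANGE_SIZE: u64 = std::env::var("GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE")
        .unwrap_or("100000".into()) // assumed default, per the "100k" figure below
        .parse::<u64>()
        .expect("invalid GRAPH_ETHEREUM_MAX_BLOCK_RANGE_SIZE");
}
```
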
A reviewer commented:

Should we keep this default in production? Right now the existing env var is set to 3000 in community and 10000 for customers.

leoyvens (Collaborator, Author) replied:

MAX_BLOCK_RANGE_SIZE essentially exists to prevent us from scanning the entire chain at once. Scanning 100k blocks at once may cause us to go way over the target in bad cases, but that will happen only once, and then the range will adapt.

From my testing I believe these defaults should work out fine and we won't need to tinker with these variables, but if they turn out to be too aggressive we can always lower them as a short-term solution.

A contributor asked:

Won't reducing 100k to 10k have a negative performance impact on early block scanning of most subgraphs?

```
@@ -173,6 +180,10 @@ where
    start_blocks,
    templates_use_calls,
    metrics,

// A high number here forces a slow start, with a range of 1.
```

A contributor asked:

Why don't we want to start fast? 😉

leoyvens (Collaborator, Author) replied:

Starting slow is more conservative. For example, if we're coming from a large range size and suddenly hit a ton of triggers, causing the node to crash with an OOM, starting slow gives us a chance to recover and find the correct range.

The range can increase by 10x on each scan, so it takes only 5 scans for this to hit the default maximum of 100k.
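
As a toy check of that arithmetic (assuming the 10x growth step and the hypothetical 100k cap from the sketch above):

```rust
fn main() {
    // Slow start: the seeded "high number" makes the first range 1,
    // then each trigger-free scan can grow the range by at most 10x.
    let mut range: u64 = 1;
    for scan in 1..=5 {
        range = (range * 10).min(100_000); // cap from MAX_BLOCK_RANGE_SIZE
        println!("after scan {}: range = {}", scan, range);
    }
    // Prints 10, 100, 1000, 10000, 100000: five scans to the maximum.
}
```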

The contributor replied:

Right, like when a node restarts and the subgraphs pick up at a range with an enormous number of triggers. OK, yes, makes sense. I was only thinking about the start case / genesis block, but with the start block feature you could start at any block, and too large a range could cause trouble.

leoyvens (Collaborator, Author) replied:

Ah yes, there's that as well: when you start, you don't know what your previous range was.

chain/ethereum/src/block_stream.rs (thread resolved)

```rust
// - Scan 10 blocks:
//   2 triggers found, 0.2 per block, range_size = 1000 / 0.2 = 5000
// - Scan 5000 blocks:
//   500 triggers found, 0.1 per block, range_size = 500 / 0.2 = 2500
```

A contributor commented:

Hmm. If I scan 5000 blocks and I find 500 triggers but my target is 1000, shouldn't I increase the block range to 10000, not keep it at 500?

Similarly with scanning 2500 blocks and finding 500 triggers. In that case I'd also think that I have to double my range size to get to the target number of triggers in the range.

leoyvens (Collaborator, Author) replied:

Right, this is totally wrong. I think I fixed it.

@Jannis (Contributor) left a review comment:

I think the math in the comment adds up now. 😉
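
For reference, the corrected arithmetic presumably divides the target (not the previous trigger count) by the observed per-block rate:

```rust
// - Scan 10 blocks:
//   2 triggers found, 0.2 per block, range_size = 1000 / 0.2 = 5000
// - Scan 5000 blocks:
//   500 triggers found, 0.1 per block, range_size = 1000 / 0.1 = 10000
```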

leoyvens merged commit 3760bef into master on Nov 28, 2019.
leoyvens deleted the leo/dynamic-block-range branch on November 28, 2019.