Indexing time instrumentation #1338

leoyvens · 2019-11-07T16:15:49Z

Resolves #1302.

This adds Prometheus metrics for measuring the total time to sync a subgraph, broken down by sections. This PR marks some sections and introduces a convenient API for marking more sections as needed. It also introduces a metric for measuring the impact of DB contention on performance. Grafana panels were added to display these metrics; they look like this (timings for syncing erc20 up to about block 5400000):

See https://github.com/graphprotocol/instrumentation/pull/1 for the panels.
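
A minimal sketch of what a section-based stopwatch could look like, assuming a guard-based design. The names `Stopwatch`, `Section`, `start_section`, and `secs` are hypothetical and are not taken from this PR's stopwatch.rs. The sketch uses only the standard library and keeps totals in a map instead of registering Prometheus counters.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Accumulates wall-clock time per named section (hypothetical type).
#[derive(Clone, Default)]
pub struct Stopwatch {
    totals: Arc<Mutex<HashMap<String, Duration>>>,
}

/// Guard returned by `start_section`; adds the elapsed time when dropped.
pub struct Section {
    name: String,
    started: Instant,
    totals: Arc<Mutex<HashMap<String, Duration>>>,
}

impl Stopwatch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Starts timing `name`; the measurement ends when the guard is dropped.
    pub fn start_section(&self, name: &str) -> Section {
        Section {
            name: name.to_string(),
            started: Instant::now(),
            totals: Arc::clone(&self.totals),
        }
    }

    /// Total seconds recorded for `name`, or 0.0 if nothing was recorded.
    pub fn secs(&self, name: &str) -> f64 {
        self.totals
            .lock()
            .unwrap()
            .get(name)
            .map(|d| d.as_secs_f64())
            .unwrap_or(0.0)
    }
}

impl Drop for Section {
    fn drop(&mut self) {
        let elapsed = self.started.elapsed();
        *self
            .totals
            .lock()
            .unwrap()
            .entry(self.name.clone())
            .or_default() += elapsed;
    }
}
```

A caller would hold the guard for the duration of the work, for example `let _section = stopwatch.start_section("store_get");`, and the elapsed time is added when the guard goes out of scope. In this sketch, nested guards would count overlapping time in both sections.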

lutter · 2019-11-07T16:54:32Z

One thing I am wondering: it might be useful to also measure how much time we spend in block reversions. Would that be terribly hard to add?

leoyvens · 2019-11-07T17:12:48Z

@lutter I made this stop registering metrics once the subgraph becomes synced. I think what we care about measuring right now is how fast a subgraph becomes synced. Since reverts usually happen after the subgraph has synced, they don't fit the theme of this PR, though that would be a good standalone metric to add. It shouldn't be hard to add; we have separate gauges for other things, including the number of blocks reverted.
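
One way the "stop registering once synced" behaviour could be sketched is with a flag that turns recording into a no-op. This is an illustration under assumptions, not the PR's code: `SyncTimings`, `record`, `disable`, and `total` are hypothetical names, and totals are kept in a plain map rather than in Prometheus metrics.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::Duration;

/// Per-section timings that can be switched off (hypothetical type).
#[derive(Default)]
pub struct SyncTimings {
    disabled: AtomicBool,
    totals: Mutex<HashMap<String, Duration>>,
}

impl SyncTimings {
    /// Adds `elapsed` to `section`, unless recording has been disabled.
    pub fn record(&self, section: &str, elapsed: Duration) {
        if self.disabled.load(Ordering::Relaxed) {
            return;
        }
        *self
            .totals
            .lock()
            .unwrap()
            .entry(section.to_string())
            .or_default() += elapsed;
    }

    /// Call once the subgraph is synced; later `record` calls are ignored.
    pub fn disable(&self) {
        self.disabled.store(true, Ordering::Relaxed);
    }

    /// Total time recorded for `section` so far.
    pub fn total(&self, section: &str) -> Duration {
        self.totals
            .lock()
            .unwrap()
            .get(section)
            .copied()
            .unwrap_or_default()
    }
}
```

With this shape, the totals freeze at their time-to-sync values, which matches the stated goal of measuring how fast a subgraph becomes synced.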

Jannis · 2019-11-07T18:27:15Z

@leoyvens I'll review later but one quick comment: any chance we could benchmark the WASM memory allocation? You improved that recently but the time we spend there used to be a significant unknown between store.get returning and the mapping execution being able to use the value. Would be nice to know how long this takes now. What do you think?

lutter · 2019-11-07T21:56:39Z

@leoyvens what made me mention reverts is that I have seen some queries related to reverts take a long time (something like 2.5s) and having a better understanding of how much time we spend there would be great. But I agree with your point about time-to-sync being more interesting right now.

leoyvens · 2019-11-11T19:08:04Z

@Jannis I've added a section for store_get_asc_new, it's about 10% of the total store.get time. It might become something we need to optimize but not an issue right now.

lutter

LGTM

leoyvens · 2019-11-12T02:05:41Z

@Jannis do you want to take a look? Should I merge?

Jannis · 2019-11-12T10:33:49Z

graph/src/components/metrics/stopwatch.rs

+        let mut inner = StopwatchInner {
+            total_counter: *registry
+                .new_counter(
+                    format!("{}_total_secs", subgraph_id),


I'm not sure total_secs is the right term. Maybe {}_secs_to_synced or {}_syncing_secs? Who knows what we might use total for in the future, maybe all time spent on a subgraph?

Jannis · 2019-11-12

Maybe this is fine actually, as the stopwatch metrics could be used in other places too. But then I'd maybe pass in a sync prefix here that allows this to become {}_sync_total_secs.

leoyvens · 2019-11-12

Interesting, I hadn't considered the reuse potential. Should all metrics be prefixed with _sync or just the one for the totals?

Jannis · 2019-11-13

Initially, I thought we should prefix all, but I think just the totals works. {subgraph}_total_secs is just too generic, but {subgraph}_store_get (or whatever) makes sense even if it's just captured during syncing for now. What do you think?

leoyvens · 2019-11-13

Agreed, I'll change just the total_secs to sync_total_secs.
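
The naming scheme agreed on in this thread can be sketched as two small helpers. The function names `total_metric_name` and `section_metric_name` are hypothetical; only the `{}_sync_total_secs` format and the `{subgraph}_store_get` pattern come from the discussion above.

```rust
/// Name of the counter for total syncing time: only this one gets the
/// `sync` prefix, since a bare `total_secs` was considered too generic.
pub fn total_metric_name(subgraph_id: &str) -> String {
    format!("{}_sync_total_secs", subgraph_id)
}

/// Name of a per-section metric, e.g. `{subgraph}_store_get`.
pub fn section_metric_name(subgraph_id: &str, section: &str) -> String {
    format!("{}_{}", subgraph_id, section)
}
```

For a subgraph id of `Qmabc`, these helpers yield `Qmabc_sync_total_secs` and, for the `store_get` section, `Qmabc_store_get`.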

leoyvens · 2019-11-12T19:57:23Z

@Jannis thanks for the review, I've responded to all comments.

Jannis · 2019-11-13T17:39:56Z

@leoyvens Ok, just the one comment about _total_secs remains.

leoyvens · 2019-11-14T12:24:27Z

@Jannis renamed that.

Jannis

LGTM!

leoyvens requested a review from a team November 7, 2019 16:15

lutter reviewed Nov 7, 2019

lutter approved these changes Nov 11, 2019

Jannis requested changes Nov 12, 2019

leoyvens added 14 commits November 14, 2019 09:08

metrics: Introduce stopwatch module for indexing time metrics

8e1851a

metrics: Simplify stopwatch

2f58801

metrics: Further simplify the stopwatch

0b8b897

metrics: Rename some methods

0653910

metrics: Add first measurements of indexing sections

c7b5eeb

runtime, metrics: Detail the sections per host export

e53e5ff

metrics: Add a counter for total syncing time

c23d826

store: Add metric for time spent waiting for connection

76cedd4

metrics: Disable stopwatch once the subgraph is synced

6ab368b

runtime, metrics: Add section store_get_asc_new

302846a

metrics: Add comment for StopwatchMetrics usage

02c382e

metrics/stopwatch: Improve comments

64dc410

metrics: rename get_conn to get_entity_conn

220b5df

metrics/stopwatch: Rename _total_secs to _sync_total_secs

97d47c4

leoyvens force-pushed the leo/indexing-time-instrumentation branch from 3866c0b to 97d47c4 Compare November 14, 2019 12:10

Jannis approved these changes Nov 14, 2019

leoyvens merged commit 107b768 into master Nov 14, 2019

leoyvens deleted the leo/indexing-time-instrumentation branch November 14, 2019 13:47
