Indexing time instrumentation #1338
One thing I am wondering: it might be useful to also measure how much time we spend in block reversions. Would that be terribly hard to add?
@lutter I made this stop registering metrics when the subgraph becomes synced; I think what we care about measuring right now is how fast a subgraph becomes synced. Since reverts usually happen after the subgraph has synced, they don't fit the theme of this PR, though they would make a good standalone metric. It shouldn't be hard to add; we already have separate gauges for other things, including the number of blocks reverted.
@leoyvens I'll review later but one quick comment: any chance we could benchmark the WASM memory allocation? You improved that recently but the time we spend there used to be a significant unknown.
@leoyvens what made me mention reverts is that I have seen some queries related to reverts take a long time (something like 2.5s), and having a better understanding of how much time we spend there would be great. But I agree with your point about time-to-sync being more interesting right now.
@Jannis I've added a section for that.
LGTM
@Jannis do you want to take a look? Should I merge?
```rust
let mut inner = StopwatchInner {
    total_counter: *registry
        .new_counter(
            format!("{}_total_secs", subgraph_id),
```
I'm not sure `total_secs` is the right term. Maybe `{}_secs_to_synced` or `{}_syncing_secs`? Who knows what we might use `total` for in the future, maybe all time spent on a subgraph?
Maybe this is fine actually, as the stopwatch metrics could be used in other places too. But then I'd maybe pass in a `sync` prefix here that allows this to become `{}_sync_total_secs`.
Interesting, I hadn't considered the reuse potential. Should all metrics be prefixed with `_sync`, or just the one for the totals?
Initially, I thought we should prefix all of them, but I think just the totals works. `{subgraph}_total_secs` is just too generic, but `{subgraph}_store_get` (or whatever) makes sense even if it's only captured during syncing for now. What do you think?
Agreed, I'll change just the `total_secs` to `sync_total_secs`.
@Jannis thanks for the review, I've responded to all comments.
@leoyvens Ok, just the one comment about the naming.
Force-pushed from 3866c0b to 97d47c4.
@Jannis renamed that.
LGTM!
Resolves #1302.
This adds Prometheus metrics for measuring the total time to sync a subgraph, broken down by sections. This PR marks some sections, and a convenient API is introduced for marking additional sections as needed. It also introduces a metric for measuring the impact of DB contention on performance. Grafana panels were added to display these metrics (timings shown for syncing erc20 up to about block 5400000).
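The section-marking idea described above could be sketched roughly like this: a stopwatch where one named section runs at a time and elapsed time accumulates per section. This is a simplified illustration using only the standard library; the struct and method names are assumptions, not the actual graph-node API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical sketch of a section stopwatch: starting a section
/// implicitly ends the previous one, and elapsed time is accumulated
/// under the section's name (which could back a Prometheus counter).
struct SectionStopwatch {
    totals: HashMap<String, Duration>,
    current: Option<(String, Instant)>,
}

impl SectionStopwatch {
    fn new() -> Self {
        SectionStopwatch { totals: HashMap::new(), current: None }
    }

    /// End any running section and start timing `name`.
    fn start_section(&mut self, name: &str) {
        self.end_section();
        self.current = Some((name.to_string(), Instant::now()));
    }

    /// Stop the running section, adding its elapsed time to the totals.
    fn end_section(&mut self) {
        if let Some((name, started)) = self.current.take() {
            *self.totals.entry(name).or_insert(Duration::ZERO) += started.elapsed();
        }
    }
}

fn main() {
    let mut sw = SectionStopwatch::new();
    sw.start_section("store_get");
    sw.start_section("wasm_execution"); // implicitly ends "store_get"
    sw.end_section();
    assert!(sw.totals.contains_key("store_get"));
    assert!(sw.totals.contains_key("wasm_execution"));
    println!("sections timed: {}", sw.totals.len());
}
```

In the real PR these accumulated durations feed per-subgraph Prometheus counters rather than an in-memory map, but the start/end-section flow is the core of the marking API described above.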
See https://github.com/graphprotocol/instrumentation/pull/1 for the panels.