Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon: Add prometheus metrics to database queries #3597

Merged
merged 9 commits into from
May 17, 2021

Conversation

paulbellamy
Copy link
Contributor

@paulbellamy paulbellamy commented May 12, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Add prometheus metrics for query count and duration to horizon database calls. Turns out the metrics we had before were actually missing the ingest and core database clients as well, so I've added the same metrics to those too.

The big code changes here are:

  • Actually use db.SessionInterface, instead of just using *db.Session everywhere. Nice cleanup anyway.
  • Have to pass the StateMiddleware and HistoryMiddleware into each route individually, as they have to happen after routing. Big OOF.

Why

So we can verify and really understand which endpoints are causing db load and why.

Fixes #3593

Known limitations

  • TODO: Needs a bit more testing, like firing it up and seeing it work.
  • I'm not sure if the register/unregister calls are in the right place. wdyt?
  • @jacekn Do these metrics and labels look right?

@paulbellamy paulbellamy requested review from jacekn and a team May 12, 2021 17:36
@ire-and-curses
Copy link
Member

Just want to confirm: are we getting raw numbers from these metrics? It would be great to be able to derive e.g. max and min as well as average for these calls.

@jacekn
Copy link
Contributor

jacekn commented May 12, 2021

@jacekn Do these metrics and labels look right?

Yeah they look good to me.
We're splitting one of them by route that that one will generate 225 new metrics so in total it should be 270 metrics which if my math is right. That's not too bad, it's less than 25% increase and those should be really useful

@jacekn
Copy link
Contributor

jacekn commented May 12, 2021

Just want to confirm: are we getting raw numbers from these metrics? It would be great to be able to derive e.g. max and min as well as average for these calls.

Averages we'll get out of the box - in prometheus we'll be able to divide total of all queries by number of queries to get average.
Min/max we won't get out of the box but it's doable too. We'll just need a small "recording rule" in prometheus (which is a common thing and very simple to set up). With the recording rule we'll be able to calculate min/max in any timerange we choose.

@paulbellamy paulbellamy merged commit 21155fc into stellar:master May 17, 2021
@paulbellamy paulbellamy deleted the paulb/db-metrics branch May 17, 2021 11:50
session, err := db.Open("postgres", databaseURL)
if err != nil {
log.Fatalf("cannot open Horizon DB: %v", err)
}

session.DB.SetMaxIdleConns(maxIdle)
session.DB.SetMaxOpenConns(maxOpen)
return session
return db.RegisterMetrics(session, "horizon", subsystem, registry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it changes metrics' names. Can we revert it? It will break Grafana dashboard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

services/horizon: Add metrics on DB query rate, errors, and duration
5 participants