x/pkgsite: select healthy DB per request #40444

jba opened this issue Jul 28, 2020 · 3 comments
FeatureRequest NeedsInvestigation pkgsite



@jba jba commented Jul 28, 2020

Currently, our service processes connect to a single DB on startup. If that DB enters a bad state while a process is running, the process keeps running and serves 500s.

Instead, we should pick from a set of healthy DBs on each request.
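
A minimal sketch of what "pick from a set of healthy DBs on each request" could look like. This is not pkgsite's code: the `DB` and `Pool` types, `SetHealthy`, `Pick`, and the round-robin policy are all hypothetical names and choices made for illustration. It assumes some background health checker calls `SetHealthy` with each check result.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// DB is a hypothetical stand-in for a real database handle.
type DB struct {
	Name string
}

// ErrNoHealthyDB is returned when every DB is marked unhealthy.
var ErrNoHealthyDB = errors.New("no healthy DB available")

// Pool tracks which DBs are currently considered healthy.
type Pool struct {
	mu      sync.Mutex
	dbs     []*DB
	healthy map[*DB]bool
	next    int // round-robin cursor into dbs
}

// NewPool returns a pool in which every DB starts out healthy.
func NewPool(dbs ...*DB) *Pool {
	p := &Pool{dbs: dbs, healthy: make(map[*DB]bool)}
	for _, db := range dbs {
		p.healthy[db] = true
	}
	return p
}

// SetHealthy records the outcome of a health check for db.
func (p *Pool) SetHealthy(db *DB, ok bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.healthy[db] = ok
}

// Pick returns the next healthy DB in round-robin order.
// A request handler would call this once per request.
func (p *Pool) Pick() (*DB, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := 0; i < len(p.dbs); i++ {
		db := p.dbs[(p.next+i)%len(p.dbs)]
		if p.healthy[db] {
			p.next = (p.next + i + 1) % len(p.dbs)
			return db, nil
		}
	}
	return nil, ErrNoHealthyDB
}

func main() {
	primary, replica := &DB{Name: "primary"}, &DB{Name: "replica"}
	pool := NewPool(primary, replica)
	pool.SetHealthy(primary, false) // simulate a failed health check

	db, err := pool.Pick()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("serving request from", db.Name)
}
```

With this shape, a bad DB only affects requests until the next health check marks it unhealthy; the process itself never needs to restart.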

@jba jba self-assigned this Jul 28, 2020
@gopherbot gopherbot added this to the Unreleased milestone Jul 28, 2020

@gopherbot gopherbot commented Jul 28, 2020

Change mentions this issue: internal/frontend: get a DataSource on each request

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 28, 2020
Reorganize the server so that each request gets its own DataSource,
instead of using a single DataSource for every request.

Currently, the behavior doesn't change because we do in fact use
the same DataSource for every request. But this paves the way
to having a pool of health-checked DB connections, while still
having each request work with a single connection.

For golang/go#40444.

Change-Id: I717450593a8dcfd5689a8d28f634324776305042
Reviewed-by: Julie Qiu <>
@julieqiu julieqiu added the NeedsInvestigation label Jul 28, 2020
@julieqiu julieqiu removed this from the Unreleased milestone Aug 27, 2020
@julieqiu julieqiu added this to the pkgsite/unreleased milestone Aug 27, 2020

@jba jba commented Sep 11, 2020

Alternative to picking a healthy DB connection per request:

  1. On startup, we connect to one of the configured DBs. All requests use the same connection. (This is how it works now.)
  2. We implement a health check that pings the DB.

This will cause the health check watcher (App Engine or the GKE load balancer) to quickly kill processes that are connected to a bad DB, and start new ones.


@heschi heschi commented Sep 14, 2020

That approach will work well assuming a failure mode where one database becomes completely unavailable. If instead all the databases become 5% unavailable, e.g. because of a bug that overloads them all, the instances will never find a database that works all the time. Depending on the exact characteristics of the failure, how fast the instances get killed, and how fast they restart, that could turn a minor outage into a very large one.
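
A rough back-of-the-envelope model of that amplification, with every number assumed rather than taken from pkgsite: suppose each health probe fails independently with probability 5%, probes run every 10 seconds, an instance is killed on its first failed probe, and a restart takes 60 seconds. Real health watchers often require several consecutive failures before killing an instance, which this simplified model ignores.

```go
package main

import "fmt"

// downFraction estimates the share of time an instance spends restarting,
// assuming it is killed on the first failed probe and each probe fails
// independently with probability p.
func downFraction(p, probeIntervalSec, restartSec float64) float64 {
	if p <= 0 {
		return 0 // probes never fail, so the instance is never killed
	}
	// The expected number of probes until the first failure is 1/p.
	meanUptimeSec := probeIntervalSec / p
	return restartSec / (meanUptimeSec + restartSec)
}

func main() {
	// 5% probe failures, 10s probe interval, 60s restart (all assumed).
	fmt.Printf("%.2f\n", downFraction(0.05, 10, 60))
}
```

Under these assumptions an instance runs for about 200 seconds between kills and then spends 60 seconds restarting, so roughly 23% of capacity is offline at any moment even though the databases themselves fail only 5% of the time.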

There's also some risk with pointing each instance to a single database; after an outage on one replica, all the frontend instances will be pointed at the other replica.

Since the failures we've seen so far are (AFAIK) total outages of one replica, this might be fine. But an approach that doesn't crash would be better in all respects, except perhaps for development ease.


4 participants