Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/pkgsite: improve fetch metrics and load-shedding #48010

Closed
jba opened this issue Aug 27, 2021 · 13 comments
Closed

x/pkgsite: improve fetch metrics and load-shedding #48010

jba opened this issue Aug 27, 2021 · 13 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. pkgsite
Milestone

Comments

@jba
Copy link
Contributor

jba commented Aug 27, 2021

Both load-shedding and fetch metrics apply only to getting the module from the proxy and processing it, not to inserting in the DB. But often that is where most of the time is spent.

Move them into the worker and include DB insertion. Also consider a DB metric as an additional load-shedding signal, like number of active queries or number of queries blocked on locks.

@jba jba added NeedsFix The path to resolution is known, but the work has not been done. pkgsite labels Aug 27, 2021
@jba jba added this to the pkgsite/unplanned milestone Aug 27, 2021
@jba jba self-assigned this Aug 27, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/346729 mentions this issue: internal/proxy: cache info and mod endpoints

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/346749 mentions this issue: internal/{worker/fetch}: include DB activity in load-shedding

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/346750 mentions this issue: internal/{fetch,worker}: include DB insertin in fetch metrics

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/346751 mentions this issue: internal/fetch: remove ZipSize from getters

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/346809 mentions this issue: internal/{fetch,worker}: fetch info covers DB insertion

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/346810 mentions this issue: cmd/pkgsite: remove dcensus

gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 1, 2021
Add more caching to the proxy client so we can call Info and Mod
multiple times during a fetch without worrying about wasted RPCs to
the proxy.

This will enable moving the load shedder, which requires its own info
call, out of the fetch logic and into the worker.

For golang/go#48010

Change-Id: I4e875b1fd5b968aae174cfb93f4cf3a9a2b7a577
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346729
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 1, 2021
Move the load-shedding logic to the worker and have it span both the
fetch and processing of the module (as previously) as well as
inserting it into the database.

This is a more accurate estimation of load, since running a lot of
concurrent queries definitely slows down processing.

Most of the time this won't make much difference, but under high load,
such as when processing multiple large modules, it will reduce DB
contention and should result in greater throughput.

For golang/go#48010

Change-Id: I7d0922e02d00182e867fd3b29fc284c32ecab5ee
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346749
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 1, 2021
Fetch latency and related metrics include the time spent inserting
into the DB, as well as the time to fetch and process the module.

For golang/go#48010

Change-Id: I1d685bd25f1b632b0b20de5b1bfac5003bff0caa
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346750
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 1, 2021
A ModuleGetter no longer needs the ZipSize method, because
load-shedding has been moved into the worker, where it uses the proxy
client directly.

For golang/go#48010

Change-Id: I01eb0b88ac758e83be20333b73b4315985fc9d8e
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346751
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 1, 2021
Change the FetchInfo data, used for the worker home page, to include
DB insertion.

For golang/go#48010

Change-Id: Id2ba42b96ebc0a93d7a13c7013ace0c9860a2e11
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346809
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 1, 2021
Since the fetch code no longer records metrics, the pkgsite command
doesn't need to talk to OpenCensus.

For golang/go#48010

Change-Id: I2793046725cd8c66ddfb585370ba2bf4890a9366
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346810
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
@jba
Copy link
Contributor Author

jba commented Sep 9, 2021

Moved the load-shedding over. Not going to do the DB metric, no need for it at present.

@jba jba closed this as completed Sep 9, 2021
@jba
Copy link
Contributor Author

jba commented Sep 10, 2021

When we process many versions of the same module, we get a lot of "max serialization" errors, probably because the transactions are all waiting for the same lock or locks. These errors are harmless in a sense, since the fetches will be retried, but they complicate ongoing maintenance because they appear to be genuine errors. We'd rather prevent them from occurring in the first place.

We could just load-shed when we see another version of the same module being processed, but in an attempt to be more general, we will explore a load-shedding metric based on lock contention.

@jba jba reopened this Sep 10, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/348933 mentions this issue: internal/postgres: get DB user info

gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 10, 2021
Add a method for getting information about a DB user. We plan to use
this on the worker to see if we can make better load-shedding
decisions.

For golang/go#48010

Change-Id: I80f2811f657ac47d94446a47a38e00502ae29ae8
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/348933
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/349309 mentions this issue: internal/worker: display DB info on home page

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/349310 mentions this issue: {cmd,internal}/worker: export DB info metrics

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/349312 mentions this issue: internal/worker: load shed based on DB lock contention

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/349311 mentions this issue: internal/worker: move load shedder into server

gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 13, 2021
Add DB process and lock information to the worker home page.

For golang/go#48010

Change-Id: Idab82180a33ce2d00350df0bbfeeb58b2a628ae8
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349309
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 13, 2021
Add metrics for the numbers of active and waiting DB processes.

For golang/go#48010

Change-Id: Ia3c14e492b29c07371ee903182c7ba55f04c584a
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349310
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 13, 2021
This will enable using DB metrics for load shedding.

For golang/go#48010

Change-Id: Ie61da82e833d376d36f74c55266a11346855c5ff
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349311
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
Reviewed-by: Julie Qiu <julie@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 13, 2021
If many DB processes are waiting for locks, shed load.

This is in addition to the existing load-shedding rule based on zip
size.

I conducted an informal experiment to see how this worked. I queued up
82 versions of github.com/aws/aws-sdk-go for processing. That module
has a small zip, so no shedding occurs because of zip size, but it
takes some time to process.

Without lock-based load-shedding, there were 157 "max serialization"
errors. Most of the time there were many active fetches in progress,
almost all waiting for locks.

With lock-based load-shedding, there were only 48 "max serialization"
errors. Most fetches completed quickly.

For golang/go#48010

Change-Id: I0cee02b9c4085a8bc187d803eaca2f30ddd378b5
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349312
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
Reviewed-by: Julie Qiu <julie@golang.org>
@jba jba closed this as completed Sep 13, 2021
@rsc rsc unassigned jba Jun 23, 2022
@golang golang locked and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. pkgsite
Projects
None yet
Development

No branches or pull requests

3 participants