x/pkgsite: improve fetch metrics and load-shedding #48010
Comments
Change https://golang.org/cl/346729 mentions this issue: |
Change https://golang.org/cl/346749 mentions this issue: |
Change https://golang.org/cl/346750 mentions this issue: |
Change https://golang.org/cl/346751 mentions this issue: |
Change https://golang.org/cl/346809 mentions this issue: |
Change https://golang.org/cl/346810 mentions this issue: |
Add more caching to the proxy client so we can call Info and Mod multiple times during a fetch without worrying about wasted RPCs to the proxy. This will enable moving the load shedder, which requires its own info call, out of the fetch logic and into the worker. For golang/go#48010 Change-Id: I4e875b1fd5b968aae174cfb93f4cf3a9a2b7a577 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346729 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
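To illustrate the caching idea in the commit above, here is a minimal sketch, not the pkgsite implementation. The names `infoFetcher`, `cachingClient`, and `newCachingClient` are hypothetical, and the proxy RPC is replaced by a plain function so the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// infoFetcher is a hypothetical stand-in for the proxy RPC that returns
// version info. It is not pkgsite's actual interface.
type infoFetcher func(modulePath, version string) (string, error)

// cachingClient memoizes successful results so that repeated Info calls
// for the same module version reach the proxy only once.
type cachingClient struct {
	fetch infoFetcher

	mu    sync.Mutex
	cache map[string]string
}

func newCachingClient(fetch infoFetcher) *cachingClient {
	return &cachingClient{fetch: fetch, cache: map[string]string{}}
}

// Info returns cached info when available and otherwise calls the proxy.
// Errors are not cached, so a failed call is retried on the next request.
// For simplicity the lock is held during the fetch, which serializes calls.
func (c *cachingClient) Info(modulePath, version string) (string, error) {
	key := modulePath + "@" + version
	c.mu.Lock()
	defer c.mu.Unlock()
	if info, ok := c.cache[key]; ok {
		return info, nil
	}
	info, err := c.fetch(modulePath, version)
	if err != nil {
		return "", err
	}
	c.cache[key] = info
	return info, nil
}

func main() {
	rpcs := 0
	c := newCachingClient(func(m, v string) (string, error) {
		rpcs++ // count simulated proxy round trips
		return m + "@" + v, nil
	})
	c.Info("example.com/mod", "v1.0.0")
	c.Info("example.com/mod", "v1.0.0")
	fmt.Println("proxy RPCs:", rpcs)
}
```

With a cache like this, a load shedder outside the fetch logic can make its own Info call without adding a second round trip to the proxy.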
Move the load-shedding logic to the worker and have it span both the fetch and processing of the module (as previously) as well as inserting it into the database. This is a more accurate estimation of load, since running a lot of concurrent queries definitely slows down processing. Most of the time this won't make much difference, but under high load, such as when processing multiple large modules, it will reduce DB contention and should result in greater throughput. For golang/go#48010 Change-Id: I7d0922e02d00182e867fd3b29fc284c32ecab5ee Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346749 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
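As a rough sketch of what it means for load-shedding to span the DB insert as well as the fetch, the example below tracks zip bytes in flight and releases them only after both steps finish. All names (`loadShedder`, `shouldShed`, `processModule`, `errShed`) and the rule for admitting oversized modules are assumptions made for illustration, not pkgsite's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errShed is a hypothetical error returned when work is rejected.
var errShed = errors.New("load shed: too much work in flight")

// loadShedder tracks the total zip size of modules currently being
// processed and rejects work that would push the total over the limit.
type loadShedder struct {
	maxSizeInFlight uint64

	mu           sync.Mutex
	sizeInFlight uint64
}

// shouldShed reports whether work of the given size should be rejected.
// If the work is admitted, the caller must call done once all of the work,
// including the DB insert, has finished.
func (ls *loadShedder) shouldShed(size uint64) (shed bool, done func()) {
	ls.mu.Lock()
	defer ls.mu.Unlock()
	// Always admit work when nothing is in flight, so that a single module
	// larger than the limit can still be processed.
	if ls.sizeInFlight > 0 && ls.sizeInFlight+size > ls.maxSizeInFlight {
		return true, func() {}
	}
	ls.sizeInFlight += size
	return false, func() {
		ls.mu.Lock()
		defer ls.mu.Unlock()
		ls.sizeInFlight -= size
	}
}

// processModule holds the load-shedding reservation across both the fetch
// and the DB insert, so concurrent inserts count as load.
func processModule(ls *loadShedder, zipSize uint64, fetch, insert func() error) error {
	shed, done := ls.shouldShed(zipSize)
	if shed {
		return errShed
	}
	defer done()
	if err := fetch(); err != nil {
		return err
	}
	return insert()
}

func main() {
	ls := &loadShedder{maxSizeInFlight: 100}
	step := func() error { return nil }
	fmt.Println("result:", processModule(ls, 80, step, step))
}
```

Because `done` runs only after `insert` returns, a slow insert keeps its reservation and delays admission of further modules.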
Fetch latency and related metrics include the time spent inserting into the DB, as well as the time to fetch and process the module. For golang/go#48010 Change-Id: I1d685bd25f1b632b0b20de5b1bfac5003bff0caa Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346750 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
A ModuleGetter no longer needs the ZipSize method, because load-shedding has been moved into the worker, where it uses the proxy client directly. For golang/go#48010 Change-Id: I01eb0b88ac758e83be20333b73b4315985fc9d8e Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346751 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
Change the FetchInfo data, used for the worker home page, to include DB insertion. For golang/go#48010 Change-Id: Id2ba42b96ebc0a93d7a13c7013ace0c9860a2e11 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346809 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
Since the fetch code no longer records metrics, the pkgsite command doesn't need to talk to OpenCensus. For golang/go#48010 Change-Id: I2793046725cd8c66ddfb585370ba2bf4890a9366 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/346810 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Julie Qiu <julie@golang.org>
Moved the load-shedding over. I'm not going to add the DB metric, since there is no need for it at present.
When we process many versions of the same module, we get a lot of "max serialization" errors, probably because the transactions are all waiting for the same lock or locks. These errors are harmless in a sense, since the fetches will be retried, but they complicate ongoing maintenance because they appear to be genuine errors. We'd rather prevent them from occurring in the first place. We could just load-shed when we see another version of the same module being processed, but in an attempt to be more general, we will explore a load-shedding metric based on lock contention.
Change https://golang.org/cl/348933 mentions this issue: |
Add a method for getting information about a DB user. We plan to use this on the worker to see if we can make better load-shedding decisions. For golang/go#48010 Change-Id: I80f2811f657ac47d94446a47a38e00502ae29ae8 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/348933 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> TryBot-Result: kokoro <noreply+kokoro@google.com>
Change https://golang.org/cl/349309 mentions this issue: |
Change https://golang.org/cl/349310 mentions this issue: |
Change https://golang.org/cl/349312 mentions this issue: |
Change https://golang.org/cl/349311 mentions this issue: |
Add DB process and lock information to the worker home page. For golang/go#48010 Change-Id: Idab82180a33ce2d00350df0bbfeeb58b2a628ae8 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349309 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> Reviewed-by: Julie Qiu <julie@golang.org>
Add metrics for the numbers of active and waiting DB processes. For golang/go#48010 Change-Id: Ia3c14e492b29c07371ee903182c7ba55f04c584a Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349310 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> Reviewed-by: Julie Qiu <julie@golang.org>
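One plausible way to derive counts of active and waiting DB processes is sketched below, assuming a PostgreSQL database and its `pg_stat_activity` view. The query text, the `dbProcess` type, and the choice to treat a process as waiting when its `wait_event_type` is `Lock` are assumptions for illustration; the commit may compute these numbers differently.

```go
package main

import "fmt"

// processQuery is an illustrative query against PostgreSQL's
// pg_stat_activity view. It is shown for context only and is not executed
// in this example.
const processQuery = `
	SELECT state, COALESCE(wait_event_type, '')
	FROM pg_stat_activity
	WHERE usename = $1`

// dbProcess is a hypothetical row type holding the two selected columns.
type dbProcess struct {
	State         string
	WaitEventType string
}

// countProcesses returns the number of active processes and how many of
// those are waiting on a lock.
func countProcesses(ps []dbProcess) (active, waiting int) {
	for _, p := range ps {
		if p.State != "active" {
			continue
		}
		active++
		if p.WaitEventType == "Lock" {
			waiting++
		}
	}
	return active, waiting
}

func main() {
	ps := []dbProcess{
		{State: "active", WaitEventType: "Lock"},
		{State: "active", WaitEventType: ""},
		{State: "idle", WaitEventType: "Client"},
		{State: "active", WaitEventType: "Lock"},
	}
	active, waiting := countProcesses(ps)
	fmt.Printf("active=%d waiting=%d\n", active, waiting)
}
```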
This will enable using DB metrics for load shedding. For golang/go#48010 Change-Id: Ie61da82e833d376d36f74c55266a11346855c5ff Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349311 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> TryBot-Result: kokoro <noreply+kokoro@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> Reviewed-by: Julie Qiu <julie@golang.org>
If many DB processes are waiting for locks, shed load. This is in addition to the existing load-shedding rule based on zip size. I conducted an informal experiment to see how this worked. I queued up 82 versions of github.com/aws/aws-sdk-go for processing. That module has a small zip, so no shedding occurs because of zip size, but it takes some time to process. Without lock-based load-shedding, there were 157 "max serialization" errors. Most of the time there were many active fetches in progress, almost all waiting for locks. With lock-based load-shedding, there were only 48 "max serialization" errors. Most fetches completed quickly. For golang/go#48010 Change-Id: I0cee02b9c4085a8bc187d803eaca2f30ddd378b5 Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/349312 Trust: Jonathan Amsterdam <jba@google.com> Run-TryBot: Jonathan Amsterdam <jba@google.com> Reviewed-by: Jamal Carvalho <jamal@golang.org> Reviewed-by: Julie Qiu <julie@golang.org>
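The combined rule described above, lock-based shedding in addition to the zip-size rule, could look roughly like the following sketch. The types `load` and `limits`, the function `shedReason`, and the threshold values are all hypothetical; the commit message does not state the actual thresholds.

```go
package main

import "fmt"

// load is a hypothetical snapshot of current worker and DB load.
type load struct {
	zipBytesInFlight uint64 // total zip size of modules being processed
	lockWaiters      int    // DB processes currently waiting for locks
}

// limits holds illustrative thresholds. The values used in main are made
// up for this example.
type limits struct {
	maxZipBytesInFlight uint64
	maxLockWaiters      int
}

// shedReason returns a non-empty reason if a new fetch with the given zip
// size should be shed, and the empty string if it should proceed.
func shedReason(cur load, zipSize uint64, lim limits) string {
	// Lock-based rule: shed when many DB processes are blocked on locks.
	if cur.lockWaiters > lim.maxLockWaiters {
		return "too many DB processes waiting for locks"
	}
	// Zip-size rule: shed when the total size in flight would exceed the
	// limit, unless nothing else is in flight.
	if cur.zipBytesInFlight > 0 && cur.zipBytesInFlight+zipSize > lim.maxZipBytesInFlight {
		return "zip size in flight too large"
	}
	return ""
}

func main() {
	lim := limits{maxZipBytesInFlight: 1 << 30, maxLockWaiters: 10}
	// A small zip is admitted by the size rule but shed by the lock rule.
	cur := load{zipBytesInFlight: 1 << 20, lockWaiters: 25}
	fmt.Printf("reason: %q\n", shedReason(cur, 1<<20, lim))
}
```

This mirrors the experiment's situation: a module with a small zip never trips the size rule, so only the lock-based rule can slow down admissions when many transactions contend for the same locks.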
Both load-shedding and fetch metrics apply only to getting the module from the proxy and processing it, not to inserting in the DB. But often that is where most of the time is spent.
Move them into the worker and include DB insertion. Also consider a DB metric as an additional load-shedding signal, like number of active queries or number of queries blocked on locks.