all: Go compiler/runtime performance monitoring system #48803

Open
prattmic opened this issue Oct 5, 2021 · 13 comments

@prattmic
Member

@prattmic prattmic commented Oct 5, 2021

The runtime/compiler team is investigating the creation of a performance monitoring system, with a primary goal of reducing release toil by catching performance regressions as early as possible.

Requirements are still being discussed, but highlights include:

  • Automation of performance data generation.
    • Benchmark execution solution that tracks commits to Go (e.g., x/build coordinator scheduling benchmark runs).
  • Storage and visualization of performance data over time.
  • Key set of "application-level" and "feature-focused" benchmarks.
  • Active monitoring of changepoints.
  • Easy debugging/reproduction.

cc @aclements @dr2chase @mknyszek @jeremyfaller @golang/release

@mknyszek mknyszek added this to the Unreleased milestone Oct 5, 2021
@mvdan
Member

@mvdan mvdan commented Oct 5, 2021

@odeke-em
Member

@odeke-em odeke-em commented Oct 5, 2021

cc @odeke-em

Thank you for the tag @mvdan!

Indeed, at Orijtech Inc we've been working on such a product, "Bencher", for the past 2 quarters of 2021 and launched it recently (https://twitter.com/odeke_et/status/1428650768135974919?s=21). We have an integration on the GitHub marketplace at https://github.com/marketplace/gobencher, and our roadmap includes adding a Gerrit integration during Q4 2021 or Q1 2022. I have raised and shown the product to @Sajmani, @spf13, @ianlancetaylor, @ianthehat and others :-)

For example, take a look at one of our public users, Entgo: https://dashboard.github.orijtech.com/benchmark/2bb34b8e166747d2974f76825819acbd

We'd be delighted to work with y'all to bring this first class to the Go tooling ecosystem!

Kindly cc-ing my colleagues @cuonglm @kirbyquerby @willpoint @jhusdero

@mengzhuo
Contributor

@mengzhuo mengzhuo commented Oct 6, 2021

I have some thoughts about this site/app:

  1. Transparent. All results should be available to the public, like benchsave, instead of on an internal Google site.
  2. Informative. Regression notices should be sent to Gerrit and to the owner of the CL.
  3. Easy to compare against different baselines, such as major versions of Go.

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 6, 2021

@mengzhuo All three of those things are already on our radar. :) Thanks.

@gopherbot

@gopherbot gopherbot commented Oct 6, 2021

Change https://golang.org/cl/353909 mentions this issue: cmd/coordinator: don't snapshot in dev mode

gopherbot pushed a commit to golang/build that referenced this issue Oct 6, 2021
In dev mode, we have no GCS storage client, so attempting to write a
snapshot will panic.

For golang/go#48803

Change-Id: Id37264ebd765f914a55acf2fd18274020850331f
Reviewed-on: https://go-review.googlesource.com/c/build/+/353909
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@prattmic
Member Author

@prattmic prattmic commented Oct 6, 2021

@odeke-em We are still evaluating our options. We've looked at several different off-the-shelf options, including Bencher, as well as extending the existing http://build.golang.org infrastructure with what we need. We are currently leaning towards extending http://build.golang.org, as we already maintain it and it supports much of what we need already (fully open source, precise control over dedicated hardware used for benchmarking, etc).

@gopherbot

@gopherbot gopherbot commented Oct 7, 2021

Change https://golang.org/cl/354629 mentions this issue: cmd/buildlet: add revdial test

@gopherbot

@gopherbot gopherbot commented Oct 7, 2021

Change https://golang.org/cl/354630 mentions this issue: all: handle revdial redirects on connect

@gopherbot

@gopherbot gopherbot commented Oct 7, 2021

Change https://golang.org/cl/354638 mentions this issue: cmd/coordinator: find work in dev mode

@gopherbot

@gopherbot gopherbot commented Oct 7, 2021

Change https://golang.org/cl/354637 mentions this issue: cmd/coordinator: make listen address configurable

gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
This replaces the broken test in internal/coordinator/pool.

For golang/go#48803

Change-Id: I7bd9265bba555562ffa7d59169a9c8792ed97d3c
Reviewed-on: https://go-review.googlesource.com/c/build/+/354629
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
In dev mode, the coordinator uses hostPathHandler, which redirects
/reverse and /revdial to /farmer.golang.org/reverse, etc.

Establishing a revdial connection chokes on this redirect, as it
expects to complete the protocol switch in a single request.

Add rudimentary redirect support via a helper so this works in dev mode.

Note that linkRewriter must implement http.Hijacker as
revdial.ConnHandler type-asserts the http.ResponseWriter to a Hijacker.

For golang/go#48803

Change-Id: I191fa6ff17bbd334203430f3c1f2c5e03db407ff
Reviewed-on: https://go-review.googlesource.com/c/build/+/354630
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
Though it is a good default, it is cumbersome for dev mode to listen
only on localhost when developing over SSH. Add a -listen flag to allow
overriding this with any address.

For golang/go#48803

Change-Id: If01a0b44926a33f2aa01548319508166592936e3
Reviewed-on: https://go-review.googlesource.com/c/build/+/354637
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
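
A flag like the one described in the commit message above could be wired up roughly as follows. This is a sketch under stated assumptions, not the coordinator's code: the helper name `parseListen`, the flag set name, and the default address `localhost:8080` are all illustrative; only the flag name `-listen` comes from the commit message.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseListen parses args and returns the address to listen on.
// The default (localhost only) is an illustrative value, not necessarily
// the one the real coordinator uses.
func parseListen(args []string) (string, error) {
	fs := flag.NewFlagSet("coordinator-sketch", flag.ContinueOnError)
	listen := fs.String("listen", "localhost:8080", "address to listen on")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *listen, nil
}

func main() {
	addr, err := parseListen(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// A real server would pass addr to http.ListenAndServe or net.Listen.
	fmt.Println("would listen on", addr)
}
```

Running the sketch with `-listen 0.0.0.0:8080` would bind on all interfaces, which is the case the commit describes for developing over SSH.
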
gopherbot pushed a commit to golang/build that referenced this issue Oct 15, 2021
Start the main findWorkLoop in dev mode, to discover work to run from
the dashboard. In dev mode, we also replace the linux-amd64 builder
config with one using host-linux-amd64-localdev reverse buildlets.

This provides a complete lifecycle to test out builds with a local dev
coordinator and buildlet.

For golang/go#48803

Change-Id: I8ade6c8bccf3bc51437ca9e7d11c232753fe7465
Reviewed-on: https://go-review.googlesource.com/c/build/+/354638
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@shawn-xdji
Contributor

@shawn-xdji shawn-xdji commented Nov 17, 2021

I'm curious whether benchmark fluctuation will be taken into account, and if so, how. Thanks.

@prattmic
Member Author

@prattmic prattmic commented Nov 17, 2021

@shawn-xdji By fluctuation, I assume you mean noise in the results. We are taking a few approaches up front:

  • Running benchmarks on low-noise machines. For now we plan to use "sole-tenant" VMs on GCE, though we may move to physical machines if necessary.
  • Rerunning the baseline benchmarks each time we test a new commit to help account for long-term changes (e.g., an OS upgrade that changes performance).
  • Changepoint detection will need to statistically handle a small amount of noise.
  • We are only planning to track a curated set of benchmarks. Benchmarks that are fundamentally noisy will either be excluded or fixed.

Finally, all of this is something we'll need to learn as we go, as we find out how big the issues are.
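
To make the "rerun the baseline alongside each new commit" idea concrete, here is a minimal sketch of one way such a comparison could work. It is not the team's design: the function names, the use of medians, and the 5% threshold are all assumptions chosen for illustration, and a real system would likely use a proper statistical test rather than a fixed cutoff.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying the caller's slice.
// It returns 0 for an empty slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// relativeDelta reports (candidate - baseline) / baseline using medians,
// which are less sensitive to outlier runs than means.
func relativeDelta(baseline, candidate []float64) float64 {
	b := median(baseline)
	if b == 0 {
		return 0
	}
	return (median(candidate) - b) / b
}

// isRegression flags a slowdown only when it exceeds threshold, an
// illustrative allowance for run-to-run noise (e.g. 0.05 for 5%).
func isRegression(baseline, candidate []float64, threshold float64) bool {
	return relativeDelta(baseline, candidate) > threshold
}

func main() {
	// Hypothetical ns/op samples, both collected in the same session on the
	// same machine, so an OS upgrade or similar change affects both equally.
	baseline := []float64{100, 102, 99, 101, 100}
	candidate := []float64{110, 111, 109, 112, 110}
	fmt.Printf("delta=%.3f regression=%v\n",
		relativeDelta(baseline, candidate),
		isRegression(baseline, candidate, 0.05))
}
```

Because both sample sets come from the same run, a long-term shift in the environment cancels out of the ratio, which is the point of rerunning the baseline each time.
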

@shawn-xdji
Contributor

@shawn-xdji shawn-xdji commented Nov 18, 2021

Thanks @prattmic, that answers my question. I look forward to you sharing some best practices later.

7 participants