x/build: observability using distributed tracing and metrics #26779

Open
odeke-em opened this Issue Aug 2, 2018 · 2 comments


odeke-em commented Aug 2, 2018

I am coming here from https://groups.google.com/forum/#!msg/golang-dev/MdwFiAx5-PU/UiUvY-8_DwAJ

The OpenCensus project https://opencensus.io/ brings observability to distributed systems (monoliths and microservices alike) through mechanisms for recording traces and metrics. Those signals give insight into the state of a distributed system.

I presented a talk about OpenCensus at GoSF on 18th July 2018 (about 3 weeks ago) and posted the accompanying slides at https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#1 or, better, https://github.com/orijtech/talks/blob/master/2018/07/18/gosf/gosf.slide for the Go present slides.

The value of it

Traces give play-by-play visibility into the state of sampled requests, e.g. we can see how long invoking os/exec took compared with how long fetching metadata from Google Cloud Storage took: https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#14

The collected metrics are useful for actively checking the health of the system, e.g. sending alerts to the x/build authors when a trybot run takes, say, 8 minutes or when the overall p99 latency hits 10 minutes.

Maintenance and technical debt

With regard to maintenance, the OpenCensus Go implementation https://github.com/census-instrumentation/opencensus-go implements the tracer and metrics; we just use its packages to instrument our code, e.g. this excerpt from my slides https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#13

func search(w http.ResponseWriter, r *http.Request) {
    ctx, span := trace.StartSpan(r.Context(), "Search")
    defer span.End()

    // Use the context and the rest of the code goes below
    _ = ctx
}

To extract the data, we just need to add an "exporter" (a liaison to our backend of choice) in a main function, for example to send traces to Stackdriver:

package main

import (
    "log"

    "contrib.go.opencensus.io/exporter/stackdriver"
    "go.opencensus.io/trace"
)

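func main() {
    // Create the exporter that sends traces to Stackdriver.
    sd, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "census-demos"})
    if err != nil {
        log.Fatalf("Failed to create Stackdriver Trace exporter: %v", err)
    }
    // Flush any buffered spans before the program exits.
    defer sd.Flush()

    // From here on, every sampled span is handed to the exporter.
    trace.RegisterExporter(sd)
}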

Maintenance work is detached from the Go project, since the OpenCensus project is already staffed with collaborators from a wide range of companies. The Go project only needs to import the respective libraries, start and stop traces, record metrics, and finally create exporters for the desired backend, e.g. Prometheus, Zipkin, AWS X-Ray, Jaeger, Stackdriver Tracing and Monitoring, SignalFx, etc.
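The reason swapping backends is cheap is that instrumented code only ever talks to an exporter interface. The following is a simplified, standard-library-only sketch of that idea and not the OpenCensus implementation; `spanData`, `exporter`, `registerExporter`, `timeSpan`, and `memoryExporter` are all hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// spanData is a hypothetical, simplified record of a finished span.
type spanData struct {
	Name     string
	Duration time.Duration
}

// exporter is a hypothetical interface: anything that can receive finished
// spans. A real backend would implement it by sending data over the network.
type exporter interface {
	ExportSpan(s spanData)
}

var (
	mu        sync.Mutex
	exporters []exporter
)

// registerExporter adds e to the set of exporters that receive every span.
func registerExporter(e exporter) {
	mu.Lock()
	defer mu.Unlock()
	exporters = append(exporters, e)
}

// timeSpan runs fn, measures how long it took, and hands the result to every
// registered exporter. The instrumented code never names a backend.
func timeSpan(name string, fn func()) {
	start := time.Now()
	fn()
	sd := spanData{Name: name, Duration: time.Since(start)}

	mu.Lock()
	defer mu.Unlock()
	for _, e := range exporters {
		e.ExportSpan(sd)
	}
}

// memoryExporter keeps spans in memory, standing in for a real backend.
type memoryExporter struct{ spans []spanData }

func (m *memoryExporter) ExportSpan(s spanData) { m.spans = append(m.spans, s) }

func main() {
	mem := &memoryExporter{}
	registerExporter(mem)

	timeSpan("Search", func() { time.Sleep(time.Millisecond) })

	for _, s := range mem.spans {
		fmt.Printf("%s took %v\n", s.Name, s.Duration)
	}
}
```

Changing the backend here means registering a different `exporter` value in `main`; the call sites using `timeSpan` stay the same.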

Next steps

I finally got some dev cycles this quarter to help improve our build system, but I would also be delighted to delegate to or work with people in the community, which is why I am filing this now.

/cc @basvanbeek @ramonza @bogdandrutu @rakyll @kevinburke

@gopherbot gopherbot added this to the Unreleased milestone Aug 2, 2018

@gopherbot gopherbot added the Builders label Aug 2, 2018

gopherbot commented Sep 29, 2018

Change https://golang.org/cl/138522 mentions this issue: cmd/coordinator: use OpenCensus for Stackdriver metrics

gopherbot commented Sep 30, 2018

Change https://golang.org/cl/138523 mentions this issue: cmd/coordinator: initial tracing and metrics using OpenCensus
