Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
LICENSE
README.md
helpers.go
main.go
types.go

README.md

golang-gatekeeper

Overview

This is an example application modeled after Honeycomb's API server (codenamed Gatekeeper). To see real data output by an app like this, and to learn about a real issue surfaced by exploring production data with Honeycomb, check out our Gatekeeper Tour Guide.

This app implements the /1/events/ endpoint of our public API This app was written in order to show how instrumentation moves through a more mature application - specifically addressing concepts like:

  • Context propagation
  • Adding complex objects to instrumentation
  • How timers fit in larger blocks of logic

Download or Build

You can download the code by cloning the repository.

Build it with cd golang-gatekeeper; go get ./...; go build -o gatekeeper

Run the server

Before you launch the server, set an environment variable HONEYCOMB_WRITEKEY to your team API key, available from your account page. This will configure the server to send instrumentation events to Honeycomb in a dataset called gatekeeper

Running the gatekeeper server will start it listening on port 8080. Send HTTP POSTs to the /1/events/<dataset-name> endpoint with an X-Honeycomb-Team header containing one of the hardcoded api keys and with a body containing a JSON object. The server will process the POST and if everything looks good, eventually write it to /tmp/api#.log, where # is a partition number (between 1-5).

Sending sample traffic to the server

example of sending an event to the golang-gatekeeper server:

➜  curl -v localhost:8080/1/events/wade \
    -H "X-Honeycomb-Team: abcd123EFGH" \
    -H "X-Honeycomb-Event-Time: 2018-07-03T20:59:08.832016791Z" \
    -d '{"foo":3}'

If you want to just see the instrumentation events on the command line instead of sending them to Honeycomb, change the beeline.Init function in main() to uncomment the STDOUT: true line. After rebuilding and restarting, instrumentation events will appear as line delimited JSON objects on STDOUT.

Hard-coded values for X-Honeycomb-Team (required header):

  • abcd123EFGH
  • ijkl456MNOP
  • qrst789UVWX

Hard-coded values for dataset names (required in the URL):

  • wade
  • james
  • helen
  • peter
  • valentine
  • andrew

Expected values for X-Honeycomb-Samplerate (optional): positive integers. If absent, defaults to 1

Expected values for X-Honeycomb-Event-Time (optional): RFC3339 formatted timestamps. If absent, defaults to the current time

Any values other than those listed above will cause a HTTP status 401 (for api keys) or 400 (for dataset names) to be returned by the server along with an error message. Errors can also be generated by sending malformed JSON or broken expected headers

Instrumentation events generated

Example output instrumentation events you'll see in Honeycomb:

{
  "data": {
    "app.X-Honeycomb-Event-Time": "2018-07-03T20:59:08.832016791Z",
    "app.X-Honeycomb-Samplerate": "",
    "app.X-Honeycomb-Team": "ijkl456MNOP",
    "app.chosen_partition": 3,
    "app.dataset": {
      "ID": 1,
      "Name": "wade",
      "PartitionList": [1,2,3]
    },
    "app.event_columns": 1,
    "app.event_time": "2018-07-03T20:59:08.832016791Z",
    "app.event_time_delta_sec": 246435.65574261,
    "app.gzipped": false,
    "app.hitSchemaCache": false,
    "app.sample_rate": 1,
    "app.team": {
      "ID": 2,
      "Name": "b&w",
      "WriteKey": "ijkl456MNOP"
    },
    "app.timer.get_headers_dur_ms": 0.051346,
    "app.timer.get_partition_dur_ms": 0.001852,
    "app.timer.get_schema_dur_ms": 46.559893,
    "app.timer.resolve_dataset_dur_ms": 0.005554,
    "app.timer.unmarshal_json_dur_ms": 0.107933,
    "app.timer.validate_writekey_dur_ms": 0.001789,
    "duration_ms": 49.061776,
    "gorilla.vars.datasetName": "wade",
    "handler.fnname": "main.handleEvent",
    "handler.route": "/1/events/{datasetName}",
    "meta.beeline_version": "0.1.0",
    "meta.local_hostname": "cobbler.local",
    "meta.memory_inuse": 3275928,
    "meta.num_goroutines": 3,
    "meta.process_uptime_sec": 10,
    "meta.type": "http",
    "name": "main.handleEvent",
    "request.content_length": 9,
    "request.header.user_agent": "curl/7.54.0",
    "request.host": "",
    "request.http_version": "HTTP/1.1",
    "request.method": "POST",
    "request.path": "/1/events/wade",
    "request.remote_addr": "[::1]:52046",
    "response.status_code": 200,
    "trace.span_id": "24e437d5-db44-4d56-9595-9a5352928287",
    "trace.trace_id": "7bb7f70d-0ed8-45f5-818a-dd947fbb2309"
  },
  "time": "2018-07-06T10:26:24.486657235-07:00"
}

Discussion of the instrumentation

Overview

Before getting into instrumentation, let's talk about the app itself: what does it do? The sample app is a stripped down version of the Honeycomb API server. It accepts events, validates them, and writes them to disk. While the actual Honeycomb API server has more endpoints, does some more management, and writes events to Kafka instead of a local disk, but this framework is enough to give a feel for how to instrument a real production service for improved observability.

We started with the Honeycomb Beeline for Go to gather the most common aspects of the incoming HTTP requests (remote IP, url, HTTP method and version, etc.); then add custom user metadata fields, custom transaction metadata fields, and a handful of useful timers. (We especially love timers around calls out to third-party services!)

This sample app doesn't create additional spans to form a trace through the code and simply emits events to Honeycomb for visualization. If the app was more complex, we may benefit from real traces (for example, if it actually called out to a database, that'd be a great child span) -- but as it is, the app is sufficiently simple that a series of well-named timers are enough.

The Path of Execution

Once it receives an event in the front door (at /1/events/<dataset_name>), it steps through a number of small tasks:

  • get HTTP header values to be used for event metadata
  • authenticate the event's write key
  • fetch the dataset object to which this event will be added
  • parse the JSON body of the event
  • validate the schema in the database against what's in the event
  • hand off the validated event downstream

Each of these tasks has a timer around it so we know how the total time spent working on the request is broken up into pieces. This is a common question when looking at a service: "It's slow! Why?!" It's useful to preemptively measure spots in the code that you know are going to be slow, but sometimes you miss interesting bits. That's OK! Instrumentation is an ongoing relationship with your code.

Details

Ok, so let's step through a few bits.

Initialization

Our server uses gorilla to route HTTP requests, which makes it a great target for middleware. After we do our basic initialization (see code):

beeline.Init(beeline.Config{/* config here */}}

we'll add our Beeline's gorilla middleware (see code):

r.Use(hnygorilla.Middleware)

and wrap our router in our Beeline's HTTP wrapper, which is useful for catching unhandled routes (see code):

log.Fatal(http.ListenAndServe(":8080", hnynethttp.WrapHandler(r)))

Propagating context

The Beeline relies on the request context (see godoc) on each incoming request to propagate the instrumentation event. In order to give the entire app access to the event in order to add custom fields, we pass the context throughout the application.

In functions that interact with the request itself (such as getHeaders - see code), passing the entire request is sufficient.

if err := getHeaders(r, ev); err != nil { /* ... */ }

In any function that does not need access to the request, we only pass down the context (such as validateWritekey - see code).

team, err := validateWritekey(r.Context(), ev.WriteKey)

Passing the context around as the first parameter to functions is a standard way of handling extra context in Go, and you'll see it echoed in many other packages. It is important to pass it along both so that all your code can easily add fields to your instrumentation events, but also to enable propagating tracing information to downstream services. By using the context and threading it through your application, you're setting yourself up to make integrating with tracing easier in the future.

Timers

Each interesting function gets a timer added. The defer function in Go is a beautiful way of ensuring that the timer always measures the entirety of the function being measured—no matter how the function exits (via an early return, a normal return, or even a panic!) the timer will still get recorded and added to the event. Combining these timers with the presence or lack of error fields and other attributes on the event will let you understand not just how long each function took but why it took that long.

Here's an example of capturing the length of time necessary to unmarshal JSON (see code):

// add a timer around unmarshalling json
defer func(t timer.Timer) {
  dur := t.Finish()
  beeline.AddField(r.Context(), "timer.unmarshal_json_dur_ms", dur)
}(timer.Start())

The astute observer will notice that most of these timers are measured in milliseconds - but there are a few fields added to the event that are measured in seconds: event_time_delta_sec and meta.process_uptime_sec. This is an example of the author adding some additional understanding of the application into the instrumentation. We know that these processes are long running, so measuring process uptime in seconds (or even minutes or hours!) makes it easier to interact with the values in that field. In a similar way, most events arrive at the API server in with a delay in the range of a half second to five minutes. Assigning fields the most appropriate unit to allow the people looking at that instrumentation later on the easiest path to comprehension makes the entire experience more comfortable for all involved.

Sanitization

The userFacingErr function is used to separate the information handed back to the client of the API server (aka the customer) and the operator. In many cases it's valuable to sanitize internal errors within a server before handing them back to the client in order to keep the internal implementation of a service hidden while still providing useful language to the consumer of the application. It's very useful for the operator of the service to have access to both (the user's version for responding to customer requests and the detailed version to understand the internal problem). To ride this balance we provide multiple fields for each error - the actual error returned by the code and then some context about that error.