# [Honeycomb](https://www.honeycomb.io/)
Fast exploration over high-cardinality data.

## Getting started


Check the [quick start tutorial](https://docs.honeycomb.io/quick-start) to learn about the capabilities of the Honeycomb UI.
Then join the LSST SQuaRE team using [this invitation](https://ui.honeycomb.io/join_team/lsst-square). The datasets created in this notebook are shared with the team.



## What is observability?

The Honeycomb [observability manifesto](https://www.honeycomb.io/blog/2018/03/observability-a-manifesto/) is a good place to start. Also check the introduction to observability [here](https://docs.honeycomb.io/thinking-about-observability/intro-to-observability/).

NOTE: KPMs are high-level metrics. They are not the kind of values one would want for observability. However,
in this notebook we send KPMs to Honeycomb simply because they are the only data we have in SQuaSH so far. That is still enough to illustrate the `libhoney` and `beeline-python` integrations.

In my opinion, the benefit of this tool will become clearer by sending more *context data* from our DM pipeline tasks, and ultimately as an analysis tool for the Engineering and Facilities Database (EFD). 


## Events and Datasets
Honeycomb data is a series of **Events**, each of which represents something in your environment worth tracking. When you send events to Honeycomb, you collect related or recurring events under a single **Dataset**.

An **Event** can be anything:

- An HTTP request to your app
- An SQL query
- A job submitted to a queue
- An execution of an LSST DM pipeline task
- A detection of a particular astronomical object

**Datasets** are used to partition your data into separate and queryable sets. See [best practices for defining datasets](https://docs.honeycomb.io/getting-data-in/datasets/best-practices/) in Honeycomb. 

Let's start by treating each `validate_drp` run as an **Event** that contains the metrics measured in that run.

For illustration purposes we'll collect those metrics from the SQuaSH API:

In [None]:
SQUASH_API_URL = "https://squash-restful-api-demo.lsst.codes/"

In [None]:
import requests

# Fetch one job; SQUASH_API_URL already ends with "/"
r = requests.get(SQUASH_API_URL + "job/1").json()

data = {'id': r['id'],
        'date_created': r['date_created'],
        'filter_name': r['meta']['filter_name'],
        'dataset': r['ci_dataset']}

# Honeycomb events should be flat dicts: add one top-level field per metric
for meas in r['measurements']:
    data[meas['metric']] = meas['value']

data

This looks like a structured log entry. It also looks like a property set, and it may remind you of other things as well...
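The flattening done in the cell above can be wrapped in a small function, so the same event layout is reused whenever we build events from SQuaSH jobs. This is a minimal sketch: `job_to_event` is a hypothetical helper name, and the sample job is made-up data shaped only like the fields this notebook reads from the `/job/<id>` response (`id`, `date_created`, `meta.filter_name`, `ci_dataset`, `measurements`).

```python
import json


def job_to_event(job):
    """Flatten a SQuaSH job document into a single-level event dict.

    Hypothetical helper: it reads only the fields used in this notebook.
    """
    event = {
        "id": job["id"],
        "date_created": job["date_created"],
        "filter_name": job["meta"]["filter_name"],
        "dataset": job["ci_dataset"],
    }
    # One top-level field per metric, keyed by the metric's full name
    for meas in job["measurements"]:
        event[meas["metric"]] = meas["value"]
    return event


# Made-up job document for illustration only (not real SQuaSH data)
sample_job = {
    "id": 1,
    "date_created": "2018-05-01T12:00:00Z",
    "meta": {"filter_name": "HSC-R"},
    "ci_dataset": "validation_data_hsc",
    "measurements": [
        {"metric": "validate_drp.AM1", "value": 7.5},
        {"metric": "validate_drp.PA1", "value": 12.1},
    ],
}

event = job_to_event(sample_job)
# Serialized as a single JSON line, the event reads like a structured log entry
print(json.dumps(event, sort_keys=True))
```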

### Sending events to Honeycomb

1. When Honeycomb gets a request to add an event to a dataset that doesn't exist yet, it creates the dataset.
2. The dataset schema is inferred automatically, and it is updated automatically when you add new fields to your events.
3. Events are added to the dataset for querying.
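The schema inference described above can be pictured with a toy model: every field seen in any event becomes a column, and the column type comes from the values. This is only an illustration under that assumption, not Honeycomb's actual implementation; `infer_schema` is a hypothetical name and the events are made-up.

```python
def infer_schema(events):
    """Toy model: map each field name to the set of Python type names seen."""
    schema = {}
    for event in events:
        for field, value in event.items():
            # A field seen for the first time becomes a new column
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


events = [
    {"id": 1, "filter_name": "HSC-R", "validate_drp.AM1": 7.5},
    # A later event adds a new field; the schema grows instead of rejecting it
    {"id": 2, "filter_name": "HSC-I", "validate_drp.AM1": 8.1, "validate_drp.PA1": 12.0},
]

schema = infer_schema(events)
print(sorted(schema))
```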

NOTE: In order to execute this cell you will need the HONEY_API_KEY. It is available from "Team Settings" in the UI once you join the `lsst-square` team.

In [None]:
import libhoney

HONEY_API_KEY = ""  # Obtained from the UI under "Team Settings"

libhoney.init(writekey=HONEY_API_KEY, dataset="single-event-demo")
libhoney.send_now(data)

That's it! Check the new dataset at https://ui.honeycomb.io/lsst-square/datasets

NOTE: this demo dataset may already exist from previous executions of this notebook. If you want to start from an empty dataset, delete it before running the notebook; otherwise, the new events will be added to the existing dataset.

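### Now, let's send all SQuaSH data to Honeycomb
Actually, just the scalar metrics and some context information.

Point `libhoney` at the `squash-demo` dataset (Honeycomb creates it when the first event arrives) and create a `Builder` for the events: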

In [None]:
libhoney.init(writekey=HONEY_API_KEY, dataset="squash-demo")
builder = libhoney.Builder()

Now loop over all jobs:

In [None]:
from datetime import datetime, timezone

jobs = requests.get(SQUASH_API_URL + "jobs").json()

for job_id in jobs['ids']:

    r = requests.get(SQUASH_API_URL + "job/{}".format(job_id)).json()

    # Skip jobs whose CI dataset is 'unknown' or 'decam'
    if r['ci_dataset'] in ('unknown', 'decam'):
        continue

    print('Sending event for job {}...'.format(job_id))

    # Spawn a new event from the builder and add the context fields
    event = builder.new_event()
    event.add_field('id', job_id)
    event.add_field('filter_name', r['meta']['filter_name'])
    event.add_field('dataset', r['ci_dataset'])

    # One field per scalar metric
    for meas in r['measurements']:
        event.add_field(meas['metric'], meas['value'])

    # Override the event timestamp with the job creation time ('Z' means UTC)
    event.created_at = datetime.strptime(
        r['date_created'], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    event.send()
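Two details of the loop above are easy to get wrong: `date_created` ends in `Z` (UTC), while `datetime.strptime` on its own returns a naive datetime, and the skip condition is buried in the loop body. A minimal sketch that factors both out, assuming the `%Y-%m-%dT%H:%M:%SZ` format used above; `parse_squash_timestamp`, `should_send` and `SKIPPED_DATASETS` are hypothetical names.

```python
from datetime import datetime, timezone

# CI datasets that the loop above does not send to Honeycomb
SKIPPED_DATASETS = {"unknown", "decam"}


def parse_squash_timestamp(value):
    """Parse a SQuaSH 'date_created' string into a timezone-aware UTC datetime."""
    naive = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing 'Z' denotes UTC, so attach that timezone explicitly
    return naive.replace(tzinfo=timezone.utc)


def should_send(job):
    """Return True if the job's CI dataset should be sent to Honeycomb."""
    return job["ci_dataset"] not in SKIPPED_DATASETS


ts = parse_squash_timestamp("2018-05-01T12:00:00Z")
print(ts.isoformat())
```

With these helpers, the loop body could use `if not should_send(r): continue` and `event.created_at = parse_squash_timestamp(r['date_created'])`.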

## Querying the dataset

A Honeycomb query selects events from a dataset with filters, optionally groups them by some columns (breakdowns), and aggregates them into time series. The query builder in the Honeycomb UI is a great tool for fast exploration. It is also possible to [specify queries programmatically](https://docs.honeycomb.io/api/query-specification/).

The SQuaSH `validate_drp.AM1` metric on a given dataset and filter can be obtained from this query:

In [None]:
query = {
    "breakdowns": [
        "dataset", "filter_name"
    ],
    "calculations": [
        {"column": "validate_drp.AM1", "op": "AVG"}
    ],
    "filters": [
        {"column": "dataset", "op": "=", "value": "validation_data_hsc"},
        {"column": "filter_name", "op": "=", "value": "HSC-R"}
    ],
    "filter_combination": "AND"
}
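To make the semantics of this query concrete, here is a toy in-memory evaluation of the same spec over a list of flat events: keep the events that match all `filters` (`filter_combination` is `AND`), group them by the `breakdowns` columns, and average the calculated column. This is a sketch that only handles `=` filters and the `AVG` op; it ignores time bucketing and is not how Honeycomb executes queries. `run_query` is a hypothetical name and the events are made-up.

```python
from collections import defaultdict


def run_query(spec, events):
    """Toy evaluator: '=' filters combined with AND, breakdowns, one AVG calculation."""
    if spec.get("filter_combination", "AND") != "AND":
        raise ValueError("this sketch only supports AND")
    if any(f["op"] != "=" for f in spec["filters"]):
        raise ValueError("this sketch only supports '=' filters")
    calc = spec["calculations"][0]
    if calc["op"] != "AVG":
        raise ValueError("this sketch only supports AVG")

    groups = defaultdict(list)
    for event in events:
        # Keep the event only if every filter matches (AND combination)
        if not all(event.get(f["column"]) == f["value"] for f in spec["filters"]):
            continue
        # In this sketch, events without the calculated column are ignored
        if calc["column"] not in event:
            continue
        key = tuple(event.get(col) for col in spec["breakdowns"])
        groups[key].append(event[calc["column"]])

    # One average per breakdown group
    return {key: sum(values) / len(values) for key, values in groups.items()}


spec = {
    "breakdowns": ["dataset", "filter_name"],
    "calculations": [{"column": "validate_drp.AM1", "op": "AVG"}],
    "filters": [
        {"column": "dataset", "op": "=", "value": "validation_data_hsc"},
        {"column": "filter_name", "op": "=", "value": "HSC-R"},
    ],
    "filter_combination": "AND",
}

# Made-up events for illustration only
events = [
    {"dataset": "validation_data_hsc", "filter_name": "HSC-R", "validate_drp.AM1": 7.0},
    {"dataset": "validation_data_hsc", "filter_name": "HSC-R", "validate_drp.AM1": 9.0},
    {"dataset": "validation_data_hsc", "filter_name": "HSC-I", "validate_drp.AM1": 20.0},
]

result = run_query(spec, events)
print(result)
```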

## Markers
Markers are annotations over the time series plot. Markers are defined per dataset and can be created from the [UI, with a CLI](https://docs.honeycomb.io/working-with-data/markers/), or programmatically through the [Markers API](https://docs.honeycomb.io/api/markers/).


In [None]:
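# The Honeycomb API authenticates requests with the team's API key
headers = {'X-Honeycomb-Team': HONEY_API_KEY}

marker = {"message": "Testing marker API"}

# Create a marker on the squash-demo dataset
r = requests.post("https://api.honeycomb.io/1/markers/squash-demo", json=marker, headers=headers)
r.json()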

All the Markers in a dataset may be retrieved by:

In [None]:
r = requests.get("https://api.honeycomb.io/1/markers/squash-demo", headers=headers)
r.json()
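The marker created above only sets `message`. The Markers API documentation also describes optional `type`, `url`, `start_time` and `end_time` fields, with times given as Unix timestamps in seconds; please check the current API reference for the exact field set. A minimal sketch of building such a payload, for example to mark the date of a pipeline change; `make_marker` is a hypothetical helper.

```python
from datetime import datetime, timezone


def make_marker(message, start, end=None, marker_type=None, url=None):
    """Build a Markers API payload; datetimes are converted to Unix seconds."""
    marker = {"message": message, "start_time": int(start.timestamp())}
    if end is not None:
        marker["end_time"] = int(end.timestamp())
    if marker_type is not None:
        marker["type"] = marker_type
    if url is not None:
        marker["url"] = url
    return marker


marker = make_marker(
    "Pipeline change",  # hypothetical annotation text
    start=datetime(2018, 5, 1, tzinfo=timezone.utc),
    marker_type="deploy",
)
print(marker)
```

The resulting dict can be passed as `json=` to the same `requests.post` call used above.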

## Triggers

Given a query and a threshold, the [Triggers API](https://docs.honeycomb.io/api/triggers/) can be used to set alerts that fire when the query result crosses the threshold. Triggers are also defined per dataset. Honeycomb provides integration with Slack for alert notifications.
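Conceptually, a trigger re-runs its query every `frequency` seconds and compares each result with the threshold. A toy sketch of that comparison over a list of precomputed results; `firing_checks` is a hypothetical helper, the results are made-up, and notification delivery is not modeled.

```python
import operator

# Comparison operators used for trigger thresholds in this notebook
TRIGGER_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def firing_checks(results, op, value):
    """Return the indices of periodic query results that would fire the alert."""
    compare = TRIGGER_OPS[op]
    return [i for i, result in enumerate(results) if compare(result, value)]


# Made-up AVG(validate_drp.AM1) values, one per periodic check
results = [7.5, 8.2, 10.4, 9.9, 11.0]
fired = firing_checks(results, ">", 10.0)
print(fired)
```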

Let's set an alert for the `validate_drp.AM1` metric. We'll get the design specification for that metric from the SQuaSH API:

In [None]:
spec = requests.get(SQUASH_API_URL + "spec/validate_drp.AM1.design").json()
spec

Note that the operator in an `lsst.verify` specification is defined so that "measurement `op` spec" is True when the measurement passes the specification. An alert, however, should fire when the measurement fails, so we need the opposite comparison. For that we created this mapping:

In [None]:
inverse_operation = { '>': '<=', '>=': '<', '<': '>=', '<=': '>'}

Also note that triggers do not support equality or inequality comparisons (`==`, `!=`). If you convert an `lsst.verify` spec that uses an equality match into a Honeycomb trigger, you will have to decide whether the appropriate trigger comparison is greater than or less than the value.
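The conversion described above, including the equality case, can be sketched as a function that inverts the `lsst.verify` operator and refuses to guess for `==` or `!=` unless the caller says which direction to alert on. The threshold layout (`operator`, `value`) follows the `spec["threshold"]` fields used in this notebook; `spec_to_trigger_threshold` and its `equality_op` parameter are hypothetical.

```python
import operator

# lsst.verify operator -> trigger operator that fires when the spec fails
INVERSE_OPERATION = {">": "<=", ">=": "<", "<": ">=", "<=": ">"}

COMPARE = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def spec_to_trigger_threshold(spec_threshold, equality_op=None):
    """Convert an lsst.verify threshold into a trigger threshold that fires on failure.

    For '==' or '!=' specs the caller must choose the trigger comparison
    via `equality_op` (one of '>', '>=', '<', '<='), since triggers
    cannot express equality.
    """
    op = spec_threshold["operator"]
    if op in INVERSE_OPERATION:
        trigger_op = INVERSE_OPERATION[op]
    elif op in ("==", "!="):
        if equality_op not in COMPARE:
            raise ValueError("spec uses {!r}; pass equality_op to pick a direction".format(op))
        trigger_op = equality_op
    else:
        raise ValueError("unsupported operator: {!r}".format(op))
    return {"op": trigger_op, "value": spec_threshold["value"]}


# Sanity check: the trigger comparison is the logical negation of the spec comparison
for spec_op, trigger_op in INVERSE_OPERATION.items():
    for measurement in (4.0, 5.0, 6.0):
        passes = COMPARE[spec_op](measurement, 5.0)
        fires = COMPARE[trigger_op](measurement, 5.0)
        assert fires == (not passes)

print(spec_to_trigger_threshold({"operator": "<=", "value": 10.0}))
```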

In [None]:
trigger = {"name": "AM1 alert",
           "query": query,
           "threshold": {"op": inverse_operation[spec["threshold"]["operator"]], 
                         "value": spec["threshold"]["value"]},
           "frequency": 1800
          }
trigger

In [None]:
r = requests.post("https://api.honeycomb.io/1/triggers/squash-demo", json=trigger, headers=headers)
r

Finally, all Triggers in a dataset may be retrieved by:

In [None]:
r = requests.get("https://api.honeycomb.io/1/triggers/squash-demo", headers=headers)
r.json()