# Sending SQuaSH data to InfluxDB

In this example notebook we sync the SQuaSH production data with our InfluxDB instance, so that we can visualize SQuaSH metrics using [Chronograf](https://chronograf-demo.lsst.codes/). See also [this notebook](https://github.com/lsst-sqre/influx-demo) for a quick introduction to InfluxDB concepts.

We use the [InfluxDB HTTP API](https://docs.influxdata.com/influxdb/v1.6/tools/api/) and the Python `requests` module to send data. The [InfluxDB Python client](https://github.com/influxdata/influxdb-python) could be used instead.

In [1]:
# No trailing slash: the endpoint paths used below already start with "/"
SQUASH_API_URL = "https://squash-restful-api-demo.lsst.codes"
INFLUXDB_API_URL = "https://influxdb-demo.lsst.codes"

We start by creating a new database. If the database already exists, nothing is done, the existing data is preserved, and status code 200 (OK) is returned.

In [None]:
import requests

DB = "squash-sandbox"

# CREATE DATABASE is idempotent: an existing database is left untouched
params = {'q': 'CREATE DATABASE "{}"'.format(DB)}
r = requests.post(url=INFLUXDB_API_URL + "/query", params=params)
r.status_code
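To confirm that the database exists, one option is to send `SHOW DATABASES` to the same `/query` endpoint. The minimal sketch below only parses the JSON body. It assumes the InfluxDB 1.x response layout (`results` → `series` → `values`); the helper name `database_names` is hypothetical and the sample payload is hand-written, not captured from the demo instance.

```python
def database_names(response_json):
    """Extract database names from an InfluxDB 1.x SHOW DATABASES response.

    Assumes the 1.x layout: {"results": [{"series": [{"values": [[name], ...]}]}]}.
    """
    names = []
    for result in response_json.get("results", []):
        for series in result.get("series", []):
            for row in series.get("values", []):
                names.append(row[0])  # each row holds a single column: the name
    return names


# Hand-written illustrative payload, not a real server response.
sample = {
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "databases",
                    "columns": ["name"],
                    "values": [["_internal"], ["squash-sandbox"]],
                }
            ],
        }
    ]
}

print(database_names(sample))  # ['_internal', 'squash-sandbox']
```

Against the live instance, the input would come from something like `requests.get(INFLUXDB_API_URL + "/query", params={'q': 'SHOW DATABASES'}).json()`, after which `DB in database_names(...)` tells whether the database is there.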

Here we get a list of the existing verification jobs from the SQuaSH API.

In [None]:
jobs = requests.get(SQUASH_API_URL + "/jobs").json()
print("Loading {} verification jobs from SQuaSH...".format(len(jobs['ids'])))

Data in InfluxDB is organized as time series. Each time series consists of points, one for each discrete sample of a metric. Each point consists of:

* timestamp: by default, nanoseconds since the Unix epoch
* measurement: conceptually similar to a table in a relational database
* tags: indexed key-value pairs, usually metadata
* fields: non-indexed key-value pairs holding the measured values
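Regarding the timestamp in the list above: the notebook converts dates through a float (`total_seconds()*1e9`), which may not be exact at nanosecond resolution for present-day dates. A minimal sketch of an exact integer conversion, using only the standard library and assuming ISO 8601 input that includes a UTC offset; `to_nanoseconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_nanoseconds(iso_string):
    """Convert an ISO 8601 timestamp with a UTC offset to integer ns since the epoch."""
    dt = datetime.fromisoformat(iso_string)
    if dt.tzinfo is None:
        # Naive datetimes cannot be compared with the timezone-aware EPOCH.
        raise ValueError("timestamp must include a UTC offset: {}".format(iso_string))
    delta = dt - EPOCH
    # days, seconds and microseconds are exact integers, so no float rounding occurs.
    return ((delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds) * 1000


print(to_nanoseconds("2018-06-01T00:00:00+00:00"))  # 1527811200000000000
```

`datetime.fromisoformat` is stricter than `dateutil.parser.parse` (which the notebook already imports); the latter could be swapped in if the SQuaSH dates use other formats.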

Null values are not stored. A point is written as:

```
<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
```
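The structure above hides some escaping rules. According to the InfluxDB 1.x line protocol reference, commas and spaces must be backslash-escaped in measurement names, and commas, equals signs and spaces in tag keys, tag values and field keys; tag values are never quoted. The minimal sketch below builds one line from a measurement, a tag dict and a field dict, dropping `None` values since nulls are not stored. The helper names (`to_line` and friends) are hypothetical, and the metric names in the example are illustrative.

```python
def _escape(text, chars):
    """Backslash-escape every occurrence of each character in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value):
    """Render a field value; bool is checked before int (bool subclasses int)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return "{}i".format(value)  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)  # note: InfluxDB rejects NaN and infinity
    # Anything else becomes a string field: quoted, with \ and " escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(escaped)


def to_line(measurement, tags, fields, timestamp=None):
    """Format one point as InfluxDB 1.x line protocol (hypothetical helper)."""
    key_chars = ",= "  # escaped in tag keys, tag values and field keys
    head = [_escape(measurement, ", ")]
    # InfluxData recommends sorting tags by key for write performance.
    for key, value in sorted(tags.items()):
        if value is None or str(value) == "":
            continue  # empty tag values are not allowed
        head.append("{}={}".format(_escape(str(key), key_chars),
                                   _escape(str(value), key_chars)))
    body = ["{}={}".format(_escape(str(key), key_chars), _format_field_value(value))
            for key, value in fields.items()
            if value is not None]  # null values are not stored
    if not body:
        raise ValueError("a point needs at least one non-null field")
    line = ",".join(head) + " " + ",".join(body)
    if timestamp is not None:
        line += " {}".format(timestamp)
    return line


print(to_line("validate_drp",
              {"filter": "r", "dataset": "validation_data_cfht"},
              {"validate_drp.PA1": 0.5, "validate_drp.AM1": None},
              1000))
# validate_drp,dataset=validation_data_cfht,filter=r validate_drp.PA1=0.5 1000
```

One design caveat: a field keeps the type of its first write, so if a metric value is sometimes an `int` and sometimes a `float`, casting it to `float` before calling `to_line` avoids field type conflicts.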


The following cell grabs data from the SQuaSH API and writes it in that format, known as [line protocol](https://docs.influxdata.com/influxdb/v1.6/write_protocols/line_protocol_tutorial/):

As you run this notebook, you can follow the data ingestion with the [Data Explorer](https://chronograf-demo.lsst.codes/sources/2/chronograf/data-explorer) tool in Chronograf.

In [None]:
from pytz import UTC
from datetime import datetime
from dateutil.parser import parse

# Timezone-aware Unix epoch (avoids the deprecated datetime.utcfromtimestamp)
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)

params = {'db': DB}

for job_id in jobs['ids']:

    r = requests.get(SQUASH_API_URL + "/job/{}".format(job_id)).json()

    # Skip datasets we don't want to sync
    if r['ci_dataset'] in ('unknown', 'decam'):
        continue

    print('Sending line for job {}...'.format(job_id))

    # The datamodel for SQuaSH in InfluxDB maps each verification package to 
    # a different InfluxDB measurement, all job metadata to InfluxDB tags and 
    # all metrics to fields.
    
    # Here we basically put the fields on their corresponding measurements.
    
    fields = {}
    for meas in r['measurements']:
        
        # The verification package is the prefix of the metric name; eventually
        # it could be provided as a separate field by the SQuaSH API /measurements
        influxdb_measurement = meas['metric'].split('.')[0]

        # Null values are not stored, and "None" is not a valid field value
        if meas['value'] is None:
            continue

        fields.setdefault(influxdb_measurement, []).append(
            "{}={}".format(meas['metric'], meas['value']))
        
    tags = []
    # skip package info for now (pop avoids a KeyError if it is missing)
    r['meta'].pop('packages', None)

    # add ci_dataset as metadata
    r['meta']['dataset'] = r['ci_dataset']

    # drop the rest of the env info for now
    r['meta'].pop('env', None)
    
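    for key, value in r['meta'].items():

        # Tag values are never quoted: InfluxDB would keep the quotes as part
        # of the value. Spaces, commas and equals signs must be escaped; here
        # spaces are replaced with underscores and the rest are escaped.
        if isinstance(value, str):
            value = value.replace(" ", "_").replace(",", r"\,").replace("=", r"\=")
        tags.append("{}={}".format(key, value))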

    timestamp = int((parse(r['date_created']) - EPOCH).total_seconds()*1e9)

    # create an InfluxDB line for each measurement and send it
    for measurement in fields.keys():
    
        line = "{},{} {} {}".format(measurement, ",".join(tags), ",".join(fields[measurement]), timestamp)

        post = requests.post(url=INFLUXDB_API_URL + "/write", params=params, data=line)

        print(post.status_code)
        print(post.text)
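The cell above sends one POST per measurement and job. The `/write` endpoint also accepts many points in one request body, one point per line separated by newlines, so lines could be collected first and sent in batches. A minimal sketch, assuming the lines are accumulated in a list; `batched` and the default batch size are hypothetical choices.

```python
def batched(lines, batch_size=5000):
    """Yield newline-joined request bodies with at most `batch_size` lines each."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:  # send whatever is left over
        yield "\n".join(batch)


# Hypothetical usage, with all_lines collected in the loop instead of posted:
# for body in batched(all_lines):
#     post = requests.post(url=INFLUXDB_API_URL + "/write", params=params, data=body)
#     if post.status_code != 204:  # InfluxDB 1.x answers 204 No Content on success
#         print(post.status_code, post.text)

print(list(batched(["a", "b", "c"], batch_size=2)))  # ['a\nb', 'c']
```

Fewer, larger requests reduce HTTP overhead; the trade-off is that a single malformed line can cause the whole batch to be reported as an error, so checking `post.text` stays useful.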
