## Accessing DM-EFD data


In this notebook we demonstrate how to extract data from the DM-EFD using [aioinflux](https://aioinflux.readthedocs.io/en/stable/index.html), a Python client for InfluxDB, and then analyze it with Pandas dataframes.

This is complementary to the [Chronograf](https://test-chronograf-efd.lsst.codes) interface, which we use for time-series visualization.

In addition to `aioinflux`, you'll need to install `pandas`, `numpy`, `matplotlib` and `bokeh` to run this notebook.

In [None]:
import matplotlib
%matplotlib widget
from matplotlib import pylab as plt
import aioinflux
import getpass
import pandas as pd
import numpy as np
import asyncio

from bokeh.plotting import figure, output_notebook, show
from bokeh.models import LinearAxis, Range1d, Span, Label
output_notebook()

We'll access the DM-EFD instance deployed at the AuxTel lab in Tucson. You need to be on site or connected to the NOAO VPN. 

If you are familiar with the AuxTel lab environment, you might be able to authenticate using the generic `saluser`. Ping me on Slack (`@afausti`) if you have any problems.

In [None]:
username = "saluser"
password = getpass.getpass(f"Password for {username}:")

The following configures the `aioinflux` Python client to connect to the DM-EFD InfluxDB instance. 

In [None]:
client = aioinflux.InfluxDBClient(host='summit-influxdb-efd.lsst.codes', 
                                  port=443,
                                  ssl=True, 
                                  username=username, 
                                  password=password,
                                  db='efd')

We can configure the output to be a Pandas dataframe, which is very convenient for data analysis. The time range is specified as an `InfluxQL` predicate; this notebook looks at data from the last 7 days.

In [None]:
client.output = 'dataframe'
time_span = "time >= now() - 7d"
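A relative range such as `now() - 7d` shifts every time the notebook is rerun. For a reproducible query, InfluxQL also accepts absolute RFC3339 timestamps in single quotes. The helper below is a hypothetical sketch: the name `time_range_clause` is not part of `aioinflux`, and the dates are purely illustrative. Its output could be used in place of `time_span`.

```python
from datetime import datetime, timezone


def time_range_clause(start, end):
    """Build an InfluxQL time predicate from two timezone-aware datetimes.

    Hypothetical helper for illustration; not part of aioinflux.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339 timestamp expressed in UTC
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return f"time >= '{start_utc}' AND time < '{end_utc}'"


# Illustrative one-week window (assumed dates, not taken from the EFD).
fixed_span = time_range_clause(
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 8, tzinfo=timezone.utc),
)
```

Converting to UTC before formatting keeps the clause unambiguous even if the inputs carry a local timezone.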

Query the relevant timestamp. I believe `private_sndStamp` is the time at which the message is sent to DDS, which would make it the earliest timestamp we have for weather data. The time at which the measurement was recorded in InfluxDB is the index of the returned dataframe.

In [None]:
tstamps = await client.query(f'SELECT "private_sndStamp" FROM "efd"."autogen"."lsst.sal.Environment.weather" WHERE {time_span}')

Most operations work on `Timedelta` types, but the `histogram` function does not, so we record the differences in seconds here.

In [None]:
deltas = []
for influx_stamp, snd_stamp in zip(tstamps.index.values, tstamps['private_sndStamp']):
    deltas.append((pd.Timestamp(influx_stamp, tz="GMT") - pd.Timestamp(snd_stamp, unit='s', tz="GMT")).total_seconds())
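The loop above can also be written without explicit iteration. The following is a minimal sketch, assuming the query result has a timezone-aware `DatetimeIndex` in UTC and a `private_sndStamp` column holding Unix seconds. The small dataframe here is synthetic stand-in data, not EFD output, and the names `example` and `example_deltas` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the query result (assumed shape, not real EFD data):
# a UTC DatetimeIndex (InfluxDB write time) and Unix-second send stamps.
index = pd.to_datetime(
    ["2020-01-01 00:00:00.010", "2020-01-01 00:00:01.015"], utc=True
)
example = pd.DataFrame(
    {"private_sndStamp": [1577836800.0, 1577836801.0]}, index=index
)

# Convert the send stamps to UTC timestamps, then subtract element-wise.
snd = pd.to_datetime(example["private_sndStamp"].to_numpy(), unit="s", utc=True)
example_deltas = np.asarray((example.index - snd).total_seconds())
```

On a week of data this avoids constructing one `pd.Timestamp` pair per row, which is usually noticeably faster.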

In [None]:
deltas = np.array(deltas)

In [None]:
median = np.median(deltas)
mean = deltas.mean()

Compute the histogram of the latencies, normalized to a probability density.

In [None]:
hist, edges = np.histogram(deltas, density=True, bins=np.linspace(0, 0.02, 500))
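Note that `np.histogram` ignores values outside the supplied bin range, and with `density=True` the result is normalized over the samples that do fall inside the bins. The sketch below uses synthetic values (not EFD data) to show how one might check what fraction of the sample is actually represented.

```python
import numpy as np

# Synthetic latencies in seconds (illustrative values only).
sample = np.array([0.004, 0.006, 0.008, 0.012, 0.030])
bins = np.linspace(0, 0.02, 5)

density, bin_edges = np.histogram(sample, density=True, bins=bins)

# With density=True the histogram integrates to 1 over the binned range.
area = np.sum(density * np.diff(bin_edges))

# Fraction of samples inside [bins[0], bins[-1]]; the rest are ignored.
in_range = np.mean((sample >= bins[0]) & (sample <= bins[-1]))
```

If `in_range` is well below 1, the plotted distribution hides a tail of longer latencies, and the mean can differ from what the plot suggests.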

In [None]:
p = figure(title='Latency between influx and snd for the Environment_weather subsystem', background_fill_color="#fafafa")
p.yaxis.axis_label = "Probability density"
p.xaxis.axis_label = "Latency (s)"
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color='navy', line_color='white', alpha=0.5)
annotation = Label(x=250, y=250, x_units='screen', y_units='screen',
                 text='mean=%.4fs median=%.4fs'%(mean, median),
                 border_line_color='black', border_line_alpha=1.0,
                 background_fill_color='white', background_fill_alpha=1.0)
p.add_layout(annotation)
show(p)

In [None]:
len(deltas)