## Advanced EFD Queries with `aioinflux`

We expect the methods provided by the `EfdClient` to satisfy most user needs, but in some situations it is advantageous to work directly with the `aioinflux` client instance that the `EfdClient` constructor creates.
This notebook explores two such cases:

1. Use InfluxQL functions and the `GROUP BY time` clause to return aggregated results faster
2. Use the chunked requests feature of the `aioinflux` client to speed up long-baseline queries

### Setup

Construct an instance of the `EfdClient` pointing to the "stable EFD".

In [None]:
%matplotlib widget

from astropy.time import Time, TimeDelta
from matplotlib import pyplot as plt
import numpy as np

from lsst_efd_client import EfdClient
efd = EfdClient('ldf_stable_efd')
cl = efd.influx_client

Set up a time window to use in the queries.
Timestamps on EFD topics should be in TAI.

In [None]:
t2_agg = Time('2021-03-15T12:00:00Z')
t1_agg = t2_agg - TimeDelta(5*24*3600, format='sec') # look back over 5 days of data

t2_chunk = Time('2021-03-04T12:00:00Z')
t1_chunk = Time('2021-02-04T12:00:00Z')

### Aggregation queries

It is trivial to [resample](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) `pandas.DataFrame` objects returned by the `EfdClient`.
If you know ahead of time that you won't need the full sampling of the data, resampling in the query itself returns results faster and makes it practical to query over wider time windows.
We will use the [`GROUP BY time`](https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/#basic-group-by-time-syntax) clause of the InfluxQL language.
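For comparison, client-side resampling with pandas can be sketched as follows. The DataFrame here is synthetic (one sample per minute for six hours) and only stands in for what the `EfdClient` would return; the column name and values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a DataFrame returned by the EfdClient:
# one sample per minute for six hours, indexed by time.
index = pd.date_range("2021-03-15T00:00:00", periods=360, freq="min")
df = pd.DataFrame({"ambient_temp": np.linspace(10.0, 16.0, 360)}, index=index)

# Client-side resampling: average the samples that fall in each 30-minute bin.
resampled = df.resample("30min").mean()
print(len(df), "->", len(resampled))
```

This approach still transfers every raw sample before reducing it, which is the cost that resampling in the query avoids.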

To resample in this way, every field returned from the topic must be wrapped in a function.
Common functions to use will be [aggregations](https://docs.influxdata.com/influxdb/v1.8/query_language/functions/#aggregations) and [selectors](https://docs.influxdata.com/influxdb/v1.8/query_language/functions/#selectors).
For the purposes of this demonstration, we will use the `MEAN` function.

The time interval in the `GROUP BY time` must be specified as a [duration literal](https://docs.influxdata.com/influxdb/v1.8/query_language/spec/#durations).
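As a rough illustration, a simple duration literal is an integer immediately followed by a unit such as `m`, `h`, or `d`, with no spaces or quotes. The helper below is hypothetical (it is not part of `aioinflux` or the `EfdClient`) and only checks the simple single-unit form, using the units listed in the InfluxQL specification.

```python
import re

# Units listed in the InfluxQL specification for duration literals.
# Assumption: a single integer followed by one unit, e.g. "30m" or "6h".
_DURATION_RE = re.compile(r"^[0-9]+(ns|u|µ|ms|s|m|h|d|w)$")


def is_duration_literal(text):
    """Return True if text looks like a simple InfluxQL duration literal."""
    return _DURATION_RE.match(text) is not None


# The last two candidates fail: one has a space, the other is quoted.
for candidate in ("30m", "6h", "1d", "30 min", "'30m'"):
    print(repr(candidate), is_duration_literal(candidate))
```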

➔ Note that InfluxQL is [picky](https://www.influxdata.com/blog/tldr-influxdb-tech-tips-july-21-2016/) about single vs. double quotes, so if you are getting errors, recheck your quoting.

➔ Also note that the timestamps in the query carry a `Z` suffix, which marks them as UTC.
InfluxQL requires a timezone, and although we indicate UTC, the time index in InfluxDB is actually in TAI.
Because our times are specified in TAI, the query works as intended even though we are forced to misstate the timezone.
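To make the mislabelling concrete, the sketch below converts a UTC instant to its TAI reading and formats it as a query bound. It uses only the standard library and assumes a fixed TAI - UTC offset of 37 seconds, which has been in effect since 2017-01-01; `astropy.time.Time` handles leap seconds in general and is the better tool in practice. The helper name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: TAI - UTC = 37 s, the offset in effect since 2017-01-01.
TAI_MINUS_UTC = timedelta(seconds=37)


def utc_to_tai_reading(utc_naive):
    """Shift a naive UTC datetime to the equivalent TAI clock reading."""
    return utc_naive + TAI_MINUS_UTC


utc = datetime(2021, 3, 15, 12, 0, 0)
tai = utc_to_tai_reading(utc)

# The trailing "Z" only satisfies InfluxQL; the value itself is a TAI reading.
bound = f"'{tai.isoformat(timespec='milliseconds')}Z'"
print(bound)
```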

In [None]:
def make_query_str(beg, end, interval):
    return ("SELECT mean(\"ambient_temp\") as \"temp\", mean(\"pressure\") as \"pressure\", mean(\"humidity\") as \"humidity\" " +
            "FROM \"efd\".\"autogen\".\"lsst.sal.WeatherStation.weather\" " +
            f"WHERE time > '{beg.isot}Z' and time <= '{end.isot}Z' " +
            f"GROUP BY time({interval})")

In [None]:
def plot_result(result):
    fig, (ax0, ax1, ax2) = plt.subplots(ncols=3, nrows=1)
    ax0.plot(result.index, result['temp'])
    ax0.set_xlabel('Date')
    ax0.set_ylabel('Temperature (°C)')
    ax1.plot(result.index, result['pressure'])
    ax1.set_xlabel('Date')
    ax1.set_ylabel('Pressure (mm Hg)')
    ax2.plot(result.index, result['humidity'])
    ax2.set_xlabel('Date')
    ax2.set_ylabel('Relative Humidity (%)')
    plt.gcf().autofmt_xdate()
    plt.subplots_adjust(wspace=0.50)

In [None]:
result = await cl.query(make_query_str(t1_agg, t2_agg, '30m'))

In [None]:
plot_result(result)

Let's do that query again, but this time use a much wider aggregation window.

In [None]:
result = await cl.query(make_query_str(t1_agg, t2_agg, '6h'))

In [None]:
plot_result(result)
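A quick back-of-the-envelope calculation shows how much the interval changes the size of the result. The helper below is hypothetical and ignores edge effects: the exact count depends on how the buckets align with the window boundaries, so the real result may contain one bucket more.

```python
from datetime import timedelta

window = timedelta(days=5)  # same length as the t1_agg..t2_agg window


def approx_bucket_count(window, interval):
    """Approximate number of GROUP BY time() buckets, ignoring edge effects."""
    return int(window / interval)


n_30m = approx_bucket_count(window, timedelta(minutes=30))
n_6h = approx_bucket_count(window, timedelta(hours=6))
print(n_30m, n_6h)
```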

### Chunked queries

At times, the full fidelity data stream is required for processing, but for various reasons fetching all the data at once is impractical.
This is where the [chunked query](https://aioinflux.readthedocs.io/en/stable/usage.html#chunked-responses) functionality of `aioinflux` comes in handy.
Chunked queries can also make large queries more reliable, because very long queries are sometimes dropped, which raises an exception.
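The consumption pattern for a chunked response is an `async for` loop that handles one chunk at a time, so only a single chunk needs to be held in memory. The sketch below illustrates that pattern without contacting a database: `fake_chunks` is a hypothetical stand-in for the chunked response, and the row counts are arbitrary.

```python
import asyncio


async def fake_chunks(n_rows, chunk_size):
    """Hypothetical stand-in for a chunked response: yields lists of rows."""
    for start in range(0, n_rows, chunk_size):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield list(range(start, min(start + chunk_size, n_rows)))


async def count_rows(n_rows, chunk_size):
    """Consume the chunks one at a time, keeping only running totals."""
    total = 0
    n_chunks = 0
    async for chunk in fake_chunks(n_rows, chunk_size):
        total += len(chunk)
        n_chunks += 1
    return n_chunks, total


n_chunks, total = asyncio.run(count_rows(n_rows=2300, chunk_size=500))
print(n_chunks, total)
```

Inside a notebook, where an event loop is already running, the last call would be written as `await count_rows(n_rows=2300, chunk_size=500)` instead of going through `asyncio.run`.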

The simple example here demonstrates that we did not drop any weather events over one month in early 2021.

The default chunk size is 1000.
We set it to 500 here for demonstration purposes only.

In [None]:
query_str = ("SELECT \"private_seqNum\" " +
            "FROM \"efd\".\"autogen\".\"lsst.sal.WeatherStation.weather\" " +
            f"WHERE time > '{t1_chunk.isot}Z' and time <= '{t2_chunk.isot}Z'")

In [None]:
chunks = await cl.query(query_str, chunked=True, chunk_size=500) # The default chunk size is 1000

In [None]:
seq_nums = []
offset = 0
async for c in chunks:
    for num in c['private_seqNum']:
        if seq_nums and num == 1:
            offset = seq_nums[-1] # when the subsystem is rebooted, the sequence starts over
        seq_nums.append(num + offset)
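The offset logic in the loop above can be checked on a small synthetic list. The sketch below wraps the same logic in a hypothetical helper, `unwrap_seq_nums`, and assumes (as the loop does) that a restart resets the sequence number to 1.

```python
def unwrap_seq_nums(raw):
    """Make sequence numbers monotonic across restarts (a restart resets to 1)."""
    unwrapped = []
    offset = 0
    for num in raw:
        if unwrapped and num == 1:
            # Continue counting from the last unwrapped value.
            offset = unwrapped[-1]
        unwrapped.append(num + offset)
    return unwrapped


raw = [1, 2, 3, 1, 2, 3, 4]  # a restart occurs after the third message
print(unwrap_seq_nums(raw))  # [1, 2, 3, 4, 5, 6, 7]
```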

Now we can check that the difference between consecutive sequence numbers is always 1, meaning that no intervening messages went unrecorded.

In [None]:
seq_nums = np.array(seq_nums)
diffs = np.diff(seq_nums) - 1  # excess over the expected step of 1
print(f'This should be zero: {diffs.sum()}')
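A summed check reports only the net discrepancy, so a duplicated message (step of 0) could cancel a dropped one (step of 2). If the sum is ever non-zero, or if you want a stricter test, a sketch like the following locates each irregular step. The sequence here is synthetic, chosen to contain two gaps.

```python
import numpy as np

# Synthetic sequence with two gaps: 4 is missing, and 7 and 8 are missing.
seq = np.array([1, 2, 3, 5, 6, 9, 10])

steps = np.diff(seq)
gap_positions = np.flatnonzero(steps != 1)  # indices where the step is not 1

for i in gap_positions:
    print(f"gap after {seq[i]}: {steps[i] - 1} message(s) missing")
```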