# DM-EFD latency characterization

This notebook queries the InfluxDB API to characterize the end-to-end latency of a message, from the time SAL produces it to the time it is written to InfluxDB.

In [None]:
%%capture packages
import sys
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install pandas

In [None]:
import requests
import matplotlib.pyplot as plt
import pandas as pd

## InfluxDB URL and database to read from

In [None]:
INFLUXDB_API_URL = "https://influxdb-efd-kafka.lsst.codes"
INFLUXDB_DATABASE = "efd"

In [None]:
import getpass
USERNAME = "admin"
PASSWORD = getpass.getpass(prompt='Password for user `{}`: '.format(USERNAME))

## Retrieving timestamps for a given topic
The following timestamps are available (the order reflects the actual message flow through the system):

- **sal_ingested**: Timestamp when SAL ingested the message from the DDS bus.
- **sal_created**: Timestamp when SAL sends the message to the Kafka brokers.
- **kafka_timestamp**: Timestamp right after the SAL transform step.
- **time**: Timestamp when the message is written to InfluxDB. This timestamp depends on the InfluxDB Sink connector configuration: at the time of writing, the connector uses the current system time as the InfluxDB timestamp. If that changes, we'll add a separate field to record when the message is written to InfluxDB.
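
Since the four timestamps follow the message flow, the delay of each stage is the difference between consecutive timestamps. The sketch below illustrates this on hand-written sample values; the `sample` frame and the `stage_latencies` helper are hypothetical, and it assumes every timestamp is expressed in epoch milliseconds.

```python
import pandas as pd

# Hypothetical sample rows; all values are assumed to be epoch milliseconds.
sample = pd.DataFrame({
    "sal_ingested":    [1000, 2000, 3000],
    "sal_created":     [1004, 2003, 3005],
    "kafka_timestamp": [1010, 2011, 3012],
    "time":            [1050, 2040, 3062],
})

def stage_latencies(frame):
    """Return the delay in seconds between consecutive timestamps."""
    stages = ["sal_ingested", "sal_created", "kafka_timestamp", "time"]
    out = pd.DataFrame(index=frame.index)
    # Pair each timestamp with the next one in the message flow
    for start, end in zip(stages, stages[1:]):
        out["{}->{}".format(start, end)] = (frame[end] - frame[start]) / 1000
    return out

stage_df = stage_latencies(sample)
print(stage_df)
```

Splitting the total this way can help show which stage contributes most to the overall latency.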


In [None]:
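def get_timestamps(topic, past='15m'):
    """Return the InfluxDB JSON response with the timestamps of `topic` over the `past` interval."""
    # InfluxDB returns the "time" column in addition to the selected fields
    query = 'SELECT "sal_created", "sal_ingested", "kafka_timestamp" FROM "{}"."autogen"."{}" WHERE time > now() - {}'
    params = {'q': query.format(INFLUXDB_DATABASE, topic, past), 'epoch': 'ms', 'chunked': '200000', 'u': USERNAME, 'p': PASSWORD}

    r = requests.post(url=INFLUXDB_API_URL + "/query", params=params)
    r.raise_for_status()  # fail early on HTTP errors such as bad credentials

    return r.json()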

In [None]:
data = get_timestamps("lsst.sal.MTM1M3_forceActuatorData")['results'][0]['series'][0]
df = pd.DataFrame.from_records(data['values'], columns=data['columns'])
df.head()
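
Indexing `['results'][0]['series'][0]` directly raises a `KeyError` when the response has no `series` entry, which is expected when no messages arrived in the requested interval. A minimal sketch of a more defensive conversion follows; `series_to_dataframe` and `fake_response` are hypothetical names, and the response shape is assumed to follow the InfluxDB 1.x JSON format used above.

```python
import pandas as pd

def series_to_dataframe(response):
    """Convert the first series of an InfluxDB 1.x JSON response into a DataFrame.

    Hypothetical helper: raises ValueError when the query reports an error
    or returns no series (e.g. no messages in the requested interval).
    """
    if "error" in response:
        raise ValueError(response["error"])
    result = response["results"][0]
    if "error" in result:
        raise ValueError(result["error"])
    series = result.get("series")
    if not series:
        raise ValueError("query returned no data")
    return pd.DataFrame.from_records(series[0]["values"], columns=series[0]["columns"])

# Hand-written response in the same shape the notebook indexes into
fake_response = {"results": [{"statement_id": 0, "series": [{
    "name": "lsst.sal.MTM1M3_forceActuatorData",
    "columns": ["time", "sal_created", "sal_ingested", "kafka_timestamp"],
    "values": [[1050, 1004, 1000, 1010], [2040, 2003, 2000, 2011]],
}]}]}

parsed = series_to_dataframe(fake_response)
print(parsed.shape)
```

With the real API, the call would look like `series_to_dataframe(get_timestamps("lsst.sal.MTM1M3_forceActuatorData"))`.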

## Latency and time in seconds

In [None]:
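# Timestamps are in milliseconds; divide by 1000 to get seconds
df['latency'] = (df['time'] - df['sal_created']) / 1000
# Elapsed time since the first message in the query result
df['time_seconds'] = (df['time'] - df['time'].iloc[0]) / 1000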

## Latency characterization

In [None]:
median = df.latency.median()
quantile99 = df.latency.quantile(.99)
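
The median describes the typical latency, while the 99th percentile captures the tail. The sketch below, on hypothetical latency values, shows how a single slow message barely moves the median but dominates the upper quantiles; `Series.quantile` accepts a list to compute several quantiles at once.

```python
import pandas as pd

# Hypothetical latency values in seconds, with one slow outlier
latency = pd.Series([0.05, 0.06, 0.05, 0.07, 0.06, 0.05, 0.90, 0.06, 0.05, 0.06])

# Compute several quantiles in one call; the result is indexed by quantile
summary = latency.quantile([0.5, 0.9, 0.99])
print(summary)
print("max:", latency.max())
```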

In [None]:
p = df.plot(x='time_seconds', y='latency', figsize=(15,3))
p.set_xlabel("Time (s)")
p.set_ylabel("Latency (s)")
p.text(50, df.latency.max() - 0.1, "Median={:.4f}s 99th percentile={:.2f}s".format(median, quantile99))