# Investigating aioinflux timeouts in chunked queries

We are seeing timeouts when querying the EFD with `aioinflux` using chunked queries with small chunk sizes.

One would expect that the smaller the chunk size the faster the async request would return, but that's not always the case. If the chunk size is too small, there's an overhead in generating too many requests to InfluxDB. What we see in this case is that some chunks take longer than 5 minutes (the `aiohttp` client default timeout) to return and are canceled.
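The trade-off can be sketched with a rough linear cost model: a fixed overhead per chunk plus a cost per row. The function `estimate_query_seconds` and every number below are illustrative assumptions, not measurements from the EFD; only the 5-minute timeout comes from the text above.

```python
import math

def estimate_query_seconds(n_rows, chunk_size, per_chunk_overhead_s, per_row_s):
    """Rough linear model: a fixed cost per chunk plus a cost per row."""
    n_chunks = math.ceil(n_rows / chunk_size)
    return n_chunks * per_chunk_overhead_s + n_rows * per_row_s

# All values below are hypothetical and chosen only to illustrate the effect.
N_ROWS = 1_000_000
PER_CHUNK_OVERHEAD_S = 0.2
PER_ROW_S = 0.0001
TIMEOUT_S = 5 * 60  # the aiohttp default total timeout, in seconds

for chunk_size in (500, 10_000):
    est = estimate_query_seconds(N_ROWS, chunk_size, PER_CHUNK_OVERHEAD_S, PER_ROW_S)
    print(f"chunk_size={chunk_size}: ~{est:.0f} s, exceeds timeout: {est > TIMEOUT_S}")
```

Under these assumed costs the per-chunk overhead dominates at small chunk sizes, which is the behavior investigated in the rest of this notebook.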

In [None]:
import asyncio
import pandas

from astropy.time import Time, TimeDelta
from aioinflux import InfluxDBClient
from lsst_efd_client import EfdClient

Setting the `aioinflux` logging level to DEBUG helps keep track of when the chunks are returned, but be aware that the output is very large.

In [None]:
#import logging
#logging.basicConfig()
#logging.getLogger('aioinflux').setLevel(logging.DEBUG)

In [None]:
efd = EfdClient('ldf_stable_efd')

In [None]:
topic = 'lsst.sal.MTM2.position'

stop = Time('2021-11-10T12:00:00Z', format='isot', scale='utc')
start = stop - TimeDelta(12*24*3600, format='sec')
print(start, stop)

In [None]:
query_str = ("SELECT * " +
             f"FROM \"efd\".\"autogen\".\"{topic}\" " +
             f"WHERE time > '{start.isot}Z' and time <= '{stop.isot}Z'")
print(query_str)

A small `chunk_size` will produce a timeout error after ~5 minutes:

In [None]:
cursor = await efd.influx_client.query(query_str, chunked=True, chunk_size=500)
df = pandas.concat([i async for i in cursor])

Because `chunk_size` is so small, there are too many async requests and the query becomes inefficient. Some chunks take longer than 5 minutes (the `aiohttp` default timeout) to return and are canceled.

In [None]:
from aiohttp.client import DEFAULT_TIMEOUT
print(DEFAULT_TIMEOUT)
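To make the cancellation mechanics concrete, here is a minimal sketch that uses only `asyncio`. It is a stand-in, not the actual `aiohttp` or `aioinflux` code path: `fake_chunks`, `consume`, and `run_with_total_timeout` are hypothetical helpers, and `asyncio.wait_for` plays the role of a single overall deadline. The delays and chunk counts are arbitrary.

```python
import asyncio

async def fake_chunks(n_chunks, delay_s):
    """Hypothetical stand-in for a chunked cursor: one chunk after each delay."""
    for i in range(n_chunks):
        await asyncio.sleep(delay_s)
        yield i

async def consume(chunks, received):
    """Append every chunk to `received` as it arrives."""
    async for chunk in chunks:
        received.append(chunk)

async def run_with_total_timeout(n_chunks, delay_s, timeout_s):
    """Return (completed, chunks_received) under one overall deadline."""
    received = []
    try:
        await asyncio.wait_for(consume(fake_chunks(n_chunks, delay_s), received), timeout_s)
        return True, len(received)
    except asyncio.TimeoutError:
        # The consumer was canceled; only the chunks received so far survive.
        return False, len(received)

# Same per-chunk delay and deadline, different chunk counts (illustrative values).
many = asyncio.run(run_with_total_timeout(n_chunks=200, delay_s=0.01, timeout_s=0.5))
few = asyncio.run(run_with_total_timeout(n_chunks=3, delay_s=0.01, timeout_s=0.5))
print("many chunks:", many)
print("few chunks:", few)
```

Inside a notebook, where an event loop is already running, replace `asyncio.run(...)` with `await ...`.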

In fact, by increasing the `chunk_size` we generate fewer async requests and the chunks finish in less than 5 minutes.

In [None]:
cursor = await efd.influx_client.query(query_str, chunked=True, chunk_size=10000)
df = pandas.concat([i async for i in cursor])
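To compare chunk sizes quantitatively, one option is to time how long it takes to drain the cursor. The helper `timed_collect` below is a hypothetical utility, not part of `aioinflux` or `lsst_efd_client`, and `_demo_chunks` is a stand-in for the cursor so that the sketch runs without a database connection.

```python
import asyncio
import time

async def timed_collect(async_iterable):
    """Collect all items from an async iterable and report the elapsed time."""
    t0 = time.perf_counter()
    items = [item async for item in async_iterable]
    return items, time.perf_counter() - t0

async def _demo_chunks():
    # Hypothetical stand-in for the cursor returned by a chunked query.
    for i in range(3):
        await asyncio.sleep(0.01)
        yield i

items, elapsed = asyncio.run(timed_collect(_demo_chunks()))
print(f"{len(items)} chunks in {elapsed:.3f} s")
```

In this notebook the equivalent call would be `frames, elapsed = await timed_collect(cursor)` followed by `df = pandas.concat(frames)`.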

The `aioinflux` client accepts a `timeout` option to [override the aiohttp default timeout](https://github.com/gusutabopb/aioinflux/blob/master/aioinflux/client.py#L134). We can expose this option in the EFD client as well.

Here we show that by setting a larger timeout we can use a smaller `chunk_size`. However, the lesson learned is that a smaller `chunk_size` doesn't necessarily mean a more efficient query, because of the overhead of generating too many async requests.

NOTE: set the password for the `efdreader` user in the next cell.

In [None]:
aioinflux_client = InfluxDBClient(
    host="lsst-influxdb-efd.ncsa.illinois.edu",
    port=443,
    ssl=True,
    username="efdreader",
    password="",
    db="efd",
    mode="async",
    output="dataframe",
    timeout=900,
)

In [None]:
cursor = await aioinflux_client.query(query_str, chunked=True, chunk_size=500)
df = pandas.concat([i async for i in cursor])