# Data Access InfluxDB

This document introduces how to access the [InfluxDB](https://docs.influxdata.com/influxdb) database used in Challenge 5.

First, we import the necessary Python modules (we assume that the necessary OS dependencies are already installed).

In [1]:
# install the required package into the current Python environment
!pip install influxdb

# import the modules
import pandas as pd
from influxdb import DataFrameClient



Next we define the database connection information.

In [2]:
# define the database connection parameters
DB_HOST = '86.119.36.94' 
DB_PORT = 8086
DB_DBNAME = 'meteorology'
stations = ['mythenquai', 'tiefenbrunnen']

print(DB_HOST +":"+str(DB_PORT))

86.119.36.94:8086


### DataFrameClient

The InfluxDB Python client provides a [DataFrameClient](https://influxdb-python.readthedocs.io/en/latest/examples.html#tutorials-pandas) that queries data and returns the result as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html).

In [3]:
client = DataFrameClient(host=DB_HOST, port=DB_PORT, database=DB_DBNAME)
client.switch_database(DB_DBNAME)

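Let's query some data using [InfluxQL](https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/) (InfluxDB's query language). Because this first query has no time range, InfluxDB reports the aggregate at timestamp 0 (the Unix epoch), shown here in Zurich local time.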

In [4]:
query = "SELECT COUNT(air_temperature) FROM \"{}\",\"{}\" tz('Europe/Zurich')".format(stations[0], stations[1])
result = client.query(query)
print(result[stations[0]])
print(result[stations[1]])

                            count
1970-01-01 01:00:00+01:00  634661
                            count
1970-01-01 01:00:00+01:00  628207


Internally, InfluxDB stores time in [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time). Depending on the season (daylight saving time), our time zone (Zurich) has an offset of +02:00 or +01:00. It is therefore important to state the offset explicitly in the timestamps of the query and to request local time with the `tz()` clause. Otherwise, InfluxDB interprets timestamps without an offset as UTC and returns all results in UTC.
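To make the offsets concrete, here is a minimal sketch using only Python's standard `datetime` module. It converts timestamps with an explicit offset to UTC. The winter date is a made-up example, and `to_utc` is a hypothetical helper, not part of any InfluxDB client.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


summer = to_utc("2019-07-23T00:00:00+02:00")       # daylight saving time
winter = to_utc("2019-01-23T00:00:00+01:00")       # standard time (made-up date)
epoch_local = to_utc("1970-01-01T01:00:00+01:00")  # timestamp 0 in Zurich local time

print(summer.isoformat())       # 2019-07-22T22:00:00+00:00
print(winter.isoformat())       # 2019-01-22T23:00:00+00:00
print(epoch_local.isoformat())  # 1970-01-01T00:00:00+00:00
```

Midnight in Zurich therefore corresponds to 22:00 or 23:00 UTC on the previous day, which is why a time range written without an offset would select a shifted window.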

In [5]:
start_time = '2019-07-23T00:00:00+02:00'
end_time = '2019-07-24T00:00:00+02:00'
query = "SELECT COUNT(air_temperature) FROM \"{}\",\"{}\" WHERE time >= '{}' AND time <= '{}' tz('Europe/Zurich')".format(stations[0], stations[1], start_time, end_time)
result = client.query(query)
print(result[stations[0]])
print(result[stations[1]])

                           count
2019-07-23 00:00:00+02:00    145
                           count
2019-07-23 00:00:00+02:00    145
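The count of 145 can be sanity-checked by hand. The sketch below assumes one reading every 10 minutes (an assumption; it matches the spacing of the timestamps returned by the next query). With both bounds inclusive (`>=` and `<=`), a 24-hour window then contains 144 intervals plus one extra point at the closing midnight.

```python
from datetime import datetime, timedelta

start = datetime.fromisoformat("2019-07-23T00:00:00+02:00")
end = datetime.fromisoformat("2019-07-24T00:00:00+02:00")
step = timedelta(minutes=10)  # assumed sampling interval

# Number of timestamps in [start, end] with both ends inclusive.
inclusive_count = (end - start) // step + 1
# Using `time < end` instead would drop the final midnight reading.
exclusive_count = (end - start) // step

print(inclusive_count, exclusive_count)  # 145 144
```

If consecutive days are queried separately, an exclusive upper bound avoids counting each midnight reading twice.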


In [6]:
query = "SELECT air_temperature FROM \"{}\",\"{}\" WHERE time >= '{}' AND time <= '{}' tz('Europe/Zurich')".format(stations[0], stations[1], start_time, end_time)
result = client.query(query)
print(result[stations[0]])
print(result[stations[1]])

                           air_temperature
2019-07-23 00:00:00+02:00             22.2
2019-07-23 00:10:00+02:00             22.1
2019-07-23 00:20:00+02:00             21.9
2019-07-23 00:30:00+02:00             21.8
2019-07-23 00:40:00+02:00             21.7
2019-07-23 00:50:00+02:00             21.5
2019-07-23 01:00:00+02:00             21.4
2019-07-23 01:10:00+02:00             21.2
2019-07-23 01:20:00+02:00             21.1
2019-07-23 01:30:00+02:00             20.9
2019-07-23 01:40:00+02:00             20.8
2019-07-23 01:50:00+02:00             20.7
2019-07-23 02:00:00+02:00             20.6
2019-07-23 02:10:00+02:00             20.6
2019-07-23 02:20:00+02:00             20.5
2019-07-23 02:30:00+02:00             20.4
2019-07-23 02:40:00+02:00             20.2
2019-07-23 02:50:00+02:00             20.1
2019-07-23 03:00:00+02:00             20.1
2019-07-23 03:10:00+02:00             20.0
2019-07-23 03:20:00+02:00             19.9
2019-07-23 03:30:00+02:00             19.9
2019-07-23 
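The three queries above differ only in the selected expression and the optional time range. A small helper can assemble them. `build_query` below is a hypothetical function, not part of the `influxdb` package, and it only does naive string formatting, so it should not be fed untrusted input.

```python
def build_query(select, measurements, start=None, end=None, tz="Europe/Zurich"):
    """Assemble an InfluxQL SELECT statement like the ones used in this notebook."""
    # Measurement names are double-quoted identifiers in InfluxQL.
    sources = ",".join('"{}"'.format(name) for name in measurements)
    query = "SELECT {} FROM {}".format(select, sources)

    # Timestamps are single-quoted string literals.
    conditions = []
    if start is not None:
        conditions.append("time >= '{}'".format(start))
    if end is not None:
        conditions.append("time <= '{}'".format(end))
    if conditions:
        query += " WHERE " + " AND ".join(conditions)

    return query + " tz('{}')".format(tz)


example = build_query(
    "COUNT(air_temperature)",
    ["mythenquai", "tiefenbrunnen"],
    start="2019-07-23T00:00:00+02:00",
    end="2019-07-24T00:00:00+02:00",
)
print(example)
```

The resulting string can be passed to `client.query(...)` exactly like the hand-written queries.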

### InfluxDB REST API

Alternatively, you can use InfluxDB's [Query REST API](https://docs.influxdata.com/influxdb/v1.7/tools/api/#query-http-endpoint) (for a quick introduction to REST, see [here](https://www.restapitutorial.com/lessons/whatisrest.html) and [here](https://www.restapitutorial.com/lessons/httpmethods.html)). This is helpful for programming languages for which InfluxDB does not provide a client (e.g. [R](https://community.influxdata.com/t/how-should-i-read-influxdb-from-r/860/2)).

Import the necessary Python modules (we assume that the necessary OS dependencies are already installed).

In [7]:
# install the required package into the current Python environment
!pip install jsonpath-ng

# import the modules
import requests
import json
from jsonpath_ng import jsonpath, parse

# setup the base url
base_url = "http://{}:{}/query?db={}".format(DB_HOST, DB_PORT, DB_DBNAME)
print(base_url)

http://86.119.36.94:8086/query?db=meteorology


In [8]:
query = "SELECT COUNT(air_temperature) FROM \"{}\" tz('Europe/Zurich')".format(stations[0])
response = requests.get(base_url, params={"q": query})
# raise_for_status() is a no-op for successful responses and raises on 4xx/5xx
response.raise_for_status()

# decode the JSON body into Python dicts and lists
json_response = response.json()
print(json_response)

{'results': [{'statement_id': 0, 'series': [{'name': 'mythenquai', 'columns': ['time', 'count'], 'values': [['1970-01-01T01:00:00+01:00', 634661]]}]}]}
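`requests` encodes the `q` parameter and appends it to the query string that already carries `db`. For environments without such a convenience, the same URL can be built by hand. The sketch below uses only Python's standard `urllib.parse` and makes no network call; host, port, and database are the values from this notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

query = "SELECT COUNT(air_temperature) FROM \"mythenquai\" tz('Europe/Zurich')"
params = {"db": "meteorology", "q": query}

# urlencode escapes characters such as quotes, parentheses, and spaces.
url = "http://86.119.36.94:8086/query?" + urlencode(params)
print(url)

# Decoding the query string recovers the original InfluxQL statement.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == query)  # True
```

Any HTTP client that can issue a GET request to this URL receives the same JSON document as shown above.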


Extract the data directly using Python data structures (dicts and lists).

In [9]:
values = json_response["results"][0]["series"][0]["values"]
print(values)

[['1970-01-01T01:00:00+01:00', 634661]]
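Direct indexing raises a `KeyError` when a statement matches no data, because InfluxDB 1.x then typically omits the `series` key from the result. Statement-level errors are reported under an `error` key instead. A minimal sketch of a more defensive extraction follows. `rows_from_response` is a hypothetical helper, and the sample dictionary is the response shown above.

```python
def rows_from_response(json_response):
    """Return each data point as a dict keyed by column name."""
    rows = []
    for result in json_response.get("results", []):
        if "error" in result:
            raise RuntimeError(result["error"])
        # "series" is missing when the statement matched no data.
        for series in result.get("series", []):
            for values in series["values"]:
                row = dict(zip(series["columns"], values))
                row["measurement"] = series["name"]
                rows.append(row)
    return rows


sample = {"results": [{"statement_id": 0, "series": [{
    "name": "mythenquai",
    "columns": ["time", "count"],
    "values": [["1970-01-01T01:00:00+01:00", 634661]],
}]}]}

print(rows_from_response(sample))
print(rows_from_response({"results": [{"statement_id": 0}]}))  # []
```

Keying each row by column name also avoids relying on the position of a column within `values`.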


Alternatively, use [JSONPath](https://pypi.org/project/jsonpath-ng) to access/query the data of interest.

In [10]:
accessor = parse('results[*].series[*].name|columns|values')
accessor_value = parse('results[*].series[*].values')
acc_result = [match.value for match in accessor.find(json_response)]
value_result = [match.value for match in accessor_value.find(json_response)]
print(acc_result)
print(value_result)

['mythenquai', ['time', 'count'], [['1970-01-01T01:00:00+01:00', 634661]]]
[[['1970-01-01T01:00:00+01:00', 634661]]]


Another query, this time returning the raw temperature readings for a time range:

In [11]:
query = "SELECT air_temperature FROM \"{}\" WHERE time >= '{}' AND time <= '{}' tz('Europe/Zurich')".format(stations[0], start_time, end_time)
response = requests.get(base_url, params={"q": query})
# raise_for_status() is a no-op for successful responses and raises on 4xx/5xx
response.raise_for_status()

# decode the JSON body into Python dicts and lists
json_response = response.json()

In [12]:
values = json_response["results"][0]["series"][0]["values"]
print(values)

[['2019-07-23T00:00:00+02:00', 22.2], ['2019-07-23T00:10:00+02:00', 22.1], ['2019-07-23T00:20:00+02:00', 21.9], ['2019-07-23T00:30:00+02:00', 21.8], ['2019-07-23T00:40:00+02:00', 21.7], ['2019-07-23T00:50:00+02:00', 21.5], ['2019-07-23T01:00:00+02:00', 21.4], ['2019-07-23T01:10:00+02:00', 21.2], ['2019-07-23T01:20:00+02:00', 21.1], ['2019-07-23T01:30:00+02:00', 20.9], ['2019-07-23T01:40:00+02:00', 20.8], ['2019-07-23T01:50:00+02:00', 20.7], ['2019-07-23T02:00:00+02:00', 20.6], ['2019-07-23T02:10:00+02:00', 20.6], ['2019-07-23T02:20:00+02:00', 20.5], ['2019-07-23T02:30:00+02:00', 20.4], ['2019-07-23T02:40:00+02:00', 20.2], ['2019-07-23T02:50:00+02:00', 20.1], ['2019-07-23T03:00:00+02:00', 20.1], ['2019-07-23T03:10:00+02:00', 20], ['2019-07-23T03:20:00+02:00', 19.9], ['2019-07-23T03:30:00+02:00', 19.9], ['2019-07-23T03:40:00+02:00', 19.9], ['2019-07-23T03:50:00+02:00', 19.9], ['2019-07-23T04:00:00+02:00', 19.9], ['2019-07-23T04:10:00+02:00', 19.9], ['2019-07-23T04:20:00+02:00', 19.8], ['
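The `values` list holds `[timestamp, value]` pairs with ISO 8601 strings. For further processing it is usually convenient to parse the timestamps. The sketch below uses only the first three points from the output above and the standard library; with pandas, `pd.DataFrame(values, columns=["time", "air_temperature"])` would be a natural alternative.

```python
from datetime import datetime
from statistics import mean

# First three points of the response above.
values = [
    ["2019-07-23T00:00:00+02:00", 22.2],
    ["2019-07-23T00:10:00+02:00", 22.1],
    ["2019-07-23T00:20:00+02:00", 21.9],
]

# Parse the ISO 8601 strings into timezone-aware datetime objects.
points = [(datetime.fromisoformat(ts), temp) for ts, temp in values]

timestamps = [ts for ts, _ in points]
temperatures = [temp for _, temp in points]

print(timestamps[1] - timestamps[0])  # 0:10:00
print(max(temperatures))              # 22.2
print(round(mean(temperatures), 2))   # 22.07
```

Because the parsed timestamps keep their `+02:00` offset, differences and comparisons between them remain correct across time zones.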

In [13]:
acc_result = [match.value for match in accessor.find(json_response)]
value_result = [match.value for match in accessor_value.find(json_response)]
print(acc_result)
print(value_result)

['mythenquai', ['time', 'air_temperature'], [['2019-07-23T00:00:00+02:00', 22.2], ['2019-07-23T00:10:00+02:00', 22.1], ['2019-07-23T00:20:00+02:00', 21.9], ['2019-07-23T00:30:00+02:00', 21.8], ['2019-07-23T00:40:00+02:00', 21.7], ['2019-07-23T00:50:00+02:00', 21.5], ['2019-07-23T01:00:00+02:00', 21.4], ['2019-07-23T01:10:00+02:00', 21.2], ['2019-07-23T01:20:00+02:00', 21.1], ['2019-07-23T01:30:00+02:00', 20.9], ['2019-07-23T01:40:00+02:00', 20.8], ['2019-07-23T01:50:00+02:00', 20.7], ['2019-07-23T02:00:00+02:00', 20.6], ['2019-07-23T02:10:00+02:00', 20.6], ['2019-07-23T02:20:00+02:00', 20.5], ['2019-07-23T02:30:00+02:00', 20.4], ['2019-07-23T02:40:00+02:00', 20.2], ['2019-07-23T02:50:00+02:00', 20.1], ['2019-07-23T03:00:00+02:00', 20.1], ['2019-07-23T03:10:00+02:00', 20], ['2019-07-23T03:20:00+02:00', 19.9], ['2019-07-23T03:30:00+02:00', 19.9], ['2019-07-23T03:40:00+02:00', 19.9], ['2019-07-23T03:50:00+02:00', 19.9], ['2019-07-23T04:00:00+02:00', 19.9], ['2019-07-23T04:10:00+02:00', 19