# Data Access InfluxDB

This document shows how to access the [InfluxDB](https://docs.influxdata.com/influxdb) database used in Challenge 5.

First, we import the necessary Python modules (we assume that the required OS dependencies are already installed).

In [1]:
# install the modules into the current Python environment (uncomment if needed)
#!pip install pandas
#!pip install influxdb

# import the modules
import pandas as pd
from influxdb import DataFrameClient

Next we define the database connection.

In [2]:
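# define the database connection parameters
DB_HOST = '86.119.36.94'
DB_PORT = 8086
DB_DBNAME = 'meteorology'
# measurement names, one per station
stations = ['mythenquai', 'tiefenbrunnen']

# create the client; the `database` argument already selects the database
client = DataFrameClient(host=DB_HOST, port=DB_PORT, database=DB_DBNAME)
# explicit selection, redundant with the `database` argument above
client.switch_database(DB_DBNAME)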

Let's [query](https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/) some data:

In [3]:
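# count all air_temperature values per station (measurement names are double-quoted)
query = 'SELECT COUNT(air_temperature) FROM "{}","{}"'.format(stations[0], stations[1])
result = client.query(query)
# the result is keyed by measurement name, with one DataFrame per station
print(result[stations[0]])
print(result[stations[1]])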

                            count
1970-01-01 00:00:00+00:00  634661
                            count
1970-01-01 00:00:00+00:00  628207
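
The query strings in this document all follow the same pattern: a field (optionally wrapped in an aggregate), a list of double-quoted measurement names, and an optional `WHERE` clause. As a minimal sketch, the pattern could be wrapped in a small helper. The name `build_query` and its parameters are hypothetical and not part of the `influxdb` module; the helper does no escaping, so it is only suitable for trusted, hard-coded names.

```python
def build_query(field, measurements, aggregate=None, where=None):
    """Assemble an InfluxQL SELECT statement (hypothetical helper, no escaping)."""
    # wrap the field in an aggregate such as COUNT if one is requested
    select = "{}({})".format(aggregate, field) if aggregate else field
    # measurement names are double-quoted and comma-separated
    sources = ",".join('"{}"'.format(m) for m in measurements)
    query = "SELECT {} FROM {}".format(select, sources)
    # append the optional condition, e.g. a time range
    if where:
        query += " WHERE {}".format(where)
    return query


stations = ['mythenquai', 'tiefenbrunnen']
count_query = build_query("air_temperature", stations, aggregate="COUNT")
print(count_query)
# SELECT COUNT(air_temperature) FROM "mythenquai","tiefenbrunnen"
```

The returned string can be passed to `client.query(...)` in the same way as the hand-written query above.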


Internally, InfluxDB stores time in [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time). Depending on the season (daylight saving time), our time zone (Zurich) has an offset of +02:00 or +01:00. It is therefore important to include this offset in the timestamps of a query; a timestamp without an offset is interpreted as UTC, which shifts the selected range by one or two hours relative to local time. The results are returned in UTC as well, which is why the indices in the outputs below carry the suffix +00:00.
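
Instead of hard-coding the offset, it can be derived from the local time zone. The following is a minimal sketch using only the Python standard library (`zoneinfo` requires Python 3.9 or newer and a time zone database on the system). The function name `to_rfc3339` is a hypothetical helper, not part of the `influxdb` module.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_rfc3339(local_time, tz_name="Europe/Zurich"):
    """Attach the UTC offset valid at `local_time` in the given time zone."""
    # parse the naive local timestamp, e.g. "2019-07-23T00:00:00"
    naive = datetime.fromisoformat(local_time)
    # interpret it as wall-clock time in the given zone (DST is handled by zoneinfo)
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))
    return aware.isoformat()


summer = to_rfc3339("2019-07-23T00:00:00")
winter = to_rfc3339("2019-01-23T00:00:00")
print(summer)  # 2019-07-23T00:00:00+02:00
print(winter)  # 2019-01-23T00:00:00+01:00
```

The resulting strings can be placed inside single quotes in the `WHERE time >= '...'` condition of a query.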

In [4]:
# count the values within one local day (offset +02:00: Zurich during daylight saving time)
query = "SELECT COUNT(air_temperature) FROM \"{}\",\"{}\" WHERE time >= '2019-07-23T00:00:00+02:00' AND time <= '2019-07-24T00:00:00+02:00'".format(stations[0], stations[1])
result = client.query(query)
print(result[stations[0]])
print(result[stations[1]])

                           count
2019-07-22 22:00:00+00:00    145
                           count
2019-07-22 22:00:00+00:00    145
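
Both counts are 145 rather than 144 because the range is closed on both sides (`>=` and `<=`), so the sample at midnight of the following day is included. The arithmetic can be sketched as follows, assuming one sample every 10 minutes (as the raw values in this document suggest), no gaps, and a sample exactly at the start of the range:

```python
from datetime import datetime, timedelta

# same bounds as in the query above
start = datetime.fromisoformat("2019-07-23T00:00:00+02:00")
end = datetime.fromisoformat("2019-07-24T00:00:00+02:00")
step = timedelta(minutes=10)  # assumed sampling interval

# number of 10-minute intervals in 24 hours
intervals = (end - start) // step

# time >= start AND time <= end: both endpoints are counted
closed_count = intervals + 1
# time >= start AND time < end: the end point is excluded
half_open_count = intervals

print(closed_count)     # 145
print(half_open_count)  # 144
```

If exactly one local day is wanted, using `time < ...` for the upper bound avoids counting the first sample of the next day.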


In [5]:
# fetch the raw values for the same time range
query = "SELECT air_temperature FROM \"{}\",\"{}\" WHERE time >= '2019-07-23T00:00:00+02:00' AND time <= '2019-07-24T00:00:00+02:00'".format(stations[0], stations[1])
result = client.query(query)
print(result[stations[0]])
print(result[stations[1]])

                           air_temperature
2019-07-22 22:00:00+00:00             22.2
2019-07-22 22:10:00+00:00             22.1
2019-07-22 22:20:00+00:00             21.9
2019-07-22 22:30:00+00:00             21.8
2019-07-22 22:40:00+00:00             21.7
2019-07-22 22:50:00+00:00             21.5
2019-07-22 23:00:00+00:00             21.4
2019-07-22 23:10:00+00:00             21.2
2019-07-22 23:20:00+00:00             21.1
2019-07-22 23:30:00+00:00             20.9
2019-07-22 23:40:00+00:00             20.8
2019-07-22 23:50:00+00:00             20.7
2019-07-23 00:00:00+00:00             20.6
2019-07-23 00:10:00+00:00             20.6
2019-07-23 00:20:00+00:00             20.5
2019-07-23 00:30:00+00:00             20.4
2019-07-23 00:40:00+00:00             20.2
2019-07-23 00:50:00+00:00             20.1
2019-07-23 01:00:00+00:00             20.1
2019-07-23 01:10:00+00:00             20.0
2019-07-23 01:20:00+00:00             19.9
2019-07-23 01:30:00+00:00             19.9
2019-07-23 