<h1><img src="https://3yecy51kdipx3blyi37oute1-wpengine.netdna-ssl.com/wp-content/uploads/2016/11/influxdata_400x200.png" style="height: 80px;"/></h1>

> InfluxDB is a time series database designed to handle timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.

## Key features

- CLI and HTTP APIs for writing and querying data.
- Expressive SQL-like query language.
- Schemas don't have to be defined up front, and schema preferences may change over time.
- Tags allow series to be indexed for fast and efficient queries.
- Retention policies efficiently auto-expire stale data.
- Plugin support for other data ingestion protocols such as Graphite, collectd, and OpenTSDB.
- Continuous queries automatically compute aggregate data to make frequent queries more efficient.
- InfluxDB isn’t fully CRUD: it is optimized for creating and reading data, and updates and deletes are restricted.
- The open source edition of InfluxDB runs on a single node; high availability is only available in the InfluxDB Enterprise Edition.

## Data structure


Time series [key concepts](https://docs.influxdata.com/influxdb/v1.7/concepts/key_concepts/):
- `time`: a timestamp, similar to a SQL primary key.
- `tags`: zero or more key-value pairs that hold metadata about the value. **`tags` are indexed.**
- `field set`: at least one key-value pair. The `field key` identifies the measured element and the `field value` is the measured value itself, e.g. `value=0.64` or `temperature=21.2`. **`fields` are not indexed.**
    
It’s important to note that fields are not indexed. Queries that use field values as filters must scan all values that match the other conditions in the query. As a result, those queries are not performant relative to queries on tags. In general, fields should not contain commonly-queried metadata.
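To make the difference concrete, here is a toy model in which tags live in an inverted index while field filters fall back to a full scan. It is only an illustration and says nothing about InfluxDB's actual storage engine; `ToyStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class ToyStore:
    """Toy illustration only: tags are indexed, fields are not."""

    def __init__(self):
        self.points = []                    # every stored point
        self.tag_index = defaultdict(list)  # (tag key, tag value) -> point positions

    def write(self, tags, fields):
        position = len(self.points)
        self.points.append({"tags": tags, "fields": fields})
        for item in tags.items():
            self.tag_index[item].append(position)

    def by_tag(self, key, value):
        # Direct lookup: only the matching points are touched.
        return [self.points[i] for i in self.tag_index.get((key, value), [])]

    def by_field(self, key, predicate):
        # No index on fields: every point has to be inspected.
        return [
            p for p in self.points
            if key in p["fields"] and predicate(p["fields"][key])
        ]


store = ToyStore()
store.write({"location": "1", "scientist": "langstroth"}, {"butterflies": 12, "honeybees": 23})
store.write({"location": "2", "scientist": "perpetua"}, {"butterflies": 8, "honeybees": 23})

print(len(store.by_tag("scientist", "perpetua")))
print(len(store.by_field("honeybees", lambda v: v > 20)))
```

The tag lookup cost depends on the number of matches, while the field filter always walks every stored point.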

    
Further concepts:
- A `measurement` acts as a container for `tags`, `fields`, and the `time` column. It is comparable to a SQL table where the primary index is always `time`; `tags` and `fields` are effectively columns in the table.
- A `series` is the collection of data that shares the same retention policy, measurement, and tag set.
- A `point` represents a single data record with four components: a measurement, a tag set, a field set, and a timestamp. A point is uniquely identified by its series and timestamp, similar to a row in a SQL table.

<table>
<thead>
<tr>
<th align="left">Element</th>
<th align="left">Optional/Required</th>
<th align="left">Description</th>
<th align="left">Type<br>(See <a href="https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/#data-types">data types</a> for more information.)</th>
</tr>
</thead>

<tbody>
<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#measurement">Measurement</a></td>
<td align="left">Required</td>
<td align="left">The measurement name. InfluxDB accepts one measurement per point.</td>
<td align="left">String</td>
</tr>

<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#tag-set">Tag set</a></td>
<td align="left">Optional</td>
<td align="left">All tag key-value pairs for the point.</td>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#tag-key">Tag keys</a> and <a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#tag-value">tag values</a> are both strings.</td>
</tr>

<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#field-set">Field set</a></td>
<td align="left">Required. Points must have at least one field.</td>
<td align="left">All field key-value pairs for the point.</td>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#field-key">Field keys</a> are strings. <a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#field-value">Field values</a> can be floats, integers, strings, or Booleans.</td>
</tr>

<tr>
<td align="left"><a href="https://docs.influxdata.com/influxdb/v1.7/concepts/glossary/#timestamp">Timestamp</a></td>
<td align="left">Optional. InfluxDB uses the server’s local nanosecond timestamp in UTC if the timestamp is not included with the point.</td>
<td align="left">The timestamp for the data point. InfluxDB accepts one timestamp per point.</td>
<td align="left">Unix nanosecond timestamp. Specify alternative precisions with the <a href="https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint">InfluxDB API</a>.</td>
</tr>
</tbody>
</table>
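The four elements map onto one line of InfluxDB line protocol, `measurement,tag_set field_set timestamp`. The helper below is a minimal sketch of that mapping. `to_line` is a hypothetical name, and the escaping covers only the common cases (commas, spaces, equals signs, double quotes), so treat it as an illustration and not as a replacement for a client library.

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` found in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"    # integers carry a trailing i
    if isinstance(value, float):
        return repr(value)    # floats are the default numeric type
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'     # strings are double-quoted


def to_line(measurement, fields, tags=None, timestamp_ns=None):
    """Build one line-protocol line; tags and timestamp are optional."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key, value in sorted((tags or {}).items()):
        line += "," + _escape(key, ",= ") + "=" + _escape(value, ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _field_value(value)
        for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line(
    "census",
    {"butterflies": 12, "honeybees": 23},
    tags={"location": "1", "scientist": "langstroth"},
    timestamp_ns=1439856000000000000,
))
```

Leaving out `timestamp_ns` produces a line without a timestamp, in which case the server assigns one as described in the table.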


## Data types

<table>
<thead>
<tr>
<th align="left">Datatype</th>
<th align="left">Element(s)</th>
<th align="left">Description</th>
</tr>
</thead>

<tbody>
<tr>
<td align="left">Float</td>
<td align="left">Field values</td>
<td align="left">IEEE-754 64-bit floating-point numbers. This is the default numerical type. Examples: <code>1</code>, <code>1.0</code>, <code>1.e+78</code>, <code>1.E+78</code>.</td>
</tr>

<tr>
<td align="left">Integer</td>
<td align="left">Field values</td>
<td align="left">Signed 64-bit integers (-9223372036854775808 to 9223372036854775807). Specify an integer with a trailing <code>i</code> on the number. Example: <code>1i</code>.</td>
</tr>

<tr>
<td align="left">String</td>
<td align="left">Measurements, tag keys, tag values, field keys, field values</td>
<td align="left">Length limit 64KB.</td>
</tr>

<tr>
<td align="left">Boolean</td>
<td align="left">Field values</td>
<td align="left">Stores TRUE or FALSE values.<br><br>TRUE write syntax:<code>[t, T, true, True, TRUE]</code>.<br><br>FALSE write syntax:<code>[f, F, false, False, FALSE]</code></td>
</tr>

<tr>
<td align="left">Timestamp</td>
<td align="left">Timestamps</td>
<td align="left">Unix nanosecond timestamp. Specify alternative precisions with the <a href="https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint">InfluxDB API</a>. The minimum valid timestamp is <code>-9223372036854775806</code> or <code>1677-09-21T00:12:43.145224194Z</code>. The maximum valid timestamp is <code>9223372036854775806</code> or <code>2262-04-11T23:47:16.854775806Z</code>.</td>
</tr>
</tbody>
</table>
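As a small sketch of the timestamp row, the function below converts a timezone-aware `datetime` to a Unix nanosecond timestamp and checks it against the limits quoted in the table. `to_unix_ns` is a hypothetical helper; note that Python's `datetime` only has microsecond resolution, so the last three digits are always zero.

```python
from datetime import datetime, timezone

# Limits copied from the table above.
MIN_TIMESTAMP_NS = -9223372036854775806
MAX_TIMESTAMP_NS = 9223372036854775806
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(moment):
    """Convert a timezone-aware datetime to a Unix nanosecond timestamp."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = moment - EPOCH
    # Integer arithmetic avoids the rounding errors of float seconds.
    ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    if not MIN_TIMESTAMP_NS <= ns <= MAX_TIMESTAMP_NS:
        raise ValueError("timestamp outside the range InfluxDB accepts")
    return ns


print(to_unix_ns(datetime(2015, 8, 18, tzinfo=timezone.utc)))
```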

Python module [documentation](https://influxdb-python.readthedocs.io/en/latest/index.html)


## Example

<div style="display:table;">
<div style="display:table-row;width:100%;">
<div style="display:table-cell;">
*census*:

<table>
<thead>
<tr>
<th>time</th>
<th><span title="Field key">butterflies</span></th>
<th><span title="Field key">honeybees</span></th>
<th><span title="Tag key">location</span></th>
<th><span title="Tag key">scientist</span></th>
</tr>
</thead>

<tbody>
<tr>
<td title="Timestamp">2015-08-18T00:00:00Z</td>
<td title="Field value">12</td>
<td title="Field value">23</td>
<td title="Tag value">1</td>
<td title="Tag value">langstroth</td>
</tr>

<tr>
<td>2015-08-18T00:00:00Z</td>
<td>1</td>
<td>30</td>
<td>1</td>
<td>perpetua</td>
</tr>

<tr>
<td>2015-08-18T00:06:00Z</td>
<td>11</td>
<td>28</td>
<td>1</td>
<td>langstroth</td>
</tr>

<tr>
<td>2015-08-18T00:06:00Z</td>
<td>3</td>
<td>28</td>
<td>1</td>
<td>perpetua</td>
</tr>

<tr>
<td>2015-08-18T05:54:00Z</td>
<td>2</td>
<td>11</td>
<td>2</td>
<td>langstroth</td>
</tr>

<tr>
<td>2015-08-18T06:00:00Z</td>
<td>1</td>
<td>10</td>
<td>2</td>
<td>langstroth</td>
</tr>

<tr>
<td>2015-08-18T06:06:00Z</td>
<td>8</td>
<td>23</td>
<td>2</td>
<td>perpetua</td>
</tr>

<tr>
<td>2015-08-18T06:12:00Z</td>
<td>7</td>
<td>22</td>
<td>2</td>
<td>perpetua</td>
</tr>
</tbody>
</table>

</div>
    
<div style="display:table-cell;padding-left:5%;">

**8 field sets**

    butterflies = 12 honeybees = 23
    butterflies = 1 honeybees = 30
    butterflies = 11 honeybees = 28
    butterflies = 3 honeybees = 28
    butterflies = 2 honeybees = 11
    butterflies = 1 honeybees = 10
    butterflies = 8 honeybees = 23
    butterflies = 7 honeybees = 22


</div>
    
<div style="display:table-cell;padding-left:5%;">


**4 tag sets** (different combinations of all the tag key-value pairs)

    location = 1, scientist = langstroth
    location = 2, scientist = langstroth
    location = 1, scientist = perpetua
    location = 2, scientist = perpetua
</div>
</div>    
</div>

<table>
<thead>
<tr>
<th>Arbitrary series number</th>
<th>Retention policy</th>
<th>Measurement</th>
<th>Tag set</th>
</tr>
</thead>

<tbody>
<tr>
<td>series 1</td>
<td><code>autogen</code></td>
<td><code>census</code></td>
<td><code>location = 1</code>,<code>scientist = langstroth</code></td>
</tr>

<tr>
<td>series 2</td>
<td><code>autogen</code></td>
<td><code>census</code></td>
<td><code>location = 2</code>,<code>scientist = langstroth</code></td>
</tr>

<tr>
<td>series 3</td>
<td><code>autogen</code></td>
<td><code>census</code></td>
<td><code>location = 1</code>,<code>scientist = perpetua</code></td>
</tr>

<tr>
<td>series 4</td>
<td><code>autogen</code></td>
<td><code>census</code></td>
<td><code>location = 2</code>,<code>scientist = perpetua</code></td>
</tr>
</tbody>
</table>
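The counts above can be reproduced with a few lines of plain Python. This is only a sketch over a hand-copied version of the *census* table; the variable names are illustrative.

```python
from collections import Counter

RETENTION_POLICY = "autogen"
MEASUREMENT = "census"

# (time, butterflies, honeybees, location, scientist), copied from the census table
rows = [
    ("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
    ("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
    ("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
    ("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
    ("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
    ("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
    ("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
    ("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"),
]

# Distinct combinations of field values.
field_sets = {(butterflies, honeybees) for _, butterflies, honeybees, _, _ in rows}

# A series is identified by retention policy + measurement + tag set.
points_per_series = Counter(
    (RETENTION_POLICY, MEASUREMENT, (("location", location), ("scientist", scientist)))
    for _, _, _, location, scientist in rows
)

print(len(field_sets), "field sets,", len(points_per_series), "series")
for key, count in sorted(points_per_series.items()):
    print(key, count)
```

Fields play no part in the series key, which is why eight distinct field sets still collapse into four series.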

In [1]:
from datetime import (datetime, timedelta)
from random import (choice, randint, random, uniform, lognormvariate)
from influxdb import (InfluxDBClient, DataFrameClient)

In [2]:
host = 'db.influxdb.app.com'
port = 8086

# Admin account, used to set up the database.
user = 'admin'
password = 'supersecretpassword'
dbname = 'example'

# Regular account, used afterwards to write and query data.
dbuser = 'telegraf'
dbuser_password = 'secretpassword'

# Instantiate a connection to InfluxDB.
client = InfluxDBClient(host, port, user, password, dbname)

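## DB init

To set up the database, define a retention policy named `custom_policy` that drops data older than 30 days, uses a replication factor of 3, and becomes the default policy for new writes. On a single-node instance the replication factor has no effect.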

In [None]:
client.create_retention_policy('custom_policy', '30d', 3, default=True)
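For reference, the call above corresponds to an InfluxQL `CREATE RETENTION POLICY` statement. The sketch below builds that statement as a string so it can be inspected or run by hand; the helper names are hypothetical, the database name `example` comes from `dbname`, and the exact string the client sends may differ.

```python
def quote_ident(name):
    """Double-quote an InfluxQL identifier, escaping embedded quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def retention_policy_statement(name, database, duration, replication, default=False):
    """Build a CREATE RETENTION POLICY statement (illustrative helper)."""
    statement = (
        f"CREATE RETENTION POLICY {quote_ident(name)} ON {quote_ident(database)} "
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        statement += " DEFAULT"
    return statement


print(retention_policy_statement("custom_policy", "example", "30d", 3, default=True))
```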

In [5]:
client.switch_user(dbuser, dbuser_password)

In [6]:
def feed_db(n):
    """Write n random points to starfleet_01, spaced 2 seconds apart and ending now."""
    now = datetime.utcnow()  # use UTC: InfluxDB stores timestamps in UTC
    json_body = [
        {
            "measurement": "starfleet_01",
            # Tags: indexed metadata describing where the values come from.
            "tags": {
                "cmdt": choice(("Archer", "Kirk", "Kruge")),
                "region": choice(("Andoria", "Deep Space Nine", "Earth", "Genesis")),
                "spacecraft": f"NX-17{randint(1, 10):02d}"
            },
            "time": (now - timedelta(seconds=2*(n - i))).strftime("%Y-%m-%dT%H:%M:%SZ"),
            # Fields: the measured values (not indexed).
            "fields": {
                "speed": lognormvariate(10, 3),
                "consumption": uniform(0, 300),
                "pressure_a": 3 + random(),
                "status_b": choice((True, False))
            }
        } for i in range(n)
    ]
    client.write_points(json_body, batch_size=100_000)

In [7]:
def feed_db_array(n):
    """Write n random points to starfleet_02, storing a 3-element curve as indexed field keys."""
    now = datetime.utcnow()  # use UTC: InfluxDB stores timestamps in UTC
    json_body = [
        {
            "measurement": "starfleet_02",
            "tags": {
                "cmdt": choice(("Archer", "Kirk", "Kruge")),
                "region": choice(("Andoria", "Deep Space Nine", "Earth", "Genesis")),
                "spacecraft": f"NX-17{randint(1, 10):02d}"
            },
            "time": (now - timedelta(seconds=2*(n - i))).strftime("%Y-%m-%dT%H:%M:%SZ"),
            # Field values cannot be arrays, so each element gets its own field key.
            "fields": {
                "curve_b[0]": 10 + random(),
                "curve_b[1]": choice((15 + random(), 4 + random())),
                "curve_b[2]": 20 + random(),
            }
        } for i in range(n)
    ]
    client.write_points(json_body, batch_size=100_000)
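Because field values can only be floats, integers, strings, or Booleans, `feed_db_array` spreads each array over keys such as `curve_b[0]`. A possible pair of helpers for going back and forth between a Python list and that naming scheme is sketched below; both function names are hypothetical.

```python
import re

# Matches field keys of the form name[index], e.g. curve_b[2].
_INDEXED_KEY = re.compile(r"^(?P<name>.+)\[(?P<index>\d+)\]$")


def flatten_array(name, values):
    """Turn a list into one field per element: name[0], name[1], ..."""
    return {f"{name}[{i}]": value for i, value in enumerate(values)}


def unflatten_arrays(fields):
    """Rebuild lists from indexed field keys; other keys are ignored."""
    arrays = {}
    for key, value in fields.items():
        match = _INDEXED_KEY.match(key)
        if match:
            arrays.setdefault(match["name"], {})[int(match["index"])] = value
    return {name: [items[i] for i in sorted(items)] for name, items in arrays.items()}


fields = flatten_array("curve_b", [10.5, 15.2, 20.1])
print(fields)
print(unflatten_arrays(fields))
```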
    


In [46]:
%timeit feed_db(100_000)

13.2 s ± 202 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Series can be deleted
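In InfluxQL, deletion of series is expressed with `DROP SERIES`, optionally restricted by tag values. The sketch below only builds the statement text; the helper names are hypothetical and the tag values in the example are picked arbitrarily from the ones used in `feed_db`. The Python client also exposes a `delete_series` method for the same purpose.

```python
def quote_ident(name):
    """Double-quote an InfluxQL identifier, escaping embedded quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_string(value):
    """Single-quote an InfluxQL string literal, escaping embedded quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def drop_series_statement(measurement, tags=None):
    """Build a DROP SERIES statement, optionally filtered on tag values."""
    statement = f"DROP SERIES FROM {quote_ident(measurement)}"
    if tags:
        conditions = " AND ".join(
            f"{quote_ident(key)} = {quote_string(value)}"
            for key, value in sorted(tags.items())
        )
        statement += f" WHERE {conditions}"
    return statement


print(drop_series_statement("starfleet_01", {"region": "Genesis", "cmdt": "Kruge"}))
```

Only tags can appear in the `WHERE` clause here, since a series is defined by its tag set and not by its fields.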

In [47]:
client.query(query='select count(*) from starfleet_01')

ResultSet({'('starfleet_01', None)': [{'time': '1970-01-01T00:00:00Z', 'count_consumption': 1866653, 'count_pressure_a': 1866653, 'count_speed': 1866653, 'count_status_b': 1866653}]})

In [9]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def influxql(line, cell=None):
    """Run InfluxQL through the client; works as both %influxql and %%influxql."""
    if cell is None:
        # Line magic: the query is the rest of the line.
        query = line
    else:
        # Cell magic: one statement per line, joined with semicolons.
        query = ";".join(op.strip(";") for op in cell.strip("\n").split("\n")) + ";"
    return client.query(query=query)
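The cell branch of the magic can be checked without a database by pulling the string handling into a plain function. `cell_to_query` is a hypothetical name; the expression itself is the one used in the magic.

```python
def cell_to_query(cell):
    """One statement per line in, one semicolon-separated query string out."""
    return ";".join(op.strip(";") for op in cell.strip("\n").split("\n")) + ";"


cell = "SHOW TAG KEYS;\nSHOW QUERIES\n"
print(cell_to_query(cell))
```

Since the split is purely line-based, a single statement written across several lines would be cut into pieces, so each statement has to stay on one line.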

In [10]:
%influxql SELECT speed FROM starfleet_01 WHERE time > now() - 1d LIMIT 10;

ResultSet({'('starfleet_01', None)': [{'time': '2019-08-19T20:52:05Z', 'speed': 774750.9698706511}, {'time': '2019-08-19T20:52:07Z', 'speed': 487121.20016297186}, {'time': '2019-08-19T20:52:09Z', 'speed': 18753.305850476325}, {'time': '2019-08-19T20:52:11Z', 'speed': 17151.44033960194}, {'time': '2019-08-19T20:52:13Z', 'speed': 34535.68231005546}, {'time': '2019-08-19T20:52:15Z', 'speed': 17274.816825933573}, {'time': '2019-08-19T20:52:17Z', 'speed': 31681.839234807274}, {'time': '2019-08-19T20:52:19Z', 'speed': 9354.756781762246}, {'time': '2019-08-19T20:52:21Z', 'speed': 6355.474730264167}, {'time': '2019-08-19T20:52:23Z', 'speed': 40375.16829113543}]})

In [11]:
result = %influxql SELECT speed FROM starfleet_01 WHERE time > now() - 1d LIMIT 10;

In [12]:
result

ResultSet({'('starfleet_01', None)': [{'time': '2019-08-19T20:52:07Z', 'speed': 487121.20016297186}, {'time': '2019-08-19T20:52:09Z', 'speed': 18753.305850476325}, {'time': '2019-08-19T20:52:11Z', 'speed': 17151.44033960194}, {'time': '2019-08-19T20:52:13Z', 'speed': 34535.68231005546}, {'time': '2019-08-19T20:52:15Z', 'speed': 17274.816825933573}, {'time': '2019-08-19T20:52:17Z', 'speed': 31681.839234807274}, {'time': '2019-08-19T20:52:19Z', 'speed': 9354.756781762246}, {'time': '2019-08-19T20:52:21Z', 'speed': 6355.474730264167}, {'time': '2019-08-19T20:52:23Z', 'speed': 40375.16829113543}, {'time': '2019-08-19T20:52:25Z', 'speed': 9.829304084029904}]})

In [43]:
%influxql SELECT COUNT(DISTINCT(pressure_a)) FROM starfleet_01;

ResultSet({'('starfleet_01', None)': [{'time': '1970-01-01T00:00:00Z', 'count': 1000000}]})

In [39]:
%influxql SELECT COUNT(pressure_a) FROM starfleet_01 GROUP BY time(28d), region LIMIT 1;

ResultSet({'('starfleet_01', {'region': 'Andoria'})': [{'time': '2019-07-11T00:00:00Z', 'count': 118099}], '('starfleet_01', {'region': 'Deep Space Nine'})': [{'time': '2019-07-11T00:00:00Z', 'count': 118282}], '('starfleet_01', {'region': 'Earth'})': [{'time': '2019-07-11T00:00:00Z', 'count': 117988}], '('starfleet_01', {'region': 'Genesis'})': [{'time': '2019-07-11T00:00:00Z', 'count': 117959}]})

In [38]:
%influxql SELECT MEAN(pressure_a) FROM starfleet_01 GROUP BY region

ResultSet({'('starfleet_01', {'region': 'Andoria'})': [{'time': '1970-01-01T00:00:00Z', 'mean': 3.499534039730786}], '('starfleet_01', {'region': 'Deep Space Nine'})': [{'time': '1970-01-01T00:00:00Z', 'mean': 3.4999780335201867}], '('starfleet_01', {'region': 'Earth'})': [{'time': '1970-01-01T00:00:00Z', 'mean': 3.5005520567253616}], '('starfleet_01', {'region': 'Genesis'})': [{'time': '1970-01-01T00:00:00Z', 'mean': 3.5013387093463155}]})
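With `GROUP BY`, each group comes back keyed by a `(measurement, tags)` pair. The sketch below flattens such pairs into plain rows. It works on hand-written sample data and assumes the input has the same shape as what `ResultSet.items()` yields; `flatten_groups` is a hypothetical helper.

```python
def flatten_groups(items):
    """Turn ((measurement, tags), points) pairs into one flat list of rows."""
    rows = []
    for (measurement, tags), points in items:
        for point in points:
            # tags is None when the query has no GROUP BY on a tag.
            rows.append({"measurement": measurement, **(tags or {}), **point})
    return rows


# Hand-written sample shaped like the GROUP BY region result above.
grouped = [
    (("starfleet_01", {"region": "Andoria"}),
     [{"time": "1970-01-01T00:00:00Z", "mean": 3.499534039730786}]),
    (("starfleet_01", {"region": "Earth"}),
     [{"time": "1970-01-01T00:00:00Z", "mean": 3.5005520567253616}]),
]

for row in flatten_groups(grouped):
    print(row["region"], row["mean"])
```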

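You can inspect how a query will be executed with `EXPLAIN`. In the two plans below, filtering on the `region` tag narrows the plan to 750 series, while filtering on a field leaves all 3000 series in the plan.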

In [20]:
%influxql EXPLAIN SELECT * FROM starfleet_02 WHERE region = 'Andoria' LIMIT 1

ResultSet({'('results', None)': [{'QUERY PLAN': 'EXPRESSION: <nil>'}, {'QUERY PLAN': 'AUXILIARY FIELDS: cmdt::tag, "curve_b[0]"::float, "curve_b[1]"::float, "curve_b[2]"::float, region::tag, spacecraft::tag'}, {'QUERY PLAN': 'NUMBER OF SHARDS: 25'}, {'QUERY PLAN': 'NUMBER OF SERIES: 750'}, {'QUERY PLAN': 'CACHED VALUES: 0'}, {'QUERY PLAN': 'NUMBER OF FILES: 2160'}, {'QUERY PLAN': 'NUMBER OF BLOCKS: 2160'}, {'QUERY PLAN': 'SIZE OF BLOCKS: 5992395'}]})

In [21]:
%influxql EXPLAIN SELECT * FROM starfleet_02 WHERE pressure_a > 3.5 LIMIT 1

ResultSet({'('results', None)': [{'QUERY PLAN': 'EXPRESSION: <nil>'}, {'QUERY PLAN': 'AUXILIARY FIELDS: cmdt::tag, "curve_b[0]"::float, "curve_b[1]"::float, "curve_b[2]"::float, region::tag, spacecraft::tag'}, {'QUERY PLAN': 'NUMBER OF SHARDS: 25'}, {'QUERY PLAN': 'NUMBER OF SERIES: 3000'}, {'QUERY PLAN': 'CACHED VALUES: 0'}, {'QUERY PLAN': 'NUMBER OF FILES: 8640'}, {'QUERY PLAN': 'NUMBER OF BLOCKS: 8640'}, {'QUERY PLAN': 'SIZE OF BLOCKS: 23943525'}]})
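The plan rows are plain `NAME: value` strings, so they are easy to turn into numbers for a side-by-side comparison. The following is a sketch that assumes the rows have been collected into a list of dicts (for example with `get_points()`); `parse_plan` is a hypothetical helper and the values are copied from the two outputs above.

```python
def parse_plan(rows):
    """Map each 'NAME: value' plan row to a dict entry, converting counts to int."""
    plan = {}
    for row in rows:
        name, _, value = row["QUERY PLAN"].partition(": ")
        plan[name] = int(value) if value.isdigit() else value
    return plan


# Rows copied from the two EXPLAIN outputs above (counts only).
tag_filter = parse_plan([
    {"QUERY PLAN": "NUMBER OF SERIES: 750"},
    {"QUERY PLAN": "NUMBER OF BLOCKS: 2160"},
    {"QUERY PLAN": "SIZE OF BLOCKS: 5992395"},
])
field_filter = parse_plan([
    {"QUERY PLAN": "NUMBER OF SERIES: 3000"},
    {"QUERY PLAN": "NUMBER OF BLOCKS: 8640"},
    {"QUERY PLAN": "SIZE OF BLOCKS: 23943525"},
])

for name in tag_filter:
    print(f"{name}: {tag_filter[name]} (tag filter) vs {field_filter[name]} (field filter)")
```

In this data set the tag filter touches a quarter of the series and blocks, which matches the four `region` values written by the feed functions.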

In [22]:
res = %influxql SELECT * FROM "starfleet_01" WHERE pressure_a > 3.5 LIMIT 50

In [23]:
points = res.get_points(tags={"region": "Andoria"})

In [24]:
from pandas import DataFrame

In [25]:
df = DataFrame(points)

In [26]:
df.head()

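<table>
<thead>
<tr><th></th><th>cmdt</th><th>consumption</th><th>pressure_a</th><th>region</th><th>spacecraft</th><th>speed</th><th>status_b</th><th>time</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>Archer</td><td>80.868742</td><td>3.764769</td><td>Andoria</td><td>NX-1707</td><td>4540769.0</td><td>False</td><td>2019-07-28T01:35:50Z</td></tr>
<tr><th>1</th><td>Archer</td><td>64.35161</td><td>3.784118</td><td>Andoria</td><td>NX-1707</td><td>854912.5</td><td>False</td><td>2019-07-28T01:36:00Z</td></tr>
<tr><th>2</th><td>Archer</td><td>60.765559</td><td>3.907481</td><td>Andoria</td><td>NX-1702</td><td>167946.3</td><td>False</td><td>2019-07-28T01:36:04Z</td></tr>
<tr><th>3</th><td>Archer</td><td>252.870551</td><td>3.800938</td><td>Andoria</td><td>NX-1706</td><td>14447.27</td><td>True</td><td>2019-07-28T01:36:06Z</td></tr>
<tr><th>4</th><td>Kruge</td><td>192.871755</td><td>3.930994</td><td>Andoria</td><td>NX-1702</td><td>4376.49</td><td>False</td><td>2019-07-28T01:36:08Z</td></tr>
</tbody>
</table>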


In [27]:
df.groupby('spacecraft').max()

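<table>
<thead>
<tr><th>spacecraft</th><th>cmdt</th><th>consumption</th><th>pressure_a</th><th>region</th><th>speed</th><th>status_b</th><th>time</th></tr>
</thead>
<tbody>
<tr><th>NX-1701</th><td>Kirk</td><td>226.626952</td><td>3.518973</td><td>Andoria</td><td>698749.8</td><td>False</td><td>2019-07-28T01:38:44Z</td></tr>
<tr><th>NX-1702</th><td>Kruge</td><td>217.135955</td><td>3.952388</td><td>Andoria</td><td>673524.6</td><td>True</td><td>2019-07-28T01:37:38Z</td></tr>
<tr><th>NX-1704</th><td>Kruge</td><td>278.172741</td><td>3.976138</td><td>Andoria</td><td>27.70339</td><td>True</td><td>2019-07-28T01:37:54Z</td></tr>
<tr><th>NX-1705</th><td>Kirk</td><td>197.423732</td><td>3.964069</td><td>Andoria</td><td>43006.35</td><td>True</td><td>2019-07-28T01:37:42Z</td></tr>
<tr><th>NX-1706</th><td>Kruge</td><td>252.870551</td><td>3.800938</td><td>Andoria</td><td>14447.27</td><td>True</td><td>2019-07-28T01:37:02Z</td></tr>
<tr><th>NX-1707</th><td>Kruge</td><td>289.899273</td><td>3.913948</td><td>Andoria</td><td>4540769.0</td><td>True</td><td>2019-07-28T01:38:46Z</td></tr>
<tr><th>NX-1709</th><td>Kruge</td><td>272.085362</td><td>3.734222</td><td>Andoria</td><td>165350.0</td><td>True</td><td>2019-07-28T01:37:30Z</td></tr>
<tr><th>NX-1710</th><td>Kruge</td><td>297.309907</td><td>3.875039</td><td>Andoria</td><td>58878.14</td><td>True</td><td>2019-07-28T01:38:20Z</td></tr>
</tbody>
</table>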


In [28]:
%influxql SHOW QUERIES;

ResultSet({'('results', None)': [{'qid': 2821, 'query': 'SHOW QUERIES', 'database': 'example', 'duration': '171µs', 'status': 'running'}]})

In [29]:
%influxql SHOW TAG KEYS;

ResultSet({'('starfleet_01', None)': [{'tagKey': 'cmdt'}, {'tagKey': 'region'}, {'tagKey': 'spacecraft'}], '('starfleet_02', None)': [{'tagKey': 'cmdt'}, {'tagKey': 'region'}, {'tagKey': 'spacecraft'}]})