<!DOCTYPE html>
<html>

<body>
    <h2><span style="color: lightblue;">Influx v3</span></h2>
    <h3><strong>Desc:</strong></h3>
    <ul>
        <li>Write metrics to an InfluxDB database.</li>
        <li>Read metrics from a bucket.</li>
    </ul>
    <h3>Comments:</h3>
    <ul>
        <li>This notebook is still in <strong>DRAFT</strong> mode.</li>
        <li>This notebook was developed for InfluxDB Cloud v3. Do NOT use it with v2.</li>
    </ul>
</body>
</html>

## Prerequisites
1. Open an InfluxDB Cloud account (the free tier is enough): https://www.influxdata.com/get-influxdb/
2. Create a new bucket, e.g. ```Tables_Bucket```.
   
The rest of the flow follows the built-in samples:
1. Open the "Client" option under "Load Data".
2. Select "Python".
3. Follow the instructions. The "install dependencies" step is ```pip install influxdb3-python pandas```.
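A quick way to confirm the installs worked is to check that the modules this notebook imports can be found. This is a minimal standard-library sketch; the module list is an assumption based on the imports used further down (plotly is only needed for the plotting cell and is not part of the pip command above), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Module names imported later in this notebook. Note that the pip package
# influxdb3-python provides the module influxdb_client_3.
REQUIRED_MODULES = ["influxdb_client_3", "pandas", "certifi", "plotly"]

def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("Missing:", missing if missing else "none")
```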

## Get a Token
The client code can't call the API without a token. To get one, log in to the cloud console and go to Load Data > API Tokens > Generate API Token.

In [5]:
# %env sets the variable in the kernel itself; an export inside %%bash only lives in that child shell
%env INFLUXDB_TOKEN=<your-api-token>
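To confirm the Python kernel can actually see the token without echoing the secret into the saved notebook output, something like the following works. `read_token` and `mask` are hypothetical helper names, not part of the Influx client.

```python
import os

def read_token(var="INFLUXDB_TOKEN"):
    """Return the token from the environment, or raise a clear error if it is unset."""
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set in this kernel's environment")
    return token

def mask(secret, visible=4):
    """Show only the first few characters so the full token never lands in the output."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)

try:
    print("Token found:", mask(read_token()))
except RuntimeError as err:
    print(err)
```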

## Initialize Client
Copy the code from the sample. I named the organization in Influx "Dev".

Note: the original code provided by Influx returns an error: ```[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)```. To solve it, pass ```ssl_ca_cert=certifi.where()``` to the client. Resource: https://stackoverflow.com/questions/69401104/influxdb-2-0-certificate-verify-failed-certificate-has-expired-ssl-c1129
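This error usually means the local Python/OpenSSL installation cannot find a CA bundle to verify the server certificate. `certifi.where()` returns the path to the CA bundle shipped with the `certifi` package, which is why passing it helps. A minimal standard-library sketch to see what your Python uses by default (`ca_bundle_status` is a hypothetical helper name):

```python
import os
import ssl

def ca_bundle_status():
    """Report where this Python's OpenSSL looks for CA certificates and whether they exist."""
    paths = ssl.get_default_verify_paths()  # cafile/capath are None when missing
    return {
        "cafile": paths.cafile,
        "cafile_exists": bool(paths.cafile) and os.path.isfile(paths.cafile),
        "capath": paths.capath,
        "capath_exists": bool(paths.capath) and os.path.isdir(paths.capath),
    }

print(ca_bundle_status())
```

If both `*_exists` values are `False`, verification has no local trust store to work with, and pointing the client at certifi's bundle is a reasonable workaround.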
   

In [6]:
import os, time
from influxdb_client_3 import InfluxDBClient3, Point
import certifi  # CA bundle; works around the CERTIFICATE_VERIFY_FAILED error above

# Read the token from the OS environment, or paste it here directly (never commit a real token).
token = os.environ.get("INFLUXDB_TOKEN", "<your-api-token>")
org = "Dev"
host = "https://us-east-1-1.aws.cloud2.influxdata.com"

client = InfluxDBClient3(host=host, token=token, org=org, ssl_ca_cert=certifi.where())


## Copy (Insert) Data  
A "bucket" in the UI is called a "database" in Python.
The measurement is called "census"; notice that the ```Point``` object uses it.



In this data example, we have some important concepts:
- **measurement**: Primary filter for the thing you are measuring. Since we are measuring the sample census of insects, our measurement is "census".
- **tag**: Key-value pair to store metadata about your fields. We are storing the "location" of where each census is taken. Tags form part of your primary key.
- **field**: Key-value pair that stores the actual data you are measuring. The insect species is the field key and the count is its value (for example, bees = 25). Fields are not indexed and can be stored as integers, floats, strings, or booleans.
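These three concepts map directly onto line protocol, the text format InfluxDB ingests: `measurement,tag_key=tag_value field_key=field_value [timestamp]`. The `Point` builder produces this format for you; the sketch below is a hypothetical, simplified serializer (it covers the common escaping rules, but it is not the client's implementation) that makes the mapping visible.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text

def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    line = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):  # tags sorted by key
        line += f",{_escape(key, ',= ')}={_escape(value, ',= ')}"
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):  # check bool before int (bool is an int subclass)
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integer fields carry an "i" suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:  # strings are double-quoted
            rendered = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{_escape(key, ',= ')}={rendered}")
    line += " " + ",".join(parts)
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line

print(to_line_protocol("census", {"location": "Klamath"}, {"bees": 25}))
# census,location=Klamath bees=25i
```

Without a timestamp, the server assigns the write time, which is what the cell below relies on.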

In [7]:
database="Tables_Bucket"

data = {
  "point1": {
    "location": "Klamath",
    "species": "bees",
    "count": 25,
  },
  "point2": {
    "location": "Portland",
    "species": "ants",
    "count": 32,
  },
  "point3": {
    "location": "Klamath",
    "species": "bees",
    "count": 28,
  },
  "point4": {
    "location": "Portland",
    "species": "ants",
    "count": 36,
  },
  "point5": {
    "location": "Klamath",
    "species": "bees",
    "count": 27,
  },
  "point6": {
    "location": "Portland",
    "species": "ants",
    "count": 43,
  },
}

for key in data:
  point = (
    Point("census")
    .tag("location", data[key]["location"])
    .field(data[key]["species"], data[key]["count"])
  )
  client.write(database=database, record=point)
  time.sleep(1) # separate points by 1 second

print("Complete. Return to the InfluxDB UI.")


Complete. Return to the InfluxDB UI.
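The `time.sleep(1)` gives each point its own server-assigned timestamp, which matters because InfluxDB treats points with the same measurement, tag set, and timestamp as duplicates. An alternative sketch, assuming client-side timestamps are acceptable, assigns explicit nanosecond timestamps one second apart so the whole batch can be built without sleeping. `census_lines` is a hypothetical helper, and it assumes tag values contain no spaces, commas, or equals signs (no escaping is done).

```python
import time

def census_lines(rows, start_ns, step_ns=1_000_000_000):
    """Return line-protocol strings with timestamps spaced `step_ns` apart."""
    lines = []
    for i, row in enumerate(rows):
        ts = start_ns + i * step_ns
        lines.append(f'census,location={row["location"]} {row["species"]}={row["count"]}i {ts}')
    return lines

# Same shape as the "data" dict above (first two points only)
rows = [
    {"location": "Klamath", "species": "bees", "count": 25},
    {"location": "Portland", "species": "ants", "count": 32},
]
lines = census_lines(rows, start_ns=time.time_ns())
for line in lines:
    print(line)
```

Assuming the v3 client's `write()` accepts a list of line-protocol strings as `record` (the underlying v2 write API does; check your client version), the batch could go out in a single call such as `client.write(database=database, record=lines)`.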


## Insert data from a CSV
The CLI command (commented out below) writes a sample CSV stored in S3: https://influx-testdata.s3.amazonaws.com/air-sensor-data-annotated.csv

In [58]:
%%bash
# influx write --bucket Tables_Bucket --url https://influx-testdata.s3.amazonaws.com/air-sensor-data-annotated.csv

## Execute a Simple Query - SQL 
v3 supports SQL; v2 did not.

In [9]:
query = """SELECT *
FROM "census"
WHERE time >= now() - interval '168 hours'
AND (bees IS NOT NULL OR ants IS NOT NULL)"""

# Execute the query
table = client.query(query=query, database="Tables_Bucket", language='sql') 

# Convert to dataframe
df = table.to_pandas().sort_values(by="time")
column_names = df.columns
print(column_names)
print(df)




Index(['ants', 'bees', 'location', 'time'], dtype='object')
    ants  bees  location                          time
6    NaN  25.0   Klamath 2023-09-23 20:35:34.764041824
9   32.0   NaN  Portland 2023-09-23 20:35:36.123439256
7    NaN  28.0   Klamath 2023-09-23 20:35:37.362155251
10  36.0   NaN  Portland 2023-09-23 20:35:38.603744675
8    NaN  27.0   Klamath 2023-09-23 20:35:39.766943693
11  43.0   NaN  Portland 2023-09-23 20:35:40.936207328
0    NaN  25.0   Klamath 2023-09-25 19:01:26.022984180
3   32.0   NaN  Portland 2023-09-25 19:01:27.187632481
1    NaN  28.0   Klamath 2023-09-25 19:01:28.451863646
4   36.0   NaN  Portland 2023-09-25 19:01:29.637631603
2    NaN  27.0   Klamath 2023-09-25 19:01:30.798999958
5   43.0   NaN  Portland 2023-09-25 19:01:31.976858112
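In SQL (InfluxDB v3's SQL engine is based on Apache DataFusion), double quotes delimit identifiers and single quotes delimit string literals. A condition like `'bees' IS NOT NULL` therefore tests a constant string and is always true. A small sketch of a hypothetical query builder that quotes identifiers correctly; `168 hours` is 7 days:

```python
def recent_rows_query(measurement, fields, hours):
    """Build SQL for rows from the last `hours` hours where any listed field is non-null."""
    def ident(name):
        # Double-quote identifiers; escape embedded double quotes by doubling them
        return '"' + name.replace('"', '""') + '"'
    condition = " OR ".join(f"{ident(f)} IS NOT NULL" for f in fields)
    return (
        f"SELECT *\nFROM {ident(measurement)}\n"
        f"WHERE time >= now() - interval '{int(hours)} hours'\n"
        f"AND ({condition})"
    )

print(recent_rows_query("census", ["bees", "ants"], 168))
```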


## Advanced Query - SQL
Sep 17: this didn't work, and I'm not sure why, since it is copy-pasted from the demo.
I suspect the root cause is that I had to use "sql" as the language to make it work, not "influxql". The documentation says to use ```import influxdb_client_3 as InfluxDBClient3``` with influxql, but it doesn't work.

Trying to implement ```GROUP BY``` using: https://docs.influxdata.com/influxdb/v1/query_language/explore-data/#the-group-by-clause 

In [25]:
## Execute Aggregate Queries. The first one works
query = """
SELECT  location, max(time), avg(census.ants)
FROM "census"
WHERE time >= now() - interval '1 hour'
AND (ants IS NOT NULL)
GROUP BY location
"""


# Execute the query
table = client.query(query=query, database="Tables_Bucket", language='sql') 

# Convert to dataframe
df = table.to_pandas()  # no sort: the result has no "time" column, only MAX(census.time)
print(df)

   location              MAX(census.time)  AVG(census.ants)
0  Portland 2023-09-25 19:01:31.976858112              37.0
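The aggregate can be checked by hand. Only the second write run (2023-09-25) falls inside the one-hour window, and only Portland has `ants` values: (32 + 36 + 43) / 3 = 37.0. A plain-Python sketch of what `GROUP BY location` computes here (`avg_by_location` is a hypothetical name):

```python
from collections import defaultdict

# Points from the "data" dict above: (location, species/field key, count)
points = [
    ("Klamath", "bees", 25), ("Portland", "ants", 32), ("Klamath", "bees", 28),
    ("Portland", "ants", 36), ("Klamath", "bees", 27), ("Portland", "ants", 43),
]

def avg_by_location(points, field):
    """Mimic SELECT location, avg(field) ... WHERE field IS NOT NULL GROUP BY location."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for location, species, count in points:
        if species == field:  # rows where this field is NULL are skipped
            sums[location] += count
            counts[location] += 1
    return {loc: sums[loc] / counts[loc] for loc in sums}

print(avg_by_location(points, "ants"))  # {'Portland': 37.0}
```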


## Query the data - Group by time ranges
You can always use good old Flux to run a GROUP BY query. The examples below group the data into 15-minute windows (Flux) and 60-minute windows (InfluxQL).
If the query doesn't return any data, no data was inserted into this bucket in the queried time range. See the cells above for how to insert data.

- Example 1: running with the Influx CLI (```brew install influxdb-cli```). Apparently it still works with Flux.
- Example 2: running from Python. The v3 Python client can't use Flux anymore; the only supported query languages are SQL and InfluxQL (InfluxQL reference: https://docs.influxdata.com/influxdb/v1/query_language/, supported languages: https://docs.influxdata.com/influxdb/cloud-dedicated/reference/client-libraries/v3/python/#functions).
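Both examples bucket rows into fixed time windows and keep one aggregate per window. A stdlib-only sketch of that idea, with hypothetical sample values: windows are aligned to the Unix epoch and labeled by their start time, as InfluxQL `GROUP BY time()` does, while Flux `aggregateWindow` labels each window with its stop time by default (which is why the Flux output below shows `_time` values of 18:15 and 18:30).

```python
from datetime import datetime, timezone

def window_max(samples, every_s):
    """Group (epoch_seconds, value) samples into fixed windows; keep the max per window.

    Keys are window start times in epoch seconds.
    """
    result = {}
    for ts, value in samples:
        start = ts - ts % every_s  # epoch-aligned window start
        if start not in result or value > result[start]:
            result[start] = value
    return dict(sorted(result.items()))

# Hypothetical humidity readings, one every 5 minutes starting 18:00 UTC
base = int(datetime(2023, 9, 25, 18, 0, tzinfo=timezone.utc).timestamp())
samples = [(base + i * 300, 35.0 + (i % 4) * 0.1) for i in range(8)]
for start, value in window_max(samples, every_s=15 * 60).items():
    print(datetime.fromtimestamp(start, timezone.utc).isoformat(), round(value, 2))
```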

In [18]:
%%bash
influx query \
'from(bucket: "Tables_Bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "airSensors" and r._field == "humidity")
  |> aggregateWindow(every: 15m, fn: max)'



Result: _result
Table: keys: [_start, _stop, _field, _measurement, sensor_id]
                   _start:time                      _stop:time           _field:string     _measurement:string        sensor_id:string                      _time:time                  _value:float
------------------------------  ------------------------------  ----------------------  ----------------------  ----------------------  ------------------------------  ----------------------------
2023-09-25T18:10:07.450678986Z  2023-09-25T19:10:07.450678986Z                humidity              airSensors                 TLM0100  2023-09-25T18:15:00.000000000Z             35.38761862106402
2023-09-25T18:10:07.450678986Z  2023-09-25T19:10:07.450678986Z                humidity              airSensors                 TLM0100  2023-09-25T18:30:00.000000000Z              35.6597007766778
2023-09-25T18:10:07.450678986Z  2023-09-25T19:10:07.450678986Z                humidity              airSensors                 TLM0100

In [59]:
# Resource: Query Data with InfluxQL: https://docs.influxdata.com/influxdb/cloud/query-data/influxql/
import os
import certifi
import pandas as pd
import plotly.express as px
from influxdb_client_3 import InfluxDBClient3

# Connection details (same values as in the "Initialize Client" cell)
token = os.environ.get("INFLUXDB_TOKEN", "<your-api-token>")
org = "Dev"
host = "https://us-east-1-1.aws.cloud2.influxdata.com"
bucket = "Tables_Bucket"  # called "database" in the v3 client

# InfluxQL query (NOT SQL, NOT Flux): hourly max humidity since 2023-09-25
query = """
SELECT max(humidity)
FROM airSensors
WHERE time >= '2023-09-25T00:00:00Z'
GROUP BY time(60m)
"""

client = InfluxDBClient3(host=host, token=token, org=org, ssl_ca_cert=certifi.where())
# You can fetch only the schema to help troubleshooting
# schema = client.query(query=query, database=bucket, mode="schema", language="influxql")
# print(schema)

table = client.query(query=query, database=bucket, mode="all", language="influxql")
dataframe = table.to_pandas()  # seemed to eliminate the NULL values here, so the DataFrame is built manually below
print(table)

# Build the DataFrame column by column from the pyarrow Table (names as in the printed schema)
data_dict = {
    "Measurement": table.column("iox::measurement").to_pandas(),
    "Time": table.column("time").to_pandas(),
    "Max": table.column("max").to_pandas(),
}
df = pd.DataFrame(data_dict)
print(df)

# Create a line plot using Plotly Express
fig = px.line(df, x="Time", y="Max", title="Max Values Over Time", labels={"Max": "Max Value"})
fig.show()

# Close the client connection
client.close()


pyarrow.Table
iox::measurement: string not null
time: timestamp[ns]
max: double
----
iox::measurement: [["airSensors","airSensors","airSensors","airSensors","airSensors",...,"airSensors","airSensors","airSensors","airSensors","airSensors"]]
time: [[2023-09-25 00:00:00.000000000,2023-09-25 01:00:00.000000000,2023-09-25 02:00:00.000000000,2023-09-25 03:00:00.000000000,2023-09-25 04:00:00.000000000,...,2023-09-25 16:00:00.000000000,2023-09-25 17:00:00.000000000,2023-09-25 18:00:00.000000000,2023-09-25 19:00:00.000000000,2023-09-25 20:00:00.000000000]]
max: [[null,null,null,null,null,...,null,36.68608222448508,36.72902315153046,36.61669941198891,null]]
   Measurement                Time        Max
0   airSensors 2023-09-25 00:00:00        NaN
1   airSensors 2023-09-25 01:00:00        NaN
2   airSensors 2023-09-25 02:00:00        NaN
3   airSensors 2023-09-25 03:00:00        NaN
4   airSensors 2023-09-25 04:00:00        NaN
5   airSensors 2023-09-25 05:00:00        NaN
6   airSensors 2023-0

## Delete a bucket


In [None]:
import os
import requests

# InfluxDB API endpoint (Cloud host used above; a local server would be http://localhost:8086)
base_url = "https://us-east-1-1.aws.cloud2.influxdata.com"
org = "your_organization"  # Replace with your organization name
bucket = "your_bucket"      # Replace with the name of the bucket you want to delete

# Authentication token (needs permission to delete buckets)
token = os.environ.get("INFLUXDB_TOKEN", "your_authentication_token")
headers = {"Authorization": f"Token {token}"}

# The v2 buckets API deletes by bucket ID, so look the ID up by name first
lookup = requests.get(f"{base_url}/api/v2/buckets", headers=headers, params={"org": org, "name": bucket})
lookup.raise_for_status()
matches = lookup.json().get("buckets", [])
if not matches:
    raise ValueError(f"Bucket '{bucket}' not found in organization '{org}'.")
bucket_id = matches[0]["id"]

# Send the DELETE request to delete the bucket
response = requests.delete(f"{base_url}/api/v2/buckets/{bucket_id}", headers=headers)

# Check the response status code (204 No Content means success)
if response.status_code == 204:
    print(f"Bucket '{bucket}' deleted successfully.")
else:
    print(f"Failed to delete bucket '{bucket}'. Status code: {response.status_code}")
    print(response.text)
