Query performance is very slow #371

Closed
idubinets opened this issue Nov 29, 2021 · 3 comments
Labels: question, wontfix

idubinets commented Nov 29, 2021

Steps to reproduce:

1. Write 50k records to the bucket:

```python
import time

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import WriteOptions

with InfluxDBClient(url="http://influxdb:8086", token="XXXX", org="XXXXX") as client:
    write_api = client.write_api(write_options=WriteOptions(batch_size=500,
                                                            flush_interval=1_000,
                                                            jitter_interval=2_000,
                                                            retry_interval=5_000,
                                                            max_retries=5,
                                                            max_retry_delay=30_000,
                                                            exponential_base=2))

    count_records = 50_000
    for i in range(count_records):
        # write the same record multiple times, but with different timestamps to avoid a merge
        write_api.write("xxxxxxx", record=f'demo,tag_1=1,tag_2=1,tag_3=1,tag_4=1,tag_5=1,tag_6=1,tag_7=1 field1=100,field2=50,fileld3=50,fieldt=0 {time.time_ns()}')
    write_api.flush()
    write_api.close()
```
2. Try to query them:

```python
query = f'from(bucket:"xxxxxxx") |> range(start: {test_start}) ' \
        f'|> filter(fn: (r) => r["_measurement"] == "demo" and r["tag_1"] == "1")'

query_api = client.query_api()
start_time = time.time()
data_frame = query_api.query_data_frame(org='readytech', query=query)

# print(data_frame.to_string())
print(f'data frame returned in: {(time.time() - start_time)} secs')
```

Any ideas? I tried query_stream and query; only query_raw is fast enough.
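
For anyone reproducing the comparison of the query methods, a minimal timing harness might look like the following sketch. The URL, token, org, and bucket are placeholders; substitute your own. query_raw() returns the raw annotated-CSV response without parsing it into FluxTable or DataFrame objects, which is largely why it appears fast.

```python
import time

from influxdb_client import InfluxDBClient

# Hypothetical connection details -- replace with your own.
with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    query_api = client.query_api()
    query = 'from(bucket:"my-bucket") |> range(start: -1h) ' \
            '|> filter(fn: (r) => r["_measurement"] == "demo")'

    for name, method in [("query", query_api.query),
                         ("query_stream", query_api.query_stream),
                         ("query_raw", query_api.query_raw)]:
        start = time.time()
        result = method(query)
        if name == "query_stream":
            # query_stream() returns a lazy generator; consume it so the timings are comparable
            result = list(result)
        print(f'{name}: {time.time() - start:.2f} secs')
```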

bednar (Contributor) commented Nov 30, 2021

Hi @idubinets,

thanks for using our client.

The query in your example returns 200,000 rows because it is missing a pivot function: without the pivot, each of the 50,000 points comes back as four rows, one per field. That causes an unnecessary amount of parsing; try something like:

```python
import time

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import WriteOptions

url = "http://localhost:8086"
token = "my-token"
org = "my-org"
bucket = "my-bucket"
with InfluxDBClient(url=url, token=token, org=org) as client:
    count_records = 50_000
    with client.write_api(write_options=WriteOptions(batch_size=500,
                                                     flush_interval=1_000,
                                                     jitter_interval=2_000,
                                                     retry_interval=5_000,
                                                     max_retries=5,
                                                     max_retry_delay=30_000,
                                                     exponential_base=2)) as write_api:
        for i in range(count_records):
            # write the same record multiple times, but with different timestamps to avoid a merge
            write_api.write(bucket,
                            record=f'demo,tag_1=1,tag_2=1,tag_3=1,tag_4=1,tag_5=1,tag_6=1,tag_7=1 field1=100,field2=50,fileld3=50,fieldt=0 {time.time_ns()}')

    query = f'from(bucket:"{bucket}") |> range(start: -1h) ' \
            f'|> filter(fn: (r) => r["_measurement"] == "demo" and r["tag_1"] == "1") ' \
            f'|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value") ' \
            f'|> drop(columns: ["_start", "_stop"])'
    start_time = time.time()
    data_frame = client.query_api().query_data_frame(org=org, query=query)

    print(f'data frame returned in: {(time.time() - start_time)} secs')
    print(data_frame)
```

Regards
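
To make the pivot's effect concrete: annotated CSV carries one row per (timestamp, field) pair, so the 50,000 points above arrive as 200,000 rows without the pivot, but as 50,000 rows with one column per field once it is applied. A small sketch along the same lines (placeholder credentials again; query_data_frame() can return either a single DataFrame or a list of them, so both shapes are handled):

```python
import time

from influxdb_client import InfluxDBClient

base = 'from(bucket:"my-bucket") |> range(start: -1h) ' \
       '|> filter(fn: (r) => r["_measurement"] == "demo")'
pivot = ' |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")'

# Hypothetical connection details -- replace with your own.
with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    query_api = client.query_api()
    for label, flux in [("without pivot", base), ("with pivot", base + pivot)]:
        start = time.time()
        result = query_api.query_data_frame(flux)
        frames = result if isinstance(result, list) else [result]
        rows = sum(len(frame) for frame in frames)
        print(f'{label}: {rows} rows in {time.time() - start:.2f} secs')
```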

bednar (Contributor) commented Dec 14, 2021

This issue has been closed because it has not had recent activity. Please reopen if this issue is still important to you and you have additional information.

@bednar bednar closed this as completed Dec 14, 2021
@bednar bednar added this to the 1.25.0 milestone Dec 14, 2021
idubinets (Author) commented

Thank you!
