InfluxDB: Inconsistent query handling for InfluxQL in alerting #58864

Closed
sysadmin1139 opened this issue Nov 16, 2022 · 5 comments
Labels
area/alerting Grafana Alerting datasource/InfluxDB stale Issue with no recent activity

Comments

@sysadmin1139

What happened:
We're building alerts for cardinality issues, which InfluxDB is prone to. The queries are pretty simple:

show tag values cardinality from measure_name with key = key_name

This returns a table:

count: integer
  • Using this query in Explore shows a value, though you have to set the Format As to Table. Inspect shows the received data as a table with a single row.
  • Using this in a panel works fine as a Stat panel. You also have to set Format As to Table. Inspect shows the same behavior as Explore.
  • Using this in an alerting rule returns No Data even if Format As is set to Table. There is no Inspect here to see what is being returned.

Looking at InfluxDB query logging there is a slight difference in the submitted query. Here is the received query from an Explore or Panel (these return the same logging)

"POST /query?db=fullmetric&epoch=ms HTTP/1.1 {'q': 'show tag values cardinality from measure_name with key = key_name'}"

And the recorded query from Alerting:

"POST /query?db=fullmetric&epoch=ms HTTP/1.1 {'q': 'show tag values cardinality from measure_name with key = key_name;'}"

The extra ; is interesting, but does not seem to affect results. Adding the ; to the Explore and Panel queries does not cause them to go to the No Data state; only Alerting does this. The query path is therefore substantially identical, and the problem appears to lie in how the returned data is handled.
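For anyone who wants to compare the two requests outside Grafana, here is a minimal sketch that builds the same POST against the InfluxDB 1.x /query endpoint, once without and once with the trailing ;. The host and port and the function name build_influxql_request are assumptions for illustration; the database name and query are taken from the log lines above. Nothing is sent until urlopen is called.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local InfluxDB 1.x instance; adjust to your server.
BASE_URL = "http://localhost:8086"
QUERY = "show tag values cardinality from measure_name with key = key_name"


def build_influxql_request(base_url, db, query):
    """Build (but do not send) a form-encoded POST to the 1.x /query endpoint."""
    params = urlencode({"db": db, "epoch": "ms"})
    body = urlencode({"q": query}).encode("utf-8")
    return Request(
        f"{base_url}/query?{params}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Same query as Explore/Panel, and the variant with the trailing ';' seen from Alerting.
explore_req = build_influxql_request(BASE_URL, "fullmetric", QUERY)
alerting_req = build_influxql_request(BASE_URL, "fullmetric", QUERY + ";")

# To actually send one of them:
#   from urllib.request import urlopen
#   print(urlopen(explore_req).read().decode("utf-8"))
```

If both requests return the same JSON from InfluxDB, that supports the reading that the difference is on the Grafana side rather than in the query itself.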

What you expected to happen:
We expected the Alert queries to behave the same as the Explore/Panel queries (also to have an Inspect function on the Alert Rule page, but that's beside the point).

How to reproduce it (as minimally and precisely as possible):
Requirement: An InfluxDB server using InfluxQL, with at least one database containing at least one measurement with data in it.

  1. Set up a datasource using InfluxDB.
    1. Set to use InfluxQL
    2. Set to use POST as HTTP Method
    3. Set Database to be an InfluxDB database with some data in it. You need at least one measurement already present.
  2. In an Explore query:
    1. Set the datasource to the one set up in 1.
    2. Change to manual query mode
    3. Set Format As to Table
    4. Use the query: show tag values cardinality from [measurement] with key = [key_name from measurement], substituting the existing measurement that has data in it and a key name from that measurement.
    5. Run the query. The "Table" view should show a Count value with a number.
  3. Create an Alert Rule
    1. Create a Grafana managed alert.
    2. Follow the steps from 2.i through 2.iv to set up Query A. We don't care about Query B (expression) here.
    3. Click Run Queries. Query A should show No Data for each of Stat, Table, and Time series.

Anything else we need to know?:

Environment:

  • Grafana version: 9.1.6
  • Data source type & version: InfluxDB built-in, Influx version 1.8
  • OS Grafana is installed on: Ubuntu 20.04
  • User OS & Browser: Firefox 107, Ubuntu 22.04
  • Grafana plugins:
    • Stock set
    • Amazon Timestream
    • Grafana Image Renderer
    • OpenSearch
  • Others:
@zuchka zuchka added datasource/InfluxDB area/alerting Grafana Alerting triage/needs-confirmation used for OSS triage rotation - reported issue needs to be reproduced labels Nov 19, 2022
@zuchka zuchka changed the title Inconsistent query handling for InfluxQL in alerting InfluxDB: Inconsistent query handling for InfluxQL in alerting Nov 19, 2022
@kylebrandt kylebrandt removed the triage/needs-confirmation used for OSS triage rotation - reported issue needs to be reproduced label Nov 28, 2022
@kylebrandt
Contributor

kylebrandt commented Nov 28, 2022

I am not sure of the exact change to fix this yet; I need to get a handle on transformRows.

Technically, why this isn't working:

  • With explore and dashboard queries, the queries are being built by the frontend with InfluxQL (as InfluxDB migration to backend data source #43076 isn't complete and the backend migration is still under feature flag influxdbBackendMigration).
  • When alerting is used (or an expression is added), queries are constructed by the backend and sent through SSE (Server Side Expressions).
  • Within the backend code for influxQL, with this change this query is returning an empty string field, and SSE logs WARN [11-28|12:13:23] ignoring InfluxDB data frame due to missing numeric fields logger=expr datasourceType=influxdb frame="&{Name:cpu Fields:[0xc001978780] RefID:A Meta:<nil>}

The frame returned from Influx to SSE looks like (query: show tag values cardinality from "cpu" with key = "cpu"):

Name: cpu
Dimensions: 1 Fields by 0 Rows
+----------------+
| Name: value    |
| Labels:        |
| Type: []string |
+----------------+
+----------------+

So once we fix transformRows in influx so that it returns a frame with a single numeric field when queried through the backend (like it does via the FE), SSE and Alerting should be able to handle the response, i.e. a numeric field that is zero or one row in length. (Matching a substring of the query is likely too fragile a way to do that.)
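To illustrate the "numeric field" requirement, here is a rough sketch in Python (not Grafana's Go code, and not how transformRows works). It takes a series from an InfluxDB 1.x style JSON response and keeps only the columns whose values are all numbers. The sample payload and the helper name numeric_columns are assumptions for illustration; the count value 42 is made up.

```python
import json

# Assumption: illustrative response shape for the cardinality query; the value is invented.
SAMPLE_RESPONSE = json.loads("""
{"results": [{"statement_id": 0,
  "series": [{"name": "cpu", "columns": ["count"], "values": [[42]]}]}]}
""")


def numeric_columns(series):
    """Return {column: [values]} for columns whose values are all int/float.

    A column with zero rows is kept as an empty list, since a zero-row
    result should still be usable downstream.
    """
    out = {}
    for idx, col in enumerate(series["columns"]):
        values = [row[idx] for row in series.get("values", [])]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            out[col] = values
    return out


series = SAMPLE_RESPONSE["results"][0]["series"][0]
fields = numeric_columns(series)
```

A frame built from a string-typed column, like the one shown above, would come back empty from this helper, which mirrors the "missing numeric fields" warning in the SSE log.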

Aside, Hi there @sysadmin1139 !! Feels like a lifetime since we last connected around ServerFault and (usenix/lisa or one of those...?) :-)

@sysadmin1139
Author

I was happily surprised when I saw who got this ticket, and those LISAs were great. Glad we have a theory for what's going on!

@kylebrandt
Contributor

The extra ; is interesting

It is. So what the influxdb datasource is doing on the backend (in the case of InfluxQL) is taking the raw queries when there are multiple (e.g. "A", "B"), concatenating them with ;, and sending them all in one request. It then iterates over the responses and assigns the data or query error back to the RefID.
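As a rough sketch of that batching idea (in Python, not the actual Go datasource code): join the raw queries with ;, then map each entry in the response back to its RefID. The function names are hypothetical. Terminating every query with ;, and pairing results by the statement_id field of the InfluxDB 1.x response, are assumptions; the former is at least consistent with the trailing ; in the alerting log line above.

```python
def combine_queries(queries):
    """queries: dict of RefID -> raw InfluxQL, in request order.

    Returns the RefID order and one ';'-terminated multi-statement string.
    """
    ref_ids = list(queries)
    combined = "".join(queries[r].rstrip(";") + ";" for r in ref_ids)
    return ref_ids, combined


def assign_results(ref_ids, response):
    """Map each result (data or error) back to the RefID that produced it."""
    by_ref = {}
    for result in response.get("results", []):
        ref_id = ref_ids[result["statement_id"]]
        if "error" in result:
            by_ref[ref_id] = {"error": result["error"]}
        else:
            by_ref[ref_id] = {"series": result.get("series", [])}
    return by_ref


ref_ids, combined = combine_queries({
    "A": "show tag values cardinality from cpu with key = cpu",
    "B": "show measurements",
})

# Assumption: made-up response with one success and one error.
example_response = {
    "results": [
        {"statement_id": 0,
         "series": [{"name": "cpu", "columns": ["count"], "values": [[3]]}]},
        {"statement_id": 1, "error": "example error"},
    ]
}
per_ref = assign_results(ref_ids, example_response)
```

With a single query "A", this produces exactly the one-statement string with a trailing ; that shows up in the InfluxDB log.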

Contributor

This issue has been automatically marked as stale because it has not had activity in the last year. It will be closed in 30 days if no further activity occurs. Please feel free to leave a comment if you believe the issue is still relevant. Thank you for your contributions!

@github-actions github-actions bot added the stale Issue with no recent activity label Jan 12, 2024
@armandgrillet
Contributor

Should be fixed by #68619.

4 participants