some LogQL v2 queries from dashboard panel never make it past query-frontend #2799

zswanson · 2020-10-24T01:21:42Z

Describe the bug
While playing with the new LogQL v2 features I found that some queries, mostly those that return a lot of label series, fail in a Grafana dashboard panel even though they work fine in an Explore query.

My particular query (not useful in itself; I was specifically trying to see how it performed with lots of returned series) was: quantile_over_time(0.95,{unit="nginx"} | logfmt | unwrap request_time[5m]) by (app)
In Explore this query took around 7 seconds in my dev environment and returned 20 label streams for app in the aggregation. The query stats reported 36,039 rows returned. Chrome's network inspector showed these 2 requests being executed:
https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query?query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&time=1603456153000000000&limit=1000
and
https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&start=1603452552000000000&end=1603456153000000000&step=2

I took the same query to a new dashboard panel and pasted it into the query field. After a few seconds Grafana reported the error: "merging responses requires at least one response"
Chrome's network inspector showed only a single request sent to Loki:
https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&start=1603434277000000000&end=1603455878000000000&step=20
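The two query_range requests above carry the same LogQL expression but different time parameters, which is easy to miss in the encoded URLs. The following is a minimal sketch for decoding and comparing them; the helper name `summarize_query_range` and the "approximate points" estimate are illustrative assumptions, not part of Loki or Grafana.

```python
from urllib.parse import urlsplit, parse_qs

# URLs copied from the network inspection above; the shared pieces are
# factored out only to keep the lines readable.
BASE = "https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query_range"
ENCODED_QUERY = (
    "quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt"
    "%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)"
)
EXPLORE_URL = (
    f"{BASE}?direction=BACKWARD&limit=1000&query={ENCODED_QUERY}"
    "&start=1603452552000000000&end=1603456153000000000&step=2"
)
DASHBOARD_URL = (
    f"{BASE}?direction=BACKWARD&limit=1000&query={ENCODED_QUERY}"
    "&start=1603434277000000000&end=1603455878000000000&step=20"
)


def summarize_query_range(url: str) -> dict:
    """Decode a query_range URL into its query, time range, and step.

    Hypothetical helper: start/end are read as nanosecond Unix timestamps
    and step as seconds, matching the URLs shown in this issue.
    """
    params = parse_qs(urlsplit(url).query)
    start_ns = int(params["start"][0])
    end_ns = int(params["end"][0])
    step_s = float(params["step"][0])
    range_s = (end_ns - start_ns) // 10**9
    return {
        "query": params["query"][0],
        "range_seconds": range_s,
        "step_seconds": step_s,
        # Rough estimate of evaluation points per returned series.
        "approx_points_per_series": int(range_s // step_s) + 1,
    }


for name, url in (("explore", EXPLORE_URL), ("dashboard", DASHBOARD_URL)):
    print(name, summarize_query_range(url))
```

Decoded this way, the Explore request covers 3601 seconds at step 2, while the dashboard request covers 21601 seconds (about six hours) at step 20. So the two requests differ in time range as well as in where they were issued, which may be relevant to the failure; this is an observation from the URLs, not a confirmed cause.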

My query-frontend and querier logs show that the queriers never seem to have received the work: their logs showed no activity when I ran the query from the Grafana dashboard panel. In contrast, when I ran the query in Explore, the queriers logged that they were talking to ingesters and downloading files from the S3 store. In the failure case I see this log entry from the query-frontend:

level=warn ts=2020-10-23T12:38:10.701832654Z caller=logging.go:71 traceID=64cf10d8c0e9f025 msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&start=1603435083000000000&end=1603456684000000000&step=20 (500) 7.382479682s Response: "merging responses requires at least one response\n" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-US,en;q=0.9; Cache-Control: no-cache; Dnt: 1; Pragma: no-cache; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/7.2.1; X-Amzn-Trace-Id: Self=1-5f92ceab-54f9ba8426cceb5173544322;Root=1-5f92ceab-163cf54f3b04c0770775e3c9; X-Forwarded-For: XXXXXXXXXX; X-Forwarded-Port: 443; X-Forwarded-Proto: https; X-Grafana-Org-Id: 1; "

This is puzzling because smaller queries seem to work fine in the dashboard panel (though this doesn't seem like a particularly large query, and it works in Explore regardless). If I change the query to aggregate by request_method (get, put, post, etc.), the query works as expected in the panel.
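One way to narrow this down is to replay the dashboard's request against the query-frontend directly, taking Grafana out of the path. Below is a minimal sketch that rebuilds the query_range URL from its parts. The function names and the base URL `http://query-frontend:3100` are placeholders for illustration (3100 is Loki's default HTTP port; adjust for your deployment), and the parameters mirror the ones seen in the requests above.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode, quote
from urllib.request import urlopen


def build_query_range_url(base_url, query, start_ns, end_ns, step,
                          limit=1000, direction="BACKWARD"):
    """Build a /loki/api/v1/query_range URL (hypothetical helper).

    start_ns and end_ns are nanosecond Unix timestamps and step is in
    seconds, as in the requests captured from Grafana.
    """
    params = {
        "direction": direction,
        "limit": limit,
        "query": query,
        "start": start_ns,
        "end": end_ns,
        "step": step,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    encoded = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{encoded}"


def fetch_status(url, timeout=30):
    """Return (HTTP status, first 200 characters of the body) for a GET."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read(200).decode("utf-8", "replace")
    except HTTPError as err:
        # A 500 from the query-frontend lands here; keep its message.
        return err.code, err.read(200).decode("utf-8", "replace")


# Same query and time range as the failing dashboard request.
url = build_query_range_url(
    "http://query-frontend:3100",  # placeholder address
    'quantile_over_time(0.95,{unit="nginx"} | logfmt | unwrap request_time[5m]) by (app)',
    start_ns=1603434277000000000,
    end_ns=1603455878000000000,
    step=20,
)
print(url)
# To actually send it: status, body = fetch_status(url)
```

If the direct request fails with the same 500 and "merging responses requires at least one response" message, the problem is on the Loki side; if it succeeds, the difference lies in what Grafana sends.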

Loki edc6215

Expected behavior
Expected the query to behave the same in both Explore and Grafana Dashboard Panels.

Environment:

Infrastructure: ECS Fargate (AWS). Boltdb-shipper to S3. Running 2 ingester/distributors, 2 queriers, 1 query-frontend, 1 compactor
Deployment tool: AWS ECS Fargate Task Definitions


zswanson · 2020-10-24T16:09:39Z

Re-checked this using master build 3f93b5b, and #2796 seems to fix the issue.

emilmark-wowgroup · 2021-02-12T10:09:24Z

@zswanson could you please share your Loki configuration? We are trying to run more or less the same infrastructure setup as you, but we can't get S3 + boltdb-shipper to work. We have seen many examples, but they contradict each other. Grateful for any help you can provide!

zswanson closed this as completed Oct 24, 2020