
some LogQL v2 queries from dashboard panel never make it past query-frontend #2799

Closed
zswanson opened this issue Oct 24, 2020 · 2 comments

@zswanson

Describe the bug
While experimenting with the new LogQL v2 features, I found that some queries, mostly those that return many label series, fail in a Grafana dashboard panel even though they work fine in Explore.

My particular query (not useful in itself; I was specifically testing how it performs with many returned series) was: quantile_over_time(0.95,{unit="nginx"} | logfmt | unwrap request_time[5m]) by (app)
In Explore, this query took about 7 seconds in my dev environment and returned 20 label streams for app in the aggregation. The query stats reported 36,039 rows returned. Chrome's network inspector showed these two requests being executed:
https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query?query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&time=1603456153000000000&limit=1000
and
https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&start=1603452552000000000&end=1603456153000000000&step=2

I then pasted the same query into the query field of a new dashboard panel. After a few seconds, Grafana reported the error: "merging responses requires at least one response"
This time, Chrome's network inspector showed only a single request sent to Loki:
https://my-grafana-server.com/api/datasources/proxy/1/loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&start=1603434277000000000&end=1603455878000000000&step=20

The query-frontend and querier logs suggest that the queriers never received the work: their logs showed no activity when I ran the query from the Grafana dashboard panel. In contrast, when I ran the query in Explore, the queriers logged that they were talking to ingesters and downloading files from the S3 store. In the failure case, the query-frontend logged this entry:

level=warn ts=2020-10-23T12:38:10.701832654Z caller=logging.go:71 traceID=64cf10d8c0e9f025 msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=quantile_over_time(0.95%2C%7Bunit%3D%22nginx%22%7D%20%7C%20logfmt%20%7C%20unwrap%20request_time%5B5m%5D)%20by%20(app)&start=1603435083000000000&end=1603456684000000000&step=20 (500) 7.382479682s Response: "merging responses requires at least one response\n" ws: false; Accept: application/json, text/plain, /; Accept-Encoding: gzip, deflate, br; Accept-Language: en-US,en;q=0.9; Cache-Control: no-cache; Dnt: 1; Pragma: no-cache; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/7.2.1; X-Amzn-Trace-Id: Self=1-5f92ceab-54f9ba8426cceb5173544322;Root=1-5f92ceab-163cf54f3b04c0770775e3c9; X-Forwarded-For: XXXXXXXXXX; X-Forwarded-Port: 443; X-Forwarded-Proto: https; X-Grafana-Org-Id: 1; "

This is puzzling because smaller queries seem to work fine in the dashboard panel (although this one doesn't seem particularly large, and it works in Explore regardless). If I change the query to aggregate by request_method (get, put, post, etc.), it works as expected in the panel.

Loki version: edc6215

Expected behavior
I expected the query to behave the same in Explore and in Grafana dashboard panels.

Environment:

  • Infrastructure: ECS Fargate (AWS). Boltdb-shipper to S3. Running 2 ingesters/distributors, 2 queriers, 1 query-frontend, 1 compactor
  • Deployment tool: AWS ECS Fargate Task Definitions


@zswanson (author)

I re-checked this using master build 3f93b5b, and #2796 seems to fix this issue.


emilmark-wowgroup commented Feb 12, 2021

@zswanson, could you please share your Loki configuration? We are trying to run more or less the same infrastructure setup as you, but we can't get S3 + boltdb-shipper to work. We have seen many examples, but they contradict each other. Grateful for any help you can provide!
