Queries failing after 30s despite changing dataproxy.timeout to > 30s #27839

Closed
kahinton opened this issue Sep 26, 2020 · 4 comments · Fixed by #27841

Comments

@kahinton
Contributor

What happened:
Long-running queries to Azure Log Analytics fail after 30 seconds with a 504 Gateway Timeout when run behind a network proxy, even though dataproxy.timeout is set above 30s.

What you expected to happen:
Queries should not time out sooner than the value configured in the dataproxy.timeout setting.

How to reproduce it (as minimally and precisely as possible):
Run a computationally heavy query against Azure Log Analytics with dataproxy.timeout set above 30 seconds. This appears to occur only when the query is run through a network proxy. When running locally on a machine that does not use the proxy, the issue does not seem to be present.

Anything else we need to know?:
I'm adding a pull request that makes the dataproxy keepalive and idle connection timeout configurable while keeping the current defaults. The keepalive request appears to always fail when this proxy is in use, and being able to change the keepalive time should help us resolve this. I also added the idle connection timeout setting for anyone who may have a reason to change it. A few previously reported issues may have had a similar cause; in those cases, however, the reporters were able to work around the problem simply by lowering the query time.

Environment:

  • Grafana version: At least since 7.x.x
  • Data source type & version: Azure Monitor 0.3.0
  • OS Grafana is installed on: Running Docker 7.2.0-ubuntu
  • User OS & Browser: Reported across multiple OS and Browser types
  • Grafana plugins:
  • Others:
@marefr
Member

marefr commented Sep 28, 2020

Interesting. What values do you have to set the keepalive and idle connection timeout to in order to resolve your problem?

@kahinton
Contributor Author

In our situation we need to set the keepalive to essentially match the dataproxy timeout. It seems that the network proxy in use doesn't handle the keepalive message well, which leads to the connection being marked as closed. I added the idle connection timeout setting just for anyone who needs tighter control over open connections to a data source.

@anarcher

I have the same issue. The dataproxy timeout is above 30s, but slow data source queries (Loki, >= 1m) always fail with a bad gateway error (http: proxy error: EOF), and there are no error logs from the query-frontend in Loki.

@aocenas aocenas added this to Inbox in Backend Platform Backlog via automation Sep 29, 2020
@marefr marefr removed this from Inbox in Backend Platform Backlog Oct 6, 2020
@marefr marefr added this to the 7.3.0-beta1 milestone Oct 12, 2020
@morvencao

I'm using Grafana v7.4.2 and still have the same issue, even though I have set timeout and keep_alive_seconds to 300 seconds:

[dataproxy]
timeout = 300
keep_alive_seconds = 300
logging = true

The requests still always time out after 30 seconds, see:
[Screenshot: grafana-timeout]
