
Adjust the error message of "Your question took too long" #12423

Open
flamber opened this issue Apr 29, 2020 · 18 comments
Labels: Difficulty:Hard, Priority:P1 (Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness), Querying/, .Team/QueryProcessor :hammer_and_wrench:, Type:Bug (Product defects)

Comments

flamber commented Apr 29, 2020

Describe the bug
The new connection handling in 0.35 brings many benefits, like better pool handling, faster responses, and the ability to handle much higher load, but it also means that Metabase no longer sends a newline every second, which before 0.35 kept the connection alive.

This means the timeouts of the reverse-proxy/load-balancer now need to be adjusted whenever a question's query time exceeds the proxy's timeout.

Otherwise the proxy will close the connection, and this error will be shown in the dashboard/question:

Your question took too long
We didn't get an answer back from your database in time, so we had to stop. You can try again in a minute, or if the problem persists, you can email an admin to let them know.

Workaround
Change the timeout of the reverse-proxy/load-balancer, so the connection between the user/browser and Metabase isn't closed before results are returned.
If you're unsure which proxy is closing the connection, use the browser developer tools' Network tab to see the response headers of the failing request.
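
If Nginx is the proxy in front of Metabase, a minimal sketch of raising the timeouts could look like the following (the server_name and the upstream address/port are assumptions, and 600 seconds is only an example value - pick something above your longest query):

server {
   listen 80;
   server_name metabase.example.com;       # assumed hostname

   location / {
      # Raise the proxy timeouts above the longest expected query time
      proxy_read_timeout 600;
      proxy_send_timeout 600;

      proxy_set_header Host $host;
      proxy_pass http://127.0.0.1:3000;    # assumed Metabase address/port
   }
}

Reload Nginx afterwards (nginx -s reload) for the change to take effect.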

To Reproduce

  1. Set up Metabase behind a proxy with a timeout of 60 seconds (the default of Nginx and many other proxies) - see the sketch after this list
  2. Create a question where the query will exceed 60 seconds (for example Postgres select pg_sleep(65); or MySQL select sleep(65);)
  3. Run the question (either directly or via a dashboard)
  4. After 60 seconds, the proxy closes the connection and the error is shown in the interface.
    There will also be an error in the log, saying that Metabase has lost the connection to the user/browser, since the proxy in between has closed the connection:
    ERROR async.streaming-response :: Error determining whether HTTP request was canceled
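
As a rough sketch of step 1, an Nginx reverse-proxy with default timeouts is enough to reproduce this, since proxy_read_timeout defaults to 60 seconds (the upstream address/port is an assumption):

server {
   listen 80;

   location / {
      # No proxy_read_timeout is set, so the default of 60s applies and
      # Nginx closes the connection if Metabase takes longer to answer
      proxy_set_header Host $host;
      proxy_pass http://127.0.0.1:3000;    # assumed Metabase address/port
   }
}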

Expected behavior
Either of these would probably help:

  1. The error message should probably be rephrased to reflect the behavior of 0.35
  2. It would be great if headers could be validated, so that if the answer doesn't come from Metabase, a different error message is shown
  3. If Metabase could send a keepalive signal to the browser every X seconds (lower than 60), that would be magic and probably avoid the need for proxy/LB timeout adjustments. But given all the issues with the old newline keepalive method, it might not be worth it.

Information about your Metabase Installation:
Metabase 0.35.3 behind a reverse-proxy with a timeout of 60 seconds

Additional context
Based on #12335 and https://discourse.metabase.com/t/your-question-took-too-long-0-35-1/9621
Giving it P2, since this seems like it might be a common problem
Related #11463

The Elastic Beanstalk image has a hardcoded timeout of 600 seconds (10 minutes), which should probably be raised, since it means it's not possible to run queries longer than 10 minutes on EBS even if the load balancer has a higher timeout.
It is possible to change that manually, but every time the instance is upgraded, those changes need to be applied again.

⬇️ Please click the 👍 reaction instead of leaving a "+1" or "update?" comment

viblo commented Jun 25, 2020

Just to add another case: we have deployed Metabase on an Azure App Service, and App Services have a hard timeout of 230 seconds which is not possible to increase.

I wonder if an alternative solution would be to allow running queries in the background somehow? I have some queries that can take 10-20 minutes to run, and for queries that long I don't want to wait in the UI anyway. Instead I could start the query in some background/task list, then come back later and list the queries, their status and, if available, their results. This would extend the types of queries possible to run through Metabase, especially for analytics databases such as Snowflake or BigQuery.

flamber commented Jun 25, 2020

@viblo Queuing is difficult. The old method had severe problems that could overload or crash Metabase. We're still investigating what the best approach would be. You might also be interested in these issues: #10690 and #11328

mfpinhal commented Jul 2, 2020

We are experiencing the same issue, due to Cloudflare's 100s limit (reference here).

@dariusdev

It is not possible to change Cloudflare's 100s limit without an 'enterprise' plan.
That makes it impossible to run longer queries. It would be nice to have an option to keep sending a newline to keep the connection active.

EnilPajic commented May 27, 2021

Hello. Is there any progress on this?
The mentioned "workarounds" of increasing the LB/proxy timeout do not work if we do not control the proxy, as is the case for us - we use Cloudflare, and there is a limit of 100s (mentioned by two previous comments too).

A single option "keep connection alive on long-running queries (send a newline every 1s)" in the Metabase admin would be nice (also mentioned in a previous comment almost 11 months ago).

Limess commented Jun 21, 2021

This is also causing difficulties for us:
We have:
Cloudflare -> ALB -> Nginx -> ALB -> Metabase

with the ALBs being shared between several services.

Increasing idle timeouts/read timeouts across the board doesn't really work for us here and makes Metabase a special case - instead we're just dealing with a hard limit.

If there were still an option to send the keep-alives, we'd definitely enable it.

@hopeswiller

I have come across this issue and followed the process.
I have set this in my Nginx config, but I still get the "Question took too long" message and the request times out:

http{
   ...
   proxy_read_timeout 3600;
   proxy_connect_timeout 3600;
   proxy_send_timeout 3600;
   ...
}

Unsure which proxy might be closing the connection, then use the browser developer Network-tab to see the response headers of the failing request.

I'm not sure what I should be checking for in the response headers

Below is my diagnostic info

{
  "browser-info": {
    "language": "en-GB",
    "platform": "Win32",
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.11+9",
    "java.vendor": "AdoptOpenJDK",
    "java.vendor.url": "https://adoptopenjdk.net/",
    "java.version": "11.0.11",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.11+9",
    "os.name": "Linux",
    "os.version": "4.4.0-87-generic",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "mongo",
      "googleanalytics"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "11.8 (Ubuntu 11.8-1.pgdg16.04+1)"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.2.18"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2021-07-14",
      "tag": "v0.40.1",
      "branch": "release-x.40.x",
      "hash": "ed8f9c8"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

flamber commented Jul 30, 2021

@hopeswiller

Please use the forum for questions and troubleshooting: https://discourse.metabase.com/

You haven't said how long it takes before you get the timeout.
I cannot tell you which response header to look at, since not all proxies replace the Server header. Post all response headers.

Remember to restart Nginx after making the change. 3600 seconds seems excessive; 600 or 1200 should be plenty (for most).

You are setting the timeouts in the http context (at a higher level). It does not show whether you have other configurations in lower contexts like server or location, which might override the higher level.
See all your configuration with nginx -T
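
As an illustration (the values and address are made up), a directive in a more specific context overrides the one inherited from the http level:

http {
   proxy_read_timeout 3600;                # set at the http level

   server {
      location / {
         proxy_read_timeout 60;            # this lower-level value wins for this location
         proxy_pass http://127.0.0.1:3000; # assumed Metabase address/port
      }
   }
}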

flamber added the Difficulty:Hard and Priority:P1 (Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness) labels and removed the Priority:P2 (Average run of the mill bug) and .Frontend labels on Nov 11, 2021
@Czlenson95

Will this problem be addressed in future releases?
The workaround of increasing the timeout on the proxy is not a solution for us.

kszarlej commented Mar 2, 2022

Hello, I also think this should be addressed. For example, the AWS ALB maximum idle timeout is 4000 seconds. This essentially means that if Metabase runs on AWS behind an ALB (which is a pretty standard setup), you cannot use Metabase with queries whose runtime exceeds 4000 seconds (~1 hour and 6 minutes), and we need to query Redshift directly.

Also, in AWS the timeout can only be configured per load balancer. Typically you run multiple applications on the same load balancer, and each gets a unique target group. Since you cannot configure idle_timeout per target group, you end up treating Metabase specially and have to create another, dedicated ALB just for it.

It would be good if we could optionally switch the Metabase frontend to poll the backend for cached results over short-lived connections, instead of waiting for those results on a long-running open connection.

ranquild added the .Team/QueryProcessor :hammer_and_wrench: label on Jun 2, 2023
@seangibeault

Any chance this is on the roadmap?

j-ro commented Dec 14, 2023

I'll add my voice in saying that the existing workarounds here are not sufficient, as sometimes you do not control the proxy (a la Cloudflare).

I'd typically try to break up concerns here, with the frontend kicking off long-running background jobs like fetching a query and then polling regularly for the result. The long-running background job in turn could listen for keepalives from the frontend and cancel if the heartbeat stops. But relying on very long-running processes from the frontend all the way to the database and back seems like perhaps a mistake, and it is certainly causing some pain for us because we cannot get around a hard 100-second timeout via Cloudflare.

@davyzhang

I am experiencing the same problem, and changing Cloudflare is not possible for me. Using WebSockets to send these long-running queries might be a solution.

ketandoshi commented Feb 23, 2024

We are experiencing the same issue, a 524 timeout error after 100s, with Metabase 0.46.8 & Cloudflare. Has anyone found a solution yet?

@alice-telescoop

Same here; we use Metabase on another hosting provider that does not allow controlling the proxy. Is there any work in progress on this?

@paoliniluis
Contributor

@alice-telescoop, even if we adjust the message, the hosting provider will still cut the connection and leave the user without an answer. Why can't you change the hosting provider if the queries are slow?

@alice-telescoop

Our Metabase is a tool for the few business developers on our small team. Changing the hosting provider would take quite a lot of time, which we don't necessarily have. I was actually directed to this issue by the hosting provider itself. Reading your answer, I realize this issue only aims at changing the message. Is there any issue or planned work on some sort of keep-alive system?
