Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metabase renders the result but the query does not finish in the DB #39018

Open
paoliniluis opened this issue Feb 21, 2024 · 2 comments · Fixed by #41470
Open

Metabase renders the result but the query does not finish in the DB #39018

paoliniluis opened this issue Feb 21, 2024 · 2 comments · Fixed by #41470
Assignees
Labels
Database/Redshift Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness .Team/QueryProcessor :hammer_and_wrench: Type:Bug Product defects
Milestone

Comments

@paoliniluis
Copy link
Contributor

Describe the bug

Queries in the DB seem to continue in the background, while the query has already been finished for Metabase

To Reproduce

  1. add the Redshift dev DW to a Metabase (sample DB)
  2. do the following query:
SELECT
  "public"."random_data"."text_column1" AS "text_column1",
  FLOOR(("public"."random_data"."numeric_column1" / 125.0)) * 125.0 AS "numeric_column1",
  FLOOR(("public"."random_data"."numeric_column4" / 125.0)) * 125.0 AS "numeric_column4",
  FLOOR(("public"."random_data"."numeric_column5" / 125.0)) * 125.0 AS "numeric_column5",
  FLOOR(("public"."random_data"."numeric_column8" / 125.0)) * 125.0 AS "numeric_column8",
  FLOOR(("public"."random_data"."numeric_column9" / 125.0)) * 125.0 AS "numeric_column9",
  FLOOR(
    ("public"."random_data"."numeric_column10" / 125.0)
  ) * 125.0 AS "numeric_column10",
  FLOOR(
    ("public"."random_data"."numeric_column13" / 125.0)
  ) * 125.0 AS "numeric_column13",
  COUNT(*) AS "count"
FROM
  "public"."random_data"
GROUP BY
  "public"."random_data"."text_column1",
  FLOOR(("public"."random_data"."numeric_column1" / 125.0)) * 125.0,
  FLOOR(("public"."random_data"."numeric_column4" / 125.0)) * 125.0,
  FLOOR(("public"."random_data"."numeric_column5" / 125.0)) * 125.0,
  FLOOR(("public"."random_data"."numeric_column8" / 125.0)) * 125.0,
  FLOOR(("public"."random_data"."numeric_column9" / 125.0)) * 125.0,
  FLOOR(
    ("public"."random_data"."numeric_column10" / 125.0)
  ) * 125.0,
  FLOOR(
    ("public"."random_data"."numeric_column13" / 125.0)
  ) * 125.0
ORDER BY
  "public"."random_data"."text_column1" ASC,
  FLOOR(("public"."random_data"."numeric_column1" / 125.0)) * 125.0 ASC,
  FLOOR(("public"."random_data"."numeric_column4" / 125.0)) * 125.0 ASC,
  FLOOR(("public"."random_data"."numeric_column5" / 125.0)) * 125.0 ASC,
  FLOOR(("public"."random_data"."numeric_column8" / 125.0)) * 125.0 ASC,
  FLOOR(("public"."random_data"."numeric_column9" / 125.0)) * 125.0 ASC,
  FLOOR(
    ("public"."random_data"."numeric_column10" / 125.0)
  ) * 125.0 ASC,
  FLOOR(
    ("public"."random_data"."numeric_column13" / 125.0)
  ) * 125.0 ASC
  1. wait till it finishes and then...
  2. go to Redshift and check the status of select * from STV_INFLIGHT, you'll see that the query is still running while in Metabase it has finished

Expected behavior

The connection and query should finish if Metabase finishes

Logs

NA

Information about your Metabase installation

v48.6

Severity

P1

Additional context

No response

@paoliniluis paoliniluis added Type:Bug Product defects Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness Database/Redshift labels Feb 21, 2024
@perivamsi perivamsi added the .Team/QueryProcessor :hammer_and_wrench: label Feb 22, 2024
@lbrdnk lbrdnk self-assigned this Apr 15, 2024
@lbrdnk
Copy link
Contributor

lbrdnk commented Apr 15, 2024

It seems that we actually close the connection, but it takes much longer than returning the results back to the browser. I believe its a result of limiting rows to 2k.

After starting the query execution from GUI, we are able to check its status eg. using select * from sys_query_history where end_time is null;

That way we could acquire its id. We can further check its status using select * from sys_query_history where query_id = <that id>;

Query completes in the GUI in less than 30 seconds. Then there is a 30+ seconds window until the query's end_time column gets its value -- query completes on the Redshift's side.

Also executing this query through psql takes those 60+ seconds.


Now why is that? My hyphotesis: In limit postprocessing middleware we reduce only the limit number of rows (limit.clj) -- in this case 2k. I believe that result set is batched, thus we are able to take those 2k rows and return those to the FE, while query is still running on the Redshift side.

Afterwards, during the qp machinery's return phase, result set is closed indeed, but the closing takes the delta between the time of seemingly completion in GUI and the actual completion on the Redshift side.

If I'm correct here, closing the result set (with-open's finally) in metabase.driver.sql-jdbc.execute/execute-reducible-query (execute.clj) blocks until the remaining results are computed, even though we reduce only first 2k.

To back that, measurements of metabase.driver.sql-jdbc.execute/execute-reducible-query follow:

"Elapsed time: 10037.283041 msecs"   -- timing of the statement execution
"Elapsed time: 513.398333 msecs"     -- timing of the respond call
"Elapsed time: 62449.817709 msecs"   -- timing of the call to with-open

Then, following is the log snippet, that's rendered after the with-open returns -- long after FE has the results and considers the query completed.

2024-04-15 15:54:17,190 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: canceled] 1.0 mins (2 DB calls) App DB connections: 2/10 Jetty threads: 4/50 (2 idle, 0 queued) (101 total active threads) Queries in flight: 0 (0 queued); redshift DB 2 connections: 0/2 (0 threads blocked) {:metabase-user-id 1}

What can we do about that? We could (1) either kill the query instead of an attempt to close the result set or (2) we could modify the limit middleware to be applied also to native queries (by eg. wrapping those in sub select) or (3) find a way to notify the user about still running query.

I'm leaning towards solution 2. If there's some blocker along the way, 1 could be also feasible.

@paoliniluis
Copy link
Contributor Author

according to the person who reported this, it keeps happening. I need time to reproduce this

@paoliniluis paoliniluis reopened this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Database/Redshift Priority:P1 Security holes w/o exploit, crashing, setup/upgrade, login, broken common features, correctness .Team/QueryProcessor :hammer_and_wrench: Type:Bug Product defects
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants