Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested queries have limits which can defeat their purpose #41051

Merged
merged 5 commits into from
Apr 10, 2024

Conversation

dpsutton
Copy link
Contributor

@dpsutton dpsutton commented Apr 4, 2024

Fixes #24969

Consider a query

select count(*) from {{#199}}

This query should return the number of distinct rows in the query defined by 199. But it's actually limited by the excel limit of 1048575 (meta note, ignore that this is a link. Github is being weird). And that's because when the value of {{#199}} is expanded it has that limit applied as normal.

qp=> (let [card-id 199] ;; use a valid card id for you
       (-> {:database 1,
            :type :native,
            :native {:query "select count(*) from {{ref}}"
                     :template-tags {:ref {:card-id card-id
                                           :type :card
                                           :name "ref"
                                           :display-name "ref"}}}
            :middleware {:disable-max-results? true}} ;; note this middleware is not propagated to the subquery
           qp.compile/compile
           :query
           (metabase.db.query/format-sql )
           println))
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
    LIMIT
      1048575         -- this is improper here
  )

I thought about trying to propagate that limit middleware, but i think it should not be the solution.

  • that limit is actually not honored for the containing sql query
  • we don't want to indiscriminately start adding that
  • we might still want a limit on the total query, but the inner query should be unbounded

When the parameter substitution code gets to the {{#199}} parameter tag, it compiles the query for card 199, which naturally includes the default excel limit. But we can suppress this limit specifically as a card parameter value when substituting a query inside yielding

select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
  )

Consider a query

```sql
select count(*) from {{#199}}
```

This query should return the number of distinct rows in the query
defined by 199. But it's actually limited by the excel limit of
1048575. And that's because when the value of `{{#199}}` is expanded it
has that limit applied as normal.

```clojure
qp=> (let [card-id 199] ;; use a valid card id for you
       (-> {:database 1,
            :type :native,
            :native {:query "select count(*) from {{ref}}"
                     :template-tags {:ref {:card-id card-id
                                           :type :card
                                           :name "ref"
                                           :display-name "ref"}}}
            :middleware {:disable-max-results? true}}
           qp.compile/compile
           :query
           (metabase.db.query/format-sql )
           println))
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
    LIMIT
      1048575
  )
```

But we can suppress this limit when substituting a query inside yielding

```sql
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
  )
```

And this is proper because we want to limit the _outer_ query, not
internal queries.
Copy link

replay-io bot commented Apr 4, 2024

Status Complete ↗︎
Commit 0338f12
Results
⚠️ 10 Flaky
2396 Passed

@dpsutton dpsutton added the backport Automatically create PR on current release branch on merge label Apr 4, 2024
@dpsutton dpsutton requested a review from a team April 4, 2024 22:00
Copy link
Contributor

@metamben metamben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/metabase/driver/common/parameters/values.clj Outdated Show resolved Hide resolved
@dpsutton dpsutton merged commit 2a8e19b into master Apr 10, 2024
106 checks passed
@dpsutton dpsutton deleted the remove-limits-on-nested-queries branch April 10, 2024 15:17
Copy link

@dpsutton Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?

dpsutton added a commit that referenced this pull request Apr 10, 2024
* Nested queries have limits which can defeat their purpose

Consider a query

```sql
select count(*) from {{#199}}
```

This query should return the number of distinct rows in the query
defined by 199. But it's actually limited by the excel limit of
1048575. And that's because when the value of `{{#199}}` is expanded it
has that limit applied as normal.

```clojure
qp=> (let [card-id 199] ;; use a valid card id for you
       (-> {:database 1,
            :type :native,
            :native {:query "select count(*) from {{ref}}"
                     :template-tags {:ref {:card-id card-id
                                           :type :card
                                           :name "ref"
                                           :display-name "ref"}}}
            :middleware {:disable-max-results? true}}
           qp.compile/compile
           :query
           (metabase.db.query/format-sql )
           println))
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
    LIMIT
      1048575
  )
```

But we can suppress this limit when substituting a query inside yielding

```sql
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
  )
```

And this is proper because we want to limit the _outer_ query, not
internal queries.

* Remove limit from test expectation

* stupid trailing space

* another subquery test

* Use helper function to disable limit middleware
dpsutton added a commit that referenced this pull request Apr 10, 2024
…41262)

* Nested queries have limits which can defeat their purpose

Consider a query

```sql
select count(*) from {{#199}}
```

This query should return the number of distinct rows in the query
defined by 199. But it's actually limited by the excel limit of
1048575. And that's because when the value of `{{#199}}` is expanded it
has that limit applied as normal.

```clojure
qp=> (let [card-id 199] ;; use a valid card id for you
       (-> {:database 1,
            :type :native,
            :native {:query "select count(*) from {{ref}}"
                     :template-tags {:ref {:card-id card-id
                                           :type :card
                                           :name "ref"
                                           :display-name "ref"}}}
            :middleware {:disable-max-results? true}}
           qp.compile/compile
           :query
           (metabase.db.query/format-sql )
           println))
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
    LIMIT
      1048575
  )
```

But we can suppress this limit when substituting a query inside yielding

```sql
select
  count(*)
from
  (
    SELECT
      "PUBLIC"."ORDERS"."ID" AS "ID",
      "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL"
    FROM
      "PUBLIC"."ORDERS"
  )
```

And this is proper because we want to limit the _outer_ query, not
internal queries.

* Remove limit from test expectation

* stupid trailing space

* another subquery test

* Use helper function to disable limit middleware

Co-authored-by: dan sutton <dan@dpsutton.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Automatically create PR on current release branch on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect results from SQL query referencing saved question with more than 1,048,575 rows
2 participants