Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FE client incorrectly using [:field <string>] clauses to refer to fields inside MBQL source queries #19757

Closed
camsaul opened this issue Jan 18, 2022 · 1 comment
Labels
Difficulty:Medium .Frontend Querying/Nested Queries Questions based on other saved questions Querying/Parameters & Variables Filter widgets, field filters, variables etc. .Team/QueryProcessor :hammer_and_wrench: Type:Bug Product defects .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2
Milestone

Comments

@camsaul
Copy link
Member

camsaul commented Jan 18, 2022

This is incorrect and has historically caused all sorts of bugs (#14809, #14810, #14811, #16389, and more) From my comments here #16389 (comment):

This MBQL Query sent by the frontend looks something like this:

{:database 1
 :type     :query
 :query    {:source-query {:source-table 2
                           :aggregation  [[:count]],
                           :breakout     [[:field 3 {:source-field 4, :temporal-unit :month}]]}
            :filter       [:time-interval [:field "created_at" {:base-type :type/DateTimeWithLocalTZ}] -30 :year]
            :aggregation  [[:sum [:field "count" {:base-type :type/Integer}]]],
            :breakout     [[:field "created_at" {:base-type :type/DateTimeWithLocalTZ}]],
            :limit        1}}

This query is incorrect.

  1. :field clauses with a String name tell the Query Processor to refer to a column in the source query that literally has that exact name. You can only know this exact name if you have query metadata, either from running a query or using it as a saved question. We haven't ran the query yet in this situation.

  2. The generated query is asking telling the Query Processor to literally breakout on the created_at column in the source query. Unfortunately, that is not necessarily alias actually be using for that column in the query we generate. You cannot assume the alias for a column will be the same as the Field :name. The generated alias depends on a variety of factors, including:

    The only way to know the appropriate string alias to use is to look at source query metadata.

  3. :field clauses with a String name should only be used to refer to columns in a native source query, or generated columns (i.e., the result of an aggregation or expression). There's no associated Field for native source queries or generated columns, so you have to refer to fields by the literal alias used in the source query. That's the only time you should use it. Use integer Field IDs otherwise! It's much easier for use to generate correct queries when we know as much information as possible about a Field, and integer Field IDs let us know everything we've synced from the application database.

  4. created_at isn't even the name of the original Field in this case. The Field is CREATED_AT (as you can see in the SQL below). Assuming Field names is one thing, but assuming that they will be lower-cased or something like that is even worse.

Here's the SQL we're generating:

SELECT 
  CAST("source"."created_at" AS date) AS "created_at",
  sum("source"."count") AS "sum" 
FROM 
  (
  SELECT 
    parsedatetime(formatdatetime("PRODUCTS__via__PRODUCT_ID"."CREATED_AT", 'yyyyMM'), 'yyyyMM') AS "PRODUCTS__via__PRODUCT_ID__CREATED_AT",
    count(*) AS "count" 
  FROM 
    "PUBLIC"."ORDERS" 
  LEFT JOIN "PUBLIC"."PRODUCTS" "PRODUCTS__via__PRODUCT_ID" ON "PUBLIC"."ORDERS"."PRODUCT_ID" = "PRODUCTS__via__PRODUCT_ID"."ID" 
  GROUP BY 
    parsedatetime(formatdatetime("PRODUCTS__via__PRODUCT_ID"."CREATED_AT", 'yyyyMM'), 'yyyyMM') 
  ORDER BY 
    parsedatetime(formatdatetime("PRODUCTS__via__PRODUCT_ID"."CREATED_AT", 'yyyyMM'), 'yyyyMM') ASC) "source" 
WHERE ("source"."created_at" >= parsedatetime(year(dateadd('year', CAST(-30 AS long), now())), 'yyyy') AND "source"."created_at" < parsedatetime(year(now()), 'yyyy')) 
GROUP BY 
  CAST("source"."created_at" AS date) 
ORDER BY 
  CAST("source"."created_at" AS date) ASC 
LIMIT 1

This query of course does not work, because there is no created_at column returned by source query.

How to fix?

We actually have logic on the backend already that attempts to detect and fix usage of :field clauses using String names where :field clauses with Integer IDs would have been appropriate. This only works if the :field name matches the actual Field name in the DB, however. Since the name here doesn't match, the fix doesn't work.

I am going to begrudgingly extend the auto-fix logic so it fixes this case too, but I wanted to bring these points up since these sorts of best-guess fixes can end up breaking other things. For example, queries that have native source queries whose column names match those of columns from joined tables are liable to break since [:field <name>] might mean source.<name>, or it might mean some_join.<name>.

At some point we'll actually have to fix this on the frontend instead of writing more and more code on the backend to try to work around broken queries. The fix on the frontend is easy: just use [:field <integer> ...] clauses to refer to Fields coming back from MBQL source queries. The backend QP can take it from there and do the right thing.

I've been trying to just figure out what the FE client meant and rewrite the query accordingly, first in #14812 and now in #19713; this is easier said than done because the FE client is choosing names different than either the actual name of the underlying Field or the actual name of the column as returned by the source query.

The frontend client should use the :field_ref that comes back with query results metadata (the Field ref is the canonical way to refer to a Field), or, failing that, just use the Field ID directly -- it's much easier for us to reconcile Fields when we actually know what Field it is we're talking about, and integer IDs are much easier to reconcile than lower-cased versions of Field names (as we saw in #19713).

If that proves too challenging, why are we even lower casing names on the frontend at all? #14812 would have been sufficient to prevent #16389 if the actual original name of the Field was used. 😖

@camsaul camsaul added Type:Bug Product defects .Needs Triage labels Jan 18, 2022
@camsaul camsaul changed the title FE client incorrectly using [:field <string>] clauses to refer to fields inside MBQL source quieres FE client incorrectly using [:field <string>] clauses to refer to fields inside MBQL source queries Jan 18, 2022
camsaul added a commit that referenced this issue Jan 25, 2022
@flamber flamber added .Frontend Querying/Nested Queries Questions based on other saved questions Querying/Parameters & Variables Filter widgets, field filters, variables etc. and removed .Needs Triage labels Jan 26, 2022
@ariya ariya self-assigned this Apr 26, 2022
@darksciencebase darksciencebase added the .Team/QueryProcessor :hammer_and_wrench: label Mar 22, 2023
@camsaul
Copy link
Member Author

camsaul commented May 24, 2023

Closing this out as fixed by the Metabase lib v2 project

Fixed by #28689

@camsaul camsaul closed this as completed May 24, 2023
@camsaul camsaul added this to the 0.47 milestone May 24, 2023
@darksciencebase darksciencebase added the .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2 label May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Difficulty:Medium .Frontend Querying/Nested Queries Questions based on other saved questions Querying/Parameters & Variables Filter widgets, field filters, variables etc. .Team/QueryProcessor :hammer_and_wrench: Type:Bug Product defects .Wanted: MLv2 Issues that will be fixed (or easier to fix, or possible to fix) when we have MLv2
Projects
None yet
Development

No branches or pull requests

4 participants