MBQL Refactor: Combine various Field clauses into one new clause #14897

camsaul · 2021-02-19T18:40:07Z

Part of the #14057 wish list. This change has been on my wish list for years now, but I have an actual reason to do it -- implementing #9934 is super difficult without being able to attach arbitrary info for Field clauses.

Background

The original version of MBQL (known as the "Structured Query" language at the time, which was a little confusing) just referred to all fields by integer ID. e.g.

{:filter ["STARTS_WITH" 1 "abc"]}

That made clauses like this ambiguous:

{:filter ["=" 1 2]} ; is 2 a Field, or the number 2?

To clear up ambiguity and to let you filter Fields against Fields, we added the :field-id clause to make it explicit that you were referring to a Field, not an Integer:

{:filter [:= [:field-id 1] [:field-id 2]]} ; Field 1 == Field 2

We soon added a couple of new types of Field clauses, :fk-> and :datetime-field, both of which wrap the original :field-id:

;; refer to Field 2 from a different Table, use Field 1 from the current Table to perform the join
[:fk-> [:field-id 1] [:field-id 2]]

;; bucket Field 3 by month
[:datetime-field [:field 3] :month]

So far things were still pretty reasonable, the worst you'd have to do is something like

[:datetime-field [:fk-> [:field-id 1] [:field-id 2]] :month]

Things started to get out-of-hand IMO when we continued to add more Field clause types that just wrapped everything else. We added :binning-strategy:

[:binning-strategy <field> :num-buckets 10]

then :field-literal, to refer to a Field from a nested native source query:

[:field-literal "my_field" :type/Text]

then we added :joined-field to specify the Table you're explicitly joining against that is the source of a Field:

[:joined-field "my_join" [:field-id 4]]

This ends up getting really hairy when you combine things. This is an actual real clause you can use right now:

[:binning-strategy
 [:datetime-field
  [:joined-field "my_join"
   [:field-literal "my_field" :type/DateTimeWithLocalTZ]]
  :month]
 :num-bins 10]

And even with all of those clauses, we still can't pass around arbitrary extra information in a way that would make it easy to implement performance optimizations. If we ever try to add another new clause, the whole house of cards is going to come crashing down.

The proposal

Combine all of the Field clauses into a single new :field clauses with the schema

[:field id-or-name options-map]

Here are some before & after examples:

[:field-id 1] 
=> 
[:field 1 nil]

[:field-literal "my_field" :type/Text] 
=> 
[:field "my_field" {:base-type :type/Text}]

[:datetime-field [:field 1] :month] 
=> 
[:field 1 {:temporal-unit :month}]

[:binning-strategy [:field 1] :num-bins 10] 
=>
[:field 1 {:binning {:strategy :num-bins, :num-bins 10}}]

[:joined-field "my_join" [:field 1]] 
=> 
[:field 1 {:join-alias "my_join}]

[:fk-> [:field 1] [:field 2]] 
=> 
[:field 2 {:source-field 1}]

[:binning-strategy
 [:datetime-field
  [:joined-field "my_join"
   [:field-literal "my_field" :type/DateTimeWithLocalTZ]]
  :month]
 :num-bins 10]
=>
[:field "my_field" {:base-type :type/DateTimeWithLocalTZ, :join-alias "my_join", :binning {:strategy :num-bins, :num-bins 10}}]

camsaul · 2021-02-19T18:41:43Z

It's easy enough to "upgrade" the old-style clauses to the new clause on the backend -- I have this all working. I still need to update a lot of QP middleware to handle the new clause and update a lot tests as well -- there are about 400 backend tests left to update (which would take another day or so)

Co-authored-by: Paul Rosenzweig <paulrosenzweig@users.noreply.github.com>

camsaul · 2021-02-25T17:48:37Z

@paulrosenzweig RE sniffing stuff: there's two new methods in the new FieldDimension class, isIntegerFieldId() and isFieldStringName() that handle all the sniffing for everyone else... I'll have to try to make sure everything is using that instead of doing things themselves. I'm working on abstracting out the MBQL representation from all the other code so nobody else has to know about it. e.g. if we wanted to revert all the MBQL changes tomorrow we'd only have to change FieldDimension and none of the other non-test code

camsaul · 2021-02-25T17:57:21Z

@paulrosenzweig I added normalization coming out of the DB for all Card & DashboardCard visualization_settings, and it's covering everything I have locally correctly. It walks the whole map and normalized every old-style clause it finds. :parameters and related columns are already normalized on the way out of the DB. I think everything should be covered, but we can deploy to stats and make sure nothing is broken... part of the reason I wanted to get this refactor in at the beginning of the release cycle

paulrosenzweig · 2021-02-25T21:09:57Z

It walks the whole map and normalized every old-style clause it finds.

The JSON-serialized field refs might trip this up. Column formatting and pivot table partitions are two places where that happens.

It's probably worth having tests around some of those cases specifically. I expect some saved questions out in the wild will never be updated and rely on this migration continuing to work for years.

camsaul · 2021-02-25T21:48:15Z

test/metabase/models/card_test.clj

+;; this is a separate function so we can use the same tests for DashboardCards as well
+(defn test-visualization-settings-normalization [f]
+  (testing "visualization settings should get normalized to use modern MBQL syntax"
+    (testing "Field references in column settings"
+      (doseq [[original expected] {[:ref [:field-literal "foo" :type/Float]]
+                                   [:ref [:field "foo" {:base-type :type/Float}]]
+
+                                   [:ref [:field-id 1]]
+                                   [:ref [:field 1 nil]]
+
+                                   [:ref [:expression "wow"]]
+                                   [:ref [:expression "wow"]]}
+              ;; also check that normalization of already-normalized refs is idempotent
+              original [original expected]
+              ;; frontend uses JSON-serialized versions of the MBQL clauses as keys
+              :let     [original (json/generate-string original)
+                        expected (json/generate-string expected)]]
+        (testing (format "Viz settings field ref key %s should get normalized to %s"
+                         (pr-str original)
+                         (pr-str expected))
+          (f
+           {:column_settings {original {:currency "BTC"}}}
+           {:column_settings {expected {:currency "BTC"}}}))))
+
+    (testing "Other MBQL field clauses"
+      (let [original {:map.type                 "region"
+                      :map.region               "us_states"
+                      :pivot_table.column_split {:rows    [["datetime-field" ["field-id" 807] "year"]],
+                                                 :columns [["fk->" ["field-id" 805] ["field-id" 808]]],
+                                                 :values  [["aggregation" 0]]}}
+            expected {:map.type                 "region"
+                      :map.region               "us_states"
+                      :pivot_table.column_split {:rows    [[:field 807 {:temporal-unit :year}]]
+                                                 :columns [[:field 808 {:source-field 805}]]
+                                                 :values  [[:aggregation 0]]}}]
+        (f original expected)))
+
+    (testing "Don't normalize non-MBQL arrays"
+      (let [original {:graph.show_goal  true
+                      :graph.goal_value 5.9
+                      :graph.dimensions ["the_day"]
+                      :graph.metrics    ["total_per_day"]}]
+        (f original original)))
+
+    (testing "Don't normalize key-value pairs in maps that could be interpreted as MBQL clauses"
+      (let [original {:field-id 1}]
+        (f original original)))))
+
+(deftest normalize-visualization-settings-test
+  (test-visualization-settings-normalization
+   (fn [original expected]
+     (mt/with-temp Card [card {:visualization_settings original}]
+       (is (= expected
+              (db/select-one-field :visualization_settings Card :id (u/the-id card))))))))


@paulrosenzweig yup, I already added tests for normalization of serialized Card/DashboardCard field references as keys in viz settings here (specifically the json/generate-string part at the top). Are the serialized keys used anywhere else besides visualization_settings.column_settings?

Are the serialized keys used anywhere else besides visualization_settings.column_settings?

Sorry slow responding to that. I had thought both conditional formatting and pivot tables did, but I just checked and neither do! So thankfully I guess it's just column settings.

camsaul · 2021-02-25T22:28:12Z

@paulrosenzweig this is green no so I'm going to go ahead and merge it in so we don't end up in merge-conflict hell since the PR is so big. Once it's up on stats I'll try to go test things and make sure everything is working properly and add more normalization test cases/fixes as needed

camsaul added 8 commits February 18, 2021 18:18

Basic queries are working

638259c

Fix auto-bucket-datetimes

afe5655

Mostly fix add-implicit-joins

7e8c4ac

271/340 => 611

cb335dd

236/309 => 545

855f6f6

224/306 => 530

f8d19b1

213/227 => 440

73acd99

209/212 => 421

7fc28e3

camsaul added 21 commits February 19, 2021 12:00

187/198 => 385

d5051a3

177/171 => 348

91d4ca1

173/131 => 304

aaf2acf

159/107 => 266

87980e5

158/98 => 256

f50736d

146/97 => 243

53a50af

145/91 => 236

c34d0f3

131/66 => 197

357db50

[WIP]

3712a08

106/73 => 179

f6fbc38

100/68 => 168

870cbde

94/41 => 135

d56cfe7

51/10 => 61

c9dd945

34/10 => 44

8de90f0

Convert more tests to the new style

7d753bf

14/1 => 15

7f1d946

7/1 => 8

f796626

5/1 => 6

7c6d5ee

2/1 => 3

f6e6c88

2/0 => 2

4a9b13e

ALL OSS TESTS ARE PASSING :partyparrot:

85b556d

Update frontend/src/metabase-lib/lib/Dimension.js

be163a1

Co-authored-by: Paul Rosenzweig <paulrosenzweig@users.noreply.github.com>

Test fix 🔧

4aa6738

camsaul added 4 commits February 25, 2021 12:07

Don't normalize fingerprints

7760f26

More tests

a658711

Test fix 🔧

7af3bc1

Try setting SQL Server memory limit

4a714f3

camsaul removed request for ariya and daltojohnso February 25, 2021 20:22

camsaul added 2 commits February 25, 2021 12:58

Recursive (empty) field options normalization

1ef440a

More PR feedback

440f585

More test fixes 🔧

9d9a838

camsaul commented Feb 25, 2021

View reviewed changes

camsaul added this to the 0.39 milestone Feb 25, 2021

camsaul merged commit 6bdd5e0 into master Feb 25, 2021

camsaul deleted the new-field-clause branch February 25, 2021 22:28

This was referenced Feb 25, 2021

Minor MBQL Improvement Proposals #14057

Closed

MongoDB & temporal filter query optimizations #14829

Closed

jeff303 added a commit that referenced this pull request Feb 26, 2021

Fix test assertions in light of changes from PR #14897

a329893

jeff303 mentioned this pull request Feb 26, 2021

Fix test assertions in light of changes from PR #14897 #14984

Merged

jeff303 added a commit that referenced this pull request Feb 26, 2021

Fix test assertions in light of changes from PR #14897 (#14984)

8d5dd19

camsaul mentioned this pull request Mar 9, 2021

Optimize SQL Server GROUP BY and ORDER BY with :year, :month, or :day #15099

Merged

This was referenced Apr 5, 2021

Choosing a binning option for a date breakout on a joined column should work without additional clicks #15446

Closed

Fix subdimensions for a join table #15506

Closed

pawit-metabase mentioned this pull request Sep 7, 2021

dimension list: hide sub-dimensions for fields that are FK #17783

Merged

This was referenced Nov 9, 2022

Errors combining datetime interval addition with datetime functions #26279

Merged

Remove legacy MBQL syntax in tests #26309

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MBQL Refactor: Combine various Field clauses into one new clause #14897

MBQL Refactor: Combine various Field clauses into one new clause #14897

camsaul commented Feb 19, 2021

camsaul commented Feb 19, 2021

camsaul commented Feb 25, 2021

camsaul commented Feb 25, 2021

paulrosenzweig commented Feb 25, 2021

camsaul Feb 25, 2021 •

edited

paulrosenzweig Feb 26, 2021

camsaul commented Feb 25, 2021

MBQL Refactor: Combine various Field clauses into one new clause #14897

MBQL Refactor: Combine various Field clauses into one new clause #14897

Conversation

camsaul commented Feb 19, 2021

Background

The proposal

camsaul commented Feb 19, 2021

camsaul commented Feb 25, 2021

camsaul commented Feb 25, 2021

paulrosenzweig commented Feb 25, 2021

camsaul Feb 25, 2021 • edited

Choose a reason for hiding this comment

paulrosenzweig Feb 26, 2021

Choose a reason for hiding this comment

camsaul commented Feb 25, 2021

camsaul Feb 25, 2021 •

edited