Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MBQL Refactor: Combine various Field clauses into one new clause #14897

Merged
merged 62 commits into from Feb 25, 2021

Conversation

camsaul
Copy link
Member

@camsaul camsaul commented Feb 19, 2021

Part of the #14057 wish list. This change has been on my wish list for years now, but I have an actual reason to do it -- implementing #9934 is super difficult without being able to attach arbitrary info for Field clauses.

Background

The original version of MBQL (known as the "Structured Query" language at the time, which was a little confusing) just referred to all fields by integer ID. e.g.

{:filter ["STARTS_WITH" 1 "abc"]}

That made clauses like this ambiguous:

{:filter ["=" 1 2]} ; is 2 a Field, or the number 2?

To clear up ambiguity and to let you filter Fields against Fields, we added the :field-id clause to make it explicit that you were referring to a Field, not an Integer:

{:filter [:= [:field-id 1] [:field-id 2]]} ; Field 1 == Field 2

We soon added a couple of new types of Field clauses, :fk-> and :datetime-field, both of which wrap the original :field-id:

;; refer to Field 2 from a different Table, use Field 1 from the current Table to perform the join
[:fk-> [:field-id 1] [:field-id 2]]

;; bucket Field 3 by month
[:datetime-field [:field 3] :month]

So far things were still pretty reasonable, the worst you'd have to do is something like

[:datetime-field [:fk-> [:field-id 1] [:field-id 2]] :month]

Things started to get out-of-hand IMO when we continued to add more Field clause types that just wrapped everything else. We added :binning-strategy:

[:binning-strategy <field> :num-buckets 10]

then :field-literal, to refer to a Field from a nested native source query:

[:field-literal "my_field" :type/Text]

then we added :joined-field to specify the Table you're explicitly joining against that is the source of a Field:

[:joined-field "my_join" [:field-id 4]]

This ends up getting really hairy when you combine things. This is an actual real clause you can use right now:

[:binning-strategy
 [:datetime-field
  [:joined-field "my_join"
   [:field-literal "my_field" :type/DateTimeWithLocalTZ]]
  :month]
 :num-bins 10]

And even with all of those clauses, we still can't pass around arbitrary extra information in a way that would make it easy to implement performance optimizations. If we ever try to add another new clause, the whole house of cards is going to come crashing down.

The proposal

Combine all of the Field clauses into a single new :field clauses with the schema

[:field id-or-name options-map]

Here are some before & after examples:

[:field-id 1] 
=> 
[:field 1 nil]

[:field-literal "my_field" :type/Text] 
=> 
[:field "my_field" {:base-type :type/Text}]

[:datetime-field [:field 1] :month] 
=> 
[:field 1 {:temporal-unit :month}]

[:binning-strategy [:field 1] :num-bins 10] 
=>
[:field 1 {:binning {:strategy :num-bins, :num-bins 10}}]

[:joined-field "my_join" [:field 1]] 
=> 
[:field 1 {:join-alias "my_join}]

[:fk-> [:field 1] [:field 2]] 
=> 
[:field 2 {:source-field 1}]

[:binning-strategy
 [:datetime-field
  [:joined-field "my_join"
   [:field-literal "my_field" :type/DateTimeWithLocalTZ]]
  :month]
 :num-bins 10]
=>
[:field "my_field" {:base-type :type/DateTimeWithLocalTZ, :join-alias "my_join", :binning {:strategy :num-bins, :num-bins 10}}]

@camsaul
Copy link
Member Author

camsaul commented Feb 19, 2021

It's easy enough to "upgrade" the old-style clauses to the new clause on the backend -- I have this all working. I still need to update a lot of QP middleware to handle the new clause and update a lot tests as well -- there are about 400 backend tests left to update (which would take another day or so)

Co-authored-by: Paul Rosenzweig <paulrosenzweig@users.noreply.github.com>
@camsaul
Copy link
Member Author

camsaul commented Feb 25, 2021

@paulrosenzweig RE sniffing stuff: there's two new methods in the new FieldDimension class, isIntegerFieldId() and isFieldStringName() that handle all the sniffing for everyone else... I'll have to try to make sure everything is using that instead of doing things themselves. I'm working on abstracting out the MBQL representation from all the other code so nobody else has to know about it. e.g. if we wanted to revert all the MBQL changes tomorrow we'd only have to change FieldDimension and none of the other non-test code

@camsaul
Copy link
Member Author

camsaul commented Feb 25, 2021

@paulrosenzweig I added normalization coming out of the DB for all Card & DashboardCard visualization_settings, and it's covering everything I have locally correctly. It walks the whole map and normalized every old-style clause it finds. :parameters and related columns are already normalized on the way out of the DB. I think everything should be covered, but we can deploy to stats and make sure nothing is broken... part of the reason I wanted to get this refactor in at the beginning of the release cycle

@paulrosenzweig
Copy link
Contributor

It walks the whole map and normalized every old-style clause it finds.

The JSON-serialized field refs might trip this up. Column formatting and pivot table partitions are two places where that happens.

It's probably worth having tests around some of those cases specifically. I expect some saved questions out in the wild will never be updated and rely on this migration continuing to work for years.

Comment on lines +214 to +267
;; this is a separate function so we can use the same tests for DashboardCards as well
(defn test-visualization-settings-normalization [f]
(testing "visualization settings should get normalized to use modern MBQL syntax"
(testing "Field references in column settings"
(doseq [[original expected] {[:ref [:field-literal "foo" :type/Float]]
[:ref [:field "foo" {:base-type :type/Float}]]

[:ref [:field-id 1]]
[:ref [:field 1 nil]]

[:ref [:expression "wow"]]
[:ref [:expression "wow"]]}
;; also check that normalization of already-normalized refs is idempotent
original [original expected]
;; frontend uses JSON-serialized versions of the MBQL clauses as keys
:let [original (json/generate-string original)
expected (json/generate-string expected)]]
(testing (format "Viz settings field ref key %s should get normalized to %s"
(pr-str original)
(pr-str expected))
(f
{:column_settings {original {:currency "BTC"}}}
{:column_settings {expected {:currency "BTC"}}}))))

(testing "Other MBQL field clauses"
(let [original {:map.type "region"
:map.region "us_states"
:pivot_table.column_split {:rows [["datetime-field" ["field-id" 807] "year"]],
:columns [["fk->" ["field-id" 805] ["field-id" 808]]],
:values [["aggregation" 0]]}}
expected {:map.type "region"
:map.region "us_states"
:pivot_table.column_split {:rows [[:field 807 {:temporal-unit :year}]]
:columns [[:field 808 {:source-field 805}]]
:values [[:aggregation 0]]}}]
(f original expected)))

(testing "Don't normalize non-MBQL arrays"
(let [original {:graph.show_goal true
:graph.goal_value 5.9
:graph.dimensions ["the_day"]
:graph.metrics ["total_per_day"]}]
(f original original)))

(testing "Don't normalize key-value pairs in maps that could be interpreted as MBQL clauses"
(let [original {:field-id 1}]
(f original original)))))

(deftest normalize-visualization-settings-test
(test-visualization-settings-normalization
(fn [original expected]
(mt/with-temp Card [card {:visualization_settings original}]
(is (= expected
(db/select-one-field :visualization_settings Card :id (u/the-id card))))))))
Copy link
Member Author

@camsaul camsaul Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulrosenzweig yup, I already added tests for normalization of serialized Card/DashboardCard field references as keys in viz settings here (specifically the json/generate-string part at the top). Are the serialized keys used anywhere else besides visualization_settings.column_settings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the serialized keys used anywhere else besides visualization_settings.column_settings?

Sorry slow responding to that. I had thought both conditional formatting and pivot tables did, but I just checked and neither do! So thankfully I guess it's just column settings.

@camsaul camsaul added this to the 0.39 milestone Feb 25, 2021
@camsaul
Copy link
Member Author

camsaul commented Feb 25, 2021

@paulrosenzweig this is green no so I'm going to go ahead and merge it in so we don't end up in merge-conflict hell since the PR is so big. Once it's up on stats I'll try to go test things and make sure everything is working properly and add more normalization test cases/fixes as needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants