Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mega QP context and entrypoint overhaul. Reduce QP hairiness by 30% #35465

Merged
merged 84 commits into from
Feb 15, 2024

Conversation

camsaul
Copy link
Member

@camsaul camsaul commented Nov 7, 2023

The overall goal of this PR is to simplify the surface area of the QP and make its behavior a little more obvious and easy to understand, clearly separate out different stages of the QP pipeline, and to improve performance by eliminating unneeded core.async nonsense.

Major changes:

  • eliminate various "userland" QP entrypoints; things like (qp/process-userland-query query) are replaced with (qp/process-query (qp/userland-query query))
  • unified qp.setup code does all the stuff needed to run any part of the QP -- resolve Database ID, set up QP store, bind driver/*driver*, bind database-local settings, etc. This is used at the top level and also in substeps like preprocess; duplicate calls no-op so there is no perf hit
  • move all preprocessing stuff to qp.preprocess; this can now be called completely independently of the larger QP
  • move post-processing stuff to qp.postprocess
  • move compilation stuff to qp.compile
  • move execution stuff to qp.execute
  • Remove QP context, and sync vs async stuff; replaced be a couple of dynamic vars.
  • Normal synchronous QP code now executes and returns results on the same thread, rather than executing in the background. Previously, it created a core.async promise channel and blocking until that channel got results from the background thread. Eliminate completely unneeded and hard-to-follow core.async nonsense
  • Eliminate almost all of the hairy and hard-to-follow code in qp.reducible
  • Add qp.schema namespace with Malli schemas for a handful of things. Add typechecking around more QP stuff. This caught a bunch of bugs already e.g. info maps accidentally being passed in where we meant to pass in context
  • Reduce QP overhead significantly -- (qp/process-query (mt/mbql-query :venues)) now takes 8.6ms on average on my computer compared to 10.2, which is 18% faster. Not sure 2ms is going to make a huge difference for anything besides test H2 databases, but when we do it ten thousand times it might speed up the test suite a nice amount

@metabase-bot metabase-bot bot added .Team/QueryProcessor :hammer_and_wrench: visual Run Percy visual testing labels Nov 7, 2023
@metabase-bot metabase-bot bot added the visual Run Percy visual testing label Feb 14, 2024
@camsaul camsaul removed the visual Run Percy visual testing label Feb 14, 2024
@camsaul camsaul requested a review from a team February 14, 2024 02:06
Copy link

replay-io bot commented Feb 14, 2024

Status Complete ↗︎
Commit e1b046f
Results
1 Failed
⚠️ 3 Flaky
2310 Passed

Copy link
Contributor

@bshepherdson bshepherdson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a PR!

I think the changes make a lot of sense. Dropping context for a handful of dynamic vars feels like a win, and dropping unnecessary async is a huge win.

Just some small edits and suggestions.

Comment on lines +113 to +116
(let [qp (ee.qp.perms/check-download-permissions
(fn [query _rff]
query))]
(qp query qp.reducible/default-rff)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone looking for other examples of this pattern, but it seems like the identity transformation here is likely to recur and might deserve a name in eg. qp.reducible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of too many other places we do this, maybe in a few other tests but in that case it probably belongs in test tooling. I'll try to look around for more

profile.clj Outdated Show resolved Hide resolved
was executed in QueryExecution and other places. It is considered bad form for middleware to change its behavior
based on this information, don't do it!"
;; TODO - this schema is somewhat misleading because if you use a function
;; like [[metabase.query-processor/userland-query]] some of these keys (e.g. `:context`) are in fact required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is :context the best example of something actually-required now? I'm not sure if this is the same context as the context parameter getting dropped from many QP functions and replaced with dynamic vars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah if things weren't confusing enough already, we had the :context key which was the context the query was getting executed in, like :pulse or :dashboard or whatever. It is actually required to save a QueryExecution

src/metabase/query_processor/pipeline.clj Outdated Show resolved Hide resolved
src/metabase/query_processor/setup.clj Outdated Show resolved Hide resolved
test/metabase/http_client.clj Show resolved Hide resolved
@metabase-bot metabase-bot bot added the visual Run Percy visual testing label Feb 14, 2024
src/metabase/async/util.clj Outdated Show resolved Hide resolved
(u/minutes->ms
(if config/is-prod?
20
3)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be going in the opposite direction we took (or wanted to take) with the API stuff: there we said we would prefer fewer dynamic vars, didn't we?

Maybe we can implement a function that we could call when debugging that would give the values of the "context" vars and also say if they have been redefined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in general less dynamic variables is a win, but I think sometimes there is still a good use case for them. I played around with this a lot and in this case having a few dynamic vars makes it a simpler than passing around that context map in like 100 different functions. Maybe we can revisit it in the future tho, I want to get this PR in before it gets any more merge conflicts, maybe following on with more (smaller) tweaks makes sense

@camsaul camsaul enabled auto-merge (squash) February 15, 2024 02:16
@camsaul camsaul merged commit 80dcbdf into master Feb 15, 2024
104 of 105 checks passed
@camsaul camsaul deleted the qp-context-overhaul branch February 15, 2024 22:06
Copy link

@camsaul Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-backport Do not backport this PR to any branch .Team/QueryProcessor :hammer_and_wrench: visual Run Percy visual testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants