Optimise autocomplete query #38080
Conversation
Would be nice to have some query which exhibits performance problems, right now looking at
As for point number three, you can write
e7ef5a8 to 732c888
and I can't find the reason, but it has no major impact on performance after adding an index, at least on stats
@metabase/core-backend-components this is ready for review
Could you add a note on why the cache control change is needed?
@qnkhuat done, how'd you like it?
{:status 200
 ;; Presumably user will repeat same prefixes many times writing the query,
 ;; so let them cache response to make autocomplete feel fast. 60 seconds
 ;; is not enough to be a nuisance when schema or permissions change. Cache
 ;; is user-specific since we're checking for permissions.
 :headers {"Cache-Control" "public, max-age=60"
           "Vary" "Cookie"}
 :body (cond
         substring (autocomplete-suggestions id (str "%" substring "%"))
         prefix (autocomplete-suggestions id (str prefix "%")))}
nit: I'm a sucker for small methods and avoiding mixing too many levels of abstraction. I'd break out a small with-user-specific-cache-header method which takes the body.
IDK, I'd rather this be "in the face" until a general pattern emerges...
Rule of 3? Fine 😄
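A minimal sketch of the extraction being discussed; the function name comes from the review comment, while the exact handler context around it is assumed:

```clj
;; Hypothetical extraction of the caching concern into its own function,
;; so the handler body deals only with building suggestions.
(defn with-user-specific-cache-header
  "Wrap `body` in a 200 response that the client may cache for 60 seconds.
  `Vary: Cookie` keeps the cache per-user, since results depend on permissions."
  [body]
  {:status  200
   :headers {"Cache-Control" "public, max-age=60"
             "Vary"          "Cookie"}
   :body    body})

;; Usage at the call site (substring, prefix, and id come from the
;; handler's own bindings):
(comment
  (with-user-specific-cache-header
    (cond
      substring (autocomplete-suggestions id (str "%" substring "%"))
      prefix    (autocomplete-suggestions id (str prefix "%")))))
```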
:limit limit}))
{:order-by [[[:lower :metabase_field.name] :asc]
            [[:lower :table.name] :asc]]
 :inner-join [[:metabase_table :table] [:and :table.active
It might be worth commenting that it was (counter-intuitively) faster to have the "active" check within the join, in case someone comes to optimize again.
Hah, it felt intuitive for me - you get less data to join this way 😁 But sure, I can leave a comment there.
Probably you're right, my intuition is quite rusty and I didn't give much thought to it 😆
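The trade-off under discussion, sketched as HoneySQL-style clause maps (the join key on :table.id / :metabase_field.table_id is an assumption; only the :inner-join shape appears in the diff above):

```clj
;; Active check inside the JOIN condition: inactive tables are discarded
;; while joining, so fewer rows reach the sort.
{:inner-join [[:metabase_table :table]
              [:and :table.active
               [:= :table.id :metabase_field.table_id]]]}

;; The same filter expressed in WHERE; logically equivalent for an inner
;; join, but measured to be the slower variant here.
{:inner-join [[:metabase_table :table]
              [:= :table.id :metabase_field.table_id]]
 :where :table.active}
```

For an inner join a planner will often treat the two forms identically, which is why the measured difference is worth the comment the review asks for.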
@@ -551,10 +551,11 @@
:%lower.metabase_field/name [:like (u/lower-case-en search-string)]
It might be worth commenting that an index on lower(name) dramatically speeds the query up in DBs that support it, but we left it for now since Maria and H2 do not support indexes on computed fields. Perhaps they'll catch up in future? I also wonder if there are other use cases for a lowercase name; we could always materialize this as another column.
It's actually not a filter problem but (weirdly) an ordering one. On stats, adding an ngram index sped the query up from 80 to 20 ms, but adding an index that is used for sorting speeds it up to 10 ms (no difference with or without the ngram index). 🤯
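The pgsql-only index referred to here would look roughly like this (the index name is made up; MariaDB and H2 reject expression indexes, which is why it is not portable):

```sql
-- Expression index matching the ORDER BY lower(name) clause; per the
-- measurements above it helps the sort more than the LIKE filter does.
CREATE INDEX idx_metabase_field_lower_name ON metabase_field (lower(name));
```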
@romeovs Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?
For big databases autocomplete can be quite slow. On stats, the query in its regular form takes about 80 ms to run. Filtering out inactive tables and adding an index (pgsql-only, unfortunately) for the ordering clause brings that down to 10 ms. I also suspect that in a single session you'll hit similar autocomplete requests multiple times, so some client caching could help with performance.
Partially resolves #30588
Description
This PR implements some of the suggestions made in #30588 (comment), as far as toucan2 will take them. This is what's implemented:

- A materialized view might work very well here (to be tested), or an index on lower(metabase_field.name) (only for pgsql though)
- Checking if the table is active in the WHERE
- Duplicating the .active checks in the JOIN ON … condition
- The LEFT JOIN is not necessary (and should be changed to an INNER JOIN)
- Lower the spam of queries with some trick; this is done in Debounce autocomplete #37852

How to verify
It's hard to verify the extreme memory usage locally, but we can at least verify these changes did not break the autocomplete endpoint:

- Open SQL Query → Sample Dataset
- Type SELECT SEA
- SEATS should show up in the suggestions
Checklist
I've added no extra tests for this issue.