Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise autocomplete query #38080

Merged
merged 10 commits into from Jan 26, 2024
Merged

Optimise autocomplete query #38080

merged 10 commits into from Jan 26, 2024

Conversation

romeovs
Copy link
Contributor

@romeovs romeovs commented Jan 24, 2024

Partially resolves #30588

Description

This PR implements some of the suggestions made in #30588 (comment), as far as toucan2 will take them.

This is what's implemented:

  1. Optimizing the auto-suggestion queries (at least 10x faster is needed):
    • A materialized view might work very well here, to be tested, or an index on a lower(metabase_field.name) (only for pgsql though)
    • Checking if the table is active in the WHERE,
    • Duplicating the .active checks in the JOIN ON … condition.
    • The LEFT JOIN is not necessary (and should be changed to a INNER JOIN)
  2. Lower the spam of queries with some trick. This is done in Debounce autocomplete #37852

How to verify

It's hard to verify the extreme memory usage locally, but we can at least verify these changes did not break the autocomplete endpoint:

  1. SQL QuerySample Dataset
  2. Start typing a query like SELECT SEA
  3. The autocompletion dropdown should return SEATS

Checklist

I've added no extra tests for this issue.

@romeovs romeovs self-assigned this Jan 24, 2024
@romeovs romeovs requested a review from camsaul as a code owner January 24, 2024 08:23
@romeovs romeovs requested a review from a team January 24, 2024 08:23
@romeovs romeovs added the backport Automatically create PR on current release branch on merge label Jan 24, 2024
Copy link

replay-io bot commented Jan 24, 2024

StatusComplete ↗︎
Commit1b262c6
Results
⚠️ 11 Flaky
2227 Passed

@romeovs romeovs requested a review from a team January 24, 2024 09:13
@piranha
Copy link
Contributor

piranha commented Jan 24, 2024

Would be nice to have some query which exhibits performance problems, right now looking at explain analyze is almost useless since it's so fast for everything I'm trying... Have you seen slowness like this on stats perhaps?

@piranha
Copy link
Contributor

piranha commented Jan 24, 2024

As for point number three, you can write :inner-join [[:metabase_table :table] [:and [:= :table.id :metabase_field.table_id] :table.active]] and this will work.

@piranha piranha force-pushed the 30588-optimise-autocomplete-query branch from e7ef5a8 to 732c888 Compare January 24, 2024 10:42
and I can't find the reason - but it has no major impacts on performance after adding an index, at least on stats
@piranha
Copy link
Contributor

piranha commented Jan 24, 2024

@metabase/core-backend-components this is ready for review

@qnkhuat
Copy link
Contributor

qnkhuat commented Jan 25, 2024

Could you add note on why the cache control change is needed?

@piranha
Copy link
Contributor

piranha commented Jan 25, 2024

@qnkhuat done, how'd you like it?

Comment on lines +624 to +633
{:status 200
;; Presumably user will repeat same prefixes many times writing the query,
;; so let them cache response to make autocomplete feel fast. 60 seconds
;; is not enough to be a nuisance when schema or permissions change. Cache
;; is user-specific since we're checking for permissions.
:headers {"Cache-Control" "public, max-age=60"
"Vary" "Cookie"}
:body (cond
substring (autocomplete-suggestions id (str "%" substring "%"))
prefix (autocomplete-suggestions id (str prefix "%")))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm a sucker for small methods and avoiding mixing too many levels of abstraction. I'd break out a small with-user-specific-cache-header method which takes the body.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, I'd rather this be "in the face" until a general pattern emerges...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule of 3? Fine 😄

:limit limit}))
{:order-by [[[:lower :metabase_field.name] :asc]
[[:lower :table.name] :asc]]
:inner-join [[:metabase_table :table] [:and :table.active
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth commenting that it was (counter-intuitively) faster to have the "active" check within the join, in case someone comes to optimize again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, it felt intuitive for me - you get less data to join this way 😁 But sure, I can leave a comment there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you're right, my intuition is quite rusty and I didn't give much thought to it 😆

@@ -551,10 +551,11 @@
:%lower.metabase_field/name [:like (u/lower-case-en search-string)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth commenting that an index on lower(name) dramatically speeds the query up in DBs that support it, but we left it for now since Maria and H2 do not support indexes on computed fields. Perhaps they'll catch up in future? I also wonder if there are other use cases for a lowercase username, we could always materialize this as another column.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually not a filter problem, but an ordering one (weirdly). On stats adding a ngram index sped up query for 80 to 20 ms, but adding a index (which is used for sorting) speeds it up to 10 ms (no difference if with or without ngram index). 🤯

@uladzimirdev uladzimirdev removed the request for review from a team January 25, 2024 10:25
@paoliniluis paoliniluis added no-backport Do not backport this PR to any branch and removed backport Automatically create PR on current release branch on merge labels Jan 26, 2024
Comment on lines +624 to +633
{:status 200
;; Presumably user will repeat same prefixes many times writing the query,
;; so let them cache response to make autocomplete feel fast. 60 seconds
;; is not enough to be a nuisance when schema or permissions change. Cache
;; is user-specific since we're checking for permissions.
:headers {"Cache-Control" "public, max-age=60"
"Vary" "Cookie"}
:body (cond
substring (autocomplete-suggestions id (str "%" substring "%"))
prefix (autocomplete-suggestions id (str prefix "%")))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule of 3? Fine 😄

@piranha piranha merged commit a35f41d into master Jan 26, 2024
112 checks passed
@piranha piranha deleted the 30588-optimise-autocomplete-query branch January 26, 2024 08:24
Copy link

@romeovs Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?

@piranha piranha added this to the 0.49 milestone Jan 26, 2024
npfitz pushed a commit that referenced this pull request Feb 5, 2024
For big databases autocomplete can be quite slow. At stats, query in regular form takes about 80 ms to run. Filtering out inactive tables and adding an index (pgsql-only unfortunately) for ordering clause brings that down to 10 ms.

Also I suspect that in a single session you'll hit similar autocomplete rules multiple time, so some client caching could help with performance.
sloansparger pushed a commit that referenced this pull request Feb 5, 2024
For big databases autocomplete can be quite slow. At stats, query in regular form takes about 80 ms to run. Filtering out inactive tables and adding an index (pgsql-only unfortunately) for ordering clause brings that down to 10 ms.

Also I suspect that in a single session you'll hit similar autocomplete rules multiple time, so some client caching could help with performance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-backport Do not backport this PR to any branch .Team/QueryingComponents
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Autosuggestion in native queries peaks the CPU on instances with > 1M fields
5 participants