-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster sync-fields #38828
Faster sync-fields #38828
Conversation
|
9ee866a
to
b444854
Compare
[nil :database-is-auto-increment] ; only needed for actions, which redshift doesn't support yet | ||
[nil :database-required] ; only needed for actions, which redshift doesn't support yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably don't even need to return it as these fields are optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, 14fb6a5
(defn CollectionOf | ||
"Helper for creating schemas to check whether something is an instance of a collection." | ||
[item-schema] | ||
[:fn | ||
{:error/message (format "Collection of %s" item-schema)} | ||
#(and (coll? %) (every? (partial mc/validate item-schema) %))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually dead code, but I think it's generally useful so I'm inclined to keep it around.
At the moment we have no way to type something that could be a set or a vector for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprised it's not a built-in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting notified that diffs are out of date, flushing comments before I continue.
@@ -87,9 +121,15 @@ | |||
;; custom Redshift type handling | |||
|
|||
(def ^:private database-type->base-type | |||
(sql-jdbc.sync/pattern-based-database-type->base-type | |||
[[#"(?i)CHARACTER VARYING" :type/Text] ; Redshift uses CHARACTER VARYING (N) as a synonym for VARCHAR(N) | |||
[#"(?i)NUMERIC" :type/Decimal]])) ; and also has a NUMERIC(P,S) type, which is the same as DECIMAL(P,S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why not keep this comment on NUMERIC(P, S)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this got deleted by accident 9edacd0
(some-fn (sql-jdbc.sync/pattern-based-database-type->base-type | ||
[[#"(?i)CHARACTER VARYING" :type/Text] ; Redshift uses CHARACTER VARYING (N) as a synonym for VARCHAR(N) | ||
[#"(?i)NUMERIC" :type/Decimal]]) | ||
{:super :type/* ; (requested support in metabase#36642) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a short comment here explaining how these "*" types are treated by the application, and that we only list these mappings here to disable warnings in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear that type of comment could get out of date here. If you have any questions about type/*
it should be documented metabase.types
namespace. I see that it isn't, but I don't think I could do it justice if I tried right now
I'm mostly listing these types to document these types are deliberately not being handled, as opposed to accidentally not being handled. Removing the log warnings is a nice side-effect though.
@@ -241,8 +241,8 @@ | |||
view-nm)) | |||
(let [table-id (t2/select-one-pk Table :db_id (u/the-id database), :name view-nm)] | |||
;; and its columns' :base_type should have been identified correctly | |||
(is (= [{:name "numeric_col", :database_type "numeric(10,2)", :base_type :type/Decimal} | |||
{:name "weird_varchar", :database_type "character varying(50)", :base_type :type/Text}] | |||
(is (= [{:name "numeric_col", :database_type "numeric", :base_type :type/Decimal} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: are there any consequences to us no longer recording the size of the types?
nit: I've been fooled by bad GH rendering before, but it looks like :base_type
is unaligned now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm aware, we don't use the size / precision data from these strings anywhere.
.clj-kondo/config.edn
Outdated
@@ -355,7 +355,7 @@ | |||
metabase.sync.field-values field-values | |||
metabase.sync.interface i | |||
metabase.sync.schedules sync.schedules | |||
metabase.sync.sync-metadata.fields.fetch-metadata fetch-metadata | |||
metabase.sync.sync-metadata.fields.our-metadata our-metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metabase.sync.sync-metadata.fields.our-metadata our-metadata | |
metabase.sync.sync-metadata.fields.our-metadata fields.our-metadata |
without this prefix, it's hard to know whether it's fields metadata or table metadata or db metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. we don't use this namespace much so the extra characters doesn't have much downside
40df070
|
||
(mu/defn sync-fields-for-db! | ||
"Sync the Fields in the Metabase application database for a specific `table`." | ||
[database :- i/DatabaseInstance] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make this take tables
and create a [schema name] -> table
mapping that you can use inside the transducer to get the table.
Always good to reduce the number of DB calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also bothered by the N+1 table queries, but ended up not commenting as for large N we'd rather not fetch all the tables into memory.
It seems unfortunate to me that there's a deep nest of methods to require a full table model, because in practice they're mostly using the id, name, and schema mostly. I suspect we could just add one or two more table fields to the educed rows and get rid of this query, but it would be brittle to use a partially populated instance, and refactoring all those methods to depend only on their required table fields would be pretty hairy and ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Chris said, I'm fetching each table because I'm avoiding fetching all the tables into memory. The good thing is app DB queries should be quite cheap compared to customer DB queries, so this is still a huge win.
In the future we can avoid the N+1 by doing a refactoring like the one I did for sync-fks
as I did here https://github.com/metabase/metabase/pull/39679/files#diff-e09c063342055336a262f66623a6a1783a89871e4426c33c92f8f9b5f2bcaaec
But I don't think it's worth it right now. That refactoring took longer than I thought because of the quirks between different app DBs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, that makes sense; in that case, you need to "not" fetch all tables in metabase.sync.sync-metadata.fields line 117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see why you made that comment now. Nice spot. Yes that was left over from a previous version where I was doing what you described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done 👌
(defn CollectionOf | ||
"Helper for creating schemas to check whether something is an instance of a collection." | ||
[item-schema] | ||
[:fn | ||
{:error/message (format "Collection of %s" item-schema)} | ||
#(and (coll? %) (every? (partial mc/validate item-schema) %))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprised it's not a built-in!
@@ -4,6 +4,16 @@ title: Driver interface changelog | |||
|
|||
# Driver Interface Changelog | |||
|
|||
## Metabase 0.49.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, didn't you tell me that we don't introduce driver methods in the patch releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe I did say that :) I can't remember how I justified it though, probably because we've never done it before. But putting it in the 0.49.0 section doesn't make much sense.
@qnkhuat can you take another look? It would be nice to get this in 49.0. It could be if it were approved in the next few hours I think. If it does, I'll update the changelog in the backport PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve this then it LGTM!
@calherries Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone? |
This is part 2 towards resolving #38492
Part 1 is #38970
This PR adds an alternative implementation of the sync-fields step of sync where we sync the foreign key information of a database.
It adds a new driver feature:
describe-fields
. If a driver supports this feature, they need to implement the new driver methodmetabase.driver/describe-fields
. If the driver doesn't support the feature, we'll continue to usemetabase.driver/describe-table
. The plan for driver authors to implementdescribe-fields
in terms ofdriver/describe-table
, which should be a really simple change for most drivers, assuming it doesn't need to be performant for large DBs.The way this works is instead of querying the customer DB one table at a time using the
getColumns
JDBC method, we execute a single query that gets all the data at once with paginated results. We then reduce over those results, updating our App DB one foreign key at a time. This approach mirrors #38970I've tested this with the "massive" redshift instance, and this is how long each sync step took:
Before this PR,
sync-fields
took hours (too long to bother measuring precisely anyway).