Faster sync-fks #38970
@@ -714,6 +712,29 @@
results-metadata {:cols (column-metadata driver rsmeta)}]
(respond results-metadata (reducible-rows driver rs rsmeta qp.pipeline/*canceled-chan*))))))))

(defn sql->reducible-rows
Why do we need this when we have `jdbc/reducible-query`?
High-level reason:
- `do-with-connection-with-options`, `statement-or-prepared-statement`, and `execute-statement-or-prepared-statement!` reuse logic that we have in place for preparing query statements. We should reuse that logic.
Low-level reasons:
- `jdbc/reducible-query` doesn't use `metabase.driver.sql-jdbc.execute/prepared-statement` or `metabase.driver.sql-jdbc.execute/statement`, which do things like set the fetch size and `ResultSet/TYPE_FORWARD_ONLY`.
- `jdbc/reducible-query` doesn't seem to work with `java.sql.Statement` for some reason, only strings or `java.sql.PreparedStatement`. We use `java.sql.Statement` by default instead of `java.sql.PreparedStatement`, and though I'm not sure why, I figured we should use the same configuration as we have for normal queries.
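For readers unfamiliar with the statement settings mentioned above, here is a minimal sketch of that kind of configuration using plain JDBC interop. `forward-only-statement` is a hypothetical helper written for illustration; it is not the code in `metabase.driver.sql-jdbc.execute`, which may set more or different options per driver.

```clojure
(import '(java.sql Connection ResultSet Statement))

(defn forward-only-statement
  "Hypothetical helper, NOT Metabase's implementation: creates a read-only,
  forward-only java.sql.Statement on `conn` and applies `fetch-size` as a hint
  for how many rows the driver should fetch per round trip."
  ^Statement [^Connection conn fetch-size]
  (doto (.createStatement conn
                          ResultSet/TYPE_FORWARD_ONLY
                          ResultSet/CONCUR_READ_ONLY)
    (.setFetchSize (int fetch-size))))
```

A forward-only result set with a fetch size lets the driver stream rows instead of buffering the whole result, which is what makes reducing over a large result practical. How strictly the fetch size is honored depends on the JDBC driver.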
I've added a docstring and renamed the function to make the purpose a little clearer, and also to nod towards it being similar to `jdbc/reducible-query`:
metabase/src/metabase/driver/sql_jdbc/execute.clj
Lines 717 to 720 in 2d5975e
(defn simple-reducible-query
"Returns a reducible collection of rows as maps from `db` and a given SQL query. This is similar to [[jdbc/reducible-query]] but reuses the
driver-specific configuration for the Connection and Statement/PreparedStatement. This is slightly different from [[execute-reducible-query]]
in that it is not intended to be used as part of middleware. Keywordizes column names. "
I'm not sure about the name `simple-reducible-query`, but I wanted to distinguish it as the simpler version of `execute-reducible-query` in the same namespace.
Why not call it `reducible-query`?
And I would mention the difference with `jdbc/reducible-query` in the docstring, I'm sure people will have the same question as I did.
> why not call it reducible-query?
I wanted to distinguish it as the simpler version of `execute-reducible-query` in the same namespace. That's not so obvious though. I will accept your suggestion :)
> and I would mention the difference with jdbc/reducible-query in the docstring, I'm sure people will have the same question as I did.
I have included this already, though not in such detail:
metabase/src/metabase/driver/sql_jdbc/execute.clj
Lines 718 to 719 in 2d5975e
"Returns a reducible collection of rows as maps from `db` and a given SQL query. This is similar to [[jdbc/reducible-query]] but reuses the
driver-specific configuration for the Connection and Statement/PreparedStatement. This is slightly different from [[execute-reducible-query]]
src/metabase/sync/util.clj
Outdated
(defn set-initial-table-sync-complete-for-db!
"Marks initial sync for all tables in `db` as complete so that it becomes usable in the UI, if not already
set"
Suggested change: `set"` becomes `set."`
Docstrings need to be complete sentences.
Generally, I think this is good, but I still have some suggestions that need resolving.
Also, CI is failing for Redshift. I'm not sure if it's related to this PR; it looks like a flake though.
LGTM!
@calherries Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?
This is part 1 towards resolving #38492
Part 2 is #38828
This PR adds an alternative implementation of the `sync-fks` step of sync, where we sync the foreign key information of a database.
It adds a new driver feature: `describe-fks`. If a driver supports this feature, it needs to implement the new driver method `metabase.driver/describe-fks`. If the driver doesn't support the feature, we'll continue to use `metabase.driver/describe-table-fks`, which has been deprecated. The plan is for driver authors to implement `describe-fks` in terms of `driver/describe-table-fks`, which should be a really simple change for most drivers, assuming it doesn't need to be performant for large DBs.
The way this works is that instead of querying the customer DB one table at a time using the `getImportedKeys` JDBC method, we execute a single query that gets all the data at once with paginated results. We then reduce over those results (yay transducers), updating our App DB one foreign key at a time.
Only Redshift is implemented right now, but its implementation is 22 LOC, so more drivers can be added easily.
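To illustrate the "reduce over the results, one foreign key at a time" idea, here is a minimal sketch in plain Clojure. Everything in it is an assumption for illustration: the row keys (`:fk-table`, `:fk-column`, `:pk-table`, `:pk-column`), the `sync-fks!` and `update-fk!` functions, and the atom standing in for the App DB are hypothetical and are not the code in this PR.

```clojure
(def app-db
  "Hypothetical in-memory stand-in for the App DB:
  maps [fk-table fk-column] -> [pk-table pk-column]."
  (atom {}))

(defn update-fk!
  "Records one foreign key in `app-db`. Returns the key that was updated,
  or nil if the stored target was already correct."
  [{:keys [fk-table fk-column pk-table pk-column]}]
  (let [k      [fk-table fk-column]
        target [pk-table pk-column]]
    (when (not= target (get @app-db k))
      (swap! app-db assoc k target)
      k)))

(defn sync-fks!
  "Reduces over `fk-rows` (any reducible collection of FK row maps), calling
  `update-fk-fn` once per row, and returns the number of rows that caused an
  update. Rows are consumed one at a time, so the full result set is never
  held in memory by this function."
  [update-fk-fn fk-rows]
  (transduce (keep update-fk-fn)              ; drop rows that needed no update
             (completing (fn [n _] (inc n)))  ; count the updates
             0
             fk-rows))
```

Because `transduce` only needs something reducible, the same function works whether `fk-rows` is a vector in a test or a streaming result set wrapped as a reducible.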
I tested with the redshift dev instance using a local postgres app DB with a schema with 10 foreign keys over 20 tables.
I got these times from the sync logs:
Before: 11.5 seconds 🐌
After: 867.7 ms 🚀
I don't want to test any bigger DBs yet because they take a long time to sync their fields. The real perf tests will come with #38828
Low-level test coverage could probably be improved but there's so much that depends on sync I think it's pretty safe without adding more tests.