Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"dbt run" triggers identical information schema query for each model #660

Closed
clausherther opened this issue Feb 22, 2018 · 3 comments
Closed
Assignees
Labels
enhancement New feature or request

Comments

@clausherther
Copy link
Contributor

clausherther commented Feb 22, 2018

When running dbt with multiple models, dbt issues the same information_schema query for every model in the run, asking for a list of tables in the production schema like so:

select
  TABLE_NAME as name,
  TABLE_TYPE as type
from INFORMATION_SCHEMA.TABLES
where TABLE_SCHEMA in ('<DW Schema>')

Depending on the database, this can up to 1-2 seconds per model and can add a lot of overhead to a run.

Since these queries are called before any transformations are actually done, the queries should all return the same list of tables, so ideally this could be done just once and then cached.
Happy to add more context if needed.

cc: @drewbanin @jthandy

@drewbanin drewbanin self-assigned this Feb 22, 2018
@drewbanin
Copy link
Contributor

Thanks for the writeup @clausherther! We use this query to determine if each table exists, and if so, whether it's a view or a table.

I think you're totally right -- there's minimal risk involved with just running this query once at the beginning of the run. On something like Redshift this doesn't add too much time to the run, but it's definitely painful on Snowflake.

@drewbanin drewbanin added the enhancement New feature or request label Feb 22, 2018
@drewbanin
Copy link
Contributor

see also: #911

I retract my claim that

there's minimal risk involved with just running this query once at the beginning of the run

as it's a little bit complicated, but definitely doable. Check out the issue for more info. Snowflake is pretty easy, whereas supporting this correctly on postgres/redshift is a little bit more involved.

@drewbanin
Copy link
Contributor

This was fixed in #1025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants