Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn for unused resource (model/seeds) config paths in project file #725

Closed
wants to merge 3 commits into from
Closed

Warn for unused resource (model/seeds) config paths in project file #725

wants to merge 3 commits into from

Conversation

clrcrl
Copy link
Contributor

@clrcrl clrcrl commented Apr 5, 2018

Finds all model configurations specified in all projects (including installed packages) and checks whether they are applied to at least one model.
If they are not applied (i.e. "unused"), a warning is logged.
Addresses #691

@drewbanin drewbanin self-requested a review April 5, 2018 00:25
@drewbanin
Copy link
Contributor

Thanks for submitting this PR @clrcrl! We've wanted this feature for a long time -- I'm certain it will cut down on misconfigured projects immensely.

I'll add some comments here shortly, but the overall approach looks pretty good to me! Nice work 👍

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clrcrl happy to discuss any of these comments! I'd also like to discuss that recursive function... it looks really good, but I wonder if we can implement it without mutating the pqns argument. I'll do some experimenting and follow up here

dbt/loader.py Outdated
@@ -23,7 +24,17 @@ def load_all(cls, root_project, all_projects):
loader.load_all(root_project, all_projects, macros))

to_return[subgraph] = subgraph_nodes


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you swing this logic into the compiler.py file over here?

I think rather than implementing this warning in the GraphLoader class, it would make more sense to check for unused configs from the Compilation class. That way, the load_all function here can be called (eg. as an api function) without a warning showing up every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume you meant to write "compilation.py file"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's right, good catch :)

dbt/config.py Outdated
unused_pqns = get_unused_pqns(pqns, fqns)
if len(unused_pqns) == 0:
return
logger.info(dbt.ui.printer.yellow("WARNING: Model configurations exist in your dbt_project.yml file which do not apply to any models. There are {} unused model configurations:".format(len(unused_pqns))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to figure out how to make logger.warn automatically print things in yellow text. I'm sure there's a way to do it... let me know if that's something you're interested in tackling. Possibly in a different PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a separate issue/PR, as I'd also want to have logger.debug print in red text, and then go back and clean up where the color is being directly called (example), and possibly remove green/yellow/red functions from printer.py

dbt/loader.py Outdated

project_model_config_pqns = []
for project_name, project in all_projects.items():
project_model_config_pqns.extend(dbt.config.get_project_model_config_pqns(project))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just assume that project['models'] will exist. Or, if you want to be extra cautious, you can do project.get('models', {}). No need for the helper function I don't think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this bit of code to only read the root_project instead of all_projects.

I think the helper function is still required because it's un-nesting the config into the list of lists.

dbt/loader.py Outdated


model_fqns = []
for unique_id, node in to_return['nodes'].items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this as a little one-liner with a list comprehension. It will save ~2 lines of code, but the real benefit is that the logic is more self-evident. I think the same is true for project_model_config_pqns below

@clrcrl
Copy link
Contributor Author

clrcrl commented Apr 5, 2018

@drewbanin - addressed most of your comments.
Agreed on the recursive function front. Even if we can't figure out a way to implement it without mutating the pqns argument, it still feels like it can be tidied up! (Especially since codeclimate is complaining about it)

@clrcrl
Copy link
Contributor Author

clrcrl commented Apr 5, 2018

To do:
Add unit tests here, e.g. test__warning_for_unused_configs:

  • a positive test, i.e. one test that returns 0 unused configs
  • a negative test, i.e. one test that returns > 0 unused configs

@clrcrl
Copy link
Contributor Author

clrcrl commented Apr 6, 2018

Another few To-Do:

  • Correct grammar for warning for a single error: "A model configuration exists ... There is 1 unused model configuration"
  • When there are > 4 models, warn about first three and X others
    e.g.`
WARNING: Model configurations exist in your dbt_project.yml file which do not apply to any models. There are 6 unused model configurations:
 - experiments.base
 - experiments.tables
 - experiments.views
 and 3 others

This error occurred because I had:

models:
  my_package_name:
    ...
  experiments: #not indented enough
    base:
      ...
    tables:
      ...
    views:
      ...

instead of

models:
  my_package_name:
    ...
    experiments:
      base:
        ...
      tables:
        ...
      views:
        ...
      ...

Actually a thought here - if the config is wrong because of a high level nesting issue, do we just want to warn about that level, rather than every single pqn. Especially since it's the one mistake that has affected multiple configs.

WARNING: Model configuration(s) exist in your dbt_project.yml file which do not apply to any models. There is 1 unused model configuration:
 - experiments

Edit: discussed the above with a colleague, and he said it's better to have lots of errors resulting from one mistake, than try and get fancy about reporting the single mistake. So will leave that as is.

@clrcrl
Copy link
Contributor Author

clrcrl commented Apr 30, 2018

^ ended up not implementing the above to-dos around making the warning message friendlier. Can do if you think it is valuable.

That recursive function still probably needs some love.

Over to you @drewbanin

@clrcrl clrcrl changed the title Warn for unused model configs in project file Warn for unused resource (model/seeds) config paths in project file Jul 22, 2018
@clrcrl
Copy link
Contributor Author

clrcrl commented Jul 22, 2018

Now addresses #784.
Keen to get feedback on this!

@clrcrl clrcrl closed this Jul 22, 2018
@clrcrl clrcrl deleted the check-model-configs branch July 22, 2018 05:41
@clrcrl clrcrl restored the check-model-configs branch July 22, 2018 05:42
@clrcrl
Copy link
Contributor Author

clrcrl commented Jul 22, 2018

^ ignore that, tried to rename the remote branch, which ended up closing the PR/deleting the branch 🤷. Apols!

@clrcrl clrcrl reopened this Jul 22, 2018
@drewbanin
Copy link
Contributor

@clrcrl I've answered a bunch of support questions over the past couple of days relating to issues with people's dbt_project.yml files. I'd love to figure out how to get this merged!

dbt has changed a bunch in the past few months, so I think this might need a rebase against development. Do you anticipate having more time to work on this in the next few weeks (/months)? If not, we could pick up where you left off here. Let me know what you're thinking

@clrcrl
Copy link
Contributor Author

clrcrl commented Aug 28, 2018

@drewbanin – couldn't get the integration tests running locally (docker issues), but used this to check my tests:

(dbt-dev) $ python -m unittest -q test/unit/test_config.py
----------------------------------------------------------------------
Ran 6 tests in 0.022s

OK

Expecting it to fail codeclimate again though

@clrcrl
Copy link
Contributor Author

clrcrl commented Aug 28, 2018

Nope - don't look at this yet. Just realised there's an error in it

@clrcrl
Copy link
Contributor Author

clrcrl commented Aug 28, 2018

OK should be working now.
I think I need to write some better tests though. I feel like it might be a good opportunity to pair for this part since this is part of writing code that I want to get a whole lot better at.

@beckjake
Copy link
Contributor

This looks like a great feature! We're going to look at merging this after #973 - I'll rebase your branch off the merged branch and fix it up, since that PR changes a lot of configuration related stuff and nobody else should have to suffer through that. Once that's done, I'd be happy to pair with you on some tests!

@beckjake beckjake mentioned this pull request Sep 26, 2018
@drewbanin
Copy link
Contributor

Fixed in #1027 -- closing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants