New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn for unused resource (model/seeds) config paths in project file #725

Closed
wants to merge 3 commits into
base: development
from

Conversation

Projects
None yet
3 participants
@clrcrl
Copy link
Contributor

clrcrl commented Apr 5, 2018

Finds all model configurations specified in all projects (including installed packages) and checks whether they are applied to at least one model.
If they are not applied (i.e. "unused"), a warning is logged.
Addresses #691

@drewbanin drewbanin self-requested a review Apr 5, 2018

@drewbanin

This comment has been minimized.

Copy link
Contributor

drewbanin commented Apr 5, 2018

Thanks for submitting this PR @clrcrl! We've wanted this feature for a long time -- I'm certain it will cut down on misconfigured projects immensely.

I'll add some comments here shortly, but the overall approach looks pretty good to me! Nice work 👍

@drewbanin
Copy link
Contributor

drewbanin left a comment

@clrcrl happy to discuss any of these comments! I'd also like to discuss that recursive function... it looks really good, but I wonder if we can implement it without mutating the pqns argument. I'll do some experimenting and follow up here

@@ -23,7 +24,17 @@ def load_all(cls, root_project, all_projects):
loader.load_all(root_project, all_projects, macros))

to_return[subgraph] = subgraph_nodes


This comment has been minimized.

@drewbanin

drewbanin Apr 5, 2018

Contributor

Can you swing this logic into the compiler.py file over here?

I think rather than implementing this warning in the GraphLoader class, it would make more sense to check for unused configs from the Compilation class. That way, the load_all function here can be called (eg. as an api function) without a warning showing up every time.

This comment has been minimized.

@clrcrl

clrcrl Apr 5, 2018

Author Contributor

Assume you meant to write "compilation.py file"?

This comment has been minimized.

@drewbanin

drewbanin Apr 5, 2018

Contributor

yeah, that's right, good catch :)

unused_pqns = get_unused_pqns(pqns, fqns)
if len(unused_pqns) == 0:
return
logger.info(dbt.ui.printer.yellow("WARNING: Model configurations exist in your dbt_project.yml file which do not apply to any models. There are {} unused model configurations:".format(len(unused_pqns))))

This comment has been minimized.

@drewbanin

drewbanin Apr 5, 2018

Contributor

I'd like to figure out how to make logger.warn automatically print things in yellow text. I'm sure there's a way to do it... let me know if that's something you're interested in tackling. Possibly in a different PR

This comment has been minimized.

@clrcrl

clrcrl Apr 5, 2018

Author Contributor

Probably a separate issue/PR, as I'd also want to have logger.debug print in red text, and then go back and clean up where the color is being directly called (example), and possibly remove green/yellow/red functions from printer.py


project_model_config_pqns = []
for project_name, project in all_projects.items():
project_model_config_pqns.extend(dbt.config.get_project_model_config_pqns(project))

This comment has been minimized.

@drewbanin

drewbanin Apr 5, 2018

Contributor

you can just assume that project['models'] will exist. Or, if you want to be extra cautious, you can do project.get('models', {}). No need for the helper function I don't think

This comment has been minimized.

@clrcrl

clrcrl Apr 5, 2018

Author Contributor

I've updated this bit of code to only read the root_project instead of all_projects.

I think the helper function is still required because it's un-nesting the config into the list of lists.



model_fqns = []
for unique_id, node in to_return['nodes'].items():

This comment has been minimized.

@drewbanin

drewbanin Apr 5, 2018

Contributor

You can do this as a little one-liner with a list comprehension. It will save ~2 lines of code, but the real benefit is that the logic is more self-evident. I think the same is true for project_model_config_pqns below

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Apr 5, 2018

@drewbanin - addressed most of your comments.
Agreed on the recursive function front. Even if we can't figure out a way to implement it without mutating the pqns argument, it still feels like it can be tidied up! (Especially since codeclimate is complaining about it)

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Apr 5, 2018

To do:
Add unit tests here, e.g. test__warning_for_unused_configs:

  • a positive test, i.e. one test that returns 0 unused configs
  • a negative test, i.e. one test that returns > 0 unused configs
@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Apr 6, 2018

Another few To-Do:

  • Correct grammar for warning for a single error: "A model configuration exists ... There is 1 unused model configuration"
  • When there are > 4 models, warn about first three and X others
    e.g.`
WARNING: Model configurations exist in your dbt_project.yml file which do not apply to any models. There are 6 unused model configurations:
 - experiments.base
 - experiments.tables
 - experiments.views
 and 3 others

This error occurred because I had:

models:
  my_package_name:
    ...
  experiments: #not indented enough
    base:
      ...
    tables:
      ...
    views:
      ...

instead of

models:
  my_package_name:
    ...
    experiments:
      base:
        ...
      tables:
        ...
      views:
        ...
      ...

Actually a thought here - if the config is wrong because of a high level nesting issue, do we just want to warn about that level, rather than every single pqn. Especially since it's the one mistake that has affected multiple configs.

WARNING: Model configuration(s) exist in your dbt_project.yml file which do not apply to any models. There is 1 unused model configuration:
 - experiments

Edit: discussed the above with a colleague, and he said it's better to have lots of errors resulting from one mistake, than try and get fancy about reporting the single mistake. So will leave that as is.

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Apr 30, 2018

^ ended up not implementing the above to-dos around making the warning message friendlier. Can do if you think it is valuable.

That recursive function still probably needs some love.

Over to you @drewbanin

@clrcrl clrcrl changed the title Warn for unused model configs in project file Warn for unused resource (model/seeds) config paths in project file Jul 22, 2018

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Jul 22, 2018

Now addresses #784.
Keen to get feedback on this!

@clrcrl clrcrl closed this Jul 22, 2018

@clrcrl clrcrl deleted the clrcrl:check-model-configs branch Jul 22, 2018

@clrcrl clrcrl restored the clrcrl:check-model-configs branch Jul 22, 2018

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Jul 22, 2018

^ ignore that, tried to rename the remote branch, which ended up closing the PR/deleting the branch 🤷. Apols!

@clrcrl clrcrl reopened this Jul 22, 2018

@drewbanin

This comment has been minimized.

Copy link
Contributor

drewbanin commented Aug 23, 2018

@clrcrl I've answered a bunch of support questions over the past couple of days relating to issues with people's dbt_project.yml files. I'd love to figure out how to get this merged!

dbt has changed a bunch in the past few months, so I think this might need a rebase against development. Do you anticipate having more time to work on this in the next few weeks (/months)? If not, we could pick up where you left off here. Let me know what you're thinking

@clrcrl clrcrl force-pushed the clrcrl:check-model-configs branch from b785777 to c367d5b Aug 28, 2018

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Aug 28, 2018

@drewbanin – couldn't get the integration tests running locally (docker issues), but used this to check my tests:

(dbt-dev) $ python -m unittest -q test/unit/test_config.py
----------------------------------------------------------------------
Ran 6 tests in 0.022s

OK

Expecting it to fail codeclimate again though

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Aug 28, 2018

Nope - don't look at this yet. Just realised there's an error in it

@clrcrl clrcrl force-pushed the clrcrl:check-model-configs branch from 768f1f3 to c367d5b Aug 28, 2018

@clrcrl

This comment has been minimized.

Copy link
Contributor Author

clrcrl commented Aug 28, 2018

OK should be working now.
I think I need to write some better tests though. I feel like it might be a good opportunity to pair for this part since this is part of writing code that I want to get a whole lot better at.

@beckjake

This comment has been minimized.

Copy link
Contributor

beckjake commented Sep 14, 2018

This looks like a great feature! We're going to look at merging this after #973 - I'll rebase your branch off the merged branch and fix it up, since that PR changes a lot of configuration related stuff and nobody else should have to suffer through that. Once that's done, I'd be happy to pair with you on some tests!

@beckjake beckjake referenced this pull request Sep 26, 2018

Merged

Check model configs #1027

@drewbanin

This comment has been minimized.

Copy link
Contributor

drewbanin commented Oct 17, 2018

Fixed in #1027 -- closing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment