Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow model filtering #45

Merged
merged 19 commits into from Nov 15, 2022
Merged

Conversation

followingell
Copy link
Contributor

@followingell followingell commented Oct 25, 2022

Issue

"Please add options in the CLI to include and exclude models to filter out the checks in some of the models or a path."

Summary

I added the ability to perform compute commands on only a subset of tables by adding a --model-path-filter option. This means that a subset of models can be selected based upon their original_file_path value (taken from the manifest.json artifact).

This functionality means that dbt-coverage can now be used in monolithic dbt projects which contain sub-projects owned by different teams. Before adding model selection functionality, using dbt-coverage would not have been useful/advisable in such a structure because another, unrelated team may decrease the overall coverage, which can then block PR merging (should dbt-coverage have been integrated as part of a CI/CD pipeline for example).

  • See example of added functionality from updated README.md below:
$ cd jaffle_shop
$ dbt run  # Materialize models
$ dbt docs generate  # Generate catalog.json and manifest.json
$ dbt-coverage compute doc --cov-report coverage-doc.json --model-path-filter models/staging/  # Compute doc coverage for a subset of tables, print it and write it to coverage-doc.json file

Coverage report
======================================================
jaffle_shop.stg_customers              0/3       0.0%
jaffle_shop.stg_orders                 0/4       0.0%
jaffle_shop.stg_payments               0/4       0.0%
======================================================
Total                                  0/11      0.0%

$ dbt-coverage compute doc --cov-report coverage-doc.json --model-path-filter models/orders.sql --model-path-filter models/staging/  # Compute doc coverage for a subset of tables, print it and write it to coverage-doc.json file

Coverage report
======================================================
jaffle_shop.orders                     0/9       0.0%
jaffle_shop.stg_customers              0/3       0.0%
jaffle_shop.stg_orders                 0/4       0.0%
jaffle_shop.stg_payments               0/4       0.0%
======================================================
Total                                  0/20      0.0%

Note: this is a relatively 'rough' solution and there are likely many improvements that could be made to my code / far more elegant implementations that would achieve the same functionality. Please, feel free to suggest changes!

Testing

  • I have tested these changes on dbt's jaffle_shop 'testing project' and have not encountered issues so far.

@followingell
Copy link
Contributor Author

Just checking, is anyone available to review this please @sweco, @mrshu?

Is there anything that I can add to the PR to make the review process easier for yourselves?

@mrshu
Copy link
Contributor

mrshu commented Nov 2, 2022

Thanks for the PR @followingell -- I'll take a closer look at it later today 🙂

@followingell
Copy link
Contributor Author

followingell commented Nov 2, 2022

@followingell Is there any specific reason for PurePath here as opposed to using Path that has already been in use?

@mrshu Not sure where the above comment has gone(?), regardless responding here rather than via email:

I think this StackOverflow answer sums it up quite nicely. Essentially, PurePath just performs string-like operations whereas Path can also do I/O operations which we don't need here. As such, I chose to utilise the simpler, parent class.

If you'd rather I just use Path then I'm happy to do so.

Copy link
Contributor

@mrshu mrshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @followingell!

In general, this looks rather nicely -- I've only put a couple of comments here and there to see if I understood the whole concept correctly. Once we get those resolved, we can simply merge it in 🙂

Thanks!

dbt_coverage/__init__.py Outdated Show resolved Hide resolved
dbt_coverage/__init__.py Show resolved Hide resolved
original_tables_dict = {key: val for key, val in self.tables.items()}
for key, table in original_tables_dict.items():
for path in model_path_filter:
if path in table.original_file_path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@followingell how would you feel about changing this to table.original_file_path.startswith(path)? I am mostly concerned about the unintended consequences of having say the staging/ and pre-staging/ folders in one's project and as it currently is, I am afraid --model-path-filter staging/ would cover both.

Alternatively, having a regex option would certainly work too 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshu Again, great thoughts r.e. covering both staging/ and pre-staging/ unintentionally when using --model-path-filter staging/ with my previous implementation.

As per your suggestion I have swapped the code to use startswith and had this reflected in the README since this provides as much control as before whilst reducing the potential for unintended inclusions as outlined above.

dbt_coverage/__init__.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@mrshu mrshu self-assigned this Nov 3, 2022
@followingell
Copy link
Contributor Author

@mrshu FYI for now I have finished with changes. As such, ready for review 👍

Copy link
Contributor

@mrshu mrshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patience @followingell -- this is very nicely done and I like it quite a bit!

@mrshu
Copy link
Contributor

mrshu commented Nov 17, 2022

Thanks again @followingell , this was now released in 0.3.0!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter models for the coverage
2 participants