Allow model filtering #45

followingell · 2022-10-25T15:40:03Z

Issue

Closes Filter models for the coverage #24:

"Please add options in the CLI to include and exclude models to filter out the checks in some of the models or a path."

Summary

I added the ability to perform compute commands on only a subset of tables by adding a --model-path-filter option. This means that a subset of models can be selected based upon their original_file_path value (taken from the manifest.json artifact).

This means dbt-coverage can now be used in monolithic dbt projects that contain sub-projects owned by different teams. Without model selection, dbt-coverage was of limited use in such a structure: an unrelated team could lower the overall coverage and thereby block PR merging (for example, if dbt-coverage were integrated into a CI/CD pipeline).

See example of added functionality from updated README.md below:

$ cd jaffle_shop
$ dbt run  # Materialize models
$ dbt docs generate  # Generate catalog.json and manifest.json
$ dbt-coverage compute doc --cov-report coverage-doc.json --model-path-filter models/staging/  # Compute doc coverage for a subset of tables, print it and write it to coverage-doc.json file

Coverage report
======================================================
jaffle_shop.stg_customers              0/3       0.0%
jaffle_shop.stg_orders                 0/4       0.0%
jaffle_shop.stg_payments               0/4       0.0%
======================================================
Total                                  0/11      0.0%

$ dbt-coverage compute doc --cov-report coverage-doc.json --model-path-filter models/orders.sql --model-path-filter models/staging/  # Compute doc coverage for a subset of tables, print it and write it to coverage-doc.json file

Coverage report
======================================================
jaffle_shop.orders                     0/9       0.0%
jaffle_shop.stg_customers              0/3       0.0%
jaffle_shop.stg_orders                 0/4       0.0%
jaffle_shop.stg_payments               0/4       0.0%
======================================================
Total                                  0/20      0.0%
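The selection described in the Summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `manifest_nodes` dictionary is a hypothetical, trimmed-down stand-in for the `nodes` section of manifest.json, `select_models` is a made-up helper name, and the prefix matching reflects the `startswith` behaviour agreed on later in the review.

```python
from typing import Dict, List

# Hypothetical stand-in for the "nodes" section of manifest.json;
# only the original_file_path field is kept for illustration.
manifest_nodes: Dict[str, dict] = {
    "model.jaffle_shop.orders": {"original_file_path": "models/orders.sql"},
    "model.jaffle_shop.customers": {"original_file_path": "models/customers.sql"},
    "model.jaffle_shop.stg_orders": {
        "original_file_path": "models/staging/stg_orders.sql"
    },
    "model.jaffle_shop.stg_payments": {
        "original_file_path": "models/staging/stg_payments.sql"
    },
}


def select_models(nodes: Dict[str, dict], model_path_filter: List[str]) -> List[str]:
    """Return the unique_ids whose original_file_path starts with any filter.

    An empty filter list selects every model, mirroring the idea of staying
    backwards compatible when the option is not passed.
    """
    if not model_path_filter:
        return sorted(nodes)
    return sorted(
        unique_id
        for unique_id, node in nodes.items()
        if any(node["original_file_path"].startswith(p) for p in model_path_filter)
    )


# Two filters, as in the second CLI call above.
selected = select_models(manifest_nodes, ["models/orders.sql", "models/staging/"])
print(selected)
```

Passing several filters selects the union of the matching models, which is why the second report above lists `orders` alongside the staging models.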

Note: this is a relatively 'rough' solution and there are likely many improvements that could be made to my code / far more elegant implementations that would achieve the same functionality. Please, feel free to suggest changes!

Testing

I have tested these changes on dbt's jaffle_shop 'testing project' and have not encountered issues so far.

followingell · 2022-10-28T06:51:49Z

Just checking, is anyone available to review this please @sweco, @mrshu?

Is there anything that I can add to the PR to make the review process easier for yourselves?

mrshu · 2022-11-02T14:58:46Z

Thanks for the PR @followingell -- I'll take a closer look at it later today 🙂

followingell · 2022-11-02T22:54:02Z

mrshu wrote: "@followingell Is there any specific reason for PurePath here as opposed to using Path that has already been in use?"

@mrshu Not sure where the above comment has gone(?), regardless responding here rather than via email:

I think this StackOverflow answer sums it up quite nicely. Essentially, PurePath just performs string-like operations whereas Path can also do I/O operations which we don't need here. As such, I chose to utilise the simpler, parent class.

If you'd rather I just use Path then I'm happy to do so.
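For readers unfamiliar with the distinction, here is a small sketch using only the standard-library `pathlib` module. The model path is an illustrative value, and `PurePosixPath` is used so the output is the same on every operating system.

```python
from pathlib import Path, PurePath, PurePosixPath

# A pure path only manipulates the path string; it never touches the disk.
model_path = PurePosixPath("models/staging/stg_orders.sql")

print(model_path.name)    # stg_orders.sql
print(model_path.parent)  # models/staging
print(model_path.suffix)  # .sql

# Path is a subclass of PurePath that adds filesystem access.
print(issubclass(Path, PurePath))   # True
print(hasattr(Path, "exists"))      # True
print(hasattr(PurePath, "exists"))  # False
```

Either class works for string-like operations on paths; the choice mainly signals whether filesystem access is intended.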

mrshu

Thanks for putting this together @followingell!

In general, this looks rather nice -- I've only put a couple of comments here and there to see if I understood the whole concept correctly. Once we get those resolved, we can simply merge it in 🙂

Thanks!

dbt_coverage/__init__.py

mrshu · 2022-11-03T01:12:55Z

+        original_tables_dict = {key: val for key, val in self.tables.items()}
+        for key, table in original_tables_dict.items():
+            for path in model_path_filter:
+                if path in table.original_file_path:


@followingell how would you feel about changing this to table.original_file_path.startswith(path)? I am mostly concerned about the unintended consequences of having say the staging/ and pre-staging/ folders in one's project and as it currently is, I am afraid --model-path-filter staging/ would cover both.

Alternatively, having a regex option would certainly work too 🙂

followingell · 2022-11-03

@mrshu Again, great thoughts re covering both staging/ and pre-staging/ unintentionally when using --model-path-filter staging/ with my previous implementation.

As per your suggestion I have swapped the code to use startswith and had this reflected in the README since this provides as much control as before whilst reducing the potential for unintended inclusions as outlined above.
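The difference between the two checks discussed in this thread can be illustrated with plain string operations. The paths below are hypothetical `original_file_path` values chosen to match the staging/ and pre-staging/ example; this is a sketch, not the code from the PR.

```python
# Hypothetical original_file_path values for two models.
original_file_paths = [
    "staging/stg_orders.sql",
    "pre-staging/raw_orders.sql",
]
path_filter = "staging/"

# Substring check (the earlier approach): also matches pre-staging/.
substring_matches = [p for p in original_file_paths if path_filter in p]

# Prefix check (the suggested approach): only matches paths that begin
# with the filter.
prefix_matches = [p for p in original_file_paths if p.startswith(path_filter)]

print(substring_matches)  # ['staging/stg_orders.sql', 'pre-staging/raw_orders.sql']
print(prefix_matches)     # ['staging/stg_orders.sql']
```

One consequence of prefix matching is that a filter has to be written from the start of `original_file_path`, for example models/staging/ rather than staging/ when the models live under models/.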

followingell · 2022-11-04T09:15:20Z

@mrshu FYI for now I have finished with changes. As such, ready for review 👍

mrshu

Thank you for your patience @followingell -- this is very nicely done and I like it quite a bit!

mrshu · 2022-11-17T17:17:58Z

Thanks again @followingell, this has now been released in 0.3.0!

followingell added 11 commits October 22, 2022 14:12

Adjust Manifest to store path and unique_id

9d10d2b

Add unique_id property to Table

6dbaebc

Align dictionary access method

408d3e1

Add original_file_path property to Table class

545b8c8

Add filter_catalog method to Catalog class

4e8c77c

Add multiple CLI options for table filtering

fb7c0eb

Ensure backwards compatibility if model_path_filter option not passed

a96fa79

Add model path selection details to README

bf4f107

Add support for model selection by path

7d90c1c

Update support matrix

5a633d5

Make docstrings clearer

6704fc6

followingell mentioned this pull request Oct 25, 2022

Filter models for the coverage #24

Closed

followingell added 2 commits October 25, 2022 18:23

Fix alignment

bd31ca0

Condense support matrix

bf2c0b9

mrshu reviewed Nov 3, 2022

mrshu self-assigned this Nov 3, 2022

followingell and others added 5 commits November 3, 2022 10:35

Add user thanks to CHANGELOG.md

6f28ff2

Co-authored-by: Marek Šuppa <mrshu@users.noreply.github.com>

Replace PurePath with Path

8e1efc6

Adjust logging to show filtered_tables length

fc06bd7

Use startswith to prevent unintended inclusions

903ce3a

Add TODO to explain len() usage

9823d80

followingell requested a review from mrshu November 3, 2022 11:08

Remove TODO

ca47a9d

mrshu approved these changes Nov 15, 2022

mrshu merged commit d17f505 into slidoapp:main Nov 15, 2022

mrshu mentioned this pull request Nov 15, 2022

Clean up len(model_path_filter) >= 1 once a Typer issue gets resolved #48

Open

sweco mentioned this pull request Mar 13, 2024

Add tag filtering option for dbt-coverage users #75

Closed
