Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix matching of entities between manifest and catalog #14

Merged
merged 4 commits into from Dec 3, 2021

Conversation

sweco
Copy link
Collaborator

@sweco sweco commented Dec 1, 2021

This fixes two issues:

  1. Some systems (e.g. Snowflake) uppercase all table and column names. In accordance with the dbt-docs official source code, we lowercase all table and column names before matching them.
  2. There can be multiple tables with the same name but in different schemas. We therefore use schema_name.table_name to distinguish tables.

dbt_coverage/__init__.py Show resolved Hide resolved
dbt_coverage/__init__.py Show resolved Hide resolved
dbt_coverage/__init__.py Outdated Show resolved Hide resolved
This fixes two issues:

1. Some systems (e.g. Snowflake) uppercase all table and column names. In accordance with the dbt-docs official source code, we lowercase all table and column names before matching them.
2. There can be multiple tables with the same name but in different schemas. We therefore use schema_name.table_name to distinguish tables.
Use domain object ColumnRef for storing information about covered and not covered columns instead of raw strings which then need to be parsed.
@sweco
Copy link
Collaborator Author

sweco commented Dec 3, 2021

@mrshu, I refactored the code, I believe it's much cleaner now. The following was done:

  • Introduce ColumnRef class to hold information about covered / not covered columns. This gets rid of the string parsing that we had.

  • Move all coverage computing to the CoverageReport class because it probably belongs there anyway. Thanks to this we only compute coverage once, previously it was accidentally computed twice. In the example below, it was computed on lines 190 and 197-198 because both catalog.coverage and CoverageReport.from_table call table.coverage inside.

    @classmethod
    def from_catalog(cls, catalog: Catalog, cov_type: CoverageType):
    cov = catalog.coverage(cov_type)
    return CoverageReport(
    cls.Type.CATALOG,
    cov_type,
    None,
    cov[0],
    cov[1],
    {table.name: CoverageReport.from_table(table, cov_type)
    for table in catalog.tables.values()}
    )

I believe this is a final code. Could I please bother you for one last round of review?

Copy link
Contributor

@mrshu mrshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @sweco, let's get it in!

@sweco sweco force-pushed the fix-entities-matching branch 2 times, most recently from a60ad34 to c0504f6 Compare December 3, 2021 16:21
@sweco sweco merged commit 1a0917f into main Dec 3, 2021
@sweco sweco deleted the fix-entities-matching branch December 3, 2021 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants