
Support multiple coverage doc/test save formats #29

Closed
rloredo opened this issue Apr 22, 2022 · 5 comments · Fixed by #36
Comments

rloredo commented Apr 22, 2022

From my understanding, we can only save .json files as the result of dbt-coverage compute.

It would be nice if we could pick other formats, for example .csv.
That way we could use those files as seeds and write them to the db.

rloredo (Author) commented Apr 22, 2022

Something like this but prettier haha

import subprocess
import pandas as pd


def flatten_nested_json_df(df):
    """
    Flatten a df with json nested columns
    """
    df = df.reset_index()

    # search for columns to explode/flatten
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            # explode dictionaries horizontally, adding new columns
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f"{col}.")
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)  # record the newly added columns

        for col in list_columns:
            # explode lists vertically, adding new columns
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # check if there are still dict or list fields to flatten
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df


if __name__ == "__main__":
    subprocess.run("dbt docs generate", shell=True)
    print("\n")
    # The doc report can be produced the same way if needed:
    # subprocess.run("dbt-coverage compute doc --cov-report tools/doc_test_coverage/coverage-doc.json", cwd="../../", shell=True)
    # Note: the rest of the script reads coverage-test.json, so the test report
    # must be the one computed here.
    subprocess.run(
        "dbt-coverage compute test --cov-report tools/doc_test_coverage/coverage-test.json",
        cwd="../../",
        shell=True,
    )
    print("\n")
    print("saving results to seeds")
    tables_tests = flatten_nested_json_df(
        pd.read_json("coverage-test.json")
    ).drop_duplicates(subset=["tables.name"])[
        ["tables.name", "tables.covered", "tables.total", "tables.coverage"]
    ]
    tables_tests.columns = [
        "table_name",
        "columns_covered",
        "columns_total",
        "coverage_ratio",
    ]
    schemas_tests = tables_tests[
        ["table_name", "columns_covered", "columns_total"]
    ].copy()
    schemas_tests[["schema_name", "table_name"]] = schemas_tests.table_name.str.split(
        ".", expand=True
    )
    schemas_tests = schemas_tests.groupby("schema_name", as_index=False).agg(
        {"columns_covered": "sum", "columns_total": "sum", "table_name": "count"}
    )
    schemas_tests.rename(columns={"table_name": "tables_total"}, inplace=True)
    tables_tests.to_csv("../../seeds/tables_tests.csv", index=False)
    schemas_tests.to_csv("../../seeds/schemas_tests.csv", index=False)

sweco (Collaborator) commented May 10, 2022

Hey @rloredo! Thanks for your interest in the project and sorry for replying so late.

If you are using dbt-coverage directly from Python, you can use the do_compute function.

def do_compute(project_dir: Path = Path('.'), cov_report: Path = Path('coverage.json'),
               cov_type: CoverageType = CoverageType.DOC, cov_fail_under: float = None,
               cov_fail_compare: Path = None):
    """
    Computes coverage for a dbt project.
    Use this method in your Python code to bypass typer.
    """

However, it seems that the function does not return the coverage report once it finishes computing. We could definitely add that, and then you could do whatever you want with the report: save it as a CSV, or even analyze it directly in a Jupyter notebook or in Python code.

import dbt_coverage

report = dbt_coverage.do_compute(...)
report = report.to_dict()

# Load to pandas, write to CSV, do whatever
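That last step might look like the following minimal sketch. The shape of the dict is an assumption inferred from the fields used in the script above (`name`, `covered`, `total`, `coverage` per table); since the `do_compute` return value is only proposed here, the dict is mocked rather than obtained from `dbt_coverage`.

```python
import pandas as pd

# Hypothetical shape of report.to_dict(), inferred from the fields
# consumed by the earlier script; not the confirmed dbt-coverage schema.
report = {
    "tables": [
        {"name": "main.orders", "covered": 3, "total": 5, "coverage": 0.6},
        {"name": "main.customers", "covered": 4, "total": 4, "coverage": 1.0},
    ],
}

# Flatten the per-table entries into a DataFrame and write a CSV seed.
df = pd.json_normalize(report["tables"])
df.to_csv("tables_coverage.csv", index=False)
```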

Would that seem like a good solution to you?

rloredo (Author) commented May 12, 2022

Hi @sweco, thank you for your answer.
Yes! That makes more sense than what I proposed.
Thank you!

sweco (Collaborator) commented May 16, 2022

Alright, I'll add the missing return to the do_compute function and I'll let you know when it's done and released! 😊

rloredo (Author) commented May 16, 2022

Awesome, I'm not in a rush since I wrote that for a local/personal fork (it was an easy fix). Thank you for the idea :)
Hope you can improve the project!

sweco added a commit that referenced this issue Jul 29, 2022
This allows for arbitrary analysis of the coverage report by the clients.
sweco closed this as completed in #36 on Jul 29, 2022