
Support multiple coverage doc/test save formats #29

Closed
rloredo opened this issue Apr 22, 2022 · 5 comments · Fixed by #36
Comments

rloredo commented Apr 22, 2022

From my understanding, we can only save .json files as the result of dbt-coverage compute.

It would be nice if we could pick other formats, for example .csv.
That way we could use those files as seeds and write them to the db.

rloredo (Author) commented Apr 22, 2022

Something like this but prettier haha

import subprocess
import pandas as pd


def flatten_nested_json_df(df):
    """
    Flatten a df with json nested columns
    """
    df = df.reset_index()

    # search for columns to explode/flatten
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()
    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            # explode dictionaries horizontally, adding new columns
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f"{col}.")
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)  # record the newly added columns

        for col in list_columns:
            # explode lists vertically, adding new columns
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # check if there are still dict or list fields to flatten
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df


if __name__ == "__main__":
    subprocess.run("dbt docs generate", shell=True)
    print("\n")
    # The doc report can be produced the same way if needed:
    # subprocess.run("dbt-coverage compute doc --cov-report tools/doc_test_coverage/coverage-doc.json", cwd="../../", shell=True)
    # Note: the rest of the script reads coverage-test.json, so the test report
    # must be the one computed here.
    subprocess.run(
        "dbt-coverage compute test --cov-report tools/doc_test_coverage/coverage-test.json",
        cwd="../../",
        shell=True,
    )
    print("\n")
    print("saving results to seeds")
    tables_tests = flatten_nested_json_df(
        pd.read_json("coverage-test.json")
    ).drop_duplicates(subset=["tables.name"])[
        ["tables.name", "tables.covered", "tables.total", "tables.coverage"]
    ]
    tables_tests.columns = [
        "table_name",
        "columns_covered",
        "columns_total",
        "coverage_ratio",
    ]
    schemas_tests = tables_tests[
        ["table_name", "columns_covered", "columns_total"]
    ].copy()
    schemas_tests[["schema_name", "table_name"]] = schemas_tests.table_name.str.split(
        ".", expand=True
    )
    schemas_tests = schemas_tests.groupby("schema_name", as_index=False).agg(
        {"columns_covered": "sum", "columns_total": "sum", "table_name": "count"}
    )
    schemas_tests.rename(columns={"table_name": "tables_total"}, inplace=True)
    tables_tests.to_csv("../../seeds/tables_tests.csv", index=False)
    schemas_tests.to_csv("../../seeds/schemas_tests.csv", index=False)

sweco (Collaborator) commented May 10, 2022

Hey @rloredo! Thanks for your interest in the project and sorry for replying so late.

If you are using dbt-coverage directly from Python, you can use the do_compute function.

def do_compute(project_dir: Path = Path('.'), cov_report: Path = Path('coverage.json'),
               cov_type: CoverageType = CoverageType.DOC, cov_fail_under: float = None,
               cov_fail_compare: Path = None):
    """
    Computes coverage for a dbt project.
    Use this method in your Python code to bypass typer.
    """

However, it seems that the function does not return the coverage report once it finishes computing. We could definitely add that, and then you could do whatever you want with the report: save it as a CSV, or even analyze it directly in a Jupyter notebook or in Python code.

import dbt_coverage

report = dbt_coverage.do_compute(...)
report = report.to_dict()

# Load to pandas, write to CSV, do whatever
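That last step might look like the following minimal sketch. The shape of the dict is an assumption inferred from the fields used in the script above (`name`, `covered`, `total`, `coverage` per table); since the `do_compute` return value is only proposed here, the dict is mocked rather than obtained from `dbt_coverage`.

```python
import pandas as pd

# Hypothetical shape of report.to_dict(), inferred from the fields
# consumed by the earlier script; not the confirmed dbt-coverage schema.
report = {
    "tables": [
        {"name": "main.orders", "covered": 3, "total": 5, "coverage": 0.6},
        {"name": "main.customers", "covered": 4, "total": 4, "coverage": 1.0},
    ],
}

# Flatten the per-table entries into a DataFrame and write a CSV seed.
df = pd.json_normalize(report["tables"])
df.to_csv("tables_coverage.csv", index=False)
```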

Would that seem like a good solution to you?

rloredo (Author) commented May 12, 2022

Hi @sweco, thank you for your answer.
Yes! That makes more sense than what I proposed.
Thank you!

sweco (Collaborator) commented May 16, 2022

Alright, I'll add the missing return to the do_compute function and I'll let you know when it's done and released! 😊

rloredo (Author) commented May 16, 2022

Awesome, I'm not in a rush since I wrote that for a local/personal fork (it was an easy fix). Thank you for the idea :)
Hope you can improve the project!

sweco added a commit that referenced this issue Jul 29, 2022
This allows for arbitrary analysis of the coverage report by the clients.
sweco closed this as completed in #36 on Jul 29, 2022