Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix (dbt): derived metrics #154

Merged
merged 4 commits into from
Dec 1, 2022
Merged

fix (dbt): derived metrics #154

merged 4 commits into from
Dec 1, 2022

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Nov 30, 2022

Some fixes to derived metrics in dbt:

  • Derived metrics were not being synced to Superset because we were not mapping them to the correct models. We have to traverse the dbt DAG upstream to find all the parent models. This is now fixed.
  • dbt 1.3 introduced changes to the schema of metrics. Both schemas are now supported.

Closes #153 and #126.

@@ -150,7 +150,7 @@ def dbt_core( # pylint: disable=too-many-arguments, too-many-locals
for config in configs["metrics"].values():
# conform to the same schema that dbt Cloud uses for metrics
config["dependsOn"] = config["depends_on"]["nodes"]
config["uniqueID"] = config["unique_id"]
config["uniqueId"] = config["unique_id"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug, but since the metric unique ID was not needed anywhere it was not caught. With the changes in this PR we use the metric ID to traverse the DAG upstream to find parent models.

Comment on lines -112 to -114
metric["name"]: metric
for metric in metrics
if model["unique_id"] in metric["depends_on"]
Copy link
Member Author

@betodealmeida betodealmeida Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was too naive, and didn't account for how derived metrics depend on other metrics, instead of depending on models. I replaced it with a smarter function get_metrics_for_model that traverses the DAG upstream.

Comment on lines +101 to +102
if is_derived(node):
queue.extend(metric_map[parent] for parent in depends_on)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the PR: for derived metrics we need to look at the upstream metrics to find the parent models.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the metric list something from superset or something from the dbt manifest?

Comment on lines +28 to +37
if "calculation_method" in metric:
# dbt >= 1.3
type_ = metric["calculation_method"]
sql = metric["expression"]
expression = "derived"
else:
# dbt < 1.3
type_ = metric["type"]
sql = metric["sql"]
expression = "expression"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@@ -22,8 +25,16 @@ def get_metric_expression(metric_name: str, metrics: Dict[str, MetricSchema]) ->
raise Exception(f"Invalid metric {metric_name}")

metric = metrics[metric_name]
type_ = metric["type"]
sql = metric["sql"]
if "calculation_method" in metric:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: have it inline

legacy = "calculation_method" in metric
type_ = metric["calculation_method"]  if legacy else metric["type"]
sql = metric["expression"] if legacy else metric["sql"]
expression = "derived" if legacy else  "expression"

Copy link

@stevepisani stevepisani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@betodealmeida betodealmeida merged commit 1d2a8f5 into main Dec 1, 2022
Copy link
Contributor

@hughhhh hughhhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cwegener
Copy link

cwegener commented Dec 2, 2022

Exciting work!! Can't wait to make some time and test this out.

@cwegener
Copy link

cwegener commented Dec 2, 2022

@betodealmeida The MetricSchema marshmallow schema is missing the new dbt 1.3 field names, so the new fields names never actually get passed through to this new dbt code.

@betodealmeida
Copy link
Member Author

Ah, thanks! Let me fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expression metrics are not imported to Superset
4 participants