Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dbt): sync metrics from MetricFlow #256

Merged
merged 5 commits into from
Jan 9, 2024
Merged

Conversation

betodealmeida
Copy link
Member

This PR introduces an algorithm for syncing dbt metrics defined using MetricFlow. In order to sync metrics it:

  1. Queries a new GraphQL endpoint to get all the names of the metrics
  2. Queries a new GraphQL endpoint to get the SQL for each of the metrics
  3. Converts the SQL (a full query) into a projection, using an algorithm similar to the one used with older metrics

Because in MetricFlow metrics are a higher concept, not all metrics can be converted. For example, a metric could be "sales in France", having an implicit JOIN in them. Metrics with joins and/or multiple tables are ignored, and the error is logged.

Note that MetricFlow has a concept called "measure" which is much closer to the Superset metric. Unfortunately the GraphQL API doesn't expose all the information we need about measures to convert them into Superset metrics. Namely, the models to which each measure belong are not available via the API.

Testing:

$ preset-cli --workspaces=https://b5eda41a.us1a.app.preset.io/ superset sync dbt-cloud dbtc_XXX 485502

Models and metrics are synced correctly to: https://b5eda41a.us1a.app.preset.io/tablemodelview/list/?pageIndex=0&sortColumn=changed_on_delta_humanized&sortOrder=desc

@@ -683,37 +723,46 @@ def get_models(self, job_id: int) -> List[ModelSchema]:
Fetch all available models.
"""
query = """
query ($jobId: Int!) {
models(jobId: $jobId) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being deprecated, so I converted to the recommended query.

return aliases


def convert_query_to_projection(sql: str, dialect: MFSQLEngine) -> str:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the PR, converting the SQL from MetricFlow into a projection. I've added a good number of unit tests covering all the scenarios I saw.

Copy link
Member

@craig-rueda craig-rueda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@betodealmeida betodealmeida merged commit 55321bd into main Jan 9, 2024
4 checks passed
@Vitor-Avila
Copy link
Contributor

hey @betodealmeida, thank you so much for working on this feature and also adding such extensive test coverage! Are these changes compatible with older versions of dbt Cloud (like 1.4 and 1.5)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants