Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

update metrics spec #100

Merged
merged 17 commits into from
Aug 10, 2023

Conversation

Jstein77
Copy link
Contributor

@Jstein77 Jstein77 commented Jul 21, 2023

Please provide your name and company
Jordan Stein, dbt Labs

Link the issue/feature request which this PR is meant to address

#101

Detail what changes this PR introduces and how this addresses the issue/feature request linked above.
This PR updates the metrics defined in the project to the new spec supported in dbt-core 1.6. It also adds semantic models, which are a new resource in 1.6

How did you validate the changes introduced within this PR?
Confirmed that the semantic manifest successfully parsed, and mf valiate-configs succeeds up to warehouse validation. I would like help testing the actual metric values on sample data.
Which warehouse did you use to develop these changes?
Snowflake.

Did you update the CHANGELOG?

  • Yes

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

I'm not sure if this change would be considered a patch. Please let me know how to update the version.

Provide an emoji that best describes your current mood

馃拑

@Jstein77
Copy link
Contributor Author

@fivetran-joemarkiewicz would you be able to help review this PR?

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR @Jstein77! I have a few quick questions before approving this PR.

  • You call out that this new metric config is compatible with dbt-core 1.6.0. However, the dbt version requirements of the package were not adjusted. Currently the requirements are >= 1.3.0, < 2.0.0. Do you feel this range is still appropriate with the new config?
  • Should there be any adjustments to the README section on how to leverage the metrics? If I recall there were some changes with how users will interface with these metrics. Should we update the README documentation to inform users how to best utilize the metric updates?

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @Jstein77 just wanted to bump my previous question. Once this is addressed we should be able to move forward with taking steps to fold your changes into the next release.

You call out that this new metric config is compatible with dbt-core 1.6.0. However, the dbt version requirements of the package were not adjusted. Currently the requirements are >= 1.3.0, < 2.0.0. Do you feel this range is still appropriate with the new config?

@Jstein77
Copy link
Contributor Author

Jstein77 commented Aug 7, 2023

Hi @fivetran-joemarkiewicz sorry for the delay! We were sprinting to release the Semantic Layer beta. Addressing your two excellent questions:

  1. These configs will only work with dbt-core>=1.6. I've updated the version requirements to reflect that.
  2. I updated the readme with links to the quick start guide, as well as our documentation on how to query these metrics.

I ran into one issue when testing that I wanted to get your help with. Currently, MetricFlow requires users to create a model in their dbt project called metricflow_time_spine.sql that we use to construct cumulative metrics. Originally, I included the time spine model in this package so I could test parsing the project and generating a semantic manifest. However, we expect users to already have a metricflow_time_spine model defined in their project if they are using the semantic layer, and we'll hit a parsing error if there are two models called metricflow_time_spine.

My suggested approach was to remove the metricflow_time_spine.sql from this package, and have users configure it in their own project. However this will cause a dbt parse of this project to fail with the following error:

Parsing Error
  The semantic layer requires a 'metricflow_time_spine' model in the project, but none was found. Guidance on creating this model can be found on our docs site (https://docs.getdbt.com/docs/build/metricflow-time-spine) 

I'm not sure if that impacts your tests, or if this project needs to be able to parse successfully. How would you recommend we proceed?

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @Jstein77 thanks for making the README updates as well as the clarification around the dbt-core version requirements.

In regards to your question about the metricflow_time_spine.sql model and the challenges present with both the inclusion and exclusion from the package. I really do not like the idea of users experiencing a parsing error when using the package. Based on that, I would request we include this model to avoid this issue.

Further, to avoid the issue of users possibly having their own model configured I would recommend we add a config block at the top of the model that is tied to a new variable (possibly ad_reporting__using_metric_flow). This can be enabled by default, but we can add details in the README section to show users how to disable this model if they already have their own in their root project. This could then be highlighted in the CHANGELOG so users who upgrade to this next breaking version will be able to see this and ideally disable it right off the bat if they needed.

What are your thoughts to the above?

@fivetran-joemarkiewicz
Copy link
Contributor

fivetran-joemarkiewicz commented Aug 8, 2023

@Jstein77 additionally when reviewing I had an additional few questions I was hoping you could help me understand:

  • What is the difference between the metrics yaml file and the semantic_models yaml file?
    • My understanding is that metrics are defined in the metrics.yaml file and the semantic_models.yaml file then is able to reference these metrics. Let me know if I am misunderstanding the differences between the two.
  • I noticed you used yaml as opposed to yml. What is the benefit of using this file type. Or is there any difference?
  • (edit adding another question) I noticed that in the dbt docs generate command, there is a new semantic_manifest.json file. Would you be able to explain this file and if it is needed when hosting the dbt docs?

@Jstein77
Copy link
Contributor Author

Jstein77 commented Aug 8, 2023

@fivetran-joemarkiewicz I like that approach! I think most users already have a metricflow_time_spine model configured in their project, but they can then set the env variable to disabled. Note that we're planning to create this model automatically for the user so eventually this will no longer be an issue. I'll get a commit pushed with these changes.

What is the difference between the metrics yaml file and the semantic_models yaml file?
My understanding is that metrics are defined in the metrics.yaml file and the semantic_models.yaml file then is able to reference these metrics. Let me know if I am misunderstanding the differences between the two.

That's correct, semantic models encode information about measures, dimensions and entites from your dbt models in a way that metricflow understands, and you build metrics from measures defined in semantic models.

I noticed you used yaml as opposed to yml. What is the benefit of using this file type. Or is there any difference?
.yaml is just a vestige of how we defined configs at Transform. I think we're recommending folks use the .yml extensions so i will update. Functionally there is no difference.

(edit adding another question) I noticed that in the dbt docs generate command, there is a new semantic_manifest.json file. Would you be able to explain this file and if it is needed when hosting the dbt docs?

This isn't needed to host the docs site. We have docs on the semantic manifest here https://docs.getdbt.com/docs/dbt-cloud-apis/sl-manifest i can also add this to the readme.

@fivetran-joemarkiewicz
Copy link
Contributor

This all sounds great, thanks! One final question, would it be possible to add the metricflow_time_spine model into either the metrics or semantic_models folder? Just to avoid any confusion with the direct output models of the ad reporting package and those related to the metric flow features?

@Jstein77
Copy link
Contributor Author

Jstein77 commented Aug 8, 2023

@fivetran-joemarkiewicz updated!

  • Added back metricflow_time_spine.sql. I tested adding metricflow_time_spine: false to the project configs and confirmed that this model is no longer getting parsed. (see screenshot)
  • Updated the README with instructions on how to disable metricflow_time_spine
  • Updated the CHANGELOG to call out that metricflow_time_spine will need to be disabled if you already have one in your project
  • Updated file extension from .yaml --> .yml
  • Move time spine model to semantic_models folder
  • Add semantic manifest docs to the README

@fivetran-joemarkiewicz One question, I'm using the dbt_date package to generate the time spine model. I noticed it's already included in the package, but i was getting an error when running dbt run --models metricflow_time_spine.
Screenshot 2023-08-08 at 2 58 33 PM

I added the "dbt_date:time_zone": "America/Los_Angeles" to the project vars, but thought it was a bit strange that we don't already have this set. Is there something i'm missing, or do we just need to pass in the timezone var to use this package?

Testing setting metricflow_time_spine: False
Screenshot 2023-08-08 at 2 45 46 PM

@fivetran-joemarkiewicz
Copy link
Contributor

Fantastic, thanks so much @Jstein77! Your changes so far look great, there are a few minor updates I would like to apply (particularly around the version of the package, docs generation, and some documentation updates). Would you be comfortable if I made those changes directly to your branch? If not, I can merge your PR into a staging branch where I will make these changes before merging to main.

Additionally in response to your question:

I added the "dbt_date:time_zone": "America/Los_Angeles" to the project vars, but thought it was a bit strange that we don't already have this set. Is there something i'm missing, or do we just need to pass in the timezone var to use this package?

  • dbt_date is only installed as a result of it being an upstream dependency of dbt_expectations which is a dependency of our dbt_google_ads_source package. Therefore, we never used the date functions in didn't need to define the timezone. I feel it would be fine to include as a variable in this package. However, I would request we do not make it a global variable, but rather under the ad_reporting hierarchy and add some documentation to alert users to modify this variable if they wish to. I can make that update in my updates commit if you are comfortable.

If you are comfortable with me making updates directly to your branch, I would like to bump you for a review from a Semantic layer point of view before we merge this into our next update. Would that work for you?

@Jstein77
Copy link
Contributor Author

Jstein77 commented Aug 9, 2023

@fivetran-joemarkiewicz feel free to make changes directly to the branch!

@Jstein77
Copy link
Contributor Author

Jstein77 commented Aug 9, 2023

@fivetran-joemarkiewicz these changes look good to me! I'm good to merge this one on my end.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants