Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing of manifest.json as well as better yaml parsing support / alias support #19

Merged
merged 60 commits into from
Jun 30, 2021

Conversation

z3z1ma
Copy link
Contributor

@z3z1ma z3z1ma commented Jun 13, 2021

Picking up the ball where fernandobrito left off. I have a major enhancement (one of two in the works). Lots of cleaned up code, bug fixes from the other fork, personally tested on my own production replicates and internally at my company, etc. There may be some more changes/commits I add (to readme mostly) but I just wanted to post it to get your eyes and awareness. The other PR will involve auto documenting all exposures revealed in metabase through questions / dashboards and constructing a YML that allows us to sync a DAG with metabase fully including click throughs from dbt docs exposures directly to dashboards.

falador_wiz1 and others added 16 commits June 12, 2021 14:06
…ersion of dbt-metabase having improved yaml parsing. large formatting update to conform to black.
…special or semantic until ready to deprecate
…e properly used if found in metabase api response.
… without depends on/test_metadata dont throw
…xpected input format to be defined in readme. otherwise automatic resolution of target field using relation test will prepend target run schema which should be fine in 95% of use cases. cases outside that can use manifest.json parsing or set fk_ref in yml.
…d ensured support for schema agnostic fk targets (schema resolved from manifest.json)
…sync will now only fail hard if timeout is explicit, otherwise default behaviour if --sync is true is to attempt sync for 30 seconds and proceed with aligning what can be aligned successfully. more formatting and a few comments for clarity of intent. also added option to pass custom cert bundle to verify.
…usly with seemless function alongside primary artifact parser (manifest.json).
…f regex to permit catching last arg of either ref or source always being the target table. if pointing to an alias, we are collecting aliases during yml parsing to be passed to metabased client and translated to metabase table names as needed. this functionality should be unnoticed by the user but provide more resiliency as well as more user friendly outcome whilst still being very specific in our logging.
@fernandobrito
Copy link
Contributor

Hi @z3z1ma, great that you picked up the work of merging my branch, thanks! I was away for a month on vacation and now that I'm back there are so many other priorities in my company that I'm not sure when I would be able to come back to my contribution here. Let me know if you or @gouline have any questions on any of the code that I wrote.

@fernandobrito
Copy link
Contributor

fernandobrito commented Jun 13, 2021

On a side note, I'm very curious about your next PR where you mention "involve auto documenting all exposures revealed in metabase". Over the next 2 weeks, I will open-source an internal Python project from my company on doing exactly the same, but for Tableau dashboards. I use the Tableau Metadata API to extract custom SQL and dependencies from reports to tables in the database, then I read all dbt models and sources from the manifest.json and try to match Tableau dashboards to dbt models/sources. After that, I write those dependencies back in the manifest.json as exposures.

I'm naming the Python package "exposures-crawler" because I wanted at some point in the future to include Metabase and other clients as well. I will write here once the package is public, but I already wrote a tiny bit about it on the dbt Slack community: https://getdbt.slack.com/archives/C0VLNUUTZ/p1617012523148700. If according to the author of dbt-metabase, your exposures automation is out of scope, we could discuss teaming up together and merging our 2 projects.

constructing a YML that allows us to sync a DAG with metabase fully including click throughs from dbt docs exposures directly to dashboards.

That last part I didn't really understand.

Copy link
Owner

@gouline gouline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a first pass, I'll let you two discuss amongst yourselves any existing comments and rebase the fork properly so that diffs are based on the existing state of the master branch (and not weeks back).

Also, this marks a significant increase in code, so before it gets merged, we need unit tests. Happy to instrument execution in GitHub Actions myself, but the author of the code will be responsible for the actual test cases.

EDIT: Unit testing structure, linting and type checking now in place. Please rebase against master and make sure all of that passes for this pull request before writing tests.

README.rst Outdated Show resolved Hide resolved
@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jun 13, 2021

constructing a YML that allows us to sync a DAG with metabase fully including click throughs from dbt docs exposures directly to dashboards.

That last part I didn't really understand.

Hey @fernandobrito

To clarify, I mean the dbt docs can also serve as a index which links users directly to the exposures that have been parsed. The reverse essentially of what you did adding dbt doc url appending to metabase decription. There is actually a button (see below screencap) whose link is derived from the exposures yaml.

image

On a side note, I'm very curious about your next PR where you mention "involve auto documenting all exposures revealed in metabase". Over the next 2 weeks, I will open-source an internal Python project from my company on doing exactly the same, but for Tableau dashboards. I use the Tableau Metadata API to extract custom SQL and dependencies from reports to tables in the database, then I read all dbt models and sources from the manifest.json and try to match Tableau dashboards to dbt models/sources. After that, I write those dependencies back in the manifest.json as exposures.

Are you using sqlparse? I've used just regex myself to extract (table level) exposures from native sql questions in metabase with pretty good success actually. I keep the native sql use in metabase low opting instead for better models which allow users to answer questions using just the GUI and tools. In a previous go around there was a lot of native SQL I allowed myself and my users in metabase, but it is definitely not ideal and dbt has been a gamechanger there in how quickly I was able to simplify our extraction of value from the underlying data.

I'm naming the Python package "exposures-crawler" because I wanted at some point in the future to include Metabase and other clients as well. I will write here once the package is public, but I already wrote a tiny bit about it on the dbt Slack community: https://getdbt.slack.com/archives/C0VLNUUTZ/p1617012523148700. If according to the author of dbt-metabase, your exposures automation is out of scope, we could discuss teaming up together and merging our 2 projects.

I am curious on yours too. My implementation involves constructing a yaml (we can call it metabase-exposures.yml in root model dir for example) over manipulating manifest.json (though I will certainly read from the manifest). I am taking this approach because it doesn't involve continuously executing a script or an additional step in CI/CD every single time dbt is invoked and the manifest.json is reconstructed. Instead a yaml is very readable, introspectable, and all subsequent dbt runs will add exposures to manifest implicitly without a post-run dependency. It also lets us revision and commit the yaml to source control too for what that's worth. I'd guarantee there an intersect of our interests and implementations that would be beneficial though so keep me posted. The end result of both of our methods is (should be) a manifest/doc site with perfectly documented exposures.

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jun 27, 2021

@gouline

I think the ball is in your court now.

Note that I already address empty strings referred to by #23
We only pass descriptions in 2 places, exporting columns and exporting tables. For tables, a simple logical check since without a description, there is nothing to propagate. And for columns, a simple ternary statement that checks for empty string.
I think in the future, @fernandobrito approach would be slick:

It's a descriptor (https://realpython.com/python-descriptors/) to be used in class attributes that automatically convert empty strings to None.

but for now we are ok.

The other issue raised in #24 I think should be examined and integrated if it addresses the issue after this PR so we dont step on our toes. Once thats done we can integrate changes easily I think.

README.rst Outdated Show resolved Hide resolved
dbtmetabase/__init__.py Outdated Show resolved Hide resolved
dbtmetabase/__init__.py Outdated Show resolved Hide resolved
dbtmetabase/parsers/dbt_folder.py Outdated Show resolved Hide resolved
dbtmetabase/parsers/dbt_folder.py Outdated Show resolved Hide resolved
dbtmetabase/parsers/dbt_folder.py Outdated Show resolved Hide resolved
dbtmetabase/parsers/dbt_manifest.py Outdated Show resolved Hide resolved
dbtmetabase/parsers/dbt_manifest.py Outdated Show resolved Hide resolved
Copy link
Owner

@gouline gouline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just need you to resolve a merge conflict from a minor PR merged in the meantime and we're good to go.

@gouline
Copy link
Owner

gouline commented Jun 29, 2021

You renamed it in the wrong direction. Master has requirements-test.txt in the filename, setup and CI config.

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jun 29, 2021

You renamed it in the wrong direction. Master has requirements-test.txt in the filename, setup and CI config.

Thanks correcting now.

@gouline
Copy link
Owner

gouline commented Jun 29, 2021

Looks like you're still editing. Let me know when it's ready for review.

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jun 29, 2021

Its good to go @gouline,

Freshly ran this branch on my production instances of Metabase/dbt too with both manifest and yml parser. Super easy. Good stuff. We are fully integrated into CD flow whenever we push any changes to dbt repo, our mb is synchronized. Onto posting my exposure parser PR once this is merged.

…t of implicit schema handling provided by manifest.
@gouline
Copy link
Owner

gouline commented Jun 30, 2021

I see you pushed another commit after your response, but I'll assume that's it.

Thank you so much @z3z1ma and @fernandobrito for your efforts. I'm merging this and keeping it in master while we test it and merge other pull requests as they come along (there's one open now with a fix for an issue in master). Once that's all done, I will push a release to PyPI.

@fernandobrito
Copy link
Contributor

Amazing work!

I will also try to find some time at work tomorrow to run what you have now on master on our production dbt, as so far I have been running my fork on our CD pipeline.

@fernandobrito
Copy link
Contributor

fernandobrito commented Jul 6, 2021

I was able to swap my production setup from using my fork to using master and had no issues, with the only exception that on master either schema or schema_excludes is required. I wrote about it here: https://github.com/gouline/dbt-metabase/pull/28/files#diff-399311141fa635ad8842613b2e8d964a08594d589911c6d1c0d5dc7d39bc44e9L72

@gouline gouline added this to the v0.8.x milestone Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants