New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
371 Optionally use DBT compiler as the TemplaterInterface #508
Conversation
@sqlfluff/sqlfluff-maintainers FYI, I'm closing #422 in favor of this PR. I think this approach is much simpler and cleaner than continuously trying to extend the Jinja templater. It also allows us to separate these 2 different use cases, of SQL templated with Jinja (maybe in an Airflow project), from the specific DBT use case Currently it's a POC, but I would be interested in hearing your first thoughts on the approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is shaping up really nicely 😎
Have you tested with an actual dbt
project yet, and see how it performs?
setup.py
Outdated
@@ -102,6 +102,7 @@ def read(*names, **kwargs): | |||
"appdirs", | |||
], | |||
extras_require={ | |||
"dbt": ["dbt>=0.18.1"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be curious to see how this works with earlier versions of dbt
. If someone hasn't upgraded to the latest version (because of breaking changes/whatnot) then they wouldn't be able to take advantage of this new sqlfluff
feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I tried on 0.17.2
and found that some methods I depend on (from the DBT package) had moved places.
Since the classes we use from the DBT project aren't an "official API", or something they expect users to call directly, we can't really expect them to be stable.
Should we agree on a subset of DBT versions to support, and have different tox environments for each DBT version?
We can start with 0.17 and 0.18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah didn't realize that the function moved on you 🚚
How tedious do you would it be to support multiple versions? If it's a pain, then maybe we should keep this requirement in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the code yesterday to support both versions, wasn't too much of a pain :)
test/fixtures/dbt_project/README.md
Outdated
@@ -0,0 +1,15 @@ | |||
Welcome to your new dbt project! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need this file right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove it :) @sethwoodworth also made a comment about slimming down the DBT project so I'll remove as much as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really, really good. Super clean template implementation. When this is in I'd love to hear your thoughts on how we can do some dbt related rules which rely on understanding the built-in dbt jinja macros (source, ref)
@dmateusp - this is really interesting! Thanks a load for prototyping this approach, I know it's been spoken about quite a bit, but I'm really impressed to see it in action. THIS COULD WORK. Echoing the comments of a few users: what's this like to actually use? Have you tried it on one of your dbt projects and has it coped with all of the weird macros and suchlike? On more specific feedback [assuming that it feels nice to actually use as a developer]:
Awesome work though, and I think the most important thing here is what the experience of actually using it is like. |
@alanmcruickshank dataclasses were released in python 3.7 but are available via this backport |
@@ -0,0 +1,37 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about paring this down to the minimum required dbt_profile.yml to test the project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally! I'll try to make it as slim as possible
src/sqlfluff/core/templaters.py
Outdated
) | ||
|
||
def _get_project_dir(self, config): | ||
return config.get_section((self.templater_selector, self.name, "project_dir")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about defaulting project_dir
to the cwd
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, that was an oversight on my end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/sqlfluff/core/templaters.py
Outdated
from dbt.config.profile import PROFILES_DIR | ||
|
||
return ( | ||
config.get_section((self.templater_selector, self.name, "profiles_dir")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind running expanduser
on the string result? I provided ~/.dbt
and would have expected it to expand to my home dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that's done and I added a test https://github.com/sqlfluff/sqlfluff/pull/508/files#diff-58329474e4097ccd65474c10c2b98b1221220cec75a03b2de4787d1f801d15d7R171
I love this approach!
And ended up with this error:
The issue looks like it's in the interface to the dbt |
@NiallRees about your comment around DBT rules for understanding Happy to take this discussion over to a separate issue |
Codecov Report
@@ Coverage Diff @@
## master #508 +/- ##
==========================================
- Coverage 94.01% 93.19% -0.82%
==========================================
Files 43 43
Lines 4875 4951 +76
==========================================
+ Hits 4583 4614 +31
- Misses 292 337 +45
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There's a thread going here would be glad for your thoughts |
@dmateusp - I've been thinking about plugin architectures (related to #164), and this PR came to mind. How would you feel about a collab where we work out how to knit these changes into sqlfluff as a plugin rather than directly into the main codebase? We could either merge these changes and then separately work on how to separate them out again, or I could put in place the hooks to attach plugins and then update this PR before merging. Do you have a preference? [or views on whether this would work as a plugin in the first place]. |
@alanmcruickshank I actually don't have much experience using the "plugin" approach in Python (apart from the Airflow approach to Operators/Hooks etc), I saw that you linked I am actually going to push some changes to this PR very soon, I had been working on adding testing and covering the issues I linked in the PR description. How do you want to collab on this? |
@alanmcruickshank If we go the plugin route with I feel like having an example plugin would also make implementing any future plugins more manageable, especially if the developer is new to |
89e324d
to
9902271
Compare
On reflection, let's get this merged, and then I can work on the plugin part separately. Doing both at the same time is going to be too hard I think. In particular I've just made a bunch of changes to the templating code to enable source mapping, which I think I should be able to layer on top of this, but might take some effort.
Yes probably - although it's possible that leads to some difficulties if we want other templaters (like this one) to depend on the Jinja one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is epic work. To mesh with #541 there's going to need to be a chunk of work, but it's going to much easier for me to do that post merge than do it on the fly.
Looking at how you fetch the injected_sql, I think it should be possible for me to layer all the other bits on the top.
I also would love to experiment with making this a plugin at some point, but I think that's also a job for another day too 👍 .
Thanks again for taking the time to work on this - I'm happy to merge it in it's current state. Feel free to convert out of draft when you're ready and we can work out merging.
@sethwoodworth Can you test your project again on the most recent changes on this branch? |
I've started testing it on my project and I've caught some issues that required some small changes, I should have the PR ready soon! (I'll convert the PR out of draft then) I've actually lost a bunch of time trying to ignore files that fail to compile in a project but @NiallRees reminded me that DBT needs to create a DAG of dependencies (hence the need to parse/compile nodes that aren't the one directly being compiled) I'd also like to add some documentation but I can do that in another PR if you're eager to merge it sooner |
57e2468
to
7de9049
Compare
def test__templater_dbt_handle_exceptions( | ||
in_dbt_project_dir, dbt_templater, fname, exception_msg # noqa | ||
): | ||
"""Test that exceptions during compilation are returned as violation.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌🙌🙌
2bf461d
to
a237da4
Compare
@dmateusp - I just tested these locally using
Where the details are: First
Second
Third
Fourth
It looks like a few of them are to do with
With that in place I get the same errors but at least in the tox setup I see:
So it's clearly trying to run dbt deps, and apparently up to date. However if I look in Any ideas on why it's failing to install the dbt dependencies? |
@alanmcruickshank That is so strange! I tried to run the same command, and I tried removing the dbt_modules first with Note that Can you try removing the directory first? |
I think at least part of the issue is the use of unix filepaths instead of os.path which accommodates both unix and windows? Could be wrong though. Non-windows filepaths appear in a few places in the tests. (dusted off my old windows laptop and reproduced the same errors Alan was seeing) |
@NiallRees @dmateusp : I've made some progress on this one - I think it's to do with windows and makefiles. Will report back with one that works cross platform... |
@dmateusp - ok progress. I found some useful blog posts on this topic (in particular this one: https://blog.ionelmc.ro/2015/04/14/tox-tricks-and-patterns/), which suggested using some of the more advanced features of commands =
# For the dbt test cases install dependencies.
dbt017,dbt018: dbt deps --project-dir test/fixtures/dbt_project
# Clean up from previous tests
python util.py clean-tests
# Run tests
pytest -vv --cov {posargs:-m "not dbt"} It means the we run the deps command directly from the tox file, but only for the With that change in place, I still get one failing test (details below). It looks like as @NiallRees suggested, that it's a path naming issue. One way of fixing this very simply would be to strip or replace all of the slash characters in the error. I managed to get the tests to pass if I change the last line of # NB: Replace slashes to deal with different plaform paths being returned.
assert violations[0].desc().replace('\\', '/').startswith(exception_msg) @dmateusp - if these changes still work on your end - then I think we should be able to get this merged shortly! Details of original failure:
|
Nice one. I'll fix those std_test.py merge conflicts so we can move forward (as it was me that caused them!) |
CONTRIBUTING.md
Outdated
**For Windows users**: the tox environment depends on `make` to set up the dbt test folders. | ||
To do that we recommend using _chocolatey_. You can find the instructions to install chocolately here: https://chocolatey.org/install. | ||
Once chocolatey is installed you can use `choco install make` to install `make`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove the dependency on make
via the comments I suggested in the main convo, then we can remove this.
docs/source/configuration.rst
Outdated
dbt is not the default templater for *sqlfluff* (it is Jinja). In order | ||
to get started using *sqlfluff* with a dbt project you will need the following | ||
configuration: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a brief line or paragraph here on the performance implications and how to choose the right templater? My assumption here is that the choice between the jinja
and dbt
templaters is that there's a tradeoff between macro and function support and performance and simplicity. I'd love to help people make a more informed choice about which templater to use and why.
test/Makefile
Outdated
fixtures/dbt_project/dbt_modules: | ||
ifeq (,$(shell which dbt)) | ||
$(info "dbt was not found, skipping..") | ||
else | ||
cd fixtures/dbt_project && dbt deps | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this through my suggested changes to tox.ini
.
def test__templater_dbt_missing(dbt_templater): # noqa | ||
"""Check that a nice error is returned when dbt module is missing.""" | ||
try: | ||
import dbt # noqa: F401 | ||
|
||
pytest.skip(msg="dbt is installed") | ||
except ModuleNotFoundError: | ||
pass | ||
|
||
with pytest.raises(ModuleNotFoundError, match=r"pip install sqlfluff\[dbt\]"): | ||
dbt_templater.process( | ||
in_str="", | ||
fname="models/my_new_project/test.sql", | ||
config=FluffConfig(configs=DBT_FLUFF_CONFIG), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice. I forgot to say so last time.
Just had a go - all the tests are still passing on my mac with these changes @alanmcruickshank. |
Background
We currently process templating for DBT projects the same way we template "plain" Jinja templated files (through Jinja templater).
However DBT projects are special because the DBT library itself extends the context of the Jinja rendering, and DBT uses a package system through which the scope of the templating can also be extended.
This leads to a couple of problems loading macros (check out the "Related Issues"), and leads us to extend the logic in the Jinja templater.
Proposal
I would like to add a new Templater specific to DBT.
If the users specify that the templater used should be
dbt
in their configs, they would need to have the DBT package installed (extra dependency). From there we can be sure that the templated SQL is the same, as we re-use the DBT compiler to compile the specific node.Related issues
extract_macros_from_config
afterextract_macros_from_path
instead of before #493 Problem with macro order using Jijnja Templater in DBT projectsNotes
Currently this is a POC, the logic needs to be refactored, but I wanted to show a working solution (this actually lints properly DBT models that couldn't be linted with the Jinja templater)