
371 Optionally use DBT compiler as the TemplaterInterface #508

Merged
merged 59 commits into from Nov 25, 2020


dmateusp
Contributor

dmateusp commented Oct 26, 2020

Background

We currently process templating for DBT projects the same way we template "plain" Jinja files (through the Jinja templater).
However, DBT projects are special: the DBT library itself extends the Jinja rendering context, and DBT's package system can extend the templating scope further.

This causes several problems loading macros (see "Related issues"), and keeps forcing us to extend the logic in the Jinja templater.

Proposal

I would like to add a new Templater specific to DBT.

If users specify in their config that the dbt templater should be used, they will need the DBT package installed (an extra dependency). From there we can be sure the templated SQL is the same as DBT's, because we re-use the DBT compiler to compile the specific node.
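A minimal sketch of the "optional extra" idea, assuming a simple name-to-class lookup driven by the `templater` setting. `get_templater`, `JinjaTemplaterStub` and `DbtTemplaterStub` are hypothetical names, not sqlfluff's actual classes; the point is that dbt is only imported when the dbt templater is selected, so users without the extra are unaffected.

```python
import importlib.util


class JinjaTemplaterStub:
    """Hypothetical placeholder for the existing Jinja-based templater."""

    name = "jinja"


class DbtTemplaterStub:
    """Hypothetical placeholder for a templater delegating to dbt's compiler."""

    name = "dbt"

    def __init__(self):
        # Import lazily: dbt is an optional extra, so only users who
        # select this templater need it installed.
        try:
            importlib.import_module("dbt")
        except ImportError as exc:
            raise RuntimeError(
                "templater = dbt requires the optional dbt dependency, "
                'e.g. pip install "sqlfluff[dbt]"'
            ) from exc


TEMPLATERS = {"jinja": JinjaTemplaterStub, "dbt": DbtTemplaterStub}


def get_templater(name: str):
    """Instantiate the templater named in the user's config."""
    try:
        cls = TEMPLATERS[name]
    except KeyError:
        raise ValueError(f"Unknown templater: {name!r}") from None
    return cls()
```

Keeping the import inside the dbt class means the vanilla install never touches dbt, and a missing extra produces an actionable message instead of a bare `ImportError`.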

Related issues

Notes

Currently this is a POC and the logic needs to be refactored, but I wanted to show a working solution (it correctly lints DBT models that couldn't be linted with the Jinja templater).

@dmateusp
Contributor Author

@sqlfluff/sqlfluff-maintainers FYI, I'm closing #422 in favor of this PR. I think this approach is much simpler and cleaner than continuously trying to extend the Jinja templater.

It also allows us to separate two different use cases: SQL templated with plain Jinja (for example in an Airflow project), and the specific DBT use case.

Currently it's a POC, but I would be interested in hearing your first thoughts on the approach.

Member

pwildenhain left a comment

I think this is shaping up really nicely 😎
Have you tested it with an actual dbt project yet, and seen how it performs?

setup.py Outdated
@@ -102,6 +102,7 @@ def read(*names, **kwargs):
"appdirs",
],
extras_require={
"dbt": ["dbt>=0.18.1"],
Member

Would be curious to see how this works with earlier versions of dbt. If someone hasn't upgraded to the latest version (because of breaking changes/whatnot) then they wouldn't be able to take advantage of this new sqlfluff feature.

Contributor Author

I agree, but I tried on 0.17.2 and found that some methods I depend on (from the DBT package) had moved places.

Since the classes we use from the DBT project aren't an "official API", or something they expect users to call directly, we can't really expect them to be stable.

Should we agree on a subset of DBT versions to support, and have different tox environments for each DBT version?

We can start with 0.17 and 0.18
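If support is pinned to specific minor releases, a small guard can flag unsupported installs early. This is a sketch under assumptions: `SUPPORTED_MINORS`, `parse_version` and `is_supported` are hypothetical, and real code would more likely use `packaging.version` than this hand-rolled regex.

```python
import re
from typing import Tuple

# Hypothetical support policy mirroring the proposed test matrix (0.17, 0.18).
SUPPORTED_MINORS = {(0, 17), (0, 18)}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_supported(version: str) -> bool:
    """True if the major.minor pair is one the test matrix covers."""
    major, minor, _ = parse_version(version)
    return (major, minor) in SUPPORTED_MINORS
```

Checking only `(major, minor)` matches the idea of one tox environment per minor release; patch releases are assumed compatible.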

Member

Ah didn't realize that the function moved on you 🚚

How tedious do you think it would be to support multiple versions? If it's a pain, then maybe we should keep this requirement in there.

Contributor Author

I changed the code yesterday to support both versions, wasn't too much of a pain :)
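One common way to cope with helpers that move between releases is to try each known location in turn. This is a generic sketch, not the code in this PR: `import_first` is a hypothetical helper, and since the dbt module paths involved aren't listed in this thread, it is exercised with standard-library modules instead.

```python
import importlib
from typing import Sequence


def import_first(name: str, module_paths: Sequence[str]):
    """Return attribute `name` from the first module in `module_paths` that has it.

    List the newest location first and older ones after it, so the code keeps
    working when a library moves a helper between releases.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{path}: no attribute {name!r}")
    raise ImportError(f"Could not find {name!r}; tried: " + "; ".join(errors))
```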

src/sqlfluff/core/templaters.py
@@ -0,0 +1,15 @@
Welcome to your new dbt project!
Member

We probably don't need this file right?

Contributor Author

I'll remove it :) @sethwoodworth also made a comment about slimming down the DBT project so I'll remove as much as possible

Member

NiallRees left a comment

This is looking really, really good. Super clean template implementation. When this is in I'd love to hear your thoughts on how we can do some dbt related rules which rely on understanding the built-in dbt jinja macros (source, ref)

@alanmcruickshank
Member

@dmateusp - this is really interesting! Thanks a load for prototyping this approach, I know it's been spoken about quite a bit, but I'm really impressed to see it in action.

THIS COULD WORK.

Echoing the comments of a few users: what's this like to actually use? Have you tried it on one of your dbt projects and has it coped with all of the weird macros and suchlike?

On more specific feedback [assuming that it feels nice to actually use as a developer]:

  • I love the approach of a different templater. Very neat - and I totally hadn't thought of that as an option. ✔️
  • I like the way it's installed as an extra - that keeps dependencies sensible. ✔️
  • I didn't think that dataclass was available until Python 3.7, so I'm surprised that this works, but if we can get the tests to pass then that's awesome. 🤷
  • On the test environments, I'd love to have some environments with the dbt extra and some without, to make sure we don't break the vanilla experience. I don't know if having all the python versions with and without dbt makes sense, but I'd like at least one python version to be tested with and without the extra (and for the test suite to behave nicely in both cases). 🔬
  • I think we need to slim down the example dbt project used in the test suite: unless something is necessary for the tests, we should make it as slim as possible. That includes README.md and any unnecessary settings in the dbt_project.yml file. ✂️
  • If it works, then we should write some docs on how to use it! My guess is that this was always part of the plan, but I think that's required before we merge it. 📜

Awesome work though, and I think the most important thing here is what the experience of actually using it is like.
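The request to test with and without the extra can be handled by skipping dbt-only tests when dbt isn't importable. The PR itself uses a `dbt` pytest marker selected with `-m`; this sketch swaps in the standard library's `unittest.skipUnless` so it runs without pytest. `has_module`, `HAS_DBT` and `DbtTemplaterTests` are hypothetical names.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


HAS_DBT = has_module("dbt")


@unittest.skipUnless(HAS_DBT, "dbt extra not installed")
class DbtTemplaterTests(unittest.TestCase):
    """Only runs in environments built with the dbt extra."""

    def test_dbt_importable(self):
        import dbt  # noqa: F401
```

In an environment without the extra, the class is reported as skipped rather than failed, so the vanilla suite still passes cleanly.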

@sethwoodworth
Contributor

@alanmcruickshank dataclasses were added in Python 3.7, but are available for Python 3.6 via the dataclasses backport package.

@@ -0,0 +1,37 @@

Contributor

What do you think about paring this down to the minimum required dbt_profile.yml to test the project?

Contributor Author

Totally! I'll try to make it as slim as possible

)

    def _get_project_dir(self, config):
        return config.get_section((self.templater_selector, self.name, "project_dir"))
Contributor

What do you think about defaulting project_dir to the cwd?

Contributor Author

Makes sense, that was an oversight on my end.
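A sketch of the suggested fallback, assuming the config section can be read as a plain mapping (`get_project_dir` here is hypothetical; the PR reads the value through `config.get_section(...)`). An unset or empty `project_dir` falls back to the current working directory.

```python
import os
from typing import Mapping, Optional


def get_project_dir(section: Mapping[str, str]) -> str:
    """Return the dbt project dir, defaulting to the current working directory."""
    configured: Optional[str] = section.get("project_dir")
    # abspath makes the result independent of any later os.chdir() calls.
    return os.path.abspath(configured or os.getcwd())
```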

Contributor Author

from dbt.config.profile import PROFILES_DIR

return (
config.get_section((self.templater_selector, self.name, "profiles_dir"))
Contributor

Would you mind running expanduser on the string result? I provided ~/.dbt and would have expected it to expand to my home dir.
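The requested fix amounts to passing the configured value through `os.path.expanduser`. A minimal sketch, assuming a mapping-like config section: `get_profiles_dir` is hypothetical, and `DEFAULT_PROFILES_DIR` stands in for the `PROFILES_DIR` default imported from dbt in the snippet above (`~/.dbt` is dbt's usual location).

```python
import os
from typing import Mapping

# Stand-in for dbt's own default; the PR imports PROFILES_DIR from dbt instead.
DEFAULT_PROFILES_DIR = os.path.join("~", ".dbt")


def get_profiles_dir(section: Mapping[str, str]) -> str:
    """Return an absolute profiles dir with any leading '~' expanded."""
    raw = section.get("profiles_dir") or DEFAULT_PROFILES_DIR
    # expanduser replaces a leading "~" with the user's home directory;
    # paths without "~" pass through unchanged before abspath normalises them.
    return os.path.abspath(os.path.expanduser(raw))
```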

Contributor Author

@sethwoodworth
Contributor

I love this approach!
I built a wheel from your branch and am trying it out on a real-world project.
(strangely, the api directory didn't end up in the wheel?)
I configured sqlfluff with a .sqlfluff file in the root of my dbt project:


[sqlfluff:templater:dbt]
project_dir = ./
profile = snowflake_user

And ended up with this error:

$ sqlfluff lint models/dealerquotes/dealerquotes_user_session_stats/dealerquotes_merged.sql
Traceback (most recent call last):
 File "/home/seth/.local/bin/sqlfluff", line 8, in <module>
   sys.exit(cli())
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/click/core.py", line 829, in __call__
   return self.main(*args, **kwargs)
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/click/core.py", line 782, in main
   rv = self.invoke(ctx)
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
   return _process_result(sub_ctx.command.invoke(sub_ctx))
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
   return ctx.invoke(self.callback, **ctx.params)
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/click/core.py", line 610, in invoke
   return callback(*args, **kwargs)
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/sqlfluff/cli/commands.py", line 315, in lint
   result = lnt.lint_paths(paths, ignore_non_existent_files=False)
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/sqlfluff/core/linter.py", line 969, in lint_paths
   self.lint_path(
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/sqlfluff/core/linter.py", line 952, in lint_path
   self.lint_string(
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/sqlfluff/core/linter.py", line 754, in lint_string
   parsed, vs, time_dict = self.parse_string(s=s, fname=fname, config=config)
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/sqlfluff/core/linter.py", line 646, in parse_string
   s, templater_violations = self.templater.process(
 File "/home/seth/.local/pipx/venvs/sqlfluff/lib/python3.8/site-packages/sqlfluff/core/templaters.py", line 524, in process
   raise RuntimeError("File %s was not found in dbt project" % fname)
RuntimeError: File models/dealerquotes/dealerquotes_user_session_stats/dealerquotes_merged.sql was not found in dbt project

The issue looks like it's in the interface to the dbt PathSelectorMethod but I haven't dug deep enough to understand why yet

@dmateusp
Contributor Author

@NiallRees about your comment around DBT rules for understanding ref, source etc, my current understanding is that we would need to extend our parser to understand SQL files "pre-templating".
Then I would guess rules would have to specify if they apply to the SQL file pre or post templating.

Happy to take this discussion over to a separate issue

@codecov

codecov bot commented Oct 29, 2020

Codecov Report

Merging #508 (57e2468) into master (2c79f05) will decrease coverage by 0.81%.
The diff coverage is 43.75%.


@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
- Coverage   94.01%   93.19%   -0.82%     
==========================================
  Files          43       43              
  Lines        4875     4951      +76     
==========================================
+ Hits         4583     4614      +31     
- Misses        292      337      +45     
Flag Coverage Δ
dbt017-py39 92.93% <37.50%> (?)
dbt018-py39 92.93% <37.50%> (?)
py36 93.07% <42.30%> (-0.84%) ⬇️
py37 93.07% <42.30%> (-0.84%) ⬇️
py38 93.03% <43.75%> (-0.82%) ⬇️
py39 93.03% <43.75%> (-0.82%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/sqlfluff/core/linter.py 88.04% <ø> (ø)
src/sqlfluff/core/templaters.py 77.33% <43.75%> (-18.64%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e26a042...1327d88.

@NiallRees
Member

Replying to @dmateusp's point above about extending the parser to understand SQL files pre-templating:

There's a thread going here; I'd be glad to hear your thoughts.

@alanmcruickshank
Member

@dmateusp - I've been thinking about plugin architectures (related to #164), and this PR came to mind. How would you feel about a collab where we work out how to knit these changes into sqlfluff as a plugin rather than directly into the main codebase?

We could either merge these changes and then separately work on how to separate them out again, or I could put in place the hooks to attach plugins and then update this PR before merging. Do you have a preference? [or views on whether this would work as a plugin in the first place].

pwildenhain added this to In progress in Major Release: 0.4.0 Nov 9, 2020
@dmateusp
Contributor Author

@alanmcruickshank I actually don't have much experience using the "plugin" approach in Python (apart from the Airflow approach to Operators/Hooks etc), I saw that you linked pluggy which I'll look into.

I am actually going to push some changes to this PR very soon, I had been working on adding testing and covering the issues I linked in the PR description.

How do you want to collab on this?

@pwildenhain
Member

pwildenhain commented Nov 10, 2020

@alanmcruickshank If we go the plugin route with dbt should we do the same with jinja2? Then any new templating libraries that come along are just a plugin away from being sqlfluff-able

I feel like having an example plugin would also make implementing any future plugins more manageable, especially if the developer is new to pluggy

@alanmcruickshank
Member

Replying to @dmateusp on how to collaborate:

On reflection, let's get this merged, and then I can work on the plugin part separately. Doing both at the same time is going to be too hard I think. In particular I've just made a bunch of changes to the templating code to enable source mapping, which I think I should be able to layer on top of this, but might take some effort.

Replying to @pwildenhain on whether jinja2 should also become a plugin:

Yes probably - although it's possible that leads to some difficulties if we want other templaters (like this one) to depend on the Jinja one.
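To make the plugin idea concrete, here is a standard-library-only sketch of hook-style registration. It is not pluggy's API (pluggy provides hook specifications, hook implementations and a `PluginManager` for this); `TemplaterRegistry`, `JinjaPlugin`, `DbtPlugin` and the `get_templaters` hook name are all hypothetical. The `DbtTemplater(JinjaTemplater)` subclass illustrates the kind of cross-plugin dependency mentioned above.

```python
from typing import Dict, List


class JinjaTemplater:
    name = "jinja"


class DbtTemplater(JinjaTemplater):
    # Builds on the Jinja templater, so a dbt plugin would depend on it.
    name = "dbt"


class JinjaPlugin:
    @staticmethod
    def get_templaters() -> List[type]:
        return [JinjaTemplater]


class DbtPlugin:
    @staticmethod
    def get_templaters() -> List[type]:
        return [DbtTemplater]


class TemplaterRegistry:
    """Collects templater classes from registered plugin objects."""

    def __init__(self) -> None:
        self._plugins: List[object] = []

    def register(self, plugin: object) -> None:
        self._plugins.append(plugin)

    def templaters(self) -> Dict[str, type]:
        """Call each plugin's `get_templaters` hook, keyed by templater name."""
        found: Dict[str, type] = {}
        for plugin in self._plugins:
            hook = getattr(plugin, "get_templaters", None)
            if hook is None:  # plugins may implement only some hooks
                continue
            for cls in hook():
                found[cls.name] = cls
        return found
```

With this shape, adding a new templating library means shipping one more plugin object rather than editing the core lookup table.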

Member

alanmcruickshank left a comment

This is epic work. Meshing it with #541 is going to need a chunk of work, but it's going to be much easier for me to do that post-merge than on the fly.

Looking at how you fetch the injected_sql, I think it should be possible for me to layer all the other bits on the top.

I also would love to experiment with making this a plugin at some point, but I think that's also a job for another day too 👍 .

Thanks again for taking the time to work on this. I'm happy to merge it in its current state. Feel free to convert it out of draft when you're ready and we can work out merging.

@pwildenhain
Member

@sethwoodworth Can you test your project again on the most recent changes on this branch?

@dmateusp
Contributor Author

I've started testing it on my project and I've caught some issues that required some small changes, I should have the PR ready soon! (I'll convert the PR out of draft then)

I actually lost a bunch of time trying to ignore files in a project that fail to compile, but @NiallRees reminded me that DBT needs to build a DAG of dependencies (hence the need to parse/compile nodes other than the one directly being compiled).
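Why one broken file matters becomes clearer from how a dependency graph is built: every model must be read to discover its `ref(...)` edges before any compile order exists. This is a deliberately simplified model of that step, not dbt's implementation; the regex only recognises literal `{{ ref('name') }}` calls, and `build_dag` / `compile_order` are hypothetical. It orders models with Kahn's algorithm and fails for the whole project if any model refers to something missing.

```python
import re
from collections import deque
from typing import Dict, List

# Matches {{ ref('model') }} or {{ ref("model") }} only; a simplification.
REF_RE = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def build_dag(models: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each model name to the models its SQL references."""
    return {name: REF_RE.findall(sql) for name, sql in models.items()}


def compile_order(dag: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: return models with dependencies before dependents."""
    indegree = {name: 0 for name in dag}
    dependents: Dict[str, List[str]] = {name: [] for name in dag}
    for name, deps in dag.items():
        for dep in deps:
            if dep not in dag:
                raise KeyError(f"{name} refs unknown model {dep!r}")
            indegree[name] += 1
            dependents[dep].append(name)
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in dependents[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(dag):
        raise ValueError("cycle detected between models")
    return order
```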

I'd also like to add some documentation but I can do that in another PR if you're eager to merge it sooner

dmateusp force-pushed the 371_optionally_use_dbt_compiler branch from 57e2468 to 7de9049 November 14, 2020 23:45
def test__templater_dbt_handle_exceptions(
    in_dbt_project_dir, dbt_templater, fname, exception_msg  # noqa
):
    """Test that exceptions during compilation are returned as violation."""
Member

🙌🙌🙌
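The behaviour this test checks (compilation failures come back as violations rather than propagating) can be sketched as follows. `process`, `TemplaterViolation` and the `compile_fn` callback are hypothetical stand-ins for the templater method, sqlfluff's `SQLTemplaterError` and dbt's compiler; the message prefix mirrors the one asserted in the test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TemplaterViolation:
    """Hypothetical stand-in for sqlfluff's SQLTemplaterError."""

    message: str

    def desc(self) -> str:
        return self.message


def process(
    fname: str, compile_fn: Callable[[str], str]
) -> Tuple[Optional[str], List[TemplaterViolation]]:
    """Compile `fname`; on failure return no output plus one violation."""
    try:
        return compile_fn(fname), []
    except Exception as exc:  # compilers can raise many exception types
        message = f"dbt compilation error on file '{fname}', {exc}"
        return None, [TemplaterViolation(message)]
```

Because the output is `None` on failure, callers must check the violations before using it; concatenating onto `None` is exactly the `TypeError` in the first test failure reported later in this thread.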

dmateusp marked this pull request as ready for review November 15, 2020 12:41
@alanmcruickshank
Member

@dmateusp - I just tested these locally using tox -r -e dbt017-py38 -- -m "dbt" and I get four failures:

========================================================================================================== short test summary info ===========================================================================================================
FAILED test/core/templaters_test.py::test__templater_dbt_templating_result[use_dbt_utils.sql] - TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
FAILED test/core/templaters_test.py::test__templater_dbt_handle_exceptions[compiler_error.sql-dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor']
FAILED test/core/templaters_test.py::test__templater_dbt_handle_exceptions[exception_connect_database.sql-dbt tried to connect to the database] - assert False
FAILED test/core/rules/std_test.py::test__rules__std_file_dbt[L021-models/my_new_project/select_distinct_group_by.sql-violations0] - sqlfluff.core.errors.SQLTemplaterError: dbt compilation error on file 'models\my_new_project\select_dis...

Where the details are:

First
__________________________________________________________________________________________ test__templater_dbt_templating_result[use_dbt_utils.sql] __________________________________________________________________________________________

in_dbt_project_dir = None, dbt_templater = <sqlfluff.core.templaters.DbtTemplateInterface object at 0x051C0B68>, fname = 'use_dbt_utils.sql'

    @pytest.mark.parametrize(
        "fname",
        [
            # dbt_utils
            "use_dbt_utils.sql",
            # macro calling another macro
            "macro_in_macro.sql",
            # config.get(...)
            "use_headers.sql",
            # var(...)
            "use_var.sql",
        ],
    )
    @pytest.mark.dbt
    def test__templater_dbt_templating_result(
        in_dbt_project_dir, dbt_templater, fname  # noqa
    ):
        """Test that input sql file gets templated into output sql file."""
        outstr, _ = dbt_templater.process(
            in_str="",
            fname="models/my_new_project/" + fname,
            config=FluffConfig(configs=DBT_FLUFF_CONFIG),
        )
        # the dbt compiler gets rid of new lines
>       assert outstr + "\n" == open("../dbt/" + fname).read()
E       TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

C:\Users\alan\dev3\sqlfluff\test\core\templaters_test.py:174: TypeError
Second
_____________ test__templater_dbt_handle_exceptions[compiler_error.sql-dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'] _____________

in_dbt_project_dir = None, dbt_templater = <sqlfluff.core.templaters.DbtTemplateInterface object at 0x06D98EB0>, fname = 'compiler_error.sql'
exception_msg = "dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'"

    @pytest.mark.parametrize(
        "fname,exception_msg",
        [
            ("compiler_error.sql", "dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'"),
            ("exception_connect_database.sql", "dbt tried to connect to the database"),
        ]
    )
    @pytest.mark.dbt
    def test__templater_dbt_handle_exceptions(
        in_dbt_project_dir, dbt_templater, fname, exception_msg  # noqa
    ):
        """Test that exceptions during compilation are returned as violation."""
        from dbt.adapters.factory import get_adapter

        src_fpath = "../dbt/error_models/" + fname
        target_fpath = "models/my_new_project/" + fname
        # We move the file that throws an error in and out of the project directory
        # as dbt throws an error if a node fails to parse while computing the DAG
        os.rename(src_fpath, target_fpath)
        try:
            _, violations = dbt_templater.process(
                in_str="",
                fname=target_fpath,
                config=FluffConfig(configs=DBT_FLUFF_CONFIG),
            )
        finally:
            get_adapter(dbt_templater.dbt_config).connections.release()
            os.rename(target_fpath, src_fpath)
        assert violations
>       assert violations[0].desc().startswith(exception_msg)
E       assert False
E        +  where False = <built-in method startswith of str object at 0x06D95240>("dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'") E        +    where <built-in method startswith of str object at 0x06D95240> = "dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', Unexpected end of template. Jinja was loo...the following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}".startswith
E        +      where "dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', Unexpected end of template. Jinja was loo...the following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}" = <bound method SQLBaseError.desc of SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\compiler_...e following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}")>()
E        +        where <bound method SQLBaseError.desc of SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\compiler_...e following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}")> = SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', Unexpected end of templ...he following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}").desc

C:\Users\alan\dev3\sqlfluff\test\core\templaters_test.py:221: AssertionError
Third
_________________________________________________________________ test__templater_dbt_handle_exceptions[exception_connect_database.sql-dbt tried to connect to the database] _________________________________________________________________

in_dbt_project_dir = None, dbt_templater = <sqlfluff.core.templaters.DbtTemplateInterface object at 0x06DA1358>, fname = 'exception_connect_database.sql', exception_msg = 'dbt tried to connect to the database'

    @pytest.mark.parametrize(
        "fname,exception_msg",
        [
            ("compiler_error.sql", "dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'"),
            ("exception_connect_database.sql", "dbt tried to connect to the database"),
        ]
    )
    @pytest.mark.dbt
    def test__templater_dbt_handle_exceptions(
        in_dbt_project_dir, dbt_templater, fname, exception_msg  # noqa
    ):
        """Test that exceptions during compilation are returned as violation."""
        from dbt.adapters.factory import get_adapter

        src_fpath = "../dbt/error_models/" + fname
        target_fpath = "models/my_new_project/" + fname
        # We move the file that throws an error in and out of the project directory
        # as dbt throws an error if a node fails to parse while computing the DAG
        os.rename(src_fpath, target_fpath)
        try:
            _, violations = dbt_templater.process(
                in_str="",
                fname=target_fpath,
                config=FluffConfig(configs=DBT_FLUFF_CONFIG),
            )
        finally:
            get_adapter(dbt_templater.dbt_config).connections.release()
            os.rename(target_fpath, src_fpath)
        assert violations
>       assert violations[0].desc().startswith(exception_msg)
E       assert False
E        +  where False = <built-in method startswith of str object at 0x06C345F8>('dbt tried to connect to the database')
E        +    where <built-in method startswith of str object at 0x06C345F8> = "dbt compilation error on file 'models\\my_new_project\\exception_connect_database.sql', 'dbt_utils' is undefined".startswith
E        +      where "dbt compilation error on file 'models\\my_new_project\\exception_connect_database.sql', 'dbt_utils' is undefined" = <bound method SQLBaseError.desc of SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\exception_connect_database.sql', 'dbt_utils' is undefined")>()
E        +        where <bound method SQLBaseError.desc of SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\exception_connect_database.sql', 'dbt_utils' is undefined")> = SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\exception_connect_database.sql', 'dbt_utils' is undefined").desc

C:\Users\alan\dev3\sqlfluff\test\core\templaters_test.py:221: AssertionError
Fourth
_______________________________________________________________________ test__rules__std_file_dbt[L021-models/my_new_project/select_distinct_group_by.sql-violations0] _______________________________________________________________________

rule = 'L021', path = 'models/my_new_project/select_distinct_group_by.sql', violations = [(1, 8)], in_dbt_project_dir = None

    @pytest.mark.dbt
    @pytest.mark.parametrize(
        "rule,path,violations",
        [
            # Group By
            ("L021", "models/my_new_project/select_distinct_group_by.sql", [(1, 8)]),
        ]
    )
    def test__rules__std_file_dbt(rule, path, violations, in_dbt_project_dir):  # noqa
        """Test the linter finds the given errors in (and only in) the right places (DBT)."""
>       assert_rule_raises_violations_in_file(
            rule=rule,
            fpath=path,
            violations=violations,
            fluff_config=FluffConfig(configs=DBT_FLUFF_CONFIG, overrides=dict(rules=rule)),
        )

C:\Users\alan\dev3\sqlfluff\test\core\rules\std_test.py:848:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Users\alan\dev3\sqlfluff\test\core\rules\std_test.py:59: in assert_rule_raises_violations_in_file
    assert set(lnt.check_tuples()) == {(rule, v[0], v[1]) for v in violations}
c:\users\alan\dev3\sqlfluff\src\sqlfluff\core\linter.py:414: in check_tuples
    tuple_buffer += file.check_tuples()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = LintedFile(path='models\\my_new_project\\select_distinct_group_by.sql', violations=[SQLTemplaterError("dbt compilation...sk=('select distinct\n    a,\n    b,\n    c\nfrom table_a\n{{ dbt_utils.group_by(3) }}\n', None, None), ignore_mask=[])

    def check_tuples(self):
        """Make a list of check_tuples.

        This assumes that all the violations found are
        linting violations (and therefore implement `check_tuple()`).
        If they don't then this function raises that error.
        """
        vs = []
        for v in self.get_violations():
            if hasattr(v, "check_tuple"):
                vs.append(v.check_tuple())
            else:
>               raise v
E               sqlfluff.core.errors.SQLTemplaterError: dbt compilation error on file 'models\my_new_project\select_distinct_group_by.sql', 'dbt_utils' is undefined

c:\users\alan\dev3\sqlfluff\src\sqlfluff\core\linter.py:50: SQLTemplaterError

It looks like a few of them are to do with dbt_utils being missing. I've taken a look at the Make script and it's currently not working on my machine, but if I change it to the following it appears to work.

fixtures/dbt_project/dbt_modules:
ifeq (,$(WHERE /q dbt))
	$(echo "dbt was not found, skipping..")
else
	cd fixtures/dbt_project && dbt deps
endif

With that in place I get the same errors but at least in the tox setup I see:

dbt017-py38 run-test-pre: PYTHONHASHSEED='762'
dbt017-py38 run-test: commands[0] | make -C test/ fixtures/dbt_project/dbt_modules
make: Entering directory 'C:/Users/alan/dev3/sqlfluff/test'
make: 'fixtures/dbt_project/dbt_modules' is up to date.
make: Leaving directory 'C:/Users/alan/dev3/sqlfluff/test'
dbt017-py38 run-test: commands[1] | python util.py clean-tests
Removed '.test-reports'...
Created '.test-reports'
dbt017-py38 run-test: commands[2] | pytest -vv --cov -m dbt

So it's clearly trying to run dbt deps, and apparently up to date.

However if I look in fixtures/dbt_project there is no dbt_modules folder. 😢

Any ideas on why it's failing to install the dbt dependencies?

@dmateusp
Contributor Author

@alanmcruickshank That is so strange! I tried to run the same command, and I tried removing the dbt_modules first with rm -rf test/fixtures/dbt_project/dbt_modules, in both cases the tests pass

Note that make: 'fixtures/dbt_project/dbt_modules' is up to date. indicates that the Makefile found a directory fixtures/dbt_project/dbt_modules so its "target" is assumed to be up to date. This is not DBT saying "my dbt_utils package is up to date"

Can you try removing the directory first?
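This explanation follows from make's rule for file targets. Here is a rough Python model of that rule (`make_would_rebuild` is hypothetical and ignores phony targets and missing-prerequisite errors): a target is rebuilt only if it is missing or older than one of its prerequisites, so a target with no prerequisites, like `fixtures/dbt_project/dbt_modules` in the Makefile above, counts as up to date as soon as the path exists, even as an empty directory.

```python
import os
from typing import Sequence


def make_would_rebuild(target: str, prerequisites: Sequence[str] = ()) -> bool:
    """Rough model of make's up-to-date check for a file target.

    Rebuild if the target path is missing, or if any prerequisite is newer.
    Phony targets and missing-prerequisite errors are ignored here.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```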

dmateusp added the dbt (Related to Data Build Tool) label Nov 22, 2020
@NiallRees
Member

NiallRees commented Nov 22, 2020

Replying to @dmateusp's suggestion above to remove the dbt_modules directory first:

I think at least part of the issue is the use of Unix file paths instead of os.path, which accommodates both Unix and Windows? Could be wrong though.

https://github.com/sqlfluff/sqlfluff/pull/508/files#diff-58329474e4097ccd65474c10c2b98b1221220cec75a03b2de4787d1f801d15d7R170

Non-Windows file paths appear in a few places in the tests.

(I dusted off my old Windows laptop and reproduced the same errors Alan was seeing.)
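A minimal sketch of the portable alternative, assuming the tests build paths of the form models/my_new_project/<fname>. os.path.join inserts the platform's separator, while pathlib's as_posix() gives a forward-slash form that is identical on every platform (useful when comparing against expected messages). Both helper names are hypothetical, not part of the PR.

```python
import os
import pathlib


def model_path(fname):
    """Build the path to a model file using the platform's separator."""
    return os.path.join("models", "my_new_project", fname)


def model_path_posix(fname):
    """Same path, always rendered with forward slashes."""
    return pathlib.PurePath("models", "my_new_project", fname).as_posix()


print(model_path("compiler_error.sql"))
# POSIX:   models/my_new_project/compiler_error.sql
# Windows: models\my_new_project\compiler_error.sql
print(model_path_posix("compiler_error.sql"))
# models/my_new_project/compiler_error.sql on every platform
```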

@alanmcruickshank
Member

@NiallRees @dmateusp : I've made some progress on this one - I think it's to do with Windows and Makefiles. Will report back with a version that works cross-platform...

@alanmcruickshank
Member

@dmateusp - ok, progress. I found some useful blog posts on this topic (in particular this one: https://blog.ionelmc.ro/2015/04/14/tox-tricks-and-patterns/), which suggested using some of tox's more advanced features to handle this case rather than Make. With that in mind, I managed to get most of the tests to pass by changing the commands section of the tox.ini file to:

commands =
    # For the dbt test cases install dependencies.
    dbt017,dbt018: dbt deps --project-dir test/fixtures/dbt_project
    # Clean up from previous tests
    python util.py clean-tests
    # Run tests
    pytest -vv --cov {posargs:-m "not dbt"}

This means that we run the deps command directly from the tox file, but only for the dbt017 and dbt018 environments. It also removes the dependency on make and reduces the number of moving parts involved in running the different test suites. A further bonus is that if the deps step fails because dbt hasn't been installed properly, the failure is raised explicitly rather than silently ignored.
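For comparison, the same fail-loudly behaviour written as a plain Python helper (for example, called from a pytest conftest.py) might look like the sketch below. It is hypothetical and not what the PR does; the dbt deps --project-dir invocation is taken from the tox.ini snippet, and the helper names are made up.

```python
import shutil
import subprocess


def dbt_deps_command(project_dir, dbt_executable="dbt"):
    """Build the same command the tox.ini snippet runs for dbt environments."""
    return [dbt_executable, "deps", "--project-dir", project_dir]


def install_dbt_deps(project_dir):
    """Install dbt packages, failing loudly instead of silently skipping.

    Hypothetical helper: raises if dbt is not on PATH, and
    subprocess.run(check=True) raises CalledProcessError if dbt exits
    with a non-zero status.
    """
    dbt_executable = shutil.which("dbt")
    if dbt_executable is None:
        raise RuntimeError("dbt executable not found on PATH; is dbt installed?")
    subprocess.run(dbt_deps_command(project_dir, dbt_executable), check=True)


# Usage (requires dbt to be installed):
# install_dbt_deps("test/fixtures/dbt_project")
```

Unlike the Makefile rule, which only prints "dbt was not found, skipping.." when dbt is missing, this version stops the run.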

With that change in place, I still get one failing test (details below). As @NiallRees suggested, it looks like a path naming issue. A very simple fix would be to replace all of the backslash characters in the error message with forward slashes. I managed to get the tests to pass by changing the last line of test__templater_dbt_handle_exceptions to:

    # NB: Replace slashes to deal with different platform paths being returned.
    assert violations[0].desc().replace('\\', '/').startswith(exception_msg)
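The same idea as a small reusable helper (hypothetical name), checked against a Windows-style message shaped like the one dbt returned in this case:

```python
def normalize_slashes(text):
    """Replace Windows path separators so messages compare equally across platforms.

    Note: this rewrites every backslash in the string, not only those in
    paths. That is fine for these error messages but not in general.
    """
    return text.replace("\\", "/")


windows_msg = (
    "dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', "
    "Unexpected end of template."
)
expected_prefix = (
    "dbt compilation error on file 'models/my_new_project/compiler_error.sql'"
)
print(normalize_slashes(windows_msg).startswith(expected_prefix))  # True
```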

@dmateusp - if these changes still work on your end - then I think we should be able to get this merged shortly!

Details of original failure:

================================================================================================================== FAILURES ==================================================================================================================
_____________ test__templater_dbt_handle_exceptions[compiler_error.sql-dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'] _____________

in_dbt_project_dir = None, dbt_templater = <sqlfluff.core.templaters.DbtTemplateInterface object at 0x06BC6910>, fname = 'compiler_error.sql'
exception_msg = "dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'"

    @pytest.mark.parametrize(
        "fname,exception_msg",
        [
            ("compiler_error.sql", "dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'"),
            ("exception_connect_database.sql", "dbt tried to connect to the database"),
        ]
    )
    @pytest.mark.dbt
    def test__templater_dbt_handle_exceptions(
        in_dbt_project_dir, dbt_templater, fname, exception_msg  # noqa
    ):
        """Test that exceptions during compilation are returned as violation."""
        from dbt.adapters.factory import get_adapter

        src_fpath = "../dbt/error_models/" + fname
        target_fpath = "models/my_new_project/" + fname
        # We move the file that throws an error in and out of the project directory
        # as dbt throws an error if a node fails to parse while computing the DAG
        os.rename(src_fpath, target_fpath)
        try:
            _, violations = dbt_templater.process(
                in_str="",
                fname=target_fpath,
                config=FluffConfig(configs=DBT_FLUFF_CONFIG),
            )
        finally:
            get_adapter(dbt_templater.dbt_config).connections.release()
            os.rename(target_fpath, src_fpath)
        assert violations
>       assert violations[0].desc().startswith(exception_msg)
E       assert False
E        +  where False = <built-in method startswith of str object at 0x069558A0>("dbt compilation error on file 'models/my_new_project/compiler_error.sql', Unexpected end of template. Jinja was looking for the following tags: 'endfor'")
E        +    where <built-in method startswith of str object at 0x069558A0> = "dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', Unexpected end of template. Jinja was loo...the following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}".startswith
E        +      where "dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', Unexpected end of template. Jinja was loo...the following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}" = <bound method SQLBaseError.desc of SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\compiler_...e following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}")>()
E        +        where <bound method SQLBaseError.desc of SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\compiler_...e following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}")> = SQLTemplaterError("dbt compilation error on file 'models\\my_new_project\\compiler_error.sql', Unexpected end of templ...he following tags: 'endfor' or 'else'. The innermost block that needs to be closed is 'for'.\n  line 5\n    {{ col }}").desc

C:\Users\alan\dev3\sqlfluff\test\core\templaters_test.py:221: AssertionError
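The rename-in, rename-out handling in this test could also be packaged as a context manager, so the file is always moved back even if processing raises. This is a hypothetical sketch (temporarily_moved is not part of the PR); the get_adapter(...).connections.release() cleanup from the test would still need its own finally block or fixture.

```python
import contextlib
import os


@contextlib.contextmanager
def temporarily_moved(src, dst):
    """Move src to dst for the duration of the with-block, then move it back.

    Same try/finally pattern as the test: the file is restored even if
    the body raises.
    """
    os.rename(src, dst)
    try:
        yield dst
    finally:
        os.rename(dst, src)


# Possible usage inside the test (names taken from the test above):
# with temporarily_moved(src_fpath, target_fpath):
#     _, violations = dbt_templater.process(
#         in_str="", fname=target_fpath, config=FluffConfig(configs=DBT_FLUFF_CONFIG)
#     )
```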

@NiallRees
Member

NiallRees commented Nov 23, 2020

Nice one. I'll fix those std_test.py merge conflicts so we can move forward (as it was me that caused them!)

CONTRIBUTING.md Outdated
Comment on lines 80 to 82
**For Windows users**: the tox environment depends on `make` to set up the dbt test folders.
To do that we recommend using _chocolatey_. You can find the instructions to install chocolatey here: https://chocolatey.org/install.
Once chocolatey is installed you can use `choco install make` to install `make`.
Member

If we remove the dependency on make via the comments I suggested in the main convo, then we can remove this.

Comment on lines 246 to 249
dbt is not the default templater for *sqlfluff* (it is Jinja). In order
to get started using *sqlfluff* with a dbt project you will need the following
configuration:

Member


Can we add a brief line or paragraph here on the performance implications and on how to choose the right templater? My assumption is that choosing between the jinja and dbt templaters is a tradeoff between macro and function support on the one hand, and performance and simplicity on the other. I'd love to help people make a more informed choice about which templater to use and why.

test/Makefile Outdated
Comment on lines 1 to 6
fixtures/dbt_project/dbt_modules:
ifeq (,$(shell which dbt))
$(info "dbt was not found, skipping..")
else
cd fixtures/dbt_project && dbt deps
endif
Member

I think we can remove this through my suggested changes to tox.ini.

Comment on lines +125 to +139
def test__templater_dbt_missing(dbt_templater):  # noqa
    """Check that a nice error is returned when dbt module is missing."""
    try:
        import dbt  # noqa: F401

        pytest.skip(msg="dbt is installed")
    except ModuleNotFoundError:
        pass

    with pytest.raises(ModuleNotFoundError, match=r"pip install sqlfluff\[dbt\]"):
        dbt_templater.process(
            in_str="",
            fname="models/my_new_project/test.sql",
            config=FluffConfig(configs=DBT_FLUFF_CONFIG),
        )
Member

This is really nice. I forgot to say so last time.
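The test expects the dbt templater to raise ModuleNotFoundError with a "pip install sqlfluff[dbt]" hint when dbt is missing. One way to produce such an error is sketched below; import_or_hint is a hypothetical helper, not the PR's implementation.

```python
import importlib


def import_or_hint(module_name, install_hint):
    """Import a module, re-raising ModuleNotFoundError with an install hint.

    Hypothetical helper illustrating the behaviour the test checks for.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"{module_name} is not installed. Try: {install_hint}"
        ) from err


# Possible usage inside a templater:
# dbt = import_or_hint("dbt", "pip install sqlfluff[dbt]")
```

Chaining with "from err" keeps the original import failure visible in the traceback.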

@alanmcruickshank alanmcruickshank mentioned this pull request Nov 23, 2020
@NiallRees
Member

Just had a go - all the tests are still passing on my mac with these changes @alanmcruickshank.

@alanmcruickshank
Member

I had the great pleasure of chatting to @dmateusp on a call this afternoon! 🙏

I've just tidied up the last few points on this PR and will merge shortly.

THIS IS HUGE!!! Thanks again to @dmateusp for such an excellent job on putting this together.

Labels
dbt Related to Data Build Tool
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants