Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlfluff fix corrupts Jinja for loop #1425

Closed
CyberShadow opened this issue Sep 19, 2021 · 25 comments
Closed

sqlfluff fix corrupts Jinja for loop #1425

CyberShadow opened this issue Sep 19, 2021 · 25 comments
Assignees
Labels
bug Something isn't working fix corrupts sql For identifying issues where running the fix command actually breaks the SQL - bad! jinja Issues related to the Jinja templater

Comments

@CyberShadow
Copy link
Contributor

CyberShadow commented Sep 19, 2021

Expected Behaviour

Output should be semantically equivalent to input, and have valid syntax.

Observed Behaviour

Output seems to be corrupted.

Steps to Reproduce

$ printf 'SELECT\n    1,\n{%% for _ in [1, 2, 3] %%} 2,{%%endfor %%}\n' |  sqlfluff fix -
Unfixable violations detected.
SELECT
    1,
    {% for _ in [1, 2, 3] %}
 {%endfor %}

Note that the contents of the for loop was completely deleted.

Slightly varying the input causes the output to be corrupted in other ways (e.g. 2 is present but the comma is not, or the 2 is duplicated multiple times).

Dialect

None specified

Version

SQLFluff ce4e5a3, Python 3.9.7

Configuration

None

@CyberShadow CyberShadow added the bug Something isn't working label Sep 19, 2021
@tunetheweb
Copy link
Member

Is the double percentage valid Jinja syntax? Never seen that before to be honest.

@tunetheweb
Copy link
Member

Is the double percentage valid Jinja syntax? Never seen that before to be honest.

Oh is this needed for printf? So can be ignored?

@CyberShadow
Copy link
Contributor Author

It's to escape the special meaning of % in printf. Sorry for the confusion, I'll avoid it in the future if you can recommend a simpler method to unambiguously include test cases.

@tunetheweb tunetheweb added jinja Issues related to the Jinja templater fix corrupts sql For identifying issues where running the fix command actually breaks the SQL - bad! labels Sep 19, 2021
@tunetheweb
Copy link
Member

No that’s fine. Just confused me for a second.

Seems similar to #1162 and in particular this comment: #1162 (comment)

@barrywhart
Copy link
Member

Reproduced this issue using a file (no printf):

SELECT
    1,
{% for _ in [1, 2, 3] %} 2,{% endfor %}

The result after sqlfluff fix was:

SELECT
    1,
    {% for _ in [1, 2, 3] %}
 {% endfor %}

@barrywhart
Copy link
Member

Slightly different example -- planning to use this one for deeper debugging as it only triggers a single rule violation, and that one is inside the {% for %} loop.

SELECT
{% for _ in [1, 2, 3] %} 2,
{% endfor %}
    10

@barrywhart barrywhart self-assigned this Sep 22, 2021
@barrywhart
Copy link
Member

@CyberShadow: Do you recall how you produced the behavior

the 2 is duplicated multiple times).

I would like to see that example as well. It may be a different, additional bug. (Duplicated code often indicates a bug in a lint rule, while this issue seems more likely to be in the mapping from templated to source code.

@barrywhart
Copy link
Member

If I change the loop to one item, the issue does not occur.

SELECT
{% for _ in [1] %} 2,
{% endfor %}
    10

becomes:

SELECT
    {% for _ in [1] %} 2,
 {% endfor %}
    10

My guess is that several iterations (i.e. expansions) of the loop body each generate fixes, and SQLFluff is not handling that well. Perhaps it's applying the fixes cumulatively, but after the first one, the fixes are being applied to code that has already changed. Note that the fixes are only trying to change whitespace, yet non-whitespace (i.e. 2,) is disappearing.

This similar example where the "column" is templated does not have the issue:

SELECT
{% for _ in [1, 2, 3] %} {{n}},
{% endfor %}
    10

This is probably because SQLFluff knows not to change templated code.

@barrywhart
Copy link
Member

barrywhart commented Sep 22, 2021

This section of the log contains the "smoking gun". Note the 5th line: DEBUG Appending mid_point Patch: slice(30, 34, None) ' 2,\n' > ' ', where it's replacing 2, with spaces.

DEBUG      Final slice buffer: [slice(0, 7, None), slice(7, 7, None), slice(7, 30, None), slice(30, 34, None), slice(34, 34, None), slice(34, 54, None)]
DEBUG      Appending Raw:                    slice(0, 7, None)     'SELECT\n'
DEBUG      Appending mid_point Patch:        slice(7, 7, None)    '' > '    '
DEBUG      Appending Raw:                    slice(7, 30, None)     '{% for _ in range(9) %}'
DEBUG      Appending mid_point Patch:        slice(30, 34, None)    ' 2,\n' > '     '
DEBUG      Appending mid_point Patch:        slice(34, 34, None)    '' > ' '
DEBUG      Appending Raw:                    slice(34, 54, None)     '{% endfor %}\n    10\n'

@barrywhart
Copy link
Member

barrywhart commented Sep 22, 2021

Here's the equivalent output to the above log messages, but using range(1) instead of range(9). Line 5 is now "raw" rather than "patch".

DEBUG      Final slice buffer: [slice(0, 7, None), slice(7, 7, None), slice(7, 30, None), slice(30, 34, None), slice(34, 34, None), slice(34, 54, None)]
DEBUG      Appending Raw:                    slice(0, 7, None)     'SELECT\n'
DEBUG      Appending mid_point Patch:        slice(7, 7, None)    '' > '    '
DEBUG      Appending Raw:                    slice(7, 30, None)     '{% for _ in range(1) %}'
DEBUG      Appending Raw:                    slice(30, 34, None)     ' 2,\n'
DEBUG      Appending mid_point Patch:        slice(34, 34, None)    '' > ' '
DEBUG      Appending Raw:                    slice(34, 54, None)     '{% endfor %}\n    10\n'

@barrywhart
Copy link
Member

The issue arises from confusion patching inside the loop. Rather than inserting 4 spaces to appropriately indent the columns, it thinks it's replacing 4 existing characters.

@CyberShadow
Copy link
Contributor Author

Thank you for looking into this!

Reproduced this issue using a file (no printf):

FWIW, I used printf because it allows unambiguously specifying the input (incl. leading/trailing whitespace, which GitHub blocks don't allow), and I saw someone else use it here. (It also makes it a bit easier to experiment as the entire test case and reproduction instructions are self-contained in a single command.) You can just redirect the printf into a file to get the same thing.

@CyberShadow: Do you recall how you produced the behavior

the 2 is duplicated multiple times).

Here it is:

$ printf 'SELECT\n    1,\n  {%% for _ in [1, 2, 3] %%}    2{%%endfor %%}\n ' |  sqlfluff fix -
Unfixable violations detected.
SELECT
    1,
      {% for _ in [1, 2, 3] %}    2    2 2{%endfor %}

@barrywhart
Copy link
Member

Thanks, @CyberShadow! I have a draft PR that addresses the initial issue. The duplicated "2" example still fails. I'll see if I can find a solution for that as well.

@barrywhart
Copy link
Member

@CyberShadow: The PR is up for review. I can't tag you for review, but if you could look at it and test it on your actual code, that would be very helpful. The fix is more of a heuristic that tries to detect and discard fixes that can't be applied correctly, and it may need some tweaks to "zero in" on reasonable real-world behavior.

This is the PR: #1431.

@CyberShadow
Copy link
Contributor Author

Thank you! It still produces repeating 2s for this case:

$ printf 'SELECT\n    1,\n  {%%- for _ in [1, 2, 3] %%}    2{%%endfor %%}\n ' |  sqlfluff fix -
WARNING    One fix for L003 not applied, it would re-cause the same error. 
Unfixable violations detected.
SELECT
    1,
   {%- for _ in [1, 2, 3] %}
         2    2 2{%endfor %}

It's those Jinja space-trimming tags again! :)
(Previously they made no difference)

@barrywhart
Copy link
Member

@CyberShadow: Can you explain what your real-life SQL is doing? These examples all have loops where the loop variable is not used inside the loop. That's what makes this difficult to fix, but I'm unsure how this is useful to you. 🤔

@barrywhart
Copy link
Member

BTW, I think that last example will be easy to handle...

@CyberShadow
Copy link
Contributor Author

Yes, the original source file does seem to use the loop variable in all loops. I'm using a combination of DustMite and manual reduction to create the reduced test cases. However, invariably DustMite removes the variable from the body of at least one loop, which means that the problem continues to be exhibited before and after the change. (For this issue, the exact definition of "the problem" that I'm using is that the input parses successfully with sqlfluff parse, and does not stabilize after five iterations of sqlfluff fix.)

That's what makes this difficult to fix,

That's really interesting, would it be easier to simply disable all relevant logic if the variable is not present in the body of the loop? That wouldn't affect my real case, and would allow reducing a minimal test case with all variables intact in all loop bodies.

@CyberShadow
Copy link
Contributor Author

I managed to reduce an instance of something like this which still has all its variables in the for loop body: #1438

@barrywhart
Copy link
Member

Thanks for the new test case! I'm currently looking at another issue related to Jinja and loops, #1162. The linting/fixing engine in SQLFluff is pretty complex, and with more users, we're starting to uncover more issues like this. SQLFluff is a side project for me, but I'm hoping to spend at least 1/2 to 1 day per week on these kinds of issues. It's interesting work, but it moves slowly.

@barrywhart
Copy link
Member

DustMite sounds really interesting! Do you have to do anything special to use it on SQL? I could see this being really useful for our project. I often do the same thing manually to get a minimum test case for a SQLFluff issue.

@CyberShadow
Copy link
Contributor Author

Do you have to do anything special to use it on SQL?

It doesn't have an SQL parser right now, so I'm just using the D parser, which is superficially close enough (matches paren and brace pairs at least). (Running with --split '*.sql:d')

I could see this being really useful for our project. I often do the same thing manually to get a minimum test case for a SQLFluff issue.

I would be most happy to help with anything from this side :)

@barrywhart
Copy link
Member

Is it something we could document for users to help minimize their SQL before creating an issue? Or is it more suited for contributors (who are more technical than the average SQLFluff user)?

@CyberShadow
Copy link
Contributor Author

Is it something we could document for users to help minimize their SQL before creating an issue? Or is it more suited for contributors (who are more technical than the average SQLFluff user)?

Currently I would say the latter, though I've been meaning to rebrand/revamp the tool to make it more generally useful and accessible. I'll put this at the top of my list, will keep you posted :)

@barrywhart
Copy link
Member

Thanks! We appreciate the great issue reports, by the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fix corrupts sql For identifying issues where running the fix command actually breaks the SQL - bad! jinja Issues related to the Jinja templater
Projects
None yet
Development

No branches or pull requests

3 participants