sqlfluff fix corrupts Jinja for loop #1425
Comments
Is the double percentage valid Jinja syntax? Never seen that before to be honest. |
Oh is this needed for |
It's to escape the special meaning of % in |
No that’s fine. Just confused me for a second. Seems similar to #1162 and in particular this comment: #1162 (comment) |
Reproduced this issue using a file (no printf):
The result after
|
Slightly different example -- planning to use this one for deeper debugging as it only triggers a single rule violation, and that one is inside the
|
@CyberShadow: Do you recall how you produced the behavior
I would like to see that example as well. It may be a different, additional bug. (Duplicated code often indicates a bug in a lint rule, while this issue seems more likely to be in the mapping from templated to source code.) |
If I change the loop to one item, the issue does not occur.
becomes:
My guess is that several iterations (i.e. expansions) of the loop body each generate fixes, and SQLFluff is not handling that well. Perhaps it's applying the fixes cumulatively, but after the first one, the fixes are being applied to code that has already changed. Note that the fixes are only trying to change whitespace, yet non-whitespace (i.e. This similar example where the "column" is templated does not have the issue:
This is probably because SQLFluff knows not to change templated code. |
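The guess above, that later fixes land on text an earlier fix has already shifted, can be illustrated with a small sketch. This is not SQLFluff's code: the `(start, end, replacement)` patch tuples and both helper functions are hypothetical, chosen only to show how whitespace-only patches can end up destroying non-whitespace when their offsets go stale.

```python
def apply_in_order(source, patches):
    """Apply (start, end, replacement) patches front to back.

    Offsets refer to the ORIGINAL text, so every patch after the first
    is applied to text that has already shifted.
    """
    for start, end, replacement in patches:
        source = source[:start] + replacement + source[end:]
    return source


def apply_back_to_front(source, patches):
    """Apply the same patches last-first, so earlier offsets stay valid."""
    for start, end, replacement in sorted(patches, reverse=True):
        source = source[:start] + replacement + source[end:]
    return source


# Hypothetical input: two lines indented by one space instead of four.
source = "SELECT\n 1,\n 2\n"

# Two whitespace-only fixes: replace each single space with four spaces.
patches = [(7, 8, "    "), (11, 12, "    ")]

broken = apply_in_order(source, patches)       # the "1" is overwritten
fixed = apply_back_to_front(source, patches)   # both lines re-indented
```

In the front-to-back case the second patch overwrites the `1`, even though both patches only meant to touch whitespace. Applying patches from the end of the file backwards is one common way to sidestep this; whether it matches what SQLFluff does internally is not established in this thread.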
This section of the log contains the "smoking gun". Note the 5th line:
|
Here's the equivalent output to the above log messages, but using
|
The issue arises from confusion patching inside the loop. Rather than inserting 4 spaces to appropriately indent the columns, it thinks it's replacing 4 existing characters. |
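To make the difference between the two operations concrete, here is a minimal sketch. The helper names and the sample SQL are hypothetical and are not taken from SQLFluff; the point is only that a four-space insert and a four-character replace at the same position produce very different results.

```python
INDENT = "    "  # four spaces


def insert_at(source, pos, text):
    """Insert text at pos without consuming any existing characters."""
    return source[:pos] + text + source[pos:]


def replace_span(source, pos, length, text):
    """Replace `length` existing characters starting at pos with text."""
    return source[:pos] + text + source[pos + length:]


# Hypothetical unindented input.
source = "SELECT\ncol_a,\ncol_b\n"
pos = source.index("col_a")

# Intended fix: add indentation in front of the column.
intended = insert_at(source, pos, INDENT)

# Mistaken fix: treat the indent as replacing four existing characters,
# which eats "col_" from the column name.
mistaken = replace_span(source, pos, len(INDENT), INDENT)
```

Here `intended` keeps `col_a` intact, while `mistaken` leaves only `a,` on that line, which is the kind of non-whitespace damage described above.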
Thank you for looking into this!
FWIW, I used
Here it is:
$ printf 'SELECT\n 1,\n {%% for _ in [1, 2, 3] %%} 2{%%endfor %%}\n ' | sqlfluff fix -
Unfixable violations detected.
SELECT
1,
{% for _ in [1, 2, 3] %} 2 2 2{%endfor %}
|
Thanks, @CyberShadow! I have a draft PR that addresses the initial issue. The duplicated "2" example still fails. I'll see if I can find a solution for that as well. |
@CyberShadow: The PR is up for review. I can't tag you for review, but if you could look at it and test it on your actual code, that would be very helpful. The fix is more of a heuristic that tries to detect and discard fixes that can't be applied correctly, and it may need some tweaks to "zero in" on reasonable real-world behavior. This is the PR: #1431. |
Thank you! It still produces repeating
$ printf 'SELECT\n 1,\n {%%- for _ in [1, 2, 3] %%} 2{%%endfor %%}\n ' | sqlfluff fix -
WARNING One fix for L003 not applied, it would re-cause the same error.
Unfixable violations detected.
SELECT
1,
{%- for _ in [1, 2, 3] %}
2 2 2{%endfor %}
It's those Jinja space-trimming tags again! :) |
@CyberShadow: Can you explain what your real-life SQL is doing? These examples all have loops where the loop variable is not used inside the loop. That's what makes this difficult to fix, but I'm unsure how this is useful to you. 🤔 |
BTW, I think that last example will be easy to handle... |
Yes, the original source file does seem to use the loop variable in all loops. I'm using a combination of DustMite and manual reduction to create the reduced test cases. However, invariably DustMite removes the variable from the body of at least one loop, which means that the problem continues to be exhibited before and after the change. (For this issue, the exact definition of "the problem" that I'm using is that the input parses successfully with
That's really interesting, would it be easier to simply disable all relevant logic if the variable is not present in the body of the loop? That wouldn't affect my real case, and would allow reducing a minimal test case with all variables intact in all loop bodies. |
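The interaction between a reducer and its "is the problem still present?" test can be sketched as follows. This is a deliberately simplistic line-based reducer, not DustMite's algorithm, and both predicates are toy stand-ins: a real one would run the linter on the candidate and inspect the result.

```python
def reduce_lines(lines, still_interesting):
    """Greedily drop one line at a time while the predicate keeps holding."""
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if still_interesting(candidate):
                lines = candidate
                changed = True
                break
    return lines


def toy_predicate(lines):
    """Toy stand-in: 'the problem' only needs a for/endfor pair."""
    text = "\n".join(lines)
    return "{% for" in text and "{% endfor %}" in text


def strict_predicate(lines):
    """Same, but also require the loop variable to be used in the body."""
    return toy_predicate(lines) and any("{{ c }}" in line for line in lines)


# Hypothetical input with the loop variable used inside the loop.
original = [
    "SELECT",
    "    a,",
    "    {% for c in cols %}",
    "    {{ c }},",
    "    {% endfor %}",
    "    b",
    "FROM t",
]

reduced = reduce_lines(original, toy_predicate)
reduced_strict = reduce_lines(original, strict_predicate)
```

With the loose predicate the reducer strips the `{{ c }}` line along with everything else, leaving a loop whose variable is unused. Tightening the predicate keeps the variable in the loop body, at the cost of a slightly larger test case.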
I managed to reduce an instance of something like this which still has all its variables in the |
Thanks for the new test case! I'm currently looking at another issue related to Jinja and loops, #1162. The linting/fixing engine in SQLFluff is pretty complex, and with more users, we're starting to uncover more issues like this. SQLFluff is a side project for me, but I'm hoping to spend at least 1/2 to 1 day per week on these kinds of issues. It's interesting work, but it moves slowly. |
DustMite sounds really interesting! Do you have to do anything special to use it on SQL? I could see this being really useful for our project. I often do the same thing manually to get a minimum test case for a SQLFluff issue. |
It doesn't have an SQL parser right now, so I'm just using the D parser, which is superficially close enough (matches paren and brace pairs at least). (Running with
I would be most happy to help with anything from this side :) |
Is it something we could document for users to help minimize their SQL before creating an issue? Or is it more suited for contributors (who are more technical than the average SQLFluff user)? |
Currently I would say the latter, though I've been meaning to rebrand/revamp the tool to make it more generally useful and accessible. I'll put this at the top of my list, will keep you posted :) |
Thanks! We appreciate the great issue reports, by the way. |
Expected Behaviour
Output should be semantically equivalent to input, and have valid syntax.
Observed Behaviour
Output seems to be corrupted.
Steps to Reproduce
Note that the contents of the for loop were completely deleted.
Slightly varying the input causes the output to be corrupted in other ways (e.g. 2 is present but the comma is not, or the 2 is duplicated multiple times).
Dialect
None specified
Version
SQLFluff ce4e5a3, Python 3.9.7
Configuration
None